The idea came from Stack Overflow: I thought it would be great to have a model that can detect whether a text is truly asking for help or is just informational text in the same field.
What it does
It takes a text or an array of texts and returns, for each one, whether it is a question (help needed) or not.
How I built it
I used the Reddit open data: posts from the help subjects as the "help needed" class, and all the other data as the "not asking for help" class. I cleaned the data by keeping only texts between 100 and 2000 characters long and by removing all digits and all punctuation except ? and !.
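As a rough illustration of that cleaning step, here is a minimal sketch using only the Python standard library. The function names (`clean_text`, `keep_post`, `prepare`) are hypothetical, and applying the length filter to the raw text before cleaning is an assumption, since the order of the two steps is not stated.

```python
import re

# Length bounds taken from the description above (in characters).
MIN_LEN, MAX_LEN = 100, 2000


def clean_text(text):
    """Remove digits and all punctuation except '?' and '!'."""
    # Replace every run of digits with a space.
    text = re.sub(r"\d+", " ", text)
    # Replace anything that is not a word character, whitespace, '?' or '!'.
    # The underscore counts as a word character, so it is handled separately.
    text = re.sub(r"[^\w\s?!]|_", " ", text)
    # Collapse the extra whitespace left behind by the replacements.
    return re.sub(r"\s+", " ", text).strip()


def keep_post(text):
    """Keep only texts of 100 to 2000 characters (assumed: raw length)."""
    return MIN_LEN <= len(text) <= MAX_LEN


def prepare(posts):
    """Filter by length, then clean each remaining post."""
    return [clean_text(p) for p in posts if keep_post(p)]
```

For example, `clean_text("Help!! My code fails at line 42, why?")` gives `"Help!! My code fails at line why?"`: the digits and the comma are gone, while `?` and `!` survive because they carry signal for the classifier.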
Challenges I ran into
The biggest challenge I ran into was the data: finding data specific to this problem. The second one was my laptop (2.2 GHz CPU, 8 GB of RAM), which took very long to train and produce results. I tried to use pretrained models, but they are too large, so the solution was to train my own model. At first I used all the data I have, 10 GB :( which was not good. So I broke the data into samples, used them to train my model, and picked the best one.
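The "break the data into samples, train on each, keep the best" idea can be sketched as follows. This is only an illustration under assumptions: how the samples were actually drawn is not described, so reservoir sampling is swapped in here as one way to take a fixed-size random sample from a file too large for 8 GB of RAM. The `train` and `evaluate` callables are hypothetical placeholders for whatever model and metric are used.

```python
import random


def reservoir_sample(lines, k, seed=0):
    """Draw k items uniformly at random from an iterable of unknown size,
    holding at most k items in memory (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(line)
        else:
            # Replace an existing item with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = line
    return sample


def pick_best(samples, train, evaluate):
    """Train one model per sample and return the best (model, score) pair.

    `train` and `evaluate` are hypothetical callables supplied by the caller:
    train(sample) -> model, evaluate(model) -> score (higher is better).
    """
    best_model, best_score = None, float("-inf")
    for sample in samples:
        model = train(sample)
        score = evaluate(model)
        if score > best_score:
            best_model, best_score = model, score
    return best_model, best_score
```

Because `reservoir_sample` accepts any iterable, it can be fed an open file object directly, so the 10 GB file is read line by line and never loaded whole. Different `seed` values give different samples to compare.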
Accomplishments that I'm proud of
Now my model is working fine, and the best part is that it can detect the result from just 5 words. It bases its decision on words like "what", "help", "problem", ...
What I learned
I learned that data science truly is about the data: how to collect it and how to clean it. It is true that 80% of the work is data collection and cleaning.
What's next for classifie texts (help needed or not)
The next step is to cluster similar texts, or to find the texts that talk about the same problem. On a social platform, this could be used to recommend users with the same problem to the user who posted the question, so that they can collaborate to find a solution.