Ads pop up all the time when we browse the Internet, and we want to know whether they carry valuable information for us. That inspired us to build a text classifier that determines whether a given text is news or not.
What it does
Our algorithm trains a text classifier on 80% of the Brown corpus (500 texts, both news and non-news) and tests it on the remaining 20% to measure accuracy.
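As a rough illustration of the 80/20 split, here is a minimal sketch using only the Python standard library. The helper name `split_train_test`, the fixed seed, and the placeholder document IDs and labels are all assumptions made for this example and are not taken from our actual code.

```python
import random

def split_train_test(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of items and split it into (train, test) lists."""
    shuffled = list(items)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Placeholder data: 500 (document_id, label) pairs standing in for the corpus.
# The IDs and labels here are made up; with NLTK installed and the corpus
# downloaded, real file IDs could come from nltk.corpus.brown.fileids().
documents = [
    (f"doc{i:03d}", "news" if i % 10 == 0 else "not_news")
    for i in range(500)
]

train_docs, test_docs = split_train_test(documents)
print(len(train_docs), len(test_docs))  # 400 100
```

Shuffling before cutting matters because a corpus is often ordered by category, so an unshuffled split could leave one label almost entirely in the test portion.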
How we built it
We use the Naive Bayes classifier from the Natural Language Toolkit (NLTK) and train it on features we designed ourselves.
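The sketch below shows roughly what training NLTK's Naive Bayes classifier on hand-designed features can look like. It is only an illustration: the `VOCABULARY` list, the `document_features` helper, the "contains(word)" feature scheme, and the tiny example texts are hypothetical stand-ins, not the features or data we actually used.

```python
import nltk

# Hypothetical vocabulary for illustration; not the project's real features.
VOCABULARY = ["government", "election", "love", "dragon"]

def document_features(words):
    """Map a list of words to boolean 'contains(word)' features."""
    present = {w.lower() for w in words}
    return {f"contains({w})": (w in present) for w in VOCABULARY}

# Toy labeled texts (made up) standing in for corpus documents.
train_texts = [
    (["The", "government", "announced", "an", "election"], "news"),
    (["Election", "results", "surprised", "the", "government"], "news"),
    (["The", "government", "passed", "a", "bill"], "news"),
    (["She", "fell", "in", "love", "with", "a", "dragon"], "not_news"),
    (["The", "dragon", "slept"], "not_news"),
    (["A", "story", "about", "love"], "not_news"),
]
test_texts = [
    (["The", "election", "is", "next", "week"], "news"),
    (["A", "dragon", "in", "love"], "not_news"),
]

# NLTK expects a list of (feature_dict, label) pairs.
train_set = [(document_features(words), label) for words, label in train_texts]
test_set = [(document_features(words), label) for words, label in test_texts]

classifier = nltk.NaiveBayesClassifier.train(train_set)

# Classify one unseen text and measure accuracy on the held-out pairs.
prediction = classifier.classify(document_features(["The", "election", "is", "next", "week"]))
accuracy = nltk.classify.accuracy(classifier, test_set)
print(prediction, accuracy)
```

On a real corpus, `classifier.show_most_informative_features()` is a handy way to check which features the model relies on most.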
Challenges we ran into
The initial training was not accurate enough. We found that a second round of training based on the first-round results improved accuracy.
Accomplishments that we're proud of
Our classifier reaches 97% accuracy on the held-out test set!
What we learned
Teamwork rocks!
What's next for News or Not
It has tons of applications, like spam filtering, author classification...