Ads pop up while we browse the Internet, and we want to know whether a piece of text carries valuable information for us. That inspired us to build a text classifier that determines whether a text is news or not.

What it does

Our algorithm trains a text classifier on 80% of the Brown corpus (500 texts, both news and non-news) and tests it on the remaining 20% to measure accuracy.
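The 80/20 split and the accuracy measurement can be sketched as follows. This is a minimal illustration, not our exact code: the helper names `split_train_test` and `accuracy`, the shuffling step, and the fixed seed are assumptions, and the integer ids stand in for the 500 corpus texts.

```python
import random


def split_train_test(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of items and split it into (train, test) lists.

    Shuffling and the fixed seed are assumptions made for this sketch.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted, gold):
    """Return the fraction of predictions that match the gold labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


# Placeholder ids standing in for the 500 texts of the corpus.
text_ids = list(range(500))
train_ids, test_ids = split_train_test(text_ids)
```

With 500 texts, this yields 400 training texts and 100 held-out test texts, and no text appears in both sets.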

How we built it

We use the Naive Bayes classifier from the Natural Language Toolkit (NLTK) to train on features we designed.
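A rough sketch of how NLTK's Naive Bayes classifier is trained on hand-designed features is shown below. The feature extractor, the tiny vocabulary, and the toy sentences are all hypothetical and chosen only for illustration; they are not the features or data we actually used.

```python
import nltk

# Hypothetical vocabulary; the real feature set was designed separately.
VOCABULARY = ["senate", "election", "said", "she", "he", "loved"]


def extract_features(text):
    """Map a text to a dict of boolean word-presence features."""
    tokens = set(text.lower().split())
    return {"contains({})".format(word): word in tokens for word in VOCABULARY}


# Toy labeled examples (made up for this sketch, not taken from the corpus).
labeled_texts = [
    ("the senate voted on the budget today", "news"),
    ("the mayor said the election results were final", "news"),
    ("she whispered that she loved him", "not_news"),
    ("he laughed and ran through the dark forest", "not_news"),
]

# NLTK expects a list of (feature_dict, label) pairs.
train_set = [(extract_features(text), label) for text, label in labeled_texts]
classifier = nltk.NaiveBayesClassifier.train(train_set)

news_guess = classifier.classify(
    extract_features("the senate said the vote was final")
)
fiction_guess = classifier.classify(
    extract_features("she said she loved the forest")
)
```

On this toy data, the first sentence is classified as `news` and the second as `not_news`. On a held-out list of `(feature_dict, label)` pairs, `nltk.classify.accuracy(classifier, test_set)` reports the fraction classified correctly.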

Challenges we ran into

The initial training results weren't good enough. We found that a second round of training, built on the first-round results, improves accuracy.

Accomplishments that we're proud of

The classifier reaches 97% accuracy!

What we learned

Teamwork rocks!

What's next for News or Not

It has many possible applications, such as spam filtering and author classification.
