NLP Classification using TF-IDF and Logistic Regression

This project solves three NLP classification challenges using a simple and effective machine learning approach.


The GitHub repository contains the complete NLP hackathon solution, including code, a README, and submission files for all three challenges.

Inspiration - The goal was to tackle real-world NLP problems with a simple, effective machine learning approach. The hackathon was an opportunity to learn how text data can be processed and classified efficiently, even with minimal prior experience.

What it does - This project performs text classification for three different challenges:

Disaster tweet classification
Fake news detection
Toxic comment classification

It takes text as input and predicts a label (for example, disaster vs. not disaster, fake vs. real, or toxic vs. non-toxic) using a trained machine learning model.

How we built it - We used TF-IDF vectorization to convert text into numerical features and trained a Logistic Regression classifier on those features. Before training, the data was cleaned by handling missing values, fixing encoding issues, and normalizing labels into a consistent format.

The same pipeline was applied across all three challenges for consistency and reproducibility.
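The shared TF-IDF + Logistic Regression pipeline could look roughly like the sketch below. It assumes scikit-learn, which the write-up does not name, and the settings (`ngram_range`, `sublinear_tf`, `max_iter`), the `build_text_classifier` helper, and the toy tweets are hypothetical choices for illustration, not the project's actual configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def build_text_classifier() -> Pipeline:
    """Return a fresh, unfitted TF-IDF + logistic regression pipeline.

    Hypothetical helper: the parameter values are illustrative assumptions.
    """
    return Pipeline([
        # Turn raw strings into sparse TF-IDF vectors (unigrams + bigrams).
        ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2), sublinear_tf=True)),
        # Linear classifier; max_iter raised so the solver converges on larger vocabularies.
        ("clf", LogisticRegression(max_iter=1000)),
    ])


# Hypothetical toy data standing in for one challenge's training split
# (1 = disaster-related, 0 = not disaster-related).
train_texts = [
    "forest fire spreading near the highway, evacuate now",
    "earthquake damaged several buildings downtown",
    "flood waters rising, families rescued from rooftops",
    "loving this sunny afternoon at the beach",
    "just finished a great book, highly recommend",
    "new coffee place opened and it is amazing",
]
train_labels = [1, 1, 1, 0, 0, 0]

# The same builder would be called once per challenge, each on its own data.
model = build_text_classifier()
model.fit(train_texts, train_labels)
predictions = model.predict(["wildfire near the highway, residents evacuate"])
print(predictions)
```

Calling `build_text_classifier()` separately for each challenge gives every dataset its own fitted vocabulary and weights while keeping the preprocessing and model identical, which is what makes the results comparable and reproducible.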

Challenges we ran into - We ran into several data problems: encoding errors, missing values, inconsistent label formats, and parsing issues in the datasets. Handling these real-world issues was a key learning experience.
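The data problems listed above (bad encodings, missing text, inconsistent labels, malformed rows) could be handled along these lines with only the Python standard library. This is a hedged sketch: the column names `text` and `label`, the `LABEL_MAP` spellings, the `load_clean_rows` helper, and the sample bytes are all hypothetical and not taken from the actual hackathon datasets.

```python
import csv
import io

# Hypothetical mapping from label spellings one might encounter to integers.
LABEL_MAP = {"1": 1, "0": 0, "1.0": 1, "0.0": 0, "true": 1, "false": 0}


def load_clean_rows(raw: bytes, text_col: str = "text", label_col: str = "label"):
    """Decode, parse, and clean a CSV export; return (texts, labels)."""
    # utf-8-sig drops a leading byte-order mark; errors="replace" swaps
    # undecodable bytes for U+FFFD instead of raising UnicodeDecodeError.
    decoded = raw.decode("utf-8-sig", errors="replace")
    reader = csv.DictReader(io.StringIO(decoded))
    texts, labels = [], []
    for row in reader:
        # Short rows yield None for missing columns, so fall back to "".
        body = (row.get(text_col) or "").strip()
        raw_label = (row.get(label_col) or "").strip().lower()
        # Skip rows with missing text or an unrecognized label.
        if not body or raw_label not in LABEL_MAP:
            continue
        texts.append(body)
        labels.append(LABEL_MAP[raw_label])
    return texts, labels


# Hypothetical sample: one clean row, one row missing text,
# one row with an invalid UTF-8 byte, and one row with an unknown label.
sample = (
    b"text,label\n"
    b"Flood warning issued,1\n"
    b",0\n"
    b"Caf\xe9 opening today,FALSE\n"
    b"Nice weather,maybe\n"
)
texts, labels = load_clean_rows(sample)
print(texts, labels)
```

Dropping rows with unrecognized labels, rather than guessing, keeps noisy examples out of training; a real dataset may need a different label mapping.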

Accomplishments that we're proud of - We built and submitted solutions for all three challenges using a single, simple pipeline. Completing the entire workflow, from data preprocessing to model prediction, was a major achievement.

What we learned - We learned how to handle real-world text data, perform data cleaning, apply feature extraction techniques like TF-IDF, and train machine learning models for classification tasks.

What's next for NLP Classification using TF-IDF and Logistic Regression - In the future, we plan to explore deep learning approaches such as transformer models (e.g., BERT) and to tune the current pipeline to improve accuracy.


Updates

Hitesh Kureel started this project — Apr 25, 2026 07:28 AM EDT
