Inspiration - We wanted to solve real-world NLP problems with a simple, effective machine learning approach. The hackathon was a chance to learn how text data can be processed and classified efficiently, even with minimal prior experience.

What it does - This project performs text classification for three different challenges:

  1. Disaster tweet classification
  2. Fake news detection
  3. Toxic comment classification

For each challenge, it takes text as input and predicts a label using a machine learning model trained on that challenge's data.

How we built it - We used TF-IDF vectorization to convert text into numerical features and trained a Logistic Regression model for classification. The data was cleaned by handling missing values, fixing encoding issues, and ensuring correct label formatting.

The same pipeline was applied across all three challenges for consistency and reproducibility.
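
As a rough illustration of the approach described above, here is a minimal sketch of a TF-IDF plus Logistic Regression pipeline. It assumes scikit-learn, which the write-up does not name; the helper `build_text_classifier`, the hyperparameters, and the tiny training set are all made up for illustration and are not the project's actual code or data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_text_classifier():
    """Return a fresh, untrained TF-IDF + Logistic Regression pipeline.

    Hypothetical helper: the settings below are assumptions, not the
    project's actual configuration.
    """
    return make_pipeline(
        TfidfVectorizer(lowercase=True, stop_words="english"),
        LogisticRegression(max_iter=1000),
    )


# Invented examples, only to show the fit/predict flow (1 = disaster, 0 = not).
train_texts = [
    "Forest fire spreading near the town, residents evacuated",
    "Flood waters rising fast, roads closed",
    "Earthquake damages buildings downtown",
    "I love this sunny afternoon at the park",
    "Great pizza for dinner tonight",
    "Watching a funny movie with friends",
]
train_labels = [1, 1, 1, 0, 0, 0]

model = build_text_classifier()
model.fit(train_texts, train_labels)

# The pipeline vectorizes raw strings itself, so prediction takes plain text.
predictions = model.predict(
    ["Fire evacuated the town", "Pizza with friends at the park"]
)
print(predictions)
```

Calling `build_text_classifier()` once per challenge gives each task its own independently trained model while keeping the preprocessing and classifier settings identical, which is one way to get the consistency mentioned above.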

Challenges we ran into - The datasets came with encoding errors, missing values, inconsistent label formats, and parsing issues. Handling these real-world data problems was a key learning experience.
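
The kind of cleanup these problems call for can be sketched as follows, using only the Python standard library. This is an assumption-laden example, not the project's actual code: the column names `text` and `label`, the label spellings in `LABEL_MAP`, the Latin-1 fallback, and the sample data are all hypothetical.

```python
import csv
import io

# Hypothetical label spellings; the real datasets may use different ones.
LABEL_MAP = {"1": 1, "true": 1, "yes": 1, "0": 0, "false": 0, "no": 0}


def decode_bytes(raw: bytes) -> str:
    """Try UTF-8 first, then fall back to Latin-1, which accepts any byte."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


def clean_rows(raw: bytes, text_col: str = "text", label_col: str = "label"):
    """Return (texts, labels), skipping rows with missing text or unknown labels."""
    reader = csv.DictReader(io.StringIO(decode_bytes(raw), newline=""))
    texts, labels = [], []
    for row in reader:
        # Missing cells can be None (short rows) or empty strings.
        text = (row.get(text_col) or "").strip()
        label = LABEL_MAP.get((row.get(label_col) or "").strip().lower())
        if not text or label is None:
            continue
        texts.append(text)
        labels.append(label)
    return texts, labels


# Invented sample: non-UTF-8 bytes, an empty text cell, padded and odd labels.
sample = (
    "text,label\n"
    "Café flooded downtown,TRUE\n"
    ",1\n"
    "Nice day, 0 \n"
    "Broken row,maybe\n"
).encode("latin-1")

texts, labels = clean_rows(sample)
print(texts, labels)
```

Dropping rows with unrecognized labels is only one possible policy; logging them for manual review may be preferable when the dataset is small.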

Accomplishments that we're proud of - We successfully built and submitted solutions for all three challenges using a simple yet effective approach. Completing the entire pipeline from data preprocessing to model prediction was a major achievement.

What we learned - We learned how to handle real-world text data, perform data cleaning, apply feature extraction techniques like TF-IDF, and train machine learning models for classification tasks.

What's next for NLP Classification using TF-IDF and Logistic Regression - We plan to try transformer-based deep learning models such as BERT and to tune the pipeline for better accuracy.
