Inspiration

Online question-and-answer platforms contain millions of questions, yet many receive no responses at all.
Instead of studying popularity or engagement, this project focuses on silence: questions that are asked but never answered.

The idea grew out of the belief that absence itself can be a signal, and that unanswered questions can reveal hidden patterns in how clarity, structure, and participation play out in online knowledge-sharing communities.


What it does

Unanswered but Asked analyzes questions from a public StackExchange dataset and uses machine learning to predict whether a question is likely to remain unanswered.

The project studies structural characteristics such as:

  • Question length
  • Level of detail in the description
  • Number of tags used

It then models how these factors influence the probability of silence.
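
For reference, a logistic regression over the three characteristics above expresses the probability of silence as shown below. This is a generic sketch: the feature symbols are illustrative labels for the listed characteristics, not the project's exact variables.

```latex
P(\text{unanswered} \mid \mathbf{x})
  = \sigma\!\left(\beta_0 + \beta_1 x_{\text{length}} + \beta_2 x_{\text{detail}} + \beta_3 x_{\text{tags}}\right),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
```

Because the model is linear in the log-odds, each coefficient can be read directly: exp(β_j) is the factor by which the odds of a question going unanswered change when x_j increases by one unit, with the other features held fixed.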


How we built it

  • Used a public StackExchange data dump from the 3D Printing community
  • Parsed raw XML data to extract only question posts
  • Labeled questions as answered or unanswered
  • Cleaned HTML content from titles and bodies
  • Engineered simple numeric features such as word counts and tag counts
  • Trained a Logistic Regression model for interpretability
  • Evaluated performance using precision, recall, and F1-score
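
The parsing, labeling, cleaning, and feature steps above can be sketched as a single streaming pass using only the standard library. This is a minimal sketch, not the project's code. It assumes the dump's `Posts.xml` layout (`row` elements with `PostTypeId`, `Title`, `Body`, `Tags`, and `AnswerCount` attributes). The labeling rule `AnswerCount == 0`, the helper names, and the tiny sample are all hypothetical, and the regex tag stripper is a crude stand-in for a real HTML parser. Calling `iterparse` and clearing the root after each row is one common way to keep memory roughly flat on large dumps.

```python
import html
import io
import re
import xml.etree.ElementTree as ET

TAG_RE = re.compile(r"<[^>]+>")  # crude tag stripper; an HTML parser is more robust


def strip_html(raw):
    """Drop HTML tags, decode entities such as &amp;, and collapse whitespace."""
    text = TAG_RE.sub(" ", raw or "")
    return " ".join(html.unescape(text).split())


def count_tags(tag_field):
    """Count tags stored as '<a><b>' (older dumps) or '|a|b|' (newer dumps)."""
    return len([t for t in re.split(r"[<>|]", tag_field or "") if t])


def iter_questions(source):
    """Stream question rows from Posts.xml, yielding one feature dict per question."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the <posts> root
    for event, elem in context:
        if event != "end" or elem.tag != "row":
            continue
        if elem.get("PostTypeId") == "1":  # 1 = question, 2 = answer
            title = strip_html(elem.get("Title"))
            body = strip_html(elem.get("Body"))
            yield {
                "id": int(elem.get("Id")),
                "title_words": len(title.split()),
                "body_words": len(body.split()),
                "tag_count": count_tags(elem.get("Tags")),
                # Hypothetical labeling rule: zero answers means "unanswered".
                "unanswered": int(elem.get("AnswerCount", "0")) == 0,
            }
        root.clear()  # discard processed rows so memory does not grow


# Hypothetical three-row sample in the Posts.xml shape (Body HTML is XML-escaped).
SAMPLE_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" Title="Why does my PLA warp?"
       Body="&lt;p&gt;The corners &lt;b&gt;lift&lt;/b&gt; off the bed.&lt;/p&gt;"
       Tags="&lt;pla&gt;&lt;warping&gt;" AnswerCount="0" />
  <row Id="2" PostTypeId="2" Body="&lt;p&gt;Try a brim.&lt;/p&gt;" />
  <row Id="3" PostTypeId="1" Title="Best slicer?" Body="&lt;p&gt;Any tips&lt;/p&gt;"
       Tags="|slicing|" AnswerCount="2" />
</posts>"""

if __name__ == "__main__":
    for question in iter_questions(io.BytesIO(SAMPLE_XML)):
        print(question)
```

On a real dump, `iter_questions("Posts.xml")` accepts a file path in place of the in-memory sample, and the yielded dicts can be collected into whatever tabular structure the notebook uses.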

All analysis was performed in Python using Jupyter notebooks.


Challenges we ran into

  • Processing large XML files efficiently without exhausting memory
  • Cleaning noisy HTML content into usable plain text
  • Handling class imbalance between answered and unanswered questions
  • Keeping the model simple while still extracting meaningful insights
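
Class imbalance and the precision/recall/F1 evaluation can be illustrated with a dependency-free sketch. This is not the project's code. It is a hand-rolled stand-in for whatever logistic-regression implementation the notebooks used: plain batch gradient descent on a class-weighted log loss. The weighting rule n / (2 · n_c) is the same formula scikit-learn applies for `class_weight="balanced"`. All function names, the learning rate, the epoch count, and the toy data are hypothetical.

```python
import math


def standardize(rows):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def balanced_weights(y):
    """Class weights n / (2 * n_c), the rule scikit-learn calls class_weight='balanced'.

    Assumes both classes (1 = unanswered, 0 = answered) are present in y.
    """
    n, pos = len(y), sum(y)
    return {1: n / (2 * pos), 0: n / (2 * (n - pos))}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, x):
    """Linear predictor: bias w[0] plus the dot product of w[1:] with x."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def train_logreg(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on class-weighted log loss; returns [bias, w_1, ..., w_d]."""
    cw = balanced_weights(y)
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            err = cw[t] * (sigmoid(score(w, x)) - t)  # weighted dLoss/dscore
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(y) for wj, g in zip(w, grad)]
    return w


def predict(w, X, threshold=0.5):
    """Label 1 (unanswered) when the predicted probability reaches the threshold."""
    return [int(sigmoid(score(w, x)) >= threshold) for x in X]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the 'unanswered' (positive) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy rows of [title_words, body_words, tag_count]; made up for illustration only.
    X = standardize([[4, 8, 1], [5, 10, 1], [6, 12, 1], [12, 150, 3],
                     [10, 120, 3], [11, 200, 4], [13, 180, 5], [14, 160, 4]])
    y = [1, 1, 1, 0, 0, 0, 0, 0]
    w = train_logreg(X, y)
    print(w, precision_recall_f1(y, predict(w, X)))
```

The toy demo scores the same rows it trained on, which only checks that the code runs. In a real evaluation, the scaling statistics and model would be fit on a training split and the metrics computed on held-out questions. Weighting the minority class more heavily is one way to keep the model from predicting "answered" for everything.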

Accomplishments that we're proud of

  • Successfully modeling silence as a measurable outcome
  • Demonstrating that unanswered questions are not random: simple structural features carry a measurable signal
  • Extracting interpretable insights using a simple machine learning model
  • Completing the full data-to-insight pipeline within a short hackathon timeframe

What we learned

  • Absence of interaction can be as informative as interaction itself
  • Simple features often reveal strong signals before complex models are needed
  • Interpretability is critical when telling a data story
  • Real-world datasets require careful cleaning and thoughtful assumptions

What's next for Unanswered but Asked

  • Extend the analysis across multiple StackExchange communities
  • Incorporate text-based features such as TF-IDF
  • Analyze time-based effects such as posting hour and day of the week
  • Build a lightweight interface to evaluate new questions
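
The planned time-based features could start from each post's `CreationDate` attribute. The following is a minimal sketch, assuming the dump's ISO-style timestamps (for example `2016-01-12T18:45:19.963`) are in UTC. The function name and the choice of features are hypothetical. The sine/cosine pair is one common way to encode hour of day so that 23:00 and 00:00 end up close together, which a raw hour number would not give a linear model.

```python
import math
from datetime import datetime


def time_features(creation_date):
    """Turn a Posts.xml CreationDate string into hour and day-of-week features.

    Assumes an ISO-style timestamp such as '2016-01-12T18:45:19.963' (taken as UTC).
    """
    ts = datetime.fromisoformat(creation_date)
    angle = 2 * math.pi * ts.hour / 24
    return {
        "hour": ts.hour,
        "weekday": ts.weekday(),  # Monday = 0 ... Sunday = 6
        "is_weekend": int(ts.weekday() >= 5),
        # Cyclical encoding: hours placed on a circle so midnight wraps around.
        "hour_sin": math.sin(angle),
        "hour_cos": math.cos(angle),
    }


if __name__ == "__main__":
    print(time_features("2016-01-12T18:45:19.963"))
```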
