Inspiration
Online question-and-answer platforms contain millions of questions, yet many receive no responses at all.
Instead of studying popularity or engagement, this project focuses on silence: questions that are asked but never answered.
The idea came from the belief that absence itself can be a signal, and that unanswered questions can reveal hidden patterns in clarity, structure, and participation across online knowledge-sharing communities.
What it does
Unanswered but Asked analyzes questions from a public StackExchange dataset and uses machine learning to predict whether a question is likely to remain unanswered.
The project studies structural characteristics such as:
- Question length
- Level of detail in the description
- Number of tags used
It then models how these factors relate to the probability that a question goes unanswered.
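A minimal sketch of how such structural features could be computed for a single question. It assumes the title and body are already plain text and that tags arrive as one raw string in either `<a><b>` or `|a|b|` form; the function names `count_tags` and `question_features` are illustrative, not taken from the project's notebooks.

```python
import re


def count_tags(raw_tags: str) -> int:
    """Count tags in a raw tag string.

    Assumption: tags are delimited by angle brackets ("<pla><warping>")
    or by pipes ("|pla|warping|"); both forms are handled here.
    """
    return len([t for t in re.split(r"[<>|]", raw_tags) if t.strip()])


def question_features(title: str, body_text: str, raw_tags: str) -> dict:
    """Turn one question into simple numeric features.

    Word counts use whitespace splitting, a rough proxy for length
    and level of detail.
    """
    return {
        "title_words": len(title.split()),
        "body_words": len(body_text.split()),
        "tag_count": count_tags(raw_tags),
    }


# Made-up example question, for illustration only.
example = question_features(
    "Why does my first layer warp?",
    "The corners lift off the bed after a few layers.",
    "<pla><warping>",
)
print(example)
```

Whitespace word counts are crude, but they keep each feature easy to explain when reading the model's coefficients later.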
How we built it
- Used a public StackExchange data dump from the 3D Printing community
- Parsed raw XML data to extract only question posts
- Labeled questions as answered or unanswered
- Cleaned HTML content from titles and bodies
- Engineered simple numeric features such as word counts and tag counts
- Trained a logistic regression model, chosen for its interpretability
- Evaluated performance using precision, recall, and F1-score
All analysis was performed in Python using Jupyter notebooks.
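The parsing and labeling steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual notebook code: it assumes the dump's `Posts.xml` layout (one `<row>` per post, with `PostTypeId="1"` marking questions), it treats `AnswerCount` equal to 0 as "unanswered", and it strips HTML with the standard library's `html.parser` as a dependency-free stand-in for Beautiful Soup. The helper names and the tiny inline sample are made up.

```python
import io
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(fragment: str) -> str:
    """Return the fragment's text with tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def iter_questions(source):
    """Yield one dict per question row, reading the XML as a stream."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Assumption: PostTypeId "1" marks a question, "2" an answer.
        if elem.tag == "row" and elem.get("PostTypeId") == "1":
            yield {
                "id": elem.get("Id"),
                "title": elem.get("Title", ""),
                "body": strip_html(elem.get("Body", "")),
                "tags": elem.get("Tags", ""),
                # Assumption: a question with zero answers is "unanswered".
                "unanswered": int(elem.get("AnswerCount", "0")) == 0,
            }
        # Drop the element's attributes once it has been processed.
        elem.clear()


# Tiny made-up sample in the assumed Posts.xml shape.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" AnswerCount="2" Title="Why does PLA warp?" Body="&lt;p&gt;Corners lift off the bed.&lt;/p&gt;" Tags="&lt;pla&gt;&lt;warping&gt;" />
  <row Id="2" PostTypeId="2" ParentId="1" Body="&lt;p&gt;Try a heated bed.&lt;/p&gt;" />
  <row Id="3" PostTypeId="1" AnswerCount="0" Title="Odd noise on Z axis" Body="&lt;p&gt;It clicks.&lt;/p&gt;" Tags="&lt;z-axis&gt;" />
</posts>"""

questions = list(iter_questions(io.BytesIO(SAMPLE)))
for q in questions:
    print(q["id"], q["unanswered"], q["title"])
```

`iter_questions` also accepts a file path, so the same generator could be pointed at a real `Posts.xml` and fed into a pandas DataFrame.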
Challenges we ran into
- Processing large XML files efficiently without exhausting memory
- Cleaning noisy HTML content into usable plain text
- Handling class imbalance between answered and unanswered questions
- Keeping the model simple while still extracting meaningful insights
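One common way to handle class imbalance while keeping the model simple is to reweight classes inside logistic regression. The sketch below shows that option with scikit-learn; whether the project used `class_weight="balanced"`, feature scaling, or this particular train/test split is an assumption. The data is random and synthetic, so the scores it prints say nothing about the real results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 1000

# Synthetic stand-ins for the engineered features (not real data).
feature_names = ["title_words", "body_words", "tag_count"]
X = np.column_stack([
    rng.integers(2, 20, size=n),    # title_words
    rng.integers(5, 400, size=n),   # body_words
    rng.integers(1, 6, size=n),     # tag_count
]).astype(float)

# Synthetic labels: about 15% "unanswered" (1), drawn independently of X.
y = (rng.random(n) < 0.15).astype(int)

# Stratify so both splits keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling makes coefficients comparable across features;
# class_weight="balanced" upweights the rarer class during fitting.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X_train, y_train)

# Precision, recall, and F1 for the "unanswered" class.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, model.predict(X_test),
    average="binary", pos_label=1, zero_division=0,
)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

# One coefficient per feature; the sign shows the direction of association.
coefficients = model.named_steps["logisticregression"].coef_[0]
for name, coef in zip(feature_names, coefficients):
    print(f"{name}: {coef:+.3f}")
```

Reporting metrics for the unanswered class specifically matters here, because overall accuracy can look high on an imbalanced dataset even when the model never predicts the rare class.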
Accomplishments that we're proud of
- Successfully modeling silence as a measurable outcome
- Demonstrating that unanswered questions are not random
- Extracting interpretable insights using a simple machine learning model
- Completing the full data-to-insight pipeline within a short hackathon timeframe
What we learned
- Absence of interaction can be as informative as interaction itself
- Simple features often reveal strong signals before complex models are needed
- Interpretability is critical when telling a data story
- Real-world datasets require careful cleaning and thoughtful assumptions
What's next for Unanswered but Asked
- Extend the analysis across multiple StackExchange communities
- Incorporate text-based features such as TF-IDF
- Analyze time-based effects like posting hour and day
- Build a lightweight interface to evaluate new questions
Built With
- beautiful-soup
- jupyter
- notebook
- pandas
- python
- scikit-learn

