
About the Project

Inspiration

The inspiration came from a simple observation: students struggling academically often don't know how to improve, even when they're motivated to do better. Traditional academic advising relies on general advice, but what if we could find students with similar learning styles who actually succeeded and analyze exactly what they did differently?

This led to the core question: Can we use data science to identify specific, actionable changes that historically work for students like you?

The power of graph databases to model complex relationships between students, combined with modern AI's ability to generate personalized insights, presented an opportunity to create something more sophisticated than typical recommendation systems.

What We Learned

Technical Discoveries

  • Graph databases excel at similarity analysis - Neo4j's relationship-first approach made finding student patterns intuitive compared to traditional SQL joins
  • Local AI models are practical - DialoGPT provided contextual recommendations without external API dependencies or privacy concerns
  • Synthetic data can be surprisingly realistic - Our generated dataset with 9,920+ relationships created believable academic scenarios for testing

Data Science Insights

  • Learning style similarity ≠ performance similarity - Students with identical learning preferences can have vastly different outcomes
  • Small performance gaps matter - Even 0.3 GPA differences often reflect meaningful behavioral patterns
  • Peer analysis beats general advice - Recommendations based on actual similar students felt more actionable than generic study tips
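
To make the "small gaps matter" insight above concrete, here is a minimal sketch of how a GPA gap might be flagged as meaningful (the 0.3 threshold mirrors the figure above; the function name is illustrative):

```python
def is_meaningful_gap(target_gpa: float, peer_gpa: float,
                      threshold: float = 0.3) -> bool:
    """Flag GPA gaps large enough to suggest a real behavioral difference."""
    # Round first so floating-point noise doesn't hide a borderline gap.
    return round(peer_gpa - target_gpa, 2) >= threshold

print(is_meaningful_gap(3.0, 3.5))  # True: a 0.5 gap is worth analyzing
print(is_meaningful_gap(3.0, 3.1))  # False: 0.1 is below the threshold
```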

System Architecture Lessons

  • Flask + React + Neo4j integration required careful attention to data flow and error handling
  • Local AI processing introduced latency considerations but provided complete control over recommendations
  • Responsive design became crucial when testing on different devices during development

How We Built It

Phase 1: Data Foundation

We started with Neo4j as our core technology choice, recognizing that student relationships are inherently graph-based. The synthetic dataset generation involved:

// Core relationship modeling: peers sharing a learning style,
// with the precomputed similarity stored on the relationship
MATCH (s1:Student)-[r:SIMILAR_LEARNING_STYLE]->(s2:Student)
WHERE s1.learningStyle = s2.learningStyle
  AND r.score > 0.7
RETURN s1, s2

This created 500 students with realistic course histories, learning preferences, and performance patterns across Computer Science and Biology programs.
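
The generator itself isn't shown above, but a minimal sketch of how such synthetic records might be produced could look like this (the `learningStyle` field matches the query above; the other field names and value ranges are illustrative):

```python
import random

LEARNING_STYLES = ["visual", "auditory", "kinesthetic", "reading/writing"]
PROGRAMS = ["Computer Science", "Biology"]

def generate_students(n: int, seed: int = 42) -> list:
    """Create n synthetic student records with plausible styles and GPAs."""
    rng = random.Random(seed)  # fixed seed so test data is reproducible
    return [
        {
            "id": f"S{i:04d}",
            "learningStyle": rng.choice(LEARNING_STYLES),
            "program": rng.choice(PROGRAMS),
            "gpa": round(rng.uniform(2.0, 4.0), 2),
        }
        for i in range(n)
    ]

students = generate_students(500)
print(len(students))  # 500
```

Records like these are then loaded into Neo4j, where the similarity relationships between them are created.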

Phase 2: AI Integration

We integrated Microsoft's DialoGPT for local text generation:

def generate_recommendations(student_context, peer_analysis):
    # Build the prompt from the student's profile and the peer comparison
    style = student_context["learning_style"]
    prompt = f"Study advice for {style} learner: {peer_analysis}"
    return ai_model.generate(prompt, max_length=150)

The AI processes structured data about performance gaps and generates contextual advice, with rule-based fallbacks ensuring reliability.
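
The rule-based fallback mentioned above might look roughly like this (the specific rules and wording here are illustrative, not the project's actual ones):

```python
def fallback_recommendations(student_context: dict) -> list:
    """Deterministic advice used when the AI model fails or is unavailable."""
    tips = {
        "visual": "Turn lecture notes into diagrams and concept maps.",
        "auditory": "Record lectures and replay them during review sessions.",
        "kinesthetic": "Favor hands-on practice problems over rereading notes.",
    }
    style = student_context.get("learning_style", "")
    advice = [tips.get(style, "Use regular, spaced study sessions.")]
    # Mirror the peer-gap analysis: only suggest more study time
    # when a meaningful GPA gap to similar peers exists.
    if student_context.get("gpa_gap", 0) >= 0.2:
        advice.append("Add study hours to close the gap with similar peers.")
    return advice

print(fallback_recommendations({"learning_style": "visual", "gpa_gap": 0.4}))
```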

Phase 3: Full-Stack Implementation

  • Backend: Flask API with Neo4j driver integration and CORS support
  • Frontend: React dashboard with Recharts for data visualization
  • Architecture: RESTful API design enabling clean separation between data processing and presentation
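
The separation described above can be sketched with the web framework stubbed out (the real project uses Flask and the Neo4j driver; the handler names and hard-coded data here are purely illustrative):

```python
import json

def fetch_peer_analysis(student_id: str) -> dict:
    """Data layer: in the real system this runs Cypher via the Neo4j driver."""
    peers = {"S0042": {"similar_peers": 12, "avg_peer_gpa": 3.4}}
    return {"student_id": student_id, **peers[student_id]}

def peer_analysis_endpoint(student_id: str):
    """Presentation layer: wraps data-layer results in a JSON response."""
    try:
        return 200, json.dumps(fetch_peer_analysis(student_id))
    except KeyError:
        return 404, json.dumps({"error": "student not found"})

status, body = peer_analysis_endpoint("S0042")
print(status)  # 200
```

Keeping the data layer ignorant of HTTP concerns is what made it possible to test the Neo4j queries and the API surface independently.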

Challenges Faced

Technical Hurdles

Neo4j Schema Mismatches: The generated dataset properties didn't always match our query expectations. Properties like difficulty and timeSpent were stored as mixed data types (booleans, strings, numbers), causing type errors in aggregation queries.

Solution: Created schema validation and type conversion functions, plus simplified queries that avoided problematic properties.
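
A type-conversion helper of the kind described might look like this (a minimal sketch; the function name and default are illustrative):

```python
def coerce_numeric(value, default=0.0):
    """Normalize mixed-type properties (bool/str/number) to a float."""
    if isinstance(value, bool):  # bools are ints in Python; handle them first
        return 1.0 if value else 0.0
    try:
        return float(value)
    except (TypeError, ValueError):
        return default  # unparseable strings, None, etc.

print(coerce_numeric("3.5"))   # 3.5
print(coerce_numeric(True))    # 1.0
print(coerce_numeric("hard"))  # 0.0
```

Running every property through a normalizer like this before aggregation eliminates the type errors described above.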

AI Model Memory Constraints: Initial attempts to use Llama2 failed due to the 8.4GB memory requirement exceeding available system RAM (8.1GB).

Solution: Switched to DialoGPT-small (much smaller footprint) and designed the system to be model-agnostic for future upgrades.
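
Model-agnostic here means the generation backend is resolved by name behind a common interface. A minimal sketch (only the rule-based backend is shown; a DialoGPT wrapper class with the same `generate` signature would be registered alongside it):

```python
class RuleBasedModel:
    """Cheapest backend: canned, deterministic advice, no ML dependencies."""
    def generate(self, prompt: str, max_length: int = 150) -> str:
        return "Review notes within 24 hours of each lecture."[:max_length]

def load_model(name: str):
    """Resolve a model name to a backend; unknown names fall back to rules."""
    registry = {"rules": RuleBasedModel}  # a DialoGPT wrapper registers here
    return registry.get(name, RuleBasedModel)()

model = load_model("rules")
print(model.generate("Study advice for a visual learner"))
```

Swapping in a larger model later is then a one-line registry change rather than a rewrite of the recommendation pipeline.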

Frontend Build Conflicts: Tailwind CSS integration with Create React App caused module resolution errors.

Solution: Replaced Tailwind with custom CSS utility classes, maintaining the desired design while eliminating dependency conflicts.

Algorithm Challenges

Similarity Threshold Tuning: Finding the right balance for student similarity scores. Too low (< 0.5) returned irrelevant peers; too high (> 0.9) found too few matches.

Solution: Settled on 0.7 similarity threshold through empirical testing, with fallback logic for edge cases.
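
The threshold plus fallback logic can be sketched like this (the minimum-peer count and sample scores are illustrative):

```python
def find_peers(similarities: dict, threshold: float = 0.7,
               min_peers: int = 3) -> list:
    """Return peer ids above the threshold; relax it if too few match."""
    peers = [sid for sid, score in similarities.items() if score >= threshold]
    if len(peers) < min_peers:
        # Edge-case fallback: take the top-scoring students
        # rather than returning an empty or near-empty result.
        peers = sorted(similarities, key=similarities.get, reverse=True)[:min_peers]
    return peers

scores = {"S1": 0.91, "S2": 0.72, "S3": 0.55, "S4": 0.40}
print(find_peers(scores))  # ['S1', 'S2', 'S3']
```

With the 0.7 threshold only two peers qualify, so the fallback tops the list up to three, which is exactly the behavior that made sparse regions of the graph usable.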

Performance Gap Analysis: Determining meaningful differences between target students and successful peers required careful statistical consideration.

Solution: Used 0.2+ GPA gaps and percentage-based comparisons rather than absolute differences.
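
Combining the 0.2 GPA cutoff with percentage-based comparisons might look roughly like this (the metric names and sample values are illustrative):

```python
def performance_gaps(target: dict, peer_avg: dict,
                     gpa_threshold: float = 0.2) -> dict:
    """Compare a student to successful peers; keep only meaningful gaps."""
    gaps = {}
    gpa_gap = round(peer_avg["gpa"] - target["gpa"], 2)
    if gpa_gap >= gpa_threshold:
        gaps["gpa"] = gpa_gap
    # Behavioral metrics use relative (percentage) differences, since
    # "+40% study time" is more actionable than "+4 hours".
    for metric in ("study_hours", "attendance"):
        if target.get(metric):
            pct = round(100 * (peer_avg[metric] - target[metric]) / target[metric], 1)
            if pct > 0:
                gaps[metric] = f"+{pct}%"
    return gaps

print(performance_gaps(
    {"gpa": 3.0, "study_hours": 10, "attendance": 80},
    {"gpa": 3.4, "study_hours": 14, "attendance": 88},
))
```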

Integration Complexity

Data Flow Coordination: Ensuring smooth communication between Neo4j → Python AI → Flask → React required careful error handling at each stage.

State Management: Managing loading states, error conditions, and data updates across the full stack while maintaining responsive user experience.

Statistical Foundation

The recommendation engine relies on comparative analysis: a target student is matched against higher-performing peers with similar learning profiles, and the differences between them drive the recommendations. Similarity thresholds and weighting factors were tuned on synthetic data patterns to prioritize actionable insights over strict statistical significance.
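
A weighted similarity score of this kind might be computed as follows (the feature set and weights here are illustrative, not the tuned production values):

```python
def similarity(a: dict, b: dict, weights=None) -> float:
    """Weighted similarity over categorical and numeric student features."""
    weights = weights or {"learning_style": 0.5, "program": 0.2, "gpa": 0.3}
    score = 0.0
    score += weights["learning_style"] * (a["learning_style"] == b["learning_style"])
    score += weights["program"] * (a["program"] == b["program"])
    # GPA closeness on a 4.0 scale, mapped into [0, 1].
    score += weights["gpa"] * (1 - abs(a["gpa"] - b["gpa"]) / 4.0)
    return round(score, 3)

s1 = {"learning_style": "visual", "program": "CS", "gpa": 3.2}
s2 = {"learning_style": "visual", "program": "CS", "gpa": 3.5}
print(similarity(s1, s2))
```

Scores like this populate the `SIMILAR_LEARNING_STYLE` relationships that the 0.7 threshold is applied against.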

Impact and Future Potential

This system demonstrates how graph databases can transform educational analytics. Rather than treating students as isolated data points, we model the complex web of academic relationships to generate insights that feel personal and actionable.

The architecture supports natural extensions: real course catalogs, actual student data (with privacy safeguards), integration with learning management systems, and more sophisticated AI models as they become available.

Most importantly, we built something that addresses a real problem - students want to improve but don't know how - using cutting-edge technology in a privacy-respecting, locally-processed approach that could scale to real institutional deployments.
