StudyForge

landing page (dark mode)
homepage (dark mode)
dashboard (light mode)
sample question bank (light mode)
sample question analysis (light mode)
question bank (light mode)
quiz mode (light mode)
test mode (light mode)
quiz summary (light mode)
quiz summary with ai tutor (light mode)
flashcards (light mode)
wrong question notebook (light mode)
achievements (light mode)
analytics dashboard (light mode)

Inspiration

It started with a frustrated friend. She was preparing for her midterm exams and complained that the sample questions her professor provided were useless - the difficulty level was way off, the question style didn't match what would actually be on the exam, and she felt completely unprepared despite studying for hours.

She tried using ChatGPT to generate practice questions, but it was painfully inefficient. She had to copy-paste her notes, carefully craft prompts, manually format the output, and the questions it generated were often too generic or missed the key concepts her professor emphasized. Other study websites like Quizlet or Anki required her to manually create every flashcard - they didn't offer AI generation that could match her professor's testing style.

I realized there was a gap: no tool existed that could analyze your study materials AND learn the specific style of questions you need. Not just generic quiz generation, but questions that match the difficulty, format, and focus areas of your actual exams.

The Kiroween "Frankenstein" theme pushed me further. I decided to stitch together technologies that seem incompatible: a 1987 spaced repetition algorithm (SM-2 from SuperMemo), modern multi-agent AI (8 specialized agents working together), Web3 blockchain verification (Base L2 + IPFS for permanent achievement records), and a RAG pipeline with semantic chunking. The result is a chimera of educational technology - each piece comes from a different era, but together they create something more powerful than any single approach.

What it does

StudyForge transforms any document into personalized study materials that match YOUR professor's testing style:

Core Features:

Document Upload & Processing: Upload PDFs, images, Word docs, or even handwritten notes. The AI extracts and understands the content.
Style-Matched Question Generation: Upload sample questions from your professor, and the Analysis Agent learns the style - difficulty level, question format, emphasis areas. All future questions match that style.
8 Specialized AI Agents:
- Controller Agent: Orchestrates the workflow
- Analysis Agent: Learns question styles from samples
- Generation Agent: Creates questions matching learned styles
- Grading Agent: Provides partial credit with detailed feedback
- Handwriting Agent: Extracts text from handwritten notes (GPT-4o Vision)
- Explanation Agent: Generates concept explanations when you're stuck
- Chapter Agent: Organizes documents into logical sections
- SLM Router: Routes simple tasks to cheaper models for cost optimization

Study Modes:

Quiz Mode: Multiple choice, true/false, short answer, written response - with AI grading and partial credit
Flashcard Mode: Spaced repetition scheduling using the SM-2 algorithm for optimal retention
Notebook: Tracks your wrong answers for targeted review
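
To make the flashcard scheduling concrete, here is a minimal sketch of one SM-2 review step as the algorithm is commonly described. The `CardState` and `sm2_review` names are illustrative and not taken from the StudyForge codebase, and the sketch multiplies the interval by the ease factor from before the update, which is one common reading of SM-2.

```python
from dataclasses import dataclass


@dataclass
class CardState:
    repetitions: int = 0      # consecutive successful reviews
    interval_days: int = 0    # days until the next review
    ease_factor: float = 2.5  # SM-2 starting ease factor


def sm2_review(state: CardState, quality: int) -> CardState:
    """Apply one SM-2 review with a recall quality grade from 0 to 5."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")

    if quality < 3:
        # Failed recall: restart the repetition sequence, keep the ease factor.
        return CardState(0, 1, state.ease_factor)

    if state.repetitions == 0:
        interval = 1
    elif state.repetitions == 1:
        interval = 6
    else:
        # Later intervals grow by the ease factor held before this review.
        interval = round(state.interval_days * state.ease_factor)

    # SM-2 ease update, floored at 1.3 so cards never become too "hard".
    penalty = (5 - quality) * (0.08 + (5 - quality) * 0.02)
    ease = max(1.3, state.ease_factor + 0.1 - penalty)
    return CardState(state.repetitions + 1, interval, ease)
```

A card answered perfectly twice is scheduled 1 day out, then 6 days out; after that the gap is multiplied by the ease factor, and any grade below 3 sends the card back to a 1-day interval.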

Blockchain-Verified Achievements:

Earn badges for study milestones (accuracy streaks, volume, mastery)
Certificates stored permanently on IPFS
Proof anchored to Base L2 blockchain (~$0.001 per verification)
Verifiable forever - even if StudyForge disappears, your achievements persist

Analytics Dashboard:

Learning Score: AI-calculated metric based on accuracy, consistency, and improvement
Performance trends over time with interactive charts
Hardest questions identification for focused review
Category-by-category progress tracking
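
The Learning Score combines accuracy, consistency, and improvement, but the exact formula is not given here. The sketch below shows one plausible way such a score could be assembled. The weights, the `learning_score` name, and the definitions of consistency and improvement are all assumptions made for illustration, not the actual metric.

```python
from statistics import mean


def learning_score(accuracies: list[float], days_studied: int, days_in_window: int) -> float:
    """Hypothetical 0-100 score from accuracy, consistency, and improvement.

    accuracies: per-session fraction correct (0.0 to 1.0), oldest first.
    """
    if not accuracies or days_in_window <= 0:
        return 0.0

    accuracy = mean(accuracies)
    # Consistency (assumed): share of days in the window with any study activity.
    consistency = min(days_studied / days_in_window, 1.0)

    # Improvement (assumed): later-half average minus earlier-half average,
    # mapped from [-1, 1] onto [0, 1] so that "no change" is neutral at 0.5.
    half = len(accuracies) // 2
    if half == 0:
        improvement = 0.5  # a single session gives no trend
    else:
        delta = mean(accuracies[half:]) - mean(accuracies[:half])
        improvement = (delta + 1) / 2

    # Assumed weights; the real metric may weight these differently.
    score = 0.6 * accuracy + 0.2 * consistency + 0.2 * improvement
    return round(100 * score, 1)
```

With these assumed weights, two students with the same average accuracy get different scores if one is trending upward and the other downward.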

How I built it

Frontend:

React 18 with TypeScript for type safety
Vite for fast development builds
Tailwind CSS for responsive styling
ECharts for interactive analytics visualizations
Framer Motion for smooth animations
React Router for navigation

Backend:

FastAPI (Python 3.11) with async/await throughout
SQLAlchemy 2.0 with async support for database operations
PostgreSQL 15 with pgvector extension for semantic search
Alembic for database migrations
Pydantic 2.5 for request/response validation
Structlog for structured logging

AI Integration:

Multi-provider support: Claude (Anthropic), GPT-4o (OpenAI), Groq (Llama 3.3 70B), Moonshot
Intelligent routing: Simple tasks → cheap models, complex tasks → powerful models
RAG pipeline with semantic chunking (MC-Indexing/SCAN algorithms)
Vector embeddings for concept similarity matching

Blockchain:

Pinata for IPFS uploads (certificate storage)
Base L2 for on-chain anchoring (proof of achievement)
Self-send transactions with IPFS hash in data field (minimal gas cost)
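
The self-send anchoring idea can be sketched without any blockchain library: the transaction moves no value, and its data field carries the IPFS content identifier as hex. The function names and the UTF-8 encoding of the CID are assumptions for illustration. Signing, nonce, and gas fields are omitted, and 8453 is the Base mainnet chain ID.

```python
def build_anchor_tx(address: str, ipfs_cid: str, chain_id: int = 8453) -> dict:
    """Build an unsigned zero-value self-send whose data field carries the CID."""
    return {
        "from": address,
        "to": address,   # self-send: sender and recipient are the same account
        "value": 0,      # no ETH moves; only gas is paid
        "data": "0x" + ipfs_cid.encode("utf-8").hex(),
        "chainId": chain_id,
    }


def cid_from_tx(tx: dict) -> str:
    """Recover the IPFS CID from a transaction built by build_anchor_tx."""
    return bytes.fromhex(tx["data"][2:]).decode("utf-8")


# Placeholder values, not a real account or certificate.
tx = build_anchor_tx("0x0000000000000000000000000000000000000001", "example-certificate-cid")
print(cid_from_tx(tx))
```

Anyone holding the transaction hash can read the data field back, fetch the certificate from IPFS by its CID, and confirm that the certificate existed when the transaction was mined.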

Infrastructure:

Docker & Docker Compose for local development
AWS Elastic Beanstalk for backend hosting
AWS RDS for PostgreSQL database
AWS S3 for frontend hosting
Sentry for error tracking and performance monitoring

Kiro Integration:

12 steering documents guiding architecture decisions
11 automation hooks for testing, migrations, code quality
Extensive vibe coding sessions for rapid prototyping

Challenges I ran into

1. Multi-Agent Coordination: Getting 8 specialized agents to work together without conflicts was harder than expected. The Controller Agent needed careful design to route tasks appropriately and combine results. I solved this with a clear interface contract defined in the steering docs.
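
As an illustration of what such an interface contract could look like, the sketch below gives every agent the same `handle` method and lets a controller run agents in sequence, merging their results. The `Agent` protocol, the `Controller` class, and the stub agents are hypothetical and are not the contract from the steering docs.

```python
from typing import Protocol


class Agent(Protocol):
    """Assumed contract: every agent takes a payload dict and returns new fields."""

    def handle(self, payload: dict) -> dict: ...


class Controller:
    def __init__(self) -> None:
        self._agents: dict[str, Agent] = {}

    def register(self, task_type: str, agent: Agent) -> None:
        self._agents[task_type] = agent

    def run(self, steps: list[str], payload: dict) -> dict:
        """Run agents in order, merging each result into the shared payload."""
        for task_type in steps:
            agent = self._agents.get(task_type)
            if agent is None:
                raise KeyError(f"no agent registered for {task_type!r}")
            payload = {**payload, **agent.handle(payload)}
        return payload


class StubAnalysisAgent:
    """Stand-in for a style-learning agent; returns a fixed style."""

    def handle(self, payload: dict) -> dict:
        return {"style": {"format": "multiple_choice"}}


class StubGenerationAgent:
    """Stand-in for a generation agent; reads the style set by an earlier step."""

    def handle(self, payload: dict) -> dict:
        return {"questions": [f"Question in {payload['style']['format']} style"]}
```

Because every agent shares one method signature, the controller never needs agent-specific code, and a new agent can be added by registering it under a new task type.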

2. Style Learning Accuracy: The Analysis Agent needed to extract subtle patterns from sample questions - not just "multiple choice" but the specific way a professor phrases distractors, the depth of knowledge tested, the balance of recall vs. application questions. I iterated through many prompt versions before achieving reliable style extraction.

3. Handwriting Recognition: GPT-4o Vision is powerful but inconsistent with messy handwriting. Mathematical notation, arrows, and diagrams were particularly challenging. I added preprocessing guidance and structured prompts to improve accuracy.

4. Blockchain Gas Optimization: Initially, I considered minting NFTs for achievements, but gas costs were prohibitive for a study app. So I pivoted to a simpler approach: store the full certificate on IPFS (free), then anchor just the IPFS hash to Base L2 using a minimal self-send transaction (~$0.001). Same verifiability, 100x cheaper.

5. RAG Pipeline Tuning: Finding the right chunk size and overlap for question generation was tricky. Too small and questions lack context; too large and the AI gets overwhelmed. I implemented adaptive chunking based on document structure (headers, paragraphs) rather than fixed character counts.
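
A minimal sketch of structure-based chunking follows. It assumes the extracted text marks headers with a leading `#` and separates paragraphs with blank lines. The `chunk_by_structure` name and the `max_chars` default are illustrative, not the values used in StudyForge.

```python
def chunk_by_structure(text: str, max_chars: int = 1200) -> list[str]:
    """Group paragraphs into chunks that never cross a header boundary."""
    chunks: list[str] = []
    current: list[str] = []
    size = 0

    def flush() -> None:
        nonlocal size
        if current:
            chunks.append("\n\n".join(current))
            current.clear()
            size = 0

    for block in text.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        is_header = block.startswith("#")
        # Start a new chunk at every header, or when the next block would overflow.
        if is_header or (current and size + len(block) > max_chars):
            flush()
        # Paragraphs are never split; an oversized one becomes its own chunk.
        current.append(block)
        size += len(block)
    flush()
    return chunks


notes = "# Cells\n\nCells are the basic unit of life.\n\n# Enzymes\n\nEnzymes speed up reactions."
for chunk in chunk_by_structure(notes):
    print(repr(chunk))
```

Each chunk begins at a section boundary where possible, so a generated question draws on one topic rather than the tail of one section and the head of the next.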

6. Cost Management: AI API costs add up quickly. I implemented SLM (Small Language Model) routing - simple tasks like formatting or basic classification go to Groq's Llama 3.3 70B (fast and cheap), while complex reasoning tasks go to Claude or GPT-4o. This reduced API costs by ~60%.
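
The routing rule can be sketched as a small lookup. The task-type names, the prompt-length threshold, and the `Route` labels below are assumptions for illustration. The model strings are placeholders rather than real API model identifiers.

```python
from dataclasses import dataclass

# Assumed set of task types considered cheap enough for the smaller model.
SIMPLE_TASKS = {"formatting", "classification"}


@dataclass(frozen=True)
class Route:
    provider: str
    model: str  # placeholder label, not an actual API model ID


CHEAP = Route("groq", "llama-3.3-70b")
POWERFUL = Route("anthropic", "claude")


def route_task(task_type: str, prompt: str, max_simple_chars: int = 2000) -> Route:
    """Send short, simple tasks to the cheap model and everything else to the powerful one."""
    if task_type in SIMPLE_TASKS and len(prompt) <= max_simple_chars:
        return CHEAP
    return POWERFUL


print(route_task("formatting", "Fix the bullet list spacing."))
print(route_task("grading", "Grade this written response with partial credit."))
```

Defaulting to the powerful model for anything unrecognized keeps quality safe: a misclassified task costs more money rather than producing a worse answer.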

Accomplishments that I'm proud of

8 specialized AI agents working in harmony with clear interfaces
Style-matched question generation - questions that actually feel like your professor wrote them
Blockchain-verified achievements - to my knowledge, the first study platform with permanent, verifiable on-chain proof of learning
Handwriting recognition that handles real student notes, not just perfect text
~$0.001 per achievement verification on Base L2
23 Kiro configuration files (12 steering docs + 11 hooks) demonstrating deep platform integration
Production deployment on AWS with full monitoring and error tracking
60% cost reduction through intelligent SLM routing
Sub-second response times for most operations despite complex AI pipelines

What I learned

Steering docs are game-changers: Documenting my architecture patterns in .kiro/steering/ meant Kiro consistently generated code that fit my conventions. No more "that's not how I do it here" corrections.
Agent hooks save hours daily: The check-api-contracts hook alone probably prevented 20+ bugs by catching frontend-backend mismatches before they hit production.
Multi-agent > monolithic prompts: Breaking AI tasks into specialized agents (analysis, generation, grading) produced better results than one massive prompt trying to do everything.
Web3 + Education has real potential: Students genuinely care about verifiable credentials. The blockchain verification feature got the most excited reactions in user testing.
Small models punch above their weight: Groq's Llama 3.3 70B handles 80% of the workload at 10% of GPT-4o costs. Smart routing matters more than always using the biggest model.
Kiro accelerates iteration speed dramatically: What would have taken weeks of manual coding was accomplished in days through vibe coding sessions with proper steering context.

What's next for StudyForge

Browser-Based Local AI (Transformers.js/WebLLM): Run small models directly in the browser for offline study mode and zero-latency hints. Students could study on airplanes or in areas with poor connectivity.
Voice Quiz Mode: Take quizzes hands-free using Web Speech API. Perfect for studying while commuting, exercising, or doing chores.
NFT Achievement Badges: Mint actual NFTs on Base for legendary achievements. Share your verified accomplishments on social media or LinkedIn.
Collaborative Study Groups: Share categories with classmates, compete on leaderboards, and see aggregated analytics for your study group.
Professor Dashboard: Let professors upload their question bank style, then students automatically get style-matched practice questions.
Mobile App: React Native version for true mobile-first studying with push notification reminders for spaced repetition reviews.

Built With

alembic
amazon-web-services
anthropic-claude
aws-elastic-beanstalk
aws-rds
base-l2
docker
echarts
fastapi
framer-motion
groq
ipfs
openai-gpt-4o
pgvector
pinata
postgresql
pydantic
python
react
sentry
sqlalchemy
structlog
tailwind-css
typescript
vite

Updates

Tianbao Yang started this project — Dec 04, 2025 03:13 AM EST
