Cognita

Inspiration ✨

Intern onboarding often feels overwhelming. New hires are handed hundreds of pages of PDFs, manuals, and policies, and are expected to absorb them quickly. We’ve gone through this ourselves and realized that traditional onboarding is both time-consuming for HR and ineffective for interns. We wanted to build a system that transforms static documentation into personalized, interactive lessons that make learning faster and more engaging.

What does it do? 🤔

Cognita takes company documents and turns them into interactive learning experiences. Companies upload training materials, and interns can then ask questions about their roles or processes. Instead of a generic answer, Cognita generates a short, tailored lesson with multiple sections. Each section can include interactive components such as multiple choice questions, fill-in-the-blanks, code examples, drag-and-drop activities, or diagrams. Progress is tracked automatically so both interns and companies can see how knowledge is being absorbed.

How we built it 🏰

We designed Cognita around a RAG pipeline. PyMuPDF handles PDF extraction with text chunking that preserves context. We use ChromaDB and Gemini Embeddings for embedding storage and semantic search. The AI core is powered by Google’s Gemini 2.5 Pro, which generates structured JSON lessons through carefully engineered prompts. The frontend parses this JSON and renders each lesson as React components that manage state, track progress, and log interactions. Analytics are collected for each component and fed back to the system, enabling Cognita to generate targeted follow-up lessons when interns struggle with a concept.
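To illustrate the "chunking that preserves context" step, here is a minimal sketch of overlapping chunks. It is an assumption about one common approach, not the exact project code. The function name `chunk_text`, the word-based splitting, and the default sizes are all hypothetical. A real pipeline would more likely count model tokens and run on the text PyMuPDF extracts.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words.

    Sizes are illustrative assumptions, not values from the project.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

Because each chunk repeats the tail of the previous one, a sentence that straddles a boundary still appears whole in at least one chunk, which tends to help retrieval.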

Challenges we ran into 😠

Our biggest challenge was database state management. Each lesson contains multiple interactive components, and we needed a schema that could track progress in real time. That meant recording not just right or wrong answers but also attempts, time spent, and completion status, while keeping state consistent across the frontend and backend. Synchronizing these updates proved difficult, since every user interaction had to feed back into the database without losing context. We also had to refine prompts to keep the AI agent from giving overly short answers, implement retry logic, and experiment with PDF chunking strategies that preserve formatting while staying within token limits.
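The kind of per-component state described above could be modeled roughly as follows. This is a hedged sketch only: the class names `ComponentProgress` and `LessonProgress` and all field names are hypothetical, and the real schema stored in the database may look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentProgress:
    """Progress for one interactive component (field names are assumptions)."""
    component_id: str
    attempts: int = 0
    correct: bool = False
    seconds_spent: float = 0.0
    completed: bool = False

    def record_attempt(self, is_correct: bool, seconds: float) -> None:
        # Every attempt counts, even a wrong one, so struggle stays visible.
        self.attempts += 1
        self.seconds_spent += seconds
        if is_correct:
            self.correct = True
            self.completed = True


@dataclass
class LessonProgress:
    """Progress for one lesson, keyed by component id."""
    lesson_id: str
    components: dict[str, ComponentProgress] = field(default_factory=dict)

    def record(self, component_id: str, is_correct: bool, seconds: float) -> ComponentProgress:
        # Create the component record on first interaction, then update it.
        comp = self.components.setdefault(component_id, ComponentProgress(component_id))
        comp.record_attempt(is_correct, seconds)
        return comp

    def completion_ratio(self, total_components: int) -> float:
        done = sum(c.completed for c in self.components.values())
        return done / total_components if total_components else 0.0
```

Keeping attempts and time alongside correctness means a question answered correctly on the fifth try looks different from one answered correctly on the first.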

Accomplishments that we’re proud of 👏

We built a complete pipeline where documents are processed, lessons are generated, and adaptive feedback loops are maintained. The frontend renders lessons as clean, interactive modules rather than raw AI output. Every user interaction is logged, allowing the system to detect knowledge gaps and generate follow-up content automatically. This makes Cognita more than a content delivery tool: it actively adapts to the learner.
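As a rough illustration of how knowledge gaps might be detected from interaction logs, the sketch below flags concepts with low accuracy. The log shape (`concept` and `correct` keys), the function name `find_knowledge_gaps`, and the thresholds are all assumptions made for the example rather than details of the actual system.

```python
from collections import defaultdict


def find_knowledge_gaps(logs, min_attempts=3, max_accuracy=0.5):
    """Return concepts whose accuracy is at or below `max_accuracy`.

    `logs` is assumed to be an iterable of dicts such as
    {"concept": "pto_policy", "correct": False}. Concepts with fewer than
    `min_attempts` logged attempts are skipped as too noisy to judge.
    """
    stats = defaultdict(lambda: [0, 0])  # concept -> [attempts, correct answers]
    for entry in logs:
        stats[entry["concept"]][0] += 1
        stats[entry["concept"]][1] += int(entry["correct"])

    gaps = [
        concept
        for concept, (attempts, correct) in stats.items()
        if attempts >= min_attempts and correct / attempts <= max_accuracy
    ]
    return sorted(gaps)
```

The returned concepts could then be passed to the lesson generator as topics for a targeted follow-up lesson.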

What we learned ✏️

Building an adaptive system requires strong coordination between retrieval models, state tracking, and user analytics. We learned how to design schemas that can store lesson metadata, interaction logs, and progress in a consistent way. Prompt engineering was critical to ensure the AI generated structured, assessable lessons. The hardest part was creating a real-time state management system that tied frontend interactions back into the backend analytics, which in turn informed future lessons.
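One way to enforce structured output is to validate each model response and retry on failure. The sketch below assumes a hypothetical lesson shape (a `sections` list whose items hold `components` with a `type`), with component type names adapted from the interactive components listed earlier. The `generate` callable stands in for the actual Gemini call, which is not shown here.

```python
import json
from typing import Callable

# Hypothetical type names, based on the component kinds the lessons support.
ALLOWED_TYPES = {
    "multiple_choice", "fill_in_the_blank", "code_example", "drag_and_drop", "diagram",
}


def parse_lesson(raw: str) -> dict:
    """Parse and validate one model response; raise ValueError if malformed."""
    lesson = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(lesson, dict):
        raise ValueError("lesson must be a JSON object")
    sections = lesson.get("sections")
    if not isinstance(sections, list) or not sections:
        raise ValueError("lesson needs a non-empty 'sections' list")
    for section in sections:
        if not isinstance(section, dict):
            raise ValueError("each section must be a JSON object")
        for component in section.get("components", []):
            if not isinstance(component, dict) or component.get("type") not in ALLOWED_TYPES:
                raise ValueError(f"unsupported component: {component!r}")
    return lesson


def generate_with_retry(generate: Callable[[], str], max_attempts: int = 3) -> dict:
    """Call `generate` until it returns a valid lesson or attempts run out."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return parse_lesson(generate())
        except ValueError as err:
            last_error = err  # remember why this attempt failed, then retry
    raise RuntimeError(f"no valid lesson after {max_attempts} attempts") from last_error
```

Validating before rendering means the frontend only ever receives lessons it knows how to display.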

What’s next for Cognita 👣

We plan to extend support beyond PDFs to include PowerPoints, videos, and web pages, making it easier for companies to reuse all their training resources. Integrations with HR systems like Workday and BambooHR will help embed Cognita into existing workflows. Multi-language support is also on our roadmap to help global companies onboard interns in their native languages. In the long term, we want to add features like customizable certification systems and automated quiz generation for compliance and ongoing training.

Built With

chromadb
flask
gemini
langchain
mongodb
python
react
tailwind
typescript

Submitted to

VTHacks 13
- Winner VTHacks - 2nd Place Prize

Created by

Built the RAG pipeline and Gemini Agent with tool integrations, handling document chunking, retrieval, and structured lesson generation.

Saahas Pulivarthi
I wrote the frontend code, built the interactive components, and designed the lesson JSON format.

Timothy James Nickerson
Pedro Rabadan Ribeiro
Arda Serhatli