Aristo

Inspiration

Self-studying can often feel like a solitary and passive process. We read textbooks and highlight notes, but we often fall into the illusion of competence: believing we understand a topic just because we have read about it. Educational psychology and the Feynman Technique point to a better approach. Explaining what you have learned in your own words tends to be far more effective than passively rereading or reviewing it.

We thought, "Wouldn't it be great to have an AI that helps you self-study not by giving you the answers, but by letting you explain things to it?" We wanted to build a proactive study partner—an AI that listens to your explanations, catches your misconceptions, and asks probing questions to solidify your understanding. That idea became the foundation of Aristo.

What it does

Aristo is an intelligent, conversational self-study companion designed to flip the traditional learning dynamic.

Instead of acting as a simple search engine, Aristo lets users upload their own study materials (such as large PDF textbooks or notes). The system parses these documents and organizes the knowledge into structured learning paths (Exam Sets, Roots, Sections, and Nodes).
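A minimal sketch of how this hierarchy might be modeled in Python. Elsewhere this write-up names the full hierarchy as Exam Sets → Roots → Sections → Nodes; every class and field name below (including the hypothetical `source_pages` field) is an illustrative assumption, not Aristo's actual Firestore schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    """A single concept the user will explain (fields are hypothetical)."""
    node_id: str
    title: str
    source_pages: tuple[int, int]  # hypothetical: page range the concept came from


@dataclass
class Section:
    section_id: str
    title: str
    nodes: list[Node] = field(default_factory=list)


@dataclass
class Root:
    root_id: str
    title: str
    sections: list[Section] = field(default_factory=list)


@dataclass
class ExamSet:
    exam_set_id: str
    title: str
    roots: list[Root] = field(default_factory=list)

    def iter_nodes(self) -> Iterator[Node]:
        """Yield every Node depth-first, as one possible study-path order."""
        for root in self.roots:
            for section in root.sections:
                yield from section.nodes
```

Walking the tree depth-first is just one natural ordering for a study path; the write-up does not describe how Aristo actually generates its paths.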

During a study session, the user takes the role of the teacher. As the user explains a concept, Aristo uses Retrieval-Augmented Generation (RAG): it retrieves the relevant passages from the uploaded materials and uses them to evaluate the explanation, correct any inaccuracies, and ask follow-up questions. This turns solitary self-study into an active, engaging dialogue.
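The retrieve-then-evaluate loop can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: plain term-frequency vectors with cosine similarity stand in for a real embedding model and vector index, and the wording in the hypothetical `build_evaluator_prompt` is an example, not Aristo's actual prompt. Sending the prompt to an LLM is outside the scope of the sketch.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Term-frequency vector (a stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(explanation: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k source chunks most similar to the user's explanation."""
    query = vectorize(explanation)
    ranked = sorted(chunks, key=lambda c: cosine(query, vectorize(c)), reverse=True)
    return ranked[:k]


def build_evaluator_prompt(topic: str, explanation: str, context: list[str]) -> str:
    """Frame the model as a listener that checks the explanation against the sources."""
    sources = "\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(context))
    return (
        f"A student is explaining '{topic}' to you.\n"
        "Using ONLY the source excerpts below, point out any inaccuracies in the "
        "explanation, then ask one follow-up question that probes the student's understanding.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Student's explanation:\n{explanation}\n"
    )
```

Restricting the model to the retrieved excerpts is what makes the evaluation specific to the user's own materials rather than to the model's general knowledge.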

How we built it

We built Aristo using a modern, multi-tier architecture to handle both robust application logic and heavy AI processing:

Backend (Node.js/Express): We implemented a layered architecture (Routes → Controller → Service → Repository) to manage users and study sessions.

Database & Authentication (Firebase/Firestore): We used Firestore for NoSQL data persistence and integrated Google Login through Firebase for seamless sign-in.

AI Engine (Python): We built a dedicated Python AI Server. It handles the extraction of text from uploaded PDFs (chunking data into 20-page increments to manage memory) and runs the RAG pipeline.

Infrastructure: The entire ecosystem is deployed on Google Cloud, ensuring scalable communication between the Node.js backend and the Python AI server.
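The 20-page chunking on the Python AI server could look roughly like this. It is a sketch that assumes a hypothetical `extract_page_text(page_index)` callback; with the pypdf library, for example, that callback could be `lambda i: reader.pages[i].extract_text()` on a `PdfReader`. Because `extract_in_chunks` is a generator, only one window's text needs to be held in memory at a time, as long as the caller processes each chunk before requesting the next.

```python
from __future__ import annotations

from typing import Callable, Iterator

CHUNK_PAGES = 20  # page window size described in the write-up


def page_ranges(total_pages: int, size: int = CHUNK_PAGES) -> Iterator[range]:
    """Yield consecutive 0-based page ranges of at most `size` pages."""
    for start in range(0, total_pages, size):
        yield range(start, min(start + size, total_pages))


def extract_in_chunks(
    total_pages: int,
    extract_page_text: Callable[[int], str],  # hypothetical per-page extractor
    size: int = CHUNK_PAGES,
) -> Iterator[str]:
    """Lazily yield the text of each page window, one window at a time."""
    for pages in page_ranges(total_pages, size):
        yield "\n".join(extract_page_text(p) for p in pages)
```

For a 45-page PDF this produces three chunks covering pages 0-19, 20-39, and 40-44.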

Challenges we ran into

Building a dual-backend system with complex data relationships in a NoSQL environment brought several challenges:

Complex Data Relationships in Firestore: Mapping our hierarchical study structure \( \text{Exam Sets} \rightarrow \text{Roots} \rightarrow \text{Sections} \rightarrow \text{Nodes} \) required careful transaction management. Firestore does not enforce foreign keys, so these dependencies exist only in our application logic. Making cascading deletes safe, so that no document is ever left referencing a deleted parent, took significant debugging and refinement.

Document Processing Pipeline: Parsing large, complex PDFs is notoriously difficult. We had to engineer a reliable way to extract raw text and merge the resulting text files in batches so that large uploads would not cause server timeouts.

Testing and Reliability: Migrating our database schema to Firebase meant thoroughly rewriting our service layer. Writing comprehensive Jest test suites for our Auth, RAG, and Session APIs was challenging but necessary to keep the system stable.
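One way to make cascading deletes safe without database-enforced foreign keys is to collect every descendant first and delete children before parents. The sketch below uses plain dictionaries as a stand-in for Firestore collections (each document stores only its parent's id); the collection names, the `batch_size` default, and the helper names are hypothetical. In a real deployment, each group would be committed as one Firestore batched write.

```python
from __future__ import annotations

# Parent -> child order of the study hierarchy (collection names are hypothetical).
HIERARCHY = ["examSets", "roots", "sections", "nodes"]


def collect_cascade(
    store: dict[str, dict[str, str | None]], collection: str, doc_id: str
) -> list[tuple[str, str]]:
    """Return (collection, doc id) pairs to delete, deepest level first.

    `store` maps collection name -> {doc id: parent doc id}, standing in for Firestore.
    """
    level = HIERARCHY.index(collection)
    levels = [[(collection, doc_id)]]
    frontier = {doc_id}
    for child_coll in HIERARCHY[level + 1:]:
        children = [cid for cid, parent in store[child_coll].items() if parent in frontier]
        levels.append([(child_coll, cid) for cid in children])
        frontier = set(children)
    # Leaves first: every document appears after all of its descendants.
    return [ref for lvl in reversed(levels) for ref in lvl]


def cascade_delete(
    store: dict[str, dict[str, str | None]], collection: str, doc_id: str, batch_size: int = 100
) -> int:
    """Delete a document and all its descendants in leaf-first batches; return the batch count."""
    refs = collect_cascade(store, collection, doc_id)
    batches = [refs[i:i + batch_size] for i in range(0, len(refs), batch_size)]
    for batch in batches:
        # With Firestore, each batch would be a single atomic batched write.
        for coll, cid in batch:
            del store[coll][cid]
    return len(batches)
```

Because descendants always come first, a failure partway through can leave a parent missing some children, but never a surviving document that points at a deleted parent.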

Accomplishments that we're proud of

Seamless Google Cloud Integration: We bridged a Node.js API server and a Python ML server and got them communicating reliably in a deployed cloud environment.

Robust Database Architecture: Working through Firestore upload bugs and NoSQL transaction logic (especially cascading deletion) taught us a great deal about building resilient backend systems.

The "Explain-to-Learn" AI Model: We are especially proud of turning the AI from a traditional question-answering bot into an interactive partner that evaluates the user's comprehension against context retrieved, via RAG, from the user's own materials.

What we learned

We learned that prompting an AI to act as a listener and evaluator is fundamentally different from building a standard chatbot: it requires highly specific prompt engineering and clean vector embeddings of the source material. On the backend, we deepened our understanding of MVC-style layered architectures, asynchronous database transactions, and mocking dependencies in Jest test environments.
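To illustrate the Routes → Controller → Service → Repository layering described for the backend, here is a minimal sketch. It is written in Python to match the other examples here, although Aristo's real backend is Node.js/Express; all class, route, and field names are hypothetical, and an in-memory dict stands in for Firestore.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Session:
    session_id: str
    user_id: str
    node_id: str


class SessionRepository:
    """Data-access layer: the only layer that touches storage (a dict stands in for Firestore)."""

    def __init__(self) -> None:
        self._rows: dict[str, Session] = {}

    def save(self, session: Session) -> Session:
        self._rows[session.session_id] = session
        return session

    def find_by_user(self, user_id: str) -> list[Session]:
        return [s for s in self._rows.values() if s.user_id == user_id]


class SessionService:
    """Business rules, independent of HTTP and of the storage engine."""

    def __init__(self, repo: SessionRepository) -> None:
        self.repo = repo
        self._next_id = 1

    def start_session(self, user_id: str, node_id: str) -> Session:
        if not node_id:
            raise ValueError("node_id is required")
        session = Session(f"s{self._next_id}", user_id, node_id)
        self._next_id += 1
        return self.repo.save(session)


class SessionController:
    """Turns request bodies into service calls and results into (status, body) responses."""

    def __init__(self, service: SessionService) -> None:
        self.service = service

    def create(self, body: dict) -> tuple[int, dict]:
        try:
            session = self.service.start_session(body["user_id"], body.get("node_id", ""))
        except (KeyError, ValueError) as err:
            return 400, {"error": str(err)}
        return 201, {"session_id": session.session_id}


def build_routes(controller: SessionController) -> dict:
    """Routes layer: maps (method, path) to controller handlers."""
    return {("POST", "/sessions"): controller.create}
```

Because each layer depends only on the one beneath it, a test can hand the service a fake repository instead of a live database, which is the same idea behind mocking the data layer in Jest.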

What's next for Aristo

Patent Exploration: We are analyzing the distinctive technical aspects of our RAG pipeline and study-structure generation algorithms to explore potential intellectual property protection.

Voice Integration: We want to add Speech-to-Text (STT) so users can explain concepts to Aristo out loud, making the explain-to-learn experience even more natural.

Adaptive Learning Curves: We plan to track each user's mastery over time and use spaced repetition algorithms to revisit concepts they struggled to explain in the past.
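As one concrete example of a spaced repetition algorithm, here is a sketch of the classic SM-2 update rule as it is commonly described. The write-up does not say which algorithm Aristo would use; SM-2 is an illustrative choice, and mapping the AI's evaluation of an explanation onto SM-2's 0-5 quality grade is a hypothetical design decision.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ReviewState:
    repetitions: int = 0    # consecutive successful reviews
    easiness: float = 2.5   # SM-2 easiness factor (EF)
    interval_days: int = 0  # days until the next review


def sm2_update(state: ReviewState, quality: int) -> ReviewState:
    """Apply one SM-2 review; `quality` is a 0-5 grade (hypothetically, from the AI's evaluation)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be in 0..5")
    if quality >= 3:
        # Successful recall: 1 day, then 6 days, then grow by the easiness factor.
        if state.repetitions == 0:
            interval = 1
        elif state.repetitions == 1:
            interval = 6
        else:
            interval = round(state.interval_days * state.easiness)
        repetitions = state.repetitions + 1
    else:
        # The user struggled: restart the schedule from a one-day interval.
        repetitions, interval = 0, 1
    ef = state.easiness + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return ReviewState(repetitions, max(1.3, ef), interval)
```

A concept the user explained poorly (grade below 3) comes back the next day, while concepts explained well are spaced further apart after each successful review.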