Inspiration
Honestly, Mira AI Tutor was the result of sheer academic desperation. During my second year of college, buried in dense theory subjects at 2 AM, I was completely lost trying to understand difficult material. I had questions but no one to ask: my professor was asleep, and my classmates were just as stuck.
That loneliness stung. I didn't just want to read the words; I wanted to hear them explained. I remember thinking, "What if I could speak with someone who truly understood this paper, perhaps even the author?" That's when the idea hit me.
Mira wasn't born out of market research. It was born out of a real need: a tool that could dissect any document and hold a substantial, human-like discussion about it. Giving it a face and a voice made it less of a chatbot and more of a study buddy for those long, solitary late-night learning sessions.
What it does
- Upload: A user logs in and uploads any document, including a research paper, a textbook chapter, or a legal agreement.
- Process: Our platform processes the document, applying Retrieval-Augmented Generation (RAG) to build a vectorized knowledge base from its text.
- Personalize: It then creates a personalized AI persona, a digital tutor, with expert-level knowledge of the uploaded document.
- Interact: The user can then talk to "Mira," this AI tutor. They can ask complicated questions, request summaries, and debate complex topics from the document, using either a standard text chat or a video chat in which an AI-generated avatar of Mira delivers the responses.
The platform retains session history, so learners can pick up where they left off.
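The Process and Interact steps can be illustrated with a small, self-contained sketch. It is not the project's actual pipeline: a bag-of-words counter stands in for a real embedding model, an in-memory list stands in for the vector database, and the function names (`chunk_text`, `embed`, `retrieve`, `build_prompt`) are hypothetical. It only shows the retrieval idea behind RAG: split the document, score chunks against the question, and pass the best matches to the LLM as context.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split a document into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query_vector, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble the text that would be sent to the LLM (the model call is omitted)."""
    context = "\n---\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In a real deployment the `embed` function would call an embedding model and the ranking would be delegated to the vector database; the control flow stays the same.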
How we built it
We built Mira AI Tutor as a modern full-stack application that combines several technologies:
- Frontend: A dynamic, responsive user interface built with Next.js, React, and TypeScript, styled with Tailwind CSS and shadcn/ui components.
- Backend: A backend service written in Python with the FastAPI framework and containerized with Docker for straightforward deployment and scaling.
- Authentication: Clerk handles user sign-up, sign-in, and session management.
- AI and Machine Learning:
  - AI Persona & Video: The central video persona is powered by the Tavus AI API, which produces natural-sounding talking-head videos from text and gives Mira a face and a voice.
  - Document Intelligence: We used a Retrieval-Augmented Generation (RAG) pipeline to give the AI a rich understanding of the uploaded documents. It uses a vector database to store document embeddings and an LLM to generate context-aware answers.
- Database: We use SQLite to store application data such as user records and chat history.
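As an illustration of how chat history could be kept in SQLite, here is a minimal sketch using Python's built-in `sqlite3` module. The table layout, column names, and helper functions (`init_db`, `save_message`, `load_history`) are assumptions made for the example, not the project's actual schema.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) messages table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS messages (
            id          INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id     TEXT NOT NULL,
            document_id TEXT NOT NULL,
            role        TEXT NOT NULL CHECK (role IN ('user', 'tutor')),
            content     TEXT NOT NULL
        )
        """
    )


def save_message(conn: sqlite3.Connection, user_id: str, document_id: str,
                 role: str, content: str) -> None:
    """Append one chat message; the with-block commits or rolls back the transaction."""
    with conn:
        conn.execute(
            "INSERT INTO messages (user_id, document_id, role, content) VALUES (?, ?, ?, ?)",
            (user_id, document_id, role, content),
        )


def load_history(conn: sqlite3.Connection, user_id: str,
                 document_id: str) -> list[tuple[str, str]]:
    """Return (role, content) pairs for one user and document, oldest first."""
    cursor = conn.execute(
        "SELECT role, content FROM messages "
        "WHERE user_id = ? AND document_id = ? ORDER BY id",
        (user_id, document_id),
    )
    return cursor.fetchall()
```

Keying every row by both user and document is what would let a learner resume a session on a specific document without seeing anyone else's messages.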
Challenges we encountered
- Real-Time Video Generation Latency: Integrating AI video generation was difficult. Generation is asynchronous and can take a while, so we built a system that polls for the video's status and displays the video once it is ready, without blocking the UI.
- State Management Complexity: The frontend had to keep a lot of dynamic state up to date: the user's authentication state from Clerk, uploaded document metadata, the list of available documents, live chat messages, and the current state of video generation. Juggling all of this cleanly was a big challenge.
- Integrating Multiple Services Securely: Coordinating the secure passing of data between our Next.js frontend, our FastAPI backend, Clerk, and the Tavus AI API was difficult. We set up a token-based auth scheme in which JWTs issued by Clerk are forwarded to our backend, which uses them to authorize API requests.
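The polling approach from the first challenge can be sketched as follows. This is a generic illustration, not the project's code: `fetch_status` is a hypothetical async callable that wraps whatever status request the video service offers, and the `"ready"` and `"failed"` status values are assumptions for the example rather than the Tavus API's actual responses.

```python
import asyncio
from typing import Awaitable, Callable


async def poll_until_ready(
    fetch_status: Callable[[], Awaitable[dict]],
    interval: float = 2.0,
    timeout: float = 120.0,
) -> dict:
    """Call fetch_status repeatedly until the job is ready, fails, or times out."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        result = await fetch_status()
        status = result.get("status")  # assumed field name
        if status == "ready":
            return result
        if status == "failed":
            raise RuntimeError("video generation failed")
        # Give up if the next sleep would carry us past the deadline.
        if loop.time() + interval > deadline:
            raise TimeoutError("video was not ready before the timeout")
        # asyncio.sleep yields control, so other requests keep being served.
        await asyncio.sleep(interval)
```

Because the wait happens in `asyncio.sleep` rather than a blocking call, the server stays responsive while a video renders, and the frontend can show a loading state until the result arrives.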
Things we're proud of accomplishing
- A New User Experience: We built a tool that goes beyond a typical chatbot. The AI-driven video persona makes the learning experience feel much more personal.
- Smooth Full-Stack Integration: We are happy to have knitted a diverse stack of technologies (Next.js, FastAPI, Clerk, Tavus) into one unified, working product.
- Dynamic, Personalized AI: The app's central feature, dynamically generating an interactive AI tutor from any document a user uploads, works and offers real value.
- End-to-End Application: We built a complete application, from secure user authentication through to the final interactive chat interface.
What we learned
- Multimodal AI Power: Combining language models (for comprehension) with generative video AI (for engagement) can produce engaging applications that were hard to build before.
- Modern Security Practices: We gained hands-on experience with modern authentication patterns, in particular how to pass JWTs from a frontend that uses a service such as Clerk to a backend API in order to protect resources.
- Managing Asynchronicity in AI: We learned the hard way about the pitfalls and workarounds involved in building user-friendly interfaces on top of AI services with long processing times.
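The JWT hand-off described above can be illustrated with a standard-library sketch of what a backend does with a forwarded token: read the `Authorization: Bearer` header, check the signature, and check expiry. Two loud assumptions apply. First, this sketch swaps in HMAC-SHA256 (HS256) with a shared secret so that it runs without third-party packages; a Clerk-issued token would instead be verified against Clerk's published public keys, normally through a maintained JWT library. Second, `sign_token` and `authorize` are hypothetical helper names, not part of Clerk or FastAPI.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_token(claims: dict, secret: bytes) -> str:
    """Issue an HS256 token (stands in for the auth provider in this sketch)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signature = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(signature)}"


def authorize(authorization_header: str, secret: bytes,
              now: Optional[float] = None) -> dict:
    """Validate a 'Bearer <token>' header and return its claims, or raise PermissionError."""
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        raise PermissionError("missing bearer token")
    parts = token.split(".")
    if len(parts) != 3:
        raise PermissionError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    # Pin the algorithm so a token cannot choose its own verification method.
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise PermissionError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise PermissionError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) <= (time.time() if now is None else now):
        raise PermissionError("token expired")
    return claims
```

In a FastAPI service, a check like this would typically live in a shared dependency so that every protected route receives the verified claims (for example the user ID in `sub`) instead of parsing the header itself.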
What's next for Mira AI Tutor
We have several features planned:
- Broader Document Support: Support more input types, such as web pages, PowerPoint presentations, and audio files.
- Longer Memory: Extend the AI's memory so it retains context from earlier conversations across multiple documents.
- Collaborative Learning: Add multi-user sessions in which a study group or team can interact with the same document tutor at the same time.
- Customizable Avatars: Let users choose their tutor's appearance and voice to further personalize their learning experience.
Built With
- clerk
- docker
- fastapi
- langchain
- next.js
- python
- rag
- sqlite
- tavus
- typescript