## Inspiration

My teenage son was studying for his driving test, and finding reliable material online was surprisingly hard: AI chatbots often hallucinated rules borrowed from other states. Recognizing that 16 million Americans fail their driving test every year, often because they study generic materials instead of their own state's handbook, I wanted to build a solution. Existing tools weren't financially incentivized to actually help users pass. That inspired me to create a personalized, hallucination-free AI coach that guarantees state-specific accuracy.

## What it does
CoDriver is an AI-powered mobile coach built on Gemini and Supabase that delivers state-specific micro-lessons, instant handbook answers, and adaptive quizzes. When a user selects their state, every feature in the app is grounded entirely in that state's official DMV handbook.
* **Ask the Coach:** A Gemini-powered chat interface that answers driving questions by citing exact handbook page numbers.
* **Micro-Lesson Cards:** TikTok-style swipeable lesson cards featuring AI narration.
* **Live Voice Coach:** A real-time Gemini Live voice session that acts as a personal tutor, narrating lessons and answering spoken questions.
* **Infinite Quiz Generator:** Generates unique, varied quiz questions on any topic, never repeating the same set twice.

## How I built it
I built CoDriver using a two-tier architecture: a React Native mobile frontend and a Python FastAPI backend.
* **Frontend:** Built with Expo, React Native, and TypeScript. I used `expo-av` and `expo-audio` to handle real-time audio playback and recording.
* **Backend:** A Python FastAPI application hosted on Google Cloud Run. It handles RAG retrieval, quiz generation, and WebSocket bridging to Gemini Live.
* **Data & RAG:** State handbooks are chunked (1,000 characters with 200-character overlap) and embedded using `gemini-embedding-001`. The embeddings are stored in Supabase Postgres via the `pgvector` extension.
* **AI Models:** I used `gemini-2.0-flash` for chat, quiz generation, and transcription of user audio. For the real-time voice coach, I used `gemini-2.5-flash-native-audio-preview` via the Gemini Live API.
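The chunking step can be sketched in a few lines of Python. This is a minimal illustration, not CoDriver's actual code: it assumes each handbook page is chunked on its own so that every chunk keeps a single page number, and the `Chunk` fields and the `chunk_page` helper are hypothetical names. The embedding call to `gemini-embedding-001` and the insert into `pgvector` are left out.

```python
from dataclasses import dataclass

CHUNK_SIZE = 1000    # characters per chunk
CHUNK_OVERLAP = 200  # characters shared by consecutive chunks


@dataclass
class Chunk:
    state: str  # hypothetical metadata: state whose handbook this came from
    page: int   # hypothetical metadata: handbook page number
    text: str


def chunk_page(state: str, page: int, text: str,
               size: int = CHUNK_SIZE, overlap: int = CHUNK_OVERLAP) -> list[Chunk]:
    """Split one handbook page into fixed-size character windows that overlap."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    if not text:
        return []
    step = size - overlap  # with the defaults, each window starts 800 chars after the previous one
    chunks: list[Chunk] = []
    start = 0
    while True:
        chunks.append(Chunk(state, page, text[start:start + size]))
        if start + size >= len(text):  # this window reached the end of the page
            break
        start += step
    return chunks


if __name__ == "__main__":
    page_text = "example handbook sentence. " * 100  # placeholder text, not real handbook content
    for c in chunk_page("WA", 12, page_text):
        print(c.state, c.page, len(c.text))
```

With a 200-character overlap, any passage of 200 characters or fewer lands whole inside at least one chunk, even when it straddles a window boundary.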

## Challenges I ran into
* **Audio VAD Reliability:** The Gemini Live native audio model didn't reliably trigger Voice Activity Detection (VAD) when sent batched audio blobs. I solved this by first transcribing the user's audio with `gemini-2.0-flash` and injecting the text transcript into the live session, which adds a little latency but makes turn-taking dependable.
* **WebSocket Reconnect Loops:** The Gemini Live SDK's receive generator is exhausted after each turn, so it has to be re-entered for every new turn. Handling this required careful connection management to avoid infinite reconnect loops when the user navigated away from the screen.
* **UI/UX Quirks:** Nested ScrollViews on Android pushed fixed-position elements (like the mic button) off the screen, requiring layout restructuring.
* **Ad Latency:** Loading AdMob rewarded ads at display time caused blank screens, so I now preload them at module import time.
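A minimal sketch of the turn loop behind the reconnect fix, assuming (as described above) that the session's `receive()` generator ends after every turn. `LiveSession`, `FakeSession`, and `pump_turns` are hypothetical stand-ins, not the Gemini Live SDK's API. The idea it shows: an exhausted generator is treated as the end of a turn rather than a dropped connection, and the loop exits only when a stop signal fires, for example when the client's WebSocket closes because the user left the screen.

```python
import asyncio
from typing import AsyncIterator, Protocol


class LiveSession(Protocol):
    """Hypothetical interface: receive() yields one turn's messages, then ends."""

    def receive(self) -> AsyncIterator[str]: ...


async def pump_turns(session: LiveSession, stop: asyncio.Event,
                     out: list[str]) -> int:
    """Forward messages turn by turn until `stop` is set.

    An exhausted receive() generator means the model finished a turn, not that
    the connection dropped, so we call receive() again on the same session
    instead of reconnecting. Only `stop` (e.g. the client left the screen)
    ends the loop, which prevents an endless reconnect cycle.
    """
    turns = 0
    while not stop.is_set():
        async for message in session.receive():
            out.append(message)
        turns += 1
    return turns


class FakeSession:
    """Test double that replays scripted turns and sets `stop` after the last one."""

    def __init__(self, turns: list[list[str]], stop: asyncio.Event) -> None:
        self._turns = list(turns)
        self._stop = stop

    async def receive(self) -> AsyncIterator[str]:
        for message in self._turns.pop(0):
            yield message
        if not self._turns:
            self._stop.set()  # simulate the user navigating away


async def main() -> None:
    stop = asyncio.Event()
    received: list[str] = []
    turns = await pump_turns(FakeSession([["Hi!"], ["Next question?"]], stop),
                             stop, received)
    print(turns, received)


if __name__ == "__main__":
    asyncio.run(main())
```

Because the stop flag is only checked between turns, a real server would likely also cancel the pumping task when the client disconnects mid-turn.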

## Accomplishments that I'm proud of
I'm incredibly proud of making state-specific RAG a hard technical guarantee: answers are grounded only in the selected state's handbook. By injecting exact page-number metadata inline into the retrieved context, the AI accurately cites its sources and eliminates the cross-state hallucinations that inspired the project. I'm also proud of successfully integrating Gemini Live's native audio model, which turns a standard flashcard app into an interactive, empathetic personal tutor. Building a fully functional freemium mobile app with a complex real-time audio pipeline in a short timeframe was a major accomplishment.
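To illustrate how the state guarantee and inline page citations can fit together, here is an in-memory Python sketch. It assumes each stored chunk carries `state` and `page` metadata (hypothetical field names) and uses brute-force cosine similarity; in Supabase the same idea would more likely be a `WHERE state = ...` filter combined with a pgvector distance ordering such as `<=>` (cosine distance).

```python
import math
from dataclasses import dataclass


@dataclass
class StoredChunk:
    state: str               # hypothetical column: state code, e.g. "WA"
    page: int                # hypothetical column: handbook page number
    text: str
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(chunks: list[StoredChunk], query: list[float],
             state: str, k: int = 3) -> list[StoredChunk]:
    # Hard filter first: chunks from other states can never reach the prompt.
    same_state = [c for c in chunks if c.state == state]
    # Then rank the remaining chunks by similarity to the query embedding.
    return sorted(same_state, key=lambda c: cosine_similarity(c.embedding, query),
                  reverse=True)[:k]


def build_context(chunks: list[StoredChunk]) -> str:
    # Inline page tags give the model something concrete to cite.
    return "\n\n".join(f"[{c.state} handbook, page {c.page}]\n{c.text}" for c in chunks)


if __name__ == "__main__":
    store = [
        StoredChunk("WA", 10, "placeholder chunk A", [1.0, 0.0]),
        StoredChunk("OR", 5, "placeholder chunk B", [1.0, 0.0]),
        StoredChunk("WA", 22, "placeholder chunk C", [0.0, 1.0]),
    ]
    print(build_context(retrieve(store, [0.9, 0.1], "WA", k=2)))
```

Filtering before ranking, rather than ranking everything and hoping the top hits come from the right state, is what makes state specificity a guarantee instead of a tendency.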

## What I learned
* **Chunking is Everything:** A RAG pipeline is only as good as its chunking strategy. Getting the chunks right was far more impactful than tweaking the LLM prompt.
* **Voice is an Interface, Not a Feature:** Adding Gemini Live narration didn't just add audio; it fundamentally changed the user experience to feel like a one-on-one coaching session.
* **Specificity Wins:** State specificity became the core value proposition. "This is what YOUR state says" builds far more user trust than generic advice.
* **Sequence Matters:** Shipping the difficult RAG grounding pipeline first was critical, as the quizzes, chat, and lessons all depended on its accuracy.

## What's next for Codriver Agent
Once VAD handling improves, I plan to move the audio architecture to true native audio-to-audio streaming, eliminating the transcription delay. The RAG database currently covers the Washington State handbook; I aim to expand it to all 50 states, implement fallback web scrapers for when state DMVs break their PDF URLs, and refine the premium subscription model to help more first-time drivers pass their tests confidently and safely.
