Inspiration
Moving is one of the most stressful decisions people make. Yet comparing Connecticut towns means downloading dense, 100+ page scanned budget PDFs, cross-referencing spreadsheets, and trying to make sense of mill rates and department allocations, all without any guidance. I wanted to fix that. What if you could just ask out loud, "Is Wallingford good for my family?" and get a warm, honest, data-backed answer in seconds?
What it does
Penny is a real-time voice AI advisor that helps Connecticut residents compare towns. Users speak naturally to Penny, who responds instantly in voice while simultaneously rendering interactive charts and highlighting towns on a live CT map. Penny covers Wallingford, North Haven, and Cheshire, each with a unique persona, and answers questions like "Which town has the lowest taxes?", "Compare education spending", or "What's my annual tax on a $500k home?" Penny can be interrupted mid-sentence and pivots instantly, making the conversation feel genuinely live and natural.
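As a concrete example of the tax question above: Connecticut assesses residential property at 70% of fair market value, and one mill is $1 of tax per $1,000 of assessed value. A minimal sketch of that arithmetic (the mill rate shown is a placeholder for illustration, not any specific town's actual rate):

```python
def annual_property_tax(market_value: float, mill_rate: float) -> float:
    """Estimate annual CT property tax.

    Connecticut assesses residential property at 70% of fair market
    value; one mill is $1 of tax per $1,000 of assessed value.
    """
    assessed = market_value * 0.70
    return assessed * mill_rate / 1000

# Placeholder mill rate for illustration only; real rates come from
# each town's adopted budget.
print(round(annual_property_tax(500_000, 30.0), 2))  # → 10500.0
```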
How we built it
Penny is built on the Gemini Live API for real-time voice interaction, with full-duplex audio and interruption handling. All three CT town budget PDFs (2025-26) were processed with Gemini Vision, extracting structured JSON with department spending, mill rates, and key facts. This JSON is embedded directly into Penny's system prompt (no vector DB needed), giving her instant, grounded access to all town data. Town avatars were generated with Gemini image generation. The frontend is built with Streamlit and Plotly for dynamic charts and deployed on Google Cloud Run. The backend uses Python and FastAPI with the Google GenAI SDK.
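The prompt-embedding step can be sketched roughly like this; the field names and numbers below are illustrative placeholders, not Penny's actual extracted schema:

```python
import json

# Hypothetical extract of the structured data pulled from each town's
# scanned budget PDF via Gemini Vision (values are placeholders).
TOWN_DATA = {
    "Wallingford": {"mill_rate": 29.1, "education_spending": 115_000_000},
    "North Haven": {"mill_rate": 31.18, "education_spending": 95_000_000},
}

def build_system_prompt(town_data: dict) -> str:
    """Embed the full dataset in the system prompt so every answer is
    grounded in the extracted budget numbers -- no vector DB lookup."""
    return (
        "You are Penny, a Connecticut town advisor. Answer ONLY from "
        "the budget data below; never invent numbers.\n\n"
        f"TOWN DATA:\n{json.dumps(town_data, indent=2)}"
    )

prompt = build_system_prompt(TOWN_DATA)
```

At this data scale the whole dataset fits comfortably in the model's context, which is why a retrieval layer was unnecessary.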
Challenges we ran into
The biggest challenge was getting the Gemini Live API to return structured JSON alongside voice responses: I needed both audio output and chart data simultaneously. I solved this by enforcing a strict JSON response schema in the system prompt and parsing the text transcription in parallel with audio playback. The CT town budget PDFs are all scanned documents with no machine-readable text, so Gemini Vision had to extract structured data from raw page images, 88+ pages per town.
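The parallel-parsing idea can be sketched as follows. The `CHART_JSON:` marker convention here is an assumption for illustration (one simple way a system prompt could delimit the payload), not necessarily the exact schema Penny enforces:

```python
import json

def extract_chart_payload(transcript: str):
    """Split the chart JSON out of the model's text transcription while
    the audio plays. The system prompt's response schema (an assumed
    convention here) asks the model to append the payload after a
    CHART_JSON: marker."""
    marker = "CHART_JSON:"
    if marker not in transcript:
        return None
    _, raw = transcript.rsplit(marker, 1)
    try:
        return json.loads(raw.strip())
    except json.JSONDecodeError:
        return None  # malformed payload: play voice only, skip the chart

reply = ('Wallingford has the lowest mill rate.\n'
         'CHART_JSON: {"chart": "bar", "towns": ["Wallingford", "Cheshire"]}')
payload = extract_chart_payload(reply)  # dict for the Plotly chart
```

Failing soft (returning `None` on a malformed payload) keeps the voice response flowing even when the chart data is unusable.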
Accomplishments that we're proud of
I'm proud that Penny never hallucinates: every number she cites is pulled directly from real 2025-26 municipal budget documents. The interruption handling feels genuinely live: users can cut Penny off mid-sentence and she pivots instantly without any awkward pause. I'm also proud of the end-to-end multimodal experience: voice in, voice out, charts rendering in sync, and a live CT map animating as towns are mentioned, all working together seamlessly. Everything was deployed and live on Google Cloud Run within 3 days.
What we learned
Gemini's large context window is big enough to hold entire municipal budget datasets; no vector DB is required at this scale. Enforcing structured JSON output from a Live API voice session requires careful system prompt engineering. Audio architecture for voice agents is fundamentally different on the server versus the browser; moving audio handling to the client side via WebRTC is the right approach. Most importantly: a focused, well-executed idea beats a feature-heavy but buggy one every time.
What's next for Penny - Connecticut Town Advisor
I plan to expand Penny to cover all 169 Connecticut towns, making her the definitive voice AI for CT civic data. I'm planning to add school performance data, crime statistics, and commute times alongside budget data for richer town comparisons. Long term, the architecture generalizes to any US municipality, making government budget data accessible to every citizen through voice, not just those who know how to read a PDF.
Built With
- fastapi
- gemini-2.5-flash
- gemini-live-api
- google-cloud
- google-cloud-run
- google-genai-sdk
- plotly
- python
- streamlit