Inspiration
Moving is one of the most stressful decisions people make. Yet comparing Connecticut towns means downloading dense, 100+ page scanned budget PDFs, cross-referencing spreadsheets, and trying to make sense of mill rates and department allocations, all without any guidance. I wanted to fix that. What if you could just ask out loud, "Is Wallingford good for my family?" and get a warm, honest, data-backed answer in seconds?
What it does
Penny is a real-time voice AI advisor that helps Connecticut residents compare towns. Users speak naturally to Penny, who responds instantly in voice while simultaneously rendering interactive charts and highlighting towns on a live CT map. Penny covers Wallingford, North Haven, and Cheshire, each with a unique persona, and answers questions like "Which town has the lowest taxes?", "Compare education spending", or "What's my annual tax on a $500k home?" Penny can be interrupted mid-sentence and pivots instantly, making the conversation feel genuinely live and natural.
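As a concrete example of the tax question above: Connecticut assesses residential property at 70% of fair market value, and one mill is $1 of tax per $1,000 of assessed value. A minimal sketch of that arithmetic (the mill rate shown is a placeholder for illustration, not any specific town's actual rate):

```python
def annual_property_tax(market_value: float, mill_rate: float) -> float:
    """Estimate annual CT property tax.

    Connecticut assesses residential property at 70% of fair market
    value; one mill is $1 of tax per $1,000 of assessed value.
    """
    assessed = market_value * 0.70
    return assessed * mill_rate / 1000

# Placeholder mill rate for illustration only; real rates come from
# each town's adopted budget.
print(round(annual_property_tax(500_000, 30.0), 2))  # → 10500.0
```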
How we built it
Penny is built on the Gemini Live API for real-time voice interaction, with full-duplex audio and interruption handling. All three CT town budget PDFs (2025-26) were processed with Gemini Vision, extracting structured JSON with department spending, mill rates, and key facts. This JSON is embedded directly into Penny's system prompt (no vector DB needed), giving her instant, grounded access to all town data. Town avatars were generated with Gemini image generation. The frontend is built with Streamlit and Plotly for dynamic charts and deployed on Google Cloud Run. The backend uses Python and FastAPI with the Google GenAI SDK.
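The prompt-embedding step can be sketched roughly like this; the field names and numbers below are illustrative placeholders, not Penny's actual extracted schema:

```python
import json

# Hypothetical extract of the structured data pulled from each town's
# scanned budget PDF via Gemini Vision (values are placeholders).
TOWN_DATA = {
    "Wallingford": {"mill_rate": 29.1, "education_spending": 115_000_000},
    "North Haven": {"mill_rate": 31.18, "education_spending": 95_000_000},
}

def build_system_prompt(town_data: dict) -> str:
    """Embed the full dataset in the system prompt so every answer is
    grounded in the extracted budget numbers -- no vector DB lookup."""
    return (
        "You are Penny, a Connecticut town advisor. Answer ONLY from "
        "the budget data below; never invent numbers.\n\n"
        f"TOWN DATA:\n{json.dumps(town_data, indent=2)}"
    )

prompt = build_system_prompt(TOWN_DATA)
```

At this data scale the whole dataset fits comfortably in the model's context, which is why a retrieval layer was unnecessary.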
Challenges we ran into
The biggest challenge was getting the Gemini Live API to return structured JSON alongside voice responses: I needed both audio output and chart data simultaneously. I solved this by enforcing a strict JSON response schema in the system prompt and parsing the text transcription in parallel with audio playback. The CT town budget PDFs are all scanned documents with no machine-readable text, so Gemini Vision had to extract structured data from raw page images, 88+ pages per town.
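The parallel-parsing idea can be sketched as follows. The `CHART_JSON:` marker convention here is an assumption for illustration (one simple way a system prompt could delimit the payload), not necessarily the exact schema Penny enforces:

```python
import json

def extract_chart_payload(transcript: str):
    """Split the chart JSON out of the model's text transcription while
    the audio plays. The system prompt's response schema (an assumed
    convention here) asks the model to append the payload after a
    CHART_JSON: marker."""
    marker = "CHART_JSON:"
    if marker not in transcript:
        return None
    _, raw = transcript.rsplit(marker, 1)
    try:
        return json.loads(raw.strip())
    except json.JSONDecodeError:
        return None  # malformed payload: play voice only, skip the chart

reply = ('Wallingford has the lowest mill rate.\n'
         'CHART_JSON: {"chart": "bar", "towns": ["Wallingford", "Cheshire"]}')
payload = extract_chart_payload(reply)  # dict for the Plotly chart
```

Failing soft (returning `None` on a malformed payload) keeps the voice response flowing even when the chart data is unusable.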
Accomplishments that we're proud of
I'm proud that Penny never hallucinates: every number she cites is pulled directly from real 2025-26 municipal budget documents. The interruption handling feels genuinely live: users can cut Penny off mid-sentence and she pivots instantly without any awkward pause. I'm also proud of the end-to-end multimodal experience: voice in, voice out, charts rendering in sync, and a live CT map animating as towns are mentioned, all working together seamlessly. Everything was deployed and live on Google Cloud Run within 3 days.
What we learned
Gemini's large context window is big enough to hold entire municipal budget datasets; no vector DB is required at this scale. Enforcing structured JSON output from a Live API voice session requires careful system prompt engineering. Audio architecture for voice agents is fundamentally different on the server versus the browser; moving audio handling to the client side via WebRTC is the right approach. Most importantly: a focused, well-executed idea beats a feature-heavy but buggy one every time.
What's next for Penny - Connecticut Town Advisor
I plan to expand Penny to cover all 169 Connecticut towns, making her the definitive voice AI for CT civic data. I'm planning to add school performance data, crime statistics, and commute times alongside budget data for richer town comparisons. Long term, the architecture generalizes to any US municipality, making government budget data accessible to every citizen through voice, not just those who know how to read a PDF.
Built With
- fastapi
- gemini-2.5-flash
- gemini-live-api
- google-cloud
- google-cloud-run
- google-genai-sdk
- plotly
- python
- streamlit