Expenso — The Voice-Native Financial Copilot

💡 Inspiration: The Financial Literacy Barrier

Expenso was born from a problem every Indian household, college dorm, and small business silently suffers through: money is confusing, not because it's complex, but because every tool
assumes you already know what you're doing.

The spark was small — a chaotic shared rent-and-grocery spreadsheet in a college house that no one wanted to maintain. But behind it sat a much larger truth:

▎ 70% of Indians never track personal finances. Not from lack of intent, but from cognitive overload.

Existing apps (Splitwise, Walnut, MoneyView, even global tools like Mint and YNAB) demand the same ritual: open app → tap → categorize → repeat. That friction is one reason financial literacy in India is stuck at 27% (NCFE, 2023), even though the country runs the world's largest digital payment infrastructure (UPI).

We didn't want to build another tracker. We wanted to remove the keyboard, the menu, the form — and replace them with the most natural human interface: speech.

Expenso's thesis is simple: the next billion users won't type their way to financial wellness. They'll talk.


🚀 What it does: The Voice-Native Financial Copilot

Expenso is a voice-first financial OS built around Niva, an AI assistant that doesn't just answer — it acts.

A user says:

▎ "Niva, I spent ₹800 on groceries with Saurav, split it equally, and remind me if I spend more than ₹3,000 on food this week."

In under 2 seconds, Niva:

  • Logs the expense with the right category
  • Splits it 50/50 and updates Saurav's shared balance in real time
  • Sets a contextual budget guardrail
  • Recalculates the user's Financial Health Score (0–100) — a single, gamified number that replaces twelve confusing dashboards
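
The split step in that command can be sketched as follows. This is a minimal illustration, not Expenso's actual code: the function name `split_equally` and the choice to store money as integer paise are assumptions made here so that no paisa is lost to rounding.

```python
from decimal import Decimal


def split_equally(amount_rupees: str, people: list[str]) -> dict[str, int]:
    """Split an amount into per-person shares, returned in paise."""
    # Work in integer paise so the shares always add up to the total.
    total_paise = int(Decimal(amount_rupees) * 100)
    base, remainder = divmod(total_paise, len(people))
    # The first `remainder` people each absorb one extra paisa.
    return {
        name: base + (1 if i < remainder else 0)
        for i, name in enumerate(people)
    }


shares = split_equally("800", ["me", "Saurav"])
uneven = split_equally("100", ["a", "b", "c"])
```

For the ₹800 grocery bill each person's share is 40,000 paise (₹400). When the amount does not divide evenly, as in the three-way ₹100 case, the leftover paisa goes to the first person so the shares still sum to the original total.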

Niva is not a chatbot. It's an agentic financial operator that works across three integrated layers:

┌──────────────┬───────────────────────────────────────────────────────────────────┬──────────────────────────────────────────────┐
│ Layer │ What it does │ Who it's for │
├──────────────┼───────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────┤
│ Personal │ Voice-driven expense capture, budgeting, insights, multi-currency │ Individuals, students, professionals │
├──────────────┼───────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────┤
│ Expenso Biz │ Voice bookkeeping ("Mark ₹2,000 cash sale, GST 18%") │ Kirana stores, freelancers, micro-businesses │
├──────────────┼───────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────┤
│ Shared Rooms │ Real-time group expense sync with AI-suggested settlements │ Roommates, families, trips, teams │
└──────────────┴───────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────┘
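
One way a Shared Room could propose settlements is a greedy pass over net balances. The sketch below is an assumption about how such a suggestion might work, not the method Niva uses: `suggest_settlements`, the member names, and the paise units are all hypothetical.

```python
def suggest_settlements(balances: dict[str, int]) -> list[tuple[str, str, int]]:
    """Return (payer, payee, amount) transfers that zero out all balances.

    `balances` maps a member to their net position in paise:
    positive means the group owes them, negative means they owe the group.
    """
    if sum(balances.values()) != 0:
        raise ValueError("net balances must sum to zero")
    # Largest amounts first, so big debts are matched with big credits.
    creditors = sorted(((amt, n) for n, amt in balances.items() if amt > 0), reverse=True)
    debtors = sorted(((-amt, n) for n, amt in balances.items() if amt < 0), reverse=True)
    transfers = []
    i = j = 0
    while i < len(debtors) and j < len(creditors):
        owe, debtor = debtors[i]
        due, creditor = creditors[j]
        paid = min(owe, due)
        transfers.append((debtor, creditor, paid))
        debtors[i] = (owe - paid, debtor)
        creditors[j] = (due - paid, creditor)
        # Move on once a debtor has paid up or a creditor is made whole.
        if debtors[i][0] == 0:
            i += 1
        if creditors[j][0] == 0:
            j += 1
    return transfers


transfers = suggest_settlements({"Asha": 50000, "Ravi": -30000, "Saurav": -20000})
```

Here Ravi pays Asha ₹300 and Saurav pays Asha ₹200. The greedy pass needs at most one transfer fewer than the number of members, but it is not guaranteed to find the smallest possible set of transfers in every case.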

Niva also answers:

  • "Am I overspending this month compared to last?"
  • "How much can I afford to save if I cancel two subscriptions?"
  • "What did I spend on food deliveries in March?"

…in conversation, with charts rendered on demand, and citations to the actual transactions.


⚙️ How we built it: Enterprise Tech for Everyday Users

Expenso is engineered as a production-ready, offline-first, privacy-aware mobile platform.

Mobile & Sync

  • Flutter — single codebase, Android + iOS
  • Hive — offline-first local store; every action works without connectivity and reconciles on reconnect
  • Supabase (Postgres + Realtime + Row-Level Security) — multi-device sync with millisecond latency for shared rooms
  • pgvector — semantic memory of past transactions so Niva can answer fuzzy questions ("that big dinner last week")
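
The "works offline, reconciles on reconnect" behaviour can be pictured as an outbox of pending writes. This is a language-neutral sketch in Python rather than the app's Flutter/Hive code; the `Outbox` class and the `push` callback are hypothetical stand-ins for the local store and the sync call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Outbox:
    """Queue of local writes waiting to be pushed to the server."""

    pending: list[dict] = field(default_factory=list)

    def record(self, op: dict) -> None:
        # The write has already been applied locally; here we only
        # remember that it still needs to be synced.
        self.pending.append(op)

    def flush(self, push: Callable[[dict], bool]) -> int:
        """Push queued ops in order; stop at the first failure and keep the rest."""
        sent = 0
        while self.pending:
            if not push(self.pending[0]):
                break
            self.pending.pop(0)
            sent += 1
        return sent


outbox = Outbox()
outbox.record({"id": "op-1", "type": "add_expense", "amount_paise": 80000})
outbox.record({"id": "op-2", "type": "add_expense", "amount_paise": 12000})

sent_offline = outbox.flush(lambda op: False)  # no connectivity: nothing leaves the queue
sent_online = outbox.flush(lambda op: True)    # reconnected: both ops are pushed in order
```

Giving each op a stable `id` (an assumption in this sketch) lets the server ignore an op it has already seen if a push is retried after a dropped connection.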

The Niva AI Stack

  • Gemini 2.0 + Llama 3 (via Groq) — dual-model routing: Gemini for complex reasoning and tool planning, Groq for sub-300 ms quick responses
  • Custom ToolExecutor framework — a strict, schema-validated bridge between natural language and the database. Niva cannot hallucinate a transaction; every action is a typed, auditable
    tool call
  • Deepgram (STT) + ElevenLabs (TTS) — streaming voice in both directions, so Niva starts speaking before it finishes thinking
  • Vapi — real-time voice orchestration with <500 ms turn latency
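
To make the ToolExecutor idea concrete, here is a minimal sketch of a schema-checked tool call with an audit trail. It is an illustration under assumptions, not the real framework: the class layout, the `log_expense` tool, and the simple name-to-type schema (standing in for full JSON-schema validation) are all invented for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    schema: dict[str, type]  # required argument name -> expected type
    handler: Callable[..., Any]


@dataclass
class ToolExecutor:
    tools: dict[str, Tool] = field(default_factory=dict)
    audit_log: list[dict] = field(default_factory=list)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def execute(self, name: str, args: dict[str, Any]) -> Any:
        tool = self.tools.get(name)
        if tool is None:
            raise ValueError(f"unknown tool: {name}")
        # Reject missing or extra arguments before touching any data.
        if set(args) != set(tool.schema):
            raise ValueError(f"{name}: expected arguments {sorted(tool.schema)}")
        for key, expected in tool.schema.items():
            # Exact type match, so True is not accepted where an int is required.
            if type(args[key]) is not expected:
                raise TypeError(f"{name}.{key} must be {expected.__name__}")
        result = tool.handler(**args)
        self.audit_log.append({"tool": name, "args": args, "result": result})
        return result


ledger: list[dict] = []


def log_expense(amount_paise: int, category: str) -> int:
    ledger.append({"amount_paise": amount_paise, "category": category})
    return len(ledger)


executor = ToolExecutor()
executor.register(Tool("log_expense", {"amount_paise": int, "category": str}, log_expense))
executor.execute("log_expense", {"amount_paise": 80000, "category": "groceries"})
```

A call whose arguments do not match the schema, such as passing the amount as the string "800", raises before the handler runs, so the ledger is never touched by a malformed model output.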

Privacy & Edge ML

  • On-device TF-IDF + Logistic Regression classifier for instant, private expense categorization — no transaction text leaves the phone unless the user invokes Niva
  • Local-first encryption for sensitive fields; cloud sync uses zero-knowledge keys
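
For readers unfamiliar with the technique, the sketch below shows TF-IDF features feeding a multinomial (softmax) logistic regression, written in plain Python so it has no dependencies. It is a toy under assumptions, not the on-device model: the class name `TfidfLogReg`, the training phrases, the categories, and the hyperparameters are all made up for illustration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    return text.lower().split()


class TfidfLogReg:
    def fit(self, texts, labels, epochs=300, lr=0.5):
        docs = [tokenize(t) for t in texts]
        self.vocab = sorted({w for d in docs for w in d})
        n = len(docs)
        doc_freq = Counter(w for d in docs for w in set(d))
        # Smoothed inverse document frequency: rarer words get more weight.
        self.idf = {w: math.log((1 + n) / (1 + doc_freq[w])) + 1 for w in self.vocab}
        self.classes = sorted(set(labels))
        self.weights = {c: {w: 0.0 for w in self.vocab} for c in self.classes}
        self.bias = {c: 0.0 for c in self.classes}
        feats = [self._features(d) for d in docs]
        # Stochastic gradient descent on the softmax cross-entropy loss.
        for _ in range(epochs):
            for x, y in zip(feats, labels):
                probs = self._probs(x)
                for c in self.classes:
                    grad = probs[c] - (1.0 if c == y else 0.0)
                    self.bias[c] -= lr * grad
                    for w, v in x.items():
                        self.weights[c][w] -= lr * grad * v
        return self

    def _features(self, tokens):
        # TF-IDF vector over known words, scaled to unit length.
        counts = Counter(t for t in tokens if t in self.idf)
        vec = {w: c * self.idf[w] for w, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {w: v / norm for w, v in vec.items()}

    def _probs(self, x):
        scores = {
            c: self.bias[c] + sum(self.weights[c][w] * v for w, v in x.items())
            for c in self.classes
        }
        top = max(scores.values())
        exps = {c: math.exp(s - top) for c, s in scores.items()}
        total = sum(exps.values())
        return {c: e / total for c, e in exps.items()}

    def predict(self, text):
        probs = self._probs(self._features(tokenize(text)))
        return max(probs, key=probs.get)


texts = [
    "zomato dinner order", "swiggy lunch order",
    "uber ride airport", "metro card recharge",
    "electricity bill payment", "wifi bill payment",
]
labels = ["food", "food", "transport", "transport", "utilities", "utilities"]
model = TfidfLogReg().fit(texts, labels)
category = model.predict("swiggy dinner")
```

A model this small is only a few dictionaries of numbers, which is part of why this family of classifiers suits on-device use: inference is a handful of multiplications and needs no network call.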

Why this matters
The combination — agentic LLM + strict tool schemas + offline-first sync + on-device ML — is what lets Expenso work in a Tier-3 town with patchy 4G and feel as fast as ChatGPT in Bangalore.


🚧 Challenges we ran into

  1. Agentic Hallucination on Money — LLMs love to invent. We built a typed ToolExecutor with JSON-schema validation, dry-run previews, and undo on every mutation. Niva can never silently corrupt a balance.
  2. Low-latency Voice Loop — Stitching STT → LLM → tool call → DB write → TTS in under 1.5 seconds required streaming at every layer and speculative TTS playback while tool calls were still resolving.
  3. Real-time Multi-user State — Shared Rooms needed conflict-free updates across 5+ devices simultaneously. We used Supabase Realtime + a CRDT-inspired merge layer for offline edits.
  4. Trust on Day One — Users won't say "transfer ₹5,000" to an AI they just met. We solved this with an explicit confirm-before-execute layer, transaction previews, and a visible audit log of every Niva action.
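
The CRDT-inspired merge mentioned in challenge 3 could, for example, treat every field of an expense as a last-writer-wins register. The sketch below is one possible reading, not the team's merge layer: `merge_records`, the field names, and the use of a (timestamp, device id) pair to order writes are assumptions.

```python
# Each field stores (value, timestamp, device_id).
# The pair (timestamp, device_id) orders writes; device_id breaks timestamp ties.
Field = tuple[object, int, str]


def merge_records(a: dict[str, Field], b: dict[str, Field]) -> dict[str, Field]:
    """Merge two replicas of one record, keeping the newest write per field."""
    merged = dict(a)
    for key, theirs in b.items():
        ours = merged.get(key)
        if ours is None or (theirs[1], theirs[2]) > (ours[1], ours[2]):
            merged[key] = theirs
    return merged


phone = {
    "amount_paise": (80000, 10, "phone"),
    "category": ("groceries", 10, "phone"),
}
tablet = {
    "amount_paise": (85000, 12, "tablet"),
    "note": ("weekly shop", 11, "tablet"),
}

merged = merge_records(phone, tablet)
```

Because the newest write wins field by field, merging in either order gives the same record, and merging the same replica twice changes nothing. Those two properties are what let devices that edited offline converge without coordinating.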

🏆 Accomplishments that we're proud of

  • Zero-UI Mode — a user can onboard, log a month of expenses, settle a group, and check their financial health without ever touching a button
  • Financial Health Score — collapses 30+ metrics into one explainable number, with Niva narrating why it changed
  • Offline-first AI — categorization, balances, and basic queries all work with no internet; when offline, Niva degrades gracefully to a local small language model (SLM) for queries
  • One-shot multi-action commands — most assistants do one thing per turn; Niva chains 4–6 tool calls in a single utterance
  • Multi-currency, zero extra cost — built into the schema from day one, no exchange-rate API tax
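
As an illustration of how many metrics can collapse into one explainable number, the sketch below uses a weighted average of normalized sub-scores. The real Financial Health Score formula is not described here, so everything in this example is hypothetical: the function `health_score`, the three metric names, and their weights.

```python
def health_score(
    metrics: dict[str, float], weights: dict[str, float]
) -> tuple[int, dict[str, float]]:
    """Combine sub-scores into a 0-100 score plus each metric's contribution.

    `metrics` values are assumed to be normalized to the range 0..1 (1 is best).
    """
    total_weight = sum(weights.values())
    # Points each metric contributes to the final score, out of 100.
    contributions = {
        name: 100 * weights[name] * metrics[name] / total_weight
        for name in weights
    }
    return round(sum(contributions.values())), contributions


score, parts = health_score(
    metrics={"savings_rate": 0.5, "budget_adherence": 0.8, "debt_settled": 1.0},
    weights={"savings_rate": 2, "budget_adherence": 1, "debt_settled": 1},
)
```

Returning the per-metric contributions alongside the score is what makes the number explainable: comparing today's contributions with yesterday's shows which metric moved the score.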

🧠 What we learned

  • The interface is the product. Removing the form was worth more than any new feature.
  • LLMs are decision engines, not conversation toys. The magic is in the tool layer, not the prompt.
  • Trust compounds. Every confirmed action makes the user delegate more next time.
  • Solving boring problems beats inventing exciting ones. "Did I pay Saurav back?" matters more than another chart.

🔮 What's next for Expenso: Social Impact & Scalability

  • Proactive Niva — moves from reactive ("you asked") to anticipatory ("your electricity bill is 22% higher this month, want me to investigate?")
  • Receipt Intelligence — point camera → Niva extracts items, categorizes, splits, files for warranty/tax
  • Vernacular Voice — full conversational support in Hindi, Tamil, Telugu, Bengali, Marathi using Sarvam AI / IndicTrans for the next 500M users
  • Decentralized Financial Identity (DFI) — a portable, user-owned credit profile for the 190M unbanked Indians who have spending history but no CIBIL score
