MemoryBank: Your AI Financial Assistant That Actually Remembers You

💡 Inspiration

Money isn't just about transactions; it's about patterns, habits, and decisions over time.

But most financial tools treat every choice in isolation:

  • "Can I afford this?"
  • "What's my balance?"

They don't know you. They don't remember.

We asked: What if your financial advisor actually remembered your spending habits, impulses, and patterns, and used them to guide real decisions?

We built MemoryBank, an AI system that learns from your past to shape your future conversations.

🎤 What It Does

MemoryBank is a voice-based financial AI assistant that remembers you.

You can:

  • Call your AI advisor in real time via Twilio
  • Upload bank statements (PDFs) to build context
  • Get personalized responses tailored to you

The system:

  • Parses financial data into structured insights
  • Tracks spending patterns and habits
  • Remembers previous interactions and decisions
  • Responds with context-aware, memory-driven advice
  • Evolves with every call

The difference: Instead of answering "Can I afford this?" with a bare yes or no, MemoryBank says "You can... but you've already hit that category 4 times this week."
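
A minimal sketch of how a reply like that could be derived from parsed transactions. The field names (`date`, `category`), the 7-day window, and the `nudge_after` threshold are illustrative assumptions, not MemoryBank's actual schema or logic.

```python
from collections import Counter
from datetime import date, timedelta


def category_counts_this_week(transactions, today):
    """Count transactions per category in the 7 days ending on `today`."""
    window_start = today - timedelta(days=6)
    counts = Counter()
    for tx in transactions:
        tx_date = date.fromisoformat(tx["date"])
        if window_start <= tx_date <= today:
            counts[tx["category"]] += 1
    return counts


def affordability_reply(category, amount, balance, counts, nudge_after=3):
    """Answer 'Can I afford this?' using both balance and recent habits.

    `nudge_after` is a hypothetical threshold: once a category has been
    hit this many times in the window, the reply mentions it.
    """
    if amount > balance:
        return "That would put you over your balance right now."
    hits = counts.get(category, 0)
    if hits >= nudge_after:
        return f"You can... but you've already hit {category} {hits} times this week."
    return "You can afford it."
```

The balance check alone gives the "isolated" answer; the category count is what turns it into a memory-driven one.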

🛠 How We Built It

We architected a lightweight backend pipeline:

  1. Data Layer: Bank statement PDFs are parsed into structured JSON (transactions, categories, balances)
  2. Voice Interface: Twilio handles incoming calls, speech recognition, and audio routing
  3. Memory & Context: We retrieve relevant financial history and behavioral patterns from stored profiles
  4. AI Brain: Google Gemini API generates conversational, context-aware responses with custom prompts for natural speech
  5. Voice Synthesis: Inworld TTS converts AI responses to expressive speech with emotional tags (pauses, emphasis, tone)
  6. Learning Loop: Post-call analysis extracts behavioral signals and updates memory for future interactions
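
As a rough illustration of the Data Layer step, the sketch below turns already-extracted statement rows into structured JSON. It assumes the PDF text has been extracted elsewhere into `date|description|amount` rows; that row format, the keyword table, and the output field names are hypothetical stand-ins rather than the project's real parser.

```python
import json

# Hypothetical keyword-to-category table; a real categorizer would be richer.
CATEGORY_KEYWORDS = {"coffee": "dining", "grocer": "groceries", "uber": "transport"}


def categorize(description):
    """Return the first category whose keyword appears in the description."""
    lowered = description.lower()
    for keyword, category in CATEGORY_KEYWORDS.items():
        if keyword in lowered:
            return category
    return "other"


def parse_statement_rows(rows):
    """Convert 'date|description|amount' rows into a structured dict.

    Rows that do not have three fields or a numeric amount (for example
    a header row) are skipped rather than raising.
    """
    transactions = []
    for row in rows:
        parts = [part.strip() for part in row.split("|")]
        if len(parts) != 3:
            continue
        tx_date, description, raw_amount = parts
        try:
            amount = float(raw_amount)
        except ValueError:
            continue
        transactions.append({
            "date": tx_date,
            "description": description,
            "amount": amount,
            "category": categorize(description),
        })
    net_change = round(sum(tx["amount"] for tx in transactions), 2)
    return {"transactions": transactions, "net_change": net_change}


def to_json(statement):
    """Serialize the structured statement for storage or prompting."""
    return json.dumps(statement, indent=2)
```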

Tech Stack:

  • Flask (Python backend)
  • Twilio (voice calls + speech recognition)
  • Google Generative AI (Gemini)
  • Inworld TTS (expressive text-to-speech)
  • JSON-based memory storage

This creates a closed loop where the system becomes more personalized with every call.
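
One way the learning loop could persist what it learns between calls, sketched with a plain JSON file. The memory layout (`calls`, `signals`, `history`) and the signal names are assumptions made for illustration only.

```python
import json
from pathlib import Path


def load_memory(path):
    """Load a caller's memory file, or start a fresh profile if none exists."""
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text())
    return {"calls": 0, "signals": {}, "history": []}


def update_memory(memory, call_summary, signals):
    """Fold one call's outcome into memory.

    `signals` is a list of behavioral labels extracted after the call
    (for example "impulse_purchase"); each one increments a counter.
    """
    memory["calls"] += 1
    memory["history"].append(call_summary)
    for name in signals:
        memory["signals"][name] = memory["signals"].get(name, 0) + 1
    return memory


def save_memory(path, memory):
    """Write the updated memory back to disk as JSON."""
    Path(path).write_text(json.dumps(memory, indent=2))
```

Each call would then run load, update, save, so the next call starts from a richer profile.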

🚧 Challenges We Ran Into

TTS Flexibility: We initially tried multiple text-to-speech solutions to support different personalities. We landed on Inworld TTS for its expressiveness and emotional tag support, allowing responses to sound natural and contextual.

Memory Architecture: Building true memory meant more than storing data. We had to design a system that retrieves relevant historical context and integrates it into real-time conversations. Structuring data so that Gemini could reason over both current financials and past user behavior was non-trivial.
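
To make "retrieves relevant historical context" concrete, here is a deliberately simple retrieval sketch: it scores stored memories by word overlap with what the caller just said and keeps the top few, preferring more recent entries on ties. This is a stand-in technique chosen for illustration; the source does not say how MemoryBank ranks memories, and the `text` field is an assumed shape.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve_relevant(memories, utterance, k=2):
    """Return up to k memories that share words with the utterance.

    `memories` is assumed to be ordered oldest to newest. Ranking is by
    overlap size, then by recency (later entries first).
    """
    query = tokenize(utterance)
    scored = []
    for index, memory in enumerate(memories):
        overlap = len(query & tokenize(memory["text"]))
        if overlap:
            scored.append((overlap, index, memory))
    scored.sort(key=lambda item: (-item[0], -item[1]))
    return [memory for _, _, memory in scored[:k]]
```

Capping the result at `k` keeps the context passed to the model small, which matters when the reply has to come back within a phone call's turn time.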

Real-time Conversation Flow: Managing multi-turn voice interactions with Twilio introduced complexity. Handling inconsistent speech recognition, recovering gracefully from silence, and maintaining conversational rhythm required careful TwiML tuning and timeout management.
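
The kind of TwiML tuning described here might look like the sketch below, which builds one conversational turn with the standard library's XML tools. `<Gather>`, `<Say>`, `<Redirect>`, and `<Hangup>` are real TwiML verbs, and `input`, `timeout`, and `speechTimeout` are real `<Gather>` attributes; the `/respond` and `/silence` routes, the timeout values, and the silence limit are assumptions. `<Say>` is used as a placeholder, since a build using Inworld TTS would more likely serve synthesized audio.

```python
import xml.etree.ElementTree as ET


def build_turn_twiml(prompt_text, action_url="/respond", silence_count=0, max_silences=2):
    """Build TwiML for one turn: speak, listen, and handle silence.

    If the caller says nothing, Twilio falls through to the verb after
    <Gather>, so the <Redirect> below acts as the silence handler.
    """
    response = ET.Element("Response")
    if silence_count >= max_silences:
        # Too many silent turns in a row: end the call politely.
        ET.SubElement(response, "Say").text = "I'll let you go for now. Call back anytime."
        ET.SubElement(response, "Hangup")
    else:
        gather = ET.SubElement(response, "Gather", {
            "input": "speech",
            "action": action_url,
            "method": "POST",
            "timeout": "5",
            "speechTimeout": "auto",
        })
        ET.SubElement(gather, "Say").text = prompt_text
        redirect = ET.SubElement(response, "Redirect", {"method": "POST"})
        redirect.text = f"/silence?count={silence_count + 1}"
    return ET.tostring(response, encoding="unicode")
```

A Flask route would return this string with a `text/xml` content type; tracking `silence_count` in the redirect URL avoids needing server-side session state for it.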

Context Quality: The AI's usefulness depends entirely on structured, relevant context. We spent significant time on parsing accuracy and contextual retrieval so Gemini could generate intelligent (not generic) responses.

Voice Naturalness: Making responses sound like a friend talking, not a bot reading a script, required extensive prompt engineering to encourage short, natural phrasing with emotional tags.
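
A sketch of what such a prompt could look like when assembled in code, together with a small guard that trims over-long replies before they reach text-to-speech. The prompt wording, the `[pause]` and `[emphasis]` tag names, and the two-sentence limit are all hypothetical; the actual prompts and Inworld tag syntax are not given in this writeup.

```python
import re


def build_voice_prompt(profile_summary, memories, utterance, max_sentences=2):
    """Assemble a prompt that asks for short, spoken-style replies."""
    memory_lines = "\n".join(f"- {item}" for item in memories) or "- (no prior calls)"
    return (
        "You are a friendly, slightly sassy financial advisor on a phone call.\n"
        f"Reply in at most {max_sentences} short spoken sentences. No lists, no markdown.\n"
        "You may use the tags [pause] and [emphasis] sparingly.\n\n"
        f"Caller's finances:\n{profile_summary}\n\n"
        f"What you remember about this caller:\n{memory_lines}\n\n"
        f"Caller just said: {utterance}\n"
    )


def clamp_sentences(reply, max_sentences=2):
    """Keep only the first few sentences in case the model runs long."""
    sentences = re.split(r"(?<=[.!?])\s+", reply.strip())
    return " ".join(sentences[:max_sentences])
```

The clamp is a safety net rather than a substitute for the prompt: on a call, a reply that runs long costs more than one that is slightly clipped.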

🏆 Accomplishments We're Proud Of

  • Built a true memory-driven financial agent, not just a chatbot
  • Achieved end-to-end voice integration with real-time processing
  • Created a system that actually learns from behavior and uses it in reasoning
  • Designed conversational AI that sounds natural, sassy, and helpful
  • Successfully integrated parsing, behavioral tracking, memory, and real-time voice

📚 What We Learned

  • Memory is the hardest part: Raw data isn't intelligence. Structuring context for AI reasoning takes the most engineering effort.
  • Constraints breed creativity: Working within hackathon time limits forced us to pick the right tech stack and avoid over-engineering.
  • Voice is hard: Real-time speech, natural pauses, and conversational rhythm matter far more in voice than in written interaction.
  • Behavior > balance: Knowing your spending patterns is far more useful than just knowing your account balance.
  • Prompt engineering is everything: Getting the AI to sound natural, concise, and personalized required extensive iteration on instructions.

🚀 What's Next

  • Real-time budget alerts based on category thresholds
  • Advanced behavioral scoring models
  • Multi-user household financial coordination
  • Transaction categorization improvements and auto-learning
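
As a rough idea of how the category-threshold alerts might work, the sketch below sums spending per category and flags any category at or over its limit. The sign convention (negative amounts are spending), the field names, and the threshold values are assumptions; this feature is planned, not built.

```python
def category_alerts(transactions, thresholds):
    """Return an alert for each category whose spending reached its limit.

    `thresholds` maps category name to a spending limit in the same
    currency as the transaction amounts.
    """
    spent = {}
    for tx in transactions:
        if tx["amount"] < 0:  # negative amounts are treated as spending
            spent[tx["category"]] = spent.get(tx["category"], 0.0) - tx["amount"]
    alerts = []
    for category, limit in sorted(thresholds.items()):
        total = round(spent.get(category, 0.0), 2)
        if total >= limit:
            alerts.append({"category": category, "spent": total, "limit": limit})
    return alerts
```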