Inspiration

Easy MonAI was inspired by the bewildering complexity of bank statements and other financial documents, and by how hard it is to see how they relate to each other.

Easy MonAI gives users a unified understanding of all their financial documents in one place, delivered as a simple, digestible report or as a podcast for people too busy to read.

What it does

Easy MonAI transforms the "anxiety of the spreadsheet" into a conversational, educational experience. Our system follows a three-stage pipeline to bridge the financial literacy gap:

  1. Intelligent Data Processing: Users upload financial documents (PDFs and CSVs). Our platform uses Google Gemini's multimodal reasoning to understand varied and complex financial documentation. Gemini can ask the user clarifying questions and request additional documents to ensure a complete view of the user's financial situation.
  2. AI-Driven Podcast Synthesis: After Gemini produces a report, it writes a podcast script, which is narrated using ElevenLabs' voice generation in a language of the user's choosing.
  3. Conversational Financial Coaching:
    • The Briefing: A virtual podcast host delivers a summary of your financial health, identifies your top spending categories, and calls out concerning patterns.
    • Actionable Tips: Every episode concludes with 2-3 specific, actionable tips to improve your yields and savings.
    • Interactive Insights: Users aren't just listeners; they can ask the AI agent follow-up questions to gain specific insights into their data.

How we built it

We used Google Gemini's multimodal reasoning to break down complex, hard-to-parse financial documents into simple reports with clear, actionable steps to improve a user's finances. Gemini can ask the user clarifying questions and request additional documents to ensure a complete view of the user's financial situation.

We use AWS S3 to host user documents (encrypted using the user's own password for security) and podcast files, AWS Cognito for user login and authentication, and AWS DynamoDB to store user account data. Our back-end and front-end are hosted on Vultr, with Cloudflare handling our domain and DNS (easymonaitemp.com).
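One common way to encrypt documents "using the user's own password" is to derive a fixed-length key from the password and a random per-user salt, then use that key with a symmetric cipher. The sketch below shows only the key-derivation step, using the standard library. The function names, the iteration count, and the choice of PBKDF2 are assumptions made for illustration; nothing here is taken from the project's actual implementation.

```python
import hashlib
import os

# Assumed parameters for illustration only (not from the project).
DEFAULT_ITERATIONS = 600_000
SALT_BYTES = 16
KEY_BYTES = 32  # 256-bit key, e.g. for use with a 256-bit symmetric cipher


def new_salt() -> bytes:
    """Generate a random salt to store alongside the encrypted document."""
    return os.urandom(SALT_BYTES)


def derive_key(password: str, salt: bytes,
               iterations: int = DEFAULT_ITERATIONS) -> bytes:
    """Derive a fixed-length key from a password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256",
        password.encode("utf-8"),
        salt,
        iterations,
        dklen=KEY_BYTES,
    )
```

The same password and salt always yield the same key, so only the salt needs to be stored with the document; the password itself never has to be kept on the server.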

Challenges we ran into

The biggest challenges were integrating Google Gemini and hosting our project. Google Gemini's SDK had issues when search and tool-calling (important for our agentic document workflow) were used together. To work around this, we run a second instance of Gemini with search and no tools, which the primary agent can call as if it were a tool.
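The workaround can be sketched as an "agent as a tool" pattern: the search-enabled model is wrapped in a plain function and registered with the primary agent alongside its other tools. The sketch below deliberately avoids the real Gemini SDK; `ModelFn`, `make_search_tool`, and `PrimaryAgent` are hypothetical names, and each model is represented by a simple callable that takes a prompt and returns text.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a model call: takes a prompt, returns text.
ModelFn = Callable[[str], str]
ToolFn = Callable[[str], str]


def make_search_tool(search_model: ModelFn) -> ToolFn:
    """Wrap a search-only model instance so it looks like an ordinary tool."""
    def web_search(query: str) -> str:
        # The secondary instance has search enabled and no tools of its own.
        return search_model(f"Search the web and answer concisely: {query}")
    return web_search


class PrimaryAgent:
    """Tool-calling agent that never enables search itself."""

    def __init__(self, tools: Dict[str, ToolFn]) -> None:
        self.tools = tools

    def call_tool(self, name: str, argument: str) -> str:
        """Dispatch a tool call requested by the primary model."""
        if name not in self.tools:
            raise KeyError(f"unknown tool: {name}")
        return self.tools[name](argument)
```

From the primary agent's point of view, `web_search` is just another entry in its tool table, so search and tool-calling never have to be enabled on the same model instance.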

For hosting, we had trouble keeping the server up to date. To fix this, we created a GitHub Action that builds a Docker image and sends it to the server on every successful commit.

ElevenLabs also presented a challenge. We wanted users to be able to follow along visually with the podcast, so we display a transcript with the text highlighted as it is read aloud. This was difficult, but ElevenLabs made it possible by returning extensive timestamp data.
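The highlighting logic can be sketched as two steps: group per-character timings into word spans, then look up which word is active at the current playback time. The sketch assumes the timestamp data arrives as three parallel lists (characters, start times, end times in seconds); that shape and the function names `word_spans` and `active_word_index` are assumptions for illustration, not the project's actual code.

```python
from bisect import bisect_right
from typing import List, Tuple

WordSpan = Tuple[str, float, float]  # (word, start_seconds, end_seconds)


def word_spans(chars: List[str], starts: List[float],
               ends: List[float]) -> List[WordSpan]:
    """Group per-character timings into (word, start, end) spans."""
    spans: List[WordSpan] = []
    word, word_start, word_end = "", 0.0, 0.0
    for ch, start, end in zip(chars, starts, ends):
        if ch.isspace():
            # Whitespace closes the current word, if any.
            if word:
                spans.append((word, word_start, word_end))
                word = ""
            continue
        if not word:
            word_start = start
        word += ch
        word_end = end
    if word:
        spans.append((word, word_start, word_end))
    return spans


def active_word_index(spans: List[WordSpan], t: float) -> int:
    """Index of the word to highlight at time t, or -1 before the first word."""
    start_times = [start for _, start, _ in spans]
    return bisect_right(start_times, t) - 1
```

A player would call `active_word_index` on each audio time update and highlight the corresponding word. During a pause between words, the lookup keeps the previous word highlighted rather than clearing the highlight.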

Accomplishments that we're proud of

We are proud of the agentic document flow and of Gemini's ability to understand a wide variety of financial documents. We are also proud of how our UI turned out and how simple the program is to use.

We are also proud of our automatic deployment process, which lets us focus on development rather than infrastructure.

What we learned

We learnt how to integrate AWS services such as S3 and Cognito into a project securely using strictly scoped IAM permissions. We gained experience working with the Google GenerativeAI SDK and integrating Gemini into a project.

We learnt how to integrate advanced ElevenLabs features such as transcript timestamps and voice generation controls to ensure our podcasts are generated quickly for users while maintaining excellent audio quality.
