Inspiration
Presentations can be a hassle, especially when you're preparing alone. The rehearsals don't inspire confidence, you keep messing up, and you end up procrastinating because you're the one making the rules. That lack of direction and motivation is not unexpected: it is cognitive overload from trying to present and audit yourself simultaneously. This struck me as an excellent situation for AI, a real-time feedback partner that corrects you as you speak, so you don't have to carry the extra workload.
What it does
Auditore Eloquence enables live audits for professional communication, including but not limited to presentations, seminars, interviews, speeches, and debates. Each of these sub-categories has a dedicated 'Template', which is the engine's model for defining pipelines, flow of control, agent roles, and priority levels. Templates are not merely dynamic instructions, but a dynamic layout.
Core Features:
- Real-time feedback: Eloquence detects flaws in communication through multimodal analysis and triggers feedback based on priority level: {High: audio feedback, Low: text feedback}.
- Specialized Audit Table: The Auditor is not a single generalist agent, but a council of multiple specialized agents, designed for giving expert feedback in distinct core aspects of communication. This is made possible through a multi-layer approach, where the receptor agent and reasoning council are assigned to different stages of the audit.
- Templates: Professional communication is very broad. A debate is a two-way argumentation chain between people with conflicting stances, judged on rhetoric and substantiation, while a speech is a monologue where confidence, fluency, and fairness are essential. We solve this with specialized templates that come with customized criteria and layouts. Users choose their template through an SVG donut navigator.
- Dynamic Layout: I designed a dynamic-layout paradigm in which the architecture itself changes to adapt to the task at hand.
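A minimal sketch of the priority-based feedback routing described above. The template fields, criteria names, and function are illustrative stand-ins, not the production schema:

```python
# Illustrative sketch: a template assigns a priority level to each
# audit criterion; the dispatcher routes High-priority findings to
# the audio channel and Low-priority findings to the text channel.

SPEECH_TEMPLATE = {
    "name": "speech",
    "agents": ["fluency_auditor", "confidence_auditor"],  # hypothetical roles
    "criteria": {
        "long_pause": "High",
        "filler_word": "Low",
    },
}

def route_feedback(template: dict, finding: str, message: str) -> dict:
    """Build a feedback event whose delivery channel depends on the
    priority the active template assigns to the finding."""
    priority = template["criteria"].get(finding, "Low")
    channel = "audio" if priority == "High" else "text"
    return {
        "finding": finding,
        "priority": priority,
        "channel": channel,
        "message": message,
    }
```

For example, `route_feedback(SPEECH_TEMPLATE, "long_pause", "Keep momentum.")` would be routed to the audio channel, while a filler-word finding stays in the less intrusive text channel.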
How we built it
Frameworks: Python logic with FastHTML for the frontend; custom CSS and JavaScript (for the SVG navigator); FastAPI to connect the logic and the frontend; Gemini 2.5 Flash and Flash Native Audio Dialog as the brains for the reasoning/live agents over websockets, implemented with the Google GenAI library and deployed on Google Cloud Run.
Intellectual Property Notice: The Dynamic Layout Engine architecture and adaptation paradigm is patent pending (2026). The source code is provided under the MIT License for hackathon evaluation.
Architecture:
- Custom definition modules for the Live Layer and Reasoning Layer, managed by the orchestration logic and implemented in the main application, are all written in Python (Google-GenAI, websockets, FastHTML, FastAPI, NumPy).
- The workflow, evaluation criteria, and priority levels are all defined by JSON-format templates, which are the core mechanism of the dynamic layout engine. Each template must follow the specific schema I've defined, but is flexible in its agent roles and instructions. The dictionaries flow through a parsing and validation pipeline before they are integrated into the system.
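A hedged sketch of what such a parse-and-validate pass might look like. The required keys below are assumptions for illustration; the actual schema is defined by the engine:

```python
import json

# Assumed schema keys for illustration; the real template schema differs.
REQUIRED_KEYS = {"name", "agents", "criteria", "priorities"}

class TemplateError(ValueError):
    """Raised when a template fails parsing or schema validation."""

def load_template(raw_json: str) -> dict:
    """Parse a JSON template and check it against the required schema
    keys before it is handed to the layout engine."""
    try:
        template = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        raise TemplateError(f"template is not valid JSON: {exc}") from exc
    missing = REQUIRED_KEYS - template.keys()
    if missing:
        raise TemplateError(f"template missing keys: {sorted(missing)}")
    return template
```

Rejecting a malformed template at load time, rather than mid-session, keeps the live pipeline from ever seeing a half-configured layout.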
- UX is handled by FastHTML, CSS, and JavaScript, providing a pleasant, professional interface of pink, yellow, and green elements. Navigation is made smoother with the SVG donut navigator.
- Gemini 2.5 Flash and Flash Native Audio Dialog are the brains for Layer 2 and Layer 1 respectively. 2.5 Flash handles reasoning tasks that take some time, while the Native Audio model serves as the live receptor, receiving video and audio feeds over websockets.
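The two-layer split can be sketched as a queue handoff: a fast live receptor flags moments cheaply, and the slower reasoning work is consumed off the queue without blocking it. The flagging rule and report wording here are placeholders, not the actual Gemini integration:

```python
import asyncio

async def live_receptor(feed, queue):
    """Layer 1 (sketch): scan the live feed and enqueue flagged
    segments for deeper analysis instead of reasoning inline."""
    for segment in feed:
        if "um" in segment:          # placeholder flagging rule
            await queue.put(segment)
    await queue.put(None)            # sentinel: feed has ended

async def reasoning_council(queue, reports):
    """Layer 2 (sketch): consume flagged segments and produce
    slower, detailed feedback off the live path."""
    while (segment := await queue.get()) is not None:
        reports.append(f"filler word detected in: {segment!r}")

async def run_audit(feed):
    """Run both layers concurrently and collect the council's reports."""
    queue, reports = asyncio.Queue(), []
    await asyncio.gather(
        live_receptor(feed, queue),
        reasoning_council(queue, reports),
    )
    return reports
```

The point of the design is that Layer 1 never waits on Layer 2; only segments worth a closer look cross the queue boundary.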
Challenges we ran into
This project came with its fair share of challenges, the most severe being the latency bottleneck. I had to design a live agent workflow, but that seemed to mean giving up the multi-agent architecture that keeps attention dilution in check. Even with AI-augmented engineering for rapid prototyping, the time constraints were not favourable for an asynchronous agentic architecture with specialized agents running in parallel at near-zero latency. I did not want to compromise with a jack-of-all-trades, master-of-none auditor, which would suffer greatly from attention dilution within any single category. To solve this, I came up with a multi-layer, dynamic architecture built on dedicated templates for specific tasks. Other challenges included managing browser interaction, invalid audio formats, 1007 websocket closures for the live receptor, and 503 errors from the REST API agents. I built robust exception-handling pipelines for these issues.
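One common shape for that exception handling is retry-with-exponential-backoff around transient failures such as 503 responses. The error type, attempt count, and delays below are illustrative, not the project's exact pipeline:

```python
import time

class TransientAPIError(Exception):
    """Stand-in for transient failures such as HTTP 503 responses."""

def with_retries(call, attempts=3, base_delay=0.5):
    """Retry a flaky call with exponential backoff (0.5s, 1s, ...),
    re-raising only after the final attempt fails."""
    for attempt in range(attempts):
        try:
            return call()
        except TransientAPIError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Wrapping each reasoning-agent request this way lets a brief 503 hiccup pass unnoticed by the user, while a persistent outage still surfaces as an error.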
Accomplishments that we're proud of
Dynamic Layout Architecture: Highly flexible architecture for agentic systems.
Templates: The key to the dynamic layout. Specialized workflows for specific branches.
Transparent Flow Of Control: This is a really neat feature for a user, and a lifesaver for a developer during testing.
What we learned
Physics reality check: It is not advisable to engineer a system running multiple reasoning agents in parallel for a low-latency, interactive task on limited resources. Dynamic layout: Changing the architecture to adapt across distinct branches of a field is useful, and it sidesteps that reality check.
What's next for Auditore Eloquence
A detailed session-score system, plus records of best and starred sessions. I would also like to flesh out the dynamic layout's template schema for more advanced adaptation, and add a 'semantic notes' reinforcement-learning pipeline backed by a NoSQL GCP database. Further down the line, I'd like to push the limits with live systems running multiple agents in parallel.
Built With
- css
- fastapi
- fasthtml
- gemini
- google-cloud
- google-genai
- javascript
- json
- numpy
- python
- vertex-ai
- websockets