Inspiration
The inspiration for TriSense Agent came from the realization that true human-AI collaboration shouldn't be limited to a text box. We wanted to build a "Live Agent" that could see what we see, hear what we say, respond in real time, and help us create across multiple media formats.
The goal was to consolidate three distinct AI interaction paradigms into one unified "Tri-Sense" platform:
- Live Voice
- Creative Orchestration
- Visual Navigation
How we built it
The backend runs on Node.js 20 with TypeScript and is deployed on Google Cloud Run.
The frontend is a custom single-page application (SPA) with a glassmorphism design, written in vanilla JavaScript to keep the UI responsive without the overhead of a heavy framework.
AI Core
We used the @google/generative-ai SDK, with gemini-2.0-flash for high-speed reasoning and gemini-live-2.5-flash-preview for interruption-tolerant voice relay.
Context & Memory
Google Cloud Firestore provides stateful session tracking, allowing the agent to remember past turns across different modes.
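As a minimal sketch of how stored turns from different modes might be folded into one context string, assuming each turn is saved with its mode and a timestamp: the `Turn` shape, the mode names, and `buildContext` are hypothetical and not taken from the project. In the real system the turns would be loaded from Firestore; here they are passed in as a plain array.

```typescript
// Hypothetical shapes for illustration; the project's actual schema is not shown in this writeup.
type Mode = "live-voice" | "creative" | "navigator";

interface Turn {
  mode: Mode;
  role: "user" | "agent";
  text: string;
  at: number; // epoch milliseconds
}

// Return the most recent turns, oldest first, tagged with the mode they came from.
function buildContext(turns: Turn[], maxTurns: number): string {
  if (maxTurns <= 0) return "";
  return [...turns]
    .sort((a, b) => a.at - b.at) // chronological order across all modes
    .slice(-maxTurns) // keep only the latest maxTurns entries
    .map((t) => `[${t.mode}] ${t.role}: ${t.text}`)
    .join("\n");
}
```

Tagging each line with its mode lets a later prompt refer back to, say, a story drafted in the creative mode while the user is talking in the voice mode.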
Infrastructure
The project features a fully automated CI/CD pipeline via Cloud Build and Infrastructure-as-Code using Terraform.
Challenges we faced
Voice Interruption Handling
Synchronizing full-duplex audio over WebSockets required custom relay logic to ensure the agent stops speaking immediately when the user interrupts.
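One way to get this behavior, sketched under assumptions, is to tag every agent response with a turn id and drop audio that belongs to a turn the user has already cut off. The `InterruptibleRelay` class and the `Outbound` message shape are hypothetical; the `send` callback stands in for whatever writes to the client WebSocket.

```typescript
// Hypothetical message shape sent to the browser client.
type Outbound =
  | { type: "audio"; turnId: number; chunk: Uint8Array }
  | { type: "stop"; turnId: number };

class InterruptibleRelay {
  private activeTurn = 0;
  private speaking = false;
  private readonly send: (msg: Outbound) => void;

  constructor(send: (msg: Outbound) => void) {
    this.send = send;
  }

  // Start a new agent response and return its turn id.
  beginTurn(): number {
    this.activeTurn += 1;
    this.speaking = true;
    return this.activeTurn;
  }

  // Forward a model audio chunk unless its turn was interrupted or superseded.
  onModelAudio(turnId: number, chunk: Uint8Array): boolean {
    if (!this.speaking || turnId !== this.activeTurn) return false;
    this.send({ type: "audio", turnId, chunk });
    return true;
  }

  // User speech detected: tell the client to flush playback and invalidate the turn.
  onUserInterrupt(): void {
    if (!this.speaking) return;
    this.speaking = false;
    this.send({ type: "stop", turnId: this.activeTurn });
  }
}
```

Because late chunks are rejected by id rather than by timing, audio that was already in flight when the user spoke is never forwarded after the stop message.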
Structured Multimodal Parsing
Designing a "Creative Storyteller" mode that outputs narratives interleaved with image prompts and video shot lists required rigorous prompt engineering and section-based parsing logic.
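Section-based parsing of this kind can be sketched as follows, assuming the prompt instructs the model to put a marker line before each block. The marker strings (`[NARRATIVE]`, `[IMAGE_PROMPT]`, `[SHOT_LIST]`) and the `parseSections` function are illustrative assumptions, not the project's actual format.

```typescript
type SectionKind = "narrative" | "image_prompt" | "shot_list";

interface Section {
  kind: SectionKind;
  body: string;
}

// Hypothetical marker lines the prompt would ask the model to emit.
const MARKERS = new Map<string, SectionKind>([
  ["[NARRATIVE]", "narrative"],
  ["[IMAGE_PROMPT]", "image_prompt"],
  ["[SHOT_LIST]", "shot_list"],
]);

// Split raw model output into ordered sections, preserving the interleaving.
function parseSections(raw: string): Section[] {
  const sections: Section[] = [];
  let current: { kind: SectionKind; lines: string[] } | null = null;

  const flush = (): void => {
    if (current) {
      const body = current.lines.join("\n").trim();
      if (body.length > 0) sections.push({ kind: current.kind, body });
    }
  };

  for (const line of raw.split(/\r?\n/)) {
    const kind = MARKERS.get(line.trim());
    if (kind !== undefined) {
      flush(); // close the previous section before opening a new one
      current = { kind, lines: [] };
    } else if (current) {
      current.lines.push(line);
    }
    // Text before the first marker is ignored.
  }
  flush();
  return sections;
}
```

Keeping the sections in output order, rather than grouping them by kind, is what lets the frontend render narrative text and image prompts interleaved as the model wrote them.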
Vision-to-Action Reliability
To ensure the "UI Navigator" produced actionable numbered steps from a single screenshot, we implemented a specialized confidence-scoring mechanism.
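The writeup does not describe how the score is computed, so the following is only an illustrative heuristic: it extracts numbered steps from the model output and scores them by how many begin with an action verb and whether the numbering is sequential. The verb list, the 3:1 weighting, the thresholds, and the `scorePlan` name are all assumptions.

```typescript
interface NavigationPlan {
  steps: string[];
  confidence: number; // 0..1
  level: "high" | "medium" | "low";
}

// Hypothetical list of verbs that make a step directly actionable.
const ACTION_VERBS = new Set([
  "click", "tap", "select", "open", "type", "enter", "scroll", "press", "choose", "toggle",
]);

function firstWord(step: string): string {
  const word = step.trim().split(/\s+/)[0] ?? "";
  return word.toLowerCase().replace(/[^a-z]/g, "");
}

function scorePlan(modelOutput: string): NavigationPlan {
  const steps: string[] = [];
  let inOrder = true;

  for (const line of modelOutput.split(/\r?\n/)) {
    const match = /^\s*(\d+)[.)]\s+(.+)$/.exec(line);
    if (!match) continue; // skip lines that are not numbered steps
    if (Number(match[1]) !== steps.length + 1) inOrder = false;
    steps.push((match[2] ?? "").trim());
  }

  if (steps.length === 0) return { steps, confidence: 0, level: "low" };

  const actionable = steps.filter((s) => ACTION_VERBS.has(firstWord(s))).length;
  // Weight actionability 3:1 against sequential numbering (assumed weights).
  const confidence = (3 * (actionable / steps.length) + (inOrder ? 1 : 0)) / 4;
  const level = confidence >= 0.8 ? "high" : confidence >= 0.5 ? "medium" : "low";
  return { steps, confidence, level };
}
```

A low level could then trigger a retry or a visible warning instead of presenting shaky instructions as fact.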
What we learned
Building TriSense taught us the value of cognitive routing—an orchestrator that understands whether a user wants creative help, a quick chat, or technical navigation.
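To make the routing idea concrete, here is a deliberately simple keyword-based stand-in. A production orchestrator might well ask the model itself to classify intent; the `routeRequest` function, the hint patterns, and the request shape below are hypothetical.

```typescript
type AgentMode = "live-voice" | "creative" | "navigator";

interface UserRequest {
  text: string;
  hasScreenshot: boolean;
}

interface RouteDecision {
  mode: AgentMode;
  reason: string;
}

// Assumed hint patterns; a real router would need a much richer signal.
const CREATIVE_HINTS = /\b(story|storyboard|poem|script|narrative|shot list)\b/i;
const NAVIGATION_HINTS = /\b(where|how do i|click|button|menu|settings|navigate)\b/i;

function routeRequest(req: UserRequest): RouteDecision {
  if (req.hasScreenshot && NAVIGATION_HINTS.test(req.text)) {
    return { mode: "navigator", reason: "screenshot plus navigation wording" };
  }
  if (CREATIVE_HINTS.test(req.text)) {
    return { mode: "creative", reason: "creative wording" };
  }
  if (req.hasScreenshot) {
    return { mode: "navigator", reason: "screenshot attached" };
  }
  // Default to the conversational mode for quick chat.
  return { mode: "live-voice", reason: "no stronger signal" };
}
```

Returning a reason alongside the mode makes the routing decision easy to surface to the user or to log.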
We also learned that providing transparency, such as confidence levels and guardrail warnings, builds significantly more trust in multimodal AI agents.
Built With
- Languages: TypeScript, JavaScript, HTML5, CSS3
- Frameworks/Platforms: Node.js, Express.js, WebSocket (ws)
- AI APIs: Google Gemini API (2.0 Flash, Live Preview)
- Cloud Services: Google Cloud Run, Firestore, Secret Manager, Cloud Build, Artifact Registry
- DevOps: Terraform, Docker