Inspiration

The inspiration for TriSense Agent came from the realization that true human-AI collaboration shouldn't be limited to a text box. We wanted to build a "Live Agent" that could see what we see, hear what we say, respond in real time, and help us create across multiple media formats.

The goal was to consolidate three distinct AI interaction paradigms into one unified "Tri-Sense" platform:

  • Live Voice
  • Creative Orchestration
  • Visual Navigation

How we built it

The backend runs on Node.js 20 with TypeScript and is deployed on Google Cloud Run.

The frontend is a custom single-page application with a glassmorphism design, written in vanilla JavaScript to keep it fast and avoid the overhead of a heavy framework.

AI Core

We used the @google/generative-ai SDK, with gemini-2.0-flash for high-speed reasoning and gemini-live-2.5-flash-preview for interruption-tolerant voice relay.

Context & Memory

Google Cloud Firestore provides stateful session tracking, allowing the agent to remember past turns across different modes.
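
As a rough illustration of cross-mode session memory, the sketch below keeps turns in an in-memory Map as a stand-in for Firestore. The `Mode` names, the `Turn` shape, and the `SessionStore` class are assumptions made for this example, not the project's actual schema.

```typescript
// Hypothetical mode names, loosely based on the three TriSense modes.
type Mode = "live-voice" | "creative" | "navigator";

// Hypothetical shape of one stored conversation turn.
interface Turn {
  mode: Mode;
  role: "user" | "agent";
  text: string;
}

// In-memory stand-in for a Firestore-backed store (illustration only).
class SessionStore {
  private sessions = new Map<string, Turn[]>();

  // Append a turn to the session, creating the session on first use.
  appendTurn(sessionId: string, turn: Turn): void {
    const turns = this.sessions.get(sessionId) ?? [];
    turns.push(turn);
    this.sessions.set(sessionId, turns);
  }

  // Return the most recent turns regardless of mode, so a later mode
  // can see context created in an earlier one.
  recentTurns(sessionId: string, limit: number): Turn[] {
    const turns = this.sessions.get(sessionId) ?? [];
    return limit > 0 ? turns.slice(-limit) : [];
  }
}
```

Keeping every mode's turns in one per-session list is what would let, say, the navigator mode refer back to something said during a voice chat.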

Infrastructure

The project uses a fully automated CI/CD pipeline on Cloud Build, with infrastructure defined as code in Terraform.

Challenges we faced

Voice Interruption Handling

Synchronizing full-duplex audio over WebSockets required custom relay logic to ensure the agent stops speaking immediately when the user interrupts.
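
One way to make interruption take effect immediately is to tag each agent response with a generation counter and drop anything stale. The sketch below is a minimal version of that idea; `PlaybackRelay` and its methods are hypothetical names, and the actual relay logic in TriSense may differ.

```typescript
// Minimal sketch of an interruptible audio relay queue (hypothetical names).
class PlaybackRelay {
  private queue: Uint8Array[] = [];
  private generation = 0;

  // Called when a new agent response starts; returns its generation id.
  beginResponse(): number {
    this.generation += 1;
    this.queue = [];
    return this.generation;
  }

  // Queue an audio chunk only if it belongs to the current response.
  enqueue(generation: number, chunk: Uint8Array): boolean {
    if (generation !== this.generation) return false; // stale chunk, drop it
    this.queue.push(chunk);
    return true;
  }

  // Called when the user starts speaking: drop pending audio and
  // invalidate the in-flight response so late chunks are ignored.
  // Returns how many queued chunks were discarded.
  interrupt(): number {
    const dropped = this.queue.length;
    this.queue = [];
    this.generation += 1;
    return dropped;
  }

  // Next chunk to forward to the client, if any.
  next(): Uint8Array | undefined {
    return this.queue.shift();
  }
}
```

In a WebSocket server, `interrupt()` would be called from the handler that detects incoming user speech, and `next()` from the loop that forwards audio to the browser.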

Structured Multimodal Parsing

Designing a "Creative Storyteller" mode that outputs narratives interleaved with image prompts and video shot lists required rigorous prompt engineering and section-based parsing logic.
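
Section-based parsing can be sketched as follows, assuming the prompt asks the model to put a bracketed marker line before each part of its output. The marker names (`[NARRATIVE]`, `[IMAGE_PROMPTS]`, `[SHOT_LIST]`) and the `parseSections` function are hypothetical; the writeup does not specify the real output format.

```typescript
// Hypothetical section markers; the real prompt format is not given in the writeup.
type SectionName = "NARRATIVE" | "IMAGE_PROMPTS" | "SHOT_LIST";
const SECTION_NAMES: SectionName[] = ["NARRATIVE", "IMAGE_PROMPTS", "SHOT_LIST"];

// Split model output into sections; each section is a list of non-empty lines.
function parseSections(output: string): Record<SectionName, string[]> {
  const sections: Record<SectionName, string[]> = {
    NARRATIVE: [],
    IMAGE_PROMPTS: [],
    SHOT_LIST: [],
  };
  let current: SectionName | null = null;

  for (const rawLine of output.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue;

    // A line such as "[SHOT_LIST]" starts a new section.
    const header = /^\[([A-Z_]+)\]$/.exec(line);
    if (header) {
      const name = SECTION_NAMES.find((n) => n === header[1]);
      current = name ?? null; // unknown headers switch collection off
      continue;
    }

    // Text before the first known header, or under an unknown one, is ignored.
    if (current !== null) sections[current].push(line);
  }
  return sections;
}
```

Ignoring text outside known sections is a deliberate choice here: it keeps stray preamble from the model out of the narrative, at the cost of silently dropping content if a marker is misspelled.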

Vision-to-Action Reliability

Getting the "UI Navigator" to produce actionable numbered steps from a single screenshot was unreliable at first. We addressed this by implementing a specialized confidence-scoring mechanism.

What we learned

Building TriSense taught us the value of cognitive routing: an orchestrator that determines whether a user wants creative help, a quick chat, or technical navigation.
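
To make the routing idea concrete, here is a deliberately simple sketch that picks one of the three modes from the request text and an attached-screenshot flag. The keyword lists, the `RouteInput` shape, and `routeRequest` are all assumptions for illustration; a real orchestrator would more plausibly ask a model to classify intent.

```typescript
type Route = "creative" | "chat" | "navigation";

interface RouteInput {
  text: string;
  hasScreenshot: boolean; // hypothetical signal: the user attached an image
}

// Hypothetical keyword hints, standing in for model-based intent detection.
const CREATIVE_HINTS = ["story", "poem", "storyboard", "illustrate"];
const NAVIGATION_HINTS = ["click", "where is", "how do i", "settings"];

function routeRequest(input: RouteInput): Route {
  const text = input.text.toLowerCase();

  // A screenshot or a UI-related phrase suggests technical navigation.
  if (input.hasScreenshot || NAVIGATION_HINTS.some((h) => text.includes(h))) {
    return "navigation";
  }
  // Creative vocabulary suggests the storyteller mode.
  if (CREATIVE_HINTS.some((h) => text.includes(h))) {
    return "creative";
  }
  // Otherwise fall back to a quick conversational reply.
  return "chat";
}
```

Substring matching is crude (for example, "history" contains "story"), which is one reason a model-based classifier is the more likely production choice.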

We also learned that providing transparency, such as confidence levels and guardrail warnings, builds significantly more trust in multimodal AI agents.

Built With

  • Languages: TypeScript, JavaScript, HTML5, CSS3
  • Frameworks/Platforms: Node.js, Express.js, WebSocket (ws)
  • AI APIs: Google Gemini API (2.0 Flash, Live Preview)
  • Cloud Services: Google Cloud Run, Firestore, Secret Manager, Cloud Build, Artifact Registry
  • DevOps: Terraform, Docker