Choglin: The Voice-First HMI Engine

Inspiration

We were inspired by the gap between high-speed AI transcription (like Omi) and the friction of manual task switching. We wanted to build a "Human-Machine Interface" that felt less like a chatbot and more like a tactical HUD: turning natural speech into instantaneous system actions without ever touching a keyboard.

What it does

Choglin is a high-performance orchestration layer. It ingests live audio transcription signals via a secure API, processes the intent using Gemini-powered intelligence, and dispatches commands to a stack of productivity nodes. Whether it's drafting a Slack message, creating a Google Doc, or archiving session notes, Choglin executes at the speed of thought.
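Conceptually, that is a three-stage pipeline: ingest a transcript, extract a structured intent, dispatch to a node. A minimal TypeScript sketch of the shape (the `Command` type, `NODES` registry, and function names here are illustrative, not our actual module layout):

```typescript
// Illustrative sketch of Choglin's transcript -> intent -> action flow.

interface Command {
  node: "slack" | "gdocs" | "notes"; // which productivity node to hit
  action: string;                    // e.g. "draftMessage", "createDoc"
  payload: Record<string, unknown>;  // node-specific arguments
}

// Registry of integration nodes, keyed by name.
const NODES: Record<Command["node"], (cmd: Command) => Promise<void>> = {
  slack: async (cmd) => { /* call the Slack API with cmd.payload */ },
  gdocs: async (cmd) => { /* call the Google Docs API */ },
  notes: async (cmd) => { /* archive session notes */ },
};

export async function handleTranscript(transcript: string): Promise<void> {
  const cmd = await extractIntent(transcript);
  if (cmd) await NODES[cmd.node](cmd);
}

// Stub: the real version prompts Gemini and validates its JSON output
// (see the structured-output sketch below).
async function extractIntent(transcript: string): Promise<Command | null> {
  return null;
}
```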

How we built it

The stack is built for ultra-low latency and a premium technical aesthetic:

Core: Next.js 15 with TypeScript for a robust, type-safe architecture.

Intelligence: Google Gemini 1.5/2.5 Flash for rapid intent extraction and JSON command mapping.

Styling: A custom "Tactical HUD" design system using Vanilla CSS, featuring glassmorphism, JetBrains Mono typography, and high-frequency micro-interactions.

Integration: A modular "Node" system designed to bridge Omi transcription signals with external APIs like Slack, Google Workspace, and Discord.

Challenges we ran into

Mapping vague, conversational speech to rigid API payloads was the biggest hurdle. We had to iterate heavily on our prompt engineering to ensure Gemini would strictly output actionable JSON without "chatting" (see the sketch below). We also faced significant challenges with integration permissions and real-time signal polling to keep the terminal feed feeling alive.
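One reliable way to get this "JSON only, no chatting" behavior with the `@google/generative-ai` SDK is constrained decoding: pin `responseMimeType` to JSON and supply a `responseSchema`, so the model can only emit parseable commands. This is a sketch of the technique under simplified assumptions (the schema fields, prompt text, and `speechToCommand` name are ours), not necessarily the exact prompt-engineering approach we shipped:

```typescript
import { GoogleGenerativeAI, SchemaType } from "@google/generative-ai";

// Constrain Gemini to emit JSON only -- no prose, no markdown fences.
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({
  model: "gemini-1.5-flash",
  generationConfig: {
    responseMimeType: "application/json", // reject free-form text
    responseSchema: {
      // Illustrative command shape; a real schema would have more fields.
      type: SchemaType.OBJECT,
      properties: {
        node: { type: SchemaType.STRING },   // e.g. "slack", "gdocs"
        action: { type: SchemaType.STRING }, // e.g. "draftMessage"
        args: { type: SchemaType.STRING },   // serialized arguments
      },
      required: ["node", "action"],
    },
  },
});

export async function speechToCommand(transcript: string) {
  const result = await model.generateContent(
    `Map this spoken request to one command: "${transcript}"`
  );
  // Safe to parse: the schema guarantees well-formed JSON.
  return JSON.parse(result.response.text());
}
```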

Accomplishments that we're proud of

We are incredibly proud of the UI/UX design. Instead of a standard dashboard, we created a high-fidelity command center that makes the user feel like they are operating a future-gen system. We also successfully built a flexible "Engine-Node" architecture that allows us to plug in new services (like Notion or GitHub) with minimal configuration.
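To make the "minimal configuration" claim concrete, here is roughly the shape of that contract. The `IntegrationNode` interface and the Notion stub are hypothetical illustrations of the pattern, not our actual source:

```typescript
// Hypothetical sketch of the Engine-Node contract: each integration
// implements one interface, and the engine discovers it by name.
interface IntegrationNode {
  name: string; // key the engine routes commands on
  actions: Record<string, (args: unknown) => Promise<string>>;
}

// Adding a new service is one object literal -- no engine changes needed.
const notionNode: IntegrationNode = {
  name: "notion",
  actions: {
    createPage: async (args) => {
      // ... call the Notion API with args, return the new page URL ...
      return "https://notion.so/...";
    },
  },
};

// Engine-side registry: plugging in a node is a single registration call.
const registry = new Map<string, IntegrationNode>();
registry.set(notionNode.name, notionNode);
```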

What we learned

We learned that in voice-driven interfaces, feedback is everything. If the system doesn't visually react within milliseconds of speech, the "magic" breaks. This taught us how to optimize Next.js server routes and implement predictive UI states to mask API latencies.
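One concrete reading of "predictive UI states": render the expected outcome immediately, then reconcile once the server answers. A minimal sketch as a React client hook, where the hook name and the `/api/command` route are placeholders of ours:

```typescript
"use client";
import { useState } from "react";

type FeedEntry = { id: string; text: string; status: "pending" | "done" };

// Show the command in the terminal feed the instant it is spoken,
// marked "pending", then flip it to "done" when the API confirms.
export function useCommandFeed() {
  const [feed, setFeed] = useState<FeedEntry[]>([]);

  async function run(text: string) {
    const id = crypto.randomUUID();
    // Predictive state: react within a frame, before the network round-trip.
    setFeed((f) => [...f, { id, text, status: "pending" }]);
    await fetch("/api/command", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text }),
    });
    setFeed((f) => f.map((e) => (e.id === id ? { ...e, status: "done" } : e)));
  }

  return { feed, run };
}
```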

What's next for Choglin: Voice Automation Integration

The vision for Choglin is to become the unified OS for ambient computing. Our next steps include:

Streaming Signals: Moving from polling to WebSockets for push-based, near-zero-latency updates (sketched below).

Local Action Engine: Allowing Choglin to execute local shell commands and filesystem actions via voice.

Contextual Memory: Implementing a vector database so Choglin remembers your preferences and past commands for better personalized automation.
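For the streaming item, the planned change is small in shape: replace the polling interval with one persistent socket that pushes each transcript segment as it lands. A speculative client-side sketch, where the endpoint URL and message shape are assumptions:

```typescript
// Hypothetical client: instead of polling an API route on an interval,
// hold one socket open and render each transcript segment as it arrives.
function subscribeToSignals(onSegment: (text: string) => void): () => void {
  const socket = new WebSocket("wss://example.com/signals"); // placeholder URL

  socket.addEventListener("message", (event) => {
    const { transcript } = JSON.parse(event.data as string);
    onSegment(transcript); // update the terminal feed immediately
  });

  // Return a cleanup function so the UI can close the socket on unmount.
  return () => socket.close();
}
```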
