Inspiration

We were frustrated by how unreliable modern voice assistants feel. While testing, Google Maps’ Gemini took over 10 seconds to tell us where the nearest restaurant was. For something that simple, that delay felt unnecessary.

We asked a bigger question:

What happens when communities rely on these systems during emergencies, in rural areas, or in low-connectivity environments?

If voice assistants are slow, cloud-dependent, and unpredictable, they fail the people who need them most. So we decided to build something local-first, resilient, and efficient.

What it Does

LowLa is a Low-Latency Intent Router for voice assistants.

Instead of sending every command to an LLM, LowLa:

Converts speech to text

Embeds the command

Searches a local Chroma vector database

Executes instantly if there’s a confident match

Falls back to an LLM only when necessary
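
The routing steps above can be sketched in a few lines of Python. This is a minimal illustration, not LowLa's actual code: the bag-of-words `embed` function stands in for a MiniLM embedding, the `INTENT_INDEX` list stands in for the Chroma collection, `llm_fallback` is a placeholder that calls no model, and the `CONFIDENCE_THRESHOLD` value is a hypothetical number chosen for the example.

```python
import math
import re
from collections import Counter

# Hypothetical cutoff; a real value would come from calibration on real data.
CONFIDENCE_THRESHOLD = 0.6


def embed(text):
    """Toy bag-of-words vector standing in for a MiniLM sentence embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# In-memory stand-in for the local Chroma collection: (example phrase, intent).
INTENT_INDEX = [
    ("turn on the lights", "lights_on"),
    ("turn off the lights", "lights_off"),
    ("what time is it", "tell_time"),
]


def llm_fallback(command):
    """Placeholder for the slower LLM path; no model is called here."""
    return "llm:" + command


def route(command):
    """Return (path, result): run locally on a confident match, else fall back."""
    query = embed(command)
    score, intent = max((cosine(query, embed(p)), i) for p, i in INTENT_INDEX)
    if score >= CONFIDENCE_THRESHOLD:
        return "local", intent
    return "fallback", llm_fallback(command)


print(route("please turn on the lights"))
print(route("write me a poem about rain"))
```

The first command shares most of its words with a stored phrase, so it takes the local path. The second shares none, so it goes to the fallback. A real sentence embedding would also match paraphrases that share no words with the stored phrase, which this toy version cannot do.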

In our demo:

Push-to-talk works

Vector search via Chroma runs under 500ms

LLM fallback works

The hybrid system functions live

We eliminated the regex layer after discovering that our vector search was fast and reliable enough on its own.

How We Built It

LowLa combines:

Faster Whisper for speech-to-text

Sentence Transformers (MiniLM) for embeddings

ChromaDB for vector search

Node.js for routing logic

Python backend for vector operations

ElevenLabs for text-to-speech

LLM fallback for unknown commands

Challenges We Ran Into

Virtual environment configuration delays

ElevenLabs API credit limitations

Latency tuning and threshold calibration

Keyword collisions in early routing logic

Tight hackathon time constraints
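
Threshold calibration, one of the challenges listed above, can be framed as picking the lowest similarity cutoff that still keeps wrong local matches within a budget. The sketch below shows one possible approach, not necessarily the one we used: `pick_threshold` is a hypothetical helper, and the scores in `SAMPLES` are made up for illustration rather than measured from LowLa.

```python
def pick_threshold(samples, max_false_accepts=0):
    """Pick the lowest cutoff that keeps wrong local matches within budget.

    samples: list of (similarity, is_correct) pairs from a labeled test set.
    Returns None if no candidate cutoff satisfies the budget.
    """
    candidates = sorted({score for score, _ in samples})
    for cutoff in candidates:
        # A false accept is a wrong match that would still be executed locally.
        false_accepts = sum(1 for s, ok in samples if s >= cutoff and not ok)
        if false_accepts <= max_false_accepts:
            return cutoff
    return None


# Made-up scores for illustration only; these are not measurements from LowLa.
SAMPLES = [
    (0.92, True),
    (0.81, True),
    (0.74, True),
    (0.69, False),
    (0.55, False),
    (0.48, True),
    (0.30, False),
]

print(pick_threshold(SAMPLES))
print(pick_threshold(SAMPLES, max_false_accepts=1))
```

A lower cutoff sends more commands down the fast local path but risks executing the wrong intent. Raising the false-accept budget trades accuracy for fewer LLM calls.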

Accomplishments We’re Proud Of

Building a fully functional hybrid voice routing system

Eliminating unnecessary LLM calls for matched intents

Achieving sub-500ms local execution

Successfully running a live demo

Designing an adaptive architecture that improves over time

The biggest moment for us was realizing, "We can actually demo this live."
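
One way an architecture like this could improve over time is to remember what the fallback resolved, so a repeated command skips the LLM. The sketch below assumes that mechanism; it is not a description of LowLa's implementation. `AdaptiveRouter` and `resolve_with_llm` are hypothetical names, and the exact-match dictionary stands in for adding a new embedding to the vector database.

```python
class AdaptiveRouter:
    """Caches intents resolved by the fallback so repeats run locally."""

    def __init__(self, resolve_with_llm):
        self.resolve_with_llm = resolve_with_llm  # hypothetical slow resolver
        self.learned = {}  # normalized command -> intent
        self.llm_calls = 0

    @staticmethod
    def normalize(command):
        """Lowercase and collapse whitespace so trivial variants share a key."""
        return " ".join(command.lower().split())

    def route(self, command):
        """Return (path, intent), learning from every fallback call."""
        key = self.normalize(command)
        if key in self.learned:
            return "local", self.learned[key]
        self.llm_calls += 1
        intent = self.resolve_with_llm(command)
        self.learned[key] = intent
        return "fallback", intent


# Stub resolver for the example; a real one would call an LLM.
router = AdaptiveRouter(lambda command: "play_music")
print(router.route("Play some jazz"))
print(router.route("play   some JAZZ"))
print(router.llm_calls)
```

The second call resolves locally because the first one stored its result, so the stub resolver runs only once.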

What We Learned

Retrieval is often more efficient than generation

Embeddings are incredibly powerful for flexible matching

Vector databases scale better than we expected

Latency matters more than model size

Integration across systems is harder than building individual components

Hybrid AI systems are more practical than purely LLM-based systems

We shifted from thinking “How do we use an LLM?” to “When do we not need one?”

What’s Next

With more time, we would extend LowLa into:

Predictive intent modeling (begin execution before the user finishes speaking)

Full wake-word integration

Personalized adaptive caching per user, perhaps tuned to their profession

Multi-step and contextual command routing

Edge-device deployment optimization

Our long-term vision is a voice assistant that:

Learns once. Executes forever.

It's time to adopt the voice assistant that grows with you.
