Inspiration

I relocated to Spain and quickly ran into a real problem: I needed to make important local calls (schools, clinics, services), but I don’t speak Spanish yet. Many of these interactions still require a phone call, not an app or website. I built Habla to remove that immediate language barrier.

What it does

Habla has two modes:

  • Live Call Mode: real-time translation during a 1:1 phone call
  • Agent Mode: an AI phone agent that calls on my behalf and handles the conversation

It also includes:

  • Live transcription with running transcript updates
  • Critical-info detection and confirmation for things like names, dates, phone numbers, addresses, and amounts (see the schema sketch after this list)
  • A verified-facts summary during and after calls
  • Context memory that remembers caller preferences and past context for future calls
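
To make that concrete, here is a minimal sketch of how critical-info items and the verified-facts summary could be modeled. The Pydantic classes and field names are illustrative, not Habla's actual schema:

```python
# Illustrative only: a minimal schema for critical-info items.
# Class and field names are hypothetical, not Habla's data model.
from enum import Enum
from pydantic import BaseModel


class CriticalKind(str, Enum):
    NAME = "name"
    DATE = "date"
    PHONE = "phone"
    ADDRESS = "address"
    AMOUNT = "amount"


class CriticalItem(BaseModel):
    kind: CriticalKind
    heard_text: str          # what the model transcribed
    translated_text: str     # what was relayed to the user
    confirmed: bool = False  # flipped once the user explicitly confirms


class VerifiedFactsSummary(BaseModel):
    call_id: str
    items: list[CriticalItem]

    @property
    def unconfirmed(self) -> list[CriticalItem]:
        # Items that still need a read-back confirmation
        return [i for i in self.items if not i.confirmed]
```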

How we built it

I built it as a solo founder across three parts:

  • habla-core (main backend): FastAPI + Twilio Voice/Media Streams + Amazon Nova 2 Sonic
  • habla-ios (client): SwiftUI app with WebSocket audio streaming, call UX, history, summaries, and memory
  • habla-accounts (microservice): AWS Lambda + API Gateway + DynamoDB for secure per-device caller-ID ownership (a sketch of the ownership check follows)
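
As a rough illustration of the caller-ID ownership idea, a conditional DynamoDB write can make the claim atomic. The table name, key schema, and helper below are assumptions, not the actual habla-accounts code:

```python
# Hypothetical sketch of device-scoped caller-ID ownership in DynamoDB.
# Table name, key schema, and attribute names are assumptions.
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("habla-caller-ids")


def claim_caller_id(caller_id: str, device_id: str) -> bool:
    """Atomically claim a caller ID for one device; False if already owned."""
    try:
        table.put_item(
            Item={"caller_id": caller_id, "device_id": device_id},
            # Condition fails if any device already owns this number
            ConditionExpression="attribute_not_exists(caller_id)",
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False
        raise
```

The conditional write keeps check-and-claim atomic, so two devices can't race for the same number.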

For Live Call Mode, the backend runs two streaming model sessions, one per translation direction, and bridges audio between the iOS client and the PSTN leg; a simplified sketch of that bridge follows. For Agent Mode, I built a dedicated call manager with real-time status, transcript events, instruction injection, critical-info tracking, and call-lifecycle handling.
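
Here is a simplified sketch of the Twilio Media Streams side of that bridge, assuming a FastAPI backend. The endpoint path and client registry are hypothetical, and the translation step is elided; Twilio really does deliver base64 mu-law audio as JSON "media" events over the WebSocket:

```python
# Simplified shape of a Twilio Media Streams bridge. Endpoint path and
# the ios_clients registry are hypothetical; translation is elided.
import base64
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
ios_clients: dict[str, WebSocket] = {}  # call_sid -> connected iOS socket


@app.websocket("/twilio/media/{call_sid}")
async def twilio_media(ws: WebSocket, call_sid: str):
    await ws.accept()
    stream_sid = None  # needed to send translated audio back to Twilio
    try:
        while True:
            msg = await ws.receive_json()
            if msg["event"] == "start":
                stream_sid = msg["start"]["streamSid"]
            elif msg["event"] == "media":
                # 8 kHz mu-law audio from the PSTN caller
                chunk = base64.b64decode(msg["media"]["payload"])
                ios = ios_clients.get(call_sid)
                if ios is not None:
                    # Real system: run this through the streaming
                    # translation session before forwarding
                    await ios.send_bytes(chunk)
            elif msg["event"] == "stop":
                break
    except WebSocketDisconnect:
        pass
```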

Challenges we ran into

  • 1:1 latency is still high. It is usable in practice, but model response time is the main bottleneck
  • Agent call endings were tricky. Early versions could linger or fail to close naturally
  • Telephony/audio bridging required careful handling of codecs, sampling rates, and streaming reliability (see the conversion sketch after this list)
  • Balancing speed with trust and safety features (critical confirmations plus verified summaries) added complexity
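
For context on the codec and sampling-rate work, the format glue looks roughly like this. Twilio delivers 8 kHz mu-law; that streaming speech models expect 16 kHz linear PCM is an assumption here:

```python
# Minimal format-conversion sketch: Twilio's 8 kHz mu-law <-> 16 kHz PCM.
# Uses the stdlib audioop module (deprecated in Python 3.11, removed in
# 3.13; pin audioop-lts or vendor an equivalent on newer versions).
import audioop


def ulaw8k_to_pcm16k(ulaw: bytes, state=None):
    """Decode mu-law to 16-bit PCM, then resample 8 kHz -> 16 kHz."""
    pcm8k = audioop.ulaw2lin(ulaw, 2)  # 2 bytes per sample (16-bit)
    pcm16k, state = audioop.ratecv(pcm8k, 2, 1, 8000, 16000, state)
    return pcm16k, state


def pcm16k_to_ulaw8k(pcm: bytes, state=None):
    """Downsample 16 kHz -> 8 kHz and re-encode as mu-law for Twilio."""
    pcm8k, state = audioop.ratecv(pcm, 2, 1, 16000, 8000, state)
    return audioop.lin2ulaw(pcm8k, 2), state
```

Threading the ratecv state through consecutive chunks matters: resampling each frame independently introduces audible clicks at chunk boundaries.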

Accomplishments that we're proud of

  • As a solo founder, I shipped a full working product with both Live Call Mode and Agent Mode
  • Agent Mode feels smooth in real usage
  • Live mode has noticeable latency, but it is still practical for real conversations
  • I implemented high-value trust features: transcription, critical info checks, verified facts, and context memory
  • I tested it myself end-to-end in realistic scenarios

What we learned

  • In real-time voice AI, system engineering matters as much as prompting
  • Model response time dominates user experience in live translation
  • Prompting alone is not enough for stable phone agents; runtime guardrails and explicit end-call logic are necessary (a minimal example follows this list)
  • For sensitive calls, users need structured outputs (transcript + verified facts), not only raw audio
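
As a minimal sketch of what explicit end-call logic can look like, assuming Twilio's REST API is used to close the call leg. The farewell markers, timeout, and helper name are illustrative:

```python
# Illustrative runtime guardrail: close the call when the agent signals
# completion or a hard timeout hits. Markers and threshold are made up.
import time
from twilio.rest import Client

twilio = Client()  # reads TWILIO_ACCOUNT_SID / TWILIO_AUTH_TOKEN from env

MAX_CALL_SECONDS = 600
FAREWELL_MARKERS = ("goodbye", "hasta luego", "have a nice day")


def maybe_end_call(call_sid: str, started_at: float, agent_utterance: str) -> bool:
    """End the call if the agent said goodbye or the hard timeout passed."""
    timed_out = time.time() - started_at > MAX_CALL_SECONDS
    said_farewell = any(m in agent_utterance.lower() for m in FAREWELL_MARKERS)
    if timed_out or said_farewell:
        # Forcibly complete the live call leg; never rely on the model
        # to hang up on its own
        twilio.calls(call_sid).update(status="completed")
        return True
    return False
```

The point is the hard fallback: the call closes even if the model never produces a clean goodbye.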

What's next for Habla

  • Reduce live-call latency further (especially model-response bottlenecks)
  • Improve agent completion reliability and closure behavior
  • Expand context memory so follow-up calls feel more personalized and efficient
  • Broaden language support and harden production reliability
  • Client-side: Contacts integration and iCloud data sync

To try this app, please use the following link: https://testflight.apple.com/join/PkUSuqZm
