JustRemind — Multilingual Voice Reminders Without Language Settings
Inspiration
In real life, language is rarely simple.
Many people live and work in multilingual environments. Their phone system language might be English, while their daily conversations and thoughts happen in Chinese — often switching naturally between languages, sometimes within the same sentence.
Most voice interfaces assume a single language context. As a result, developers are forced to:
- Ask users to choose a language in advance
- Require manual language switching
- Or build fragile language-detection pipelines
These approaches introduce configuration screens, edge cases, and failure modes — and still break down in mixed-language scenarios.
I encountered this problem firsthand. My phone system language is English, but I often create reminders by speaking Chinese, or by mixing Chinese and English together. Existing voice reminder solutions either failed to recognize my speech or required awkward manual setup.
I wanted to eliminate language configuration entirely — and let users speak naturally.
What it does
JustRemind allows users to create reminders using natural voice input in complex, multilingual environments — without selecting or configuring any language.
Users can speak in any language, or mix languages freely, and JustRemind will:
- Automatically detect one or multiple spoken languages
- Accurately transcribe mixed-language speech
- Understand user intent and temporal expressions
- Convert speech into structured, reliable reminders
- Ask clarifying questions when necessary instead of failing silently
The user experience is intentionally minimal: press, speak naturally, and the reminder is created.
For example, even if the phone’s system language is English, a user can say:
- “提醒我明天下午三点 meeting 前半小时喝咖啡” (Remind me to drink coffee half an hour before tomorrow's 3 PM meeting)
- “Next Friday 帮我 remind 一下给 Mom 打电话” (Next Friday, remind me to call Mom)
- “提醒我 after work 去 pick up 干洗的衣服” (Remind me to pick up the dry cleaning after work)
JustRemind understands the intent and creates the correct reminder automatically.
How I built it
The project is built as a native iOS application with a lightweight cloud backend.
- The iOS app records short, user-initiated voice clips.
- Audio is securely sent to a Cloudflare Worker endpoint.
- The Worker forwards the audio to Gemini 3, which performs language detection, transcription, and semantic understanding.
- Gemini returns a strictly structured JSON output containing the detected languages, parsed reminder fields, and a confidence score.
- The app creates the reminder locally and synchronizes it across devices.
- If the cloud request fails, the app gracefully falls back to on-device speech recognition.
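As a rough illustration of that fallback path, the sketch below uses Apple's Speech framework with on-device recognition forced on. It assumes speech-recognition permission has already been granted and simplifies locale handling; the app's actual fallback code may differ.

```swift
import Speech

// Minimal sketch of the on-device fallback: transcribe an already-recorded
// clip locally when the cloud request fails. Assumes SFSpeechRecognizer
// authorization has been requested and granted elsewhere in the app.
func transcribeOnDevice(audioURL: URL, completion: @escaping (String?) -> Void) {
    guard let recognizer = SFSpeechRecognizer(locale: Locale.current),
          recognizer.isAvailable else {
        completion(nil)
        return
    }

    let request = SFSpeechURLRecognitionRequest(url: audioURL)
    request.requiresOnDeviceRecognition = true   // keep the audio off the network

    _ = recognizer.recognitionTask(with: request) { result, error in
        guard let result = result, error == nil else {
            completion(nil)
            return
        }
        if result.isFinal {
            completion(result.bestTranscription.formattedString)
        }
    }
}
```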
This architecture keeps the client simple while allowing Gemini to handle the most challenging parts: multilingual understanding, reasoning, and ambiguity resolution.
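To make the contract between the Worker and the app concrete, here is a minimal sketch of how the app could model and decode Gemini's structured reply. The field names (detectedLanguages, confidence, clarificationNeeded, and so on) are illustrative assumptions, not the project's actual schema.

```swift
import Foundation

// Hypothetical shape of the structured JSON relayed back by the Worker.
// Every field name here is an assumption chosen for illustration.
struct ReminderResponse: Codable {
    struct ParsedReminder: Codable {
        let title: String        // e.g. "Drink coffee"
        let dueDate: Date        // temporal expression resolved to an absolute time
        let notes: String?       // optional extra context
    }

    let detectedLanguages: [String]   // e.g. ["zh-Hans", "en"]
    let reminder: ParsedReminder
    let confidence: Double            // 0.0 ... 1.0
    let clarificationNeeded: String?  // follow-up question when the request is ambiguous
}

// Decode the Worker's reply, expecting ISO 8601 dates and snake_case keys.
func parseResponse(_ data: Data) throws -> ReminderResponse {
    let decoder = JSONDecoder()
    decoder.dateDecodingStrategy = .iso8601
    decoder.keyDecodingStrategy = .convertFromSnakeCase
    return try decoder.decode(ReminderResponse.self, from: data)
}
```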
Challenges I faced
One major challenge was ensuring reliable, structured outputs from a generative model. Since the app depends on precise reminder data, I carefully designed prompts that enforce strict JSON responses and added validation and retry logic on the backend.
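In JustRemind this validation and retry logic lives in the Cloudflare Worker; purely as an illustration of the pattern, the Swift sketch below shows the same idea from the client's side: re-send the request when the reply fails to decode or fails basic sanity checks. It reuses the hypothetical ReminderResponse and parseResponse from the earlier sketch.

```swift
import Foundation

// Illustrative retry-until-valid loop (the real one runs in the Worker).
// Reuses the hypothetical ReminderResponse / parseResponse sketched above.
func fetchStructuredReminder(audio: Data,
                             endpoint: URL,
                             maxAttempts: Int = 3) async throws -> ReminderResponse {
    var lastError: Error = URLError(.cannotParseResponse)

    for _ in 0..<maxAttempts {
        var request = URLRequest(url: endpoint)
        request.httpMethod = "POST"
        request.setValue("audio/mp4", forHTTPHeaderField: "Content-Type") // format is an assumption
        request.httpBody = audio

        do {
            let (data, _) = try await URLSession.shared.data(for: request)
            let parsed = try parseResponse(data)

            // Minimal semantic validation: a usable reminder needs a title and
            // a due date in the future; otherwise treat the reply as invalid
            // and try again.
            guard !parsed.reminder.title.isEmpty,
                  parsed.reminder.dueDate > Date() else {
                throw URLError(.cannotParseResponse)
            }
            return parsed
        } catch {
            lastError = error
        }
    }
    throw lastError
}
```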
Another challenge was privacy. The system minimizes data transfer by only processing short, user-initiated voice clips and avoiding long-term storage of audio data.
What I learned
This project showed me that language complexity is not just a speech recognition problem — it is a reasoning problem.
By delegating multilingual understanding and intent reasoning to Gemini, an entire layer of application complexity disappeared. Instead of building language-specific logic, configuration flows, and edge-case handling, I was able to design a simpler and more robust user experience.
This reinforced how powerful multimodal models can be when applied thoughtfully to everyday tools: not by adding features, but by removing friction.