## Inspiration

Our team has 3 Spanish speakers and 1 Brazilian Portuguese speaker. During our first planning call, we realized we were constantly stopping to clarify, repeat, or simplify our words. The conversation felt broken.

That's when it hit us: we're building at a hackathon about AI agents, yet we can't even talk naturally with each other.

Language barriers kill conversations. Interpreters are expensive, translation apps are clunky, and existing tools interrupt the natural back-and-forth of a discussion.

So we built the personal agent we needed ourselves.

## What it does

i18n meet is your personal translation agent for video calls. Each participant speaks their native language and hears everyone else in their own language—instantly.

  • You speak Spanish → Others hear English, Portuguese, Japanese...
  • They speak Japanese → You hear it in Spanish
  • Everyone uses their native language. Zero friction.
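One way to picture this fan-out: for each utterance, group the other participants by the language they want to hear, so each target language needs only one translation no matter how many listeners share it. This is a minimal sketch under our own assumptions. The `Participant`/`Utterance` shapes, the ISO 639-1 language codes, and the `planDelivery`/`languagesToTranslate` helpers are hypothetical and not taken from the i18n meet codebase.

```typescript
// Hypothetical shapes; the write-up does not document i18n meet's internal types.
interface Participant {
  id: string;
  language: string; // ISO 639-1 code the participant wants to hear, e.g. "es"
}

interface Utterance {
  speakerId: string;
  language: string; // source language detected by speech-to-text
  text: string;
}

// Groups every listener (everyone except the speaker) by target language.
function planDelivery(utterance: Utterance, participants: Participant[]): Map<string, string[]> {
  const plan = new Map<string, string[]>();
  for (const p of participants) {
    if (p.id === utterance.speakerId) continue; // don't play a speaker's own words back
    const listeners = plan.get(p.language) ?? [];
    listeners.push(p.id);
    plan.set(p.language, listeners);
  }
  return plan;
}

// Languages that need a translation call; same-language listeners can get the original text.
function languagesToTranslate(utterance: Utterance, plan: Map<string, string[]>): string[] {
  return Array.from(plan.keys()).filter((lang) => lang !== utterance.language);
}

// Example: a Spanish speaker with Spanish, Portuguese, and Japanese listeners.
const utterance: Utterance = { speakerId: "ana", language: "es", text: "Hola a todos" };
const plan = planDelivery(utterance, [
  { id: "ana", language: "es" },
  { id: "luis", language: "es" },
  { id: "joao", language: "pt" },
  { id: "bia", language: "pt" },
  { id: "kenji", language: "ja" },
]);
console.log(languagesToTranslate(utterance, plan)); // [ 'pt', 'ja' ]
```

Translating once per language rather than once per listener would keep the number of translation and TTS calls bounded by the number of distinct languages in the room.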

Key features:

  • Real-time speech transcription with automatic language detection
  • AI-powered translation to 10+ languages
  • Natural voice synthesis with native-sounding voices per language
  • Floating AI agent panel that detects meeting action items
  • Live transcript with original + translated text

## How we built it

The Translation Pipeline: Speech → Daily.co + Deepgram STT → Groq Translation → ElevenLabs TTS → Listener
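The pipeline is a chain of asynchronous stages. A minimal TypeScript sketch, assuming each stage is injected as a function so the flow can run offline. The `Stages` interface, `runPipeline`, and the fake stages are hypothetical; in the real app the stages would call Daily.co/Deepgram, Groq, and ElevenLabs.

```typescript
// Hypothetical stage signatures (STT, translation, TTS).
interface Stages {
  transcribe: (audio: Uint8Array) => Promise<{ text: string; language: string }>;
  translate: (text: string, from: string, to: string) => Promise<string>;
  synthesize: (text: string, language: string) => Promise<Uint8Array>;
}

interface PipelineResult {
  transcript: string; // original text, for the live transcript
  translation: string; // text in the listener's language
  speech: Uint8Array; // synthesized audio for the listener
}

// Speech → STT → translation → TTS for one listener language.
async function runPipeline(stages: Stages, audio: Uint8Array, listenerLanguage: string): Promise<PipelineResult> {
  const { text, language } = await stages.transcribe(audio);
  // Skip the translation call when the listener already speaks the detected language.
  const translation =
    language === listenerLanguage ? text : await stages.translate(text, language, listenerLanguage);
  const speech = await stages.synthesize(translation, listenerLanguage);
  return { transcript: text, translation, speech };
}

// Offline fakes so the sketch runs without API keys.
const fakeStages: Stages = {
  transcribe: async () => ({ text: "hola", language: "es" }),
  translate: async (text, from, to) => `[${from}->${to}] ${text}`,
  synthesize: async (text) => new TextEncoder().encode(text),
};

runPipeline(fakeStages, new Uint8Array(0), "en").then((r) => console.log(r.translation)); // prints: [es->en] hola
```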

Tech Stack:

  • Daily.co for video infrastructure with native Deepgram transcription (nova-2 multilingual model)
  • Groq (llama-3.1-8b-instant) for ultra-fast translation
  • ElevenLabs Flash v2.5 for natural voice synthesis (~75ms latency) with language-specific voices
  • OpenAI GPT-5.1 for the AI agent (meeting summaries, action detection)
  • Next.js 15 + React 19 for the frontend
  • motion/react for smooth UI animations (draggable/resizable agent panel)
  • Neon (Postgres) + Drizzle ORM for data persistence
  • Vercel for deployment
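For the two latency-critical services, here is a hedged sketch of the HTTP requests, built as plain objects so they can be inspected before being sent. The endpoint paths, the `llama-3.1-8b-instant` and `eleven_flash_v2_5` model IDs, the `xi-api-key` header, and the `optimize_streaming_latency` query parameter (the REST spelling of the `optimizeStreamingLatency=4` setting mentioned under Challenges) follow the public Groq and ElevenLabs API references. The prompt text, `temperature: 0`, and the helper names are our assumptions, not the project's code.

```typescript
// Plain request description that can be passed to fetch(req.url, req.init).
interface HttpRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Groq exposes an OpenAI-compatible chat completions endpoint.
function groqTranslationRequest(apiKey: string, text: string, from: string, to: string): HttpRequest {
  return {
    url: "https://api.groq.com/openai/v1/chat/completions",
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "llama-3.1-8b-instant",
        temperature: 0, // assumption: low temperature for literal translations
        messages: [
          { role: "system", content: `Translate the user's message from ${from} to ${to}. Reply with the translation only.` },
          { role: "user", content: text },
        ],
      }),
    },
  };
}

// ElevenLabs streaming text-to-speech with the Flash v2.5 model.
function elevenLabsTtsRequest(apiKey: string, voiceId: string, text: string): HttpRequest {
  const url = new URL(`https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}/stream`);
  // Accepts 0 to 4; 4 is the most aggressive latency setting.
  url.searchParams.set("optimize_streaming_latency", "4");
  return {
    url: url.toString(),
    init: {
      method: "POST",
      headers: { "xi-api-key": apiKey, "Content-Type": "application/json" },
      body: JSON.stringify({ text, model_id: "eleven_flash_v2_5" }),
    },
  };
}
```

Groq returns an OpenAI-style completion, so the translated text would be read from `choices[0].message.content`. The ElevenLabs `/stream` endpoint returns audio bytes that can be consumed as they arrive.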

Voice mapping: Each language has a native-sounding ElevenLabs voice:

  • English: Adam, Spanish: Lily, Portuguese: Freya, French: Charlotte
  • German: Hannah, Italian: Serena, Japanese: Elli, Korean: Michael
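The voice mapping can be expressed as a lookup table. ElevenLabs requests take voice IDs, which are not listed here, so this sketch maps to voice names only. The language codes, the regional-tag normalization, and the fallback to the English voice are assumptions.

```typescript
// Voice names from the write-up, keyed by (assumed) ISO 639-1 language codes.
const VOICE_BY_LANGUAGE: Record<string, string> = {
  en: "Adam",
  es: "Lily",
  pt: "Freya",
  fr: "Charlotte",
  de: "Hannah",
  it: "Serena",
  ja: "Elli",
  ko: "Michael",
};

// Hypothetical default for languages without a dedicated voice.
const DEFAULT_VOICE = VOICE_BY_LANGUAGE.en;

function voiceFor(language: string): string {
  // Reduce regional tags such as "pt-BR" to the base language.
  const base = language.toLowerCase().split("-")[0];
  return VOICE_BY_LANGUAGE[base] ?? DEFAULT_VOICE;
}

console.log(voiceFor("pt-BR")); // Freya
```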

## Challenges we faced

  1. Audio routing complexity: We mute the original remote audio and play translated TTS instead. Managing this without echo or overlap required a queue-based playback system.

  2. Latency optimization: Real-time translation must feel instant. We combined:

    • Groq's llama-3.1-8b for fast translation
    • ElevenLabs Flash v2.5 with optimizeStreamingLatency=4
    • Non-blocking audio queue

  3. Multi-language sync: Each participant needs a personalized audio stream. We handle transcription events per speaker and route translations to the correct listeners.

  4. Transcription reliability: Daily.co sometimes doesn't provide translations, so we built a Groq fallback that kicks in automatically.
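Challenges 1 and 4 can be sketched together. The first piece is a playback queue that accepts clips without blocking the caller but plays them strictly one at a time, so translated speech never overlaps. The second is a resolver that uses a translation attached to a transcription event when present and otherwise calls a fallback translator. `PlaybackQueue`, `resolveTranslation`, the injected `play` function, and the event's `translation` field are hypothetical names, not the project's code or Daily.co's event schema.

```typescript
// Plays audio clips strictly one after another so translated speech never overlaps.
// `play` is injected (hypothetical); it should resolve when the clip finishes.
class PlaybackQueue {
  private tail: Promise<void> = Promise.resolve();
  private readonly play: (clip: Uint8Array) => Promise<void>;

  constructor(play: (clip: Uint8Array) => Promise<void>) {
    this.play = play;
  }

  // Returns at once with a promise that settles when this clip has played,
  // which happens only after every earlier clip.
  enqueue(clip: Uint8Array): Promise<void> {
    const run = this.tail.then(() => this.play(clip));
    // A failed clip must not stall the queue; the caller still sees the error via `run`.
    this.tail = run.catch(() => undefined);
    return run;
  }
}

// Hypothetical event shape: a transcription event that may or may not carry a translation.
interface TranscriptEvent {
  text: string;
  translation?: string;
}

// Use the provided translation when non-empty; otherwise call the fallback (Groq in the write-up).
async function resolveTranslation(
  event: TranscriptEvent,
  fallback: (text: string) => Promise<string>,
): Promise<string> {
  const provided = event.translation?.trim();
  return provided ? provided : fallback(event.text);
}
```

In a browser, `play` could wrap an `HTMLAudioElement` and resolve on its `ended` event. Chaining each clip onto a single promise keeps ordering without polling a buffer.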

## What we learned

  • ElevenLabs Flash v2.5 is incredibly fast: ~75ms latency makes real-time TTS viable
  • Daily.co's transcription API is powerful but requires careful event handling for edge cases
  • The "universal translator" from Star Trek is finally possible in 2026
  • Building around your own pain point makes development 10x more focused

## What's next

  • Voice cloning so translations sound like the original speaker (ElevenLabs supports this)
  • Meeting summaries via Resend - automatic email with transcript + action items
  • Support for 50+ languages - expand beyond the current 10
  • Mobile apps - React Native with Daily.co SDK
  • Agent improvements - better action detection, calendar integration

## Built with

Next.js, React, TypeScript, Tailwind CSS, Daily.co, Deepgram, Groq, ElevenLabs, OpenAI, Neon, Drizzle ORM, Vercel, motion/react


## Try it out

  1. https://i18n.crafter.run
  2. https://github.com/crafter-station/i18n
