Inspiration

I was inspired by how inefficient current multilingual meeting tools are. Standard translation apps force the user to keep switching focus between the conversation and the technology. My goal was to eliminate that distraction by building an application that handles the language routing automatically in the background, so I can focus entirely on the discussion.

What it does

My application, Goglobal, is a seamless, AI-powered communication agent designed for international meetings. It ensures I never miss a moment of the conversation while intelligently managing translation and detailed transcription.

My application offers two core utilities, accessible through a single, floating agent button:

  1. Live Voice Translation (Sub/Dub): The system automatically detects the language being spoken by others (Mandarin, Korean, Spanish, etc.) and instantly translates it into my pre-selected target language. I have full control to receive the output as Subtitles only, Dubbed Audio only, or both simultaneously.

  2. Integrated Transcript & Summary: The app records the entire conversation in a single unified transcript, whatever language each guest speaks. I can then use the integrated Gemini API summarizer to generate an on-demand meeting summary in my target language, focused on key decisions.
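
The Sub/Dub choice described above could be modeled as two independent toggles that decide what happens to each translated utterance. The following is a minimal sketch, not the actual Goglobal code: `OutputMode`, `OutputAction`, and `planOutput` are hypothetical names introduced for illustration.

```typescript
// Hypothetical types and names; not taken from the Goglobal source.
interface OutputMode {
  subtitles: boolean; // show the translated text on screen
  dub: boolean;       // speak the translated text aloud
}

type OutputAction =
  | { kind: "subtitle"; text: string }
  | { kind: "speak"; text: string; lang: string };

// Decide which outputs one translated utterance should produce.
// Both toggles on yields both actions; both off yields none.
function planOutput(
  mode: OutputMode,
  translated: string,
  targetLang: string
): OutputAction[] {
  const actions: OutputAction[] = [];
  if (mode.subtitles) {
    actions.push({ kind: "subtitle", text: translated });
  }
  if (mode.dub) {
    actions.push({ kind: "speak", text: translated, lang: targetLang });
  }
  return actions;
}
```

Keeping the decision in a pure function like this leaves the rendering and speech code free to change without touching the toggle logic.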

How we built it

I built Goglobal as a modern React Single-Page Application (SPA) with a modular, secure architecture:

  1. Frontend (UX/UI): I created a minimalist, floating agent interface designed to pop out unobtrusively over any meeting window. The UI uses compact controls and strong blue/orange branding.

  2. Real-Time Voice & Detection: I integrated the browser's native Web Speech API to capture speech from the microphone and the Azure Translator API to automatically detect the speaker's language.

  3. Intelligence Layer: I used the Gemini API to power the on-demand summarization tool, which focuses on extracting actionable points from the transcript.

  4. No Self-Translation: I deliberately configured the logic so that my own voice is only recorded to the transcript, so I never hear my own speech translated back to me.
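
The detection and no-self-translation steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation. The endpoint, header names, and response shape follow the Azure Translator v3 REST API; every function and type name, and the idea that each utterance arrives already tagged as "me" or "guest", is hypothetical.

```typescript
// Sketch only: function and type names are hypothetical, not from the Goglobal source.
const TRANSLATOR_ENDPOINT =
  "https://api.cognitive.microsofttranslator.com/translate";

// Assumption: the app already knows whether an utterance came from the user.
type Speaker = "me" | "guest";

interface TranslateRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build a Translator v3 request. Leaving out the `from` query parameter
// asks the service to detect the source language itself.
function buildTranslateRequest(
  text: string,
  targetLang: string,
  key: string,
  region: string
): TranslateRequest {
  const query = `api-version=3.0&to=${encodeURIComponent(targetLang)}`;
  return {
    url: `${TRANSLATOR_ENDPOINT}?${query}`,
    method: "POST",
    headers: {
      "Ocp-Apim-Subscription-Key": key,
      "Ocp-Apim-Subscription-Region": region,
      "Content-Type": "application/json",
    },
    body: JSON.stringify([{ Text: text }]),
  };
}

// Shape of one element of the Translator v3 response array.
interface TranslateResult {
  detectedLanguage?: { language: string; score: number };
  translations: { text: string; to: string }[];
}

// Pull the first translation and the detected language out of a response.
function parseTranslateResponse(
  results: TranslateResult[]
): { text: string; detectedLang: string | undefined } | null {
  const first = results[0];
  const translation = first?.translations[0];
  if (!first || !translation) return null;
  return {
    text: translation.text,
    detectedLang: first.detectedLanguage?.language,
  };
}

// Only guests are translated; the user's own speech goes to the transcript as-is.
function shouldTranslate(speaker: Speaker): boolean {
  return speaker !== "me";
}
```

A caller would pass `url`, `method`, `headers`, and `body` to `fetch` only when `shouldTranslate` returns true, and would append the original text to the transcript in either case.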

Challenges we ran into

  1. Environment Stability (The Vicious Cycle): The most time-consuming challenge was a persistent caching bug in the Codespaces environment that prevented environment variables (.env) from loading. It forced numerous restarts and troubleshooting steps that had nothing to do with the application code.

  2. Browser TTS Inconsistency: Relying on the browser's native Text-to-Speech (speechSynthesis) produced unreliable, low-quality voice output (e.g., English voices reading Spanish text). I addressed this by implementing a voice filtering mechanism and preparing the system for a move to a higher-quality cloud-based TTS service (like Azure TTS) to improve pronunciation accuracy in languages like Korean and Vietnamese.
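
One way a voice filter like the one described above might work is to accept only voices whose language tag matches the target language, and to report when none does. This is a sketch under that assumption rather than the actual Goglobal filter. `VoiceLike` and `pickVoice` are hypothetical names. The `name` and `lang` fields mirror the properties of the browser's `SpeechSynthesisVoice` objects.

```typescript
// Hypothetical helper; the real Goglobal filter may use different rules.
// VoiceLike covers the two SpeechSynthesisVoice properties used here.
interface VoiceLike {
  name: string;
  lang: string; // BCP 47 tag such as "es-MX"; some platforms report "es_MX"
}

// Normalize a language tag so "es_MX" and "ES-mx" compare equal.
function normalizeTag(tag: string): string {
  return tag.toLowerCase().replace("_", "-");
}

// Prefer an exact tag match, then any voice in the same primary language.
// Return null when nothing matches, so the caller never falls back to a
// voice from the wrong language.
function pickVoice<V extends VoiceLike>(
  voices: readonly V[],
  targetLang: string
): V | null {
  const want = normalizeTag(targetLang);
  const primary = want.split("-")[0];

  const exact = voices.find((v) => normalizeTag(v.lang) === want);
  if (exact) return exact;

  const sameLanguage = voices.find(
    (v) => normalizeTag(v.lang).split("-")[0] === primary
  );
  return sameLanguage ?? null;
}
```

In the browser, the list would come from `speechSynthesis.getVoices()`, which can be empty until the `voiceschanged` event fires. A `null` result is the signal to skip dubbing or hand the text to a cloud TTS service instead.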

Accomplishments that we're proud of

  1. The Seamless Output Experience: I unified automatic language detection, translation, and user-driven dual-mode control (Sub and Dub toggles) into a single, intuitive system.

  2. Minimalist, Professional UX: I created a clean, fully functional interface that provides powerful tools without cluttering the main meeting screen.

  3. Full Functional Integration: I combined three separate services (Azure Translator, the Gemini API, and the browser's speech tools) into a single, cohesive, ready-to-use application.

What we learned

  1. The necessity of building redundancy and stability into the application launch process due to inconsistent development environments.

  2. The critical importance of modular code structure to allow for complex feature integrations (like simultaneous Sub and Dub toggles).
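
One simple form of launch-time stability is to check that every required setting is present before the app starts, and to name any that are missing. The sketch below assumes that approach. The variable names in `REQUIRED_KEYS` are hypothetical, since the project's actual .env keys are not listed here.

```typescript
// Hypothetical variable names; the project's real .env keys are not known.
const REQUIRED_KEYS = [
  "AZURE_TRANSLATOR_KEY",
  "AZURE_TRANSLATOR_REGION",
  "GEMINI_API_KEY",
] as const;

// Return the names of required settings that are absent or blank.
// An empty result means the app has what it needs to start.
function findMissingConfig(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_KEYS
): string[] {
  return required.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}
```

Showing the returned names in the UI at startup would make an unloaded .env file obvious immediately, rather than surfacing later as a failed API call.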

What's next for Goglobal

  1. Multi-Participant Support: I plan to refactor the core logic to identify and translate multiple speakers simultaneously.

  2. Persistent Storage: I will integrate a Firebase database such as Cloud Firestore to save meeting transcripts and generated summaries for post-meeting review.
