Project Story: Translytic AI

Overview

Translytic AI is a Chrome extension designed to enhance the YouTube viewing experience by leveraging Google's AI APIs. It fetches video transcripts, translates them into multiple languages, displays captions on videos, and even provides summarized points. Users can also listen to translated text using Text-to-Speech (TTS), making it easier to understand and enjoy content in their preferred language.


Inspiration

About 3-4 months ago, my team and I started working with Shopify, a technology that was completely new to us. Building an application from scratch was challenging due to the lack of comprehensive resources. Despite trying several online learning platforms, we couldn't find consolidated guidance, because Shopify apps integrate multiple technologies.

Eventually, I came across a YouTube channel with useful tutorials, but it was in Hindi. While I had no trouble understanding, some of my teammates faced difficulties. This inspired me to build an application capable of live-translating YouTube videos, including spoken translations, to make content accessible for everyone.


What it does

  • Live Transcript Translation: Fetches YouTube video transcripts and translates them into various languages based on user selection.
  • Multilingual Captions: Displays translated captions directly on YouTube videos.
  • Summarization: Summarizes the original transcript into concise points for quick understanding.
  • Text-to-Speech (TTS): Allows users to listen to the translated text in their preferred language.
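As a rough sketch of how the TTS feature can select a voice for the user's preferred language, here is a small helper built on the browser's Web Speech API. The helper name `pickVoice` and the fallback order (exact locale match, then language-prefix match) are my assumptions, not necessarily the extension's exact code:

```javascript
// Pick the best available voice for a target language (e.g. "hi" or "hi-IN").
// `voices` is the array returned by speechSynthesis.getVoices() in the browser.
function pickVoice(voices, targetLang) {
  const lang = targetLang.toLowerCase();
  // Prefer an exact locale match ("hi-IN"), then any voice whose language
  // shares the same prefix ("hi"), then give up with null.
  return (
    voices.find((v) => v.lang.toLowerCase() === lang) ||
    voices.find((v) => v.lang.toLowerCase().startsWith(lang.split("-")[0])) ||
    null
  );
}

// In the extension, the chosen voice would be applied to an utterance:
// const u = new SpeechSynthesisUtterance(translatedText);
// u.voice = pickVoice(speechSynthesis.getVoices(), "hi-IN");
// speechSynthesis.speak(u);
```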

How I built it

  • Technologies Used:
    • Google AI APIs: Leveraged Google’s Translation, Summarization, and Text-to-Speech APIs to process and transform content.
    • Chrome Extensions API: Utilized this to integrate the extension directly with YouTube, enabling real-time interaction with the video.
    • ReactJS: Used React for building the extension's frontend, managing UI components, and providing smooth user interaction.
    • Webpack: Employed Webpack for bundling the ReactJS code, ensuring efficient extension loading and performance.
    • JavaScript: Used JavaScript for handling the logic of fetching video transcripts, applying API services, and managing data flow.
    • HTML/CSS: Designed the UI and overlay captions on YouTube videos using custom styles.
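To illustrate how the caption overlay can stay in sync with playback, here is a minimal sketch: a pure lookup that returns the caption active at the current video time. The segment shape { start, duration, text } is an assumption modeled on common YouTube transcript data, not the project's actual schema:

```javascript
// Find the caption that should be visible at the current playback time.
// Each segment is assumed to look like { start, duration, text }, with
// times in seconds, similar to YouTube timedtext transcript entries.
function activeCaption(segments, currentTime) {
  const seg = segments.find(
    (s) => currentTime >= s.start && currentTime < s.start + s.duration
  );
  return seg ? seg.text : "";
}

// A content script could poll the <video> element and update the overlay:
// const video = document.querySelector("video");
// setInterval(() => {
//   overlayDiv.textContent = activeCaption(translatedSegments, video.currentTime);
// }, 250);
```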

Challenges I ran into

  • Transcript Extraction: Ensuring accurate and timely fetching of transcripts from YouTube videos.
  • Translation Accuracy: Handling nuances in language translation for better contextual meaning.
  • API Rate Limits: Managing usage limits for Google AI APIs during testing and development.
  • Caption Overlay: Aligning captions seamlessly on the video without disrupting the viewing experience.
  • Real-Time Updates: Updating translated captions in real-time for a smooth user experience.
  • API Instability: One of the main challenges I faced was the instability of the APIs. Sometimes, the services would break abruptly, leading to unexpected behavior. Fortunately, the issue has been resolved, and everything is now functioning properly.
  • Text-to-Speech (TTS) Issue in Chrome Canary: I encountered a problem with the Text-to-Speech (TTS) functionality in Chrome Canary. Although speechSynthesis.getVoices() lists 22 languages, the TTS feature only works for English. I tested it using sample code and even tried other tools and extensions to verify if the issue was on my end, but the problem persisted. It seems like there is something specific to Chrome Canary causing this limitation.

  • Slow Summarization API: The summarization API, while effective, is slow on lengthy YouTube transcripts. My system tends to lag when multiple summarization calls run simultaneously over large amounts of text, which adds strain to both the system and the overall processing time.
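One way to ease both the rate-limit and slow-summarization problems is to split a long transcript into bounded chunks and summarize them one at a time instead of all at once. The sketch below is an assumption about how this could look; `maxChars` is a hypothetical per-request budget, not a documented API limit:

```javascript
// Split a long transcript into chunks that each stay under maxChars,
// so every summarization request has a bounded input size and the
// requests can be issued sequentially rather than simultaneously.
function chunkTranscript(text, maxChars = 4000) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  let current = [];
  let length = 0;
  for (const word of words) {
    // +1 accounts for the joining space.
    if (length + word.length + 1 > maxChars && current.length > 0) {
      chunks.push(current.join(" "));
      current = [];
      length = 0;
    }
    current.push(word);
    length += word.length + 1;
  }
  if (current.length > 0) chunks.push(current.join(" "));
  return chunks;
}
```

Summarizing the chunks in a plain `for...of` loop with `await` (e.g. `for (const c of chunks) await summarize(c);`) keeps only one request in flight, trading a little latency for far less memory pressure and fewer rate-limit errors.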


Accomplishments that I am proud of

  • Successfully built a tool that enables users to watch and understand any YouTube video in their preferred language.
  • Implemented real-time translation and TTS, significantly enhancing accessibility.
  • Overcame technical challenges to deliver accurate and seamless captions.

What I learned

This project was a valuable learning experience, especially in dealing with AI challenges. Some key takeaways include:
  • Handling Speech Synthesis with Varying Caption Lengths: One of the most important lessons was managing speech synthesis when caption lengths vary. Long captions risk overrunning the time they are on screen, while short ones leave gaps, so I had to adjust the pacing and duration of the TTS so that the full text is spoken within the available window without cutting off important parts.
  • Deep understanding of Google AI APIs and their integration.
  • Improved skills in building robust Chrome extensions.
  • Insights into managing user experience for multilingual tools.
  • Gained expertise in handling real-time data and overlays.
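The pacing lesson above can be sketched as a small heuristic that estimates a `SpeechSynthesisUtterance.rate` so the spoken text fits the caption's on-screen duration. The baseline of ~2.5 words per second at rate 1.0 and the cap of 2 are my assumptions for illustration, not measured values from the project:

```javascript
// Estimate a SpeechSynthesisUtterance.rate so the spoken text fits inside
// a caption's on-screen duration. baseWordsPerSecond (~2.5 at rate 1.0)
// is a rough assumption; rates above ~2 sound rushed, so the result is
// clamped to the range [1, 2].
function ttsRateFor(text, durationSeconds, baseWordsPerSecond = 2.5) {
  const wordCount = text.split(/\s+/).filter(Boolean).length;
  // How long the text would take to speak at normal speed.
  const naturalSeconds = wordCount / baseWordsPerSecond;
  const rate = naturalSeconds / durationSeconds;
  // Never slow below normal speed; cap the speed-up for intelligibility.
  return Math.min(2, Math.max(1, rate));
}
```

If even the capped rate cannot fit the text, the remaining options are to trim the utterance or let it spill into the next caption window, which is exactly the trade-off described above.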

What's next for Translytic AI

I have a lot of plans for expanding and improving Translytic AI. Before diving into new features, my priority is to improve the current version by addressing any gaps and fixing existing issues. Once that’s done, I’ll begin working on additional features, such as:
  • OCR Technology for Images: Extend the app's functionality beyond YouTube videos by using OCR technology to extract text from images. This will allow users to translate and summarize images in the same way they do with videos.
  • Interactive Summarization Chat: I plan to introduce a feature where users can interact with the summarization results in a chat format. Users will be able to ask questions about specific parts of a video, creating a continuous chat thread based on the summarization, which enhances user engagement.
  • Video Presentation Creation: Another exciting feature is the ability to create small presentations from YouTube videos. This will help users quickly extract key points and create shareable content.
  • Additional Ideas: I also have more ideas in the pipeline, aiming to make the app more versatile and user-friendly.
  • Enhanced Language Support: Add more languages and dialects for greater inclusivity.
  • Offline Functionality: Enable certain features to work offline using pre-downloaded models.
  • Advanced Summarization: Incorporate detailed summaries with contextual highlights.
  • User Feedback Integration: Implement features based on feedback to improve usability.
  • Platform Expansion: Extend functionality to other video platforms like Vimeo and Dailymotion.
