About the Project
Inspiration
The inspiration for Caption Detective came from a common frustration with YouTube's automatically generated captions: words that the speech recognizer mishears. As someone who frequently watches educational content, I noticed that proper nouns, especially technical terms, are often transcribed incorrectly, which leads to confusion and misunderstandings. I wanted to build a solution that improves the viewing experience by using on-device AI to correct these errors in real time.
Challenges Faced
One of the main challenges I encountered was the limited capability of the Gemini Nano model. I initially wanted the model to read entire sentences and correct them, but it struggled to follow the context and instructions accurately. I therefore broke the implementation into smaller tasks, which ultimately produced a more effective process for extracting and verifying proper nouns.
Additionally, I faced challenges in ensuring seamless integration with YouTube's caption system, as well as managing the state of the extension effectively. Despite these hurdles, the experience was incredibly rewarding, and I am excited about the potential for future improvements as more advanced AI models become available in Chrome.
Implementation Overview
The core functionality revolves around the proper noun extraction process, where the extension identifies and corrects misheard terms in YouTube captions.
To work around the limitations of the Gemini Nano model, I split the implementation into several smaller tasks:
Extracting Proper Nouns: The extension first extracts proper nouns from the video title and description using a structured prompt. This involves sending the title and description to the AI model and receiving a list of identified proper nouns.
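A minimal sketch of the extraction step, assuming a hypothetical `PromptSession` interface with a single `prompt(input)` method that stands in for an on-device model session, so the sketch does not depend on any particular Chrome API shape. The prompt wording and the helpers `buildExtractionPrompt`, `parseNounList`, and `extractProperNouns` are illustrative assumptions, not the extension's actual code.

```typescript
// Hypothetical stand-in for an on-device model session; only `prompt` is assumed.
interface PromptSession {
  prompt(input: string): Promise<string>;
}

// Build a structured prompt that asks for one proper noun per line.
function buildExtractionPrompt(title: string, description: string): string {
  return [
    "List every proper noun (people, products, companies, technologies, places)",
    "that appears in the YouTube video title and description below.",
    "Answer with one proper noun per line and nothing else.",
    "",
    `Title: ${title}`,
    `Description: ${description}`,
  ].join("\n");
}

// Turn a one-per-line model reply into a clean, case-insensitively de-duplicated list.
function parseNounList(reply: string): string[] {
  const seen = new Set<string>();
  const nouns: string[] = [];
  for (const rawLine of reply.split("\n")) {
    // Strip list markers the model may add anyway ("- ", "* ", "1. ").
    const line = rawLine.replace(/^\s*(?:[-*•]|\d+[.)])\s*/, "").trim();
    const key = line.toLowerCase();
    if (line.length > 0 && !seen.has(key)) {
      seen.add(key);
      nouns.push(line);
    }
  }
  return nouns;
}

// Send the title and description to the model and parse the returned list.
async function extractProperNouns(
  session: PromptSession,
  title: string,
  description: string,
): Promise<string[]> {
  const reply = await session.prompt(buildExtractionPrompt(title, description));
  return parseNounList(reply);
}
```

Parsing line by line and stripping list markers keeps this step tolerant of a small model that ignores the "nothing else" instruction.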
Verification of Proper Nouns: After extraction, the identified proper nouns are verified to ensure accuracy. This step filters out any non-proper nouns from the list.
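One possible way to implement verification is to ask the model a separate yes/no question for each candidate, in keeping with the write-up's approach of giving a small model narrow tasks. This is a hypothetical sketch: `PromptSession`, `buildVerificationPrompt`, `isAffirmative`, `verifyProperNouns`, and the prompt text are all assumptions.

```typescript
// Hypothetical stand-in for an on-device model session; only `prompt` is assumed.
interface PromptSession {
  prompt(input: string): Promise<string>;
}

// Ask a single, narrow yes/no question about one candidate.
function buildVerificationPrompt(candidate: string, context: string): string {
  return [
    `Is "${candidate}" a proper noun in the following context?`,
    "Answer only yes or no.",
    "",
    context,
  ].join("\n");
}

// Accept replies such as "Yes", "yes." or "YES, it is"; reject "No" or "yesterday".
function isAffirmative(reply: string): boolean {
  return /^\s*yes\b/i.test(reply);
}

// Keep only the candidates the model confirms, one question at a time.
async function verifyProperNouns(
  session: PromptSession,
  candidates: string[],
  context: string,
): Promise<string[]> {
  const verified: string[] = [];
  for (const candidate of candidates) {
    const reply = await session.prompt(buildVerificationPrompt(candidate, context));
    if (isAffirmative(reply)) verified.push(candidate);
  }
  return verified;
}
```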
Checking for Suspicious Nouns: The extension then checks for nouns that may not have been transcribed correctly, allowing for further refinement of the results.
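The write-up does not say how a noun is judged suspicious. The sketch below assumes one simple heuristic: a noun is suspicious if it never appears as a whole word (case-insensitively) in the caption text, since that suggests the speech recognizer rendered it as something else. `escapeRegExp` and `findSuspiciousNouns` are hypothetical helpers.

```typescript
// Escape characters that have special meaning in regular expressions.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Assumed heuristic: a proper noun is "suspicious" when it never appears as a
// whole word (case-insensitively) in the caption text, which suggests the
// speech recognizer transcribed it as something else.
function findSuspiciousNouns(nouns: string[], captionText: string): string[] {
  return nouns.filter((noun) => {
    // Alphanumeric boundaries instead of \b, so nouns such as "C++" also work.
    const pattern = new RegExp(`(^|[^A-Za-z0-9])${escapeRegExp(noun)}(?![A-Za-z0-9])`, "i");
    return !pattern.test(captionText);
  });
}
```

Under this assumption, nouns that already appear correctly in the captions need no correction, so only the suspicious ones would move on to the replacement step.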
Generating Related Proper Nouns: If this option is enabled, the extension also generates related proper nouns based on the context of the video. It does this by detecting the domain of the content and brainstorming related terms, which improves the overall accuracy of the corrections.
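The two-step domain detection and brainstorming can be sketched as two sequential prompts. As before, `PromptSession`, `cleanLine`, `generateRelatedNouns`, the prompt wording, and the `limit` parameter are assumptions; already known nouns are kept first so generated suggestions never displace them.

```typescript
// Hypothetical stand-in for an on-device model session; only `prompt` is assumed.
interface PromptSession {
  prompt(input: string): Promise<string>;
}

// Strip list markers such as "- ", "* " or "1. " that a model may add.
function cleanLine(line: string): string {
  return line.replace(/^\s*(?:[-*•]|\d+[.)])\s*/, "").trim();
}

async function generateRelatedNouns(
  session: PromptSession,
  title: string,
  knownNouns: string[],
  limit = 10,
): Promise<string[]> {
  // Step 1: ask the model for a short label describing the video's domain.
  const domain = (
    await session.prompt(
      `In two or three words, what subject area is a video titled "${title}" about? Answer with the subject area only.`,
    )
  ).trim();

  // Step 2: brainstorm proper nouns that are common in that domain.
  const reply = await session.prompt(
    `List up to ${limit} proper nouns commonly mentioned in videos about ${domain}.\nAnswer with one proper noun per line and nothing else.`,
  );
  const generated = reply
    .split("\n")
    .map(cleanLine)
    .filter((line) => line.length > 0)
    .slice(0, limit);

  // Merge: known nouns first, then new suggestions, de-duplicated case-insensitively.
  const seen = new Set<string>();
  const merged: string[] = [];
  for (const noun of [...knownNouns, ...generated]) {
    const key = noun.toLowerCase();
    if (!seen.has(key)) {
      seen.add(key);
      merged.push(noun);
    }
  }
  return merged;
}
```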
Replacing Proper Nouns: The proper nouns in the captions are replaced by a custom implementation. I first tried Chrome's Rewrite API, but it did not yield the desired results. The Prompt API did better but also altered parts of the text that should have stayed unchanged. I therefore opted for a manual replacement approach that uses a phonetic matching algorithm to find close matches for the proper nouns. This gives more precise replacements, although it could be improved further as more advanced AI models become available.
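The write-up does not name the phonetic algorithm it uses. As an illustrative stand-in, the sketch below uses classic American Soundex and slides a window of one to three caption words over the text, replacing a span when its Soundex code matches a proper noun's code and their letter counts differ by at most `maxLengthDiff`. `soundex`, `replaceByPhoneticMatch`, and both tuning parameters are hypothetical, and Soundex is not necessarily what Caption Detective uses.

```typescript
// Classic American Soundex: keep the first letter, map later consonants to
// digits, collapse adjacent equal codes, then pad or truncate to 4 characters.
function soundex(word: string): string {
  const letters = word.toUpperCase().replace(/[^A-Z]/g, "");
  if (letters.length === 0) return "";
  const codes: Record<string, string> = {
    B: "1", F: "1", P: "1", V: "1",
    C: "2", G: "2", J: "2", K: "2", Q: "2", S: "2", X: "2", Z: "2",
    D: "3", T: "3",
    L: "4",
    M: "5", N: "5",
    R: "6",
  };
  let result = letters[0];
  let prev = codes[letters[0]] ?? "";
  for (let i = 1; i < letters.length && result.length < 4; i++) {
    const ch = letters[i];
    if (ch === "H" || ch === "W") continue; // H and W do not separate equal codes
    const code = codes[ch] ?? ""; // vowels and Y reset the previous code
    if (code !== "" && code !== prev) result += code;
    prev = code;
  }
  return (result + "000").slice(0, 4);
}

// Hypothetical replacement pass: swap caption spans that sound like a known noun.
function replaceByPhoneticMatch(
  caption: string,
  nouns: string[],
  maxWindow = 3,
  maxLengthDiff = 1,
): string {
  const targets = nouns.map((noun) => {
    const letters = noun.replace(/[^A-Za-z]/g, "");
    return { noun, code: soundex(letters), length: letters.length };
  });
  const tokens = caption.split(/\s+/).filter((t) => t.length > 0);
  const out: string[] = [];
  let i = 0;
  while (i < tokens.length) {
    let matched = false;
    // Try longer spans first so "plaz mo" can map onto a single noun.
    for (let size = Math.min(maxWindow, tokens.length - i); size >= 1 && !matched; size--) {
      const span = tokens.slice(i, i + size);
      const letters = span.join("").replace(/[^A-Za-z]/g, "");
      if (letters.length === 0) continue;
      const code = soundex(letters);
      const target = targets.find(
        (t) => t.code !== "" && t.code === code && Math.abs(t.length - letters.length) <= maxLengthDiff,
      );
      if (target) {
        // Keep trailing punctuation from the last token, e.g. "mo," -> "Plasmo,".
        const trailing = span[size - 1].match(/[^A-Za-z]*$/)?.[0] ?? "";
        out.push(target.noun + trailing);
        i += size;
        matched = true;
      }
    }
    if (!matched) {
      out.push(tokens[i]);
      i += 1;
    }
  }
  return out.join(" "); // whitespace is normalized to single spaces
}
```

For example, `replaceByPhoneticMatch("this extension uses plaz mo for the build.", ["Plasmo"])` turns "plaz mo" into "Plasmo". Soundex is coarse: it keeps the first letter literally (so "Jemini" never matches "Gemini") and encodes only the first few consonants, so unrelated words can collide. The length check reduces such false positives, and restricting the pass to nouns already flagged as suspicious would narrow it further.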
Future Directions
As more advanced AI models become available, this extension could learn the context of the video as it plays, improving its results over time.
Built With
- gemini
- plasmo
- typescript