Inspiration

Our team was inspired by how pervasive technology has become, and by how often users consume content without knowing what they may be exposing themselves to. Parents constantly worry about what their kids are listening to and watching, and recent political events have opened a rift as users immerse themselves in echo chambers that limit their worldview. With the growth of GenAI as a tool for improving workflow efficiency, our team created Transcribler to ease user pain points around media consumption: time spent, content validity, and age appropriateness for underage audiences.

What it does

This YouTube browser extension gives users pertinent information about the video being displayed. Transcribler provides a summary of the video, sentiment analysis, an age rating, a toxicity rating, and a fake news indicator. Users can also chat with a chatbot and ask about the video's content without leaving the video.

How we built it

First, we set up a Vertex AI Workbench integrated with GitHub for development. Second, we worked on getting the transcripts of YouTube videos as input for Gemini by using youtube-transcript-api. Third, we set up Gemini 1.5 Flash to ingest that input, return output, and run a chatbot in real time based on the transcript. Fourth, we wrote advanced prompts for each of the five insights Gemini would have to provide. Guardrails were set up within the prompts for Age Rating and Toxicity. Fifth, the design team built the frontend prototype in Figma. Sixth, the frontend team used the prototype as a guideline to build a React frontend with Vite and Material UI that displays the Summary and Chat outputs of our LLM. Seventh, we integrated the frontend UI with a Flask API endpoint for our backend, hosted on a virtual machine in GCP, so that our UI could be populated with real data. Last, we deployed the full-stack application as a Chrome extension using the Google Chrome Extension API.
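The transcript-to-prompt steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the prompt wording, the `INSIGHTS` and `PROMPTS` names, and the helper functions are all hypothetical, the transcript is assumed to arrive as a list of dicts with a `"text"` key, and the Gemini call is represented by a plain callable (`ask_model`) so the example runs without any cloud credentials.

```python
from typing import Callable

# The five insight topics named in the write-up.
INSIGHTS = ["summary", "sentiment", "age_rating", "toxicity", "fake_news"]

# Hypothetical one-line prompts; the real prompts were far more detailed
# and embedded guideline text as guardrails.
PROMPTS = {
    "summary": "Summarize this video transcript in a few sentences.",
    "sentiment": "Describe the overall sentiment of this video transcript.",
    "age_rating": "Suggest an age rating for this video transcript.",
    "toxicity": "Rate the toxicity of this video transcript.",
    "fake_news": "Flag claims in this video transcript that may be misinformation.",
}


def join_transcript(segments: list[dict]) -> str:
    """Flatten transcript segments (assumed dicts with a "text" key) into one string."""
    return " ".join(seg["text"].strip() for seg in segments if seg.get("text"))


def build_prompt(insight: str, transcript: str) -> str:
    """Combine the instruction for one insight with the transcript text."""
    return f"{PROMPTS[insight]}\n\nTranscript:\n{transcript}"


def generate_insights(
    segments: list[dict], ask_model: Callable[[str], str]
) -> dict[str, str]:
    """Ask the model once per insight, one request after another."""
    transcript = join_transcript(segments)
    return {name: ask_model(build_prompt(name, transcript)) for name in INSIGHTS}
```

In a real backend, `ask_model` would wrap the request to Gemini 1.5 Flash; keeping it as a parameter makes the prompt logic easy to test on its own.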

Challenges we ran into

We weren't able to move the Chrome extension to align with the right-hand menu where the other available videos are shown. That placement would have kept users from losing sight of the video while reading the Gemini insights or interacting with the chatbot. Because we feared running into data breach, privacy, or compliance issues, especially when children are using the extension, data is stored only during usage. If the user leaves the extension, or refreshes or closes the page, all the data is deleted and cannot be recovered. We were not able to reduce the latency of generating insights from 15+ seconds to less than 5 seconds. When YouTube videos do not provide a transcript, we need to use Gemini to transcribe them. We built that feature, but it would have increased our latency from 15+ seconds to 30+ seconds, so we did not integrate it into our solution. We hope parallelism will reduce latency and allow the Gemini transcription to be generated in less than 5 seconds.

Accomplishments that we're proud of

Getting the backend running on a GCP virtual machine was challenging, but it was rewarding to see it run successfully. We built very complex prompts that included YouTube guidelines and US film age-rating guidelines as guardrails for the responses. We were very happy with how consistently the responses stayed within those guidelines.

What we learned

Deploying Gemini was fairly easy. We thought it would be challenging, but it was straightforward. Our GCP expert taught us a lot about using the cloud for deployment and about setting up an endpoint environment for our backend. Deploying the Chrome extension was so simple that we were shocked.

What's next for Transcribler

We want to set up a parental feature where parents can register with Transcribler to receive message or email alerts when their child or teen is watching content they aren't supposed to, based on the insights provided by Gemini 1.5 Flash. We want to cut the time Gemini 1.5 Flash takes to generate insights from 15+ seconds to less than 5 seconds by generating our five insights in parallel. Our ultimate goal is to deploy on Gemini Pro for higher accuracy, provided latency stays below 5 seconds. We also want to allow uploads of external videos that are not from YouTube, and to add translation so that we are not limited to English speakers and can reach YouTube's global audience. Lastly, if we win extra funding, we would hire a legal representative with expertise in data privacy to guide us in building any data feature.
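One possible shape for the planned parallelization is to issue the five insight requests concurrently from a thread pool, since each request mostly waits on the network. The sketch below is illustrative only: `generate_parallel` and `fake_model` are hypothetical names, and `fake_model` stands in for a Gemini request by sleeping briefly, so the timing it prints says nothing about real Gemini latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

INSIGHTS = ["summary", "sentiment", "age_rating", "toxicity", "fake_news"]


def generate_parallel(
    transcript: str, ask_model: Callable[[str, str], str]
) -> dict[str, str]:
    """Issue one model request per insight concurrently and collect results by name."""
    with ThreadPoolExecutor(max_workers=len(INSIGHTS)) as pool:
        futures = {
            name: pool.submit(ask_model, name, transcript) for name in INSIGHTS
        }
        # result() blocks until that request finishes and re-raises any error.
        return {name: future.result() for name, future in futures.items()}


def fake_model(insight: str, transcript: str) -> str:
    """Stand-in for a Gemini request: sleeps briefly to mimic network latency."""
    time.sleep(0.2)
    return f"{insight}: {len(transcript)} chars"


if __name__ == "__main__":
    start = time.perf_counter()
    results = generate_parallel("example transcript", fake_model)
    elapsed = time.perf_counter() - start
    print(results)
    print(f"elapsed: {elapsed:.2f}s")
```

With this approach the total wait is governed by the slowest single request rather than the sum of all five, though API rate limits could reduce the gain in practice.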

Built With

  • apis
  • axios
  • chrome-extension-api
  • cloud-services
  • databases
  • figma
  • flask-backend-rest-api
  • frameworks
  • gemini-api
  • google-ai-generativelanguage
  • google-api-core
  • google-api-python-client
  • google-auth
  • google-auth-httplib2
  • google-cloud
  • google-cloud-language
  • google-cloud-secret-manager
  • google-generativeai
  • googleapis-common-protos
  • langchain
  • langchain-google-vertexai
  • material-ui
  • numpy
  • python-3
  • pandas
  • platforms
  • python
  • python-libraries-(see-`environment.yml`-for-details)
  • react
  • requests
  • rsa
  • scikit-learn
  • scipy
  • setuptools
  • transformers
  • venv-(environment-management)
  • vertexai-workbench-integrated-with-github-(for-development)
  • vite
  • youtube-transcript-api