Inspiration

I often watch cooking videos on YouTube. Even though I like the videos and plan to cook the recipes at home, I often end up being too lazy to identify and buy all of the ingredients. Recitube is my attempt at solving this dilemma.

What it does

Recitube fetches video transcripts from YouTube. It then uses the Chrome AI API to extract the ingredients in a "vanilla RAG" fashion. Finally, it provides convenient search links to Amazon and Walmart, enabling friction-free shopping.
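
As a rough illustration of the last step, here is a minimal sketch of how search links could be built from an extracted ingredient name. The function name, the return shape, and the exact query URL patterns are assumptions made for this example and are not taken from Recitube's source.

```typescript
// Hypothetical helper: name and return shape are illustrative only.
interface SearchLinks {
  amazon: string;
  walmart: string;
}

function buildSearchLinks(ingredient: string): SearchLinks {
  // Collapse repeated whitespace, then URL-encode so that names such as
  // "crème fraîche" or "salt & pepper" survive inside a query string.
  const query = encodeURIComponent(ingredient.trim().replace(/\s+/g, " "));
  return {
    // Assumed search URL patterns for the two shops.
    amazon: `https://www.amazon.com/s?k=${query}`,
    walmart: `https://www.walmart.com/search?q=${query}`,
  };
}
```

Encoding the name once, before building either link, keeps both URLs consistent for the same ingredient.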

How we built it

This is the first Chrome extension I have ever built, but the experience was quite straightforward.

Challenges we ran into

  • Some (food)tubers are very talkative. This poses a challenge for the small on-device model's context length, although it seems like a problem that technological progress will solve soon.
  • Structured prediction, i.e. forcing the API to produce valid JSON that can be properly parsed and displayed. My prompt seems to work reasonably well, but I have seen it fail every once in a while.
  • Some YouTubers don't explicitly mention all of the ingredients, or only show them in the video. This is of course not something that can be solved by looking at the transcript alone.
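
The first two challenges can be softened with a bit of defensive code. The sketch below is not Recitube's implementation: `chunkTranscript`, `parseIngredients`, and the `Ingredient` shape are hypothetical names, and the character budget is only a crude stand-in for the model's real token limit. It splits a long transcript into smaller pieces and parses model output leniently, so that a malformed reply yields an empty list instead of an exception.

```typescript
// Hypothetical shape of one extracted ingredient; the real schema may differ.
interface Ingredient {
  name: string;
  quantity?: string;
}

// Split a transcript into chunks of at most maxChars characters, breaking on
// whitespace. A single word longer than maxChars still becomes its own chunk.
function chunkTranscript(transcript: string, maxChars: number): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const word of transcript.split(/\s+/).filter(Boolean)) {
    if (current && current.length + 1 + word.length > maxChars) {
      chunks.push(current);
      current = word;
    } else {
      current = current ? `${current} ${word}` : word;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Pull the outermost [...] span out of the model's reply and validate it.
// Returns an empty list instead of throwing when the reply is not usable.
function parseIngredients(raw: string): Ingredient[] {
  const start = raw.indexOf("[");
  const end = raw.lastIndexOf("]");
  if (start === -1 || end <= start) return [];
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return [];
  }
  if (!Array.isArray(parsed)) return [];
  const result: Ingredient[] = [];
  for (const item of parsed) {
    // Keep only entries that carry a string name; drop everything else.
    if (typeof item === "object" && item !== null && typeof item.name === "string") {
      const name: string = item.name;
      result.push(typeof item.quantity === "string" ? { name, quantity: item.quantity } : { name });
    }
  }
  return result;
}
```

Each chunk could then be sent to the model separately and the parsed lists merged. An empty result from `parseIngredients` is a natural trigger for retrying the prompt.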

Accomplishments that we're proud of

It is pretty awesome how often I have used the extension since I built it; it is just so convenient.

What we learned

  • Getting LLM-backed apps off the ground is surprisingly easy and powerful
  • Prompt engineering is essential to achieving good performance

What's next for Recitube

  • Combine the transcript with the video description to improve ingredient extraction (there is no need to skim through the transcript if the content creator was kind enough to provide an ingredient list in the video description).
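
One possible shape for this idea, purely as a sketch: prefer the description when it appears to contain an ingredient list, and fall back to description plus transcript otherwise. The function name and the "line mentioning ingredients" heuristic are assumptions for illustration; a real version would likely need something more robust.

```typescript
// Hypothetical helper: choose the text that is handed to the model.
function selectExtractionSource(description: string, transcript: string): string {
  // Assumed heuristic: a line containing "ingredients" marks the start of a list.
  const match = description.match(/^.*ingredients.*$/im);
  if (match && match.index !== undefined) {
    // Keep the description from that line onward and skip the transcript.
    return description.slice(match.index);
  }
  // No list found: give the model both sources, description first.
  return `${description}\n\n${transcript}`.trim();
}
```

Skipping the transcript when the description already lists the ingredients would also ease the context-length problem for talkative videos.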
