posted an update

Title

AlphaChef - Build a seq-to-seq model to generate recipes from food videos.

Who

ChenHao Lu Nan Chen (nchen40) PingYao Shen (pshen6) Ziao Zhang

Introduction

We are trying to build a model that auto-generates recipes from food videos. While food video is a great way to learn cooking, some videos are too long and may be difficult to memorize all the steps mentioned in the video. We believe it will be useful if we can generate a precise text recipe that gives the learner an overview of the cooking steps.

Related Work

We found an existing paper that are related to our goal: UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

The paper will only be our reference and starting point.

Data

Youcook2 http://youcook2.eecs.umich.edu/.
Contains 2000 cooking videos on 89 recipes with 14K video clips.

Methodology

We will be using the already preprocessed video data from the YC2 dataset, and build our own preprocessing method for the text data. They will then be encoded separately with two encoders. These encodings will go through a transformer to generate the output text descriptions.

Metrics

We will use common NLP metrics like perplexity, BLEU, etc., to evaluate the performance of our model.

Ethics

1. We decide to use Youcook2 as our dataset. We notice that most of the videos are about western dishes. Given that all the videos and recipes in our dataset are in English, this uneven distribution is understandable. Nevertheless, this might affect the model’s performance when we try to generate recipes for certain types of dishes.

  1. The stakeholders of this project might be those who are trying to learn to cook a new recipe. Unclear instructions in our generated text recipe may lead to waste of food ingredients, or more severely, kitchen accidents.

Division of labor

ChenHao Lu: encoder and decoder Nan Chen: preprocess text and video data PingYao Shen: preprocess text and video data Ziao Zhang: encoder and decoder

Log in or sign up for Devpost to join the conversation.