Perishable food and retail goods are significant contributors to industrial GHG emissions, yet 30-40% of the food produced worldwide is wasted. In particular, households and supermarkets together account for 80% of total US food losses. Given the popularity of existing meal-planning apps, we focus our attention on consumer choice. Specifically, we develop an application that uses computational linguistics to suggest alternative recipe ingredients.

What it does

decarbonate is a Google Chrome extension that parses recipe pages and suggests ingredient alternatives with lower carbon footprints. We measure carbon footprint as the CO2 equivalent per kg or liter of a food commodity. We have also built a back-end filter for water footprint, but it is not yet visible to users. Flip through our image gallery to see some examples!

How we built it

Ingredient Parsing and Entity Normalization. We generate a dependency graph for each ingredient line in a recipe, and then traverse the children of the subject and root tokens. Among those children, we search for adjectival or appositional modifiers, noun phrase adverbial modifiers, compounds, and objects of prepositions. We remove vulgar fractions and measurement tokens, then lemmatize the output and strip punctuation, extra whitespace, and casing.
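The cleanup half of this step can be sketched with the Python standard library alone. This is a minimal illustration, not our actual pipeline: it skips the dependency parse and the lemmatizer, and the `MEASURES` and `FILLER` word lists and the function name `normalize_ingredient` are hypothetical stand-ins.

```python
import re
import string

# Hypothetical word lists for illustration; the real lists are not shown here.
MEASURES = {"cup", "cups", "tbsp", "tsp", "tablespoon", "tablespoons",
            "teaspoon", "teaspoons", "g", "kg", "oz", "lb", "ml", "l",
            "handful", "pinch"}
FILLER = {"of", "a", "an"}

VULGAR_FRACTIONS = "¼½¾⅓⅔⅛⅜⅝⅞"


def normalize_ingredient(text: str) -> str:
    """Strip quantities, units, punctuation, and casing from one ingredient line."""
    text = text.lower()
    # Remove vulgar fractions such as "½".
    text = re.sub(f"[{VULGAR_FRACTIONS}]", " ", text)
    # Remove ASCII quantities such as "2", "1/2", or "2.5".
    text = re.sub(r"\d+(?:[./]\d+)?", " ", text)
    # Replace every punctuation character with a space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    text = text.translate(table)
    # Drop measurement and filler tokens; split() also collapses whitespace.
    tokens = [t for t in text.split() if t not in MEASURES and t not in FILLER]
    return " ".join(tokens)


print(normalize_ingredient("½ cup Raspberries, frozen"))
print(normalize_ingredient("Handful of kale"))
```

A fixed word list like this is exactly what breaks down on colloquial units, which is why the dependency-based traversal described above does most of the work in practice.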

We create TF-IDF vectors for each of our normalized ingredients. We then use cosine similarity to match vectors against the names of foods with known emission impacts. To do this, we use PolyFuzz because it allows us to compute cosine similarity quickly (using sparse_dot_topn) and with lower memory requirements. We use the default n-gram range of (3,3) with a similarity threshold of 75%.
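To make the matching idea concrete, here is a small pure-Python sketch of character-trigram TF-IDF with cosine similarity and a 0.75 cutoff. It does not use PolyFuzz or sparse_dot_topn; the helper names (`trigrams`, `fit_idf`, `vectorize`, `best_match`), the smoothed IDF formula, and the padding with spaces are all assumptions made for illustration.

```python
import math
from collections import Counter


def trigrams(name):
    """Character 3-grams of a name, padded with one space on each side."""
    padded = f" {name.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def fit_idf(names):
    """Smoothed inverse document frequency for every trigram in the corpus."""
    n = len(names)
    df = Counter(g for name in names for g in set(trigrams(name)))
    return {g: math.log((1 + n) / (1 + count)) + 1 for g, count in df.items()}


def vectorize(name, idf):
    """L2-normalized sparse TF-IDF vector, stored as a dict."""
    tf = Counter(trigrams(name))
    vec = {g: tf[g] * idf[g] for g in tf if g in idf}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {g: v / norm for g, v in vec.items()} if norm else {}


def cosine(a, b):
    """Dot product of two normalized sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(g, 0.0) for g, v in a.items())


def best_match(query, known, threshold=0.75):
    """Return (name, score) for the closest known food, or (None, score)
    when the best score falls below the threshold. Assumes known is non-empty."""
    idf = fit_idf(known + [query])
    q = vectorize(query, idf)
    score, name = max((cosine(q, vectorize(k, idf)), k) for k in known)
    return (name, score) if score >= threshold else (None, score)


known_foods = ["raspberry", "blackberry", "beef"]
print(best_match("raspberry", known_foods))
print(best_match("tofu", known_foods))
```

The dict-based vectors keep the sketch readable; a sparse matrix product is what makes the same computation fast at the scale of a full emissions table.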

Alternative Ingredient Selection. Our final selection of ingredient alternatives is based on two sources. The first source is the emission data itself. Food names are aggregated at multiple levels (e.g., name, group, type), and we leverage the type field to make recommendations across names. For instance, we would suggest "Rasberry (Openfield)" (0.632) instead of "Rasberry (Heated Greenhouse)" (7.350) or "Rasberries (Frozen)" (1.175). Our second source is Wolfram Mathematica. We generate and parse a list of all food types for which the platform supports substitution, and then combine those with our suggestions from the emissions data. For example, the best alternative to "Blackberry" is "Rasberry" (Footprint = 7.350).
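The first source can be illustrated with a short sketch that picks the lowest-footprint entry sharing a type with the current ingredient. The `Food` record, the `food_type` value "raspberry", and the function name are hypothetical; the three names and footprints are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Food:
    name: str
    food_type: str    # hypothetical stand-in for the dataset's type field
    footprint: float  # CO2 equivalent per kg or liter


FOODS = [
    Food("Rasberry (Openfield)", "raspberry", 0.632),
    Food("Rasberry (Heated Greenhouse)", "raspberry", 7.350),
    Food("Rasberries (Frozen)", "raspberry", 1.175),
]


def lowest_footprint_alternative(current, foods):
    """Return the same-type food with the smallest footprint that is
    strictly lower than the current one, or None if nothing is better."""
    candidates = [
        f for f in foods
        if f.food_type == current.food_type and f.footprint < current.footprint
    ]
    return min(candidates, key=lambda f: f.footprint, default=None)


print(lowest_footprint_alternative(FOODS[1], FOODS))
```

Returning `None` when the current ingredient is already the best in its type lets the caller leave that ingredient untouched.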

We also experimented with Sentence Transformers (specifically "paraphrase-MiniLM-L6-v2") to generate embeddings for each ingredient. However, we found that these embeddings are not sufficiently granular for our data (e.g., beer is closest to fast food).

Real-Time Ingredient Replacement. We edit the HTML of each recipe website in order to replace ingredients in real time.

Challenges we ran into

Ingredient names are challenging to parse because measurement units are often described colloquially (e.g., "handful of kale"). This points to a broader issue of ambiguity in the language used to describe food. Some sources document units meticulously, while others give instructions at a high level. Given more time, we would explore the USDA dataset of food names and groups more deeply in order to implement an intermediary text normalization step. We would also improve the logic of our ingredient replacement tool, which currently relies on search and replace.

What's next for decarbonate

Taken together, carbon- and water-based emission filtering can provide consumers with strong alternatives to products that they may not know are harmful to the environment. We would ideally like to integrate emission scores into existing meal plan or recipe search tools.
