Multimodal track

Inspiration

Tracking the many stray cats in a modern city is a challenge. Our goal was to make this task easier by taking advantage of people's natural tendency to take pictures of cats.

What it does

Given a photo sent by a user, the bot identifies which stray cat in the database it shows.

How we built it

We used Meta's Llama via Together.ai (meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8) to determine whether the photo was relevant (that is, whether it shows a cat). If the model answered positively, we proceeded to image retrieval: we used the VGG-19 model to determine which cat in the database the photo most closely matches. We then asked the language model to formulate the answer, adding hypothetical information about the cat.
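The retrieval step can be illustrated with a minimal sketch. It assumes each photo has already been turned into a feature vector (for example by VGG-19) and that the closest match is chosen by cosine similarity. The similarity measure, the function names, and the toy database below are assumptions made for illustration; they are not taken from our actual code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction, so treat it as matching nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def find_closest_cat(query_features, database):
    """Return (name, score) of the database entry most similar to the query.

    `database` maps a cat's name to its stored feature vector
    (a hypothetical layout, chosen for this sketch).
    """
    if not database:
        raise ValueError("database is empty")
    best_name, best_score = None, -math.inf
    for name, features in database.items():
        score = cosine_similarity(query_features, features)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Toy 3-dimensional vectors standing in for real image features.
toy_database = {
    "Misty": [0.9, 0.1, 0.0],
    "Tiger": [0.1, 0.8, 0.3],
}
name, score = find_closest_cat([0.8, 0.2, 0.1], toy_database)
```

In a real setup the vectors would be much longer, and a minimum score threshold could be used to report that the photographed cat is not in the database yet.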

Challenges we ran into

It was hard to get the LLM to generate the desired content.

Accomplishments that we're proud of

The bot successfully recognizes the cats in our small dataset and seems to formulate satisfactory answers.

What we learned

Designing a bot, using APIs and deploying models

What's next for Who is that cat

Complete the information in the database and refine the vision model so that it recognizes cats more accurately in a larger dataset.
