Multimodal track
Inspiration
Tracking a number of stray cats is a challenge in modern cities. Our goal was to facilitate this task by utilizing the natural propensity of people to take picture of the cats.
What it does
From a sent photo, identifies which stray cat it is from the database
How we built it
We used Meta's Llama via Together.ai (meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8) to determine if the photo was relevant (a cat) or not. Then in case of positive answer from the model, we pursued with image retrieval. We used the VGG-19 model to determine which cat the photo corresponds the most to in the database. Then asked the language model to formulate the answer, adding hypothetical information about the cat.
Challenges we ran into
It was hard to get the LLM to generate the desired content.
Accomplishments that we're proud of
The bot successfully recognize the cats in our small dataset and seems to formulate satisfactory answers.
What we learned
Designing a bot, using APIs and deploying models
What's next for Who is that cat
Complete information in the database, refine the vision model to recognize more accurately cats in a larger dataset.
Log in or sign up for Devpost to join the conversation.