Spilazzola

Inspiration

The internet made information accessible, and AI is now making knowledge accessible. But knowledge is multimodal. So I wanted to build a chat where you can use multimodal inputs to learn new things by practicing, in your own style.

What it does

It takes videos, pictures, and PDFs as input and teaches you new things. You can upload an hour-long lecture video and it will watch the video, extract lecture notes, and prepare quiz cards to help you learn from it.

It uses multiple agents for different capabilities:

Extract lecture notes: Uses Gemini to watch the lecture video and extract notes.
Analyze image: Uses Gemini to describe the image.
Analyze PDF: Uses the Gemini API to analyze the PDF.
Ask the math agent: Uses DeepSeek-R1-Distill-Llama-70B to help with math questions.
Ask the code agent: Uses Claude Sonnet 3.5 to help with code questions.
Ask the science agent: Uses DeepSeek-R1-Distill-Llama-70B to help with science questions.
Search web: Uses exa.ai to retrieve information from the web, then feeds the text from the first 5 URLs to the model.
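
The capability list above can be pictured as a small routing table. The sketch below is only an illustration, not the project's actual code: the tool names, `dispatch`, and `build_search_context` are hypothetical, the handlers are stubs that make no API calls, and the shape of a search result (a dict with `url` and `text`) is an assumption.

```python
from typing import Callable, Dict, List

# Hypothetical tool names mirroring the capability list; the model labels
# are taken from the write-up. Nothing here calls a real API.
TOOL_MODELS: Dict[str, str] = {
    "extract_lecture_notes": "Gemini",
    "analyze_image": "Gemini",
    "analyze_pdf": "Gemini",
    "ask_math_agent": "DeepSeek-R1-Distill-Llama-70B",
    "ask_code_agent": "Claude Sonnet 3.5",
    "ask_science_agent": "DeepSeek-R1-Distill-Llama-70B",
    "search_web": "exa.ai",
}


def make_stub(tool: str) -> Callable[[str], str]:
    """Return a placeholder handler that only reports which backend it would use."""
    def handler(payload: str) -> str:
        return f"[{TOOL_MODELS[tool]}] would handle: {payload}"
    return handler


HANDLERS: Dict[str, Callable[[str], str]] = {
    name: make_stub(name) for name in TOOL_MODELS
}


def dispatch(tool: str, payload: str) -> str:
    """Route a request to the handler registered for the named tool."""
    if tool not in HANDLERS:
        raise ValueError(f"unknown tool: {tool}")
    return HANDLERS[tool](payload)


def build_search_context(pages: List[dict], max_urls: int = 5) -> str:
    """Concatenate the text of the first `max_urls` search results.

    Each page is assumed to be a dict with 'url' and 'text' keys.
    """
    top = pages[:max_urls]
    return "\n\n".join(f"Source: {p['url']}\n{p['text']}" for p in top)
```

Keeping the routing in one table means a new capability only needs a new entry and a handler, and the five-URL cap on search context stays in a single place.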

How we built it

We used agno.ai to process multimodal data and SambaNova with function calling for the base chat model.
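
To make the function-calling part concrete, here is a rough sketch of how a tool might be described to the chat model and how a returned call might be executed. It assumes the widely used OpenAI-style tool schema and a tool call carrying a name plus JSON-encoded arguments; whether the project used exactly this shape is not stated. `tool_schema` and `run_tool_call` are hypothetical helpers, and no request is sent to any provider.

```python
import json
from typing import Callable, Dict


def tool_schema(name: str, description: str, params: Dict[str, str]) -> dict:
    """Describe a tool in the OpenAI-style function-calling format (assumed).

    `params` maps each parameter name to a short description; every
    parameter is treated as a required string for simplicity.
    """
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {
                    p: {"type": "string", "description": d}
                    for p, d in params.items()
                },
                "required": list(params),
            },
        },
    }


def run_tool_call(call: dict, registry: Dict[str, Callable[..., str]]) -> str:
    """Execute a tool call of the assumed form {'name': ..., 'arguments': '<json>'}."""
    fn = registry[call["name"]]
    args = json.loads(call["arguments"])
    return fn(**args)
```

In a chat loop, the schemas would be sent along with the conversation, and the string returned by `run_tool_call` would be appended to the messages so the base model can write its final answer.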

Challenges we ran into

Building the interface and processing long videos.

Accomplishments that we're proud of

Turning videos into lecture notes: we split long videos into small chunks and use Gemini to prepare a complete set of lecture notes in Markdown.
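
The chunk-and-merge idea can be sketched as follows. This is a minimal illustration under assumptions: the 10-minute chunk length, the function names, and the Markdown layout are all hypothetical, and the per-chunk notes are taken as given strings rather than produced by a model call.

```python
from typing import List, Tuple


def chunk_ranges(duration_s: float, chunk_s: float = 600.0) -> List[Tuple[float, float]]:
    """Split [0, duration_s] into consecutive (start, end) ranges in seconds.

    The 600-second default is an arbitrary choice for this sketch.
    """
    if duration_s <= 0 or chunk_s <= 0:
        raise ValueError("duration_s and chunk_s must be positive")
    ranges = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        ranges.append((start, end))
        start = end
    return ranges


def fmt_time(seconds: float) -> str:
    """Format a number of seconds as HH:MM:SS."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def merge_notes(ranges: List[Tuple[float, float]], notes: List[str]) -> str:
    """Combine per-chunk notes into one Markdown document, in chunk order."""
    if len(ranges) != len(notes):
        raise ValueError("need exactly one note per chunk")
    sections = [
        f"## {fmt_time(start)} - {fmt_time(end)}\n\n{note.strip()}"
        for (start, end), note in zip(ranges, notes)
    ]
    return "# Lecture notes\n\n" + "\n\n".join(sections)
```

For a one-hour lecture, `chunk_ranges(3600)` gives six ten-minute ranges; each chunk would be summarized separately and the results passed to `merge_notes`.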

What we learned

Set aside more time for building the UI...

What's next for Spilazzola

Adding a whiteboard and shared sessions.

Built With

agno
exa.ai
sambanova

Updates

batuhan aktaş started this project — Feb 16, 2025 05:29 PM EST
