Inspiration
The internet made information accessible, while AI is making knowledge accessible. But knowledge is multimodal. So I wanted to build a chat where you can use multimodal inputs to learn new things by practicing, in your own style.
What it does
It takes videos, pictures, and PDFs as input and teaches you new things. You can upload an hour-long lecture video and it will watch the video, extract lecture notes, and prepare quiz cards to help you learn from it.
It uses multiple agents for different capabilities:
- Extract lecture notes: Uses Gemini to watch the lecture video and extract notes
- Analyze image: Uses Gemini to describe the image
- Analyze PDF: Uses the Gemini API to analyze the PDF
- Ask the math agent: Uses DeepSeek-R1-Distill-Llama-70B to help with math questions
- Ask the code agent: Uses Claude 3.5 Sonnet to help with code questions
- Ask the science agent: Uses DeepSeek-R1-Distill-Llama-70B to help with science questions
- Search web: Uses exa.ai to retrieve information from the web, then feeds the text from the first 5 URLs to the model
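As a rough illustration of the web search step, here is a minimal sketch of how the text from the first 5 results could be packed into one context block for the model. The `SearchResult` type, the `build_web_context` helper, and the per-page character cap are all assumptions made for this example; they are not exa.ai's API or the project's actual code.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    """Hypothetical shape of one search hit (not exa.ai's actual result type)."""
    url: str
    text: str


def build_web_context(results, max_urls=5, max_chars_per_page=2000):
    """Join the text of the first `max_urls` results into one prompt block.

    The character cap per page is an assumption added to keep the prompt small.
    """
    sections = []
    for index, result in enumerate(results[:max_urls], start=1):
        snippet = result.text.strip()[:max_chars_per_page]
        sections.append(f"[{index}] {result.url}\n{snippet}")
    return "\n\n".join(sections)
```

Numbering each section lets the model point back to the page a statement came from.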
How we built it
We used agno.ai to process multimodal data, and SambaNova with function calling for the base chat model.
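To make the function calling idea concrete, here is a minimal sketch of how a base chat model's tool call could be routed to one of the agents. It assumes the OpenAI-style tool schema that many chat APIs accept; whether the project used this exact format is not stated. The function names and stub bodies are hypothetical, and no real model is called.

```python
import json


# Hypothetical stubs standing in for the real agents.
def ask_math_agent(question: str) -> str:
    return f"[math agent] {question}"


def ask_code_agent(question: str) -> str:
    return f"[code agent] {question}"


# Registry mapping tool names to the Python callables that implement them.
TOOLS = {"ask_math_agent": ask_math_agent, "ask_code_agent": ask_code_agent}


def tool_schema(name, description):
    """Describe a one-argument tool in the OpenAI-style function calling format."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {"question": {"type": "string"}},
                "required": ["question"],
            },
        },
    }


def dispatch_tool_call(name, arguments_json):
    """Run the tool the model asked for; arguments arrive as a JSON string."""
    handler = TOOLS.get(name)
    if handler is None:
        return f"Unknown tool: {name}"
    return handler(**json.loads(arguments_json))
```

In this sketch the chat model only decides which tool to call and with what arguments; the specialised model behind each tool does the actual work.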
Challenges we ran into
Building the interface and processing long videos.
Accomplishments that we're proud of
Turning videos into lecture notes: we split long videos into small chunks and use Gemini to prepare a complete set of lecture notes in markdown.
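The chunk-and-merge approach can be sketched roughly as follows. The 10-minute chunk length, the helper names, and the timestamped heading format are assumptions for illustration; the per-chunk notes would come from Gemini, which is not called here.

```python
def chunk_ranges(duration_s, chunk_s=600):
    """Return (start, end) offsets in seconds that cover the whole video."""
    if duration_s <= 0 or chunk_s <= 0:
        raise ValueError("duration_s and chunk_s must be positive")
    return [
        (start, min(start + chunk_s, duration_s))
        for start in range(0, duration_s, chunk_s)
    ]


def format_timestamp(seconds):
    """Format a second offset as HH:MM:SS."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def merge_notes(chunk_notes, ranges):
    """Combine per-chunk notes into one markdown document, one heading per chunk."""
    sections = []
    for notes, (start, end) in zip(chunk_notes, ranges):
        heading = f"## {format_timestamp(start)} - {format_timestamp(end)}"
        sections.append(f"{heading}\n\n{notes.strip()}")
    return "\n\n".join(sections)
```

With these defaults, a one-hour lecture (`chunk_ranges(3600)`) yields six ranges, and the merged document keeps the chunks in their original order.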
What we learned
Set aside more time for building the UI.
What's next for Spilazzola
Adding a whiteboard and shared sessions.
Built With
- agno
- exa.ai
- sambanova