Inspiration
With billions of people in the world, no two students learn in exactly the same way. Yet education systems still tend to treat all students alike, delivering identical lectures at the same pace and offering support only after assessments reveal problems. We were inspired by the idea that personalisation should happen during learning, not after it.
We wanted to build a system that could recognise when a student might be confused while they are watching a lecture and respond immediately. Instead of reacting to poor performance, Accio aims to prevent misunderstanding in the first place.
What it does
Accio is an autonomous multi-agent system that delivers personalised learning in real time.
A student uploads a lecture recording, and we use Gemini’s multimodal capabilities to process the video and audio as context. Students can also upload lecture slides for a richer understanding of the material.
Accio breaks the lecture into clear subtopics using Gemini. As the lecture plays in the browser, our system monitors learning signals such as pauses and rewatches.
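The write-up does not say how pauses and rewatches are captured or linked to subtopics. The sketch below is a minimal version under these assumptions:

- The browser reports hypothetical `pause` and `seek` events, with video positions in seconds.
- Each Gemini subtopic is represented only by its start time.
- Every name here is illustrative and is not taken from Accio's code.

A backend could then count signals per subtopic like this:

```python
from bisect import bisect_right
from collections import Counter
from dataclasses import dataclass

@dataclass
class PlaybackEvent:
    kind: str             # hypothetical event type: "pause" or "seek"
    t: float              # video position (seconds) when the event fired
    target: float = 0.0   # for seeks only: position the student jumped to

def subtopic_at(starts, t):
    """Index of the last subtopic whose start time is <= t (starts must be sorted)."""
    return max(bisect_right(starts, t) - 1, 0)

def signals_per_subtopic(events, starts):
    """Count pauses and rewatches (backward seeks) within each subtopic."""
    pauses, rewatches = Counter(), Counter()
    for e in events:
        idx = subtopic_at(starts, e.t)
        if e.kind == "pause":
            pauses[idx] += 1
        elif e.kind == "seek" and e.target < e.t:
            # Jumping backwards is treated as a rewatch; forward skips are ignored.
            rewatches[idx] += 1
    return {i: {"pauses": pauses[i], "rewatches": rewatches[i]} for i in range(len(starts))}
```

A forward skip is deliberately not counted here. Skipping ahead is more likely to signal familiarity than confusion, though that is itself an assumption a trained model could revisit.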
We trained a machine learning model on viewing-behaviour data from approximately 50,000 students. This allows Accio to recognise behavioural patterns that may indicate confusion. When a student’s behaviour appears unusual, the system asks whether they would like clarification.
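Accio's trained model is not described in detail, so the sketch below swaps in a much simpler technique: a one-sided z-score test against population statistics. It assumes hypothetical per-window features (`pauses`, `rewatches`) and a hypothetical threshold of 2.0. A window is flagged when any feature sits well above the population mean; below-average activity never triggers a prompt.

```python
from statistics import mean, pstdev

# Hypothetical feature names; the real model's inputs are not documented.
FEATURES = ("pauses", "rewatches")

def fit_population(rows):
    """Compute (mean, std) per feature from training windows."""
    stats = {}
    for f in FEATURES:
        values = [row[f] for row in rows]
        # Guard against zero variance so the z-score stays defined.
        stats[f] = (mean(values), pstdev(values) or 1.0)
    return stats

def unusualness(window, stats):
    """Largest one-sided z-score: only above-average activity raises the score."""
    return max((window[f] - mu) / sd for f, (mu, sd) in stats.items())

def looks_confused(window, stats, threshold=2.0):
    return unusualness(window, stats) > threshold
```

A z-score treats each feature independently and ignores interactions between them. This is one reason a trained classifier, like the one the team describes, would likely work better on noisy behavioural data.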
If the student accepts, Accio sends Gemini the recent lecture context along with the current video frame as an image. This ensures that the explanation is specific to the exact slide and moment in the lecture. If the student declines, the system adjusts its understanding of that behavioural signal for that individual. Over time, Accio becomes personalised to the student rather than relying solely on general trends.
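The write-up says that declining a prompt changes how Accio interprets that signal for the individual, but it does not specify the mechanism. One simple way to get that behaviour, offered purely as an assumption, is a per-student, per-signal threshold:

- It rises after a decline, so stronger evidence is needed next time.
- It falls after an acceptance.
- It is clamped to a fixed range.

`StudentProfile` and its default values are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class StudentProfile:
    base_threshold: float = 2.0   # hypothetical starting sensitivity
    step: float = 0.25            # hypothetical adjustment per response
    min_threshold: float = 1.0
    max_threshold: float = 4.0
    offsets: dict = field(default_factory=dict)  # signal name -> personal offset

    def threshold(self, signal):
        t = self.base_threshold + self.offsets.get(signal, 0.0)
        return min(max(t, self.min_threshold), self.max_threshold)

    def record_response(self, signal, accepted):
        # Accepting lowers the bar for this signal; declining raises it.
        delta = -self.step if accepted else self.step
        new_t = min(max(self.threshold(signal) + delta, self.min_threshold), self.max_threshold)
        # Store the clamped offset so repeated declines cannot push it out of range.
        self.offsets[signal] = new_t - self.base_threshold

    def should_prompt(self, signal, score):
        return score > self.threshold(signal)
```

Because each signal keeps its own offset, a student who often pauses to take notes can silence pause-based prompts without suppressing rewatch-based ones.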
After the lecture, Accio generates a targeted quiz weighted towards topics where the student may have struggled. It then produces a personalised summary emphasising weaker areas. We also implemented a custom solution to embed relevant slide images into the summary so that important visual context is preserved.
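The quiz weighting is not specified beyond favouring topics where the student struggled. A minimal sketch follows, assuming each subtopic has a hypothetical non-negative struggle score (for example, the number of flagged windows). It guarantees every topic at least one question. It then splits the remaining questions proportionally using the largest-remainder method, so the counts always sum to the quiz length.

```python
import math

def allocate_questions(struggle, total, min_per_topic=1):
    """Split `total` questions across topics, weighted by struggle score."""
    topics = list(struggle)
    spare = total - min_per_topic * len(topics)
    if spare < 0:
        raise ValueError("quiz too short to give every topic its minimum")
    weight_sum = sum(struggle.values())
    # With no struggle signal at all, fall back to an even split.
    weights = struggle if weight_sum > 0 else {t: 1 for t in topics}
    weight_sum = weight_sum if weight_sum > 0 else len(topics)
    quotas = {t: spare * weights[t] / weight_sum for t in topics}
    alloc = {t: min_per_topic + math.floor(q) for t, q in quotas.items()}
    # Largest-remainder step: hand leftover questions to the biggest fractional parts.
    leftover = total - sum(alloc.values())
    by_remainder = sorted(topics, key=lambda t: quotas[t] - math.floor(quotas[t]), reverse=True)
    for t in by_remainder[:leftover]:
        alloc[t] += 1
    return alloc
```

For example, with struggle scores `{"recursion": 6, "sorting": 3, "hashing": 1}` and a 10-question quiz, this allocates 5, 3 and 2 questions respectively.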
For students who prefer conversation, Gemini Live enables natural one-to-one dialogue, which is particularly useful for students with accessibility needs such as visual impairments.
How we built it
Accio was built collaboratively, with responsibilities divided according to our individual strengths. Some of us focused on training the machine learning model, others worked on backend systems and orchestration, while others concentrated on frontend development and integrating Gemini’s capabilities.
Although we divided tasks, we regularly supported each other when technical challenges arose. This teamwork allowed us to progress efficiently while maintaining cohesion across the system.
Technically, we combined behavioural modelling, multimodal processing, mathematical and dimensional mapping and real-time browser monitoring. We also designed a custom algorithm to extract images from lecture slides and embed them into generated summaries without compromising quality.
Challenges we ran into
One major challenge was finding a suitable dataset of student viewing behaviour. It required extensive research to identify data that was large enough and relevant to our use case.
Training the machine learning model was another significant challenge. Behavioural data can be noisy and difficult to interpret. Fortunately, one of our teammates had prior experience in this area, which helped us refine the model and resolve issues more effectively.
We also encountered limitations with large language models when working with PDFs. Initially, summaries were text-only because the model could not directly extract and embed slide images. We overcame this by building a custom image extraction and alignment approach, demonstrated in our video.
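The team's extraction and alignment method is shown only in their video. The sketch below illustrates one possible alignment step and is not their method. It makes these assumptions:

- Slide images have already been extracted (for example with PyMuPDF), and each comes with its slide text.
- Each image is assigned to the summary section with the highest word-overlap (Jaccard) score.
- The output is LaTeX, since LaTeX and Tectonic appear in the stack, and the preamble loads `graphicx`.
- Summary text is already LaTeX-safe.

The stopword list, the `min_score` cutoff and the figure width are arbitrary choices.

```python
import re

# Arbitrary small stopword list; a real system would likely use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in", "is", "for", "on"}

def tokens(text):
    """Lower-cased alphanumeric words, minus stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def align_slides(sections, slides, min_score=0.1):
    """Map each (image_path, slide_text) pair to its best-matching section heading."""
    placement = {heading: [] for heading in sections}
    if not sections:
        return placement
    section_tokens = {h: tokens(h + " " + body) for h, body in sections.items()}
    for path, slide_text in slides:
        slide_toks = tokens(slide_text)
        best, score = max(
            ((h, jaccard(slide_toks, toks)) for h, toks in section_tokens.items()),
            key=lambda pair: pair[1],
        )
        if score >= min_score:  # drop images that match nothing well, e.g. title slides
            placement[best].append(path)
    return placement

def to_latex(sections, placement):
    """Emit each section followed by its aligned slide images."""
    parts = []
    for heading, body in sections.items():
        parts.append(f"\\section{{{heading}}}\n{body}")
        for path in placement.get(heading, []):
            parts.append(f"\\includegraphics[width=0.8\\linewidth]{{{path}}}")
    return "\n\n".join(parts)
```

Word overlap is crude but cheap. Swapping `jaccard` for an embedding similarity would be a natural upgrade without changing the surrounding structure.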
Accomplishments that we're proud of
We are particularly proud of building a system that adapts to students in real time rather than relying only on post-lecture assessments.
We are also proud of successfully training a behavioural model using a large dataset and integrating it seamlessly with Gemini’s multimodal capabilities.
Finally, solving the challenge of embedding slide images into generated summaries was a major technical milestone that significantly improved the quality of the final output.
What we learned
We learned that effective personalisation depends on continuous feedback and careful interpretation of behavioural signals. Not every pause or rewind means confusion, and systems must adapt to individuals rather than apply fixed assumptions.
We also learned the importance of collaboration. Dividing work based on strengths while supporting one another allowed us to overcome complex technical challenges more efficiently.
What's next for Accio
Next, we want to refine the behavioural model further by incorporating more diverse learning signals and expanding the dataset.
We also aim to improve conversational capabilities, making the interactive experience even more natural and accessible.
Ultimately, our goal is to continue developing Accio into a scalable, affordable alternative to traditional tutoring, making personalised education more accessible to everyone.
Built With
- fastapi
- google-gemini-api-(including-gemini-live)
- google-oauth
- html/css
- javascript
- joblib
- latex
- numpy
- paid.ai-api
- pymupdf
- python
- scikit-learn
- sql
- sqlite
- supabase
- tectonic
- typescript
- vite
- web-audio-api
- websockets