Inspiration
We all know the power of standing next to a good teacher at a whiteboard as they smooth out all your confusion with a step-by-step guided walkthrough.
What it does
We recreate the same magic that a teacher and a whiteboard can deliver, using AI!
How we built it
We reverse engineered existing whiteboard software, creating our own unique coordinate system with multimodal support and dual system updates. From there, we used the Google Cloud Vision API to analyze the current state of the whiteboard (OCR) and an efficient coordinate overlay algorithm to convert the snapshot into user data that preserves the accurate spatial relations of all the strokes on the whiteboard. This gave us real-time data on the user's strokes and inputs, which we fed into Gemini Flash and Vapi. With that context, Gemini and Vapi could provide feedback to the user in JSON and speech formats. We used in-depth prompt engineering to shape Gemini's and Vapi's outputs so they walk through the user's problem like an exceptional teacher, providing hints and guidance directly on the whiteboard and through speech. All of this happens in a matter of a few seconds, with the low latency thanks to the OCR coordinate system and Flash setup we chose.
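To make the coordinate overlay idea concrete, here is a minimal sketch of mapping an OCR bounding box from snapshot pixels into whiteboard coordinates. It is not our actual algorithm: the `Box` and `Viewport` types, the `image_to_board` function, and the assumption that a snapshot is a simple pan-and-zoom view of the board are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle: top-left corner plus size."""
    x: float
    y: float
    width: float
    height: float


@dataclass(frozen=True)
class Viewport:
    """Hypothetical description of which part of the board a snapshot shows."""
    origin_x: float  # board x-coordinate at the snapshot's left edge
    origin_y: float  # board y-coordinate at the snapshot's top edge
    scale: float     # snapshot pixels per board unit (zoom level)


def image_to_board(box: Box, viewport: Viewport) -> Box:
    """Convert an OCR box from snapshot pixels to whiteboard coordinates."""
    if viewport.scale <= 0:
        raise ValueError("scale must be positive")
    return Box(
        x=viewport.origin_x + box.x / viewport.scale,
        y=viewport.origin_y + box.y / viewport.scale,
        width=box.width / viewport.scale,
        height=box.height / viewport.scale,
    )


# Example: a snapshot taken at 2x zoom with the view panned to (100, 50).
viewport = Viewport(origin_x=100.0, origin_y=50.0, scale=2.0)
ocr_box = Box(x=40.0, y=20.0, width=10.0, height=6.0)
board_box = image_to_board(ocr_box, viewport)
```

Keeping the conversion as plain arithmetic is one way to stay fast, since no model call is needed to place a recognized token on the board.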
Challenges we ran into
We took a major risk with this project due to its many technical layers. The first was accurately converting our whiteboard into machine-readable data with proper spatial relations, so that we could provide feedback, comments, and highlights in the correct locations. We chose OCR over an LLM for its speed, so we could quickly convert the board to text and then use our algorithm to find coordinates (something an LLM is incredibly unreliable for). The coordinate algorithm itself took many hours to streamline, but it was the backbone of proper communication through the whiteboard between the AI teacher and the user. Once we had the correct coordinates, we had to prompt our models carefully so they would provide useful guidance and print it on the whiteboard in the correct locations, letting the user seamlessly understand how our AI was aiding them, just like a teacher! Finally, we added the ability for Vapi to help through voice by understanding the context and answering any questions.
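One way to keep the model away from raw coordinates is to let it refer to OCR tokens by id and resolve the positions afterwards. The sketch below assumes that design; the JSON shape (`annotations`, `token_id`, `comment`), the `tokens` table, and `resolve_annotations` are hypothetical and are not the schema we actually used.

```python
import json

# Hypothetical OCR output: token id -> (text, board-space box as x, y, w, h).
tokens = {
    0: ("2x", (10.0, 20.0, 30.0, 18.0)),
    1: ("+", (45.0, 20.0, 12.0, 18.0)),
    2: ("3", (62.0, 20.0, 14.0, 18.0)),
}

# Hypothetical model reply: comments point at token ids, never at coordinates.
reply = json.dumps({
    "annotations": [
        {"token_id": 1, "comment": "Check this sign."},
        {"token_id": 9, "comment": "Refers to a token OCR never produced."},
    ]
})


def resolve_annotations(reply_text, tokens):
    """Attach a board-space box to each annotation; skip unknown token ids."""
    data = json.loads(reply_text)
    resolved = []
    for item in data.get("annotations", []):
        token = tokens.get(item.get("token_id"))
        if token is None:
            continue  # the model referenced a token that does not exist
        text, box = token
        resolved.append(
            {"text": text, "box": box, "comment": item.get("comment", "")}
        )
    return resolved


placed = resolve_annotations(reply, tokens)
```

Each entry in `placed` carries both the comment and the box to draw it next to, so a bad token id from the model is dropped instead of producing a misplaced note.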
Accomplishments that we're proud of
We almost perfected this difficult coordinate grid, which lets us identify and directly highlight the specific characters the user wrote that we want to address. For example, if the user accidentally added a - sign, we could isolate it on the whiteboard, highlight it, and provide feedback.
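The stray minus sign example can be sketched as follows, assuming the OCR step yields one box per character in board coordinates. The `symbols` list, `find_symbol`, `highlight_rect`, and the padding value are hypothetical names and numbers for illustration only.

```python
# Hypothetical per-character OCR output: (text, (x, y, width, height)).
symbols = [
    ("3", (10.0, 10.0, 8.0, 12.0)),
    ("-", (20.0, 14.0, 6.0, 2.0)),
    ("5", (28.0, 10.0, 8.0, 12.0)),
]


def find_symbol(symbols, target):
    """Return the indices of every symbol whose text equals target."""
    return [i for i, (text, _box) in enumerate(symbols) if text == target]


def highlight_rect(box, padding=2.0):
    """Grow a character box by padding on every side so the highlight is visible."""
    x, y, width, height = box
    return (x - padding, y - padding, width + 2 * padding, height + 2 * padding)


minus_indices = find_symbol(symbols, "-")
highlights = [highlight_rect(symbols[i][1]) for i in minus_indices]
```

Padding matters most for thin characters like a minus sign, whose own box is only a couple of units tall.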
What we learned
We learned the power AI has to provide in-depth education to all, as well as the meticulous steps that must be completed to deliver a thorough learning experience. We worked through many workflows and systematically improved them until we reached the most efficient version for updates, providing helpful and fast guidance.
What's next for Whiteboard teacher
Integrating with tools such as Khan Academy, OneNote, Goodnotes, and other whiteboard-related tools, or even offering this as its own service, could be very beneficial for individuals who want their own teachers.
Built With
- deepmind
- gemini
- ocr
- python
- react
- vapi
- visionproject