Inspiration
LLMs can be a useful resource when you are stuck on a math problem. However, their feedback is often very general, and visualizing concepts can be a struggle. It is also inconvenient and time-consuming to keep taking screenshots, or to copy a problem out by hand, just to paste it into a chat agent and get help with your work.
What it does
BrickStein takes a screenshot at the press of a button and provides guidance on any problem circled in red. Concepts can also be visualized with the Manim Python library: prompt the bot to create a video, and it produces one with graphs, drawings, and TTS audio that elaborates on the context. The bot also supports audio input.
How we built it
We used Streamlit as the front end for a LangChain agent that handles user prompts and generates Manim visualizations. OpenCV processes the screenshot and extracts the circled problem. Audio input is transcribed to text using Google Chirp Audio AI. The GPT-4o API powers the agent and the code generation, and Manim and MoviePy are used for video generation. The Tavily API gives the agent web access.
Challenges we ran into
The unpredictability of LLM output made it difficult to format the chatbot's responses consistently. Fixing bugs in the automatic code generation and in tool calling within the architecture was also difficult. Another challenge was using OpenCV to crop out just the highlighted portion of the screen capture, which minimizes token usage and maximizes response quality.
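One common source of formatting trouble is math delimiters. As a minimal sketch, assume the model sometimes emits `\( ... \)` and `\[ ... \]`, while the front end's Markdown renderer wants `$ ... $` for inline math and `$$` on its own lines for display math. The helper name `normalize_latex_delimiters` is hypothetical, and this may not be how BrickStein handles it.

```python
import re


def normalize_latex_delimiters(text: str) -> str:
    """Rewrite \\[ ... \\] and \\( ... \\) into dollar-sign math delimiters."""
    # Display math: \[ ... \]  ->  $$ on its own lines around the expression.
    text = re.sub(
        r"\\\[(.+?)\\\]",
        lambda m: "\n$$\n" + m.group(1).strip() + "\n$$\n",
        text,
        flags=re.DOTALL,
    )
    # Inline math: \( ... \)  ->  $ ... $
    text = re.sub(
        r"\\\((.+?)\\\)",
        lambda m: "$" + m.group(1).strip() + "$",
        text,
        flags=re.DOTALL,
    )
    return text
```

Running every model response through a deterministic post-processing step like this complements prompt engineering: the prompt reduces how often the format drifts, and the helper cleans up the cases that still slip through.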
Accomplishments that we're proud of
Extracting the math problem with a simple button press. Seamlessly converting raw Streamlit output into visually appealing LaTeX formatting. Converting audio into text automatically. Zero direct function calls in the code: the framework lets the agent call every tool in the program at its own discretion, which is how it decides when to take screenshots or generate videos based on user input and context. Automatic code and audio generation using the GPT-4o API, with the generated code run through the agent's terminal access and the videos created with the Manim and MoviePy libraries.
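The "zero direct function calls" idea can be illustrated with a framework-free sketch. This is not LangChain and not the project's code: the registry `TOOLS`, the `tool` decorator, and `choose_tool` (a keyword-based stand-in for the decision GPT-4o would make) are all hypothetical. The point is only that application code registers tools and never calls them by name; the chooser decides.

```python
from typing import Callable, Dict, Optional

# Registry of tools the agent may pick from (hypothetical, for illustration).
TOOLS: Dict[str, Callable[[str], str]] = {}


def tool(func: Callable[[str], str]) -> Callable[[str], str]:
    """Register a function so the agent can select it by name."""
    TOOLS[func.__name__] = func
    return func


@tool
def take_screenshot(request: str) -> str:
    # Placeholder: a real tool would capture and crop the screen.
    return "screenshot captured"


@tool
def generate_video(request: str) -> str:
    # Placeholder: a real tool would generate and render Manim code.
    return f"video rendered for: {request}"


def choose_tool(user_message: str) -> Optional[str]:
    """Stand-in for the LLM's decision; simple keyword rules, not a real model."""
    lowered = user_message.lower()
    if "video" in lowered or "visualize" in lowered:
        return "generate_video"
    if "screen" in lowered or "circled" in lowered:
        return "take_screenshot"
    return None


def run_agent(user_message: str) -> str:
    """Dispatch to whichever tool the chooser picks, or answer directly."""
    name = choose_tool(user_message)
    if name is None:
        return "answered without tools"
    return TOOLS[name](user_message)
```

For example, `run_agent("Please make a video of the unit circle")` routes to `generate_video`, while a plain greeting is answered without any tool call.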
What we learned
How agents interact with users by calling tools. How to reduce model output variability with prompt engineering.
What's next for BrickStein
Better explanations and Manim videos by refining the model prompts. Feedback loops for better code generation, and thus better quality videos. A textbook database for RAG, combined with a priority list defined in the prompt, so that the agent searches trusted, relevant literature first before relying on the internet. A hint feature. A revamped UI.
Built With
- google-web-speech-api
- langchain
- langgraph
- manim
- moviepy
- openai
- opencv
- python
- streamlit
