Inspiration
My sister and my mom are teachers, and I've seen how they prepare tests for their students. It takes a lot of time and is quite boring and repetitive, especially when they need to create multiple versions of the same test to prevent cheating. I realized I could help them by creating a tool that automatically generates such tests. If programmers can vibe-code and then review their code, why can't teachers vibe-create and then review tests?
What it does
It generates tests for language teachers (for now only english) based on parameters like level (A1, A2, B1, B2, C1) and prompt (topic)
How we built it
I built it in python creating a RAG pipeline. Firstly i extracted data from my textbooks using local LLM models with vision. Then i chunked the data and imported it into Qdrant database. After that i designed the retrival, and agentic algorithm - model plans, creates database queries, creates tests .etc. His work is separated into steps. For LLM API i used ollama cloud.
Challenges we ran into
- Deployment of backend. As you can see, my project is CLI aplication. Thas because I had lots of issues connected to deployment - for some reason Render always returned an error. Because of lack of time i decided it will be faster to ship demo as such application.
- Reading data from pictures. Firstly i wanted to use OCR and then LLM because it would be cheaper token-wise to analyze pictures with OCR locally and then send request to some clever AI to refactor data into json files. But it turned out OCR was doing pretty bad. I had to use local models which also were losing context on a big pages flooded with vocabulary. So i had to split those images into smaller parts. But when i split them into smaller parts i lost context again because some photos where cut in the middle of a picture. So i split them but this time into less parts (2 to be exact) and found more clever local model which analyzed it good enough and then cloud one was fixing output of the local one. That way i saved tokens and could automaticaly extract data from photos. ## Accomplishments that we're proud of Finishing the project at all. I know its very simple response, but this is my first RAG i have ever built myself. Even though I see lots of technical issues that need to be fixed i still am very proud of myself that i finished it and its usable. ## What we learned
- How to use Qdrant
- How to use RAG
- To never use easyocr package because its mid
- how to think about agentic workflows for ai models
- my python confidence grew stronger ## What's next for AutoTests AI Probably a good website and a backend deployment. Then i have to run multiple tests and analyze what the models are still doing wrong (because they are still doing wrong a lot of things). Then fix it and add more features like new languages. Also i should test various languages for prompting model. Does model works the same for all languages? I have to find out.
Log in or sign up for Devpost to join the conversation.