Cutting edge generative AI to implement a full stack AI tutor for helping explain any idea in an interactive way with diagrams. It can understand photos of maths and output as a PDF.

Inspiration

We were frustrated sitting in lectures and having to ask ourselves what that concept on the slide or maths on the whiteboard meant. We wanted to make an AI-powered website to allow everyone to leverage AI to have it explain the concept to you in an interactive way.

What it does

The user can ask about a topic they don't understand or are curious to learn about. The AI will then ask them questions about what they want to know. It will then produce an interactive diagram representing the key components of the topic in a graphical way. The user can click on any part of the diagram to get more detail about an individual component. For example, if they ask about TCP-IP, the AI might clarify that they want to know about the protocol layers. It can then create a diagram with information about the 4 main protocol layers. The user can then click on an individual component (e.g., application layer) and get an even more detailed breakdown. They can also ask for help understanding part of the topic or clarifying the diagram using our AI Question Answer system.

For those who like to work out maths problems on a board or are trying to understand and take notes from the lecturer's scribbles, we have the custom tool for you. Our system can scan handwritten maths and produce both LATEX code and a PDF file with the maths cleanly typed out with a brief explanation. This can be used as part of an integrated feature with our main diagramming tool. Uploading an image, the AI will be able to understand the maths and help explain it to you. For example, if you are unsure about some maths, the AI may recognise it as a part of calculus and be able to break down this topic with interactive diagrams for you. You can then discuss this maths with our Question Answer system AI, which can help you to understand by answering questions and providing clarifications.

This open-source tool should be invaluable in helping improve accessibility in the classroom and beyond. It can make formatted maths, diagrams, and explanations for your notes. It can also help explain a topic to you with interactive diagrams and a question-answer system to help understand the diagram or maths.

How we built it

This project's final version was implemented using Streamlit. This hosts a full-stack mobile compatible website that allows for file uploads and interactive diagrams. This is powered by a Python backend that accesses powerful large language, diffusion, and computer vision models from OpenAI. We utilised chain of thought and one-shot learning to help guide our system to produce clean PDFs of maths or to break down topics into explainable chunks.

In our prototyping, we experimented with other technologies, including using open-source Llama models from Hugging Face. We also implemented some of the backend in Databricks and tried to access AI models through it.

Challenges we ran into

We had issues using Llama models, but we grew our knowledge about LLMs in the process and resolved them. We were able to access, through Hugging Face, an 8-billion parameter variant of Llama 3. However, it was unable to follow instructions. After researching instruction fine-tuning, we used an instruct version to try to make UML and SVD diagrams from descriptions. This model was not able to do this well. The 80-billion parameter variety was able, but we were not able to access it for free or run it locally.

At this point, we were also trying to implement the backend in Databricks. We liked how easy it was to get started with it, but we were unable to access the necessary Llama models through it. We also struggled to expose our cluster through an endpoint to build an app on top of it.

We decided to use OpenAI's GPT-4 and 4o models as well as their DALL-E models to fix this model access issue.

Accomplishments that we're proud of

We are proud to have taken on an ambitious challenge as a team and to have created a solution that involved singificant technical growth about generative AI and full-stack development.

What we learned

We learned lots about the challenges and issues with implementing full-stack websites that are mobile compatible. We had to iterate and prototype various ideas, including using Databricks, before we settled on using Streamlit for everything to minimise integration issues.

We also **grew our knowledge" about AI models, particularly the differences between large language models and which one was more suited to our task. We learned about the different quantised versions of Llama and which we could run vs which were capable of our use case. We learned about the necessity of instruction fine-tuning as we struggled to make a large language model do as we wished. We also learned about accessing powerful models remotely through OpenAI's APIs.

What's next for Explainatron - 3000

We want to find a suitable open-source language model that can be accessed for free to power the backend. While OpenAI's models were a lifesaver, we want our final product to be as accessible as possible, meaning it must be free and truly open-source.

We also want to get user feedback on our deployed website to help refine its capabilities to best help people in their personal learning growth.

Built With

  • databricks
  • diffusion
  • latex
  • llama
  • llms
  • openai
  • react
  • streamlit
Share this project:

Updates