Inspiration
Generative models stand to completely democratize education. Today, however, students mostly interact with them through a plain chat interface. With AlphaTutor, we're making AI tutors more capable by training them to provide videos, games, simulations, and real-time feedback to students. It's like the AlphaZero of teaching -- combining multiple modalities to improve how AIs teach students. We believe that, by making this platform available to the world, we stand to massively disrupt the online education market.
What it does
AlphaTutor is a comprehensive tutoring app that combines the power of videos, games, and real-time feedback to provide a dynamic and immersive learning experience. At the core of AlphaTutor is a chatbot that walks a student through a concept by tailoring its responses to a student's answers and assessments.
Using a knowledge tree, it helps a student gauge where they are in their understanding of a concept. Then, we take it a step further. If the AI believes that another type of content would help, it will provide the student with videos and inline games. For example, if a student has trouble understanding gravity, AlphaTutor will display an animation of the Falling Feather experiment. To show the potential for inline games, we also coded up a Jeopardy game that is dynamically generated to reinforce ideas that the student and tutor have been discussing.
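To make the structured-generation idea concrete, here is a minimal Python sketch of how a board like this could be produced: it asks GPT-4 for a Jeopardy board as JSON and parses the reply. The `build_jeopardy_board` helper, the prompt wording, and the JSON schema are illustrative assumptions, not our exact implementation.

```python
import json
from openai import OpenAI  # assumes the official OpenAI Python client is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical helper: turn the recent tutoring conversation into a Jeopardy board.
def build_jeopardy_board(conversation_summary: str) -> dict:
    prompt = (
        "You are generating a Jeopardy-style review game.\n"
        f"Topics covered so far: {conversation_summary}\n"
        "Return ONLY valid JSON with this shape:\n"
        '{"categories": [{"name": str, '
        '"clues": [{"value": int, "clue": str, "answer": str}]}]}'
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    # The model's reply is plain text, so we parse it into structured data
    # before handing it to the frontend to render as a game board.
    return json.loads(resp.choices[0].message.content)

board = build_jeopardy_board("Newtonian gravity, free fall, the Falling Feather experiment")
print(board["categories"][0]["name"])
```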
In the future, the app will also incorporate additional interactive games and challenges, rounding out the learning experience. Its real-time feedback system analyzes students' progress and provides personalized guidance, helping them overcome obstacles and achieve academic success.
How we built it
For AlphaTutor, we used React to build the frontend. On the backend, we used tools like GPT Index and LangChain to manage and process data, and we integrated GPT-4 as the underlying language model.
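As a rough idea of how these pieces fit together, the snippet below shows the kind of LangChain chain a backend like ours could use to generate tutoring feedback with GPT-4. It targets the LangChain API of that era, and the prompt and variable names are illustrative, not our production code.

```python
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# GPT-4 as the underlying language model, with a low temperature for consistent feedback.
llm = ChatOpenAI(model_name="gpt-4", temperature=0.3)

# Illustrative prompt: react to a student's answer about a given concept.
prompt = PromptTemplate(
    input_variables=["concept", "student_answer"],
    template=(
        "The student is learning about {concept}. "
        'They answered: "{student_answer}". '
        "Give short, encouraging feedback and one follow-up question."
    ),
)

chain = LLMChain(llm=llm, prompt=prompt)
feedback = chain.run(concept="gravity", student_answer="Heavier objects fall faster.")
print(feedback)
```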
Challenges we ran into
Developing AlphaTutor was a complex process with several challenges. The first major hurdle was the latency of calls to the language model, which made interactions slower than we would have liked. We also struggled to devise effective prompts, which was crucial for generating interesting and diverse games, videos, and feedback.
The second key challenge was the token limit of the model's context window, which caps the length and complexity of the text the model can handle in a single interaction. This made it harder to generate a variety of games, particularly those that required longer or more intricate text inputs. Even so, we were surprised that the model could still generate structured code for a game as complex as Jeopardy.
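One common way to stay inside the context window, sketched below, is to count tokens with tiktoken and keep only as much recent conversation as fits. The `trim_history` helper and the 6,000-token budget are illustrative assumptions, not necessarily what AlphaTutor ships.

```python
import tiktoken  # assumes OpenAI's tiktoken tokenizer is installed

enc = tiktoken.encoding_for_model("gpt-4")

def trim_history(messages: list[dict], budget: int = 6000) -> list[dict]:
    """Keep the most recent messages whose combined token count fits the budget."""
    kept, used = [], 0
    for msg in reversed(messages):  # walk from newest to oldest
        n = len(enc.encode(msg["content"]))
        if used + n > budget:
            break
        kept.append(msg)
        used += n
    return list(reversed(kept))  # restore chronological order
```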
Accomplishments that we're proud of
We built a tool that we ourselves would use for learning various topics, effectively tapping into our sense of curiosity. We personally found it much more interesting to interact with an agent through videos and games than through text, and think that this kind of multi-modality is crucial to developing "stickier" AI experiences.
What we learned
The biggest thing we discovered is that you have to code in a "hybrid programming" manner to make the most of LLM outputs. Effectively, our LLM ends up generating a lot of structured data (game code, feedback, knowledge graphs, video links) that we then need to parse before displaying it to the user. Surprisingly, this turned out to be quite feasible: with the right prompting, we could get good structured responses.
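For illustration, here is a minimal sketch of the kind of parsing this "hybrid programming" involves: pulling a JSON payload out of a free-form model reply, with a fallback when the model wraps it in code fences or extra prose. The `parse_structured_reply` helper and its heuristics are hypothetical, not our exact code.

```python
import json
import re

def parse_structured_reply(raw: str) -> dict | None:
    """Try to pull a JSON object out of an LLM reply, tolerating fences or extra prose."""
    # Strip Markdown code fences if the model wrapped its answer in them.
    cleaned = re.sub(r"^```(?:json)?|```$", "", raw.strip(), flags=re.MULTILINE).strip()
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        # Fall back to the first {...} block we can find in the reply.
        match = re.search(r"\{.*\}", cleaned, flags=re.DOTALL)
        if match:
            try:
                return json.loads(match.group(0))
            except json.JSONDecodeError:
                pass
    return None  # caller can re-prompt the model or fall back to plain text

reply = '```json\n{"type": "video", "url": "https://example.com/feather"}\n```'
print(parse_structured_reply(reply))
```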
What's next for AlphaTutor
We plan to keep developing this tool and ship a working demo of it for students to use.