VisuAI

Inspiration

As students who struggled to pay attention in class, especially in our teen years, we initially set out to create VisuAI as a tool to improve the learning process for people with ADHD or learning disabilities. However, as we tested the project with friends and peers, their feedback showed us that it had more applications than we had expected. By presenting seemingly complex topics in an entertaining way, it makes learning, one of the most fundamental parts of life, more interesting and engaging.

What it does

As the name suggests, VisuAI draws on a variety of LLMs and image generation models, including Cohere.ai, GPT-4, and DALL-E 3. Using these models, we built a system that produces a visual comic and a set of flashcards for any recorded topic or lecture. VisuAI converts audio input into text. That text is then fed to the LLMs with specific prompts, which generate engaging, story-like visuals that draw the student into the subject. It also generates relevant flashcards to test students' knowledge asynchronously as they go.
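The flow described above can be sketched as a small pipeline that starts from an already-transcribed lecture. This is a minimal sketch, not VisuAI's actual code. The `TextModel` and `ImageModel` callables are hypothetical stand-ins for whatever LLM and image-generation API clients are used, such as GPT-4 or DALL-E 3. The `Lesson` type, the `build_lesson` function, and the prompt wording are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-ins for real API clients: each takes a prompt string
# and returns text (for the LLM) or an image URL (for the image model).
TextModel = Callable[[str], str]
ImageModel = Callable[[str], str]


@dataclass
class Lesson:
    topic: str
    panels: List[str] = field(default_factory=list)      # one scene description per panel
    image_urls: List[str] = field(default_factory=list)  # one generated image per panel
    flashcards: str = ""                                  # raw flashcard text from the LLM


def build_lesson(topic: str, transcript: str, llm: TextModel,
                 image_model: ImageModel, num_panels: int = 4) -> Lesson:
    """Turn a lecture transcript into comic panels, panel images, and flashcards."""
    story = llm(
        f"Retell this lecture on {topic} as a {num_panels}-panel comic. "
        f"Write one scene description per line.\n\n{transcript}"
    )
    # Keep non-empty lines only, capped at the requested number of panels.
    panels = [line.strip() for line in story.splitlines() if line.strip()][:num_panels]
    image_urls = [image_model(f"Comic panel, bright friendly style: {p}") for p in panels]
    flashcards = llm(f"Write question-and-answer flashcards covering:\n\n{transcript}")
    return Lesson(topic, panels, image_urls, flashcards)


# Usage with stub models, so the sketch runs without any API keys.
def fake_llm(prompt: str) -> str:
    if "comic" in prompt:
        return "Scene 1: the sun\nScene 2: a leaf\n\nScene 3: sugar"
    return "Q: What do plants make? A: Sugar"


def fake_images(prompt: str) -> str:
    return f"https://example.com/panel{len(prompt)}.png"


lesson = build_lesson("photosynthesis", "Plants turn light into sugar.", fake_llm, fake_images)
print(lesson.panels)
```

Passing the model clients in as plain callables keeps the pipeline independent of any particular provider. That makes it easy to swap Cohere.ai for GPT-4, or to test with stubs as shown.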

The education system has long been criticized for a number of problems, especially in how material is taught, the pace at which students learn, and students' interest in the subjects being taught. Our solution aims to address these issues, at least in part, by presenting lessons and concepts that can seem impossible to grasp in an appealing and meaningful way. We believe this especially benefits individuals with attention deficit disorders.

How we built it

Our development process can be broken down into five components: Ideation, Frontend Development, LLM and ML Integration, Backend Development, and Prompt Engineering.

Ideation: Through many iterations and late-night brainstorming sessions, we developed several viable ideas. We knew we wanted to build an education project, specifically for people with learning disabilities. While looking through various resources, including the DSM-5, we found a common thread among beneficial learning strategies: the use of visual aids and cues. From there, it was a matter of prototyping and executing.

Frontend: The frontend is built with React.js and was initially designed and prototyped in Figma.

LLM/ML Integration: As stated above, our program uses a variety of AI services, from Cohere.ai to GPT-4. After transcribing the audio input, we call these models' APIs dynamically to generate resources such as the storyboard and flashcards. The initial outputs were incoherent and lackluster, but we were still able to build a minimum viable product.
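Because LLM output is free text, a parsing step is needed before flashcards can be shown in the UI. The sketch below assumes, hypothetically, that the model is asked to answer with one `Q: ... A: ...` pair per line. `parse_flashcards` and `CARD_PATTERN` are illustrative names, not part of VisuAI. Lines that do not match the format are skipped, which is one simple way to cope with inconsistent model output.

```python
import re
from typing import List, Tuple

# Hypothetical output format requested from the LLM: one "Q: ... A: ..." pair per line.
CARD_PATTERN = re.compile(r"^\s*Q:\s*(?P<q>.+?)\s*A:\s*(?P<a>.+?)\s*$")


def parse_flashcards(raw: str) -> List[Tuple[str, str]]:
    """Extract (question, answer) pairs, skipping lines that don't match the format."""
    cards = []
    for line in raw.splitlines():
        match = CARD_PATTERN.match(line)
        if match:
            cards.append((match.group("q"), match.group("a")))
    return cards


# Example model output with chatty extra lines that should be ignored.
raw_output = """Here are your flashcards:
Q: What gas do plants absorb? A: Carbon dioxide
Q: Where does photosynthesis happen? A: In the chloroplasts
Thanks for studying!"""

cards = parse_flashcards(raw_output)
print(cards)
```

One limitation of this sketch is that a question that itself contains the text `A:` would be split too early. A stricter option is to ask the model for JSON and parse it with `json.loads`.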

Backend Development: The backend uses Express.js, Node.js, and Python to route the generated data to the React frontend.
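Handing generated content from the backend to the React frontend typically means serializing it as JSON. The sketch below shows one possible payload shape on the Python side. The `Flashcard` and `LessonPayload` names and their fields are assumptions for illustration, not VisuAI's actual schema. The Express.js routing layer that would pass this JSON on to the frontend is not shown.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Flashcard:
    question: str
    answer: str


@dataclass
class LessonPayload:
    # Hypothetical schema for what the frontend receives.
    topic: str
    panel_image_urls: List[str]
    flashcards: List[Flashcard]


def to_json(payload: LessonPayload) -> str:
    """Serialize a lesson, including its nested flashcards, to a JSON string."""
    # asdict() recursively converts nested dataclasses (even inside lists) to dicts.
    return json.dumps(asdict(payload))


payload = LessonPayload(
    topic="photosynthesis",
    panel_image_urls=["https://example.com/panel1.png"],
    flashcards=[Flashcard("What gas do plants absorb?", "Carbon dioxide")],
)
body = to_json(payload)
print(body)
```

Defining the payload as typed dataclasses gives the Python and JavaScript sides one explicit structure to agree on.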

Prompt Engineering: As mentioned above, our initial model outputs were not satisfactory. With perseverance and creative thinking, however, we crafted prompts that produced far more visually appealing and relevant outputs.
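A common way to make prompts more reliable is to state the audience, output format, and constraints explicitly, rather than sending the transcript with a one-line instruction. The template below is a hypothetical illustration of that idea. The team's actual prompts are not reproduced here, and `storyboard_prompt` and its parameters are invented for this sketch.

```python
from textwrap import dedent


def storyboard_prompt(topic: str, transcript: str, panels: int = 4,
                      audience: str = "high school students") -> str:
    """Build an explicit, constrained prompt for a comic-style storyboard."""
    # The instructions are dedented; the transcript is appended afterwards
    # so its own line breaks and indentation are left untouched.
    instructions = dedent(f"""\
        You are writing a short educational comic about {topic} for {audience}.
        Use only facts from the lecture transcript below.
        Produce exactly {panels} panels, one per line, in the form:
        Panel N: <scene description> | Caption: <one-sentence caption>
        Keep each caption under 20 words and define any jargon you use.

        Transcript:
        """)
    return instructions + transcript


prompt = storyboard_prompt("photosynthesis", "Plants turn light into sugar.")
print(prompt)
```

Pinning down the line format also makes the response easier to parse, since each panel can be split on `|` before being sent to an image model.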

Challenges we ran into

One of the biggest challenges we faced while working on VisuAI was getting the data pipeline working. The most difficult part was finding an effective way to transcribe audio, which we eventually solved with the react-speech-recognition library.

Other challenges included:

Ineffective prompting
API key issues
Lack of library support
Burnout

Accomplishments that we're proud of

Relatively inexpensive, so it can be widely accessible
Well-designed UI and frontend
Surprisingly interesting generated results
Strong team collaboration

What we learned

We learned how to effectively manage our time and overcome unique challenges presented by integrating such a large variety of technologies.

What's next for VisuAI

A mobile app so it can be used on the go; it is currently hosted on the web, which may limit some of its capabilities
A more lightweight model to reduce lag
More study tools, such as audio stories and mini-games
Better accessibility for visually impaired users
A database to store lessons so users can refer back to them
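For storing lessons, Python's built-in `sqlite3` module would be enough for a first version. The sketch below is hypothetical: the `lessons` table layout and the `init_db`, `save_lesson`, and `load_lessons` helpers are assumptions. For simplicity, it stores flashcards as a JSON-encoded string.

```python
import json
import sqlite3
from typing import List, Tuple


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) lessons table if it does not exist yet."""
    # The flashcards column holds a JSON-encoded list of [question, answer] pairs.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS lessons (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               topic TEXT NOT NULL,
               transcript TEXT NOT NULL,
               flashcards TEXT NOT NULL
           )"""
    )


def save_lesson(conn: sqlite3.Connection, topic: str, transcript: str,
                flashcards: List[List[str]]) -> int:
    """Insert a lesson and return its new row id."""
    cur = conn.execute(
        "INSERT INTO lessons (topic, transcript, flashcards) VALUES (?, ?, ?)",
        (topic, transcript, json.dumps(flashcards)),
    )
    conn.commit()
    return cur.lastrowid


def load_lessons(conn: sqlite3.Connection, topic: str) -> List[Tuple[int, str, list]]:
    """Return all saved lessons for a topic, decoding the flashcards back to lists."""
    rows = conn.execute(
        "SELECT id, transcript, flashcards FROM lessons WHERE topic = ? ORDER BY id",
        (topic,),
    ).fetchall()
    return [(row_id, transcript, json.loads(cards)) for row_id, transcript, cards in rows]


conn = sqlite3.connect(":memory:")  # in-memory database for the example
init_db(conn)
lesson_id = save_lesson(conn, "photosynthesis", "Plants turn light into sugar.",
                        [["What gas do plants absorb?", "Carbon dioxide"]])
saved = load_lessons(conn, "photosynthesis")
print(saved)
```

Parameterized queries (the `?` placeholders) keep user-supplied topics and transcripts from being interpreted as SQL.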