Inspiration

From developing new work habits to increasing attention spans during lectures, getting used to a new learning environment is challenging. As a team of first-year students, we decided to build a tool to help students with this transition.

What it does

Inspired by the popular workspace Notion, we decided to take it one step further: an automated note-taking system driven by live video data. Taking notes is tedious and pulls our focus toward what the speaker has already said rather than what they are currently saying. Struggling to keep up is therefore common, and it inhibits our ability to thoughtfully process the presentation. By handing this arduous task to an AI-powered note-taking system, we are free to immerse ourselves more deeply in the material.

How we built it

Our web app is built with Next.js and TypeScript and styled with TailwindCSS. Our Notion-like editor is powered by Novel, a comprehensive WYSIWYG editor with AI-powered autocomplete. Authentication is provided by Clerk.

The editor includes a webcam component that captures images at regular intervals and sends them to our Python Flask backend, where OpenCV filters and preprocesses each image so that EasyOCR can extract its text. The extracted text is often erroneous, especially with handwriting, so low-confidence results (using the confidence score EasyOCR returns) are paired with a brief user-provided topic/description and sent to GPT-4, which suggests a more likely original phrase. The corrected keywords are then passed to Cohere's command-nightly generative model along with an emphasis parameter for each keyword, calculated from the size, frequency, and on-screen duration of the word, and the model generates realistic, concise point-form notes from the extracted information.
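A rough sketch of how such a per-keyword emphasis score can be computed from OCR output. The `Detection` shape, the 0.4 confidence cutoff, and the three weights are illustrative assumptions, not our exact tuning:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One OCR result: recognized text, bounding-box height in pixels,
    and the confidence score the OCR engine reports."""
    text: str
    box_height: float
    confidence: float

def keyword_emphasis(frames, min_confidence=0.4,
                     w_size=0.4, w_freq=0.3, w_dur=0.3):
    """Score each keyword by size, frequency, and on-screen duration.

    `frames` is a list of per-frame detection lists. Size is the word's
    largest box height relative to the largest height seen; frequency is
    the fraction of frames containing the word; duration is the longest
    consecutive run of frames, as a fraction of all frames.
    """
    stats = {}  # word -> {"height", "count", "streak", "best"}
    for detections in frames:
        present = set()
        for d in detections:
            if d.confidence < min_confidence:
                continue  # low-confidence text is routed to GPT-4 instead
            word = d.text.lower()
            s = stats.setdefault(word, {"height": 0.0, "count": 0,
                                        "streak": 0, "best": 0})
            s["height"] = max(s["height"], d.box_height)
            present.add(word)
        for word, s in stats.items():
            if word in present:
                s["count"] += 1
                s["streak"] += 1
                s["best"] = max(s["best"], s["streak"])
            else:
                s["streak"] = 0  # word left the screen; reset its run
    if not stats:
        return {}
    max_h = max(s["height"] for s in stats.values())
    n = len(frames)
    return {
        word: w_size * (s["height"] / max_h)
              + w_freq * (s["count"] / n)
              + w_dur * (s["best"] / n)
        for word, s in stats.items()
    }
```

Since frames are captured at fixed intervals, counting consecutive frames is a reasonable stand-in for wall-clock duration.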

The resulting text is then mapped onto the appropriate Prisma schema using GPT-4 and persisted in CockroachDB.
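Because an LLM's JSON output can drift from the schema, it helps to validate the reply before it reaches the database. A minimal sketch, assuming a hypothetical `Note` model with `title`, `topic`, and `content` fields (not our actual schema):

```python
import json

# Hypothetical fields of the Prisma Note model; names are assumptions.
NOTE_FIELDS = {"title": str, "topic": str, "content": str}

def build_mapping_prompt(prisma_model: str, notes: str) -> str:
    """Ask the LLM to emit JSON whose keys match the Prisma model's fields."""
    return (
        "Map the notes below onto this Prisma model and reply with a "
        "single JSON object whose keys are exactly the model's fields.\n\n"
        f"{prisma_model}\n\nNotes:\n{notes}"
    )

def parse_note_record(raw: str) -> dict:
    """Validate the LLM's JSON reply before handing it to the database."""
    record = json.loads(raw)
    for field, ftype in NOTE_FIELDS.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        if not isinstance(record[field], ftype):
            raise TypeError(f"{field} should be {ftype.__name__}")
    # Drop any extra keys the model hallucinated.
    return {k: record[k] for k in NOTE_FIELDS}
```

Keeping validation in one place means a malformed reply fails loudly instead of producing a half-saved note.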

Challenges we ran into

  • Hosting an OpenCV-based Python Flask API is quite resource-intensive, and we were unfortunately unable to find a suitable hosting platform during the hackathon, so we hosted the API on a local machine.
  • Learning new APIs and technologies, such as computer vision and NLP.
  • Engineering prompts for LLMs to produce the most desirable responses.
  • Our original plan to incorporate certain pieces of hardware into the project fell through due to unforeseen limitations of the hardware products.

Accomplishments that we're proud of

  • Stepping outside of our comfort zone!

What we learned

  • The basics of computer vision and text-recognition pipelines
  • Working with different LLMs

What's next for Visionary

  • We originally planned to use AdHawk's eye-tracking glasses for the video input, but changes to the hardware over the past year made this impossible. However, the technology still exists and remains a possibility for the future.
  • Improvements in AI features, especially Cohere's classification and summarization beta features.
  • More comprehensive editor tools, especially UI/UX improvements such as responsive design and loading indicators
