Inspiration

From developing new work habits to increasing attention spans during lectures, getting used to a new learning environment is challenging. As a team of first-year students, we decided to build a tool to help students with this transition.

What it does

Inspired by the popular workspace Notion, we decided to take it one step further: an automated note-taking system driven by live video data. Taking notes is tedious and pulls our focus toward what the speaker has already said rather than what they are currently saying. Struggling to keep up is therefore common, and it inhibits our ability to thoughtfully process the presentation. By handing this arduous task to an AI-powered note-taking system, we are free to immerse ourselves more deeply in the material.

How we built it

Our web app is built with Next.js and TypeScript and styled with TailwindCSS. Our Notion-like editor is powered by Novel, a comprehensive WYSIWYG editor with AI-powered autocomplete. Authentication is provided by Clerk.

The editor includes a webcam component that captures images at regular intervals and sends them to our Python Flask backend, where OpenCV filters and preprocesses each image so that EasyOCR can extract its text. The extracted text is often erroneous, especially with handwriting, so low-confidence results (using the confidence score EasyOCR returns) are paired with a brief user-provided topic/description and sent to GPT-4, which suggests a more likely original phrase. The corrected keywords are then passed to Cohere's command-nightly generative model along with an emphasis parameter for each keyword, calculated from the size, frequency, and on-screen duration of the word, and the model generates realistic, concise point-form notes from the extracted information.
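A rough sketch of how such a per-keyword emphasis score can be computed from OCR output. The `Detection` shape, the 0.4 confidence cutoff, and the three weights are illustrative assumptions, not our exact tuning:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One OCR result: recognized text, bounding-box height in pixels,
    and the confidence score the OCR engine reports."""
    text: str
    box_height: float
    confidence: float

def keyword_emphasis(frames, min_confidence=0.4,
                     w_size=0.4, w_freq=0.3, w_dur=0.3):
    """Score each keyword by size, frequency, and on-screen duration.

    `frames` is a list of per-frame detection lists. Size is the word's
    largest box height relative to the largest height seen; frequency is
    the fraction of frames containing the word; duration is the longest
    consecutive run of frames, as a fraction of all frames.
    """
    stats = {}  # word -> {"height", "count", "streak", "best"}
    for detections in frames:
        present = set()
        for d in detections:
            if d.confidence < min_confidence:
                continue  # low-confidence text is routed to GPT-4 instead
            word = d.text.lower()
            s = stats.setdefault(word, {"height": 0.0, "count": 0,
                                        "streak": 0, "best": 0})
            s["height"] = max(s["height"], d.box_height)
            present.add(word)
        for word, s in stats.items():
            if word in present:
                s["count"] += 1
                s["streak"] += 1
                s["best"] = max(s["best"], s["streak"])
            else:
                s["streak"] = 0  # word left the screen; reset its run
    if not stats:
        return {}
    max_h = max(s["height"] for s in stats.values())
    n = len(frames)
    return {
        word: w_size * (s["height"] / max_h)
              + w_freq * (s["count"] / n)
              + w_dur * (s["best"] / n)
        for word, s in stats.items()
    }
```

Since frames are captured at fixed intervals, counting consecutive frames is a reasonable stand-in for wall-clock duration.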

The resulting text is then mapped onto the appropriate Prisma schema using GPT-4 and persisted in CockroachDB.
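Because an LLM's JSON output can drift from the schema, it helps to validate the reply before it reaches the database. A minimal sketch, assuming a hypothetical `Note` model with `title`, `topic`, and `content` fields (not our actual schema):

```python
import json

# Hypothetical fields of the Prisma Note model; names are assumptions.
NOTE_FIELDS = {"title": str, "topic": str, "content": str}

def build_mapping_prompt(prisma_model: str, notes: str) -> str:
    """Ask the LLM to emit JSON whose keys match the Prisma model's fields."""
    return (
        "Map the notes below onto this Prisma model and reply with a "
        "single JSON object whose keys are exactly the model's fields.\n\n"
        f"{prisma_model}\n\nNotes:\n{notes}"
    )

def parse_note_record(raw: str) -> dict:
    """Validate the LLM's JSON reply before handing it to the database."""
    record = json.loads(raw)
    for field, ftype in NOTE_FIELDS.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        if not isinstance(record[field], ftype):
            raise TypeError(f"{field} should be {ftype.__name__}")
    # Drop any extra keys the model hallucinated.
    return {k: record[k] for k in NOTE_FIELDS}
```

Keeping validation in one place means a malformed reply fails loudly instead of producing a half-saved note.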

Challenges we ran into

  • Hosting an OpenCV-based Python Flask API is quite resource-intensive, and we were unfortunately unable to find a suitable hosting platform during the hackathon, so we hosted the API on a local machine.
  • Learning new APIs and technologies, such as computer vision and NLP.
  • Engineering prompts for LLMs to produce the most desirable responses.
  • Our original plan to incorporate certain pieces of hardware into the project fell through due to unforeseen limitations of the hardware products.

Accomplishments that we're proud of

  • Stepping outside of our comfort zone!

What we learned

  • The basics of computer vision and text-recognition pipelines
  • Working with different LLMs

What's next for Visionary

  • We originally planned to use AdHawk's eye-tracking glasses for the video input, but changes to the hardware over the past year made this impossible. However, the technology still exists and remains a possibility for the future.
  • Improvements in AI features, especially Cohere's classification and summarization beta features.
  • More comprehensive editor tools, especially UI/UX improvements such as responsive design and loading indicators
