Inspiration

Everybody we know is a visual learner. Everybody we know also hates poorly written lecture notes. We realized how much time is wasted trying to absorb material scattered across different sources. Tools like ChatGPT also fail to understand context, especially for STEM content full of research papers, tables, and equations. As CS students, this inspired us to use intelligent tools and AI frameworks to create Newt.

What it does

Newt is an AI tool that takes in your lecture content (that you want to learn or teach) and converts it into Khan Academy or 3Blue1Brown-style videos. At its core, it's a video-generation AI agent that uses chain-of-thought reasoning to decide how to architect a beautiful, interactive visual animation that enhances your learning.

How we built it

Newt's backend is written in Python, using Agno (an AI agent framework) and FastAPI (a web framework for building APIs). We set up an API endpoint in the backend that the frontend can call. Newt's frontend was created using Next.js, Tailwind CSS, and shadcn/ui, and was written in TypeScript.

The AI agent instance in the backend is set up with a knowledge base. It indexes the animation library's documentation, which the agent retrieves from when architecting the layout of an animation.
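
The retrieval step can be illustrated with a toy keyword-overlap ranker (a real knowledge base like Agno's uses embeddings; this stand-in and the sample doc chunks are purely illustrative):

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documentation chunks by word overlap with the query (a toy
    stand-in for the embedding-based retrieval a knowledge base performs)."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

# Hypothetical snippets of animation-library documentation.
manim_docs = [
    "Circle creates a circle Mobject; set radius and color via keyword arguments.",
    "Axes builds a coordinate system; use plot to draw graphs of functions.",
    "Text renders strings on screen; MathTex renders LaTeX equations.",
]

# The agent would prepend the top chunks to its prompt before writing code.
top = retrieve("draw a circle with a given radius", manim_docs)
```

Grounding the agent's prompt in the retrieved chunks is what lets it write code against the animation library's actual API instead of hallucinating one.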

Challenges we ran into

While the AI-generated code was generally good, we noticed these caveats:

  • some elements repeatedly go out of the video's bounds
  • components overlap, since the AI fails to understand the exact position of each element
  • scaling can be expensive: the agent's core function is GPU-taxing, which can make deployments expensive and slow
  • rendering takes time: once we retrieve the Manim code, compiling it, rendering it, and sending the video back to the client takes nearly 5 minutes
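
One way to catch the out-of-bounds problem is a post-generation sanity check on element positions. A minimal sketch (the frame dimensions below match Manim's defaults for a 16:9 scene; the helper itself is hypothetical, not part of Manim):

```python
FRAME_HEIGHT = 8.0    # Manim's default frame height in scene units
FRAME_WIDTH = 8.0 * 16 / 9   # ~14.22 scene units wide at 16:9

def in_frame(center_x: float, center_y: float, width: float, height: float) -> bool:
    """Return True if an element's bounding box stays fully inside the frame."""
    return (abs(center_x) + width / 2 <= FRAME_WIDTH / 2
            and abs(center_y) + height / 2 <= FRAME_HEIGHT / 2)
```

Running a check like this over the generated scene's objects (and asking the agent to regenerate or reposition offenders) is one cheap guardrail against elements drifting off-screen.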

Accomplishments that we're proud of

  • A really clean animation with minimal outside prompting: we asked for a simple depth-first-search concept animation and got a really impressive result without over-prompting the agent.
  • A really clean frontend: courtesy of shadcn/ui and Tailwind.

What we learned

We learnt some really cool things along the way: effective prompt engineering, building a strong knowledge base, setting up a retrieval-augmented agent, understanding how the Model Context Protocol connects business logic to client data, and how LLMs behave with large amounts of data.

What's next for Newt

We are laser-focused on improving the quality of the agent's code output; it's still far from perfect. We spent the first night just experimenting with different ways to run the agent: with and without memory, with few-shot prompting, with and without reasoning, knowledge, and search tools. This gave us huge insights into how to develop an agent that works at scale for varied types of queries. We also hope to improve our ingestion engine, shrinking the time between a user uploading their documents and getting a fully fledged, awesome animated video back.
