TL;DR: Netify turns documents into narrated, animated video lessons. Upload a PDF, get back a teachable MP4. This very video was made by Netify. Built for edtech teams, trainers, and students who have the content but can't produce the video.

Inspiration

I built Netify because I have high school final exams in a week, and I learn better from videos than from static documents.

A PDF can contain everything I need to know, but it does not always teach me. A good lesson has:

  • pacing
  • emphasis
  • examples
  • visuals
  • a voice that guides you through the material

That personal problem became a broader product idea. Edtech startups, corporate training teams, and students all have the same bottleneck: they have written content, but video is expensive and slow to produce.

Netify exists to make that conversion automatic.

Edtech is the right place to start because it gives PDF-to-video a real job. The system cannot just summarize a document. It has to understand the material, decide how to teach it, create visuals that clarify the ideas, and produce something people can actually learn from.

What it does

Netify is the AI layer that transforms documents into video lessons.

The flow is:

  1. A user uploads a PDF.
  2. Netify parses the document and extracts the core ideas.
  3. Netify plans a lesson structure.
  4. Netify generates custom animated scenes.
  5. Netify creates narration with ElevenLabs.
  6. Netify renders the final MP4.
  7. Netify streams progress while the job runs.

The result is not a slide deck with a voice track. Each scene is generated around the function it needs to serve: explaining a concept, comparing two ideas, showing a process, highlighting a statistic, or making dense material easier to follow. The graphics adapt to the content instead of forcing the content into a fixed template.

The current product has two surfaces:

  • A local web app where users can sign in, upload PDFs, store resources, start video generation, watch progress, and see generated videos.
  • A developer API foundation where a backend can submit a signed PDF URL, start a render job, and subscribe to progress events.

For the hackathon demo, the goal is simple:

Show a video explaining Netify that was itself generated by Netify.

How we built it

Netify started as a local PDF-to-video pipeline. During HackRome, I worked on turning it into a product.

The generation pipeline is multi-agent. It uses LLM calls through an OpenAI-compatible interface with DeepSeek as the current model backend.

The agents:

  • analyze the source document
  • organize the material
  • plan a video structure
  • choose the visual approach for each scene
  • generate TypeScript animation code
  • validate the result
  • repair failures
  • create narration
  • render the finished MP4

For video generation, Netify uses TypeScript and React. This is what makes the output different from static slides: the system generates executable animated scenes with layouts, transitions, icons, charts, and motion.

For narration, I integrated ElevenLabs during the hackathon. The backend uses eleven_multilingual_v2 by default and generates MP3 voiceovers for each scene, with retry logic and audio duration detection so scenes can be timed around the narration.
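
A minimal sketch of the two ideas in that paragraph: retrying a flaky narration call, and sizing a scene from the detected audio length. The helper names, the exponential backoff, the 30 fps default, and the half-second padding are all assumptions for illustration, not the actual Netify backend. The narration call itself is passed in as a function, so no ElevenLabs API details are assumed here.

```typescript
// Hypothetical helpers: names, defaults, and backoff policy are assumptions.

/** Retry an async operation, doubling the wait after each failed attempt. */
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Wait baseDelayMs, then 2x, then 4x, ... before the next attempt.
        const delayMs = baseDelayMs * 2 ** (attempt - 1);
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

/** Convert a narration clip length into a whole number of video frames. */
function sceneDurationInFrames(
  audioSeconds: number,
  fps = 30,
  paddingSeconds = 0.5,
): number {
  if (!(audioSeconds >= 0) || !(fps > 0)) {
    throw new RangeError("audioSeconds must be >= 0 and fps must be > 0");
  }
  // Work in whole milliseconds so floating-point noise cannot add a frame.
  const totalMs = Math.round((audioSeconds + paddingSeconds) * 1000);
  // Round up so the scene never ends before the narration does.
  return Math.ceil((totalMs * fps) / 1000);
}
```

With these assumed defaults, a 4.2 second voiceover would be given a 141-frame scene. Rounding up rather than down keeps the last word of the narration from being cut off.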

For the product layer, I connected Supabase:

  • Supabase Auth handles users.
  • Postgres stores profiles, uploaded resources, and video metadata.
  • Private Storage buckets hold source PDFs and generated videos.
  • Row Level Security keeps each user's files and jobs isolated.

For the API layer, I added a render service with:

  • POST /api/render to start an asynchronous PDF-to-video job.
  • GET /api/events/:jobId to stream live render progress with Server-Sent Events.
  • Bearer-token authentication for protected API access.
  • Upload of completed MP4s back into Supabase Storage.
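
To illustrate the progress stream, here is a small sketch of how a client could parse the text that arrives from GET /api/events/:jobId. It follows the general Server-Sent Events wire format (events separated by a blank line, "event:" and "data:" fields, lines starting with ":" treated as comments). The event name "progress" and the JSON payload used in the usage note are assumptions, since the writeup does not specify the event schema.

```typescript
interface SseEvent {
  event: string; // defaults to "message" when no "event:" field is present
  data: string;  // multiple "data:" lines are joined with "\n"
}

/** Parse a buffer of complete Server-Sent Events into structured events. */
function parseSseEvents(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  // Normalize CRLF, then split on the blank line that ends each event.
  for (const block of raw.replace(/\r\n/g, "\n").split("\n\n")) {
    let event = "message";
    const dataLines: string[] = [];
    for (const line of block.split("\n")) {
      // Skip empty lines and comment / keep-alive lines.
      if (line === "" || line.charAt(0) === ":") continue;
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      // A single leading space after the colon is not part of the value.
      if (value.charAt(0) === " ") value = value.slice(1);
      if (field === "event") event = value;
      else if (field === "data") dataLines.push(value);
    }
    // Blocks without any data (for example pure comments) produce no event.
    if (dataLines.length > 0) {
      events.push({ event, data: dataLines.join("\n") });
    }
  }
  return events;
}
```

For example, a stream containing `event: progress` followed by `data: {"percent":40}` would parse into one event whose data can then be passed to JSON.parse. In a browser, the built-in EventSource does this parsing already. A hand-written parser like this one is mainly useful on a server that relays the stream.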

The web app uses Next.js route handlers as a server-side bridge. The browser never sees the Netify API key or Supabase service key. It asks the Next.js app to generate a video, and the server creates a signed PDF URL, calls the render API, and relays progress back to the browser.
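
The bridge described above can be sketched as a plain function that a route handler would call. Everything here is hypothetical: the dependency names, the `pdfUrl` body field, and the `jobId` response shape are assumptions chosen for illustration. The storage and HTTP calls are injected so the sketch does not assume any particular Supabase or fetch signature.

```typescript
// All names are hypothetical; this only illustrates the server-side bridge.
interface BridgeDeps {
  renderApiBase: string; // base URL of the render service (server-only config)
  renderApiKey: string;  // secret key, never sent to the browser
  createSignedPdfUrl: (userId: string, resourceId: string) => Promise<string>;
  postJson: (
    url: string,
    headers: Record<string, string>,
    body: unknown,
  ) => Promise<{ jobId: string }>;
}

/** Runs on the server: signs the PDF URL, starts a render job, returns only the job ID. */
async function startGeneration(
  userId: string,
  resourceId: string,
  deps: BridgeDeps,
): Promise<{ jobId: string }> {
  // 1. Create a short-lived signed URL for the user's private PDF.
  const signedPdfUrl = await deps.createSignedPdfUrl(userId, resourceId);

  // 2. Call the render API with Bearer authentication.
  const { jobId } = await deps.postJson(
    `${deps.renderApiBase}/api/render`,
    {
      Authorization: `Bearer ${deps.renderApiKey}`,
      "Content-Type": "application/json",
    },
    { pdfUrl: signedPdfUrl }, // body field name is an assumption
  );

  // 3. Return only what the browser needs: no key, no signed URL.
  return { jobId };
}
```

The design point is that the return value is the only thing that crosses to the browser, so the API key and the signed URL stay inside the server process.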

Challenges I ran into

The biggest challenge was making an existing local pipeline run in a completely different environment under hackathon time pressure.

Netify had mostly lived on my cofounder's Mac. Today I had to:

  • get it running on Windows
  • wire it into a local web app
  • connect it to cloud product infrastructure

The second challenge was turning a script into something API-shaped. Local generation can assume files exist on disk. A product needs:

  • users
  • private storage
  • signed file handoff
  • job IDs
  • progress events
  • status rows
  • API keys
  • a path for returning the final video

The third challenge was reliability. Netify generates code, not just text. LLM-written TypeScript can look correct and still fail during rendering. The pipeline needs validation and repair loops so the final user experience is still: upload a document, receive a video.
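
A minimal sketch of what a validation and repair loop of this kind can look like. The function names, the cap of three repairs, and the synchronous signatures are assumptions made to keep the example short. In a real pipeline the generate and repair steps would be asynchronous LLM calls, and validation would involve compiling or test-rendering the scene.

```typescript
interface ValidationResult {
  ok: boolean;
  errors: string[];
}

// Hypothetical step functions, injected so the loop itself stays generic.
interface RepairLoopDeps {
  generate: () => string;                              // first attempt at scene code
  validate: (code: string) => ValidationResult;        // e.g. type-check or test render
  repair: (code: string, errors: string[]) => string;  // revised code given the errors
}

/** Generate scene code, then validate and repair until it passes or the budget runs out. */
function generateWithRepair(
  deps: RepairLoopDeps,
  maxRepairs = 3,
): { code: string; repairs: number } {
  let code = deps.generate();
  for (let repairs = 0; ; repairs++) {
    const result = deps.validate(code);
    if (result.ok) return { code, repairs };
    if (repairs >= maxRepairs) {
      throw new Error(
        `Scene still invalid after ${maxRepairs} repairs: ${result.errors.join("; ")}`,
      );
    }
    // Feed the concrete errors back so the next attempt can target them.
    code = deps.repair(code, result.errors);
  }
}
```

Capping the number of repairs matters: without a budget, a scene that can never be fixed would stall the whole render job instead of failing with a clear error.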

Cloud deployment also became a practical blocker. The render backend is designed to become a containerized service, but the Scaleway setup required a credit card even though hackathon credits were available. Because of that, the current demo runs locally rather than as a public live generation service.

Accomplishments that we're proud of

I am proud that Netify now feels like a real product instead of only a local experiment.

During the hackathon, I integrated ElevenLabs so generated lessons can have a high-quality teacher voice. That changes the emotional quality of the output: the video feels guided, not just animated.

I also connected Supabase for the product backbone:

  • authentication
  • profiles
  • resource metadata
  • video metadata
  • private PDF storage
  • private video storage
  • row-level security

I built the start of the Netify API, including:

  • asynchronous render jobs
  • Bearer auth
  • signed PDF handoff
  • Supabase upload of finished videos
  • live progress streaming

I am also proud that the demo is honest: the project is being presented through a video made by Netify itself. That is the clearest proof of the idea.

Finally, Netify has already been tested on real financial literacy content from Finanz, including ETF education material. Those videos look promising, and they point toward a real edtech use case.

What we learned

I learned that PDF-to-video is only interesting when the goal is teaching.

A generic converter can summarize content. A learning product has to decide:

  • what matters
  • what order to explain it in
  • what visual form helps
  • what narration makes it easier to understand

I learned that voice is not a small feature. ElevenLabs makes the output feel much closer to a real teacher. For education, the difference between flat audio and expressive narration is huge.

I learned that the hard part of AI video is reliability. Generating text is one thing; generating renderable animated code is much harder. The repair and validation loops are as important as the first generation step.

I also learned how much product infrastructure is needed around an AI pipeline: auth, storage, job ownership, progress, API keys, and a clean path from source PDF to final MP4.

What's next for Netify

The next step is durable cloud rendering. Netify should run as a hosted render service, not just locally, so judges, students, and edtech teams can generate videos directly.

After that, I want to add:

  • webhooks
  • API-key management
  • a stable public developer API

That would let edtech companies integrate Netify inside their own products.

The most important product feature after generation is scene-level editing. Users should be able to regenerate or edit individual scenes instead of rerunning the whole video.

That is why the business model will likely be token-based: customers need flexible usage across generation, repair, editing, and rerendering.

I also want to add brand kits, so a company can upload its colors, fonts, tone, and logo once, and every video Netify generates matches its identity.

With OpenAI credits, I would replace the current DeepSeek backend with stronger OpenAI models for higher-quality reasoning, lesson planning, scene generation, and repair. With ElevenLabs Scale, I would make the teaching voice more emotional, adaptive, and language-aware.
