Inspiration

I want to democratize design AI for everyday people who use PowerPoint and Google Docs to connect with their audience. As a product manager, I built assistive design products for the M365 suite (PowerPoint, Word, Excel). The products were used by teachers, students, consultants, and everyone in between. The recurring feedback was that tools like PowerPoint and Google Slides have not evolved to serve emerging forms of content creation and storytelling, and that the visual output is always boring. I wanted to fix that.

What it does

DocTok takes webpages and documents as input and returns an engaging, shareable TikTok-style feed. The feed is powered by several AI agents. Creators provide a URL and select characters. For each webpage or document, DocTok creates a social post: a fun explanation of the topic from the point of view of the chosen character, with a voice-over by a compatible voice agent and a relevant GIF from GIPHY to spruce up the content.

How we built it

I solo hacked on Replit, with help from GPT and Ghostwriter for the coding. I'm using GPT-4 to generate the summary and an appropriate title for each post, and GPT-3.5 to generate one keyword per post based on the sentiment of the summary. I pass that keyword to the GIPHY API to get a relevant GIF. The voice-over comes from ElevenLabs; a compatible voice agent is chosen based on the selected character.
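The flow described above can be sketched roughly as follows. This is a minimal illustration, not DocTok's actual code: the `Post` fields, `build_post`, the `summarize` and `pick_keyword` callables (stand-ins for the GPT-4 and GPT-3.5 calls), and the character-to-voice table are all hypothetical names. Only the GIPHY search endpoint and its `api_key`, `q`, and `limit` parameters come from GIPHY's public API.

```python
from dataclasses import dataclass
from typing import Callable, Tuple
from urllib.parse import urlencode

# Hypothetical mapping from a character to a voice ID; the real voice IDs
# used with ElevenLabs are not given in this write-up.
CHARACTER_VOICES = {"pirate": "voice-gruff", "professor": "voice-calm"}
DEFAULT_VOICE = "voice-neutral"


@dataclass
class Post:
    title: str
    summary: str
    keyword: str
    gif_search_url: str
    voice_id: str


def giphy_search_url(keyword: str, api_key: str, limit: int = 1) -> str:
    """Build the GIPHY search request URL for a single keyword."""
    query = urlencode({"api_key": api_key, "q": keyword, "limit": limit})
    return f"https://api.giphy.com/v1/gifs/search?{query}"


def build_post(
    text: str,
    character: str,
    summarize: Callable[[str, str], Tuple[str, str]],
    pick_keyword: Callable[[str], str],
    giphy_api_key: str,
) -> Post:
    """Assemble one post from a document and a chosen character.

    `summarize` stands in for the GPT-4 call (returns title and summary);
    `pick_keyword` stands in for the GPT-3.5 call (returns one keyword).
    """
    title, summary = summarize(text, character)
    keyword = pick_keyword(summary).strip().lower()
    return Post(
        title=title,
        summary=summary,
        keyword=keyword,
        gif_search_url=giphy_search_url(keyword, giphy_api_key),
        voice_id=CHARACTER_VOICES.get(character, DEFAULT_VOICE),
    )
```

Passing the model calls in as plain functions keeps the sketch runnable without API keys; in a real build they would wrap the actual model and voice clients.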

Challenges we ran into

Exceeding rate limits for GPT-4, requests taking too long, and Replit dying.
Getting the right training data to program synthetic voice agents that match the character chosen by the user. I had ideas for an agent that could transform documents into other formats, such as Pinterest boards and audiobooks. However, it took a lot of prompting to get high-quality visual output, so I had to zero in on a few problems and decided to focus on transforming documents into TikTok-style feeds.
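For the rate-limit problem, one common mitigation is to retry with exponential backoff and jitter. The write-up does not say whether DocTok did this, so the following is only a sketch of the general technique; `RateLimitError` and `call_with_backoff` are hypothetical names, and a real client would raise its own exception type on an HTTP 429 response.

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for whatever error the API client raises when rate limited."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponentially growing waits.

    The wait before retry n (counting from 0) is base_delay * 2**n plus a
    random jitter of up to base_delay, so parallel requests do not all
    retry at the same moment.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            # Out of retries: let the caller see the error.
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

The `sleep` parameter is injectable so the retry logic can be exercised without actually waiting.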

Accomplishments that we're proud of

Weaving multiple AI agents together to generate every piece of output: code, text, voice-over, and GIFs.

What we learned

In-context learning is quite powerful for coding tasks. Ghostwriter is good at targeted bug fixing. GPT-4 is excellent at proactively catching coding gotchas and recommending solutions in HTML and Python. Hallucinations are also less frequent with GPT-4.

What's next for DocTok

My goal is to empower everyday people with design agents. I'm actively working on a product that converts the M365 and Google documents that 500M+ people use into engaging visual interfaces in one click, supercharged with AI, so they can be master storytellers.
