Inspiration
In the fast-paced corporate world, knowledge decays faster than it can be documented. By the time a trainer finishes designing a slide deck on a topic like "React 19" or "Generative AI Compliance," the information is already outdated.
We realized that current "Course Creators" are just text wrappers: they generate static content that dies the moment it's created.
Inspired by the Gemini 3 "Action Era," we asked: What if a course wasn't just a document, but a living agent? We wanted to move beyond the "Blank Page Problem" to solve the "Relevance Problem." We built Ai Course Creator not just to write slides, but to act as an intelligent instructional designer that reasons, structures, and synthesizes multimodal formats instantly.
What it does
Ai Course Creator is an autonomous orchestration engine that transforms a single high-level goal (e.g., "Train my team on Cyber Security trends") into a fully deployable learning ecosystem. It turns weeks of instructional design work into a 60-second autonomous loop. Unlike simple wrappers, it uses Gemini 3 Pro’s reasoning capabilities to execute a multi-step workflow:
- Reasoning & Planning: It doesn't just guess; it generates a "Thought Signature" to outline a pedagogical structure suited for the specific audience.
- Multimodal Content Generation: It simultaneously architects:
  - Visuals: Professional PowerPoint slides with structured layouts (.pptx).
  - Audio & Video: Instructor-led voiceovers for on-the-go learning (.mp3), plus a video of the course contents (.mp4).
  - Assessment: Interactive quizzes to verify knowledge retention.
- Cross-Language Scaling: Leveraging Gemini’s multilingual capabilities, it instantly translates technical nuances into multiple languages, allowing global teams to train simultaneously.
How we built it
We built a Python-based Agentic Loop powered by the Gemini 3 API.
The Brain (Gemini 3 Pro): We utilized the model's Long Context Window to allow the agent to "hold" the entire course structure in memory while generating specific modules. This ensures that Slide 10 is contextually aware of Slide 1.
The Orchestrator: We used Google AI Studio to tune the system instructions, enforcing a strict JSON schema output that our backend could parse. If a request fails, the backend automatically switches to another API key and continues the process (fallback method). The loop does not halt on errors: when something breaks, the error is handed back to the Brain, which attempts to fix it automatically.
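The API key fallback described above can be sketched roughly as follows. This is a minimal illustration, not our exact backend code: `request` is a hypothetical callable that wraps whatever Gemini client call the backend makes, and `AllKeysExhausted` is a name introduced here for the example.

```python
from typing import Callable, Sequence


class AllKeysExhausted(RuntimeError):
    """Raised when every configured API key has failed (hypothetical name)."""


def call_with_key_fallback(
    request: Callable[[str], str],
    api_keys: Sequence[str],
) -> str:
    """Try `request` with each key in order and return the first success.

    `request` is an assumed callable that takes an API key and returns the
    model's raw text; a real backend would wrap its Gemini client call here.
    """
    errors = []
    for index, key in enumerate(api_keys):
        try:
            return request(key)
        except Exception as exc:  # a real client would catch its specific quota error
            errors.append(f"key #{index}: {exc}")
    # Every key failed, so surface all collected errors at once.
    raise AllKeysExhausted("; ".join(errors))
```

Keeping the key rotation outside the request function means the same wrapper can guard every model call in the loop.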
The "Action" Layer:
- Backend: Flask (Python) serves as the controller.
- Tools Used: The agent calls custom functions to interface with python-pptx for slide rendering and gtts for audio synthesis.
- Reasoning Trace: We implemented a system where the model outputs its "internal monologue" before generating the JSON, allowing us to debug its logic flow.
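One way the agent's custom function calls could be routed is a small tool registry, sketched below. The JSON shape (`tool` and `args` keys), the tool names, and the placeholder return values are assumptions made for this example; in the real backend the two functions would call python-pptx and gtts instead of returning summaries.

```python
import json
from typing import Any, Callable, Dict

# Registry mapping tool names (as the model emits them) to Python functions.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that registers a function under a tool name."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = fn
        return fn
    return register


@tool("render_slide")
def render_slide(title: str, bullets: list) -> dict:
    # Placeholder: the real backend would build the slide with python-pptx here.
    return {"kind": "slide", "title": title, "bullet_count": len(bullets)}


@tool("synthesize_audio")
def synthesize_audio(script: str) -> dict:
    # Placeholder: the real backend would synthesize speech with gtts here.
    return {"kind": "audio", "characters": len(script)}


def dispatch(model_output: str) -> Any:
    """Parse one JSON tool call from the model and run the matching function."""
    call = json.loads(model_output)
    name = call["tool"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("args", {}))
```

For example, `dispatch('{"tool": "synthesize_audio", "args": {"script": "Welcome"}}')` runs the audio placeholder with the given script.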
Challenges we ran into
Structured Output Consistency: Getting a Large Language Model to output complex, nested JSON (for slide layouts) without a single syntax error was difficult. We solved this by using Gemini's native JSON Mode and implementing a retry-loop that feeds error logs back to the model for self-correction.
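The self-correcting retry loop can be sketched as below. It is a simplified illustration under assumptions: `generate` is a hypothetical callable standing in for the Gemini call, and the wording of the correction prompt is invented for the example.

```python
import json
from typing import Any, Callable


def generate_valid_json(
    generate: Callable[[str], str],
    prompt: str,
    max_attempts: int = 3,
) -> Any:
    """Call the model until its reply parses as JSON, feeding errors back.

    `generate` is an assumed callable (prompt in, raw text out).
    """
    current_prompt = prompt
    last_error = None
    for _ in range(max_attempts):
        raw = generate(current_prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc
            # Append the parser error and the bad reply so the model can repair it.
            current_prompt = (
                f"{prompt}\n\nYour previous reply was not valid JSON.\n"
                f"Parser error: {exc}\nPrevious reply:\n{raw}\n"
                "Return only the corrected JSON."
            )
    raise ValueError(f"no valid JSON after {max_attempts} attempts: {last_error}")
```

Capping the attempts matters on a free tier, since every retry spends another request against the key's quota.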
Hallucination in Technical Content: We needed to ensure that the "Facts" in the slides were accurate. We mitigated this by adjusting the temperature and prompting the model to cite its reasoning within the "Thought Signature."
Multimodal Synchronization: Aligning the generated audio script perfectly with the bullet points on the slide required precise token-counting and timing logic.
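A rough sketch of the timing logic: estimate how long each narration segment takes to speak, then assign each bullet a start and end time. This version counts words rather than tokens to stay self-contained, and the 150 words-per-minute rate is an assumed value, not one we measured.

```python
from typing import List, Tuple

WORDS_PER_MINUTE = 150  # assumed narration speed, not a measured value


def bullet_timings(
    narration: List[str],
    wpm: int = WORDS_PER_MINUTE,
) -> List[Tuple[float, float]]:
    """Return (start, end) in seconds for each narration segment, one per bullet."""
    timings = []
    cursor = 0.0
    for segment in narration:
        # Duration is estimated from the word count at the assumed speaking rate.
        duration = len(segment.split()) * 60.0 / wpm
        timings.append((cursor, cursor + duration))
        cursor += duration
    return timings
```

An estimate like this can drift from the real audio, so measuring the rendered clip length per segment would be the more reliable follow-up.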
API Key Limits: We use the free tier of the Gemini API for all generation, so the process sometimes breaks when a key hits its usage limits.
Accomplishments that we're proud of
The "One-Click" Reality: We successfully achieved a workflow where a user types one sentence and receives several distinct file formats (PPTX, PDF, MP3, MP4) that are actually usable.
Reasoning over Formatting: The AI understands design. It knows when to use a bulleted list versus a title slide, purely based on the context of the content it generated.
Latency Reduction: We optimized the chain to generate a full 10-slide course in under a minute, which feels magical to the end-user.
What we learned
Gemini 3 is a Logic Engine, not just a Chatbot: We learned that the real power lies in asking the model to plan before it acts. The quality of the slides improved drastically when we forced the model to output a "Table of Contents" first.
The Power of Multimodality: We realized that text is only 10% of learning. Adding the audio layer (powered by the text generation) made the tool feel like a "Teacher" rather than a "Typewriter."
What's next for Ai Course Creator
To fully embrace the Gemini 3 Strategic Tracks, our roadmap includes:
- The "Marathon" Feature: Implementing a nightly cron job where the Agent autonomously checks the web for changes to the course topic and updates the slides if the information changes (e.g., a new regulation is passed).
- Live Coaching (Gemini Live API): Adding a mode where the user presents the slides back to the AI, and the AI uses the camera/mic to give feedback on their presentation style.
- Visual Generation: Integrating Imagen 3 to generate custom, context-aware diagrams for the slides instead of just text-heavy layouts.
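For the "Marathon" feature, one possible change check is to hash the latest source text for each topic and compare it with the digest stored on the previous run. This is only a sketch of an unbuilt roadmap item: the function names and the dictionary-based storage are assumptions, and fetching the web content is left out entirely.

```python
import hashlib
from typing import Dict, List


def content_digest(text: str) -> str:
    """Hash the text after collapsing whitespace and lowercasing it."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def topics_needing_refresh(
    latest: Dict[str, str],
    stored: Dict[str, str],
) -> List[str]:
    """Return topics whose latest text differs from the stored digest.

    `latest` maps topic -> freshly fetched text; `stored` maps topic -> the
    digest saved on the previous run. Both shapes are assumptions.
    """
    stale = []
    for topic, text in latest.items():
        if stored.get(topic) != content_digest(text):
            stale.append(topic)
    return sorted(stale)
```

A nightly job could call `topics_needing_refresh` and regenerate slides only for the topics it returns, which keeps API usage low.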