Inspiration

During my years of medical training, I spent over 100 hours a week at the hospital, dealing with stress and difficult shifts while still needing to concentrate on studying for my exams. At 3 a.m., after a 24-hour shift, you have little capacity left to memorize anything, and the words blur as you start to dream with your eyes open. For me, it was easy to grab a dopamine hit by watching a fun video or scrolling through Instagram.

Many apps offer a similar service, but none is personalized to your culture, your background, or whatever crazy ideas you might have, and most rely on generic content templates that may not cover what you actually need to learn. I wanted to build something that turns that "dopamine hit" into productive learning—using the same engaging, colorful, and weird storytelling that keeps us scrolling, but harnessing it to master complex medical concepts.

But my deeper motivation goes beyond exams. Medical professionals deal with life, death, stress, lawsuits, and painful decisions every day. My dream is that a doctor, facing a critical decision with only seconds to act, will vividly "see" a MedMnemonic image, perhaps one that reminds them of their childhood, and that instant recall will guide them to the right answer. We want to make learning not just memorable, but joyful, so that knowledge is there when it matters most.

What it does

MedMnemonic AI is a personalized learning companion that transforms dry, complex medical facts into vivid, unforgettable visual stories.

Custom Mnemonics: You feed it a topic (like "Cushing's Syndrome"), and it generates a quirky, character-driven story with sound-alike names (e.g., "Cush the Cushion") that map to clinical symptoms.
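Internally, a mnemonic of this kind might be represented as structured data along these lines (the field names and story are illustrative, not the app's actual schema):

```json
{
  "topic": "Cushing's Syndrome",
  "story": "Cush the Cushion lounges under a glowing full moon, snacking on sugar cubes while a buffalo naps on his shoulders.",
  "associations": [
    { "character": "Cush the Cushion",  "fact": "Cushing's Syndrome (cortisol excess)" },
    { "character": "Glowing full moon", "fact": "Moon face" },
    { "character": "Napping buffalo",   "fact": "Buffalo hump" },
    { "character": "Sugar cubes",       "fact": "Hyperglycemia" }
  ]
}
```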

PDF to Mnemonic Series: The real magic happens when you upload your own PDF (textbook chapter, guidelines, lecture notes). The app analyzes your document, breaks it down into a logical, step-by-step series of mnemonics, and generates a complete study path—ensuring you cover every detail of your specific material.

Visual Grounding: It doesn't just generate an image; it assumes you need to know exactly which part of the image matters. Using AI vision analysis, it draws bounding boxes around characters so you can click a medical fact (e.g., "Moon Face") and see the corresponding character light up while the rest of the image fades.
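One way to implement the fade/highlight effect is with Pillow, assuming the vision model returns a normalized `[ymin, xmin, ymax, xmax]` box on a 0–1000 grid (the convention Gemini-style models commonly use); the function name and the 0.35 dim factor are ours:

```python
from PIL import Image, ImageEnhance

def highlight_box(img: Image.Image, box, scale=1000):
    """Dim the whole illustration, then paste the character's region
    back at full brightness so it appears to 'light up'.
    `box` is (ymin, xmin, ymax, xmax) on a 0..scale normalized grid."""
    w, h = img.size
    ymin, xmin, ymax, xmax = box
    px = (int(xmin / scale * w), int(ymin / scale * h),
          int(xmax / scale * w), int(ymax / scale * h))
    dimmed = ImageEnhance.Brightness(img).enhance(0.35)  # fade everything
    dimmed.paste(img.crop(px), px[:2])                   # restore the character
    return dimmed
```

Clicking a fact like "Moon Face" would look up that character's stored box and re-render the image through this function.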

Recursive Learning ("Dive Deeper"): If a term in the story is unfamiliar, you can click "Dive Deeper" to instantly spin up a new, nested mnemonic for that specific concept, creating an infinite web of knowledge.

Global Challenge: A gamified mode that tests your retention across your entire library of generated mnemonics using spaced repetition principles.

How we built it

We built MedMnemonic AI as a full-stack Python application, leveraging Google's Gemini models for their multimodal capabilities.

The Brain (Gemini 3.0 Flash & Pro Image): We run a four-stage pipeline on Gemini 3.0.

  • Stage 1: Gemini 3.0 Flash generates the mnemonic story and extracts key medical associations in structured JSON.
  • Stage 2: We enhance the visual prompt to ensure the image generator captures specific details.
  • Stage 3: Gemini 3.0 Pro Image generates a high-fidelity custom illustration from the enhanced prompt.
  • Stage 4 (The Magic): We send the generated image back to Gemini 3.0 Flash (using its multimodal vision capabilities) to identify the X/Y coordinates of the characters we just created. This allows for the interactive "click-to-highlight" feature.
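The four stages chain together roughly like this. The `call_flash` and `call_pro_image` helpers stand in for the actual Gemini SDK calls (injected here so the flow is testable without the API), and the prompts and field names are illustrative:

```python
import json
from dataclasses import dataclass

@dataclass
class Mnemonic:
    story: str
    associations: list  # [{"character": ..., "fact": ...}]
    image: bytes
    boxes: dict         # character name -> (ymin, xmin, ymax, xmax)

def build_mnemonic(topic, call_flash, call_pro_image):
    """Four-stage pipeline; the model calls are passed in as functions."""
    # Stage 1: story + key associations as structured JSON (Flash)
    data = json.loads(call_flash(f"Write a mnemonic story for {topic} as JSON"))
    # Stage 2: rewrite the story as an image-generator-friendly prompt (Flash)
    visual_prompt = call_flash(f"Describe this scene for an illustrator: {data['story']}")
    # Stage 3: high-fidelity illustration from the enhanced prompt (Pro Image)
    image = call_pro_image(visual_prompt)
    # Stage 4: send the image back to Flash to locate each character
    boxes = json.loads(call_flash(f"Locate characters in image: {len(image)} bytes"))
    return Mnemonic(data["story"], data["associations"], image, boxes)
```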

The Frontend (Streamlit): We used Streamlit to rapidly build a reactive, premium-feeling UI. We pushed Streamlit to its limits with custom CSS, session state management for the recursive "Dive Deeper" feature, and dynamic history loading.
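The recursive "Dive Deeper" state boils down to a stack of mnemonic contexts. In the real app this lives in `st.session_state`, which survives the script rerun that every click triggers; it is modelled here as a plain dict so the pattern stands alone, with function names of our own choosing:

```python
# st.session_state stand-in: a dict holding a stack of open mnemonic topics.
def open_mnemonic(state, topic):
    state["stack"].append(topic)      # push: dive into a nested concept

def go_back(state):
    if len(state["stack"]) > 1:
        state["stack"].pop()          # pop: restore the parent mnemonic

def current_topic(state):
    return state["stack"][-1] if state["stack"] else None
```

Because only the top of the stack is rendered, the "parent" context is preserved untouched while the user explores the "child" concept.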

The Cloud (Google Cloud Storage): To make the app persistent and deployable, we integrated Google Cloud Storage (GCS). This acts as our backend database, storing the JSON data and images so users can access their generated library from any device.
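Persistence can be as simple as deterministic object paths plus JSON blobs. The path layout below is our guess at how a per-user library might be keyed, while `upload_from_string` is the standard google-cloud-storage client call for small payloads:

```python
import json

def blob_key(user_id: str, topic: str) -> str:
    """Deterministic object path so a user's library can be listed
    from any device (layout is illustrative, not the app's actual one)."""
    slug = topic.lower().replace(" ", "-").replace("'", "")
    return f"users/{user_id}/mnemonics/{slug}.json"

def save_mnemonic(bucket, user_id, topic, record):
    """`bucket` is a google.cloud.storage Bucket; the record is stored
    as JSON so it can be reloaded alongside its generated images."""
    bucket.blob(blob_key(user_id, topic)).upload_from_string(
        json.dumps(record), content_type="application/json")
```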

Batch Processing Power: For users who need to study entire subjects (e.g., "Pharmacology of Antibiotics"), we implemented a Hybrid Batch Architecture.

  • Topic Decomposition: The app first deep-researches a broad topic and breaks it down into granular subtopics.
  • Parallel Scaling: We use Gemini 3.0 Flash to instantly generate the text content (stories, quizzes) for the entire curriculum.
  • Batch Asset Generation: We offload the heavy lifting of image generation to the Gemini Batch API. This allows us to generate dozens of high-fidelity illustrations in parallel in the background, bypassing standard rate limits and drastically reducing the time it takes to create a full course's worth of visual study aids. The user gets the same high-quality results, just up to 100x faster.
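The hybrid flow for the text side can be sketched as a decompose-then-fan-out step; `decompose` and `generate_story` are injected stand-ins for the Gemini calls, and image jobs would be submitted to the Batch API separately:

```python
from concurrent.futures import ThreadPoolExecutor

def generate_curriculum(broad_topic, decompose, generate_story, max_workers=8):
    """Split the broad topic into subtopics, then fan the fast text
    generation (stories, quizzes) out across a thread pool."""
    subtopics = decompose(broad_topic)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        stories = list(pool.map(generate_story, subtopics))
    return dict(zip(subtopics, stories))
```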

Challenges we ran into

  • Hallucination vs. Structure: Getting an LLM to be creative with stories but rigid with JSON structure was difficult. We had to implement strict Pydantic validation and retry logic to ensure every generated mnemonic could be parsed programmatically.
  • Visual Consistency: Generative AI is often a "black box." Sometimes the story would describe a "green robot," but the image would show a blue one. We solved this by adding a specific "Visual Enhancement" step in our pipeline where Gemini rewrites the prompt specifically for the image generator to ensure alignment.
  • Statelessness: Implementing the "Dive Deeper" feature (where you generate a mnemonic inside another mnemonic) was tricky in a Streamlit environment, which reruns the script on every interaction. We had to carefully manage st.session_state to preserve the "parent" context while exploring the "child" concept.
  • Deployment Secrets: Moving from a local environment to the cloud introduced complexity in managing API keys and GCS credentials securely, requiring us to implement a dual-loading strategy (Streamlit secrets vs. local TOML).
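The strict-schema-plus-retry pattern looks roughly like this: validate the model's JSON against a Pydantic schema and, on failure, feed the error back so the model can self-correct. The schema fields and retry wording are illustrative:

```python
import json
from pydantic import BaseModel, ValidationError

class Association(BaseModel):
    character: str
    fact: str

class MnemonicPayload(BaseModel):
    story: str
    associations: list[Association]

def generate_validated(call_model, prompt, retries=3):
    """Ask for JSON, validate it against the schema, and retry with the
    error message appended so the model can fix its own output."""
    for _ in range(retries):
        raw = call_model(prompt)
        try:
            return MnemonicPayload(**json.loads(raw))
        except (ValidationError, json.JSONDecodeError) as err:
            prompt = f"{prompt}\nYour last reply was invalid: {err}. Return only JSON."
    raise RuntimeError("model never produced parseable output")
```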

Accomplishments that we're proud of

  • The "Click-to-Highlight" Feature: We are incredibly proud of the bounding box implementation. Seeing the AI correctly identify "Sori the Robot" in a generated image and highlighting it when the user clicks "Psoriasis" feels like magic and deeply reinforces the learning connection.
  • Infinite Learning: The "Dive Deeper" button turned a static flashcard app into an endless Wikipedia-like rabbit hole of mnemonics.
  • Premium UX: We moved beyond the standard "data science" look of Streamlit, adding custom gradients, glassmorphism cards, and smooth transitions to make the app feel like a consumer product.

What we learned

  • Multimodal is the Future of EdTech: Text alone is boring. Images alone are pretty but shallow. Combining them—where the image understands the text—is the holy grail of engagement.
  • Prompt Engineering is Engineering: Writing the system prompts for the medical persona required as much iteration as writing the Python code itself.

What's next for MedMnemonic AI

  • Spaced Repetition System (SRS): We want to integrate an algorithm (like Anki) that schedules reviews based on how well you answered the quizzes, ensuring efficient long-term retention.
  • Community Sharing: Allowing users to publish their best mnemonics to a public "Global Brain" so other medical students can benefit from their creativity.
  • Voice Mode: Adding text-to-speech for the stories so students can learn while commuting or at the gym.
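An Anki-style scheduler could follow an SM-2-like update rule; the sketch below is a simplified version of that family of algorithms, not the exact one we plan to ship:

```python
def next_interval(prev_interval_days, ease, quality):
    """Simplified SM-2-style update. `quality` is a 0-5 quiz score.
    Failing resets the card to tomorrow; passing grows the interval by
    the ease factor, which itself drifts with performance."""
    if quality < 3:
        return 1, max(1.3, ease - 0.2)  # lapse: review again tomorrow
    new_ease = max(1.3, ease + 0.1 - (5 - quality) * 0.08)
    return round(prev_interval_days * new_ease, 1), new_ease
```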

Built With

  • gemini
  • gemini-batch-api
  • google-cloud
  • google-cloud-storage
  • google-gemini-3.0-flash
  • google-gemini-3.0-pro-image
  • pillow
  • pydantic
  • python
  • streamlit