Inspiration

As students and everyday consumers, we face two recurring challenges:

  • Understanding complex educational diagrams that are often presented as static images with little to no explanation.
  • Navigating legal documents filled with dense language and hidden risks that most people skip or misunderstand.

Textbooks rarely explain the processes behind diagrams, and contracts or terms of service are written in legal jargon that obscures important details.

We wanted to use Gemini 3’s capabilities (multimodal vision, Thinking Mode, and long-context processing) to solve these real-world problems. The result is a single application with two tools that make complex information easier to understand.


What We Built

We developed a Streamlit-based application that includes two distinct, AI-powered projects:

1. Diagram Decoder

An educational tool that transforms static textbook diagrams into interactive, step-by-step explanations. Users upload a diagram image, and the system:

  • Identifies components, labels, and relationships using Gemini 3’s multimodal vision
  • Explains causal relationships and underlying processes using Thinking Mode for deep reasoning
  • Generates quiz questions to reinforce and test understanding

2. Fine Print Translator

A social good tool designed to uncover hidden risks in legal documents. Users can upload contracts, terms of service, or paste text, and the system:

  • Extracts text from images, PDFs, or direct input
  • Audits documents for predatory or risky clauses using Thinking Mode
  • Provides clear Red / Yellow / Green risk assessments with actionable recommendations

How We Built It

We used LangGraph to orchestrate multi-node agent workflows, allowing each tool to process information through well-defined stages.

Diagram Decoder Workflow

  1. Vision Identification (Gemini 3 Flash): Fast recognition of components and labels
  2. Logic Explanation (Gemini 3 Pro with Thinking): Deep causal and process-level reasoning
  3. Quiz Generation (Gemini 3 Flash): Creation of educational assessment questions
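
The three stages above can be pictured as functions that each read a shared state and return only the keys they update. The sketch below is a plain-Python stand-in for that pattern, not our LangGraph code: the state fields, the node names, and the canned return values are all hypothetical, and the model calls are replaced by placeholders so the example runs offline.

```python
from typing import Callable, TypedDict


class DiagramState(TypedDict, total=False):
    """Hypothetical shared state passed between stages."""
    image_path: str
    components: list[str]
    explanation: str
    quiz: list[str]


# A node reads the current state and returns only the keys it updates.
Node = Callable[[DiagramState], DiagramState]


def identify_components(state: DiagramState) -> DiagramState:
    # Placeholder for the vision stage; a real node would send the image to the model.
    return {"components": ["sun", "leaf", "glucose"]}


def explain_logic(state: DiagramState) -> DiagramState:
    # Placeholder for the reasoning stage; it depends on the previous stage's output.
    parts = ", ".join(state["components"])
    return {"explanation": f"Process linking: {parts}"}


def generate_quiz(state: DiagramState) -> DiagramState:
    # Placeholder for the quiz stage: one question per identified component.
    return {"quiz": [f"What role does the {c} play?" for c in state["components"]]}


def run_pipeline(nodes: list[Node], state: DiagramState) -> DiagramState:
    """Run the nodes in order, merging each partial update into the state."""
    for node in nodes:
        state = {**state, **node(state)}
    return state


result = run_pipeline(
    [identify_components, explain_logic, generate_quiz],
    {"image_path": "photosynthesis.png"},
)
```

Returning partial updates keeps each stage independent: a stage can be swapped for a different model without touching the others, as long as it writes the same keys.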

Fine Print Translator Workflow

  1. Text Extraction (Gemini 3 Flash / PyPDF): Document and image text processing
  2. Risk Audit (Gemini 3 Pro with Thinking): Detection of subtle and predatory clauses
  3. Risk Summary (Gemini 3 Flash): Clear, user-friendly risk classification
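
To illustrate the final stage, here is a minimal sketch of how audit findings could be rolled up into a Red / Yellow / Green summary. It rests on assumptions: the `Finding` structure, the sample clauses, and the rule that the overall rating equals the most severe finding are all hypothetical, chosen only to show the idea.

```python
from dataclasses import dataclass

# Higher number means more severe.
SEVERITY_ORDER = {"green": 0, "yellow": 1, "red": 2}


@dataclass
class Finding:
    """One flagged clause (hypothetical structure for illustration)."""
    clause: str
    severity: str  # "green", "yellow", or "red"
    advice: str


def overall_rating(findings: list[Finding]) -> str:
    """Assumed rule: the document is as risky as its worst clause."""
    if not findings:
        return "green"
    return max((f.severity for f in findings), key=SEVERITY_ORDER.__getitem__)


def summarize(findings: list[Finding]) -> str:
    """Build a plain-text summary, most severe findings first."""
    lines = [f"Overall risk: {overall_rating(findings).upper()}"]
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity], reverse=True)
    for f in ordered:
        lines.append(f"[{f.severity.upper()}] {f.clause}: {f.advice}")
    return "\n".join(lines)


sample = [
    Finding("Auto-renewal", "yellow", "Set a reminder to cancel before renewal."),
    Finding("Binding arbitration", "red", "Ask whether you can opt out."),
]
report = summarize(sample)
```

Listing the most severe findings first means a reader who stops after the first line or two still sees the most important warning.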

The application features a clean and intuitive Streamlit interface, where users provide their own Gemini API keys to ensure privacy and security.


Challenges We Faced

  1. Model Availability: We initially encountered 404 errors due to incorrect model names. This was resolved by using the correct Gemini 3 Preview models: gemini-3-flash-preview and gemini-3-pro-preview.

  2. Response Parsing: Gemini 3 responses are returned in a deeply nested format. We implemented a robust extraction function to parse and display clean, readable outputs.

  3. Quota Management: The Gemini 3 Preview models have strict free-tier limits. We added graceful error handling and user-friendly messages to manage quota-related issues.

  4. Deployment: Vercel does not natively support Streamlit. We deployed the application on Streamlit Cloud, which is optimized for Streamlit-based projects.

  5. User Experience: Balancing power and simplicity was critical. We minimized UI complexity, implemented a straightforward API key input, and focused on core functionality.
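
The response parsing and quota handling described above can be sketched roughly as follows. This is an illustration under assumptions, not our exact code: it assumes a response arrives as a string, as a list of blocks where text blocks are dicts with a "text" key, or as an object with a `content` attribute. The helper names `extract_text` and `friendly_error`, and the approach of matching on substrings of the error message, are hypothetical.

```python
from typing import Any


def extract_text(content: Any) -> str:
    """Flatten a possibly nested response into plain text."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    if isinstance(content, dict):
        # Text blocks carry a "text" key; wrappers may nest under "content".
        if "text" in content:
            return extract_text(content["text"])
        if "content" in content:
            return extract_text(content["content"])
        return ""  # Non-text blocks (for example, thinking traces) are skipped.
    if isinstance(content, (list, tuple)):
        parts = [extract_text(item) for item in content]
        return "\n".join(p for p in parts if p)
    # Message-like objects: look for a .content attribute.
    inner = getattr(content, "content", None)
    if inner is not None:
        return extract_text(inner)
    return str(content)


# Substrings that suggest a quota problem (assumed; compared in lowercase).
QUOTA_MARKERS = ("429", "resource_exhausted", "quota")


def friendly_error(exc: Exception) -> str:
    """Turn a raw exception into a message suitable for the UI."""
    message = str(exc).lower()
    if any(marker in message for marker in QUOTA_MARKERS):
        return ("The Gemini free-tier quota appears to be used up. "
                "Please wait a minute and try again, or use a different API key.")
    if "404" in message or "not found" in message:
        return "The requested model was not found. Please check the model name."
    return f"Something went wrong: {exc}"


nested = [
    {"type": "text", "text": "Hello"},
    {"type": "thinking", "thinking": "internal notes"},
    "world",
]
clean = extract_text(nested)
```

Matching on message text is fragile; checking the exception types or status codes exposed by the client library would be more reliable where they are available.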


What We Learned

  • LangGraph Orchestration: Designing multi-agent workflows with effective state management
  • Gemini 3 Capabilities: Applying multimodal vision, Thinking Mode, and long-context reasoning in practical applications
  • Production Deployment: Understanding platform constraints and selecting the right deployment solution
  • User-Centric Design: Prioritizing clarity, usability, and real-world value over visual complexity

Impact

  • Education: Helps students understand complex diagrams in biology, physics, engineering, and chemistry
  • Consumer Protection: Empowers users to identify risks in legal documents before agreeing to them
  • Accessibility: Makes complex information understandable without requiring specialized expertise

Together, these tools address widespread problems with practical, AI-powered solutions and show how Gemini 3’s reasoning and multimodal capabilities can be applied to everyday tasks.

Built With

  • langchain
  • langgraph
  • python
  • streamlit