StudyMind

The AI Core

Powered by: The production-ready google-genai SDK, calling Gemini 2.5 Flash through Google AI Studio.
Structured Data: We used Structured Outputs (JSON Schema) so that the model returns strictly typed data that our Next.js frontend can map directly to UI state changes.

Challenges We Ran Into

1. Enforcing Strict JSON Outputs

During initial prototyping, the chatbot would occasionally add conversational pleasantries (like "Sure, I can help you with that event!") around the data properties. This unstructured text broke JSON parsing on the frontend. We solved this by explicitly setting Gemini’s response_mime_type to "application/json" and passing strict Pydantic v2 schemas directly in the SDK calls, so the model returns only the object contract our frontend expects.
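A minimal sketch of this setup, assuming a hypothetical CalendarEvent contract (the field names are illustrative and not taken from the project). The parse_reply helper is also an assumption, added to show how a chatty reply fails validation while a clean JSON reply passes.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class CalendarEvent(BaseModel):
    """Hypothetical contract shared with the frontend (field names are assumptions)."""

    title: str
    start_time: str  # ISO 8601 string, e.g. "2026-05-20T14:00:00"
    duration_minutes: int


def parse_reply(raw: str) -> Optional[CalendarEvent]:
    """Return a validated event, or None if the reply is not schema-conforming JSON."""
    try:
        return CalendarEvent.model_validate_json(raw)
    except ValidationError:
        return None


def extract_event(user_message: str) -> CalendarEvent:
    """Ask Gemini for a CalendarEvent and validate the reply before returning it."""
    # Imported lazily so the schema above can be used without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads the API key from the environment
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=user_message,
        config=types.GenerateContentConfig(
            response_mime_type="application/json",  # JSON only, no surrounding prose
            response_schema=CalendarEvent,  # Pydantic model used as the schema
        ),
    )
    # response.text is a JSON string; validate it against the same model.
    return CalendarEvent.model_validate_json(response.text)
```

Validating the reply against the same model on the backend gives a second line of defense: if the contract ever drifts, the error surfaces in FastAPI rather than in the browser.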

2. Multi-Modal Math Transcription Accuracy

Transcribing handwritten mathematical formulas proved challenging because handwriting styles vary so widely. For instance, the transcription of a messy handwritten definite integral such as

$$\int_{0}^{\infty} e^{-x^2} dx = \frac{\sqrt{\pi}}{2}$$

would occasionally drop symbols or mistake the integration bounds for ordinary letters. We mitigated this by tuning the model's system_instruction so that it acts as an expert academic OCR engine, and by explicitly requiring that all parsed mathematical notation be wrapped in valid LaTeX delimiters before it is saved to the database.
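The approach can be sketched as follows. The instruction wording, the prompt text, and the two helper functions are assumptions made for illustration; the write-up does not show the project's actual system instruction or any post-processing checks.

```python
import re

# Hypothetical wording; the project's real system instruction is not shown.
OCR_SYSTEM_INSTRUCTION = (
    "You are an expert academic OCR engine. Transcribe the handwritten notes exactly. "
    "Wrap every mathematical expression in LaTeX display delimiters ($$ ... $$)."
)

_DISPLAY_MATH = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)


def extract_display_math(transcript: str) -> list[str]:
    """Return the LaTeX bodies found between $$ ... $$ pairs."""
    return [body.strip() for body in _DISPLAY_MATH.findall(transcript)]


def has_balanced_delimiters(transcript: str) -> bool:
    """True when the transcript contains an even number of $$ markers."""
    return transcript.count("$$") % 2 == 0


def transcribe(image_bytes: bytes) -> str:
    """Send a PNG of handwritten notes to Gemini and check the delimiters."""
    # Imported lazily so the helpers above work without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads the API key from the environment
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=[
            types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
            "Transcribe this page.",
        ],
        config=types.GenerateContentConfig(system_instruction=OCR_SYSTEM_INSTRUCTION),
    )
    text = response.text or ""
    if not has_balanced_delimiters(text):
        raise ValueError("unbalanced $$ delimiters in transcript")
    return text
```

A cheap delimiter check like this cannot prove the LaTeX is correct, but it can catch a truncated or unwrapped formula before it reaches the database.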

3. Image Processing Latency

Sending high-resolution canvas drawings as raw image uploads created noticeable network lag. We reduced it by compressing the HTML5 canvas data into lighter base64-encoded PNG payloads on the client before sending them to the Gemini vision pipeline.
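On the receiving side, a backend would need to turn that base64 string back into image bytes. The sketch below assumes the client sends the output of canvas.toDataURL("image/png"), which is a data URL with the prefix shown. The function names are hypothetical, and the project's actual transport format is not described here.

```python
import base64
import binascii

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
DATA_URL_PREFIX = "data:image/png;base64,"  # prefix produced by canvas.toDataURL("image/png")


def decode_canvas_data_url(data_url: str) -> bytes:
    """Decode a base64 PNG data URL into raw PNG bytes, rejecting anything else."""
    if not data_url.startswith(DATA_URL_PREFIX):
        raise ValueError("expected a base64 PNG data URL")
    try:
        raw = base64.b64decode(data_url[len(DATA_URL_PREFIX):], validate=True)
    except binascii.Error as exc:
        raise ValueError("payload is not valid base64") from exc
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded payload is not a PNG")
    return raw


def base64_length(n_bytes: int) -> int:
    """Length in characters of the padded base64 encoding of n_bytes bytes."""
    return 4 * ((n_bytes + 2) // 3)
```

The length helper makes one trade-off explicit: base64 text is about one third larger than the bytes it encodes (a 300,000-byte PNG becomes 400,000 characters), so any latency gain has to come from shrinking the image itself before it is encoded.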

Built With

beanie
cloudrun
fastapi
firebase
gemini-2.5
mongodb
motor
next.js

Updates

Umme Munia started this project — May 19, 2026 05:51 AM EDT
