The AI Core
- Powered by: the production-ready google-genai SDK, connected directly to Google AI Studio (Gemini 2.5 Flash).
- Structured Data: We used Structured Outputs (JSON Schema) to constrain the model to strict data formats that our Next.js frontend could map immediately to functional UI state changes.
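As a rough illustration of "mapping structured output to UI state", here is a minimal sketch. It assumes a hypothetical contract in which each model response carries an `action` and a `payload`. The action names (`add_event`, `add_note`), the `UIState` fields, and `apply_command` are invented for this example. The real frontend is Next.js; Python is used here only to keep one language across the examples. The pattern is a pure reducer: each validated response produces a new immutable state, and an unknown action sets an error instead of mutating data.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class UIState:
    # Hypothetical UI slices; the project's real state shape is not documented.
    events: Tuple[str, ...] = ()
    notes: Tuple[str, ...] = ()
    error: Optional[str] = None


def apply_command(state: UIState, command: Dict[str, Any]) -> UIState:
    """Reduce one structured model response into a new, immutable UI state."""
    action = command.get("action")
    payload = command.get("payload", {})
    if action == "add_event":
        return replace(state, events=state.events + (payload["title"],), error=None)
    if action == "add_note":
        return replace(state, notes=state.notes + (payload["latex"],), error=None)
    # Unknown actions leave existing data untouched and only set an error message.
    return replace(state, error=f"unsupported action: {action!r}")


state = UIState()
state = apply_command(state, {"action": "add_event", "payload": {"title": "Study group"}})
state = apply_command(state, {"action": "add_note", "payload": {"latex": r"$\int_0^1 x\,dx$"}})
print(state)
```

Because the schema fixes the shape of every response, the reducer can stay a flat switch on `action` with no text parsing.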
Challenges We Ran Into
1. Enforcing Strict JSON Outputs
During initial prototyping, the chatbot would occasionally add conversational pleasantries (like "Sure, I can help you with that event!") around the JSON data. This unstructured text broke our frontend JSON parsers. We solved this by setting Gemini’s response_mime_type to "application/json" and passing strict Pydantic v2 schemas directly to the SDK calls, so the model returns only the exact object contract our frontend expects.
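In the Python SDK, both settings go on `types.GenerateContentConfig` (`response_mime_type="application/json"` plus a `response_schema`), and demonstrating that call needs an API key and network access. The sketch below covers the receiving side instead: a stdlib-only strict parser, standing in for Pydantic validation with extra fields forbidden, that rejects exactly the failure described above. The `EventCard` fields, `ContractError`, and `parse_strict` are hypothetical.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EventCard:
    # Hypothetical contract; the project's real fields are not documented.
    title: str
    date: str
    location: str


class ContractError(ValueError):
    """Raised when a model response does not match the expected object contract."""


def parse_strict(raw: str) -> EventCard:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Leading prose such as "Sure, I can help..." fails here.
        raise ContractError(f"response is not pure JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ContractError("expected a JSON object at the top level")
    expected = {f.name for f in fields(EventCard)}
    if set(data) != expected:
        # Reject missing and extra keys alike, similar to Pydantic's extra="forbid".
        raise ContractError(f"keys {sorted(data)} do not match {sorted(expected)}")
    for name in sorted(expected):
        if not isinstance(data[name], str):
            raise ContractError(f"field {name!r} must be a string")
    return EventCard(**data)


card = parse_strict('{"title": "Study group", "date": "2025-03-01", "location": "Library"}')
print(card)
```

Even with JSON mode enabled, a check like this keeps a malformed response from reaching UI state as a half-parsed object.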
2. Multi-Modal Math Transcription Accuracy
Transcribing handwritten mathematical formulas proved challenging because handwriting styles vary widely. For instance, when transcribing a messy handwritten definite integral such as
$$\int_{0}^{\infty} e^{-x^2} dx = \frac{\sqrt{\pi}}{2}$$
the model would occasionally drop symbols or mistake the integration bounds for ordinary letters. We mitigated this by carefully refining the model’s system_instruction so that it acts as an expert academic OCR engine and is explicitly required to wrap any parsed mathematical notation in valid LaTeX delimiters before the result is saved to the database.
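The system_instruction asks the model to wrap notation; a cheap check before the database write can catch responses that still slip through. The following is a minimal heuristic sketch, not the project’s code: `latex_problem` is a hypothetical helper that reports unclosed `$`/`$$` delimiters, empty or brace-unbalanced math spans, and any `\command` left outside delimiters (treated here as unwrapped notation). It does not parse LaTeX fully.

```python
from typing import Optional


def _find_closing(text: str, start: int, delim: str) -> int:
    """Index of the next unescaped delimiter at or after start, or -1 if none."""
    j = start
    while j < len(text):
        if text[j] == "\\":
            j += 2  # skip the escaped character, e.g. \$ or \{
            continue
        if text.startswith(delim, j):
            return j
        j += 1
    return -1


def _braces_balanced(tex: str) -> bool:
    depth, j = 0, 0
    while j < len(tex):
        if tex[j] == "\\":
            j += 2  # \{ and \} are literal braces, not groups
            continue
        if tex[j] == "{":
            depth += 1
        elif tex[j] == "}":
            depth -= 1
            if depth < 0:
                return False
        j += 1
    return depth == 0


def latex_problem(text: str) -> Optional[str]:
    """Return the first problem found, or None if every math span looks well formed."""
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            if text[i + 1:i + 2].isalpha():
                # Heuristic: a command like \frac outside $...$ means notation was not wrapped.
                return f"LaTeX command outside math delimiters at index {i}"
            i += 2  # escaped character in prose, e.g. \$
            continue
        if ch == "$":
            delim = "$$" if text.startswith("$$", i) else "$"
            start = i + len(delim)
            end = _find_closing(text, start, delim)
            if end == -1:
                return f"unclosed {delim} at index {i}"
            body = text[start:end]
            if not body.strip():
                return f"empty math span at index {i}"
            if not _braces_balanced(body):
                return f"unbalanced braces in math span at index {i}"
            i = end + len(delim)
            continue
        i += 1
    return None


transcribed = r"Gaussian integral: $$\int_{0}^{\infty} e^{-x^2} dx = \frac{\sqrt{\pi}}{2}$$"
print(latex_problem(transcribed))  # prints: None
```

A response that fails the check could be re-requested rather than stored.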
3. Image Processing Latency
Uploading high-resolution canvas drawings as raw images created noticeable network lag. We streamlined this step by encoding the HTML5 canvas as a compressed PNG directly on the client and sending it as a base64 string into the Gemini vision pipeline. The size reduction comes from PNG’s lossless compression; base64 encoding itself adds roughly a third to the byte count.
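In the browser this step is typically `canvas.toDataURL("image/png")`, which returns the PNG as a base64 data URL. To make the size trade-off concrete without a browser, the sketch below mimics that pipeline in Python with only the standard library: it deflate-compresses RGBA pixels into a minimal PNG and then base64-encodes the result. The canvas size, the single drawn stroke, and the helper names (`encode_png_rgba`, `fake_canvas`) are assumptions for illustration.

```python
import base64
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(tag: bytes, data: bytes) -> bytes:
    # A PNG chunk is: 4-byte length, 4-byte type, data, then CRC-32 over type + data.
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def encode_png_rgba(width: int, height: int, rgba: bytes) -> bytes:
    """Encode 8-bit RGBA pixels as a minimal, losslessly compressed PNG."""
    stride = width * 4
    if len(rgba) != stride * height:
        raise ValueError("expected width * height * 4 bytes of RGBA data")
    # Each scanline is prefixed with filter type 0 (None) before deflate compression.
    raw = b"".join(b"\x00" + rgba[y * stride:(y + 1) * stride] for y in range(height))
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)  # bit depth 8, color type 6 = RGBA
    return (PNG_SIGNATURE
            + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(raw, 9))
            + _chunk(b"IEND", b""))


def fake_canvas(width: int, height: int) -> bytes:
    """Hypothetical drawing: a white canvas with one black horizontal stroke."""
    pixels = bytearray(b"\xff" * (width * height * 4))
    y = height // 2
    for x in range(width):
        i = (y * width + x) * 4
        pixels[i:i + 3] = b"\x00\x00\x00"  # keep alpha at 255
    return bytes(pixels)


width, height = 800, 600
raw_pixels = fake_canvas(width, height)
png = encode_png_rgba(width, height, raw_pixels)
payload = base64.b64encode(png).decode("ascii")  # what a data URL carries after "base64,"
print(f"raw RGBA: {len(raw_pixels)} bytes, PNG: {len(png)} bytes, base64: {len(payload)} chars")
```

On a mostly white canvas the PNG is a small fraction of the raw pixel buffer, while base64 always emits 4 characters per 3 input bytes (rounded up), so any payload savings come from the PNG step.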