Jardinly AI: The Multimodal Bio-Architect 🌿

Inspiration

We all have a "black thumb" story. You buy a beautiful Calathea, place it on your desk, and three weeks later it's brown and crispy. The problem isn't that we don't care; it's that we don't understand the invisible environmental factors (light spectrum, humidity, and airflow) that dictate plant health.

Existing apps are just encyclopedias or simple image classifiers. They tell you what the plant is, but not why it's dying in your specific room. We wanted to build an AI agent, not just a database: a "Bio-Architect" that could see your room, hear your questions, and reason about the biology of your home ecosystem.

What it does

Jardinly AI is a multimodal gardening companion that uses the full spectrum of the Gemini API:

- Spatial Environmental Audit (Video): Users scan their room with their camera. The app uses Gemini 3.0 Flash to analyze the video stream, identifying multiple plants simultaneously, assessing light direction, inferring humidity from leaf texture, and generating an "Ecosystem Score" for the room.
- Medical-Grade Diagnostics (Image): The Gemini 2.5 Flash model acts as a pathologist, detecting signs of pests (spider mites, aphids) and diseases (root rot, fungus) from static photos and returning structured treatment plans.
- Hands-Free Coaching (Audio): Using the Gemini Live API, users can have a real-time, low-latency voice conversation with their AI Botanist while their hands are dirty.
- Algorithmic Care: We combine AI reasoning with mathematical modeling to calculate solar angles and evaporation rates based on the user's geolocation.

How we built it

We built Jardinly as a Progressive Web App (PWA) using React 19, Vite, and Tailwind CSS.
- The AI Core
  We used the @google/genai SDK to orchestrate three distinct modalities:
  - Video (Spatial Analysis): We used the Gemini 3.0 Flash Preview model for its large context window and strong video understanding. We capture a 4-second video buffer via the browser's MediaRecorder API, convert it to base64, and feed it to the model with a system instruction to act as a "Bio-Architect."
  - Audio (Gemini Live): We implemented a custom WebSocket connection to the Gemini Live API. We handle raw PCM audio processing (16kHz input / 24kHz output) using the Web Audio API (ScriptProcessorNode and AudioContext) to create a seamless, interruptible voice conversation.
  - Image (Diagnostics): We used Gemini 2.5 Flash with JSON Schema enforcement so that every diagnosis returns strict, type-safe data (Severity, Symptoms, Treatment) that our UI can render into medical cards.
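The raw PCM handling mentioned for the audio path can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the helper names `floatToPcm16` and `pcm16ToFloat` are hypothetical, and it assumes 16-bit signed mono samples on the wire, while the Web Audio API works with Float32 samples in the range -1 to 1.

```typescript
// Hypothetical helpers for illustration; names are not taken from the Jardinly codebase.

/** Convert Web Audio samples (Float32, nominally -1..1) to 16-bit signed PCM. */
function floatToPcm16(samples: Float32Array): Int16Array {
  return Int16Array.from(samples, (s) => {
    // Clamp first so out-of-range samples saturate instead of wrapping around.
    const clamped = Math.max(-1, Math.min(1, s));
    // Negative and positive halves scale differently: int16 spans -32768..32767.
    return clamped < 0
      ? Math.round(clamped * 0x8000)
      : Math.round(clamped * 0x7fff);
  });
}

/** Convert 16-bit signed PCM back to Float32 samples for playback via Web Audio. */
function pcm16ToFloat(pcm: Int16Array): Float32Array {
  return Float32Array.from(pcm, (v) => v / 0x8000);
}
```

In a setup like the one described, the first helper would run on each microphone buffer before it is sent upstream, and the second on each received chunk before it is copied into an AudioBuffer created at the output sample rate.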
- The Math (Solar Physics)
  To ground the AI in reality, we didn't just ask the LLM "is there enough light?" We implemented the Cooper model for solar declination to calculate the theoretical daylight duration ($D$) for the user's latitude ($\phi$) on any given day of the year ($n$).

  First, we calculate the solar declination angle ($\delta$):

  $$\delta = 23.45^\circ \cdot \sin\left(\frac{360^\circ}{365}\,(284 + n)\right)$$

  Then, we derive the sunset hour angle ($\omega_s$):

  $$\omega_s = \arccos\left(-\tan\phi \cdot \tan\delta\right)$$

  Finally, the day length in hours (with $\omega_s$ in degrees) is:

  $$D = \frac{2}{15}\,\omega_s$$

  We feed this computed value into the AI's context window, allowing it to give advice based on actual solar physics rather than hallucinations.

Challenges we faced

- Video Token Optimization: Sending raw video frames to an LLM is bandwidth-heavy. We had to fine-tune our MediaRecorder settings (codecs, bitrates) to capture high-fidelity details (needed to see pests) while keeping payload sizes small enough for a quick response from Gemini 3.0.
- Audio Sync: The Gemini Live API emits raw PCM audio chunks. Playing these chunks back smoothly, without clicks or gaps, required a custom audio buffer queue that schedules each chunk against audioContext.currentTime.
- JSON Schema Strictness: Getting the AI to reliably return nested JSON objects for complex care plans (e.g., specific watering schedules) required rigorous testing of our responseSchema definitions.

Accomplishments that we're proud of

- The "Spatial Scanner": Seeing the AI correctly identify that a plant was "too close to a drafty window" just by analyzing a 5-second video pan of a room was a magic moment.
- Multimodal Fluidity: The app feels like a cohesive tool. You can snap a photo, get a diagnosis, and then immediately switch to Voice Mode to ask follow-up questions about that specific diagnosis without losing context.
- Solarpunk Aesthetic: We built a UI that feels futuristic yet organic, using glassmorphism and fluid animations (Framer Motion) to match the advanced AI underneath.

What I learned

I learned that context is king.
A photo of a plant is good, but a video of the room the plant lives in is game-changing. By moving from Gemini 1.5 to Gemini 3.0, we unlocked the ability to understand the "environment" rather than just the "subject." I also gained a deep appreciation for the Gemini Live API: the ability to interrupt the model mid-sentence makes the conversation feel genuinely human.

What's next for Jardinly AI

- AR Overlay: Using WebXR to project the "Ecosystem Score" directly onto plants in the camera view.
- IoT Integration: Connecting to soil moisture sensors to feed real-time data into the prompt context.
- Community Garden: A social feature to share rare plant scans and diagnoses with local gardening groups.
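For readers curious about the daylight calculation described under "The Math (Solar Physics)", here is a minimal sketch of Cooper's declination formula and the resulting day length. The function names are hypothetical. The clamp for polar day and polar night is an added assumption, not something the write-up describes.

```typescript
// Illustrative sketch only; function names are hypothetical.
const DEG = Math.PI / 180; // degrees-to-radians factor

/** Cooper's approximation of solar declination (degrees) for day-of-year n (1..365). */
function solarDeclinationDeg(dayOfYear: number): number {
  return 23.45 * Math.sin(DEG * (360 / 365) * (284 + dayOfYear));
}

/** Sunset hour angle (degrees) for a latitude in degrees (north positive). */
function sunsetHourAngleDeg(latitudeDeg: number, declinationDeg: number): number {
  const cosOmega = -Math.tan(latitudeDeg * DEG) * Math.tan(declinationDeg * DEG);
  // Assumption: clamp to [-1, 1] so polar day/night yields 180 or 0 degrees, not NaN.
  const clamped = Math.max(-1, Math.min(1, cosOmega));
  return Math.acos(clamped) / DEG;
}

/** Theoretical day length in hours: the sun moves 15 degrees of hour angle per hour. */
function dayLengthHours(latitudeDeg: number, dayOfYear: number): number {
  const declination = solarDeclinationDeg(dayOfYear);
  return (2 / 15) * sunsetHourAngleDeg(latitudeDeg, declination);
}
```

For example, `dayLengthHours(40, 172)` (latitude 40°N near the June solstice) comes out at roughly 14.8 hours, while any point on the equator gets 12 hours year-round. A value like this could be placed into the prompt as plain text alongside the user's question.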
Built With
- aistudio