Inspiration

Agricultural challenges are universal. Agriculture is the backbone of food security worldwide, yet farmers often lack immediate access to expert guidance when facing crop diseases or uncertain seed quality. These problems are time-sensitive and can lead to significant yield losses if not addressed early. We were inspired to build a solution that uses AI to bring expert-level agricultural insights directly to farmers anywhere in the world, without regional or technical barriers, using nothing more than an image and a simple interface.


What it does

SmartFarm AI is a multimodal agricultural assistant that analyzes crop leaves and seeds from images and provides instant, actionable insights. Users can upload an image of a crop leaf to detect possible diseases or upload seed images to evaluate seed quality before sowing. The system returns structured results including identification, confidence scores, observations, and recommendations. Additionally, SmartFarm AI offers voice-based guidance, allowing users to listen to AI-generated advice for hands-free and accessible usage. The system is designed to be globally usable.
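
As a rough illustration of the structured result described above, the following sketch models one response as a Python dataclass. The class name, the field names, and the sample values are all assumptions made for illustration; the project's actual schema is not shown in this write-up.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class AnalysisResult:
    """Hypothetical shape of one analysis response (names are illustrative)."""

    identification: str  # e.g. a detected disease or a seed quality label
    confidence: float  # model-reported score, expected in [0.0, 1.0]
    observations: list[str] = field(default_factory=list)
    recommendations: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject scores outside the expected range early.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


# Made-up sample values, not real model output.
example = AnalysisResult(
    identification="Tomato early blight",
    confidence=0.87,
    observations=["Brown concentric lesions on lower leaves"],
    recommendations=["Remove affected leaves", "Consult a local agronomist"],
)
print(asdict(example))
```

Keeping the four parts in one typed object would let the frontend render each of them without guessing at key names.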


How we built it

SmartFarm AI is built on a FastAPI backend integrated with Google Gemini 3 Vision. Gemini 3 Vision analyzes images of crops and seeds, performing all image understanding and reasoning, while FastAPI handles image uploads, Gemini API calls, structured JSON extraction, and voice generation. Carefully designed prompts instruct Gemini to return structured JSON outputs for both crop disease detection and seed quality analysis. The frontend is developed using Streamlit and features a professional UI with light and dark themes, image previews, and audio playback. Voice responses are generated using text-to-speech services, completing a fully multimodal experience.
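
To make the prompt-design idea concrete, here is a minimal sketch of how one prompt per workflow could be assembled. The key names, the task wording, and the `build_prompt` helper are assumptions for illustration; they are not the project's actual prompts, and the call to the Gemini API itself is left out.

```python
import json

# Hypothetical key sets; the project's real schema is not shown in this write-up.
SCHEMAS = {
    "crop": ["crop_type", "disease", "confidence", "observations", "recommendations"],
    "seed": ["seed_type", "quality", "confidence", "observations", "recommendations"],
}

# Hypothetical task descriptions, one per workflow.
TASKS = {
    "crop": "Identify the crop in this leaf image and any visible disease.",
    "seed": "Assess the quality of the seeds in this image before sowing.",
}


def build_prompt(mode: str) -> str:
    """Return the text prompt sent alongside the image for the given workflow."""
    if mode not in SCHEMAS:
        raise ValueError(f"unknown mode: {mode!r}")
    # Show the model the exact object shape we expect back.
    template = {key: "..." for key in SCHEMAS[mode]}
    return (
        f"{TASKS[mode]}\n"
        "Respond with a single JSON object and nothing else.\n"
        f"Use exactly these keys: {json.dumps(template)}"
    )


print(build_prompt("crop"))
```

Sharing one builder across both workflows keeps the output instructions identical, so only the task line and the key list differ between crop and seed analysis.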


Challenges we ran into

One major challenge was enforcing consistent and valid JSON responses from a generative AI model. Another challenge was integrating multiple AI workflows (crop analysis and seed analysis) into a single seamless system. Ensuring smooth communication between the frontend and backend while maintaining performance and usability also required careful design and testing.
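
One common way to handle the JSON problem is to parse defensively on the backend. The sketch below assumes the model may wrap its answer in a Markdown fence or add stray prose, and that a fixed set of keys is required. The function name and `REQUIRED_KEYS` are hypothetical and are not taken from the project's code.

```python
import json

# Hypothetical required keys; adjust to the schema the prompt asks for.
REQUIRED_KEYS = {"identification", "confidence", "observations", "recommendations"}


def parse_model_json(raw: str) -> dict:
    """Extract and validate the JSON object contained in a model reply."""
    # Keep only the outermost {...}; this drops Markdown fences and extra prose.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    # json.JSONDecodeError is a ValueError subclass, so callers can catch one type.
    data = json.loads(raw[start : end + 1])
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not isinstance(data["confidence"], (int, float)):
        raise ValueError("confidence must be a number")
    return data
```

A caller could catch `ValueError` and retry the request or return an error to the frontend, rather than passing malformed data through to the UI.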


Accomplishments that we're proud of

  • Successfully built a fully multimodal AI system using vision and voice
  • Integrated Gemini 3 Vision without relying on heuristic or rule-based logic
  • Designed a global, crop-focused solution applicable across regions
  • Delivered a clean, user-friendly interface with real-time AI responses

What we learned

This project taught us how to design and deploy a real-world multimodal AI application using Gemini, from prompt engineering to full-stack deployment. We gained hands-on experience with prompt engineering, AI output validation, backend–frontend integration, and building accessible user experiences that go beyond text-based interactions.

Gemini Integration

  • SmartFarm AI is powered by the multimodal capabilities of Gemini 3, which serves as the core intelligence of the application.
  • The system uses Gemini 3 Vision (gemini-3-flash-preview) to analyze images of crops and seeds. For crop disease detection, Gemini processes leaf images to identify the crop type, detect possible diseases, estimate confidence scores, and generate treatment recommendations. For seed quality analysis, Gemini evaluates visual characteristics such as shape, color, damage, and defects to classify quality levels and provide sowing guidance.
  • All AI responses are generated directly by Gemini using carefully engineered prompts and returned in strict JSON format, ensuring reliable parsing and seamless integration into the application. No heuristic or rule-based logic is used.
  • In addition to visual analysis, SmartFarm AI includes voice capability, converting Gemini-generated insights into spoken guidance so users can listen to recommendations hands-free. This makes the system accessible in low-literacy or on-field scenarios.
  • By combining image understanding, reasoning, structured output generation, and voice support, Gemini 3 enables SmartFarm AI to deliver scalable, globally applicable agricultural intelligence.
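
The voice step above needs a short piece of text to hand to a text-to-speech service. The following is a minimal sketch of that conversion only, assuming a parsed result with hypothetical `identification`, `confidence`, and `recommendations` keys. The text-to-speech call itself is omitted because the write-up does not name the service used.

```python
def to_spoken_text(result: dict, max_recommendations: int = 3) -> str:
    """Turn a parsed analysis result into a short script suitable for speech."""
    parts = [f"Result: {result.get('identification', 'unknown')}."]
    confidence = result.get("confidence")
    if isinstance(confidence, (int, float)):
        # Speak the score as a whole percentage instead of a decimal fraction.
        parts.append(f"Confidence: {round(confidence * 100)} percent.")
    # Cap the list so the audio stays short enough for use in the field.
    recommendations = result.get("recommendations", [])[:max_recommendations]
    for index, text in enumerate(recommendations, start=1):
        parts.append(f"Recommendation {index}: {text.rstrip('.')}.")
    return " ".join(parts)


# Made-up sample values, not real model output.
sample = {
    "identification": "Leaf rust",
    "confidence": 0.5,
    "recommendations": ["Apply fungicide.", "Improve air circulation"],
}
print(to_spoken_text(sample))
```

Numbering the recommendations and limiting their count is a design choice for listening rather than reading; the limit of three is arbitrary.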

What's next for SmartFarm AI

Next, we plan to expand SmartFarm AI with multilingual voice support, historical analysis for tracking crop health over time, and mobile-friendly deployment. We also aim to integrate weather and soil data to provide even more context-aware recommendations for farmers worldwide.


GitHub Repo

Star our repo on GitHub.

