Inspiration

The modern information landscape is overwhelming. Students, builders, and researchers constantly work with documents, screenshots, diagrams, and scattered notes, yet most AI tools treat these inputs in isolation or provide only simple summaries.
We wanted to explore what happens when AI moves beyond conversation and becomes a true reasoning system. The goal behind Orby Insight was to demonstrate how Gemini’s multimodal capabilities could transform raw information into structured intelligence that helps people understand, evaluate, and act faster.
Instead of building another chatbot, we focused on building an AI analyst that thinks through information the way a researcher or consultant would.
What it does

Orby Insight is a multimodal intelligence engine powered by Gemini that analyzes documents, text, and images to generate structured insights.
Users can upload content such as notes, business cases, diagrams, or screenshots, and Orby Insight automatically produces:

- concise summaries
- key insights and hidden assumptions
- risk analysis
- critical thinking questions
- actionable next steps
- confidence scoring
The system interprets visual inputs conceptually rather than simply describing them, allowing diagrams and images to be analyzed as ideas rather than as objects.
The result is an AI tool that converts raw input into decision-ready intelligence.
How we built it

We built Orby Insight using Google Gemini through Google AI Studio as the core reasoning engine.
The architecture follows a structured analysis pipeline:

User Input (text or image) → Multimodal Processing via Gemini → Structured Reasoning Prompts → JSON Intelligence Output → Insight Dashboard UI
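The pipeline above can be sketched in JavaScript. Everything here is illustrative: the function names, the JSON field names, and the prompt wording are assumptions rather than the actual Orby Insight code, and the model call is injected so the flow can be shown without a real API key.

```javascript
// Wrap the user's input in a structured-reasoning prompt that asks
// the model for JSON rather than conversational prose.
function buildAnalysisPrompt(userInput) {
  return [
    "You are an analyst, not a chatbot. Analyze the input and respond",
    "ONLY with JSON matching this shape:",
    '{ "summary": string, "insights": string[], "risks": string[],',
    '  "questions": string[], "nextSteps": string[], "confidence": number }',
    "Input:",
    userInput,
  ].join("\n");
}

// Parse the model's reply, tolerating an optional ```json fence.
function parseInsightJSON(reply) {
  const cleaned = reply.replace(/^```(?:json)?\s*|\s*```$/g, "").trim();
  return JSON.parse(cleaned);
}

// The pipeline: input -> prompt -> model -> structured insight object.
// `callModel` is injected (e.g. a wrapper around the Gemini SDK).
async function runPipeline(callModel, userInput) {
  const prompt = buildAnalysisPrompt(userInput);
  const reply = await callModel(prompt);
  return parseInsightJSON(reply);
}
```

In the real system the injected `callModel` would forward the prompt (plus any image parts) to Gemini; here it is kept abstract so the pipeline shape stays visible.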
Key technical components include:

- Gemini multimodal input for combined text and image understanding
- structured JSON outputs to transform responses into usable data
- automatic analysis-mode detection (research, risk, visual interpretation, idea evaluation)
- prompt engineering designed for deep reasoning rather than conversational output
- context-aware follow-up analysis to extend insights
By leveraging Gemini’s long-context reasoning and multimodal understanding, the system performs layered analysis instead of single-response generation.
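One of the components above, automatic analysis-mode detection, can be approximated with a lightweight keyword pass before the main Gemini prompt is built. The hint lists and mode names below are assumptions for illustration, not the shipped heuristics.

```javascript
// Map each analysis mode to keywords that suggest it (illustrative).
const MODE_HINTS = {
  risk: ["risk", "threat", "failure", "compliance"],
  visual: ["diagram", "screenshot", "chart", "image"],
  idea: ["pitch", "startup", "proposal", "idea"],
};

// Pick a mode from the input; images always trigger visual
// interpretation, and plain text falls back to research mode.
function detectMode(input, hasImage) {
  if (hasImage) return "visual";
  const text = input.toLowerCase();
  for (const [mode, hints] of Object.entries(MODE_HINTS)) {
    if (hints.some((h) => text.includes(h))) return mode;
  }
  return "research";
}
```

A production version might instead ask Gemini itself to classify the input, trading latency for accuracy; the keyword pass is just the cheapest first cut.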
Challenges we ran into
One of the biggest challenges was moving beyond basic AI responses. Early versions behaved like a chatbot and produced unstructured text, which made insights difficult to use.
We solved this by redesigning prompts to enforce structured outputs and analytical reasoning workflows.
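Beyond prompt wording, the Gemini API offers a JSON mode that enforces structure at the API level: setting `responseMimeType` to `"application/json"` (optionally with a `responseSchema`) makes the model return parseable JSON instead of free-form prose. The schema fields below are illustrative, not Orby Insight's actual schema.

```javascript
// Generation config asking Gemini for schema-conforming JSON output.
const insightConfig = {
  responseMimeType: "application/json",
  responseSchema: {
    type: "object",
    properties: {
      summary: { type: "string" },
      insights: { type: "array", items: { type: "string" } },
      risks: { type: "array", items: { type: "string" } },
      confidence: { type: "number" },
    },
    required: ["summary", "insights", "risks", "confidence"],
  },
};
```

In the `@google/generative-ai` JavaScript SDK, an object like this is passed as `generationConfig` when creating the model, e.g. `getGenerativeModel({ model: "gemini-1.5-pro", generationConfig: insightConfig })`.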
Another challenge was ensuring visual inputs were interpreted conceptually. Instead of simple image descriptions, we refined prompts to encourage relationship analysis and inference.
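A prompt refinement of the kind described might look like the sketch below; the exact wording is an assumption, not the project's actual prompt.

```javascript
// Instruction steering the model away from literal image description
// and toward conceptual, relationship-level analysis (illustrative).
const VISUAL_PROMPT = [
  "Do not describe what the image looks like.",
  "Instead, identify the concepts it represents, the relationships",
  "between its elements, and what the overall structure implies.",
].join(" ");
```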
Balancing depth of reasoning with fast response time for a live demo environment was also a key technical challenge.
Accomplishments that we're proud of

- Successfully building a multimodal reasoning system rather than a traditional chatbot
- Designing structured intelligence outputs powered entirely by Gemini
- Demonstrating real-world analytical use cases within a short hackathon timeframe
- Creating an interface where AI outputs feel actionable and decision-oriented
- Showcasing Gemini’s capabilities through practical problem-solving
What we learned

This project taught us that the real power of modern AI models lies in structured reasoning and multimodal understanding.
We learned how prompt design significantly affects model behavior, and how structured outputs can transform AI from a conversation tool into a decision-support system.
We also gained hands-on experience designing systems around AI capabilities rather than adding AI as a feature afterward.
What's next for Orby Insight

Next, we plan to expand Orby Insight into a full intelligence workspace by adding:

- persistent memory across analyses
- collaborative insight sharing
- domain-specific analysis modes (education, business, research)
- real-time data integrations
- deeper autonomous reasoning workflows
Our long-term vision is to evolve Orby Insight into a personal AI analyst that helps people think more clearly and make better decisions using multimodal intelligence.
Built With
- api
- css3
- google-ai-studio
- google-gemini-1.5-pro
- html5
- javascript
- json
- multimodal-ai-processing
- prompt-engineering
- rest
- structured-output-design
- web-based-ui

