🛡️ Project Aegis: Beyond Sight, Into Reasoning
The Inspiration
Traditional accessibility tools and industrial safety monitors share a fatal flaw: they are reactive, not proactive. They can label a "spill," but they don't understand that a spill next to a server rack could be a catastrophic event. I was inspired by the idea of "Synthetic Intuition": using Gemini 3 Pro to give users a "sixth sense" that doesn't just see the world, but reasons about the laws of physics and safety within it.
How I Built It
Project Aegis was built using a "Vibe Coding" philosophy in Google AI Studio.
- The Brain: I used the Gemini 3 Pro model, relying on its native multimodality.
- The Workflow: Using AI Studio’s Build Mode, I described the architectural requirements and used the Annotate feature to iteratively refine the UI without manual coding.
- The Logic: I implemented a custom O.R.A. (Observe, Reason, Act) framework within the system instructions so that the model returns structured JSON safety alerts.
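As a rough illustration of the O.R.A. idea, the sketch below validates a JSON alert before the UI acts on it. The field names `observe`, `reason`, and `act` are assumptions chosen to mirror the framework's name; the actual schema used in Aegis is not shown in this write-up.

```python
import json
from dataclasses import dataclass

# Hypothetical field names mirroring Observe, Reason, Act.
REQUIRED_KEYS = ("observe", "reason", "act")


@dataclass(frozen=True)
class SafetyAlert:
    observe: str  # what was seen or heard
    reason: str   # why it matters in this context
    act: str      # recommended action


def parse_alert(raw: str) -> SafetyAlert:
    """Parse a model reply into a SafetyAlert; raise ValueError on a bad shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    # Every O.R.A. field must be present and a non-empty string.
    bad = [
        key for key in REQUIRED_KEYS
        if not isinstance(data.get(key), str) or not data[key].strip()
    ]
    if bad:
        raise ValueError(f"missing or empty fields: {bad}")
    return SafetyAlert(*(data[key].strip() for key in REQUIRED_KEYS))
```

Rejecting malformed replies up front means a truncated or off-format model response is surfaced as an error instead of silently producing an empty alert.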
The Technical "Superpowers"
Aegis isn't just a wrapper; it leans on several Gemini 3 capabilities:
- Temporal Reasoning: By feeding a continuous stream into the context window, Aegis maintains a spatial map of the environment as it changes over time.
- Multimodal Fusion: It combines visual cues (such as a flickering light) with audio cues (such as a buzzing sound) to diagnose problems like electrical faults.
- Probabilistic Risk Assessment: I used the model to calculate risk scores. For example, if \(P(h)\) is the probability of a hazard and \(S\) is the severity (assumed to be normalized to \([0, 1]\) so that the threshold below is meaningful), Aegis calculates the Risk Index \(R\):
$$R = P(h) \times S$$
When \(R > 0.75\), the UI triggers a "High-Reasoning" emergency protocol.
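The risk formula and threshold above can be sketched in a few lines. This is a minimal illustration, not the Aegis implementation: it assumes both inputs are already normalized to \([0, 1]\), and the function names are hypothetical.

```python
EMERGENCY_THRESHOLD = 0.75  # threshold stated in the write-up


def risk_index(p_hazard: float, severity: float) -> float:
    """Return R = P(h) * S, assuming both inputs lie in [0, 1]."""
    for name, value in (("p_hazard", p_hazard), ("severity", severity)):
        # The comparison also rejects NaN, which fails every range check.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return p_hazard * severity


def is_emergency(p_hazard: float, severity: float) -> bool:
    """True when the risk index strictly exceeds the emergency threshold."""
    return risk_index(p_hazard, severity) > EMERGENCY_THRESHOLD
```

For instance, a hazard with \(P(h) = 0.9\) and \(S = 0.9\) gives \(R = 0.81\) and triggers the protocol, while \(P(h) = 0.9\) and \(S = 0.8\) gives \(R = 0.72\) and does not.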
Challenges I Faced
- Latency vs. Reasoning: Real-time safety requires speed. I had to balance the Thinking Levels of Gemini 3, using low reasoning for clear paths and triggering high reasoning only when an "entity of interest" was detected.
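The routing decision described above might look like the sketch below. The watch list and the `"low"` / `"high"` level strings are assumptions for illustration, and the step that passes the chosen level to the Gemini API is deliberately left out.

```python
from typing import Iterable, Literal

ThinkingLevel = Literal["low", "high"]

# Hypothetical watch list; the write-up does not say which labels qualify.
ENTITIES_OF_INTEREST = frozenset({"spill", "exposed wire", "smoke"})


def choose_thinking_level(
    detected_labels: Iterable[str],
    watch_list: Iterable[str] = ENTITIES_OF_INTEREST,
) -> ThinkingLevel:
    """Pick "high" only when a detected label is on the watch list."""
    # Normalize case and whitespace so "Spill " matches "spill".
    normalized = {label.strip().lower() for label in detected_labels}
    return "high" if normalized & set(watch_list) else "low"
```

Keeping the decision in a small pure function makes the latency trade-off easy to tune: changing the watch list changes how often the slower, deeper reasoning path is taken.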
Built With
- code-execution-tool
- gemini-3-flash
- gemini-3-pro
- gemini-api
- google-ai-studio
- google-cloud-run
- google-search-grounding
- langgraph
- lucide-react
- native-multimodality
- python
- react.js
- tailwind-css
- text-to-speech
- thinking-levels
- thought-signatures
- typescript
- vibe-coding-workflow
- webrtc
