🛡️ Project Aegis: Beyond Sight, Into Reasoning

The Inspiration

Traditional accessibility tools and industrial safety monitors share a fatal flaw: they are reactive, not proactive. They can label a "spill," but they don't understand that a spill next to a server rack is a catastrophic event. I was inspired by the idea of "Synthetic Intuition": using Gemini 3 Pro to give users a "sixth sense" that doesn't just see the world, but also reasons about the physics and safety risks within it.

How I Built It

Project Aegis was built using a "Vibe Coding" philosophy in Google AI Studio.

  • The Brain: I utilized the Gemini 3 Pro model, specifically leveraging its Native Multimodality.
  • The Workflow: Using AI Studio’s Build Mode, I described the architectural requirements and used the Annotate feature to iteratively refine the UI without manual coding.
  • The Logic: I implemented a custom O.R.A. (Observe, Reason, Act) framework within the system instructions to ensure the model produces structured JSON safety alerts.
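
To make the O.R.A. idea concrete, here is a minimal sketch of how a structured JSON safety alert could be validated on the client side. The field names (`observe`, `reason`, `act`), the `SafetyAlert` class, and the `parse_alert` helper are all hypothetical, chosen to mirror the three O.R.A. stages; the actual schema in the system instructions may differ.

```python
import json
from dataclasses import dataclass


@dataclass
class SafetyAlert:
    """One alert, with one field per O.R.A. stage (hypothetical schema)."""
    observe: str  # what was seen or heard
    reason: str   # why it matters for safety
    act: str      # recommended action for the user


REQUIRED_KEYS = ("observe", "reason", "act")


def parse_alert(raw: str) -> SafetyAlert:
    """Parse one model response, rejecting malformed or incomplete output."""
    data = json.loads(raw)  # raises a ValueError subclass on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("alert must be a JSON object")
    missing = [k for k in REQUIRED_KEYS if not isinstance(data.get(k), str)]
    if missing:
        raise ValueError(f"alert is missing string fields: {missing}")
    return SafetyAlert(**{k: data[k] for k in REQUIRED_KEYS})


# Illustrative response text, not real model output.
example = json.dumps({
    "observe": "liquid spill on floor",
    "reason": "spill is next to a server rack",
    "act": "cut power and alert staff",
})
alert = parse_alert(example)
print(alert.act)
```

Validating before rendering means a malformed response fails loudly instead of silently producing an empty or misleading alert in the UI.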

The Technical "Superpowers"

Aegis isn't just a wrapper; it pushes the boundaries of Gemini 3:

  1. Temporal Reasoning: Aegis keeps earlier observations in the context window as new input streams in, so it can maintain a spatial map of the environment over time.
  2. Multimodal Fusion: It combines visual cues (a flickering light) with audio cues (a buzzing sound) to diagnose electrical faults.
  3. Probabilistic Risk Assessment: I used the model to calculate risk scores. For example, if \(P(h)\) is the probability of a hazard and \(S\) is the severity, Aegis calculates the Risk Index \(R\):

$$R = P(h) \times S$$

When \(R > 0.75\), the UI triggers a "High-Reasoning" emergency protocol.
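
The Risk Index rule can be sketched in a few lines. This is an illustration only: it assumes both \(P(h)\) and \(S\) are normalized to the range 0 to 1 (the write-up does not state the scale for \(S\), but the 0.75 threshold only makes sense on a bounded scale), and the function names are hypothetical.

```python
HIGH_RISK_THRESHOLD = 0.75  # threshold from the write-up: R > 0.75


def risk_index(p_hazard: float, severity: float) -> float:
    """Return R = P(h) * S, assuming both inputs are normalized to [0, 1]."""
    for name, value in (("p_hazard", p_hazard), ("severity", severity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return p_hazard * severity


def is_emergency(p_hazard: float, severity: float) -> bool:
    """True when the Risk Index strictly exceeds the emergency threshold."""
    return risk_index(p_hazard, severity) > HIGH_RISK_THRESHOLD


# A likely, severe hazard crosses the threshold; a likely but minor one does not.
print(is_emergency(0.9, 0.9))
print(is_emergency(0.9, 0.3))
```

Because \(R\) is a product, both factors must be high to trigger the protocol: even a certain hazard (\(P(h) = 1\)) needs \(S > 0.75\).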

Challenges I Faced

  • Latency vs. Reasoning: Real-time safety requires speed. I had to balance the Thinking Levels of Gemini 3, using low reasoning for clear paths and triggering high reasoning only when an "entity of interest" was detected.
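
The routing idea described above can be sketched as a small pure function. Everything here is an assumption for illustration: the `FrameSummary` type, the `entities_of_interest` field, and the `"low"` / `"high"` labels are placeholders, and how the chosen level is passed to the Gemini API is left out.

```python
from dataclasses import dataclass, field


@dataclass
class FrameSummary:
    """Hypothetical result of a cheap first pass over one input frame."""
    entities_of_interest: list = field(default_factory=list)


def choose_thinking_level(frame: FrameSummary) -> str:
    """Use low reasoning for clear paths and escalate only when something is flagged."""
    return "high" if frame.entities_of_interest else "low"


clear_path = FrameSummary()
flagged = FrameSummary(entities_of_interest=["spill near server rack"])

print(choose_thinking_level(clear_path))
print(choose_thinking_level(flagged))
```

Keeping the decision in one function makes the latency trade-off easy to tune later, for example by escalating only for certain entity types.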

Built With

  • code-execution-tool
  • gemini-3-flash
  • gemini-3-pro
  • gemini-api
  • google-ai-studio
  • google-cloud-run
  • google-search-grounding
  • langgraph
  • lucide-react
  • native-multimodality
  • python
  • react.js
  • tailwind-css
  • text-to-speech
  • thinking-levels
  • thought-signatures
  • typescript
  • vibe-coding-workflow
  • webrtc