Inspiration
As a Resident Advisor at Vanderbilt University, I spend 30+ minutes after every patrol round writing duty reports by hand. I walk through 4 floors, check 80+ rooms, and document everything manually. I wanted an AI partner that could watch through my phone camera, understand what it sees, speak up about problems without being asked, and write the report for me.
What it does
Scout is a real-time AI patrol assistant that rides along during RA duty rounds. The RA tapes their phone to their chest and walks. Scout:
- Sees through the rear camera via the Gemini Live API, processing JPEG frames at 1 fps
- Speaks proactively — narrates observations and calls out anomalies without being asked
- Knows where you are — ESP32 BLE iBeacon hardware on each floor auto-detects location
- Builds a 3D model — rooms and corridors pop into a Three.js dashboard in real time as Firestore updates arrive
- Flashes hardware alerts — ESP32 LEDs flash when anomalies are detected on that floor
- Writes the report — generates a complete, structured duty report at the end of the patrol with zero manual input
Phone = capture device. Laptop = real-time 3D dashboard. ESP32s = indoor positioning + visual alerts.
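The real-time 3D dashboard works because every observation the agent logs becomes a Firestore document the dashboard listens to. A minimal Python sketch of what one such document might look like — the field names here are my illustration, not Scout's actual schema:

```python
from typing import Optional

def room_update(room_id: str, floor: int, grid_x: int, grid_y: int,
                anomaly: Optional[str] = None) -> dict:
    """Build a Firestore-ready document for one room observation.

    Field names are hypothetical -- the real schema is whatever Scout's
    backend writes. This only illustrates the shape of the real-time
    state that drives the Three.js pop-in rendering.
    """
    doc = {
        "roomId": room_id,
        "floor": floor,
        "position": {"x": grid_x, "y": grid_y},  # grid cell, not GPS
        "status": "anomaly" if anomaly else "clear",
    }
    if anomaly:
        doc["anomaly"] = anomaly
    return doc
```

The dashboard would subscribe to these documents with a Firestore snapshot listener and animate each newly arriving room into the 3D scene.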
How I built it
- Agent: Google ADK with bidi-streaming, gemini-2.5-flash-native-audio-latest, Kore voice, 9 custom tools.
- Backend: FastAPI on Cloud Run, Firestore for real-time state, Cloud Storage for frame snapshots.
- Mobile: Flutter iOS app — rear camera, 16kHz PCM audio, BLE scanner (flutter_blue_plus), AVAudioEngine native audio playback.
- Dashboard: Angular 21 (standalone, zoneless) + Three.js 3D building renderer with pop-in animations, anomaly markers, floor selector.
- Hardware: 2x ESP32-WROOM-32 running dual-core FreeRTOS firmware — Core 0: BLE iBeacon advertising; Core 1: WiFi HTTP alert server + LED control.
- IaC: Terraform for GCP infrastructure (Cloud Run, Firestore, GCS, Artifact Registry, Secret Manager).
- CI/CD: Cloud Build auto-deploys on push to main.
Challenges
- iOS AVAudioSession interruptions can silence agent audio playback; I implemented retry logic in the platform channel.
- Running Gemini vision + voice in the same bidi-stream required careful frame routing (binary JPEG detection via magic bytes vs. JSON control frames).
- Grid-based 3D positioning without GPS indoors — solved with BLE beacon proximity + agent spatial reasoning.
- The real-time Firestore → Three.js rendering pipeline needed debouncing and eased animations to feel smooth.
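The frame-routing fix above can be sketched as a tiny classifier: every JPEG begins with the Start-of-Image marker bytes 0xFF 0xD8, so the stream handler can peek at the first two bytes of each incoming message to decide whether it is a binary camera frame or a JSON control message. The function and message names here are illustrative, not Scout's actual code:

```python
import json

JPEG_SOI = b"\xff\xd8"  # Start-of-Image marker that opens every JPEG

def route_message(data: bytes) -> tuple[str, object]:
    """Classify one incoming message from the bidi stream.

    Returns ("frame", raw_bytes) for binary JPEG camera frames, or
    ("control", parsed_json) for JSON control messages.
    """
    if data[:2] == JPEG_SOI:
        return ("frame", data)
    return ("control", json.loads(data))
```

This keeps camera frames out of the JSON parser entirely, which matters at 1 fps with audio interleaved on the same stream.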
What I learned
- ADK bidi-streaming is powerful but requires careful backpressure management
- ESP32 dual-core FreeRTOS allows BLE and WiFi to run simultaneously without interference
- Proactive AI agents (that speak without being asked) feel fundamentally different from chatbots
- Hardware integration (BLE beacons + LED alerts) makes AI tangible in a way pure software can't
What's next
- Gaussian splatting integration for photorealistic 3D building reconstruction
- Multi-building support with persistent spatial models
- Anomaly trend analysis across patrol history
Built With
- angular.js
- ble-ibeacon
- c
- cloud-build
- cloud-run
- cloud-storage
- dart
- esp-idf
- esp32
- fastapi
- firestore
- flutter
- freertos
- gemini-2.5-flash
- gemini-live-api
- google-adk
- python
- terraform
- three.js
- typescript