Inspiration
As a Resident Advisor at Vanderbilt University, I spend 30+ minutes after every patrol round writing duty reports by hand. I walk through 4 floors, check 80+ rooms, and document everything manually. I wanted an AI partner that could watch through my phone camera, understand what it sees, speak up about problems without being asked, and write the report for me.
What it does
Scout is a real-time AI patrol assistant that rides along during RA duty rounds. The RA tapes their phone to their chest and walks. Scout:
- Sees through the rear camera via the Gemini Live API, processing JPEG frames at 1 fps
- Speaks proactively — narrates observations and calls out anomalies without being asked
- Knows where you are — ESP32 BLE iBeacon hardware on each floor auto-detects location
- Builds a 3D model — rooms and corridors pop into a Three.js dashboard in real time as Firestore updates arrive
- Flashes hardware alerts — ESP32 LEDs flash when anomalies are detected on that floor
- Writes the report — generates a complete, structured duty report at the end of the patrol with zero manual input
Phone = capture device. Laptop = real-time 3D dashboard. ESP32s = indoor positioning + visual alerts.
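The real-time 3D dashboard works because every observation the agent logs becomes a Firestore document the dashboard listens to. A minimal Python sketch of what one such document might look like — the field names here are my illustration, not Scout's actual schema:

```python
from typing import Optional

def room_update(room_id: str, floor: int, grid_x: int, grid_y: int,
                anomaly: Optional[str] = None) -> dict:
    """Build a Firestore-ready document for one room observation.

    Field names are hypothetical -- the real schema is whatever Scout's
    backend writes. This only illustrates the shape of the real-time
    state that drives the Three.js pop-in rendering.
    """
    doc = {
        "roomId": room_id,
        "floor": floor,
        "position": {"x": grid_x, "y": grid_y},  # grid cell, not GPS
        "status": "anomaly" if anomaly else "clear",
    }
    if anomaly:
        doc["anomaly"] = anomaly
    return doc
```

The dashboard would subscribe to these documents with a Firestore snapshot listener and animate each newly arriving room into the 3D scene.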
How I built it
- Agent: Google ADK with bidi-streaming, gemini-2.5-flash-native-audio-latest, Kore voice, 9 custom tools.
- Backend: FastAPI on Cloud Run, Firestore for real-time state, Cloud Storage for frame snapshots.
- Mobile: Flutter iOS app — rear camera, 16kHz PCM audio, BLE scanner (flutter_blue_plus), AVAudioEngine native audio playback.
- Dashboard: Angular 21 (standalone, zoneless) + Three.js 3D building renderer with pop-in animations, anomaly markers, floor selector.
- Hardware: 2x ESP32-WROOM-32 running dual-core FreeRTOS firmware — Core 0: BLE iBeacon advertising; Core 1: WiFi HTTP alert server + LED control.
- IaC: Terraform for GCP infrastructure (Cloud Run, Firestore, GCS, Artifact Registry, Secret Manager).
- CI/CD: Cloud Build auto-deploys on push to main.
Challenges
- iOS AVAudioSession interruptions can silence agent audio playback; I implemented retry logic in the platform channel.
- Running Gemini vision + voice in the same bidi-stream required careful frame routing (binary JPEG detection via magic bytes vs. JSON control frames).
- Grid-based 3D positioning without GPS indoors — solved with BLE beacon proximity + agent spatial reasoning.
- The real-time Firestore → Three.js rendering pipeline needed debouncing and eased animations to feel smooth.
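The frame-routing fix above can be sketched as a tiny classifier: every JPEG begins with the Start-of-Image marker bytes 0xFF 0xD8, so the stream handler can peek at the first two bytes of each incoming message to decide whether it is a binary camera frame or a JSON control message. The function and message names here are illustrative, not Scout's actual code:

```python
import json

JPEG_SOI = b"\xff\xd8"  # Start-of-Image marker that opens every JPEG

def route_message(data: bytes) -> tuple[str, object]:
    """Classify one incoming message from the bidi stream.

    Returns ("frame", raw_bytes) for binary JPEG camera frames, or
    ("control", parsed_json) for JSON control messages.
    """
    if data[:2] == JPEG_SOI:
        return ("frame", data)
    return ("control", json.loads(data))
```

This keeps camera frames out of the JSON parser entirely, which matters at 1 fps with audio interleaved on the same stream.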
What I learned
- ADK bidi-streaming is powerful but requires careful backpressure management
- ESP32 dual-core FreeRTOS allows BLE and WiFi to run simultaneously without interference
- Proactive AI agents (that speak without being asked) feel fundamentally different from chatbots
- Hardware integration (BLE beacons + LED alerts) makes AI tangible in a way pure software can't
What's next
- Gaussian splatting integration for photorealistic 3D building reconstruction
- Multi-building support with persistent spatial models
- Anomaly trend analysis across patrol history
Built With
- angular.js
- ble-ibeacon
- c
- cloud-build
- cloud-run
- cloud-storage
- dart
- esp-idf
- esp32
- fastapi
- firestore
- flutter
- freertos
- gemini-2.5-flash
- gemini-live-api
- google-adk
- python
- terraform
- three.js
- typescript