=============================================================================== REALITY COPILOT
Your Copilot in Real Reality
MOTIVATION
We have GitHub Copilot revolutionizing how we write code. We have Windows Copilot transforming how we interact with our desktop. So why can't we have Reality Copilot - an AI assistant that understands and enhances our physical world in mixed reality?
"Reality Copilot, your copilot in real reality."
WHAT IT DOES
Reality Copilot transforms Meta Quest 3 into an intelligent spatial computing assistant by integrating cutting-edge AI services with real-time camera feeds. Capture your environment, segment objects, generate 3D models, and compose context-aware emails - all in mixed reality.
KEY FEATURES
- FastVLM: Real-time image understanding with natural language
- SAM3: Advanced object segmentation with text prompts
- SAM3D: Convert 2D masks to full 3D models instantly
- Smart Email: Context-aware composition with AI-generated content
- Hardware Recording: H.264/H.265 video with dual audio (mic + speaker output)
- Voice-First: TEN VAD native library (.so) - on-device, just speak, no buttons
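As a rough illustration of the voice-first trigger, here is a minimal energy-threshold voice activity detector in Python. It is a simplified stand-in for the TEN VAD native library (which runs on-device as an .so), not its actual algorithm; the frame size, threshold, and debounce count are assumptions.

```python
def is_voice_frame(samples, threshold=0.02):
    """Return True if a frame of PCM samples (floats in [-1, 1]) looks voiced.

    Simplified energy-based stand-in for a real VAD such as TEN VAD:
    compute root-mean-square energy and compare it to a fixed threshold.
    """
    if not samples:
        return False
    rms = (sum(s * s for s in samples) / len(samples)) ** 0.5
    return rms > threshold


def detect_speech(frames, threshold=0.02, min_voiced=3):
    """Report speech once `min_voiced` consecutive frames exceed the threshold.

    Returns the index of the frame where speech is confirmed, or -1.
    """
    run = 0
    for i, frame in enumerate(frames):
        run = run + 1 if is_voice_frame(frame, threshold) else 0
        if run >= min_voiced:
            return i
    return -1
```

Requiring several consecutive voiced frames is a common debounce trick so that a single click or pop does not trigger a command.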
TECH STACK
Unity C#   → Quest Camera API, TEN VAD (.so), UI/Scene Management
Python     → FastVLM, SAM3, SAM3D inference servers
Java/C++   → Hardware video encoding, audio capture, native optimizations
JavaScript → Three.js 3D viewer, WebView content
TECHNICAL HIGHLIGHTS
- Multi-language integration: Seamless Unity C# ↔ Python ↔ Java/C++ ↔ JavaScript
- On-device VAD: Native library (.so) for local voice detection - no server needed
- Hardware acceleration: Native MediaCodec for efficient recording
- Zero-copy transfers: Optimized frame pipeline
- Android-optimized: UnityWebRequest for APK asset loading
- OAuth2 security: Gmail integration with token refresh
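The OAuth2 Gmail integration above depends on refreshing an access token when it expires. A minimal sketch of building the standard refresh-token request with only the standard library (the client ID/secret values are placeholders; the endpoint is Google's documented OAuth2 token endpoint):

```python
from urllib.parse import urlencode

TOKEN_ENDPOINT = "https://oauth2.googleapis.com/token"


def build_refresh_request(client_id, client_secret, refresh_token):
    """Build the (url, form-encoded body) pair for an OAuth2 refresh-token grant.

    The body follows RFC 6749 section 6; POSTing it (e.g. via urllib.request)
    returns a JSON payload containing a fresh access_token.
    """
    body = urlencode({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    })
    return TOKEN_ENDPOINT, body
```

Keeping token refresh on the headset means email can be sent without re-prompting the user mid-session.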
HOW IT WORKS
- CAPTURE - Grab camera feed with precise timestamps
- PROCESS - Send to AI services (FastVLM/SAM3/SAM3D)
- VISUALIZE - Display results in 3D space
- INTERACT - Email, save, or manipulate content
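The capture and process steps above can be sketched from the client side. The endpoint paths match the AI-services list later on this page; the server address and JSON field names are assumptions:

```python
import base64
import json
import time

BASE_URL = "http://127.0.0.1:8000"  # assumed address of the Python inference server

ENDPOINTS = {
    "fastvlm": "/fastvlm/process",
    "sam3": "/sam3/segment",
    "sam3d": "/sam3d/generate",
}


def capture_frame(jpeg_bytes):
    """CAPTURE: wrap a camera frame with a precise timestamp (microseconds)."""
    return {
        "timestamp_us": int(time.time() * 1_000_000),
        "image_b64": base64.b64encode(jpeg_bytes).decode("ascii"),
    }


def build_process_request(service, frame, prompt=""):
    """PROCESS: build the URL and JSON body for one of the AI services."""
    if service not in ENDPOINTS:
        raise ValueError(f"unknown service: {service}")
    body = json.dumps({**frame, "prompt": prompt})
    return BASE_URL + ENDPOINTS[service], body
```

The timestamp travels with every frame so results arriving seconds later can still be anchored to the pose at capture time.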
INNOVATION
First Reality Copilot for spatial computing - unifying multiple state-of-the-art AI models (vision, segmentation, 3D generation) with voice-first hands-free interaction and industry-first dual audio capture (mic + speaker) in VR.
PERFORMANCE
- 72Hz frame rate maintained
- <2s AI inference
- <50ms voice detection latency
- <2GB memory footprint
- 1080p@30fps recording
REQUIREMENTS
Built for Meta Quest 3/3S
Unity 6000.0.38f1
Python 3.8+
Android SDK
WORKFLOW EXAMPLE
User: "Capture" → Camera captures view → Image shown in overlay
User: "SAM3D Local" → Loads 3D model → Displays in space
User: "Email Model" → AI analyzes content → Generates description → Opens email composer
User: Send email → Gmail API sends → Confirmation displayed
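The voice commands in the workflow above map naturally onto a small dispatch table. A hedged sketch (the command phrases come from the example; handler behavior is illustrative, not the actual implementation):

```python
def handle_command(transcript, handlers):
    """Route a recognized voice transcript to the matching action handler.

    Matching is case-insensitive on the transcript prefix and prefers the
    longest matching command, so "sam3d local" would win over a
    hypothetical shorter "sam3d" command.
    """
    text = transcript.strip().lower()
    best = None
    for command in handlers:
        if text.startswith(command.lower()) and (best is None or len(command) > len(best)):
            best = command
    if best is None:
        return "unrecognized"
    return handlers[best]()


handlers = {
    "capture": lambda: "camera captured; image shown in overlay",
    "sam3d local": lambda: "3D model loaded; displayed in space",
    "email model": lambda: "description generated; email composer opened",
}
```

Prefix matching keeps the grammar forgiving when speech recognition appends trailing words.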
AI SERVICES (Python Server)
FastVLM Server    - POST /fastvlm/process
SAM3 Segmentation - POST /sam3/segment
SAM3D Generation  - POST /sam3d/generate
Health Check      - GET  /status
Note: VAD runs locally as native library (.so), not as a service
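As a minimal illustration of the service side, here is the GET /status health check implemented with only Python's standard library. The response fields are assumptions, and the real servers also expose the POST endpoints listed above:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class StatusHandler(BaseHTTPRequestHandler):
    """Serves GET /status with a small JSON health payload."""

    def do_GET(self):
        if self.path != "/status":
            self.send_error(404)
            return
        payload = json.dumps(
            {"status": "ok", "services": ["fastvlm", "sam3", "sam3d"]}
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # suppress per-request logging


def start_server(port=0):
    """Start the server on a background thread; returns (server, bound port)."""
    server = HTTPServer(("127.0.0.1", port), StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]
```

The headset polls this endpoint before dispatching work, so a downed server degrades gracefully instead of stalling the UI.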
SUPPORTED FORMATS
Images:    PNG, JPEG, WebP
3D Models: GLB, GLTF
Video:     H.264, H.265 (hardware-accelerated)
Audio:     WAV, MP3, OGG
KEY ACHIEVEMENTS
- First Reality Copilot for spatial computing
- Multi-modal AI integration in VR
- Voice-first hands-free interaction with VAD
- Industry-first dual audio capture (mic + speaker)
- Hardware-optimized video recording
- Cross-platform asset loading for Android
- Secure OAuth2 Gmail integration
- Real-time 3D model generation from segmentation
FUTURE PLANS
- Multi-user collaboration with voice chat
- Cloud synchronization
- Persistent AR anchors
- Custom model training
- Real-time translation (text + speech)
- Gesture recognition (voice + gesture commands)
- Embodied AI integration with robotics
REFERENCES
This project integrates several cutting-edge open-source projects:
FastVLM - Apple's Fast Vision Language Model https://github.com/apple/ml-fastvlm
SAM3 - Meta's Segment Anything Model 3 https://github.com/facebookresearch/sam3
SAM 3D Objects - Meta's 3D object generation https://github.com/facebookresearch/sam-3d-objects
EmBARDiment - Google's embodied AI research https://github.com/google/embardiment
TEN VAD - Voice Activity Detection framework https://github.com/TEN-framework/ten-vad
Unity Passthrough Camera API Samples - Meta's official Quest camera API https://github.com/oculus-samples/Unity-PassthroughCameraApiSamples
RobotVisionUnityPluginQuest - Hardware-accelerated recording plugin https://github.com/luffy-yu/RobotVisionUnityPluginQuest
=============================================================================== Reality Copilot, your copilot in real reality.
