Inspiration
I built Sight-Line after struggling with real-time AI that gave instructions without showing where to act.
While trying to fix my car, I could not find the part the AI referred to: it described the action but never highlighted the location, which caused delays and repeated mistakes.
I wanted a system that shows what the AI sees and points to the exact place to act.
What it does
Sight-Line uses a webcam to guide interaction with real objects.
- You give a command in plain language
- The system detects the object
- It selects the exact region to interact with
- It overlays guidance on that region
- It tracks movement in real time
- It verifies when the action is complete
It stores past tasks and improves performance over time.
How we built it
Frontend
- Next.js, React, TypeScript: webcam feed, UI, and interaction flow
Backend
- FastAPI, Python: API layer and orchestration pipeline
Vision Pipeline
- Capture frame from webcam
- Send image to Gemini
- Receive bounding box for target region
- Track region using OpenCV.js
- Overlay guidance
- Capture final frame
- Verify completion with Gemini
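The pipeline steps above can be sketched as a small orchestration function. Everything here is illustrative: the function names and return values are stand-ins, the real detection and verification calls go to Gemini, and tracking runs client-side in OpenCV.js.

```python
from dataclasses import dataclass

@dataclass
class Box:
    """Bounding box in pixel coordinates (x, y, width, height)."""
    x: int
    y: int
    w: int
    h: int

# Stand-ins for the real Gemini and OpenCV.js calls; the names and
# signatures are hypothetical, not the actual APIs.
def detect_target(frame: bytes, command: str) -> Box:
    """In Sight-Line, the frame goes to Gemini, which returns a
    bounding box for the region the command refers to."""
    return Box(x=120, y=80, w=64, h=48)

def track(box: Box, dx: int, dy: int) -> Box:
    """Real tracking runs in OpenCV.js; here we just translate the
    box to simulate the object moving between frames."""
    return Box(box.x + dx, box.y + dy, box.w, box.h)

def verify(initial: bytes, final: bytes, command: str) -> bool:
    """Sight-Line asks Gemini to compare before/after frames; this
    stub just checks that the frame changed at all."""
    return initial != final

def run_pipeline(initial_frame: bytes, final_frame: bytes, command: str) -> dict:
    box = detect_target(initial_frame, command)         # capture + detect
    box = track(box, dx=5, dy=-3)                       # follow movement
    done = verify(initial_frame, final_frame, command)  # final frame + check
    return {"box": (box.x, box.y, box.w, box.h), "complete": done}

print(run_pipeline(b"frame0", b"frame1", "remove the oil cap"))
```

The overlay step is omitted because it is purely client-side rendering on top of the tracked box.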
Memory
SQLite stores:
- Object type
- User request
- Initial and final images
- Outcome
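A minimal schema for the fields listed above might look like the following; the actual table and column names in Sight-Line's database are assumptions here.

```python
import sqlite3

# Illustrative schema matching the fields above (object type, user
# request, initial/final images, outcome).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tasks (
        id            INTEGER PRIMARY KEY,
        object_type   TEXT NOT NULL,
        user_request  TEXT NOT NULL,
        initial_image BLOB,
        final_image   BLOB,
        outcome       TEXT CHECK (outcome IN ('success', 'failure'))
    )
""")
conn.execute(
    "INSERT INTO tasks (object_type, user_request, outcome) VALUES (?, ?, ?)",
    ("oil cap", "remove the oil cap", "success"),
)

# Past outcomes for the same object type can seed the next attempt.
row = conn.execute(
    "SELECT user_request, outcome FROM tasks WHERE object_type = ?",
    ("oil cap",),
).fetchone()
print(row)
```

Keying lookups on object type is what lets repeated tasks improve over time: a new request for a known object can reuse what worked before.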
Intelligence Layers
- Gemini: detection, bounding boxes, verification, memory curation
- DigitalOcean Gradient: LLM inference for slower tasks:
- Memory curation
- Correction digest
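Splitting fast and slow work is the core of keeping the loop real-time: detection and overlay must answer within a frame budget, while memory curation and the correction digest can run in the background. A hypothetical sketch of that split with asyncio (the function names and the shape of the server code are assumptions):

```python
import asyncio

async def curate_memory(task_log: list, out: list) -> None:
    """Slow path: stands in for a Gradient LLM call that summarizes
    past tasks into a correction digest."""
    await asyncio.sleep(0.05)
    out.append(f"digest of {len(task_log)} tasks")

async def fast_path(command: str) -> str:
    """Fast path: must return within the frame budget, so it never
    waits on the slow task."""
    return f"overlay for: {command}"

async def main() -> tuple:
    digests: list = []
    # Launch the slow task in the background, answer immediately.
    slow = asyncio.create_task(curate_memory(["t1", "t2"], digests))
    answer = await fast_path("remove the oil cap")
    await slow  # in a real server this would be awaited at shutdown
    return answer, digests

answer, digests = asyncio.run(main())
print(answer, digests)
```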
Input and Output
- Web Speech API: voice input and spoken responses
Security
- Unkey: authentication, API key validation, and rate limiting
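Unkey provides per-key rate limiting as a hosted service; this is not Unkey's API, but a token-bucket limiter like the one below illustrates the mechanism it applies to each API key.

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter: each request spends one
    token; tokens refill continuously up to a fixed capacity."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# With no refill, the third request in a burst of three is rejected.
bucket = TokenBucket(capacity=2, refill_per_sec=0.0)
print([bucket.allow() for _ in range(3)])
```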
Development
Augment: AI coding assistant used during development
- Helped debug frontend and backend issues
- Assisted feature implementation
- Enabled faster iteration through parallel workflows
Challenges we ran into
- Detection accuracy for small parts
- Tracking drift during movement
- Reliable verification between frames
- Latency in real-time interaction
Accomplishments that we're proud of
- Real-time system that maps intent to a physical region
- Stable tracking of selected regions
- Verification using before and after states
- Memory system that improves repeated tasks
What we learned
- Object detection alone is not enough
- Mapping intent to a region is required
- Real-time systems need separation of fast and slow tasks
- Tracking and verification must align
- Structured memory improves performance
What's next for Sight-Line
- Improve detection accuracy for small components
- Reduce latency
- Add multi-step task support
- Improve structure of correction inputs
Built With
- assistantui
- digitalocean-gradient
- fastapi
- google-gemini-api
- next.js
- opencv.js
- python
- react
- sqlite
- tailwind-css
- typescript
- unkey
- web-speech-api