Inspiration

I built Sight-Line after struggling with real-time AI that gave instructions without showing where to act.

While trying to fix my car, I could not find the part the AI was referring to: it described the actions but never highlighted the location, which led to delays and repeated mistakes.

I wanted a system that shows what the AI sees and points to the exact place to act.


What it does

Sight-Line uses a webcam to guide interaction with real objects.

  • You give a command in plain language
  • The system detects the object
  • It selects the exact region to interact with
  • It overlays guidance on that region
  • It tracks movement in real time
  • It verifies when the action is complete

It stores past tasks and improves performance over time.


How we built it

Frontend

  • Next.js, React, TypeScript: webcam feed, UI, and interaction flow

Backend

  • FastAPI, Python: API layer and orchestration pipeline

Vision Pipeline

  1. Capture frame from webcam
  2. Send image to Gemini
  3. Receive bounding box for target region
  4. Track region using OpenCV.js
  5. Overlay guidance
  6. Capture final frame
  7. Verify completion with Gemini
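
The loop above can be sketched as one orchestration function. This is illustrative only: `capture`, `detect`, `track`, `overlay`, and `verify` are injected stand-ins for the real webcam, Gemini, and OpenCV.js calls, which are not shown.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class BBox:
    x: int
    y: int
    w: int
    h: int

def run_pipeline(
    capture: Callable[[], bytes],          # grab a webcam frame
    detect: Callable[[bytes, str], BBox],  # Gemini: frame + request -> bounding box
    track: Callable[[bytes, BBox], BBox],  # tracker update (OpenCV.js in the real app)
    overlay: Callable[[BBox], None],       # draw guidance on the region
    verify: Callable[[bytes, bytes, str], bool],  # Gemini: before/after frames -> done?
    request: str,
    max_frames: int = 100,
) -> bool:
    """Detect once, then track and overlay each frame until verification succeeds."""
    first = capture()
    box = detect(first, request)
    for _ in range(max_frames):
        frame = capture()
        box = track(frame, box)
        overlay(box)
        if verify(first, frame, request):
            return True
    return False
```

Keeping detection to a single up-front call and tracking on every subsequent frame is what keeps the loop real-time.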

Memory

  • SQLite stores:

    • Object type
    • User request
    • Initial and final images
    • Outcome
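
A minimal sketch of that store, assuming a single `tasks` table; the column names are guesses, and images are kept as blobs for simplicity.

```python
import sqlite3

def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open the task-memory database, creating the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tasks (
               id INTEGER PRIMARY KEY,
               object_type TEXT NOT NULL,
               user_request TEXT NOT NULL,
               initial_image BLOB,
               final_image BLOB,
               outcome TEXT CHECK (outcome IN ('success', 'failure'))
           )"""
    )
    return conn

def record_task(conn, object_type, user_request, before, after, outcome):
    """Persist one completed task so later runs can learn from it."""
    conn.execute(
        "INSERT INTO tasks (object_type, user_request, initial_image, final_image, outcome) "
        "VALUES (?, ?, ?, ?, ?)",
        (object_type, user_request, before, after, outcome),
    )
    conn.commit()
```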

Intelligence Layers

  • Gemini: detection, bounding boxes, verification, memory curation

  • DigitalOcean Gradient: LLM inference for slower tasks

    • Memory curation
    • Correction digest
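
The fast/slow split can be sketched with a background worker queue: slow jobs such as memory curation are enqueued and processed off the real-time path. `curate_memory` here is a placeholder for the actual Gradient inference call.

```python
import queue
import threading

# Slow tasks (memory curation, correction digests) go on a queue; the
# real-time frame loop never waits on them.
slow_tasks: queue.Queue = queue.Queue()
results: list = []

def worker() -> None:
    while True:
        task = slow_tasks.get()
        if task is None:            # sentinel: shut the worker down
            break
        results.append(task())     # stand-in for a slow LLM inference call
        slow_tasks.task_done()

threading.Thread(target=worker, daemon=True).start()

def curate_memory() -> str:
    return "curated"               # placeholder for the real Gradient call

slow_tasks.put(curate_memory)      # enqueue without blocking the frame loop
```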

Input and Output

  • Web Speech API: voice input and spoken responses

Security

  • Unkey: authentication, API key validation, and rate limiting
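
Unkey handles this as a hosted service; the sliding-window rate limiting it provides can be sketched in-process like this (a simplification for illustration, not Unkey's actual API).

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Allow at most `limit` requests per API key within a sliding `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # api_key -> timestamps of recent requests

    def allow(self, api_key: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits[api_key]
        while q and now - q[0] >= self.window:  # drop hits outside the window
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```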

Development

  • Augment: AI coding assistant used during development

    • Helped debug frontend and backend issues
    • Assisted feature implementation
    • Enabled faster iteration through parallel workflows

Challenges we ran into

  • Detection accuracy for small parts
  • Tracking drift during movement
  • Reliable verification between frames
  • Latency in real-time interaction

Accomplishments that we're proud of

  • Real-time system that maps intent to a physical region
  • Stable tracking of selected regions
  • Verification using before and after states
  • Memory system that improves repeated tasks

What we learned

  • Object detection alone is not enough
  • Mapping intent to a region is required
  • Real-time systems need separation of fast and slow tasks
  • Tracking and verification must align
  • Structured memory improves performance

What's next for Sight-Line

  • Improve detection accuracy for small components
  • Reduce latency
  • Add multi-step task support
  • Improve structure of correction inputs
