Inspiration

I built Sight-Line after struggling with real-time AI that gave instructions without showing where to act.

While trying to fix my car, I could not find the part the AI was referring to: it described the actions but never highlighted the location, which led to delays and repeated mistakes.

I wanted a system that shows what the AI sees and points to the exact place to act.


What it does

Sight-Line uses a webcam to guide interaction with real objects.

  • You give a command in plain language
  • The system detects the object
  • It selects the exact region to interact with
  • It overlays guidance on that region
  • It tracks movement in real time
  • It verifies when the action is complete

It stores past tasks and improves performance over time.


How we built it

Frontend

  • Next.js, React, TypeScript: webcam feed, UI, and interaction flow

Backend

  • FastAPI, Python: API layer and orchestration pipeline

Vision Pipeline

  1. Capture frame from webcam
  2. Send image to Gemini
  3. Receive bounding box for target region
  4. Track region using OpenCV.js
  5. Overlay guidance
  6. Capture final frame
  7. Verify completion with Gemini
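
The loop above can be sketched as one orchestration function. This is illustrative only: `capture`, `detect`, `track`, `overlay`, and `verify` are injected stand-ins for the real webcam, Gemini, and OpenCV.js calls, which are not shown.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class BBox:
    x: int
    y: int
    w: int
    h: int

def run_pipeline(
    capture: Callable[[], bytes],          # grab a webcam frame
    detect: Callable[[bytes, str], BBox],  # Gemini: frame + request -> bounding box
    track: Callable[[bytes, BBox], BBox],  # tracker update (OpenCV.js in the real app)
    overlay: Callable[[BBox], None],       # draw guidance on the region
    verify: Callable[[bytes, bytes, str], bool],  # Gemini: before/after frames -> done?
    request: str,
    max_frames: int = 100,
) -> bool:
    """Detect once, then track and overlay each frame until verification succeeds."""
    first = capture()
    box = detect(first, request)
    for _ in range(max_frames):
        frame = capture()
        box = track(frame, box)
        overlay(box)
        if verify(first, frame, request):
            return True
    return False
```

Keeping detection to a single up-front call and tracking on every subsequent frame is what keeps the loop real-time.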

Memory

  • SQLite stores:

    • Object type
    • User request
    • Initial and final images
    • Outcome
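
A minimal sketch of that store, assuming a single `tasks` table; the column names are guesses, and images are kept as blobs for simplicity.

```python
import sqlite3

def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open the task-memory database, creating the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tasks (
               id INTEGER PRIMARY KEY,
               object_type TEXT NOT NULL,
               user_request TEXT NOT NULL,
               initial_image BLOB,
               final_image BLOB,
               outcome TEXT CHECK (outcome IN ('success', 'failure'))
           )"""
    )
    return conn

def record_task(conn, object_type, user_request, before, after, outcome):
    """Persist one completed task so later runs can learn from it."""
    conn.execute(
        "INSERT INTO tasks (object_type, user_request, initial_image, final_image, outcome) "
        "VALUES (?, ?, ?, ?, ?)",
        (object_type, user_request, before, after, outcome),
    )
    conn.commit()
```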

Intelligence Layers

  • Gemini: detection, bounding boxes, verification, memory curation

  • DigitalOcean Gradient: LLM inference for slower tasks

    • Memory curation
    • Correction digest
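
The fast/slow split can be sketched with a background worker queue: slow jobs such as memory curation are enqueued and processed off the real-time path. `curate_memory` here is a placeholder for the actual Gradient inference call.

```python
import queue
import threading

# Slow tasks (memory curation, correction digests) go on a queue; the
# real-time frame loop never waits on them.
slow_tasks: queue.Queue = queue.Queue()
results: list = []

def worker() -> None:
    while True:
        task = slow_tasks.get()
        if task is None:            # sentinel: shut the worker down
            break
        results.append(task())     # stand-in for a slow LLM inference call
        slow_tasks.task_done()

threading.Thread(target=worker, daemon=True).start()

def curate_memory() -> str:
    return "curated"               # placeholder for the real Gradient call

slow_tasks.put(curate_memory)      # enqueue without blocking the frame loop
```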

Input and Output

  • Web Speech API: voice input and spoken responses

Security

  • Unkey: authentication, API key validation, and rate limiting
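
Unkey handles this as a hosted service; the sliding-window rate limiting it provides can be sketched in-process like this (a simplification for illustration, not Unkey's actual API).

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Allow at most `limit` requests per API key within a sliding `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # api_key -> timestamps of recent requests

    def allow(self, api_key: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits[api_key]
        while q and now - q[0] >= self.window:  # drop hits outside the window
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```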

Development

  • Augment: AI coding assistant used during development

    • Helped debug frontend and backend issues
    • Assisted feature implementation
    • Enabled faster iteration through parallel workflows

Challenges we ran into

  • Detection accuracy for small parts
  • Tracking drift during movement
  • Reliable verification between frames
  • Latency in real-time interaction

Accomplishments that we're proud of

  • Real-time system that maps intent to a physical region
  • Stable tracking of selected regions
  • Verification using before and after states
  • Memory system that improves repeated tasks

What we learned

  • Object detection alone is not enough
  • Mapping intent to a region is required
  • Real-time systems need separation of fast and slow tasks
  • Tracking and verification must align
  • Structured memory improves performance

What's next for Sight-Line

  • Improve detection accuracy for small components
  • Reduce latency
  • Add multi-step task support
  • Improve structure of correction inputs
