Enthiran: Real-Time AI Guidance for Physical Tasks

(Enthiran is a Tamil word that means “machine” or “robot,” representing our vision of intelligent assistance in the real world.)

Why We Built This

Enthiran started while we were thinking seriously about startup ideas that solve real, everyday problems, not just flashy tech demos.

We kept coming back to how often people struggle with basic physical tasks. Whether it’s fixing something at home, changing a tire, assembling furniture, or even cooking, most of us rely on YouTube videos and written guides. And most of the time, they don’t match the situation in front of us. Different tools, different setups, and different environments lead to confusion, mistakes, and frustration.

At the same time, we were working in tech and data science, watching AI become incredibly powerful at digital tasks like writing code and analyzing data.

It felt strange that AI was advancing so quickly, yet people were still guessing their way through real-world work.

So as part of exploring a startup problem worth solving, we asked ourselves:

Why can’t AI help people in the moment, while they’re actually doing something physical?

That question became Enthiran.

What We’re Building

Enthiran is a mobile app that uses your phone’s camera to guide you through real-world tasks in real time.

Instead of watching a full tutorial first, you point your camera at what you’re working on and the system:

  • Detects what task you’re doing
  • Highlights important tools and objects on the screen
  • Walks you through each step with voice guidance
  • Warns you about common mistakes before they happen

The experience is meant to feel like having someone knowledgeable next to you, adapting instructions to your exact situation rather than giving generic steps.

How We Built It

Our focus was on giving the AI real situational understanding, not just object labels.

We used:

  • Live camera input from the phone
  • Vision models from Google to analyze scenes and recognize tasks
  • AR overlays to visually guide where to act
  • Voice instructions for hands-free use
  • A backend that learns from how users perform each step

So instead of only recognizing a “wrench,” the system understands you’re changing a tire, where you are in the process, and where people usually struggle.
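As a rough illustration of the difference between object labels and situational understanding, the idea can be sketched in Python. Everything here is hypothetical and not Enthiran's actual implementation: the `Step` type, the `TIRE_CHANGE` workflow, and the `guidance` function are assumptions made for the example, and the set of visible objects stands in for whatever a vision model would report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    required_objects: frozenset  # labels that should be in view for this step
    common_mistake: str          # where people usually struggle


# Hypothetical workflow, written by hand for illustration only.
TIRE_CHANGE = [
    Step("loosen lug nuts", frozenset({"wrench", "wheel"}),
         "jacking up the car before loosening the nuts"),
    Step("raise the car", frozenset({"jack", "wheel"}),
         "placing the jack away from the jacking point"),
    Step("swap the wheel", frozenset({"spare tire", "wheel"}),
         "tightening the nuts in a circle instead of a star pattern"),
]


def guidance(steps, completed, visible):
    """Turn raw labels into step-aware guidance.

    `completed` is how many steps are already done; `visible` is the set
    of object labels currently detected in the camera frame.
    """
    if completed >= len(steps):
        return "Task complete."
    step = steps[completed]
    missing = step.required_objects - visible
    if missing:
        return f"Looking for: {', '.join(sorted(missing))}"
    return f"Step {completed + 1}: {step.name}. Watch out: {step.common_mistake}."


print(guidance(TIRE_CHANGE, 0, {"wrench", "wheel"}))
print(guidance(TIRE_CHANGE, 1, {"wheel"}))
```

The same "wrench" label produces different guidance depending on where the user is in the workflow, which is the point of tracking progress rather than labels alone.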

What Was Challenging

Real-time feedback was one of the hardest parts. Even short delays feel long when you’re actively working with your hands, so we had to carefully manage processing and responsiveness.
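One common way to keep a live camera pipeline responsive is to analyze only the newest frame and skip any frame that has already gone stale. The sketch below shows that general technique in Python; it is an assumption about how such a pipeline could be managed, not a description of Enthiran's code, and `LatestFrameBuffer` and `max_age_s` are names invented for the example.

```python
from collections import deque


class LatestFrameBuffer:
    """Holds at most one frame; newer frames silently replace older ones."""

    def __init__(self, max_age_s):
        self._slot = deque(maxlen=1)  # maxlen=1 drops the older frame on append
        self.max_age_s = max_age_s

    def push(self, frame, timestamp):
        self._slot.append((frame, timestamp))

    def pop_fresh(self, now):
        """Return the newest frame if it is recent enough, else None."""
        if not self._slot:
            return None
        frame, timestamp = self._slot.pop()
        if now - timestamp > self.max_age_s:
            return None  # too old to be useful while hands are moving
        return frame


buf = LatestFrameBuffer(max_age_s=0.2)
buf.push("frame-1", timestamp=0.00)
buf.push("frame-2", timestamp=0.10)  # replaces frame-1
print(buf.pop_fresh(now=0.15))       # frame-2
```

Dropping frames trades completeness for latency: the model never works through a backlog, so guidance always reflects what the camera saw most recently.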

Another challenge was deciding what information actually helps. Showing everything the AI sees is overwhelming. Showing too little isn’t useful. We spent a lot of time refining what the system surfaces at each step.
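The trade-off between showing too much and too little can be pictured as a filter over what the vision model reports. The following Python sketch is illustrative only: the `Detection` type, the confidence threshold, and the cap on highlighted items are assumptions, not values taken from Enthiran.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    label: str
    confidence: float


def select_highlights(detections, relevant_labels, min_confidence=0.6, max_items=2):
    """Keep only confident detections that matter for the current step,
    strongest first, capped so the overlay stays readable."""
    candidates = [
        d for d in detections
        if d.label in relevant_labels and d.confidence >= min_confidence
    ]
    candidates.sort(key=lambda d: d.confidence, reverse=True)
    return candidates[:max_items]


seen = [
    Detection("wrench", 0.90),
    Detection("car", 0.95),      # confident, but irrelevant to this step
    Detection("jack", 0.70),
    Detection("wheel", 0.50),    # relevant, but below the threshold
    Detection("lug nut", 0.80),
]
step_labels = {"wrench", "jack", "wheel", "lug nut"}
print([d.label for d in select_highlights(seen, step_labels)])
```

Tuning `min_confidence` and `max_items` per step is one place where the refinement described above could happen.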

We also worked on turning user interactions into meaningful insights, like common mistakes and success patterns, so the system improves over time.
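A simple form of this kind of aggregation is counting which mistakes occur most often at each step. The Python sketch below assumes interactions are logged as `(step, mistake)` pairs; that event format, the sample data, and the `summarize_mistakes` function are hypothetical and shown only to make the idea concrete.

```python
from collections import Counter, defaultdict


def summarize_mistakes(events, top_n=1):
    """Group logged (step, mistake) pairs by step and return the
    most frequent mistakes for each one."""
    per_step = defaultdict(Counter)
    for step, mistake in events:
        per_step[step][mistake] += 1
    return {
        step: [mistake for mistake, _ in counts.most_common(top_n)]
        for step, counts in per_step.items()
    }


# Made-up sample log for illustration.
log = [
    ("loosen lug nuts", "car already jacked up"),
    ("loosen lug nuts", "car already jacked up"),
    ("loosen lug nuts", "wrong wrench size"),
    ("raise the car", "jack on soft ground"),
]
print(summarize_mistakes(log))
```

A summary like this could feed the warnings shown at each step, so the most frequent mistake is the one surfaced first.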

What We Learned

Technically:

  • Multimodal AI is incredibly capable but sensitive to small changes
  • Real-time vision systems require careful optimization
  • Designing AR guidance is very different from typical app design

From a product perspective:

  • People don’t want more information; they want guidance in the moment
  • Context makes instructions far more effective
  • Learning from real user behavior is what turns a tool into something truly useful

Where We’re Taking This

We’re treating Enthiran as the beginning of a startup idea, not just a hackathon project.

Next, we want to:

  • Expand to many more real-world tasks
  • Improve speed and accuracy
  • Let users contribute and refine workflows
  • Explore hands-free experiences like smart glasses

Why This Matters

Everyday skills shouldn’t feel intimidating or risky.

People waste time, break things, or give up simply because instructions don’t match their real situation.

Enthiran is our attempt to use modern AI to bridge that gap, making real-world tasks easier, safer, and more approachable.

Built With

  • Gemini Vision AI
  • Real-time camera processing and AR overlays
  • Data-driven learning from user interactions
  • A lot of experimentation and iteration

— Harsha & Niveda
