Enthiran: Real-Time AI Guidance for Physical Tasks

(Enthiran is a Tamil word that means “machine” or “robot,” representing our vision of intelligent assistance in the real world.)

Why We Built This

Enthiran started while we were thinking seriously about startup ideas that solve real, everyday problems, not just flashy tech demos.

We kept coming back to how often people struggle with basic physical tasks. Whether it’s fixing something at home, changing a tire, assembling furniture, or even cooking, most of us rely on YouTube videos and written guides. And most of the time, they don’t match the situation in front of us. The tools, setups, and environments are different, which leads to confusion, mistakes, and frustration.

At the same time, we were working in tech and data science, watching AI become incredibly powerful at digital tasks like writing code and analyzing data.

It felt strange that AI was advancing so quickly, yet people were still guessing their way through real-world work.

So as part of exploring a startup problem worth solving, we asked ourselves:

Why can’t AI help people in the moment while they’re actually doing something physical?

That question became Enthiran.

What We’re Building

Enthiran is a mobile app that uses your phone’s camera to guide you through real-world tasks in real time.

Instead of watching a full tutorial first, you point your camera at what you’re working on, and the system:

Detects what task you’re doing
Highlights important tools and objects on the screen
Walks you through each step with voice guidance
Warns you about common mistakes before they happen

The experience is meant to feel like having someone knowledgeable next to you, adapting instructions to your exact situation rather than giving generic steps.

How We Built It

Our focus was on giving the AI real situational understanding, not just object labels.

We used:

Live camera input from the phone
Vision models from Google to analyze scenes and recognize tasks
AR overlays to visually guide where to act
Voice instructions for hands-free use
A backend that learns from how users perform each step

So instead of only recognizing a “wrench,” the system understands you’re changing a tire, where you are in the process, and where people usually struggle.
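To make the difference between object labels and task context concrete, here is a minimal sketch in Python. It is an illustration only, not the app's actual code: the names `TaskStep`, `TaskContext`, and `update` are hypothetical, and the rule "move to the next step once all of its expected objects are visible" is a deliberately simple assumption.

```python
from dataclasses import dataclass


@dataclass
class TaskStep:
    name: str
    expected_objects: set[str]   # objects we assume are visible during this step
    common_mistake: str = ""     # where people usually struggle (hypothetical data)


@dataclass
class TaskContext:
    task: str
    steps: list[TaskStep]
    current: int = 0

    def update(self, detected: set[str]) -> TaskStep:
        """Move to the next step once all of its expected objects are visible."""
        nxt = self.current + 1
        if nxt < len(self.steps) and self.steps[nxt].expected_objects <= detected:
            self.current = nxt
        return self.steps[self.current]


ctx = TaskContext(
    task="change a tire",
    steps=[
        TaskStep("loosen lug nuts", {"wrench", "wheel"}, "lifting the car first"),
        TaskStep("raise the car", {"jack"}, "placing the jack off the lift point"),
    ],
)
first = ctx.update({"wrench", "wheel"})   # no jack in view, so we stay on step one
second = ctx.update({"jack", "wheel"})    # jack appears, so we advance
```

The same "wrench" detection means something different depending on `ctx.current`, which is the point: guidance is keyed to the step, not to the label.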

What Was Challenging

Real-time feedback was one of the hardest parts. Even short delays feel long when you’re actively working with your hands, so we had to carefully manage processing and responsiveness.
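One common way to keep perceived delay low is to always analyze the newest camera frame and drop any that went stale while the model was busy. The sketch below shows that pattern in Python. It is an assumption about how such a pipeline could be structured, not a description of what Enthiran does, and `LatestFrameSlot` is a hypothetical name.

```python
import threading


class LatestFrameSlot:
    """Holds at most one frame; a newer frame overwrites an unprocessed older one."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None
        self.dropped = 0  # how many frames were overwritten before being analyzed

    def put(self, frame):
        # Called by the camera thread for every captured frame.
        with self._lock:
            if self._frame is not None:
                self.dropped += 1
            self._frame = frame

    def take(self):
        # Called by the analysis thread; returns None when nothing new arrived.
        with self._lock:
            frame, self._frame = self._frame, None
            return frame


slot = LatestFrameSlot()
for frame_id in (1, 2, 3):   # three frames arrive while the model is busy
    slot.put(frame_id)
newest = slot.take()         # only the most recent frame is analyzed
```

Trading completeness for freshness fits hands-on work: guidance based on a frame from a second ago is often worse than no guidance.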

Another challenge was deciding what information actually helps. Showing everything the AI sees is overwhelming. Showing too little isn’t useful. We spent a lot of time refining what the system surfaces at each step.
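The filtering idea can be sketched as a small function: keep only detections that matter for the current step, require a minimum confidence, and cap how many appear at once. Everything here is hypothetical, including the `Detection` type, the `select_overlays` name, and the default threshold and cap, which are placeholder values rather than tuned numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str
    confidence: float


def select_overlays(detections, relevant_labels, min_confidence=0.6, max_items=2):
    """Return the few detections worth highlighting for the current step."""
    candidates = [
        d for d in detections
        if d.label in relevant_labels and d.confidence >= min_confidence
    ]
    # Most confident first, then cap so the screen does not get crowded.
    candidates.sort(key=lambda d: d.confidence, reverse=True)
    return candidates[:max_items]


seen = [
    Detection("wrench", 0.92),
    Detection("lug nut", 0.81),
    Detection("garden hose", 0.95),  # visible, but irrelevant to this step
    Detection("jack", 0.40),         # relevant, but too uncertain to show
    Detection("wheel", 0.70),
]
shown = select_overlays(seen, {"wrench", "lug nut", "jack", "wheel"})
```

Here five detections shrink to two highlights, the wrench and the lug nut.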

We also worked on turning user interactions into meaningful insights, such as common mistakes and success patterns, so the system improves over time.
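As a simple illustration of this kind of aggregation, the sketch below counts which mistakes occur most often at each step. The event format, a `(step, mistake)` pair with `None` meaning the step went fine, and the function name `common_mistakes` are assumptions made for the example, not the backend's real schema.

```python
from collections import Counter, defaultdict


def common_mistakes(events, top_n=1):
    """Map each step to its most frequently observed mistakes."""
    per_step = defaultdict(Counter)
    for step, mistake in events:
        if mistake is not None:      # None marks a step completed without issue
            per_step[step][mistake] += 1
    return {
        step: [m for m, _ in counts.most_common(top_n)]
        for step, counts in per_step.items()
    }


events = [
    ("loosen lug nuts", "turned the wrong way"),
    ("loosen lug nuts", "turned the wrong way"),
    ("loosen lug nuts", "raised the car first"),
    ("raise the car", None),
]
insights = common_mistakes(events)
```

An output like this could feed the warnings shown before a step begins, so the most frequent error is the one users hear about first.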

What We Learned

Technically:

Multimodal AI is incredibly capable, but it is sensitive to small changes in input
Real-time vision systems require careful optimization
Designing AR guidance is very different from typical app design

From a product perspective:

People don’t want more information; they want guidance in the moment
Context makes instructions far more effective
Learning from real user behavior is what turns a tool into something truly useful