Inspiration

Working with hardware or tackling an unfamiliar task can be difficult, especially the first time. Unconventional tasks are hard to follow when the only resources available are YouTube tutorials or written guides, because every case is unique and a generic guide cannot account for its specifics.

What it does

TaskLens analyzes a photo of the task at hand to understand the goal, assesses potential safety hazards, and generates a clear, step-by-step task list so the user can work conveniently and safely.

How we built it

We used FastAPI to build a backend that orchestrates two agents. First, GPT-4 Vision analyzes the user's photo to identify the components and their current state. That description is passed to GPT-4o-mini, which acts as the "AI Architect" and generates a safe, chronologically ordered task list. The system enforces a strict JSON schema so that each step carries precise target data (such as pin names or valve IDs). The frontend uses this structured data to draw dynamic HTML Canvas overlays, producing a personalized, safety-focused visual guide for the manual task.
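To illustrate what enforcing a strict JSON schema on the model's reply can look like, here is a minimal sketch using only the Python standard library. The field names (`order`, `instruction`, `target`, `hazard`), the `steps` wrapper, and the `parse_task_list` helper are all assumptions made for this example; the schema TaskLens actually uses is not shown in this write-up.

```python
import json
from dataclasses import dataclass

# Hypothetical field names and types; the project's real schema is not shown here.
REQUIRED_STEP_FIELDS = {"order": int, "instruction": str, "target": str, "hazard": str}


@dataclass(frozen=True)
class TaskStep:
    order: int        # 1-based position in the task list
    instruction: str  # what the user should do
    target: str       # component the overlay should highlight, e.g. a pin name
    hazard: str       # safety note, empty string when none applies


def parse_task_list(raw: str) -> list[TaskStep]:
    """Parse a model reply and reject anything that does not match the schema."""
    data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict) or not isinstance(data.get("steps"), list):
        raise ValueError("expected an object with a 'steps' array")

    steps = []
    for item in data["steps"]:
        # Strict: no missing fields and no extra fields.
        if not isinstance(item, dict) or set(item) != set(REQUIRED_STEP_FIELDS):
            raise ValueError(f"unexpected step fields: {item!r}")
        for name, expected in REQUIRED_STEP_FIELDS.items():
            # bool is a subclass of int, so exclude it explicitly.
            if not isinstance(item[name], expected) or isinstance(item[name], bool):
                raise ValueError(f"field {name!r} must be {expected.__name__}")
        steps.append(TaskStep(**item))

    # Keep the list in chronological order and require orders 1..n with no gaps.
    steps.sort(key=lambda s: s.order)
    if [s.order for s in steps] != list(range(1, len(steps) + 1)):
        raise ValueError("step orders must run 1..n without gaps")
    return steps
```

Rejecting malformed replies on the backend means the frontend only ever receives steps whose `target` it can look up and highlight, so the overlay code does not need to guard against missing or extra fields.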

Challenges we ran into

The initial plan was to use a Live Video API to monitor the user's progress and stream the feed into Nvidia's Nemotron Nano agents, which would intervene when a mistake was being made and provide live feedback. However, we could not find a readily available API service that offered Live Video support, and our unfamiliarity with Nvidia's agents cost us a lot of time.

Accomplishments that we're proud of

We learned how to use new tools, such as OpenAI's Vision agent, and worked with photo inputs for the first time to create a project that can help improve society. We also built a PWA for the first time.

What we learned

A specialized AI architecture is essential for real-time guidance across different manual tasks. Choosing the right agent for each specific job can greatly improve a product's efficiency.

What's next for TaskLens

The next step for TaskLens is to incorporate a Live Video feed that works alongside users in real time, rather than having them submit verification photos. We expect a live feed to substantially improve ease of use, efficiency, and safety, with the goal that no manual task is too challenging for the average person.
