Inspiration

The inspiration behind YODHA AI came from the limitations of current AI assistants, which are mostly restricted to chat-based interaction. We wanted to build a system that not only responds but also performs real-world tasks across systems and devices.

What it does

YODHA AI is a multi-modal, action-based intelligent assistant that can perform real tasks such as opening applications, executing system commands, automating web activities, and controlling devices using voice, text, or gestures.

How we built it

We built YODHA AI using a combination of modern technologies. The frontend is developed using React, Tailwind CSS, Three.js, and Electron for a desktop-based interface. The backend is powered by Python, FastAPI, and AsyncIO. AI capabilities are integrated using Google Gemini, while MediaPipe is used for vision and gesture recognition. For execution, we used tools like Playwright for web automation, build123d for CAD generation, and python-kasa for IoT device control. Real-time communication is handled using Socket.IO.
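As a sketch of how an action-based backend like this might route recognized intents to system actions with AsyncIO, here is a minimal dispatcher. The `ACTIONS` registry, the `open_app` handler, and the use of `echo` as a stand-in subprocess are illustrative assumptions, not YODHA AI's actual code:

```python
import asyncio

# Hypothetical registry mapping intent names to async handlers.
ACTIONS = {}

def action(name):
    """Register an async handler under a named intent."""
    def wrap(fn):
        ACTIONS[name] = fn
        return fn
    return wrap

@action("open_app")
async def open_app(target: str) -> str:
    # A real assistant would launch the application here; we spawn
    # a harmless `echo` subprocess purely for illustration.
    proc = await asyncio.create_subprocess_exec(
        "echo", target, stdout=asyncio.subprocess.PIPE
    )
    out, _ = await proc.communicate()
    return out.decode().strip()

async def dispatch(intent: str, target: str) -> str:
    handler = ACTIONS.get(intent)
    if handler is None:
        raise KeyError(f"unknown intent: {intent}")
    return await handler(target)

result = asyncio.run(dispatch("open_app", "calculator"))
print(result)  # calculator
```

Because every handler is a coroutine, slow actions (web automation, device control) can run without blocking the event loop that serves new requests.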

Challenges we ran into

Our main challenges were integrating AI processing, system control, and real-time interaction into a single pipeline. In particular, keeping communication between the frontend and backend responsive while commands executed asynchronously proved difficult.
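One common way to keep command intake responsive while long-running actions execute is to decouple the two with a queue. The sketch below shows that pattern using only Python's stdlib asyncio; the command names and the simulated work are hypothetical, not taken from YODHA AI's codebase:

```python
import asyncio

async def worker(queue: asyncio.Queue, results: list) -> None:
    """Drain commands from the queue and execute them one at a time."""
    while True:
        cmd = await queue.get()
        if cmd is None:          # sentinel value: shut down the worker
            queue.task_done()
            break
        await asyncio.sleep(0)   # stand-in for real command execution
        results.append(f"done:{cmd}")
        queue.task_done()

async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    results: list = []
    task = asyncio.create_task(worker(queue, results))
    # The frontend can keep pushing commands without waiting for
    # earlier ones to finish.
    for cmd in ("open_browser", "take_screenshot"):
        await queue.put(cmd)
    await queue.put(None)
    await queue.join()
    await task
    return results

results = asyncio.run(main())
print(results)  # ['done:open_browser', 'done:take_screenshot']
```

In a real deployment the `queue.put` side would be fed by Socket.IO events from the frontend rather than a loop, but the decoupling principle is the same.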

Accomplishments that we're proud of

We successfully developed YODHA AI, a system that goes beyond traditional chatbots by performing real-world actions. Building a multi-modal assistant that integrates voice, vision, and system control is a major achievement.

What we learned

We learned how to integrate AI with system-level operations, manage multi-agent architectures, and develop a full-stack application combining frontend, backend, and automation tools.

What's next for YODHA AI

In the future, we plan to enhance YODHA AI by expanding its automation coverage, improving the accuracy of voice and gesture recognition, adding more smart device integrations, and making the system more scalable and efficient.

Built With

Python, FastAPI, AsyncIO, React, Tailwind CSS, Three.js, Electron, Google Gemini, MediaPipe, Playwright, build123d, python-kasa, Socket.IO