Inspiration
While working at the Fascitelli Center, we observed a family playing with their dog. We were struck by how fluid and intuitive their communication was—simple gestures and vocal cues were all that was needed to direct complex movements. This stood in stark contrast to the current state of industrial robotics, which often requires specialized training and cumbersome controllers. We were inspired to bridge this gap, creating KineticK9 to bring that same 'human-canine' simplicity to the world of high-performance quadruped simulation.
What it does
KineticK9 is a multimodal control interface that synchronizes human intent with robotic execution through a high-fidelity simulation environment. By integrating a full-featured full-stack web app with real-time interactive graphics, the platform enables intuitive operation of a quadruped robot via three primary input vectors:
- Speech & Textual Command: Users issue verbal commands (e.g., "walk forward") or typed numeric instructions.
- Graphical Interface: A dynamic web-based dashboard allows for direct coordinate-based targeting and environmental interaction, such as directing the agent to specific digital assets like a ball or waypoint.
- Bidirectional Audio Feedback: To simulate authentic human-canine interaction and provide system transparency, KineticK9 uses ElevenLabs Text-to-Speech (TTS) to confirm task completion and report status updates (e.g., "Walked to the ball", "Arrived at [an ordered pair of coordinates]").
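The status updates above could be produced by a small formatter before the string is handed to the TTS engine. This is an illustrative sketch; the function name and message templates are assumptions, not KineticK9's actual code.

```python
# Illustrative status formatter for spoken feedback; in the real
# project, strings like these are voiced via ElevenLabs TTS.
def status_message(target=None, position=None):
    """Build a confirmation phrase for a completed movement."""
    if position is not None:
        return f"Arrived at ({position[0]}, {position[1]})"
    return f"Walked to the {target}"
```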
How we built it
We built the project as a full-stack web app with a React frontend and a Flask backend, alongside a Jupyter notebook in which we trained the model that became the virtual K9. Developing in Cursor improved our workflow, as AI-assisted software development greatly accelerated the build. We used ElevenLabs for both the text-to-speech that produces the dog's audio output and the speech-to-text that enables vocal commands.
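As a rough illustration of this wiring (the route, payload fields, and gait parameters are hypothetical, not the project's actual API), the Flask backend might expose a single command endpoint that the React frontend posts recognized phrases to:

```python
# Hypothetical sketch of a Flask command endpoint; names and values
# are illustrative, not KineticK9's real implementation.
from flask import Flask, jsonify, request

app = Flask(__name__)

# Toy mapping from a recognized phrase to gait parameters the
# trained policy could consume (velocity in m/s, duration in s).
COMMANDS = {
    "walk forward": {"vx": 0.5, "duration": 2.0},
    "walk backward": {"vx": -0.5, "duration": 2.0},
}

@app.route("/command", methods=["POST"])
def command():
    text = request.get_json().get("text", "").lower().strip()
    params = COMMANDS.get(text)
    if params is None:
        return jsonify({"ok": False, "error": "unknown command"}), 400
    # In the full app these parameters would drive the simulated K9,
    # with a TTS confirmation generated on completion.
    return jsonify({"ok": True, "action": text, "params": params})
```

The same endpoint could serve all three input modalities, since speech is transcribed to text before it reaches the backend.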
Challenges we ran into
One of the primary challenges we ran into was the integration of the robotics simulation with the other elements of the web app, such as the user interface and inputs. We had to develop our software such that a command like "Walk Forward" is translated into the specific duration and velocity of movement recognized by the trained model. Furthermore, the "black box" nature of model training presented a different kind of difficulty.
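A minimal sketch of that translation step, under assumed values (the walking speed, default distance, and function name are made up for illustration): given a fixed forward speed, a verbal command with an optional distance can be turned into the velocity and duration the trained model consumes.

```python
# Illustrative translation of a verbal command into the low-level
# (velocity, duration) pair a trained locomotion policy consumes.
# Speeds and defaults are assumed values, not KineticK9's real ones.
WALK_SPEED = 0.5        # assumed forward speed in m/s
DEFAULT_DISTANCE = 1.0  # metres walked when no distance is given

def translate(command: str) -> dict:
    """Map e.g. 'walk forward 2' -> {'vx': 0.5, 'duration': 4.0}."""
    tokens = command.lower().split()
    if not tokens or tokens[0] != "walk":
        raise ValueError(f"unrecognized command: {command!r}")
    direction = 1.0 if len(tokens) > 1 and tokens[1] == "forward" else -1.0
    # A trailing number, if present, is the distance in metres.
    if tokens[-1].replace(".", "", 1).isdigit():
        distance = float(tokens[-1])
    else:
        distance = DEFAULT_DISTANCE
    return {"vx": direction * WALK_SPEED, "duration": distance / WALK_SPEED}
```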
Accomplishments that we're proud of
We successfully trained a simulated robot model to walk from scratch. We are especially proud of the K9's responsiveness to multimodal inputs—it doesn't just move; it interprets commands and stops at precise positions in the domain, demonstrating a high level of control and stability.
What we learned
Our development of KineticK9 provided deep technical insights into the intersection of machine learning, web architecture, and human-robot interaction:
- Complex System Integration: We learned how to manage a system with many moving parts. Integrating a Flask backend with a live React frontend and a Reinforcement Learning model trained in Jupyter was a masterclass in "putting everything together," and ElevenLabs' Text-to-Speech and Speech-to-Text features added further complexity. Ensuring that data flowed seamlessly from a user's voice to the robot's legs required a deep understanding of full-stack connectivity.
- Multimodal Input Normalization: We learned how to normalize inputs from three very different sources (speech, text, and graphical interaction) and translate them into motion for the simulated robot in the KineticK9 software.
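The normalization idea can be sketched as follows. The schema and conversion rules here are assumptions for illustration, not KineticK9's actual data model: each modality collapses into one target-position command before it reaches the simulator.

```python
from dataclasses import dataclass

# Hypothetical unified command that all three input modalities
# (speech/text, numeric entry, dashboard clicks) reduce to.
@dataclass
class MoveCommand:
    x: float
    y: float
    source: str  # which modality produced the command

def from_speech(transcript, current):
    # Toy rule: "walk forward" advances one unit along +x.
    if transcript.lower().strip() == "walk forward":
        return MoveCommand(current[0] + 1.0, current[1], "speech")
    raise ValueError("unrecognized phrase")

def from_numeric(x, y):
    # Direct coordinate entry needs no conversion.
    return MoveCommand(x, y, "numeric")

def from_click(pixel, scale=0.01):
    # Dashboard clicks arrive in pixels; convert to world units.
    return MoveCommand(pixel[0] * scale, pixel[1] * scale, "click")
```

Downstream code then only ever handles `MoveCommand`, which is what makes the three input paths interchangeable.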
What's next for KineticK9
The current iteration of KineticK9 establishes the foundation for intuitive quadruped control. Future development will focus on increasing the agent's autonomy and expanding its environmental awareness:
Vision-Language Navigation (VLN): Integrating computer vision to allow the agent to "see" and describe its surroundings. Instead of just moving to a coordinate, the user could command, "Find the red valve," and the robot would autonomously identify and navigate to the object.
Sim-to-Real Deployment: Perhaps the most important future step for KineticK9! Implementing a bridge to export trained control policies from the virtual environment to physical hardware would allow KineticK9 to move from the screen to the real world.
Note: We were a team of two people.
Built With
- elevenlabs
- flask
- html/css
- javascript
- jupyter
- python
- react