Inspiration
Mercor organizes human intelligence to power AI. We want to contribute to this mission by building a platform that organizes human intelligence to power physical AI. Introducing Human Gym.
What it does
Human Gym is building the training layer for physical AI: a fast, addictive way to train workers to collect high-quality real-world data for robotics. Along with Build AI’s 100,000 hours of egocentric data, we use NVIDIA Cosmos 3 as a world model that understands physical work at scale. Companies describe a physical task to our AI agent, and Human Gym instantly creates a mission made of step-by-step checkpoints. Workers complete tasks using a phone camera or Meta Ray-Bans while our voice Copilot guides them in real time and our Cosmos 3 world model verifies each subtask from their egocentric video stream. The better they perform, the better the Pokémon they unlock, creating a repeatable game loop that trains workers to generate cleaner, more structured, more useful data for training robots. Human Gym turns physical work into a game, thousands of workers into data collectors, and everyday tasks into the training fuel for robotics and embodied AI.
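To make the mission and checkpoint idea concrete, here is a minimal sketch of how a mission could be modeled. Every name in it (`Mission`, `Checkpoint`, `createMission`, `verifyCheckpoint`) is hypothetical and chosen for illustration; it is not our production schema, and the verification signal that would come from the world model is reduced to a plain function call.

```typescript
// Hypothetical data model: a mission is an ordered list of checkpoints.
interface Checkpoint {
  id: string;
  instruction: string;
  verified: boolean;
}

interface Mission {
  task: string;
  checkpoints: Checkpoint[];
}

// Build a mission from a task description and its ordered steps.
function createMission(task: string, steps: string[]): Mission {
  return {
    task,
    checkpoints: steps.map((instruction, i) => ({
      id: `step-${i + 1}`,
      instruction,
      verified: false,
    })),
  };
}

// The checkpoint the worker is currently on, or undefined when all are done.
function currentCheckpoint(mission: Mission): Checkpoint | undefined {
  return mission.checkpoints.filter((c) => !c.verified)[0];
}

// Mark a checkpoint as verified. Only the current checkpoint can be verified,
// so steps are completed in order; any other id leaves the mission unchanged.
function verifyCheckpoint(mission: Mission, id: string): Mission {
  const current = currentCheckpoint(mission);
  if (!current || current.id !== id) {
    return mission;
  }
  return {
    ...mission,
    checkpoints: mission.checkpoints.map((c) =>
      c.id === id ? { ...c, verified: true } : c
    ),
  };
}

// Fraction of checkpoints verified so far, between 0 and 1.
function progress(mission: Mission): number {
  if (mission.checkpoints.length === 0) {
    return 0;
  }
  const done = mission.checkpoints.filter((c) => c.verified).length;
  return done / mission.checkpoints.length;
}
```

In this sketch, updates return a new mission object instead of mutating the old one, which keeps the state easy to render in a UI and easy to log alongside the recorded video.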
How we built it
For the frontend, we used TypeScript. For the backend, we used Next.js. For the world model, we deploy Cosmos3-Super and are attempting to fine-tune it on proprietary egocentric data. For the interactive Copilot, we used the browser's Web Speech API.
Challenges we ran into
We had trouble meeting our real-time requirements. Because the world model grades the player on task completion, we needed to give the player feedback quickly. To reduce latency, we created a low-latency WebRTC stream into Cosmos 3. We also call the world model in parallel with two output schemas: one simplified, the other exhaustive. The simplified schema returns quickly for immediate feedback, while the exhaustive schema provides the thorough grading needed for correctness.
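The two-schema pattern can be sketched roughly as follows. This is an illustration under assumptions, not our actual integration: `WorldModelClient`, `gradeQuick`, `gradeFull`, and the fields on each verdict are hypothetical stand-ins for whatever the real world-model calls and schemas look like.

```typescript
// Hypothetical simplified schema: just enough for instant feedback.
interface QuickVerdict {
  passed: boolean;
}

// Hypothetical exhaustive schema: the authoritative grading result.
interface FullVerdict extends QuickVerdict {
  confidence: number;
  issues: string[];
}

// Hypothetical client; each method requests one output schema.
interface WorldModelClient {
  gradeQuick(clipId: string): Promise<QuickVerdict>;
  gradeFull(clipId: string): Promise<FullVerdict>;
}

async function gradeCheckpoint(
  client: WorldModelClient,
  clipId: string,
  onQuick: (verdict: QuickVerdict) => void
): Promise<FullVerdict> {
  // Start both requests before awaiting either, so they run concurrently.
  const quick = client.gradeQuick(clipId);
  const full = client.gradeFull(clipId);

  // Surface the simplified verdict as soon as it arrives. A failure on the
  // quick path is ignored so it cannot block the authoritative result.
  quick.then(onQuick).catch(() => {});

  // The exhaustive verdict is what the caller ultimately receives.
  return full;
}
```

A caller would pass an `onQuick` callback that updates the worker's screen or triggers the voice Copilot, then reconcile with the exhaustive verdict once the returned promise resolves.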
Accomplishments that we're proud of
We are proud of shipping a working end-to-end product that organizes human intelligence to power physical AI.
What we learned
We furthered our skills in full-stack development and applied AI engineering.
What's next for Human Gym
Human Gym is going to add a GPS feature that lets companies specify the locations where tasks must take place. We're also working with proprietary hardware vendors to add SLAM and other high-quality data signals.
Built With
- cosmos3-super
- next.js
- typescript