🦾 Artifex: Conversational Robotics Engine

Bridge the gap between natural language and physical simulation. Artifex is a browser-based, AI-powered robotics laboratory that turns your spoken ideas into real-time physical actions — no hardware, no installs, no limits.


💡 Inspiration

Robotics development is notoriously difficult to get into. Between heavy dependencies, expensive physical hardware, and complex control interfaces, the barrier to entry is massive. We asked ourselves:

What if anyone could experiment with a state-of-the-art robotic arm right in their browser, using nothing but natural language?

We were inspired to merge the reasoning capabilities of modern Generative AI with the rigorous physics of MuJoCo — and to democratize robotics research by creating an interactive, frictionless environment where anyone could observe a robotic agent visually "think," "plan," and autonomously correct itself in real time. By moving what would normally be $50,000 of hardware and a heavyweight toolchain directly into the browser, we are proving that the future of robotics prototyping is accessible to everyone.


🤖 What it does

Artifex is a fully interactive robotics simulation powered by LLMs. Users are presented with a tabletop scene containing colored cubes, a puck, a goalpost, and a highly accurate simulated Franka Emika Panda 7-DOF robotic arm.

Through a premium, glassmorphic conversational interface — designed to be clean and instantly presentation-ready — users simply type commands like:

"Push the puck into the goalpost."

The Core Innovation: Autonomous Continuous Learning

The magic of Artifex is its true autonomy. The AI controller doesn't blindly execute a script — it operates on a continuous learning loop:

  1. 📡 Reads the live rigid-body state from the physics engine
  2. 📐 Calculates the kinematic distance to target
  3. 🦾 Executes the physical action via the simulated arm
  4. 👁️ Observes the result — did the puck reach the goalpost?
  5. 🔁 Self-corrects — if it missed, the agent independently analyzes the coordinate delta, adjusts its approach vector, and retries

You watch this entire autonomous decision loop unfold in stunning, low-latency 3D physics directly in the browser — complete with real-time streaming logs of the agent's internal "thought" process.
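To make the loop concrete, here is a minimal Python sketch of how such a controller cycle could be structured. The helper names (read_state, plan_action, execute) and the tolerance and retry values are illustrative assumptions, not the actual Artifex API:

```python
import math

GOAL_EPSILON = 0.05   # metres; illustrative tolerance for "the puck reached the goalpost"
MAX_ATTEMPTS = 5      # illustrative cap on self-correction retries

def distance(a, b):
    """Euclidean distance between two (x, y, z) points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def control_loop(read_state, plan_action, execute, goal_xyz):
    """Observe -> plan -> act -> observe -> correct, until the goal is reached."""
    for _ in range(MAX_ATTEMPTS):
        state = read_state()                        # 1. read the live rigid-body state
        d = distance(state["puck_xyz"], goal_xyz)   # 2. kinematic distance to target
        if d <= GOAL_EPSILON:                       # 4. observe: did the puck reach the goal?
            return True
        action = plan_action(state, goal_xyz)       # 5. self-correct: re-plan from the delta
        execute(action)                             # 3. execute the action via the simulated arm
    return False                                    # give up after MAX_ATTEMPTS corrections
```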


🏗️ How we built it

Artifex is engineered as a highly scalable full-stack monorepo, divided into five tightly integrated layers, with the frontend and backend communicating over a persistent WebSocket connection.

1. 🖥️ The Physics UI (Frontend)

Built with React and Vite. The initial UI shell was scaffolded in minutes using Lovable — AI handled the three-panel layout, dark glassmorphic theme, and component wiring so we could focus on real logic instead of boilerplate. The 3D physics engine is MuJoCo (Multi-Joint dynamics with Contact) compiled to WebAssembly, running heavy rigid-body calculations — inverse kinematics, continuous collision detection — purely client-side at 60fps.
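For readers unfamiliar with MuJoCo, the snippet below shows the basic simulate-and-read-state pattern using the official Python bindings; the browser build exposes the same model/data concepts through WebAssembly. The scene file and body name are placeholders, not the actual Artifex assets:

```python
import mujoco

# Load a scene description (MJCF XML); "panda_tabletop.xml" is a placeholder name.
model = mujoco.MjModel.from_xml_path("panda_tabletop.xml")
data = mujoco.MjData(model)

# Advance the rigid-body simulation; in the browser the equivalent stepping happens
# inside the render loop to keep the physics and the 60fps canvas in sync.
for _ in range(1000):
    mujoco.mj_step(model, data)

# Read live state for the AI controller, e.g. the world position of a body named "puck".
puck_xyz = data.body("puck").xpos
print(puck_xyz)
```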

2. 💬 The AI Chat Interface

The conversational layer is built on assistant-ui — production-ready React components that gave us streaming chat, tool call rendering, and state management out of the box. We extended it with a custom AgentBridge runtime adapter to intercept WebSocket events and render the agent's live continuous-learning logs as structured, auto-scrolling Markdown blockquotes directly inside the chat thread.

3. 🧠 The AI Controller (Backend)

The brain of the operation is a FastAPI Python server powered by Google DeepMind's Gemini models — specifically gemini-robotics-er-1.5-preview — for advanced spatial reasoning. The model translates natural-language intent into target coordinates \((x, y, z)\), and the controller measures how far the scene is from that target using Euclidean distance:

$$ d = \sqrt{(x_{target} - x_{current})^2 + (y_{target} - y_{current})^2 + (z_{target} - z_{current})^2} $$

This distance metric feeds directly into the autonomous correction loop — if \(d > \epsilon\) after execution, the agent re-plans and retries.
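As an illustration of both halves (not the production code), the sketch below shows how a natural-language command could be turned into a target coordinate with the google-genai Python SDK, and how the epsilon check drives the re-plan decision. The prompt, JSON shape, and epsilon value are assumptions:

```python
import json
import numpy as np
from google import genai

client = genai.Client()    # assumes GOOGLE_API_KEY is set in the environment
EPSILON = 0.05             # metres; illustrative success tolerance

def plan_target(command: str, scene_state: dict) -> np.ndarray:
    """Ask the model to turn a natural-language command into an (x, y, z) target."""
    prompt = (
        f"Scene state: {json.dumps(scene_state)}\n"
        f"Command: {command}\n"
        'Respond with JSON only, e.g. {"target": [x, y, z]}'
    )
    response = client.models.generate_content(
        model="gemini-robotics-er-1.5-preview",
        contents=prompt,
    )
    # A production controller would request structured output and validate the reply.
    return np.array(json.loads(response.text)["target"], dtype=float)

def needs_replan(current_xyz: np.ndarray, target_xyz: np.ndarray) -> bool:
    """If d > epsilon after execution, the agent re-plans and retries."""
    return float(np.linalg.norm(target_xyz - current_xyz)) > EPSILON
```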

The entire backend was built with Augment Code — a developer workspace where agents are coordinated, specs stay alive, and every workspace is isolated. Augment held full monorepo context across sessions, coordinated multi-file edits, and let our team ship complex backend features at agent speed.

4. ⚡ The Real-Time Bridge

The backend streams live execution states across a persistent WebSocket connection:

thinking  →  plan  →  executing  →  observing  →  correcting  →  result

Our custom @assistant-ui/react adapters catch every state transition and vividly display the agent's self-correction process as structured UI blockquotes — making the AI's reasoning fully transparent and auditable.
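On the backend, streaming those phases can be as simple as the hedged FastAPI sketch below; the endpoint path and payload fields are illustrative, not the exact Artifex protocol:

```python
from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/ws/agent")   # illustrative path
async def agent_socket(ws: WebSocket):
    await ws.accept()
    command = await ws.receive_text()

    # Each phase of the loop is pushed as a structured JSON event; the frontend
    # adapter renders these as auto-scrolling blockquotes in the chat thread.
    for phase in ("thinking", "plan", "executing", "observing", "correcting", "result"):
        await ws.send_json({"phase": phase, "detail": f"{phase} for: {command!r}"})

    await ws.close()
```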

5. ☁️ Cloud Deployment

The entire monorepo is hosted on DigitalOcean App Platform. DigitalOcean's straightforward platform made deploying a full-stack WebSocket monorepo surprisingly painless — from git push to a live, secure production domain in under an hour. We configured intelligent path-trimmed routing to serve both the static Vite frontend and the dynamic Python backend over a single unified domain.


🧗 Challenges we ran into

| Challenge | What Happened | How We Solved It |
| --- | --- | --- |
| Deployment Routing Conflicts | UI and backend competed for the root web path | Engineered DigitalOcean path-trimming (e.g. /artifex-agent) so FastAPI correctly caught ws:// upgrade requests without a 404 |
| Package Manager Clashes | Upgrading to the speedier uv package manager caused cloud buildpack failures | Pinned the exact Python 3.12.8 runtime inside the container and purged all legacy requirements.txt files |
| UI State Hydration | Instantaneous WebSocket streams caused rendering jank in React | Built a custom AgentBridge adapter to safely buffer and format the agent's continuous-learning thought logs into smooth, auto-scrolling UI elements |

🏆 Accomplishments that we're proud of

  • 🔁 Autonomous Correction Loop — The agent's ability to observe a missed physics interaction (like missing the goalpost), analyze the coordinate delta, and continuously self-correct without any human intervention is a genuine demonstration of reliable LLM-driven autonomy in a physical environment.

  • 🌐 Zero-Install Physics — MuJoCo running natively in the browser via WebAssembly means anyone can try Artifex instantly — no Linux environment, no ROS stack, no gigabyte downloads.

  • Premium User Experience — The presentation-ready chat panel, dynamic suggestion cards, Markdown-formatted learning logs, and dark glassmorphic styling make Artifex feel like a next-generation product, not a hackathon prototype.


🚀 What's next for Artifex

  • [ ] Add dynamic obstacle generation and massive multi-object manipulation goals (like self-assembling pyramids)
  • [ ] Introduce computer vision feedback so the agent can describe what it "sees" before acting, rather than purely relying on internal state data
  • [ ] Establish a Community Hub allowing users to save, replay, and share successful continuous-learning interaction weights

🙏 Built With & Sponsors

Artifex would not exist without these exceptional tools and platforms. Massive thanks to our sponsors:

| Sponsor | Role in Artifex |
| --- | --- |
| Google DeepMind | The spatial reasoning brain — gemini-robotics-er-1.5-preview powers every autonomous planning and self-correction loop |
| Augment Code | Full-monorepo agent-speed development — coordinated specs, isolated workspaces, and context that never expired |
| DigitalOcean | Rock-solid App Platform — git push to live WebSocket production in under an hour |
| Lovable | AI-scaffolded frontend — three-panel glassmorphic UI from design prompt to deployed app in minutes |
| assistant-ui | Production-grade streaming chat UI — the perfect host for our live autonomous-learning logs |
