NOOX: An Open-Source AI-Driven Desktop Automation Device

Inspiration

NOOX began with a simple question: LLMs can plan and reason—but why can’t they directly act on our computers?

I wanted a small, open, portable device that could:

  • Control a computer via HID
  • Run cross-platform shell commands
  • Talk to LLMs
  • Perform multi-step automation autonomously

This became the foundation for NOOX, an ESP32-S3–based intelligent hardware platform merging peripherals, desktop automation, and AI planning.


What I Built

NOOX integrates:

1. Peripheral Hardware

  • USB HID keyboard/mouse emulation
  • USB CDC bidirectional channel
  • OLED screen + 3 buttons
  • GPIO + RGB LED

2. Desktop Automation

A Go-based Host Agent (auto-downloaded via HID) runs shell commands on Windows/Linux/macOS and communicates using JSON over USB CDC.
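
As a rough illustration of that exchange (not the actual NOOX agent code), the sketch below reads one JSON request per line, runs the command, and writes one JSON response per line. The `Request` and `Response` field names, the newline-delimited framing, and the choice of `cmd` or `sh` as the shell are all assumptions made for the example.

```go
package main

import (
	"bufio"
	"encoding/json"
	"errors"
	"io"
	"os/exec"
	"runtime"
)

// Request and Response are hypothetical message shapes, not NOOX's real protocol.
type Request struct {
	ID      int    `json:"id"`
	Command string `json:"command"`
}

type Response struct {
	ID       int    `json:"id"`
	Output   string `json:"output"`
	ExitCode int    `json:"exit_code"`
	Error    string `json:"error,omitempty"`
}

// runCommand executes the request in a basic platform shell and captures
// stdout and stderr together.
func runCommand(req Request) Response {
	var c *exec.Cmd
	if runtime.GOOS == "windows" {
		c = exec.Command("cmd", "/C", req.Command)
	} else {
		c = exec.Command("sh", "-c", req.Command)
	}
	out, err := c.CombinedOutput()
	resp := Response{ID: req.ID, Output: string(out)}
	if err != nil {
		var exitErr *exec.ExitError
		if errors.As(err, &exitErr) {
			// The command ran but returned a non-zero status.
			resp.ExitCode = exitErr.ExitCode()
		} else {
			// The command could not be started at all.
			resp.ExitCode = -1
			resp.Error = err.Error()
		}
	}
	return resp
}

// serve reads one JSON request per line from r and writes one JSON
// response per line to w. In a real agent r and w would be the CDC port.
func serve(r io.Reader, w io.Writer) error {
	scanner := bufio.NewScanner(r)
	scanner.Buffer(make([]byte, 0, 64*1024), 1<<20) // allow lines up to 1 MiB
	enc := json.NewEncoder(w)                       // Encode appends '\n'
	for scanner.Scan() {
		var req Request
		if err := json.Unmarshal(scanner.Bytes(), &req); err != nil {
			_ = enc.Encode(Response{ID: -1, ExitCode: -1, Error: "bad request: " + err.Error()})
			continue
		}
		if err := enc.Encode(runCommand(req)); err != nil {
			return err
		}
	}
	return scanner.Err()
}
```

Calling `serve(os.Stdin, os.Stdout)` is enough to try the loop from a terminal. Opening the real serial device is left out because it depends on the platform and on the serial library chosen.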

3. AI Autonomous Planning

With OpenAI/DeepSeek/OpenRouter support, the LLM can:

  • Plan multi-step tasks
  • Call tools like run_command, hid_keyboard_type, gpio_set
  • Iterate until a goal is complete

The system can perform tasks like:

“Create a file, populate it, compress it, and open the folder.”
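
The plan, call, iterate cycle described above can be sketched as a small loop. This is only an illustration under stated assumptions: `Planner` stands in for the LLM call, `ToolCall` and `Tool` are hypothetical shapes, and the step limit is one simple way to keep the loop bounded. It is written in Go for readability, while the real loop runs in the device firmware.

```go
package main

import (
	"errors"
	"fmt"
)

// ToolCall is a hypothetical shape for one action requested by the model.
type ToolCall struct {
	Name string
	Args map[string]string
}

// Planner stands in for the LLM: given the results so far, it returns the
// next tool call, or done=true once it considers the goal complete.
type Planner func(history []string) (call ToolCall, done bool)

// Tool is a hypothetical handler such as run_command or gpio_set.
type Tool func(args map[string]string) (string, error)

// ErrStepLimit reports that the loop stopped before the planner finished.
var ErrStepLimit = errors.New("step limit reached before goal was complete")

// runAgent asks the planner for a step, dispatches it to the matching tool,
// records the result, and repeats until the planner is done or maxSteps
// iterations have been used.
func runAgent(plan Planner, tools map[string]Tool, maxSteps int) ([]string, error) {
	var history []string
	for step := 0; step < maxSteps; step++ {
		call, done := plan(history)
		if done {
			return history, nil
		}
		tool, ok := tools[call.Name]
		if !ok {
			// Feed the mistake back so the planner can correct itself.
			history = append(history, fmt.Sprintf("error: unknown tool %q", call.Name))
			continue
		}
		result, err := tool(call.Args)
		if err != nil {
			history = append(history, fmt.Sprintf("error: %s: %v", call.Name, err))
			continue
		}
		history = append(history, fmt.Sprintf("%s: %s", call.Name, result))
	}
	return history, ErrStepLimit
}
```

Errors are appended to the history rather than aborting the run, so the planner sees what went wrong on its next turn. The hard cap on steps guarantees the loop ends even if the planner never reports completion.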


How I Built It

  • Firmware: Arduino framework + ESP-IDF + FreeRTOS
  • Storage: LittleFS for configs, web UI, and host agent
  • UI: WebSocket-driven dark-theme console (Chat / Advanced Mode)
  • Memory: Large JSON and LLM response buffers are allocated in the 8 MB PSRAM
  • Host Agent: Cross-platform binary served from the ESP32 and launched automatically
  • Hardware:

    • ESP32-S3-WROOM-1
    • SSD1315 OLED (I²C)
    • PMOS USB/battery auto-switching
    • Buttons + LEDs + WS2812

What I Learned

  • How to manage embedded memory to avoid fragmentation
  • Designing reliable USB HID sequences and timing
  • Crafting a CDC JSON protocol resilient to truncation and slow hosts
  • Building a WebSocket-driven UI with real-time streaming
  • Structuring LLM tool-calling for predictable autonomous behavior
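
To make the truncation and slow-host point concrete, here is a minimal sketch of one possible receive-side strategy. It is an assumption for illustration, not NOOX's actual protocol: bytes are buffered until a newline arrives, each complete line is validated as JSON, and invalid lines are dropped so the stream resynchronizes at the next newline. `FrameBuffer` and `maxFrame` are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/json"
)

// maxFrame is an assumed upper bound on one message.
const maxFrame = 64 * 1024

// FrameBuffer accumulates raw serial chunks and yields complete JSON lines.
type FrameBuffer struct {
	buf     []byte
	Dropped int // lines or partial data discarded as invalid
}

// Feed appends a chunk of any size and returns every complete, valid JSON
// line now available. A slow host simply produces more, smaller chunks.
func (f *FrameBuffer) Feed(chunk []byte) [][]byte {
	f.buf = append(f.buf, chunk...)
	var frames [][]byte
	for {
		i := bytes.IndexByte(f.buf, '\n')
		if i < 0 {
			break // no complete line yet; wait for more data
		}
		line := bytes.TrimSpace(f.buf[:i])
		f.buf = f.buf[i+1:]
		if len(line) == 0 {
			continue
		}
		if json.Valid(line) {
			// Copy so the frame does not alias the internal buffer.
			frames = append(frames, append([]byte(nil), line...))
		} else {
			f.Dropped++ // truncated or corrupted; resync at the next newline
		}
	}
	if len(f.buf) > maxFrame {
		// A newline never arrived: discard the partial data so memory stays bounded.
		f.buf = f.buf[:0]
		f.Dropped++
	}
	return frames
}
```

With this approach a truncated message costs at most the damaged line (and the one merged into it if the newline itself was lost), and the parser never sees a half-received document.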

Key Challenges

  • Getting HID scripts to reliably bootstrap PowerShell across systems
  • Parsing large LLM JSON responses within ESP32 memory limits
  • Handling differences between PowerShell, pwsh, cmd, bash, and sh
  • Avoiding WebSocket race conditions with multiple clients
  • Ensuring autonomous planning doesn't become unsafe
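
One way to approach the shell differences is to probe for interpreters in a fixed preference order and remember which flag each one needs before the command string. The sketch below is an assumption about how that could look, not the agent's real logic; the `Shell` type, the preference order, and the helper names are hypothetical.

```go
package main

import (
	"os/exec"
	"runtime"
)

// Shell describes one interpreter and the flags placed before the command.
type Shell struct {
	Name string
	Path string
	Args []string
}

// shellCandidates lists shells in an assumed preference order for an OS.
func shellCandidates(goos string) []Shell {
	if goos == "windows" {
		ps := []string{"-NoProfile", "-NonInteractive", "-Command"}
		return []Shell{
			{Name: "pwsh", Args: ps},
			{Name: "powershell", Args: ps},
			{Name: "cmd", Args: []string{"/C"}},
		}
	}
	return []Shell{
		{Name: "bash", Args: []string{"-c"}},
		{Name: "sh", Args: []string{"-c"}},
	}
}

// pickShell returns the first candidate that the lookup function can find.
// The lookup is injected so the selection can be exercised without a real PATH.
func pickShell(goos string, find func(name string) (string, bool)) (Shell, bool) {
	for _, s := range shellCandidates(goos) {
		if p, ok := find(s.Name); ok {
			s.Path = p
			return s, true
		}
	}
	return Shell{}, false
}

// onPath resolves a program name using the current PATH.
func onPath(name string) (string, bool) {
	p, err := exec.LookPath(name)
	return p, err == nil
}

// buildCommand wraps a command string for whichever shell is available.
func buildCommand(command string) (*exec.Cmd, bool) {
	s, ok := pickShell(runtime.GOOS, onPath)
	if !ok {
		return nil, false
	}
	args := append(append([]string{}, s.Args...), command)
	return exec.Command(s.Path, args...), true
}
```

Reporting the selected shell name back to the device would also let the LLM be told which syntax to generate, since a command written for bash will often not run unchanged under PowerShell or cmd.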

Security Notes

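Because the AI can run shell commands and simulate keyboard input, the NOOX host agent should never be run with admin/root privileges, and the device should only be connected to machines you are prepared to let it control. A VM or sandbox is recommended for experimentation.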


Final Thoughts

NOOX was an exploration of what happens when you combine embedded hardware, desktop automation, and modern LLMs into a single portable device. I learned a great deal across hardware, firmware, software, and AI orchestration, and the result is a platform that lets AI literally act on your computer.
