DeepClaws

DeepClaws is a hackathon prototype for a self-improving SQL agent loop on LiveSQLBench.

The current stack is:

  • Ghost for isolated Postgres eval environments via DB forks
  • Kimi (moonshotai/Kimi-K2.5 on Tinker) as the agent inference model
  • Overclaw for eval-loop analysis and optimization
  • Macroscope as the PR-generation handoff once Overclaw has produced a report

Demo Flow

The UI drives one question through this sequence:

  1. Create a fresh Ghost fork as the eval environment.
  2. Bind the selected LiveSQLBench case to that fork.
  3. Run overclaw optimize kimi-go-brr --fast on the one-case dataset.
  4. Collect Overclaw artifacts and prepare the Macroscope handoff.

Overclaw runs the agent loop itself, so the UI no longer performs a separate standalone agent run before invoking Overclaw.
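The four steps above can be sketched as a small orchestration function. This is only an illustration of the ordering, not the UI's actual code: the DemoRun class, the run_demo function, and the four callables passed in are hypothetical names, and the only detail taken from this README is the Overclaw command line from step 3. How the fork is created and how a case is bound to it are left to the caller, since the Ghost commands involved are not documented here.

```python
import shlex
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Command taken verbatim from step 3 of the demo flow.
OVERCLAW_CMD = "overclaw optimize kimi-go-brr --fast"


@dataclass
class DemoRun:
    """Hypothetical record of one question moving through the demo."""
    case_id: str
    fork: Optional[str] = None
    stages: List[str] = field(default_factory=list)


def run_demo(
    case_id: str,
    create_fork: Callable[[], str],
    bind_case: Callable[[str, str], None],
    run_command: Callable[[List[str]], None],
    collect_artifacts: Callable[[DemoRun], None],
) -> DemoRun:
    """Run the four stages in order; each stage is supplied by the caller."""
    run = DemoRun(case_id=case_id)

    # 1. Fresh Ghost fork as the eval environment (caller decides how).
    run.fork = create_fork()
    run.stages.append("fork")

    # 2. Bind the selected LiveSQLBench case to that fork.
    bind_case(case_id, run.fork)
    run.stages.append("bind")

    # 3. Overclaw runs the agent loop; no standalone agent run comes first.
    run_command(shlex.split(OVERCLAW_CMD))
    run.stages.append("optimize")

    # 4. Collect Overclaw artifacts and prepare the Macroscope handoff.
    collect_artifacts(run)
    run.stages.append("handoff")
    return run
```

Passing the stages in as callables keeps the sketch runnable without Ghost or Overclaw installed; in a real run, run_command could be something like a thin wrapper around subprocess.run with check=True.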

Run The Demo UI

Prerequisites:

  • Ghost CLI installed and logged in
  • .env populated with Tinker and other required keys
  • .overclaw/.env populated for Overclaw models
  • benchmark data present under data/livesqlbench
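A quick preflight check for the prerequisites above might look like the following sketch. It assumes the Ghost CLI is installed under the binary name ghost (the README does not state the name) and that the paths are relative to the repository root. It only checks that files and directories exist; it cannot tell whether you are logged in to Ghost or whether the .env files contain the right keys. The function name missing_prerequisites is hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def missing_prerequisites(root: Path, ghost_binary: str = "ghost") -> List[str]:
    """Return a human-readable list of prerequisites that appear to be missing."""
    missing = []

    # Assumption: the Ghost CLI is on PATH under this binary name.
    if shutil.which(ghost_binary) is None:
        missing.append(f"{ghost_binary} CLI not found on PATH")

    # Key files named in the prerequisites (contents are not validated).
    for rel in (".env", ".overclaw/.env"):
        if not (root / rel).is_file():
            missing.append(f"missing file: {rel}")

    # Benchmark data directory.
    if not (root / "data" / "livesqlbench").is_dir():
        missing.append("missing directory: data/livesqlbench")

    return missing


if __name__ == "__main__":
    problems = missing_prerequisites(Path.cwd())
    for problem in problems:
        print(problem)
    raise SystemExit(1 if problems else 0)
```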

Start the UI:

.\.venv\Scripts\python.exe scripts\run_demo_ui.py

Then open http://127.0.0.1:8000.
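The start command above uses the Windows virtual-environment layout. As a sketch for other platforms, the helper below picks the interpreter path based on the operating system. It assumes the environment was created with the standard venv layout in .venv; whether the demo UI itself runs outside Windows is not confirmed by this README. The function names venv_python and demo_ui_command are hypothetical.

```python
import os
from pathlib import Path
from typing import List


def venv_python(root: Path, os_name: str = os.name) -> Path:
    """Locate the interpreter inside .venv for the given OS family."""
    if os_name == "nt":
        # Windows layout, as used in the start command above.
        return root / ".venv" / "Scripts" / "python.exe"
    # Standard venv layout on POSIX systems.
    return root / ".venv" / "bin" / "python"


def demo_ui_command(root: Path, os_name: str = os.name) -> List[str]:
    """Build the argv for launching the demo UI script."""
    return [str(venv_python(root, os_name)), str(root / "scripts" / "run_demo_ui.py")]


if __name__ == "__main__":
    import subprocess

    # Launches the UI from the current directory; stop it with Ctrl+C.
    subprocess.run(demo_ui_command(Path.cwd()), check=True)
```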

Important Paths

Current Limitation

The Macroscope stage is currently a handoff point in the UI. The PR generation step is not automated in the demo yet.
