DeepClaws
DeepClaws is a hackathon prototype for a self-improving SQL agent loop on LiveSQLBench.
The current stack is:
- Ghost for isolated Postgres eval environments via DB forks
- Kimi (`moonshotai/Kimi-K2.5` on Tinker) as the agent inference model
- Overclaw for eval-loop analysis and optimization
- Macroscope as the next PR-generation handoff after Overclaw produces a report
Demo Flow
The UI drives one question through this sequence:
- Create a fresh Ghost fork as the eval environment.
- Bind the selected LiveSQLBench case to that fork.
- Run `overclaw optimize kimi-go-brr --fast` on the one-case dataset.
- Collect Overclaw artifacts and prepare the Macroscope handoff.
Overclaw is the component that runs the agent loop. The UI no longer does a redundant standalone agent run before Overclaw.
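The four-step sequence above can be sketched as a small orchestration function. The Overclaw invocation matches the command shown; `create_ghost_fork` and `bind_case` are hypothetical placeholders for whatever Ghost calls `demo/server.py` actually makes.

```python
import subprocess

def create_ghost_fork() -> str:
    """Hypothetical: create a fresh Ghost DB fork and return its id."""
    raise NotImplementedError("call the Ghost CLI/API here")

def bind_case(fork_id: str, case_id: str) -> None:
    """Hypothetical: bind the selected LiveSQLBench case to the fork."""
    raise NotImplementedError("bind the case to the fork here")

def overclaw_cmd() -> list[str]:
    # The exact command the demo runs against the one-case dataset.
    return ["overclaw", "optimize", "kimi-go-brr", "--fast"]

def run_demo(case_id: str) -> None:
    fork_id = create_ghost_fork()               # 1. fresh eval environment
    bind_case(fork_id, case_id)                 # 2. bind the selected case
    subprocess.run(overclaw_cmd(), check=True)  # 3. Overclaw runs the agent loop
    # 4. collect Overclaw artifacts; the Macroscope handoff is manual today
```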
Run The Demo UI
Prerequisites:
- Ghost CLI installed and logged in
- `.env` populated with Tinker and other required keys
- `overclaw/.env` populated for Overclaw models
- benchmark data present under `data/livesqlbench`
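The file and directory prerequisites can be verified with a small preflight check before starting the UI. The paths come from the list above; the helper itself is ours, and it does not cover the Ghost CLI login step.

```python
from pathlib import Path

# Files and directories the demo expects, per the prerequisites list.
REQUIRED = [".env", "overclaw/.env", "data/livesqlbench"]

def missing_prereqs(root: str = ".") -> list[str]:
    """Return the required files/dirs that are absent under root."""
    base = Path(root)
    return [p for p in REQUIRED if not (base / p).exists()]
```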
Start the UI:
```
.\.venv\Scripts\python.exe scripts\run_demo_ui.py
```
Then open http://127.0.0.1:8000.
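If you script the launch, you can wait for the server to accept connections before opening the page. This is a generic TCP readiness poll, not part of the repo:

```python
import socket
import time

def wait_for_port(host: str, port: int, deadline_s: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the deadline passes."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)  # server not up yet; retry
    return False

# e.g. wait_for_port("127.0.0.1", 8000) before opening the demo UI
```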
Important Paths
- UI: `web/index.html`
- Demo server: `demo/server.py`
- Overclaw wrapper: `agent/overclaw_agent.py`
- GT-backed dataset: `eval/dataset.gt.json`
- One-case Overclaw dataset: `.overclaw/agents/kimi-go-brr/setup_spec/dataset.json`
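The one-case dataset is presumably a single entry pulled from the GT-backed dataset. A sketch of that extraction follows; the assumption that the GT file is a JSON list of case objects keyed by `"id"` is ours, and the real Overclaw schema may differ.

```python
import json
from pathlib import Path

def write_one_case(gt_path: str, case_id: str, out_path: str) -> None:
    """Copy a single case from the GT dataset into a one-case dataset file.

    Assumes the GT file is a JSON list of case objects with an "id" key;
    check eval/dataset.gt.json for the actual shape.
    """
    cases = json.loads(Path(gt_path).read_text())
    selected = [c for c in cases if c["id"] == case_id]
    if not selected:
        raise KeyError(f"case {case_id!r} not found in {gt_path}")
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(selected, indent=2))
```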
Current Limitation
The Macroscope stage is currently a handoff point in the UI. The PR generation step is not automated in the demo yet.
