Inspiration

AgentSculptor is a developer assistant that helps restructure, refactor, and modernize codebases through natural language prompts.

The idea came from a very personal frustration: when I code, I often catch myself avoiding boring tasks. I’d rather explore a new library, experiment with tools, or build features than spend hours splitting files, cleaning up imports, or modernizing code. Refactoring isn’t necessarily hard — it’s just not exciting. And because of that lack of motivation, it usually takes me much longer than it should.

AgentSculptor is my attempt to automate these dull chores and free developers to focus on what they actually enjoy. I wanted to see if a reasoning model like GPT-OSS-120B could handle this as a coding partner — asking for confirmation, making safe changes, and even running tests if needed.

What it does

For example:

  agentsculptor-cli ./test_project "create fast api app with clear instructions on how to run it."

AgentSculptor takes a natural-language request (e.g., “split this file into modules” or “make this code modern”) and executes the workflow step by step:

  • Takes the path of the project root as input.

  • Scans the folder, reads file contents, and builds a project representation so the model has full context.

  • Plans the required actions using GPT-OSS-120B.

  • Backs up files before editing to keep everything safe.

  • Creates or modifies files with the requested refactorings.

  • Updates imports across the project consistently.

  • Runs tests to validate the changes.

  • Iterates again if fixes are needed.
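The steps above can be sketched as a simple loop. This is a minimal illustration, not the project's actual code: plan_actions and apply_action are hypothetical placeholders for the real planner and tool dispatch.

```python
import shutil
import subprocess
from pathlib import Path

def plan_actions(project_root: Path, request: str) -> list[dict]:
    """Placeholder planner: the real version sends a context snapshot and the
    request to GPT-OSS-120B and parses a JSON list of tool calls."""
    return []

def apply_action(project_root: Path, action: dict) -> None:
    """Placeholder for tool dispatch (create_file, modify_file, update_imports, ...)."""
    raise NotImplementedError

def backup(path: Path) -> None:
    """Copy a file aside before editing it."""
    shutil.copy2(path, path.with_suffix(path.suffix + ".bak"))

def run_tests(project_root: Path) -> bool:
    """Run pytest in the project and report whether the suite passed."""
    return subprocess.run(["pytest", "-q"], cwd=project_root).returncode == 0

def refactor(project_root: Path, request: str, max_iterations: int = 3) -> bool:
    for _ in range(max_iterations):
        for action in plan_actions(project_root, request):
            target = project_root / action["path"]
            if target.exists():
                backup(target)       # keep a copy before touching the file
            apply_action(project_root, action)
        if run_tests(project_root):  # validate; re-plan on failure
            return True
    return False
```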

How we built it

AgentSculptor is built around a simple but powerful loop: plan → execute → validate → iterate.

  • Backend & Orchestration: We implemented an Agent Loop in Python that manages tool calls. Each tool (e.g., file creation, refactoring, updating imports, running tests) is wrapped in a safe error-handling layer so execution always returns structured results.

  • Project Context Preparation: Before the model plans any actions, AgentSculptor scans the project root, reads all files, and analyzes structure (number of lines, functions, classes). This “context snapshot” helps the model reason more reliably about the codebase.

  • Reasoning Model: For planning and step-by-step reasoning, we connected to a local vLLM server running GPT-OSS-120B. The planner generates tool calls instead of raw code, which makes the system more predictable and debuggable.

  • Execution Safety: Before making changes, files are backed up. When refactoring requires creating new files, AgentSculptor asks for confirmation. The same happens before modifying existing files.

  • Validation: After changes, it can run tests automatically (via pytest) to ensure nothing broke. If tests fail, the loop can re-plan and attempt fixes.

  • Containerization: The vLLM server is run with Docker, making it reproducible. We use a single docker-compose.yml that specifies GPU usage, memory constraints, and ports.

  • Demo Scenarios: To showcase the system, we built scripted demo cases: splitting a large file into modules, merging helpers, modernizing Dockerfiles, and generating a brand-new FastAPI app.
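The "safe error-handling layer" around tools can be illustrated with a small decorator; the decorator name and result shape below are assumptions for illustration, not the project's actual API.

```python
import functools
import traceback

def safe_tool(func):
    """Wrap a tool so it always returns a structured result instead of raising."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return {"ok": True, "tool": func.__name__,
                    "result": func(*args, **kwargs)}
        except Exception as exc:
            # Never crash the agent loop; report the failure as data instead.
            return {"ok": False, "tool": func.__name__,
                    "error": f"{type(exc).__name__}: {exc}",
                    "trace": traceback.format_exc()}
    return wrapper

@safe_tool
def read_file(path: str) -> str:
    with open(path, encoding="utf-8") as fh:
        return fh.read()
```

A failed call such as read_file("missing.py") then yields a dict with "ok": False and the error message, which the planner can inspect and react to on the next iteration.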

Challenges we ran into

  • Tool–Model Interface: One of the hardest parts was making the model interact reliably with tools. Getting GPT-OSS-120B to consistently produce valid JSON plans for tool execution was tricky. The model can be creative, which is great for exploration but risky for automation. Early versions returned inconsistent JSON or unexpected arguments. We had to design strict wrappers around tools to handle errors gracefully and avoid crashes.

  • Context Management: Feeding large codebases to the model is not trivial. We had to build a context preparation step that scans the project root and summarizes file structures so the model has just enough detail without overwhelming its context window.

  • Balancing Autonomy vs. Safety: Letting the model modify files directly was risky. We solved this by adding backups, explicit confirmation steps, and iterative validation with tests. Getting this flow right took a lot of experimentation.

  • Local Deployment of GPT-OSS-120B: Running such a large model locally required GPU memory tuning, Docker configuration, and parallelization tweaks. Even small mistakes in settings (like prefix caching flags) caused unexpected failures.

  • Iteration Control: Sometimes the model would “overthink” and keep producing no-op actions. We had to adjust how we detect stalling and gracefully stop after a reasonable number of iterations.
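One way to tame inconsistent planner output, as described above, is strict validation before anything is executed. The plan shape and tool names here are illustrative assumptions, not the project's real schema.

```python
import json

# Hypothetical whitelist of tools the agent is allowed to call.
ALLOWED_TOOLS = {"create_file", "modify_file", "update_imports", "run_tests"}

def validate_plan(raw: str) -> list[dict]:
    """Parse model output into tool calls, rejecting anything malformed."""
    # Models sometimes wrap JSON in markdown fences; strip them first.
    raw = raw.strip().removeprefix("```json").removeprefix("```").removesuffix("```")
    plan = json.loads(raw)
    if not isinstance(plan, list):
        raise ValueError("plan must be a JSON list of tool calls")
    for step in plan:
        if step.get("tool") not in ALLOWED_TOOLS:
            raise ValueError(f"unknown tool: {step.get('tool')!r}")
        if not isinstance(step.get("args"), dict):
            raise ValueError("each step needs an 'args' object")
    return plan
```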

Accomplishments that we're proud of

  • Full local reasoning agent: We built a local AI agent that can analyze, refactor, and modernize a Python codebase entirely from natural language prompts, using GPT-OSS-120B.

  • Safe iterative workflow: The agent backs up files, confirms actions, modifies code, updates imports, and runs tests—automatically iterating when fixes are needed.

  • Context-aware operations: prepare_context allows the model to understand the project structure deeply—functions, classes, imports, and dependencies—so it can make safe, consistent changes.

  • Real-world showcase tasks: AgentSculptor successfully split large files into modules, merged helper files, modernized old Docker setups, and generated a fully functional FastAPI app with Docker and unit tests. We also tested it on several more complex projects to validate its robustness and flexibility.

  • CLI usability: The workflow can be triggered from a single CLI command, making it approachable for developers.

  • Demonstrated reasoning power of GPT-OSS-120B locally: We showed that even a large reasoning model can act as a coding partner without relying on cloud APIs, maintaining full control over projects.
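As an illustration of what a prepare_context-style snapshot might contain (the real implementation may differ), each file can be reduced to a summary built with the standard-library ast module:

```python
import ast
from pathlib import Path

def snapshot_file(path: Path) -> dict:
    """Summarise one Python file: line count, functions, classes, imports."""
    source = path.read_text(encoding="utf-8")
    tree = ast.parse(source)
    return {
        "path": str(path),
        "lines": source.count("\n") + 1,
        "functions": [n.name for n in ast.walk(tree)
                      if isinstance(n, ast.FunctionDef)],
        "classes": [n.name for n in ast.walk(tree)
                    if isinstance(n, ast.ClassDef)],
        "imports": sorted({a.name for n in ast.walk(tree)
                           if isinstance(n, ast.Import) for a in n.names}),
    }

def prepare_context(project_root: Path) -> list[dict]:
    """Build a compact project representation from every Python file."""
    return [snapshot_file(p) for p in sorted(project_root.rglob("*.py"))]
```

A compact summary like this gives the model enough structure to reason across files without consuming the context window on raw source.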

What we learned

  • Motivation matters: Automating tedious coding tasks can significantly speed up development and reduce the mental friction of refactoring.

  • Local reasoning models are capable: GPT-OSS-120B can plan and execute structured coding changes safely without relying on external APIs.

  • Iterative workflows are key: A step-by-step process with backups, dependency tracking, and automated tests ensures reliability even on complex projects.

  • Context is critical: Building a full project representation (prepare_context) allows the model to reason across files, understand dependencies, and avoid breaking changes.

  • User-in-the-loop improves trust: Asking for confirmations before major changes helps balance automation with safety and keeps the developer confident in the agent’s actions.

What's next for Local tool that reforges your codebase

  • Support for more languages and frameworks: Extend beyond Python to handle other popular languages and web frameworks.

  • Advanced refactorings: Implement more sophisticated transformations like API migrations, pattern-based optimizations, and performance improvements.

  • Better testing integration: Auto-generate more comprehensive unit and integration tests for refactored code.

  • IDE / GUI integration: Make the agent more accessible with IDE plugins or a lightweight GUI for non-CLI users.

  • Self-healing and re-planning: Enhance the loop to automatically detect failures, propose fixes, and retry without manual intervention.

  • Vector store / knowledge database integration: Store project context, refactoring history, and reusable patterns in a vector database to improve reasoning and speed for future projects.

Built With

  • Python 3.12
  • GPT-OSS-120B (reasoning LLM)
  • vLLM (efficient local LLM inference)
  • Docker
  • ast
  • black (code formatting)
  • pytest (testing)
  • requests
  • CLI
  • VS Code devcontainer