Jelly

Inspiration

Claude Code is revolutionising the way we code. We believe Jelly can improve its performance on long-horizon programming tasks.

The Overarching Idea Behind Jelly

Focus on the requirements. Our architecture keeps the requirements at the core of its design.
The agent receives feedback on its code and continually improves upon it.
The ability to keep improving until it can achieve anything. If the requirements are too advanced for the current system, Jelly produces modification requirements and recursively builds a new, more capable version of itself until it is capable enough to handle the task.

The How

A lot of thought went into the logic loop that powers Jelly. Its core system revolves around four agents: a planner, a programmer, a test generator, and a test executor. The planner works with the user to create a comprehensive requirements file that all the other agents refer back to. The programmer writes the code and iterates on it using feedback from the test generator. The test generator produces diverse test cases independently of the codebase, a deliberate choice that reduces bias. Crucially, it can also dynamically search for and install MCP servers to run test cases on, which gives Jelly near-universal tech stack compatibility. The test executor runs the test cases in the terminal or through those MCP servers. This cycle repeats until the code passes all the required test cases, and only then is it presented to the user.
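The build-and-test cycle can be sketched in a few lines of Python. This is a minimal illustration, not Jelly's actual code: the names `build_until_green`, `GeneratedTest`, `LoopResult`, `execute` and the `max_iterations` cap are hypothetical, the planner is reduced to a requirements string, and the executor runs checks in-process instead of in a terminal or through MCP servers. What it does show is the key design choice: the test generator receives only the requirements, never the code.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class GeneratedTest:
    name: str
    check: Callable[[Any], bool]  # inspects the artefact the programmer produced


@dataclass
class LoopResult:
    code: Any
    iterations: int
    passed: bool


def execute(test: GeneratedTest, code: Any) -> bool:
    """Hypothetical executor: runs one check in-process and treats a crash as a failure.
    (Jelly's executor runs tests in the terminal or through MCP servers instead.)"""
    try:
        return bool(test.check(code))
    except Exception:
        return False


def build_until_green(
    requirements: str,
    program: Callable[[str, Any, List[str]], Any],
    generate_tests: Callable[[str], List[GeneratedTest]],
    max_iterations: int = 10,
) -> LoopResult:
    # The test generator only sees the requirements, never the code,
    # so its tests are not biased by the implementation.
    tests = generate_tests(requirements)
    code: Any = None
    failures: List[str] = []
    for iteration in range(1, max_iterations + 1):
        # The programmer gets the requirements, its previous attempt and the failing tests.
        code = program(requirements, code, failures)
        failures = [t.name for t in tests if not execute(t, code)]
        if not failures:
            return LoopResult(code, iteration, True)  # passing code is handed back
    return LoopResult(code, max_iterations, False)


# Usage with toy stand-ins for the programmer and test generator agents.
def toy_programmer(requirements: str, previous: Any, failures: List[str]) -> Any:
    if not failures:
        return lambda x: x + x  # first attempt: doubles instead of squaring
    return lambda x: x * x  # "fixed" after seeing the failing tests


def toy_test_generator(requirements: str) -> List[GeneratedTest]:
    return [GeneratedTest(f"square({n})", lambda f, n=n: f(n) == n * n) for n in (0, 2, 3)]


result = build_until_green("square an integer", toy_programmer, toy_test_generator)
print(result.passed, result.iterations)  # → True 2
```

Note that `square(0)` and `square(2)` pass even for the buggy first attempt; only `square(3)` exposes it, which is why diverse test cases matter.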

The same loop that lets Jelly write effective code also lets it update its own codebase whenever it deems this necessary to fulfil the user's requirements.
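The recursive self-improvement path can be sketched as follows, under loudly simplified assumptions: a system's abilities are modelled as a set of capability strings, the "modification requirements" are just the missing capabilities, and each new generation closes exactly one gap. The names `System`, `modification_requirements`, `self_improve`, `solve` and the `max_depth` guard are all hypothetical; the write-up does not say how Jelly represents capabilities or bounds the recursion.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class System:
    generation: int
    capabilities: FrozenSet[str]  # hypothetical: what this version of Jelly can build/test


def modification_requirements(system: System, required: FrozenSet[str]) -> FrozenSet[str]:
    """Hypothetical: the parts of the task the current system cannot yet handle."""
    return required - system.capabilities


def self_improve(system: System, gaps: FrozenSet[str]) -> System:
    """Hypothetical stand-in for running the build-and-test loop on Jelly's own codebase.
    To keep the recursion visible, each new generation closes exactly one gap."""
    closed = sorted(gaps)[0]
    return System(system.generation + 1, system.capabilities | {closed})


def solve(system: System, required: FrozenSet[str], max_depth: int = 5) -> System:
    gaps = modification_requirements(system, required)
    if not gaps:
        return system  # capable enough: the user's task can go through the normal loop
    if max_depth == 0:
        raise RuntimeError(f"still missing {sorted(gaps)} after the depth limit")
    return solve(self_improve(system, gaps), required, max_depth - 1)


base = System(0, frozenset({"python"}))
final = solve(base, frozenset({"python", "browser-testing", "mobile-testing"}))
print(final.generation, sorted(final.capabilities))
# → 2 ['browser-testing', 'mobile-testing', 'python']
```

The depth limit is there so a task that can never be satisfied fails loudly instead of recursing forever.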

We used Claude and Gemini to plan and code the project.

Challenges we ran into

We ran into a P0 bug in the MCP Python SDK (issue #1452) that made it impossible to use MCPs in the standard way. Designing the novel multi-agent system, and testing each design decision to make sure it led to actual improvements in the model's output, was also a major challenge.