Inspiration
I do a lot of experimentation in my own time on automated AI code modification. My personal R&D lab for this is Codactor (https://codactor.com), my JetBrains IDE plugin, which I intend to open source in celebration of this hackathon. I also work as a Software Engineer 3 on the Agentic AI Platform and AI Code Modernization team at PayPal. I'm here representing myself, of course, not PayPal, competing with my manager's approval under that stipulation.
I believe entire programs can be constructed automatically with AI, but only if we first solve the problem of robustly and provably implementing individual code files with a scalable service. As a first step in that direction, I have built a service that does exactly that, using TDD principles.
Other inspirations: automated program generators such as AutoGPT, Devin, Cursor, etc.
What it does
This service uses TDD (Test-Driven Development) principles to automatically generate code implementations, then fixes any problems uncovered through a recursive unit-testing loop until the implementation compiles and passes every test.
This agentic AI process makes strategic use of LLMs for their creativity, and of explicit computation where determinism is needed, to achieve its goal.
The following is its detailed process:
1. Rather than implementing the code right away, the service first generates the Java interface for the code file from the provided class name and file description.
2. From the file name, description, and interface code (which pins down the class's method structure), it writes a description of every unit test it will build for this code file.
   - This helps when your intended implementation has many aspects, say 8 distinct traits. As you know, the more you ask an LLM to do in a single prompt, the more likely it is to satisfy only part of the request: for instance, implementing code with 5 of the 8 requested traits and leaving 3 out. Breaking those 8 traits into 8 simple, separate unit tests gives each one a dedicated prompt in the next step.
3. For each unit test description, coupled with the class name, implementation file path, and interface code, it asynchronously generates the unit test code to test the implementation (it is told to specifically use JUnit 4, as JUnit 5 seems not to compile well with my tools).
4. Finally, it generates the implementation code.
5. Next, the recursive testing/fixing process begins: dynamically compile and load the interface, implementation, and all unit test code files, then run the unit tests.
6. If all unit tests pass, terminate the service. Otherwise:
7. Fetch the first compilation error or test failure, whether from the unit tests or the implementation itself. Bring this error plus the implementation code to the LLM and ask: which should be fixed, the implementation code or the unit test?
8. Based on the LLM's decision, have it modify either the unit test code or the implementation code so that the failing test passes.
9. Rinse and repeat: dynamically recompile and load the interface, implementation, and all unit test classes, then rerun the tests.
10. If there are more errors than before, or fewer total unit tests present, revert the most recent code change.
11. Return to step 6 and repeat the loop until all tests pass.

The code file implementation is tested and provably successful!
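The test/fix loop described above can be sketched roughly as follows. This is a minimal illustration, not the actual Codactor code: `LlmClient` and `TestRunner` are hypothetical stand-ins for the real LLM calls and the dynamic compile/run machinery, and the revert step is only noted in a comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical skeleton of the generate/test/fix loop. LlmClient and
// TestRunner are illustrative stand-ins, not the actual Codactor APIs.
public class TddLoopSketch {

    public interface LlmClient {
        String generate(String prompt);               // returns modified code
        boolean blamesImplementation(String error);   // "fix impl or test?"
    }

    public interface TestRunner {
        List<String> compileAndRun();                 // returns current failures
    }

    public static int runLoop(LlmClient llm, TestRunner runner, int maxIterations) {
        List<String> previous = runner.compileAndRun();
        int iterations = 0;
        while (!previous.isEmpty() && iterations < maxIterations) {
            iterations++;
            String firstError = previous.get(0);
            // Ask the LLM which side is at fault, then have it produce a fix.
            if (llm.blamesImplementation(firstError)) {
                llm.generate("Fix the implementation so this passes: " + firstError);
            } else {
                llm.generate("Fix the unit test: " + firstError);
            }
            List<String> current = runner.compileAndRun();
            if (current.size() > previous.size()) {
                // Regression detected: revert the most recent change (omitted here).
                continue;
            }
            previous = current;
        }
        return iterations;
    }

    public static void main(String[] args) {
        // Stub that "fixes" one failure per iteration, just to exercise the loop.
        List<String> failures = new ArrayList<>(List.of("e1", "e2", "e3"));
        TestRunner runner = () -> new ArrayList<>(failures);
        LlmClient llm = new LlmClient() {
            public String generate(String prompt) {
                if (!failures.isEmpty()) failures.remove(0);
                return "";
            }
            public boolean blamesImplementation(String error) { return true; }
        };
        int iters = runLoop(llm, runner, 10);
        System.out.println("iterations=" + iters + " remaining=" + failures.size());
    }
}
```

The `maxIterations` guard is my addition; some ceiling like it is what keeps a loop of this shape from running forever if the LLM keeps regressing.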
How we built it
I built this service on top of my JetBrains IDE plugin, Codactor (https://codactor.com): an AI assistant built directly into the code editor that also serves as my R&D lab for AI coding automation. The code for this plugin will be open sourced in honor of this submission.
Challenges we ran into
IntelliJ Community Edition does not provide programmatic API access to the JUnit plugin/unit-test-runner features available in its UI. As such, I had to essentially recreate this functionality from scratch.
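For context, JUnit 4 does ship a public programmatic entry point, `JUnitCore`, which is the kind of API a recreated runner can lean on. A minimal sketch (the `SampleTest` class is illustrative, not from this project):

```java
import org.junit.Assert;
import org.junit.Test;
import org.junit.runner.JUnitCore;
import org.junit.runner.Result;
import org.junit.runner.notification.Failure;

// Running JUnit 4 tests programmatically, without any IDE involvement.
public class ProgrammaticRunnerSketch {

    // Illustrative test class: one passing and one failing test.
    public static class SampleTest {
        @Test public void passes() { Assert.assertEquals(2, 1 + 1); }
        @Test public void fails()  { Assert.assertEquals(3, 1 + 1); }
    }

    public static Result runAll() {
        return JUnitCore.runClasses(SampleTest.class);
    }

    public static void main(String[] args) {
        Result result = runAll();
        System.out.println("ran=" + result.getRunCount()
                + " failed=" + result.getFailureCount());
        for (Failure f : result.getFailures()) {
            // Each Failure carries the header and message a fix loop could feed to the LLM.
            System.out.println(f.getTestHeader() + ": " + f.getMessage());
        }
    }
}
```

The `Result` object exposes run counts and `Failure` details, which maps directly onto the "fetch the first test failure" step of the fix loop.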
I found that, for a unit test run to reflect the very latest changes, the implementation code file and the unit tests being run must first be (a) dynamically recompiled, and (b) dynamically loaded with a custom class loader that does not fall back to cached versions of these specific classes.
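A minimal sketch of that recompile-then-reload cycle, using the standard `javax.tools` compiler API and a fresh `URLClassLoader` per cycle so that previously loaded versions of a class are never reused (this assumes the source lives in the default package; the actual Codactor loader may differ):

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

// Compile a Java source string to a working directory, then load the
// resulting class with a brand-new class loader each time, so nothing
// from an earlier compile cycle is cached.
public class HotReloadSketch {

    public static Class<?> compileAndLoad(Path workDir, String className, String source)
            throws IOException, ClassNotFoundException {
        Path sourceFile = workDir.resolve(className + ".java");
        Files.writeString(sourceFile, source);

        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();  // requires a JDK
        int status = compiler.run(null, null, null, sourceFile.toString());
        if (status != 0) {
            throw new IllegalStateException("compilation failed for " + className);
        }

        // Fresh loader per cycle: each call yields a newly defined Class object.
        try (URLClassLoader loader =
                     new URLClassLoader(new URL[]{workDir.toUri().toURL()})) {
            return loader.loadClass(className);
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("hotreload");
        Class<?> v1 = compileAndLoad(dir, "Greeter",
                "public class Greeter { public static String greet() { return \"v1\"; } }");
        Class<?> v2 = compileAndLoad(dir, "Greeter",
                "public class Greeter { public static String greet() { return \"v2\"; } }");
        System.out.println(v1.getMethod("greet").invoke(null));  // v1
        System.out.println(v2.getMethod("greet").invoke(null));  // v2
    }
}
```

Because each cycle gets its own loader, the second compile of `Greeter` yields a distinct `Class` object with the new behavior, while the first loaded version keeps its old bytecode: exactly the staleness problem described above.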
I had to have the LLM write the unit tests in JUnit 4, as I discovered that JUnit 5 would not function with the dynamic compiler/loader/unit-test-runner tools I developed, and I could not find publicly accessible programmatic tools that allowed JUnit 5 to work.
Accomplishments that we're proud of
This service does exactly what we set out to do: it uses TDD to ensure the robust creation and implementation of a code file. And most importantly:
It fixes itself intelligently, just as I hypothesized it would! Given any outcome, be it compilation errors or test failures in either the unit tests or the implementation itself, it stitches its way to getting all unit tests to compile and pass.
Imagine this process in a CI/CD pipeline! We are now one step closer to automatically developing robust, working programs with AI.
What we learned
This process WILL work as intended.
All of the challenges we ran into detail a specific insight learned about this process.
What's next for Java TDD-Based Robust AI Code File Implementation Service
Now I have two diverging paths:
- Continue the research: look into scaling this up to generate full programs, with this process running for each code file.
- Strengthen the process: integrate this automated process with a cloud-based CI/CD pipeline to ensure maximum scalability and robustness.