Inspiration
Companies struggle to maintain effective CI/CD pipelines: maintaining unit tests, anticipating countless edge cases, and catching race conditions is hard even for a simple PR. Our goal was to automate testing into a comprehensive suite that supplements human reviews by providing multiple alternative perspectives.
What it does + How we built it
QAI is triggered by a GitHub Action when a PR is opened. This starts the pipeline: the PR comments and changed code are passed as context to an LLM (GPT-4), along with general codebase context. The LLM creates a series of testing suites (general frameworks for approaching the updates), each containing many sequential, automatically generated tests. As the CUA agents execute these tasks through our Python / FastAPI backend, the results are passed back in. The agents constantly record video; the raw files are stored in S3 and accessed to show replays of possible errors. If all the tests pass, the CI/CD check passes; if any fail, ideas for fixes are generated. All of these results are stored in our Supabase, where they can be displayed.
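The aggregation step at the end of that pipeline can be sketched roughly as follows. This is an illustrative sketch, not our actual schema: the `TestResult` shape and `evaluate_check` function are hypothetical names standing in for the records we store in Supabase and the logic that sets the CI/CD check status.

```python
from dataclasses import dataclass

@dataclass
class TestResult:
    suite: str           # testing suite the agent executed
    name: str            # individual test within that suite
    passed: bool
    video_key: str = ""  # S3 key of the agent's screen recording, if any

def evaluate_check(results: list[TestResult]) -> dict:
    """Collapse agent results into a single CI/CD check verdict.

    Returns a payload shaped roughly like what we persist: an overall
    status, plus each failing test with its replay key so the frontend
    can surface video replays of possible errors.
    """
    failures = [r for r in results if not r.passed]
    return {
        "status": "success" if not failures else "failure",
        "total": len(results),
        "failed": [
            {"suite": r.suite, "test": r.name, "replay": r.video_key}
            for r in failures
        ],
    }
```

A run with any failing test yields `"status": "failure"` along with the replay keys, which is what drives both the failed check on GitHub and the error replays in the UI.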
Challenges we ran into
Deploying the entire pipeline was difficult because of the challenges of integrating our CI/CD pipeline with the CUA agent. Another issue was tuning the prompts and response format so that the agent could understand the natural language commands and actually translate them into actions in the browser. The general UX flow was hard to refine because of the many moving parts on the frontend and the amount of information to display.
Accomplishments that we're proud of + what we learned
Despite all of the challenges, we're happy that we ended up with a product that works well. We're proud of deploying several computer-use agents that were actually capable of performing actions, executing testing commands, and finding issues with our PRs that we hadn't even thought of. We learned a lot about CI/CD pipelines and GitHub Actions, as well as efficient ways to compress context: we maintained a running summary of the codebase to feed into the LLM, updating it with the PR contents if the PR passed all tests.
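That running-summary trick can be sketched like this. The function name and the guard are illustrative; in our pipeline the `summarize` step is an LLM call, while here it is a trivial truncation so the sketch runs without API access.

```python
def maybe_update_summary(
    summary: str,
    pr_text: str,
    all_passed: bool,
    summarize=lambda s: s[:2000],  # stands in for the LLM summarization call
) -> str:
    """Fold a PR's contents into the running codebase summary.

    Only PRs whose generated tests all passed should update the summary,
    so a failed run leaves the existing context untouched.
    """
    if not all_passed:
        return summary
    combined = summary + "\n\n## Recent change\n" + pr_text
    return summarize(combined)
```

Keeping one bounded summary instead of re-reading the whole repository keeps the LLM's context small on every PR.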
What's next for QAI
We want to expand the range of tests it can perform, including more complex tasks like multi-agent collaboration (for example, when testing a chat app). We also want to build in code fixing when CI fails, having the agent push a commit with the fixes so that it provides solutions, not just reports. We know that many companies struggle with the same problems that inspired us to create QAI, so we'd like more developers and companies to use it and provide feedback on the end-to-end flow.
Built With
- amazon-web-services
- cua
- express.js
- fastapi
- javascript
- next.js
- node.js
- python
- supabase
- tailwind.css
