Inspiration

As developers, we know the pain of having to maintain a large codebase.

The more a project grows, the more difficult it becomes to keep it clean and efficient.

The idea was to create a tool that streamlines and enhances the code quality for software developers and organizations.

It aims to help developers easily identify and refactor code duplications and improve the overall maintainability of their projects.

The inspiration stems from the common need in the software development industry to reduce redundancy and enhance code efficiency.

What it does

AI Code Dryer is a software solution that simplifies code quality improvement in the following ways:

  • Cloning, parsing, and embedding: After setup, the app clones the repository and parses its files to generate meaningful chunks of code, generates embeddings, and stores them for similarity searches. It supports various programming languages with a focus on PHP, Python, JavaScript, and TypeScript.

  • Semantic Search: The app provides a way to easily and quickly perform similarity searches within the repository, allowing users to query the code base using natural language.

  • Integration with Jira/BitBucket: Users can view a summary of detected duplications directly within their BitBucket repository and Jira project page.

  • Code Analysis and Clustering: The app detects clusters of duplications, which are then grouped and stored separately. On each cluster, the user will be able to perform two main actions:

  1. Instructions: The app will generate instructions, in natural language, on how to refactor each detected group of code duplications. Each instruction is accompanied by a more reusable code snippet that could replace the existing ones.

  2. Autonomous Refactor: An AI agent will automatically generate new code files and refactor the existing ones that contain duplications. The updated code will be pushed to a dedicated Git branch.
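
The similarity search described above can be pictured with a minimal sketch. It assumes each code chunk has already been turned into an embedding vector; the chunk identifiers, the three-dimensional toy vectors, and the helper names `cosine_similarity` and `top_k` are all hypothetical and are not taken from the app's code base. A real deployment would most likely delegate this ranking to the vector store.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, index, k=3):
    """Rank stored chunks by similarity to the query and keep the best k."""
    scored = [
        (cosine_similarity(query_vector, vector), chunk_id)
        for chunk_id, vector in index.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(chunk_id, score) for score, chunk_id in scored[:k]]


# Toy index: hypothetical chunk ids mapped to made-up embeddings.
index = {
    "utils.py::slugify": [0.9, 0.1, 0.0],
    "helpers.py::make_slug": [0.85, 0.15, 0.05],
    "db.py::connect": [0.0, 0.2, 0.95],
}

# In the real flow the query vector would come from embedding the user's
# natural-language question; here it is hard-coded for illustration.
results = top_k([1.0, 0.1, 0.0], index, k=2)
for chunk_id, score in results:
    print(f"{chunk_id}: {score:.3f}")
```

With these toy vectors, the two slug-related chunks rank above the unrelated database helper, which is the behaviour a natural-language query such as "turn a title into a URL slug" would rely on.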

How we built it

AI Code Dryer is built as a software solution that integrates various components and technologies. Here's a summary of the stack used:

  • Frontend: the app is integrated directly within BitBucket and Jira. To do so, it uses the Forge Cloud SDK with UI Kit and TypeScript.

  • Backend: handles the heavy lifting of the application. It manages user registrations, Git repository operations, code parsing, embedding, clustering, and AI-driven code improvement processes. It’s built in Python with FastAPI so that other AI models can be added easily in the future.

  • Database: the app involves several dedicated MongoDB databases to store code chunks, embeddings, clusters, and other related information. Each organization will have its own database to guarantee the segregation of the environments.

  • Machine Learning: after generating the embeddings with OpenAI’s text-embedding-ada-002 model, we group them into clusters using Agglomerative Hierarchical Clustering, and we store the embeddings as vectors to perform KNN searches. To refactor the code and generate instructions, we use a mix of ad hoc prompts for each language and function calling.
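
To make the clustering step more concrete, here is a small pure-Python sketch of agglomerative hierarchical clustering. The write-up does not say which library, linkage, or distance the app uses, so the average linkage, the cosine distance, the `threshold` parameter, and the toy vectors below are all assumptions chosen for illustration. In practice a library implementation such as scikit-learn's `AgglomerativeClustering` would be the more likely choice.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Return 1 - cosine similarity (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def agglomerative_clusters(vectors, threshold):
    """Merge the two closest clusters until none are within `threshold`."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        # Average linkage: mean pairwise distance between the two clusters.
        dists = [cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        (i, j), best = min(
            (
                ((i, j), linkage(clusters[i], clusters[j]))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda pair: pair[1],
        )
        if best > threshold:
            break
        # combinations() guarantees i < j, so deleting j keeps i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Made-up embeddings: chunks 0 and 1 are near-duplicates, as are 2 and 3.
embeddings = [
    [0.9, 0.1, 0.0],
    [0.85, 0.15, 0.05],
    [0.0, 0.2, 0.95],
    [0.05, 0.25, 0.9],
]
clusters = agglomerative_clusters(embeddings, threshold=0.1)
duplicate_groups = [c for c in clusters if len(c) > 1]
print(duplicate_groups)
```

Stopping at a distance threshold rather than a fixed number of clusters fits duplication detection, since the number of duplicated groups in a repository is not known in advance.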

Challenges we ran into

While developing the app, we ran into several challenges, including:

  • Code Parsing: Parsing and storing code chunks from various programming languages in a structured manner proved challenging. Existing solutions, such as the one provided by LangChain, do not split the code into meaningful chunks, so we had to use a different AST parser to extract them.

  • AI Refactoring: Implementing autonomous AI-driven code refactoring is a significant task, as it involves understanding code logic and generating refactoring suggestions. We experimented with several approaches, from entirely autonomous agents to specific prompt chains, before finding the right balance between autonomy and accuracy.

  • Integration Complexity: Integrating with BitBucket and Jira to provide a seamless user experience within these platforms proved complex. At first, we planned to create a single app for both platforms, but we then discovered that cross-platform apps are not supported, which forced us to rework the integration.
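
As an illustration of the AST-based chunking mentioned in the Code Parsing challenge, the sketch below splits a Python file into one chunk per top-level function or class using the standard library's `ast` module. This is only an assumption about what "meaningful chunks" could look like: the write-up does not name the parser the app actually uses, and the standard `ast` module covers Python only, not PHP, JavaScript, or TypeScript. The `extract_chunks` helper and the sample source are hypothetical.

```python
import ast


def extract_chunks(source):
    """Return one chunk per top-level function or class in `source`."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append(
                {
                    "name": node.name,
                    "kind": type(node).__name__,
                    "start_line": node.lineno,
                    "end_line": node.end_lineno,
                    # Exact source text of the definition, ready to embed.
                    "code": ast.get_source_segment(source, node),
                }
            )
    return chunks


sample = '''import os

def slugify(text):
    return text.lower().replace(" ", "-")

class Repo:
    def path(self):
        return os.getcwd()
'''

chunks = extract_chunks(sample)
for chunk in chunks:
    print(chunk["kind"], chunk["name"], chunk["start_line"], chunk["end_line"])
```

Splitting on definition boundaries keeps each chunk a complete unit of logic, which a fixed-size text splitter cannot guarantee.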

Accomplishments that we're proud of

  • Search by natural language: We are proud of the semantic search feature, which allows users to query the code base using natural language. We've been surprised by the usability of this feature and the accuracy of the results.
  • AI-driven refactoring: At the beginning, we were skeptical that an AI-driven refactoring feature was feasible, but we are proud of the results we achieved.
  • Overall Submission: We're enthusiastic about the chance to take part in this hackathon and the opportunity to present our project to the community, as we were afraid we wouldn't be able to complete it in time.

What we learned

  • AST Parsing: we learned how to parse various programming languages with their ASTs.
  • OpenAI Functions: we experimented with OpenAI's functions to generate structured outputs.
  • AI Agents: we studied and developed AI agents with different scopes and capabilities.
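
For readers unfamiliar with function calling, the sketch below shows the general idea of getting structured output: the model is given a JSON Schema describing a function, and it replies with a JSON string of arguments that the application parses and checks. The tool name `propose_refactor`, its fields, and the hand-written `raw` response are hypothetical and only illustrate the shape of the exchange; no API call is made, and this is not the schema the app uses.

```python
import json

# Hypothetical tool definition in the shape used by OpenAI function calling.
REFACTOR_TOOL = {
    "type": "function",
    "function": {
        "name": "propose_refactor",
        "description": "Propose a reusable replacement for a cluster of duplicated code.",
        "parameters": {
            "type": "object",
            "properties": {
                "instructions": {"type": "string"},
                "shared_code": {"type": "string"},
                "affected_files": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["instructions", "shared_code", "affected_files"],
        },
    },
}


def parse_tool_arguments(raw_arguments, tool=REFACTOR_TOOL):
    """Decode the model's JSON arguments and check the required fields."""
    args = json.loads(raw_arguments)
    required = tool["function"]["parameters"]["required"]
    missing = [key for key in required if key not in args]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return args


# Stand-in for the arguments string a model would return for this tool.
raw = json.dumps(
    {
        "instructions": "Move the slug logic into one shared helper.",
        "shared_code": "def slugify(text):\n    return text.lower().replace(' ', '-')",
        "affected_files": ["utils.py", "helpers.py"],
    }
)
parsed = parse_tool_arguments(raw)
print(parsed["affected_files"])
```

Validating the decoded arguments before acting on them matters here, because the output drives file changes rather than just text shown to a user.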

What's next for AI Code Dryer

The future of AI Code Dryer could involve several enhancements and expansions:

  • VS Code Extension: We plan to develop a VS Code extension to allow developers to use the tool directly within their IDE.
  • Supported Languages: We plan to further improve and test the app's support for other programming languages.
  • Improve Jira Integration: Allow one-click Jira issue creation using the AI-generated refactoring instructions.

Notes for the Codegeist Unleashed Hackathon Judges

Please use the PRO version during the demo, and feel free to skip the email.

If you need a quick repository example, feel free to fork this one (forked from Vercel):

https://bitbucket.org/ai-code-dryer/next-commerce
