I saw the following blog post on Hacker News. I found it super interesting and I really liked the idea. The author didn't share how they built the tool, but I was able to work out how to recreate it myself.
What it does
It underlines potential mistakes in your code using a code-generation AI.
How I built it
I used a notebook on Google Colab to calculate the probability the model predicts for each token at its position in your code.
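As a rough illustration of that calculation, the sketch below turns a causal language model's raw scores (logits) into a probability for each token that actually appears in the code, and flags the ones the model found unlikely. It is stdlib-only Python and not the notebook code: the function names, the toy vocabulary, the made-up logits, and the 0.1 threshold are all assumptions for the example. In the real project the logits would come from the CodeGen model.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def flag_unlikely_tokens(token_ids, next_token_logits, threshold=0.1):
    """Return (position, actual_prob, best_id) for suspicious tokens.

    next_token_logits[i] holds the scores for the token at position
    i + 1 given tokens 0..i (the usual causal-LM layout), so the first
    token has no prediction and is never flagged.
    """
    flagged = []
    pairs = zip(token_ids[1:], next_token_logits)
    for pos, (actual, logits) in enumerate(pairs, start=1):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        # Flag tokens the model finds unlikely when it prefers another one.
        if probs[actual] < threshold and best != actual:
            flagged.append((pos, probs[actual], best))
    return flagged

# Toy example: hypothetical vocabulary and invented logits.
vocab = ["total", "+=", "-=", "x"]
token_ids = [0, 2, 3]  # total -= x
next_token_logits = [
    [0.0, 5.0, 1.0, 0.0],  # after "total" the model strongly expects "+="
    [0.0, 0.0, 0.0, 4.0],  # after "-=" the model expects "x"
]
for pos, prob, best in flag_unlikely_tokens(token_ids, next_token_logits):
    actual_text = vocab[token_ids[pos]]
    print(f"position {pos}: {actual_text!r} has p={prob:.3f}, model preferred {vocab[best]!r}")
```

In this toy run only the `-=` token is flagged, which mirrors the kind of mismatch the tool underlines.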
Challenges I ran into
Large language models are large. The CodeGen model was way too large to run on my computer and crashed the Python interpreter. I had to run the model on Google Colab and import the results into a user interface I created on my computer.
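One way to do that kind of hand-off is to have the notebook write the per-token results to JSON and have the local interface read them back. The sketch below is an assumption about how this could look, not the actual Coderly format: the field names (`text`, `prob`), the helper names, and the 0.1 threshold are hypothetical.

```python
import json

def export_results(tokens, probs):
    """Serialise per-token probabilities to a JSON string (Colab side)."""
    if len(tokens) != len(probs):
        raise ValueError("tokens and probs must have the same length")
    records = [{"text": t, "prob": p} for t, p in zip(tokens, probs)]
    return json.dumps(records)

def tokens_to_underline(payload, threshold=0.1):
    """Parse the JSON (UI side) and return character spans to underline."""
    spans = []
    offset = 0
    for record in json.loads(payload):
        end = offset + len(record["text"])
        if record["prob"] < threshold:
            spans.append((offset, end))  # half-open span [offset, end)
        offset = end
    return spans

# Invented probabilities for the snippet "total -= x".
payload = export_results(["total", " -=", " x"], [0.9, 0.02, 0.95])
print(tokens_to_underline(payload))  # prints [(5, 8)]
```

Keeping the token text alongside each probability means the interface can rebuild character offsets without needing the tokenizer locally.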
Accomplishments that I am proud of
I was able to make use of the Salesforce CodeGen model to calculate the probability of certain tokens appearing in different contexts of your code. This is cool.
I was also able to create a semantic linter. This is also cool.
The model was able to spot the incorrect syntax for an array declaration in C. It also picked up the inconsistency between the stated intention to add and the use of the -= in-place subtraction operator. It detected that describing the computation as a product was incorrect, and it recognised that the number in the for loop condition should be 5 and not 6.
What I learned
I became more comfortable using transformer models. I also learned that the human body requires sleep and that rest is important.
What's next for Coderly
Thinking about societal good. Too often, vulnerable people depend on critical code that we cannot let go wrong. I believe that an AI-powered semantic linter is a step in an alternative direction, and one that is very interesting to explore.