One day I wrote five hours' worth of code, but I was disorganized and lazy. So I thought to myself: there's a hackathon, so why not make an ML model that comments your code, because no programmer actually does it!
Tir is the name of the Armenian god of written language and literature. I thought the name was fitting, since the program converts code into human-readable text!
What it does
This program takes Java code as input and produces comments as output. It features a GUI, plus an API to interact with, because the ML runs on Google Cloud.
How we built it
- DNN: Deep Neural Network
- NLP: Natural Language Processing
The model was built in two layers: a DNN that sorts the code, and an NLP model that produces natural language as the output.
The data came mainly from the Apache Commons IO Java library. It was scraped from GitHub to create comment-code pairs for the NLP, and to give the DNN a set to differentiate. The comments are expected to be high quality, since the Apache maintainers are either people who no-life programming, or college students who want to stuff commits into their GitHub profile to earn an internship at a FAANG company, which makes it a credible source of data.
A GUI was created in Java to give a better viewing experience for the functions.
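As a rough illustration of what building comment-code pairs can look like, here is a minimal Python sketch. It is not the actual scraper: it assumes a "pair" is a Javadoc block immediately followed by a method signature, and the names `PAIR_RE`, `clean_doc`, and `extract_pairs` are made up for this example.

```python
import re

# Assumption: a pair is a Javadoc block directly followed by a method signature.
PAIR_RE = re.compile(
    r"/\*\*(?P<doc>(?:(?!\*/).)*)\*/\s*"      # Javadoc body, stopping at the first */
    r"(?P<sig>[^;{}]*?\([^)]*\)[^;{}]*)\{",   # signature up to the opening brace
    re.DOTALL,
)


def clean_doc(doc):
    """Strip the leading '*' decoration and drop @param/@return style tags."""
    lines = []
    for line in doc.splitlines():
        line = line.strip().lstrip("*").strip()
        if line and not line.startswith("@"):
            lines.append(line)
    return " ".join(lines)


def extract_pairs(java_source):
    """Return a list of (comment, signature) tuples found in one Java file."""
    pairs = []
    for match in PAIR_RE.finditer(java_source):
        comment = clean_doc(match.group("doc"))
        signature = " ".join(match.group("sig").split())
        pairs.append((comment, signature))
    return pairs
```

A regex like this skips class-level Javadoc (there is no parameter list before the brace) and will miss unusual formatting, so a real pipeline would probably want a proper Java parser.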
The Java section of the project also includes an unfinished REST library, and a pretty cool Java code tokenizer!
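The real tokenizer is written in Java and is not reproduced here. To give an idea of what tokenizing Java code involves, this is a small Python sketch under my own assumptions: the token categories and the names `TOKEN_RE` and `tokenize` are hypothetical, and it only covers a simple subset of Java syntax.

```python
import re

# Hypothetical token categories; a real Java lexer has many more cases.
TOKEN_RE = re.compile(r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\])*")
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<ident>[A-Za-z_$][A-Za-z0-9_$]*)
  | (?P<op>[{}()\[\];,.]|[-+*/%=<>!&|^~?:]+)
  | (?P<ws>\s+)
""", re.VERBOSE | re.DOTALL)


def tokenize(code):
    """Split Java source into (kind, text) tuples, skipping whitespace and comments."""
    tokens = []
    pos = 0
    while pos < len(code):
        match = TOKEN_RE.match(code, pos)
        if match is None:
            raise ValueError(f"unexpected character {code[pos]!r} at offset {pos}")
        if match.lastgroup not in ("ws", "comment"):
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Comments are dropped here on purpose, on the assumption that the code side of a training pair should not leak the comment text.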
Challenges we ran into
The main problem here was that this was my first time using Python in ages, and my first time interacting with any NLP models. I struck gold at around 5 AM, when the model started functioning; I then threw it into Google Cloud, because I had free hours on their TPUs.
Another problem in the beginning was sourcing data. The data I sourced at first was low quality. At one point the model was (literally) throwing swear words at me [def not going to show that]; who knew programmers swear in their code (tf2 code leak lol)
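One way to catch this kind of problem before training is a simple filter over the comment-code pairs. This is only a sketch of the idea, not what I actually ran: the blocklist entries, the `MIN_WORDS` threshold, and the function names are placeholders.

```python
import re

# Placeholder blocklist; a real one would be much longer.
BLOCKLIST = {"damn", "crap"}
# Assumption: very short comments (e.g. "TODO") are not useful training targets.
MIN_WORDS = 3


def is_clean(comment):
    """True if the comment is long enough and contains no blocklisted word."""
    words = re.findall(r"[a-z']+", comment.lower())
    return len(words) >= MIN_WORDS and not any(w in BLOCKLIST for w in words)


def filter_pairs(pairs):
    """Keep only the (comment, code) pairs whose comment passes is_clean."""
    return [(comment, code) for comment, code in pairs if is_clean(comment)]
```

Word-level matching like this will not catch creative spellings, so it is a first pass at best.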
Accomplishments that we're proud of
I'm proud of the fact that it actually worked! The main goal, which was to convert functions to comments, worked plenty well. I was also able to add extra stuff, for example a REST API and a not-bad-looking GUI.
I'm also proud of the fact that this application may possibly change the landscape of comment writing /s.
What we learned
Since this was my fourth ML project, I learned plenty about NLPs and Python, as I'm a Java/C++ main. See "Accomplishments that we're proud of" and "Challenges we ran into" for more information.
What's next for Tir
The model I used was dual layered, which was a pretty simplistic approach. To reiterate, it used a DNN to sort the code, and an NLP to convert the sorted code to natural language, using the comments for training. This approach is great for fast training, but it forces all the comments onto one function. That works well for JavaDoc descriptions, but it doesn't explain the logic completely.
An alternative approach would be a different model that selectively groups code based on context, and puts each group through an NLP for more granular commenting.
Anyways, this would probably be a great research project for senior year in 2024-2025.
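To make the granular idea above a bit more concrete, here is a deliberately naive Python sketch. It does not use a model at all: it just treats blank lines inside a method body as context boundaries, which is my stand-in assumption for "grouping by context". The function name `group_statements` is hypothetical.

```python
def group_statements(body):
    """Split a method body into groups of lines separated by blank lines.

    Naive stand-in for context-based grouping: each returned group would be
    sent to the NLP separately to get one comment per group.
    """
    groups = []
    current = []
    for line in body.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            # A blank line closes the current group.
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups
```

A learned grouping model would replace the blank-line rule, but the output shape (a list of small code chunks) would probably stay the same.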
VERY IMPORTANT NOTE
THE VIDEO IS IN "https://drive.google.com/drive/folders/1OACX3mDBOcBrVR23K7NX6SFclO9Pz5PV?usp=sharing" DUE TO HAVING MULTIPLE VIDEOS BEING RECORDED, AND A VOICEOVER AS AN AUDIO FILE. DO NOT CLICK ON THE YOUTUBE LINK BELOW; IT IS A RICK ROLL