Why this?

Whenever I downloaded a new wallpaper, or some other file I'd found online, it was impossible to find it again later. Why? The file name would be a garbled mess of letters and numbers. So I wanted to change that and make file names actually describe their images.

How it began

Initially, I planned to build this project in Python, using LangChain to interface with Ollama. The model was going to be IBM's Granite 3.2 Vision, chosen for its small size and speed. However, almost all of this had to be scrapped halfway through the second day. LangChain did not make it easy to pass image files directly through to Ollama, and I needed a tool with clear instructions on how to do that.

What it became

So the project pivoted to a new language and a new library. I chose Java because of Ollama4j, a small library made specifically for talking to Ollama from Java. Its documentation gives clear instructions on how to attach image files to prompts in code, which made completing the project significantly easier. Granite 3.2 was also dropped because its results were poor and it did not follow instructions reliably. In its place came LLaVA, a tried-and-true open multimodal LLM that I knew worked well, albeit a bit slowly. Together, these new pieces produced a high-quality final product that can continue to be improved.
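To make the image-to-file-name round trip concrete, here is a minimal sketch. It is not the project's actual code. Rather than guess at Ollama4j's method signatures, it builds the JSON body that Ollama's own REST endpoint (POST /api/generate) accepts, where images travel as base64 strings in an "images" array. The class name, method names, prompt text, and the file-name cleanup rules are all assumptions made for illustration.

```java
import java.util.Base64;
import java.util.Locale;

// Hypothetical helper class; none of these names come from the project.
final class RenameSketch {

    // Builds a JSON body for Ollama's POST /api/generate endpoint.
    // The image is sent as a base64 string inside the "images" array,
    // and "stream": false asks for a single, complete response.
    static String buildGenerateRequest(String model, String prompt, byte[] imageBytes) {
        String image = Base64.getEncoder().encodeToString(imageBytes);
        return "{\"model\":\"" + escape(model) + "\","
             + "\"prompt\":\"" + escape(prompt) + "\","
             + "\"images\":[\"" + image + "\"],"
             + "\"stream\":false}";
    }

    // Minimal JSON string escaping, enough for short prompts.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Assumed cleanup step: turns the model's free-text suggestion into a
    // lowercase, hyphen-separated file name and keeps the original extension.
    // Falls back to the original name if nothing usable is left.
    static String toFileName(String suggestion, String originalName) {
        int dot = originalName.lastIndexOf('.');
        String ext = dot > 0 ? originalName.substring(dot).toLowerCase(Locale.ROOT) : "";
        String base = suggestion.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "-")   // collapse anything unsafe into one hyphen
                .replaceAll("^-+|-+$", "");      // trim leading and trailing hyphens
        if (base.isEmpty()) {
            return originalName;
        }
        if (base.length() > 60) {                // arbitrary length cap for this sketch
            base = base.substring(0, 60).replaceAll("-+$", "");
        }
        return base + ext;
    }

    public static void main(String[] args) {
        byte[] fakeImage = {1, 2, 3};            // stand-in for real image bytes
        System.out.println(buildGenerateRequest("llava", "Suggest a short file name.", fakeImage));
        System.out.println(toFileName("A sunset over Mountains!", "a8f3e9c1.JPG"));
    }
}
```

In a real run, the request body would be posted to a local Ollama server and the text of the reply passed to toFileName before renaming the file on disk, for example with java.nio.file.Files.move.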

What I learned

  • Used Apache Maven for build automation for the first time
  • Learned how to use Ollama through Java
  • Became comfortable with recognizing when to start from scratch

What's next for renaime

  • Adding RAG for document extraction so PDFs can also be renamed automatically
  • Supporting more models, such as Google's brand-new Gemma 3 (not yet on Ollama)
