Inspiration

Originally, I was in a group of four writing a React web app for a college-based digital marketplace for secondhand clothing. I was tasked with creating a tagging system for the images users would upload, as well as a basic search engine for the site. My group somewhat fell apart, but the idea of building a search-term-based model for something as individualistic as secondhand clothing stuck with me, and I kept working on it.

What it does

At the moment, not much. It's fine-tuned from the default ViT model that torchvision provides, but it lacks domain coverage and positional understanding, and it is somewhat inefficient for a model that was originally intended to be responsive in Node.js applications.

How we built it

I began by writing a Selenium script in a Jupyter notebook to scrape the first 300 entries of a collection of trending search terms on Depop, the popular secondhand/thrifted ecommerce site. I then experimented with different ways of containerizing a ViT model for easy training, but ran into a lot of restrictions with packages like Hugging Face's datasets package. I then switched to a standard PyTorch torchvision ViT model and trained from there.

Challenges we ran into

Setting up Google Colab, cloud compute, and import statements was a consistent challenge, as was working with computer vision for the first time. I've worked with PyTorch for language classification and other NLP tasks before, but computer vision was entirely outside my knowledge base before I started work on this model.

Accomplishments that we're proud of

  • Building a dataset from images and associated metadata
  • Evaluating multiple existing datasets for their usability in ML model training
  • Training a PyTorch model locally against a time crunch

What we learned

  • Building a usable dataset is the main challenge of manually fine-tuning a computer vision model.

What's next for ViT for Clothing Analysis

Making it actually work! I plan to broaden the domain of the model as much as I can, and build an API that outputs tags for the databases frontend programmers use.
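As a rough idea of what the tag-output side of that API could return, here is a minimal sketch that turns raw model scores into a small JSON record. Nothing here exists yet. The function names (top_tags, to_record), the record fields, and the example vocabulary are all hypothetical.

```python
import json
import math


def top_tags(logits, vocabulary, k=3):
    """Return the k highest-scoring tags with their softmax probabilities."""
    if len(logits) != len(vocabulary):
        raise ValueError("logits and vocabulary must have the same length")
    # Numerically stable softmax over the raw scores.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    ranked = sorted(
        zip(vocabulary, (e / total for e in exps)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [{"tag": tag, "score": round(score, 4)} for tag, score in ranked[:k]]


def to_record(image_id, logits, vocabulary, k=3):
    """Serialize the top tags for one image as a JSON string (hypothetical schema)."""
    return json.dumps({"image_id": image_id, "tags": top_tags(logits, vocabulary, k)})
```

A frontend could then store or index the returned record directly, for example the result of to_record("img-001", scores, vocabulary), where scores is the model output for that image.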
