Since 2020, we have become increasingly reliant on Zoom and online documents. As a result, our drives fill up quickly, a problem many of us can relate to. So we decided to write a program that sorts our documents automatically, quickly, and efficiently.
What it does
We used K-Means clustering, an unsupervised machine learning method, to group documents by their text contents and sort them into folders representing their respective topics.
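The clustering step can be sketched roughly as follows. This is a minimal illustration rather than our exact code: it assumes each document's text is turned into TF-IDF features before K-Means runs, and the function name `cluster_documents`, the sample texts, and the choice of two clusters are all hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_documents(texts, n_clusters, seed=0):
    """Return one cluster label per document text."""
    # Assumption: represent each document as a TF-IDF vector.
    vectorizer = TfidfVectorizer(stop_words="english")
    features = vectorizer.fit_transform(texts)
    # Group the vectors with K-Means; a fixed seed keeps runs repeatable.
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(features).tolist()


# Hypothetical sample documents covering two topics.
docs = [
    "derivatives integrals limits calculus homework",
    "calculus limits integrals practice problems",
    "cell biology mitosis membrane notes",
    "biology lab report cell membrane mitosis",
]
labels = cluster_documents(docs, n_clusters=2)
```

Documents that share a label would end up in the same folder. The label numbers themselves carry no meaning, which is why the resulting groups are unordered and unnamed.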
How we built it
Our project was built in Python. We needed to learn the Google Drive and authentication APIs in order to read and sort documents. Then we used scikit-learn's K-Means clustering algorithm to group and organize them.
Challenges we ran into
We encountered a number of problems, including figuring out how to authenticate with Google Drive, how to loop through all the files in a drive, and how to move files between folders. We were all new to the Google Drive API, so working out the basics of creating folders, reading files, and moving files was a great learning experience.
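For anyone facing the same hurdles, the three operations look roughly like this with the Drive v3 API. This is a sketch under assumptions, not our exact code: `service` is assumed to be an already-authenticated client built with `googleapiclient.discovery.build("drive", "v3", credentials=creds)`, and the helper names `list_all_files`, `create_folder`, and `move_file` are hypothetical.

```python
FOLDER_MIME = "application/vnd.google-apps.folder"


def list_all_files(service):
    """Collect every non-folder, non-trashed file, following page tokens."""
    files, page_token = [], None
    while True:
        response = service.files().list(
            q=f"mimeType != '{FOLDER_MIME}' and trashed = false",
            fields="nextPageToken, files(id, name, parents)",
            pageSize=100,
            pageToken=page_token,
        ).execute()
        files.extend(response.get("files", []))
        # Drive returns results one page at a time; stop when no token is left.
        page_token = response.get("nextPageToken")
        if page_token is None:
            break
    return files


def create_folder(service, name):
    """Create a folder and return its ID."""
    body = {"name": name, "mimeType": FOLDER_MIME}
    return service.files().create(body=body, fields="id").execute()["id"]


def move_file(service, file_id, old_parents, folder_id):
    """Move a file by adding the new parent and removing the old ones."""
    service.files().update(
        fileId=file_id,
        addParents=folder_id,
        removeParents=",".join(old_parents),
        fields="id, parents",
    ).execute()
```

Two details are easy to miss: listing is paginated, so a single `list` call may return only part of the drive, and a "move" is an `update` that changes the file's parents.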
Accomplishments that we're proud of
We're proud of how well the K-Means clustering works, and that we got everything related to Google Drive working. We're also very proud of the logo and the puns we can make with the name. We really wanted to drive home the puns.
What we learned
We learned how to use the Google Drive API in Python. We acquired valuable knowledge about AI and found that picking the right type of model makes a huge difference. Most importantly, we learned that when we kept working at seemingly impossible challenges, we found solutions more often than not.
What's next for DriveThru
We think we can do more than just sort files into unordered groups. Next, we could analyze the contents of each folder and suggest a name for it. We could also build a web app or Chrome extension that lets you select specific files or folders to cluster, instead of the whole Google Drive at once.
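One simple way the folder-naming idea might work is to pick the words that appear in the most documents within a group. The sketch below is only an illustration of that idea, not something we have built: the function `suggest_folder_name` and the tiny stop-word list are hypothetical, and a real version would likely need a fuller stop-word list or TF-IDF weighting.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "on", "is", "with"}


def suggest_folder_name(texts, n_words=2):
    """Suggest a name from the words found in the most documents of a group."""
    counts = Counter()
    for text in texts:
        # Count each word at most once per document so long files don't dominate.
        words = set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS
        counts.update(words)
    top = [word for word, _ in counts.most_common(n_words)]
    return " ".join(word.capitalize() for word in top)


# Hypothetical contents of one clustered folder.
group = ["calculus limits homework", "calculus limits quiz", "calculus notes"]
name = suggest_folder_name(group)
```

The suggested name could then be passed to the Drive API when the folder for that cluster is created.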