When our team attended the Datathon 2021 Opening Ceremony, we were immediately intrigued by the Bill.com challenge. Although we come from diverse academic backgrounds, we share a common interest in business modeling and technology. The chance to apply an innovative algorithm to a federal dataset inspired us to work together for this year's competition.
What it does
We score the relationship between each agency and vendor with an edge weight, and we compare agencies through these weights to determine their similarity. For any given transaction, we compare the amount spent to the agency's average transaction amount; above-average transactions indicate a stronger relationship. The weight also takes into account the number and frequency of interactions between the agency and the vendor.
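One way to combine the three factors above (spending relative to the agency average, number of interactions, and frequency) into a single weight is sketched below in plain Python. It is a hypothetical illustration, not the team's formula: the `(agency, vendor, amount, month)` transaction schema, the `edge_weights` helper, and the choice to multiply the mean spending ratio by `log1p` of the transaction count and by the number of distinct active months are all assumptions made for this sketch.

```python
from collections import defaultdict
from math import log1p


def edge_weights(transactions):
    """Hypothetical agency-vendor edge weights (illustrative, not the original formula).

    transactions: iterable of (agency, vendor, amount, month) tuples; this schema
    is an assumption. Amounts are assumed positive so every agency average is nonzero.
    """
    transactions = list(transactions)

    # Average amount per transaction for each agency.
    total = defaultdict(float)
    count = defaultdict(int)
    for agency, _vendor, amount, _month in transactions:
        total[agency] += amount
        count[agency] += 1
    avg = {agency: total[agency] / count[agency] for agency in total}

    ratio_sum = defaultdict(float)  # sum of amount / agency average, per pair
    n_tx = defaultdict(int)         # number of interactions, per pair
    active = defaultdict(set)       # distinct months with an interaction, per pair
    for agency, vendor, amount, month in transactions:
        pair = (agency, vendor)
        ratio_sum[pair] += amount / avg[agency]
        n_tx[pair] += 1
        active[pair].add(month)

    weights = {}
    for pair, s in ratio_sum.items():
        mean_ratio = s / n_tx[pair]  # > 1 means above-average spending
        # Assumed combination: more interactions and more active months raise the weight.
        weights[pair] = mean_ratio * log1p(n_tx[pair]) * len(active[pair])
    return weights


if __name__ == "__main__":
    tx = [("A", "X", 100.0, 1), ("A", "Y", 300.0, 2)]
    w = edge_weights(tx)
    print(w[("A", "Y")] > w[("A", "X")])  # → True
```

Any monotone combination of the three factors would serve the same purpose; the logarithm simply keeps a vendor with many tiny transactions from dominating one with a few large ones.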
We then use an iterative agglomerative clustering algorithm. It starts with a candidate clustering of vendors. Given this clustering, it computes, for each agency, the average edge weight between that agency and the vendors in each vendor cluster. The two agencies with the smallest total difference between these averaged weights are merged into a cluster, and merging continues in the same way to form agency clusters. Once the agencies are clustered, the process runs in the other direction, using the agency clusters to generate new vendor clusters. The algorithm alternates between the two sides until the clusters converge.
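The alternating procedure described above can be sketched in plain Python as follows. This is a hypothetical illustration rather than the team's code: it assumes a dense agency-by-vendor weight matrix `W`, starts from a candidate clustering in which every vendor is its own cluster, represents each merged group by the mean of its members' profiles, stops merging at a fixed target number of clusters, and treats convergence as both label vectors staying unchanged between iterations. The names `profiles`, `agglomerate`, `transpose`, and `co_cluster` are made up for this sketch.

```python
def profiles(W, col_labels):
    """For each row of W, average its weights over every column cluster."""
    clusters = sorted(set(col_labels))
    out = []
    for row in W:
        prof = []
        for c in clusters:
            vals = [w for w, lab in zip(row, col_labels) if lab == c]
            prof.append(sum(vals) / len(vals))
        out.append(prof)
    return out


def agglomerate(P, n_clusters):
    """Agglomeratively merge the rows of P into n_clusters groups.

    Each step joins the two groups whose mean profiles have the smallest
    total (L1) difference. Returns one integer label per row.
    """
    groups = [[i] for i in range(len(P))]

    def centroid(g):
        return [sum(P[i][k] for i in g) / len(g) for k in range(len(P[0]))]

    while len(groups) > max(n_clusters, 1):
        cents = [centroid(g) for g in groups]
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                d = sum(abs(x - y) for x, y in zip(cents[a], cents[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = groups[a] + groups[b]
        del groups[b]  # b > a, so index a still refers to the same group
        groups[a] = merged

    # Groups stay ordered by their smallest member, so labels are canonical.
    labels = [0] * len(P)
    for gid, g in enumerate(groups):
        for i in g:
            labels[i] = gid
    return labels


def transpose(W):
    return [list(col) for col in zip(*W)]


def co_cluster(W, n_agency_clusters, n_vendor_clusters, max_iter=20):
    """Alternate agency and vendor clustering until the labels stop changing.

    W[i][j] is the (hypothetical) edge weight between agency i and vendor j.
    """
    vendor_labels = list(range(len(W[0])))  # candidate: each vendor alone
    agency_labels = None
    for _ in range(max_iter):
        new_agency = agglomerate(profiles(W, vendor_labels), n_agency_clusters)
        new_vendor = agglomerate(profiles(transpose(W), new_agency),
                                 n_vendor_clusters)
        if new_agency == agency_labels and new_vendor == vendor_labels:
            break  # converged: neither side changed
        agency_labels, vendor_labels = new_agency, new_vendor
    return agency_labels, vendor_labels


if __name__ == "__main__":
    W = [
        [5, 5, 0, 0],
        [4, 5, 0, 1],
        [0, 0, 5, 5],
        [1, 0, 4, 5],
    ]
    print(co_cluster(W, 2, 2))  # → ([0, 0, 1, 1], [0, 0, 1, 1])
```

On this toy matrix with two obvious blocks, the first two agencies and vendors end up in one cluster and the last two in the other, and the second pass reproduces the first, so the loop stops. Plain loops keep the sketch dependency-free; a NumPy version would vectorize the profile and distance computations.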
How we built it
We built our project in Python using Pandas and NumPy. We used R for data visualization.
Challenges we ran into
Initially, our agglomerative algorithm found only one large cluster surrounded by many single-node groups. Adding a group size tuning parameter solved this problem and also gave users a way to adjust grouping sensitivity.
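How the group size parameter works is not specified; one plausible form is a hard cap on group size, sketched below in plain Python. The `max_size` parameter, its cap semantics, and the `agglomerate_capped` helper are hypothetical. Merges that would create a group larger than the cap are skipped, which keeps a single cluster from absorbing most of the nodes while the rest stay as singletons.

```python
def agglomerate_capped(P, n_clusters, max_size):
    """Merge rows of P (one profile per item) by smallest L1 distance between
    group mean profiles, never forming a group larger than max_size.

    max_size is a hypothetical stand-in for the group size tuning parameter.
    Returns the groups as lists of row indices.
    """
    groups = [[i] for i in range(len(P))]

    def centroid(g):
        return [sum(P[i][k] for i in g) / len(g) for k in range(len(P[0]))]

    while len(groups) > n_clusters:
        cents = [centroid(g) for g in groups]
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                if len(groups[a]) + len(groups[b]) > max_size:
                    continue  # this merge would exceed the size cap
                d = sum(abs(x - y) for x, y in zip(cents[a], cents[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best is None:
            break  # no allowed merge remains; stop with more groups
        _, a, b = best
        merged = groups[a] + groups[b]
        del groups[b]  # b > a, so index a still refers to the same group
        groups[a] = merged
    return groups


if __name__ == "__main__":
    points = [[0], [1], [2], [10]]
    print(agglomerate_capped(points, 2, max_size=4))  # → [[0, 1, 2], [3]]
    print(agglomerate_capped(points, 2, max_size=2))  # → [[0, 1], [2, 3]]
```

With a loose cap, the points 0, 1, and 2 form one large group and 10 is left alone; a cap of 2 forces two balanced pairs. Lowering the cap trades cluster cohesion for more even group sizes, which is the kind of sensitivity control described above.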
Accomplishments that we're proud of
None of us had implemented this specific algorithm before this competition. Because we had to design, understand, and implement the technique in a single day, we are proud that our algorithm not only recommends groups successfully but also produces groupings that make clear logical sense. We believe the algorithm can solve the given task while remaining generalizable, tunable, and scalable in real-world applications.
What we learned
All of us learned the operating principles behind our iterative agglomerative algorithm. We also gained significant experience visualizing complex datasets in an intuitive way.
What's next for Team Aglo - Bill.com Challenge
We're excited to take part in future competitions such as HackRice!