Inspiration
Coming into this project, the only thing I knew about collaborative filtering was that it could be used to build recommendation systems. How to apply the technique, and the statistics behind it, were completely foreign to me.
But through this project, I've learned a great deal about the fundamentals of collaborative filtering (e.g., memory-based and model-based filtering), and I understand why recommendation systems require careful attention to detail to execute properly. From choosing the right metrics to finding creative ways to evaluate how well a filter works, collaborative filtering poses interesting challenges to data scientists.
What it does
This project is a recommender system that suggests vendors to agencies within Washington, DC's local government. For a given agency, it recommends new vendors based on the transaction patterns of similar agencies.
How we built it
I created an agency-vendor transaction matrix in which each cell holds the number of transactions between the corresponding agency and vendor. Each row of the matrix is therefore a vector representing one agency's transactions. To measure how similar two agencies are, I calculate the cosine of the angle between their vectors. Using the resulting cosine similarity matrix, I can recommend additional vendors to an agency.
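The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's actual code: the toy transaction log, its `agency`/`vendor` column names, and the helpers `cosine_similarity_matrix` and `recommend_vendors` are hypothetical. The write-up also does not say how vendors are scored once similarities are known; the sketch assumes one common user-based rule, a similarity-weighted sum of other agencies' transaction counts, restricted to vendors the agency has not used yet.

```python
import numpy as np
import pandas as pd

# Hypothetical transaction log: one row per transaction (column names are assumptions).
transactions = pd.DataFrame({
    "agency": ["Parks", "Parks", "Parks", "Health", "Health", "Schools", "Schools", "Schools"],
    "vendor": ["AcmeSupply", "GreenCo", "GreenCo", "AcmeSupply", "MedPlus",
               "AcmeSupply", "GreenCo", "BookHub"],
})

# Agency x vendor matrix: each cell counts transactions between that agency and vendor.
matrix = pd.crosstab(transactions["agency"], transactions["vendor"])


def cosine_similarity_matrix(m: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of m: dot(a, b) / (|a| * |b|)."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for agencies with no transactions
    unit = m / norms         # scale every row to unit length
    return unit @ unit.T     # dot products of unit vectors are cosines


def recommend_vendors(matrix: pd.DataFrame, agency: str, k: int = 3) -> list[str]:
    """Rank vendors the agency has not used by similarity-weighted counts (assumed rule)."""
    values = matrix.to_numpy(dtype=float)
    sims = cosine_similarity_matrix(values)
    idx = matrix.index.get_loc(agency)
    weights = sims[idx].copy()
    weights[idx] = 0.0  # exclude the agency's own row
    scores = pd.Series(weights @ values, index=matrix.columns)
    unused = matrix.loc[agency] == 0  # only recommend vendors that are new to this agency
    return scores[unused].sort_values(ascending=False).head(k).index.tolist()


print(recommend_vendors(matrix, "Parks"))  # ['BookHub', 'MedPlus']
```

On this toy data, Schools is the agency most similar to Parks (cosine of about 0.77 versus about 0.32 for Health), so BookHub, which only Schools has used, ranks first for Parks.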
Challenges we ran into
I was initially unsure whether to use the number of transactions between agencies and vendors or the average dollar amount of those transactions. I ultimately chose the former after comparing the two with hierarchical clustering. With the average dollar amount, the clustering produced one massive cluster and several small ones, a result that did little to distinguish most agencies from one another. The transaction counts, on the other hand, captured the differences in transaction history between agencies.
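A comparison like this could be sketched with SciPy's hierarchical clustering: cluster the agency rows of each candidate matrix and compare the resulting cluster sizes. The write-up does not say which settings were used, so the average linkage, Euclidean distance, three-cluster cut, the helper name `cluster_sizes`, and both toy matrices below are all assumptions invented for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_sizes(features: np.ndarray, n_clusters: int = 3) -> list[int]:
    """Cluster agency rows hierarchically and return cluster sizes, largest first."""
    # Average-linkage agglomerative clustering on Euclidean distances (assumed settings).
    tree = linkage(features, method="average", metric="euclidean")
    # Cut the dendrogram so that at most n_clusters flat clusters remain.
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    # fcluster labels start at 1, so drop the (empty) count for label 0.
    return sorted(np.bincount(labels)[1:].tolist(), reverse=True)


# Hypothetical toy data: 5 agencies (rows) x 3 vendors (columns).
counts = np.array([
    [5, 0, 0],
    [4, 0, 0],
    [0, 5, 0],
    [0, 4, 0],
    [0, 0, 5],
], dtype=float)

avg_dollars = np.array([
    [120, 0, 0],
    [0, 150, 0],
    [0, 0, 130],
    [0, 0, 90],
    [50000, 0, 0],  # one agency with very large purchases dominates the distances
], dtype=float)

print(cluster_sizes(counts))       # [2, 2, 1]
print(cluster_sizes(avg_dollars))  # [3, 1, 1]
```

In the dollar-amount toy, the single extreme agency sits so far from the others that the remaining distances matter little to where the tree is cut. Unscaled dollar values are one plausible way such lopsided clusters can arise, though this sketch does not diagnose the project's actual data.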
Accomplishments that we're proud of
I had never completed a project using collaborative filtering techniques before this Datathon, but now I feel more confident taking on more complex data science challenges.
Built With
- jupyter
- python