Inspiration
We were inspired to do this project by the injustice of the Pink Tax. The Pink Tax is a price markup applied to products that are marketed specifically toward women.
What it does
When you search for an Amazon product, the Pink Tax Patrol Chrome extension captures the product information and sends it to a hacker-made gradient-boosted decision tree model that decides whether the product is likely to be pink taxed. If so, the model recommends other Amazon products that are not pink taxed. The extension then displays those items for the user to choose from.
How we built it
Chrome extension: Using React.js, we created a Chrome extension that displays the recommended non-pink-taxed items. We also scrape the Amazon site for the critical product information that is sent to the backend for processing.
Docker container: We created an Ubuntu 22.04 Docker container so that the server and the inference engine could run on any of our computers without a battle with dependencies. The container also made it simple to run our server, because we set the server up to start automatically when the container starts.
Web scraping: Using Selenium and Beautiful Soup, we scraped data from the active Amazon tab. The collected data was initially poorly formatted and needed sanitization. To sanitize it, we used a variety of regular expressions and substitutions to turn a long 'product lines' string into a proper JSON file that we could provide to our machine learning model.
Machine learning: We used gradient-boosted random forests, trained on a dataset we built from scratch, to identify pink-taxed products. We used cosine similarity to find similar products to recommend to the user.
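To illustrate the cosine similarity step, here is a minimal sketch using only the Python standard library. It assumes products are compared by a simple bag-of-words count over their titles, which is our guess for illustration and not necessarily the feature representation the project used. The function names `tokenize`, `cosine_similarity`, and `recommend` and the sample titles are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a product title and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters.

    Returns 0.0 when either vector is empty.
    """
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query_title, candidates, top_k=3):
    """Rank candidate titles by similarity to the query title.

    Assumption: `candidates` has already been filtered down to products
    that the classifier did not flag as pink taxed.
    """
    query_vec = Counter(tokenize(query_title))
    scored = [
        (cosine_similarity(query_vec, Counter(tokenize(title))), title)
        for title in candidates
    ]
    # Highest similarity first; drop candidates sharing no tokens at all.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for score, title in scored[:top_k] if score > 0]


# Hypothetical sample data, for illustration only.
candidates = [
    "stainless steel water bottle",
    "disposable razor 5 blade refill pack",
    "5 blade razor for men",
]
picks = recommend("women's 5 blade razor pink", candidates)
print(picks)
```

In this toy example the two razor listings are ranked ahead of the unrelated water bottle, which is dropped because it shares no words with the query. A real system would likely weight tokens (for example with TF-IDF) and compare more fields than the title.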
Challenges we ran into
- File pathing in both React and Docker.
- Because there are no free Amazon APIs for generating a database to run inference on, we turned to other options. Using the 100 free searches from 3 different Amazon APIs, we were able to create a dataset with 10,000 elements.
- Connecting it all together
- Amazon recently (in the last month) updated their security measures, so there was no documentation on how to access product images without the affiliate API.
Accomplishments that we're proud of
- One hacker went from never having used React to building an entire Chrome extension with it.
- One hacker created a Docker container and parsed a dataset to communicate between the ML algorithm and the Chrome extension front end.
- One hacker connected it all together using top-tier programming skills.
- One hacker wrote a gradient-boosted random forest algorithm with dropout from scratch to perform NLP on the Amazon products.
What we learned
- Make a very clear list of what information everyone needs, so that communication from section to section is less painful.
What's next for Pink Tax Patrol
Moving forward, we would love to extend Pink Tax Patrol to websites other than Amazon. Being able to dynamically suggest equivalent products across websites would offer our customers a wider variety of products and give them an easier way to save money.
Built With
- ai
- amazon
- blood
- computer
- docker
- flask
- github
- joblib
- lightgbm
- machine-learning
- numpy
- pandas
- python
- react
- restful
- scikit-learn
- sweat
- tears