Pushy
What is it?
Pushy receives webhooks from GitHub repos whenever something is pushed to them, and it can also scrape an entire repo for keys. It goes through the added lines and uses regex patterns and entropy calculations to detect secrets. It is currently set up as a CI check.
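To illustrate the entropy side of the detection, here is a minimal sketch of a Shannon entropy check in Ruby. It is not Pushy's actual code: the names shannon_entropy and high_entropy_tokens, the token pattern, and the 4.0 bits-per-character threshold are all assumptions chosen for the example.

```ruby
# Shannon entropy of a string, in bits per character.
# Random-looking keys score high; repetitive text scores low.
def shannon_entropy(str)
  return 0.0 if str.empty?

  len = str.length.to_f
  str.each_char.tally.values.sum do |count|
    p = count / len
    -p * Math.log2(p)
  end
end

# Hypothetical values, picked for illustration only.
ENTROPY_THRESHOLD = 4.0
TOKEN_PATTERN = /[A-Za-z0-9+\/=_-]{20,}/

# Returns the long tokens on a line whose entropy meets the threshold.
def high_entropy_tokens(line, threshold: ENTROPY_THRESHOLD)
  line.scan(TOKEN_PATTERN).select { |token| shannon_entropy(token) >= threshold }
end
```

A line such as key = "Zx8Qp2Lm9Vt4Rk7Wy1Nc5Bd3" would be flagged, while a long run of the same character would not. In practice the threshold would need tuning against real false positives.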
How does it work?
Initial repository scanning
The initial scan registers the repo in the database, clones it with Git, and scans the entire master branch locally, using the same scanning logic as push scanning.
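The clone-then-scan flow could look roughly like the sketch below. It is an illustration under stated assumptions rather than the project's implementation: clone_repo and each_working_tree_file are hypothetical helpers, and the shallow clone (--depth 1) is a choice made for the example.

```ruby
require "find"
require "tmpdir"

# Clones one branch of a repo into dest by shelling out to the git CLI.
# Hypothetical helper; raises if the clone fails.
def clone_repo(url, dest, branch: "master")
  system("git", "clone", "--quiet", "--depth", "1", "--branch", branch, url, dest) or
    raise "git clone failed for #{url}"
end

# Yields [relative_path, lines] for every file in the working tree,
# skipping Git's own metadata directory.
def each_working_tree_file(root)
  return enum_for(:each_working_tree_file, root) unless block_given?

  Find.find(root) do |path|
    if File.directory?(path)
      Find.prune if File.basename(path) == ".git"
      next
    end
    # scrub replaces invalid bytes so regex matching cannot raise on binary files.
    lines = File.readlines(path, chomp: true).map(&:scrub)
    yield path.delete_prefix("#{root}/"), lines
  end
end
```

Each yielded file can then be passed to the same line-by-line scanner that handles push diffs, with every line treated as an addition.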
Push scanning
For every push hook it receives from GitHub, Pushy scans the added lines of the diff for possible keys. It scans line by line, using patterns stored in the database. The system currently supports whitelisting file paths (gitignore-style globs), whitelisting patterns (regex), and blacklisting patterns (regex), applied in that order. In addition to the global patterns, which apply to all repos, there are per-repo patterns, which apply only to a specific repo. The per-repo patterns are combined with the global patterns during scanning.
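The filter order described above can be sketched as follows. This is a simplified illustration, not the project's code: PatternSet and scan_additions are hypothetical names, and File.fnmatch? is used as a rough stand-in for gitignore-style matching, which it only approximates.

```ruby
# Holds the three kinds of patterns; global and per-repo sets share this shape.
PatternSet = Struct.new(:path_whitelist, :whitelist, :blacklist, keyword_init: true) do
  # Combines two sets, e.g. the global patterns with one repo's patterns.
  def merge(other)
    PatternSet.new(
      path_whitelist: path_whitelist + other.path_whitelist,
      whitelist: whitelist + other.whitelist,
      blacklist: blacklist + other.blacklist
    )
  end
end

# Scans the added lines of one file and returns a hash per suspicious line.
def scan_additions(path, added_lines, patterns)
  # 1. Whitelisted file paths are skipped entirely.
  skip = patterns.path_whitelist.any? { |glob| File.fnmatch?(glob, path, File::FNM_PATHNAME) }
  return [] if skip

  added_lines.each_with_index.filter_map do |line, index|
    # 2. Whitelisted patterns clear a line before the blacklist is consulted.
    next if patterns.whitelist.any? { |re| re.match?(line) }

    # 3. The first blacklist pattern that matches flags the line.
    hit = patterns.blacklist.find { |re| re.match?(line) }
    { path: path, line_index: index, pattern: hit.source } if hit
  end
end
```

For example, with a global blacklist entry of /AKIA[0-9A-Z]{16}/ and a global whitelist entry of /EXAMPLE/, a line containing a matching key is flagged unless the same line also contains EXAMPLE. Checking the whitelist first lets a repo owner silence a known false positive without editing the global blacklist.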
Deployment
Deployed on http://pushy.tech/repo. Username: test@test.com, password: password.
Technologies used:
Written in Ruby on Rails with Sidekiq for background job processing. Deployed on Google Cloud Platform.
What could be improved:
The UI could be MUCH better. The entropy and regex checks could be tuned to filter more accurately. Machine learning could be added: the project has a hook for a machine learning API, but unfortunately it was not implemented due to time constraints. The plan was to use a Naive Bayes classifier to further filter out false positives. Features like per-repo control panels and GitHub OAuth integration could be added to make this better. It could also be deployed on Kubernetes, which would improve redundancy.
