Anonymous browsing, while necessary to protect one's privacy, can also be misused. For example, people use Tor to download torrents of movies and other copyrighted material, which causes large losses for many companies. Because Tor wraps traffic in multiple layers of encryption and routes it through several relays, tracking that traffic is almost impossible.
What it does
This program uses machine learning to predict which website a user might have visited over Tor by recognizing patterns in the packets sent and received.
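To make "patterns in the packets" concrete, here is a minimal sketch of how one captured trace could be summarized as numbers. It assumes a trace is a list of (timestamp, signed size) pairs, with positive sizes for outgoing packets and negative sizes for incoming ones. The function name `extract_features`, the feature set, and the example values are illustrative assumptions, not the project's actual features.

```python
from statistics import mean


def extract_features(trace):
    """Summarize one packet trace as a dict of simple statistics.

    trace: list of (timestamp_seconds, signed_size_bytes), sorted by time.
    Positive size = outgoing packet, negative size = incoming packet
    (an assumed convention for this sketch).
    """
    if not trace:
        raise ValueError("trace must contain at least one packet")

    times = [t for t, _ in trace]
    sizes = [s for _, s in trace]
    outgoing = [s for s in sizes if s > 0]
    incoming = [-s for s in sizes if s < 0]
    # Time between consecutive packets
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]

    return {
        "n_packets": len(sizes),
        "n_outgoing": len(outgoing),
        "n_incoming": len(incoming),
        "bytes_outgoing": sum(outgoing),
        "bytes_incoming": sum(incoming),
        "fraction_incoming": len(incoming) / len(sizes),
        "duration": times[-1] - times[0],
        "mean_gap": mean(gaps) if gaps else 0.0,
    }


# Made-up four-packet trace, for illustration only
example_trace = [(0.00, 565), (0.12, -1448), (0.15, -1448), (0.40, 565)]
features = extract_features(example_trace)
print(features)
```

Each website visit would produce one such feature dictionary, and a classifier is then trained on many of them, labeled by site.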
How I built it
Wireshark is used to sniff the packets being transmitted and received. I then used feature engineering to extract useful features from the captured data, so that patterns in the communication between the host and the Tor server could be identified. A Support Vector Machine with a Radial Basis Function kernel was best suited to this task, and training was done with scikit-learn.
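The training step could look roughly like the sketch below. It is not the project's actual code: the data is synthetic (two hypothetical sites, four made-up features per trace), and the feature scaling step and the hyperparameters `C=1.0` and `gamma="scale"` are assumptions. Only the choice of an RBF-kernel SVM in scikit-learn comes from the description above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for engineered features (assumed columns:
# packet count, total bytes, fraction incoming, duration in seconds).
site_a = rng.normal(loc=[120, 80_000, 0.6, 2.0],
                    scale=[10, 5_000, 0.05, 0.3], size=(100, 4))
site_b = rng.normal(loc=[300, 400_000, 0.8, 5.0],
                    scale=[20, 20_000, 0.05, 0.5], size=(100, 4))
X = np.vstack([site_a, site_b])
y = np.array([0] * 100 + [1] * 100)  # 0 = site A, 1 = site B

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# RBF kernels are sensitive to feature scale, so standardize first.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Real Tor traces are far noisier than these well-separated synthetic clusters, so accuracy on this toy data says nothing about real-world performance.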
Challenges I ran into
I had a few problems installing and setting up Tor, since I've mostly focused on data science and machine learning so far. Feature engineering is also a complicated task: it's hard to figure out which features are good for the training data.
Accomplishments that I'm proud of
I'm glad I learned to build a tool that can help detect misuse of copyrighted data and that, hopefully, can be extended in the future to tracking thieves, terror groups, etc.
What I learned
Pattern recognition with machine learning is a really important task, and the most important part of it is feature engineering.
What's next for Anonymous browsing tracker
The accuracy of prediction can likely be improved using deep learning. I will be using a Siamese Recurrent Neural Network, a powerful model for pattern recognition, although as far as I know it has only been used for sentence similarity so far. The code will be uploaded next week.