Inspiration

Our inspiration for this project was the ever-growing number of services that we sign up to in our lives, and how we often lose track of how far our personal information spreads across the internet. Many of these services are also abandoned over time, leaving the personal information we gave them vulnerable to exposure.

What it does

PaperTrail is an online tool that helps users to visualise their digital hygiene and how large their internet footprint is. A user can enter their:

  • username
  • full name
  • organisations they are a part of (e.g. Newcastle University)
  • domains they own (e.g. fullname.com)

and PaperTrail will scour the internet to find high-confidence connections to the user. For example, it looks for webpages where the full name is mentioned alongside the organisation (e.g. on a LinkedIn page). In addition, PaperTrail will warn the user if they have accounts registered on any websites that have been breached.
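To illustrate what a "high-confidence connection" could mean in practice, here is a minimal sketch in Python. It is not the project's actual scoring logic: the function names and the rule of requiring at least two user-supplied identifiers on the same page are assumptions made for the example.

```python
def matched_identifiers(page_text, identifiers):
    """Return the identifiers that appear in the page text (case-insensitive)."""
    text = page_text.casefold()
    return [item for item in identifiers if item.casefold() in text]


def is_high_confidence(page_text, identifiers, minimum=2):
    """Treat a page as a high-confidence match when at least `minimum`
    distinct identifiers co-occur on it (hypothetical threshold)."""
    return len(matched_identifiers(page_text, identifiers)) >= minimum
```

With this rule, a page mentioning both a full name and an organisation would count as a match, while a page mentioning only a common name would not.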

How we built it

The application was primarily written in Python using a Flask web server. We also made use of third-party APIs and tools, such as HaveIBeenPwned, Sherlock, and the Bing Search API, in order to get as broad a data set as possible.

The key with using the Bing API was developing precise queries that would yield results closely related to the user.
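As a rough sketch of what building precise queries might look like, the Python function below combines the user's details into search strings using quoted exact phrases and the `site:` operator. The function name, its parameters, and the ordering of the queries are assumptions for illustration, not the queries PaperTrail actually sends.

```python
def build_queries(full_name, organisations=(), domains=(), username=None):
    """Build search query strings from user details, most specific first.

    Quoting a term asks the search engine for an exact phrase, and
    `site:` restricts results to a single domain.
    """
    name = f'"{full_name}"'
    queries = []
    for org in organisations:
        # Name and organisation together, e.g. a profile or staff page.
        queries.append(f'{name} "{org}"')
    for domain in domains:
        # Mentions of the name on a domain the user owns.
        queries.append(f"{name} site:{domain}")
    if username:
        # Pages that tie the username to the real name.
        queries.append(f'"{username}" {name}')
    # Broadest query last, as a fallback.
    queries.append(name)
    return queries
```

For example, `build_queries("Jane Doe", ["Newcastle University"])` yields `'"Jane Doe" "Newcastle University"'` followed by the fallback `'"Jane Doe"'`.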

The front-end was styled using the CSS framework Bootstrap so that we could prototype rapidly.

Challenges we ran into

The most challenging issues we faced during the project were the availability of tools and the reliability of deployment methods. The vast scale of the internet means that a lot of specialist tools are costly or have severe rate limits. This meant that we needed to be creative with the limited amount of information we had and try to make as few requests as possible. Although we managed to deploy our application, it does have some limitations compared to running it locally. The intensive queries we make can often take a while, and web browsers are less forgiving about timeouts when connecting to remote servers.
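One common way to keep the number of requests down is to cache results so that a repeated lookup never reaches the rate-limited service twice. The sketch below is an assumption about how this could be done, not a description of PaperTrail's code: `CachedLookup`, `fetch`, and the one-hour expiry are all hypothetical.

```python
import time


class CachedLookup:
    """Wrap a slow or rate-limited lookup function with an in-memory cache."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch        # function that performs the real request
        self._ttl = ttl_seconds    # how long a cached result stays valid
        self._clock = clock        # injectable clock, useful for testing
        self._store = {}           # key -> (time stored, value)

    def get(self, key):
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]          # still fresh: no request is made
        value = self._fetch(key)   # missing or expired: make one request
        self._store[key] = (now, value)
        return value
```

A cache like this lives only in the server's memory, so it would be lost on restart; it trades freshness of results for fewer outgoing requests.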

Accomplishments that we're proud of

We are proud that our application is functional and presentable given the short time frame. We are also proud that we created an application that could be of genuine use to people.

We believe that a tool like PaperTrail can really shed light on the extent of a person's digital footprint. Often, the general public are largely unaware of how far their online presence can spread and the implications it can have.

What we learned

Coming from a background of primarily client-side development, we learned a great deal about networks and the APIs available for finding crawled data. We also developed our skills in data analytics, as we were required to collate all this information into an understandable format.

What's next for Paper Trail

In the future, we will look to extend our sources of information (such as backlinks) and increase the accuracy of our results. Furthermore, we would like to broaden our range of visualisation options, for example by showing how connections interlink in the form of a graph diagram.

We also plan to add an information section to PaperTrail to help teach users how to protect their online presence. This would be particularly useful for the most vulnerable users of the internet, such as children and the elderly. Demonstration sessions could easily be set up with mock data to show groups the most severe cases.
