Inspiration
This developer has taken a particular interest in digital privacy and the related field of digital forensics, and Specter provided an opportunity to engage with those interests in a practical way. Specter in particular was inspired by recent changes in privacy law enacted by a number of governments, including ones that allow internet service providers to sell their customers' internet search histories. The significance of this issue to internet privacy, and the fact that the full effects of these laws are not always evident to the public, made it an exciting and ambitious problem to take on, especially as a developer's first hackathon project.
What it does
Specter is intended as a proof-of-concept tool; its primary goal is to raise awareness of the importance of internet privacy, and to emphasize how much personal information can be gathered merely from the day-to-day searches of the average individual. It also has applications in digital forensics, especially in law enforcement, as future iterations will hopefully allow for the accurate identification and location of individuals of interest, given search history data recovered either from personal electronic devices or from public computers.
The current iteration focuses primarily on the extraction of geographical data from the text of search queries, especially on queries that directly involve locations such as those for directions or weather data. Although this is only a subset of the possible information that can be extracted from the search query data, it provides a window into the amount of intrusive information that can be extracted from search histories. Future iterations will expand Specter to include a more comprehensive view of the kinds of information that the most basic of internet activities (surfing the internet) can broadcast to our internet service providers and to the internet as a whole.
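One way to pull location fragments out of queries like these is a small set of pattern rules. The following is a minimal sketch, assuming each query is a plain string. The patterns and the `extract_location` name are hypothetical illustrations, not taken from Specter's source.

```python
import re
from typing import Optional

# Hypothetical rules for location-bearing queries (directions, weather, "near" searches).
# Each pattern captures the place-name fragment in a group named "place".
LOCATION_PATTERNS = [
    re.compile(r"\bdirections (?:from .+? )?to (?P<place>.+)$", re.IGNORECASE),
    re.compile(r"\bweather (?:in|for) (?P<place>.+)$", re.IGNORECASE),
    re.compile(r"\b(?:restaurants|coffee|pharmacy) near (?P<place>.+)$", re.IGNORECASE),
]


def extract_location(query: str) -> Optional[str]:
    """Return the place-name fragment of a location-bearing query, or None."""
    text = query.strip()
    for pattern in LOCATION_PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group("place").strip()
    return None  # the query does not match any known location phrasing


for q in ["weather in Boston", "directions from home to Logan Airport", "how to tie a tie"]:
    print(q, "->", extract_location(q))
```

A rule list like this favors precision over recall. It only catches phrasings someone anticipated, so each extracted fragment would still need geocoding to confirm that it names a real place.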
The technical details
Specter relies on JSON files of search history; the ones used in developing the product were the developer's own search history, as available for download from Google. The JSON files are merged, processed into Python's native list structure, and then parsed for the requisite information. Once the geographical data is extracted, Google's Geocoding API is used to convert it into coordinates, and an algorithm is run to predict likely locations for the unknown individual's home and work/school, and to estimate the error in those predictions based on the standard deviation of the data. The results are displayed as a heatmap (see image above), with the brightest areas corresponding to the highest probability of being frequented by the unknown individual.
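The merge step and a location estimate with a standard-deviation error can be sketched as follows. The writeup does not describe Specter's actual prediction algorithm, so this swaps in a simple grid-binning approach. Several things here are assumptions: each export file holds a top-level JSON array, the Geocoding API calls (which need an API key) have already produced `(latitude, longitude)` pairs, and the busiest cells stand in for "home" and "work/school". The function names are hypothetical.

```python
import json
import statistics
from collections import Counter
from typing import Iterable, List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude), assumed already geocoded


def merge_histories(paths: Iterable[str]) -> List[dict]:
    """Load several JSON history exports and merge them into one Python list.

    Assumes each file holds a top-level JSON array of activity records.
    """
    merged: List[dict] = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            merged.extend(json.load(f))
    return merged


def grid_cell(coord: Coord, precision: int = 2) -> Coord:
    """Snap a coordinate to a grid cell (0.01 degrees is about 1 km of latitude)."""
    return (round(coord[0], precision), round(coord[1], precision))


def heatmap_counts(coords: List[Coord], precision: int = 2) -> Counter:
    """Count geocoded points per grid cell; higher counts mean brighter heatmap cells."""
    return Counter(grid_cell(c, precision) for c in coords)


def estimate_hotspots(coords: List[Coord], top_n: int = 2, precision: int = 2):
    """Return (mean_lat, mean_lon, std_lat, std_lon, count) for the top_n busiest cells.

    Hypothetically, the busiest cell stands in for "home" and the next for "work/school";
    the per-axis standard deviation serves as a rough error estimate.
    """
    counts = heatmap_counts(coords, precision)
    results = []
    for cell, count in counts.most_common(top_n):
        members = [c for c in coords if grid_cell(c, precision) == cell]
        lats = [c[0] for c in members]
        lons = [c[1] for c in members]
        # pstdev returns 0.0 for a single point, whereas stdev would raise an error.
        results.append((
            statistics.fmean(lats),
            statistics.fmean(lons),
            statistics.pstdev(lats),
            statistics.pstdev(lons),
            count,
        ))
    return results


sample = [(42.3601, -71.0589), (42.3603, -71.0587), (42.3599, -71.0591),
          (42.3736, -71.1097), (42.3738, -71.1099), (40.7128, -74.0060)]
for spot in estimate_hotspots(sample):
    print(spot)
```

Binning keeps the sketch dependency-free. A real heatmap would more likely use kernel density estimation, and the cell size (`precision`) directly trades spatial resolution against how many points land in each cell.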
The scope
Specter's focus on the analysis of search term data provides an innovative way of analyzing browser histories, especially in the field of digital forensics. Even when an individual has been browsing through an anonymous or public connection, or when little to no useful browser metadata is present, Specter makes it possible for law enforcement officials to gain insight into the identity of the unknown person. Whether in pursuit of a cybercriminal with a near-invisible online presence or in gathering intelligence on a group threatening our national security, Specter offers the possibility of using the common commodity of search data in service of the common good.
The existence of an automated tool (even as a proof of concept) that extracts personally identifiable data from mere search history, without even using the attached metadata, also radically changes the field of cybersecurity. Current methods of staying anonymous on the internet may well need improvement to withstand scrutiny from more advanced versions of such a tool, which would most certainly alter the perspectives cybersecurity experts take in the future. Given how weakly protected much search query data currently is, the internet only stands to benefit from such a paradigm shift.
Indeed, Specter was designed with the hope that, whether implicitly or directly, it might effect positive change across the internet. Given the immediacy of the issue of internet privacy and the increasing presence of the internet in modern life, Specter has the potential to affect thousands of lives by raising awareness about internet security and privacy. Even with security tools such as Tor and Opera, which allow for IP masking and anonymous browsing, the specter of one's identity still lives on in the data one generates. Specter highlights this reality and aims to empower people to protect themselves, their identities, and their data online with the same rigor they apply to securing their physical possessions. By highlighting how much personal information online communication and messages can reveal, and by exposing the methods by which it can be extracted, Specter can also help reduce doxxing and other online harassment: individuals who understand how such information can be uncovered are better able (and more motivated) to take action to protect themselves. This kind of empowerment and knowledge, perhaps more than any privacy tool, can provide the impetus for a safer and more secure internet for everyone.
The product name
The product name, Specter, has a dual meaning: it alludes both to the product's capacity to let one spectate on the personal details yielded by a simple search history, and to the trace evidence of identity left behind in online search data (which the product seeks to discern). The logo, which features the Touch ID fingerprint image, illustrates the indelible connection between our digital lives (phones, etc.) and our private/personal ones, which Specter aims to highlight.
What's next
Specter can expand to take into account more of the terms and patterns contained in search query data. As it becomes more accurate, it can serve as a benchmark for whether various methods of protecting search data (e.g. browser extensions that conduct random searches to hide a user's actual browsing history amid noise) are in fact effective in protecting the privacy of the public.
Designed and programmed by Caitlyn Singam