The world is flooded with data and information. Everyone has become a 'content creator'. People don't think twice before sharing or publishing something, and only a small fraction of us care about the authenticity and ethics of data and information. Even data consumers often fall into the trap of the unethical or inauthentic information poured upon us, and we lack the time to cross-check every site we visit or every post we see. In the spirit of "Data, data everywhere, not a drop to trust!", we came forward with this project.
What it does
The project is an effort toward tackling the very important issue of authenticity and security for users.
We share a lot of data with the world, and keeping a log of it all is impossible, but Who is Imposter is here to help. It keeps a log of all the data, breaches, and information shared with various websites, so if there is a data breach in the future, you have the dataset to find the imposters among hundreds.
Additionally, it is a Chrome extension that checks the issues you might face on a site based on the cookies the site uses. We implemented a fake-news detector for Twitter posts: it blurs out any tweet whose "fakeness probability" is more than 0.9, or 90%, but gives you the option to deblur it in case you wish to see what these creepy imposters are up to, haha! We also implemented a form/input scraper to scrape the labels of a Google Form and judge how safe it is; a lot of this becomes pretty obvious, like when a form asks for your credit card details, our crawlers cross-check it. We also included a posting feature in the prototype where people can post their views on a site, and others can decide whether, and how much, they want to use it. This feature is a little glitchy for some reason, but we're on it!
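The gist of the blur-and-deblur decision can be sketched like this (a minimal sketch, assuming the model hands the content script a fakeness probability between 0 and 1; the function and variable names here are illustrative, not our exact code):

```javascript
// Minimal sketch of the blur decision, assuming the model returns a
// fakeness probability in [0, 1]. Names are hypothetical.
const FAKENESS_THRESHOLD = 0.9;

function shouldBlur(fakenessProbability) {
  // Blur only when the probability is strictly above 90%.
  return fakenessProbability > FAKENESS_THRESHOLD;
}

// In the content script, a flagged tweet gets a CSS blur that the user
// can click away if they want to deblur it and see the tweet anyway.
function applyBlur(tweetElement, fakenessProbability) {
  if (shouldBlur(fakenessProbability)) {
    tweetElement.style.filter = "blur(8px)";
    tweetElement.addEventListener(
      "click",
      () => {
        tweetElement.style.filter = "none"; // user chose to deblur
      },
      { once: true }
    );
  }
}
```

The threshold check is strict (`>` rather than `>=`), matching the "more than 0.9" rule described above.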
Your data can be categorised into three categories:
- Most secure data: credit and debit cards, etc.
- Scamming-purpose data: your mobile number, which can attract spam calls if you expose it to someone.
- Normal data: your email addresses, etc.

The extension keeps a minimal record of every piece of data you give to websites and categorises each item accordingly.
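The three-bucket categorisation can be sketched as a simple keyword match over a field's label (a sketch only; the category names follow the list above, but the keyword lists are illustrative assumptions, not our exact rules):

```javascript
// Sketch of the three-bucket categorisation, assuming simple keyword
// matching on a form field's label. Keyword lists are illustrative.
const CATEGORIES = [
  { name: "most-secure", keywords: ["credit card", "debit card", "cvv"] },
  { name: "scamming-purpose", keywords: ["mobile", "phone"] },
  { name: "normal", keywords: ["email", "name"] },
];

function categoriseLabel(label) {
  const text = label.toLowerCase();
  for (const { name, keywords } of CATEGORIES) {
    if (keywords.some((k) => text.includes(k))) return name;
  }
  return "normal"; // default bucket for anything unrecognised
}
```

For example, a form label like "Credit Card Number" would land in the most-secure bucket, while "Mobile Number" would be flagged as scamming-purpose data.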
How we built it
Challenges we ran into
- Initially, a lot of things were a little unclear, even the theme of the sprint.
- We had never built a Chrome extension before; that was a big learning experience for our team and, at the same time, the biggest challenge.
- Authentication problems arose because the extension is not the main, persistent window, which kept breaking the connection endpoints.
- We had to clean the datasets and normalize their distribution, which was a big challenge for our team.
- Deploying the ML model in TensorFlow.js also really bugged us.
- Web scraping seemed very easy and quick in the beginning, and it is, but making it work everywhere, irrespective of the platform, took some time, and we ran into a few issues there too.
- But that's okay; this is just the Devpost deadline. The project is still open, and so are our programming brains!
- Fixing a time for teammates to meet was becoming a little difficult due to other commitments like interviews, exams, and university work.
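The dataset-cleaning step mentioned above boiled down to things like rescaling skewed numeric columns. One such normalisation, sketched minimally (z-score standardisation as one example; this is an assumption about the kind of step involved, not our exact pipeline):

```javascript
// Minimal sketch of one normalisation step: rescaling a numeric column
// to zero mean and unit variance (z-scores).
function zScores(values) {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance =
    values.reduce((acc, v) => acc + (v - mean) ** 2, 0) / values.length;
  const std = Math.sqrt(variance) || 1; // guard against constant columns
  return values.map((v) => (v - mean) / std);
}
```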
Accomplishments that we're proud of
We were able to collaborate despite the various collaboration challenges we faced, and to delegate tasks in an organized manner. We kept each other motivated and had fun and laughter along the way!
Now we have bagged our first Chrome extension, and for some team members a first ML model. We are proud of our learnings.
What we learned
- We learned about data ethics: how sites store our information, session cookies, and the sharing and use of our information
- Learned more about web scraping
- Built ML models and learned ML concepts along the way
- Authentication and Chrome extensions
- The frontend and backend nature of a Chrome extension, and its workflow
- Figma and designing
- Deploying ML models to Heroku and AWS
What's next for Who is Imposter?
- Implementing the categorization of data labels and the scrapers more efficiently and over a broader range
- A wider range of perspectives for the functionalities covering authenticity and security for users
- A discussion forum and help sections for users
- Managing and monitoring cookies even better, by keeping a log of expired cookies
- Tracker detection: if anyone is tracking you, it can be found out and blocked
- Applying ML and regression algorithms to predict the imposters (threatening sites) beforehand
A brighter sunrise for our "Who is Imposter"!