Disinformation R>3

Overview

Since the COVID-19 crisis began, the scale, reach and impact of disinformation have reached new levels.

As part of #EUvsCOVID we decided to address the problem from an innovative angle: safeguarding the internet by predicting, as they get registered, which new web domains will spread disinformation, and blocking them from gaining traffic.

Our MVP/POC uses research-level technology developed for IT security. We tuned it toward COVID-19 disinformation to produce both a dashboard that tracks the creation of detected “future bad” web domains and a Chrome extension that warns users about the danger of accessing them. We have also scoped an API that other parties (NGOs, registrars, ISPs, ad exchanges, corporations, etc.) could use to block access to the same domains with their own tools. We leave to those partners the decision of what to do with our output, because the line between stopping disinformation and censorship is a fine one.

Our unique value proposition is the ability to detect those sources well before they start spreading disinformation, unlike current methods, which focus on content detection and therefore miss the critical first days of spreading.

Much as COVID-19 spread rapidly in some countries because governments were slowed by analysis paralysis and indecision, disinformation thrives on delay, so we want to prevent it from appearing altogether. As the saying goes: “Prevention is better than waiting for a cure”.

In pre-crisis times, disinformation was mostly about shifting opinions and political interests. Today, the potential cost in lives and health due to the dilution of good information, or to the doubts cast on institutional solutions, is unacceptable.

We foresee a number of further developments and are confident the capability could prove widely impactful both now and after the crisis, as the information war is only at its start.

The Problem

We are not going to bother you by explaining why disinformation is such a huge problem for modern society, or how it has exploded since the pandemic began.

A couple of articles as a refresher:

Misinformation Has Created a New World Disorder – Scientific American ( https://www.scientificamerican.com/article/misinformation-has-created-a-new-world-disorder/ )
Short Assessment Of Narratives And Disinformation Around The Covid-19/Coronavirus Pandemic – EUvsDISINFO ( https://euvsdisinfo.eu/eeas-special-report-update-2-22-april/ )

We need to act now to stop disinformation from spreading:

To save lives
To protect EU institutions
To safeguard our rights

But to date, typical solutions are based on information analysis, debunking and counter-views. Much progress can still be made in that direction: crowdsourced review, active citizen involvement in spreading the “known good”, and so on. The problem is that disinformation spreads so fast once published that any “post-processing” is at best a containment of the issue. That is why a novel solution is required: one that prevents disinformation from appearing altogether, and a scalable one at that, able to work across many languages, countries and millions of internet resources.

The Hackathon

The original idea was brought by Luigi (France), licensee of a novel technology on which he is building a start-up company. Alarmed by the impact of disinformation spreading during the COVID-19 crisis, he decided to try to use this powerful capability for good after observing that Nabeel (Qatar), research lead at QCRI, was tinkering with it as a side project.

As the Hackathon registrations started to flow, the team formed quickly, with Luca (Italy), Jerzy (Poland), Felix (Germany) and Ahmed (Cyprus) joining and aligning on the scope and plan for the following days. Aside from Luigi and Nabeel, nobody had worked together before. The Slack channel quickly became invaluable for sharing updates and keeping the work aligned.

With the workstreams divided according to the competencies of the team members, work proceeded swiftly, building on the solid base of the research-level detection engine.

Direct exchanges and group synchronization calls continued as the components of the solution were assembled and built. The experience and autonomy of team members was apparent to mentors and helped keep progress smooth and productive.

Regular checkups with mentors helped ensure the team was on the right track and aligned with the spirit of the event.

Our Solution

Profiling

First, “good” and “bad” sources of information need to be defined to train the detection engine. As our expertise is in technology, we prefer to stay away from the complexity of deciding what is good or bad, fake or not, and instead rely on vetted, professional sources of truth to train our system. We identified sources thanks to suggestions from mentors, other hackathon participants and our own knowledge, and categorized them as “known bad domains” (e.g. EUvsDISINFO, Poynter) and “known good domains” (e.g. EUvsDISINFO “disproof”, scraping of national and international authorities' websites). We compiled lists of 6000 and 1200 domains respectively, which were then used to “tune” our detection engine.

Once the list is formed, an inference engine uses advanced analytics to configure and tune the detection engine (cf. the research papers submitted).

The process is iterative: the more “known good” and “known bad” domains are fed to the system, the better the tuning and the detection will be. As the days progressed, new updates kept coming in …
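
As an illustration of the list-building step described above, the sketch below merges “known bad” and “known good” domains into one labeled set and sets aside any domain that shows up under both labels, since conflicting labels would muddy the tuning. The function name, the label strings and the sample domains are assumptions made for this sketch; the actual pipeline is not described in this write-up.

```javascript
// Hypothetical helper: merge two curated domain lists into one labeled set.
function buildLabeledSet(knownBad, knownGood) {
  // Trim and lower-case so the same domain is not counted twice.
  const clean = (d) => d.trim().toLowerCase();
  const bad = new Set(knownBad.map(clean));
  const good = new Set(knownGood.map(clean));

  const labeled = [];
  const conflicts = [];

  for (const domain of bad) {
    if (good.has(domain)) {
      conflicts.push(domain); // listed under both labels: needs human review
    } else {
      labeled.push({ domain, label: "bad" });
    }
  }
  for (const domain of good) {
    if (!bad.has(domain)) {
      labeled.push({ domain, label: "good" });
    }
  }
  return { labeled, conflicts };
}

// Made-up sample domains, for illustration only.
const result = buildLabeledSet(
  ["Fake-Cure.example", "fake-cure.example ", "mixed.example"],
  ["health-authority.example", "mixed.example"]
);
console.log(JSON.stringify(result));
```

Each new batch of vetted sources could be passed through a step like this before re-tuning, which matches the iterative process described above.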

Detecting & Monitoring

We use patented technology from the Qatar Computing Research Institute (QCRI), licensed to Luigi for evaluation purposes.

A description of the technology and relevant research can be found in the attachments submitted on Devpost.

The technology was originally “tuned” for IT risks (spyware, malware, viruses, hacking) and had to be re-trained to focus on COVID-19 disinformation. The output of the profiling work was instrumental in this quick evolution.

A dashboard, updated daily, shows trends and newly detected domains. It is accessible on the homepage of our website: http://disinfobusters.eu

We cannot describe the technology in greater detail, as it is covered by patent and NDAs are in place with QCRI. The quality and depth of the research papers provided should help in vetting the novelty and innovation leveraged.

Blocking

To prove the concept, we built a working prototype for the hackathon. We decided to focus on a browser (Chrome) extension that shows a warning page when a user tries to access one of the detected web domains.

Once installed, the extension performs the following tasks:
• Retrieve the latest list of detected domains and their “maliciousness score”
• Verify user visits to web sites against the retrieved list of “bad domains”
• If there is a match, open a new browser tab with warning information
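
The three tasks above can be sketched as a small matching routine. This is a minimal illustration, not the extension's actual code: the function names (`normalizeHost`, `findMatch`), the shape of the list entries (`domain`, `score`) and the example domains are assumptions made for this sketch. The Chrome-specific parts (fetching the list, opening the warning tab) are left out so the logic can run on its own.

```javascript
// Hypothetical list entries: the real list format is not described in this write-up.
const detectedDomains = [
  { domain: "bad-example.test", score: 0.97 },
  { domain: "miracle-cure.example", score: 0.88 },
];

// Lower-case the host name and drop a leading "www." so lookups are consistent.
function normalizeHost(host) {
  return host.toLowerCase().replace(/^www\./, "");
}

// Return the matching list entry for a visited URL, or null when there is none.
// A visit matches when the host equals a listed domain or is a subdomain of it.
function findMatch(visitedUrl, list) {
  let host;
  try {
    host = normalizeHost(new URL(visitedUrl).hostname);
  } catch (err) {
    return null; // not a parseable URL, nothing to check
  }
  for (const entry of list) {
    const listed = normalizeHost(entry.domain);
    if (host === listed || host.endsWith("." + listed)) {
      return entry;
    }
  }
  return null;
}

const hit = findMatch("https://news.bad-example.test/article?id=1", detectedDomains);
console.log(hit ? `warn: ${hit.domain} (score ${hit.score})` : "no match");
```

In an extension, a check like this would typically run from a navigation listener, with a match triggering the warning tab. Requiring a leading dot for subdomain matches avoids flagging unrelated domains that merely end in the same characters.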

The technologies used are JavaScript, HTML, CSS and Google's extension development documentation (https://developer.chrome.com/extensions).

Post-hackathon, we foresee the development of a robust API standard that lets third parties access the latest list or verify whether a given domain is malicious. This will enable automation, worldwide reach and faster scaling. We do not believe the browser extension is a sustainable long-term solution, as it requires end-user education and intent that would be too costly to achieve.
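
To make the two API operations mentioned above concrete (fetching the latest list, and verifying a single domain), here is a minimal sketch of the lookup logic such a service might expose. Nothing here reflects a published interface: the function names, the response fields, the sample data and the 0.8 threshold are all hypothetical.

```javascript
// Hypothetical in-memory store: domain -> maliciousness score between 0 and 1.
const scores = new Map([
  ["bad-example.test", 0.97],
  ["borderline.example", 0.55],
]);

// Assumed cut-off at or above which a domain is reported as malicious.
const THRESHOLD = 0.8;

// Operation 1: return the current list of flagged domains, highest score first.
function latestList(store, threshold = THRESHOLD) {
  return [...store.entries()]
    .filter(([, score]) => score >= threshold)
    .sort((a, b) => b[1] - a[1])
    .map(([domain, score]) => ({ domain, score }));
}

// Operation 2: verify a single domain and report whether it is known and flagged.
function verifyDomain(store, domain, threshold = THRESHOLD) {
  const key = domain.trim().toLowerCase();
  const score = store.has(key) ? store.get(key) : null;
  return {
    domain: key,
    known: score !== null,
    score,
    malicious: score !== null && score >= threshold,
  };
}

console.log(JSON.stringify(latestList(scores)));
console.log(JSON.stringify(verifyDomain(scores, "Borderline.example")));
```

Returning the raw score alongside the verdict would leave each partner free to apply its own threshold, in line with the choice to let partners decide what to do with the output.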

The third parties we have identified as potential partners and want to follow up with are:

_ NGOs operating in the space _ We contacted the Global Disinformation Index (disinformationindex.org) for validation and confirmed their interest in partnering with us to further their mission: “The Global Disinformation Index (GDI) aims to disrupt, defund and down-rank disinformation sites. We collectively work with governments, business and civil society. We operate on three core principles of neutrality, independence and transparency.”

_ Domain Registrars _ A domain registrar (e.g. GoDaddy) has delegated authority to assign and maintain the required ownership information for internet domains. We foresee their interest in enforcing terms and conditions and revoking domain registrations for fraudulent uses. Potentially, our technology could be embedded in their verification system to block domain registration altogether.

_ Ad-Exchanges _ (e.g. Outbrain, Taboola, Criteo) indirectly fund disinformation by allocating ad space to domains that spread fake news. EUvsDISINFO estimates the value routed to fake news sites at 76M € in 2019 alone. Access to our API could help ad exchanges avoid allocating such advertising budgets and protect their customers' reputations from being damaged by ads placed on disinformation websites.

_ IT Security Providers _ (e.g. SonicWall, Palo Alto, Sophos) serve their customers by preventing inbound attacks, but also by blocking access to dangerous resources. Network devices such as firewalls and proxy servers already retrieve lists of “known bad” domains from available “block lists”. A web domain is added to those lists once it has been identified as a source of malicious activity (malware, spyware, hacking, viruses, etc.). These providers would benefit from our unique capability to identify such domains before any activity takes place.

_ Browser Software Vendors _ (e.g. Google, Microsoft, Mozilla, Opera, Apple) Much like our Chrome extension, internet browsers include technology (aka SafeBrowsing) to prevent users from visiting risky domains. Vendors could enhance their protection by augmenting the lists they use with output from Disinfobusters’ API.

_ Ad-Blockers _, which operate similarly to our Chrome extension, are a category of software used to limit access to certain domains or to keep pop-ups and other annoying content from being shown to the user. They could subscribe to our service to offer their customers more comprehensive protection, covering not just advertising but also disinformation.

The service provided by the API will be a source of revenue to support the cost of operating the infrastructure and the continued development of capabilities. The service fee could vary according to the type of customer (public vs commercial), the frequency of use (how often the API is used to retrieve the list of domains) or the volume (how many endpoints are polling the API).

While we have not developed a full business model analysis, we know that the service could replicate existing pricing models (block list providers) and, considering the lack of competition at this stage, has a good chance of winning partnerships. Of course, any endorsement or support from the EU Commission will help accelerate growth.

A full profit and loss projection and a sensitivity analysis of the investments required will be developed after the Hackathon submission, to be ready for EIC support and assistance.

How we impact the crisis

By reducing the publishing and subsequent spread of disinformation, we can expect to:

Save lives and prevent harm to people who find “alternative” medicines and remedies in disinformation articles (e.g. 1/3 of UK citizens believe vodka will disinfect hands; coronavirus can be treated with hot water)
Save lives by re-establishing trust in institutions and the rule of law, and so limiting risky behaviors (ignoring quarantine rules, not using PPE, etc.)
Limit funding to miscreants and organized crime, who use disinformation to drive new revenue via scams, phishing and other deceptive campaigns (e.g. the sale of “magic soap” that cleanses the body of coronavirus)
Protect EU institutions and social integrity by limiting the impact of information warfare from countries interested in the dissolution or weakening of the EU (ref. EUvsDISINFO reports)
Protect EU institutions from the growth of conspiracy theories and from false accusations of inaction and collusion made against democratic leaders
Protect Civil Rights and Democracies

We support the President of the European Commission, Ursula von der Leyen, in her speech and statements:

https://ec.europa.eu/info/live-work-travel-eu/health/coronavirus-response/fighting-disinformation_en

Next Steps and Help Needed

For our prototype to become a true instrument of disinformation management, we see the following needs:

_ Partnership with EU institutions _ We would like to exchange with the team behind EUvsDISINFO, the European External Action Service’s East StratCom Task Force, and other EU teams working on disinformation issues. From them we expect:
• Oversight of our approach and direction of development
• Collaboration in developing our profiling system through access to their databases
• Validation and adoption of our service in the EU's interest

_ Development of profiling input _ To refine the profiling further, we will:
• connect with other hackathon projects that have focused on content evaluation, tagging and scoring, to build collaboration opportunities so that their output becomes our input;
• work with other organizations and NGOs (e.g. Disinformationindex.org), in most cases the same ones that would benefit from our output, to gain access to their databases and source data and build integration into the input pipeline for tuning the detection engine;
• develop ML/AI-based profiling tools that could digest content and provide recommendations.

_ Company or NGO _ We have not yet decided how to pursue this effort from a legal entity perspective. Various options are available, and we would like EIC experts and mentors to assist in the discussion:
• Build or expand a commercial entity to provide the service described. The business of block-list providers is healthy and growing, and our novel approach will complement it. A commercial venture for the use of the technology in the IT security space is already being set up; disinformation might be another business line or adjacency.
• Set up an NGO to pursue the same goals in a non-profit context.

_ Partnership with 3rd parties _ As described in the “Blocking” section of the solution, we have several kinds of “customers”: some are public institutions, others non-governmental, others commercial. Our team will need to establish those relationships and build contractual agreements covering the respective engagement and eventual fees.

_ Funding _ Last but not least, we foresee funding needs that could be met by the EU EIC program, the Hackathon prize and partners, complemented by private investors and crowdfunding, for:
• Infrastructure build-up: the prototype runs on QCRI research-level computing, and a move to scalable cloud infrastructure is required to grow in scope and scale.
• Hiring of personnel to engage with third parties and “sell” the Disinfobusters service
• Hiring of developers and a CTO to further develop the various components of the solution

The detailed sizing will be completed in the days following the Hackathon submission, in the hope of being selected for EIC follow-up.

With the right level of funding and talent, we expect to be able to turn the prototype into a fully scalable solution in a matter of weeks.

What happens after the crisis ends

Our novel approach of confronting disinformation campaigns right at the beginning was already relevant before the current COVID-19 crisis.

It has gained importance during the crisis, as the volume and harm of disinformation have grown immensely.

But the crisis will pass, and we already know disinformation will continue and keep doing harm.

Our approach will be of utmost importance in winning information wars through reason, in political discourse and beyond.

The Disinfobusters are here to stay and help shoot down “bad domains” detected by our technology, before they start publishing and spreading their campaigns.

Github Chrome Extension: link

Github Detection Dashboard : link

Website : link

DEVPOST : link

Youtube video : link

Hackathon criteria

Impact Potential

Our solution is for everybody, everywhere, whatever their health condition. The threat from disinformation goes against human rights, and it concerns not just the EU but the world. We can impact the entire internet because we have a fully automated, machine-driven solution that scales. (Ref. “How we impact the crisis” section)

We are using novel research, not yet available for mainstream use, and we are solving the problem in a totally different way from how it has been tackled so far. We target the source of disinformation before it starts publishing and spreading, and we do not need to know or check the published content to act.

It can help millions, even billions. We think that, with the right level of support from the EU and from partnerships, we can quickly “inoculate” the vast majority of internet users.

Technical Complexity & Novelty

The research behind the project is the result of many years of investment by QCRI. Its implementation requires a sophisticated infrastructure integrated in innovative ways (ref. “Our Solution” section and the research papers shared in the submission).

No other solution exists that PREVENTS the distribution of disinformation; all the others focus on containment and disproof.

Prototype Completion

Our prototype works, and the Chrome extension proves it: check the video or install it. The dashboard helps measure the size and scale of the findings. With appropriate funding and support we can turn it into a final product in a matter of weeks.

Built With

advanced-analytics
chrome
forest-tree-methods
google-cloudplatform
hadoop
inference
javascript
machine-learning
r

Submitted to

The European Commission's EUvsVirus Hackathon

Created by

DevOps/Platform Engineer. Working with startups mainly on infrastructure code and platform development. Some 10+ years of general experience in IT.

Jerzy Kopaczewski
I worked on the Chrome extension and helped get the deliverables ready.

Felix T.
As licensee of the original patented technology, I brought the idea to the team and helped focus efforts

Luigi LENGUITO
Entrepreneur at heart, I love technology and multiculturalism
Private user
Ahmed Jumaah
Web developer, HCI, health informatics