Inspiration

The 2017 Equifax data breach showed that traditional vulnerability scoring systems such as CVSS are, on their own, insufficient for real-world prioritization. Although a known high-severity flaw in Apache Struts (CVE-2017-5638) had already been disclosed, Equifax failed to patch it in time, overlooking how likely it was to be exploited. The same failure can happen in other sectors, such as healthcare and banking, leading to massive losses and serious security incidents. This inspired us to build a smarter, data-driven model that looks beyond theoretical severity to consider how likely a vulnerability is to be exploited and how attackers actually use it.


What We Built

We developed a Predictive Security Scoring System (PSSS) that combines:

  • CVSS (severity) from the National Vulnerability Database (NVD)
  • EPSS (Exploit Prediction Scoring System) for real-world exploit likelihood
  • MITRE ATT&CK mappings to link vulnerabilities with real adversary tactics and techniques
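The write-up does not say how CVEs were linked to ATT&CK techniques. As a purely illustrative sketch, a keyword heuristic over CVE descriptions could look like the following. The rule table `TECHNIQUE_RULES` and the function `map_to_attack` are hypothetical names, and the patterns are assumptions chosen for the example; curated CVE-to-ATT&CK mappings are far richer than this.

```python
import re

# Hypothetical keyword rules: description pattern -> ATT&CK technique ID.
# Illustrative only; these are NOT the project's actual mapping rules.
TECHNIQUE_RULES = {
    # T1190: Exploit Public-Facing Application (Initial Access tactic)
    "T1190": re.compile(r"remote code execution|unauthenticated|sql injection", re.I),
    # T1068: Exploitation for Privilege Escalation (Privilege Escalation tactic)
    "T1068": re.compile(r"privilege escalation|elevat\w* privileges?|gain root", re.I),
}


def map_to_attack(description: str) -> list[str]:
    """Return the sorted technique IDs whose rule matches the CVE description."""
    return sorted(
        tid for tid, pattern in TECHNIQUE_RULES.items() if pattern.search(description)
    )
```

For example, `map_to_attack("A local user can gain root via a race condition")` returns `["T1068"]`, while a description matching no rule returns an empty list.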

Using Python, we built a complete data pipeline that:

  1. Extracts and cleans CVE data from NVD.
  2. Predicts missing CVSS metric values using TF-IDF + Logistic Regression trained on vulnerability descriptions.
  3. Merges EPSS exploit probabilities for each CVE.
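  4. Maps each CVE to ATT&CK techniques (e.g., T1190 Exploit Public-Facing Application, under the Initial Access tactic; T1068 Exploitation for Privilege Escalation, under the Privilege Escalation tactic).
  5. Computes a unified PSSS_final score as a weighted combination (weights α, β, γ) of the CVSS, EPSS, and ATT&CK contributions.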
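Step 1 can be sketched as follows. This assumes the input is a response from the NVD CVE API 2.0 (top-level `vulnerabilities` list, each entry holding a `cve` object with `id`, `descriptions`, and `metrics`); the project may instead have used the older JSON feeds. The function name `extract_cves` and the chosen output fields are hypothetical. Records without a CVSS v3.x metric keep `None` so they can be imputed in step 2.

```python
import json


def extract_cves(nvd_json: str) -> list[dict]:
    """Flatten an NVD CVE API 2.0 response (assumed layout) into one record per CVE."""
    records = []
    for item in json.loads(nvd_json).get("vulnerabilities", []):
        cve = item.get("cve", {})
        # Prefer the English description; fall back to an empty string.
        description = next(
            (d["value"] for d in cve.get("descriptions", []) if d.get("lang") == "en"),
            "",
        )
        # Use CVSS v3.1 if present, else v3.0; older records may only carry v2.
        metrics = cve.get("metrics", {})
        entries = metrics.get("cvssMetricV31") or metrics.get("cvssMetricV30") or []
        cvss = entries[0].get("cvssData", {}) if entries else {}
        records.append({
            "cve_id": cve.get("id", "").strip().upper(),
            "description": " ".join(description.split()),  # collapse whitespace
            "base_score": cvss.get("baseScore"),             # None if missing
            "attack_vector": cvss.get("attackVector"),       # None if missing
        })
    return records
```

Keeping missing metrics as `None` (rather than dropping the record) lets the imputation step decide what to fill.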

The result is a dynamic ranking of vulnerabilities that reflects both severity and active threat context.
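The scoring and ranking step can be sketched as a weighted sum, PSSS_final = α·(CVSS/10) + β·EPSS + γ·A, where CVSS is rescaled to 0–1 and A is an ATT&CK term. The write-up does not give the weights or define A, so the values below (α = 0.4, β = 0.4, γ = 0.2, and A = 1 if any technique is mapped, else 0) are placeholder assumptions, as are the names `Vuln`, `psss_final`, and `rank`.

```python
from dataclasses import dataclass, field


@dataclass
class Vuln:
    cve_id: str
    cvss: float  # CVSS base score, 0-10
    epss: float  # EPSS exploit probability, 0-1
    techniques: list[str] = field(default_factory=list)  # mapped ATT&CK IDs


def psss_final(v: Vuln, alpha: float = 0.4, beta: float = 0.4, gamma: float = 0.2) -> float:
    """Weighted sum of normalized CVSS, EPSS and an ATT&CK indicator.

    The weights and the 0/1 ATT&CK indicator are placeholder assumptions.
    With alpha + beta + gamma == 1 the score stays in [0, 1].
    """
    attack = 1.0 if v.techniques else 0.0
    return alpha * (v.cvss / 10.0) + beta * v.epss + gamma * attack


def rank(vulns: list[Vuln]) -> list[tuple[str, float]]:
    """Return (cve_id, score) pairs sorted from highest to lowest score."""
    scored = [(v.cve_id, round(psss_final(v), 3)) for v in vulns]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Under these placeholder weights, a CVSS 9.8 flaw with EPSS 0.02 and no mapped technique scores 0.4, while a CVSS 7.5 flaw with EPSS 0.9 mapped to T1190 scores 0.86 and ranks first, which is the kind of reordering the model is meant to produce.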


🧠 What We Learned

  • How to work with large JSON and CSV datasets from public cybersecurity repositories.
  • How text mining and machine learning can fill gaps in structured security data.
  • How to interpret and apply MITRE ATT&CK mappings to enrich vulnerability context.
  • That real-world exploitability (EPSS) often outweighs raw severity (CVSS) in prioritization decisions.
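The text-mining step (pipeline step 2) can be sketched with scikit-learn: a TF-IDF vectorizer feeding a logistic regression classifier that predicts one CVSS metric, here the attack vector, from the description. This is a minimal sketch, not the project's code; the function names, n-gram range, and other hyperparameters are assumptions. `class_weight="balanced"` is one common way to counter the class imbalance noted under Challenges, since rare values such as a physical attack vector are otherwise swamped by network-reachable flaws.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def train_metric_model(descriptions, labels):
    """Fit TF-IDF + logistic regression to predict one CVSS metric from text.

    Hyperparameters are illustrative assumptions, not the project's settings.
    """
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
        # Reweight classes inversely to their frequency to soften imbalance.
        LogisticRegression(class_weight="balanced", max_iter=1000),
    )
    model.fit(descriptions, labels)
    return model


def impute_missing(records, model, metric="attack_vector"):
    """Fill records whose `metric` is None with the model's prediction."""
    missing = [r for r in records if r.get(metric) is None]
    if missing:
        predictions = model.predict([r["description"] for r in missing])
        for record, predicted in zip(missing, predictions):
            record[metric] = str(predicted)
    return records
```

In practice a separate model would be trained per CVSS metric, using the records that do have the metric as labeled training data.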

Challenges

  • Parsing and cleaning thousands of CVE records with inconsistent metadata.
  • Managing the class imbalance in CVSS metric prediction.
  • Matching CVE IDs accurately across NVD, EPSS, and MITRE datasets.
  • Selecting appropriate weights (α, β, γ) to balance severity, exploit probability, and attacker behavior.
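Matching CVE IDs across sources (pipeline step 3) mostly comes down to normalizing every ID to one canonical form before joining. The sketch below assumes the EPSS data is the daily CSV published by FIRST, with `#`-prefixed comment lines followed by a `cve,epss,percentile` header; the function names are hypothetical, and defaulting unscored CVEs to 0.0 is an assumption (flagging them separately is a reasonable alternative).

```python
import csv
import re

# Canonical CVE form: "CVE-" + 4-digit year + a sequence of at least 4 digits.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}")


def normalize_cve_id(raw: str):
    """Return the canonical CVE ID found in `raw`, or None if there is none."""
    match = CVE_PATTERN.search(raw.strip().upper())
    return match.group(0) if match else None


def load_epss(csv_text: str) -> dict:
    """Parse EPSS CSV text (assumed columns cve,epss,percentile), skipping '#' lines."""
    lines = [ln for ln in csv_text.splitlines() if ln and not ln.startswith("#")]
    scores = {}
    for row in csv.DictReader(lines):
        cve_id = normalize_cve_id(row["cve"])
        if cve_id:
            scores[cve_id] = float(row["epss"])
    return scores


def merge_epss(records, epss_scores, default=0.0):
    """Attach an EPSS score to each record, matching on the normalized CVE ID."""
    for record in records:
        key = normalize_cve_id(record["cve_id"]) or ""
        record["epss"] = epss_scores.get(key, default)
    return records
```

Normalizing on both sides of the join means stray whitespace or lowercase IDs in one dataset no longer cause silent mismatches.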

Impact

By combining these three intelligence sources, our PSSS model provides context-aware, predictive vulnerability prioritization.
It helps security teams focus on vulnerabilities that are not just severe but actively exploited and aligned with known attack techniques, reducing the risk of a repeat of incidents like the Equifax breach.


Deployment Areas

  • SOC analysts, for real-time threat prioritization
  • Incident response teams, for triaging alerts
  • Vulnerability management teams, for patch planning
