URL Threat Evaluation and Risk Scoring System

Inspiration

Phishing attacks and malicious websites have become one of the most common cybersecurity threats worldwide. Every day, users are tricked into visiting fake websites that imitate trusted services, leading to stolen credentials, financial losses, and malware infections. While modern browsers provide basic warnings, attackers can now obtain valid SSL certificates and create highly convincing phishing pages, making manual detection difficult for average users.

We wanted to build a solution that goes beyond simple blacklist checking and empowers users with a clear understanding of website risks. Our goal was to create a lightweight, accessible, and intelligent system capable of evaluating URLs in real time and providing actionable security insights.

What It Does

The URL Threat Evaluation and Risk Scoring System analyzes websites using a hybrid approach that combines heuristic security checks with machine learning-based phishing detection.

The system performs:

  • SSL certificate validation
  • WHOIS lookup and domain age analysis
  • Security header inspection
  • Open port analysis
  • Keyword-based phishing detection
  • Unicode and Punycode spoofing detection
  • ASCII encoding analysis
  • Machine learning phishing probability prediction
  • Overall risk score generation
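Several of these checks work on the URL text alone, before any network request is made. The sketch below shows one way to gather a few keyword, ASCII and structure signals using only Python's standard library. The keyword list, the feature names and the function `extract_url_features` are hypothetical illustrations, not the project's published code.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical keyword list for illustration; the project's actual list is not published.
SUSPICIOUS_KEYWORDS = ("login", "verify", "update", "secure", "account", "bank", "confirm")


def extract_url_features(url: str) -> dict:
    """Collect simple lexical signals from a URL (illustrative sketch only)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    lowered = url.lower()

    # An IP literal used in place of a hostname is a common phishing signal.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    return {
        # Substring matches of lure words anywhere in the URL.
        "keyword_hits": [kw for kw in SUSPICIOUS_KEYWORDS if kw in lowered],
        # "trusted.com@evil.example": text before "@" is user info, not the host.
        "has_at_symbol": "@" in parts.netloc,
        "host_is_ip": host_is_ip,
        # ASCII analysis: any hostname character outside the 0-127 range.
        "non_ascii_chars": [ch for ch in host if not ch.isascii()],
        # Naive depth count; ignores multi-part suffixes such as .co.uk.
        "subdomain_depth": 0 if host_is_ip else max(host.count(".") - 1, 0),
        "uses_https": parts.scheme == "https",
    }
```

For example, `extract_url_features("http://paypal.com@198.51.100.7/login/verify")` reports the keywords `login` and `verify`, an `@` in the authority, an IP-literal host and no HTTPS.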

Based on the collected evidence, websites are classified as:

  • Safe
  • Suspicious
  • Dangerous
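The writeup does not state how heuristic evidence and the model's output are merged or where the label boundaries sit. The sketch below assumes a weighted blend on a 0-100 scale with two cut-offs. `ML_WEIGHT`, both thresholds and the function names are hypothetical values chosen only to show the shape of such a scheme.

```python
# Hypothetical scoring parameters; the project does not publish its real weights or cut-offs.
ML_WEIGHT = 0.4            # share of the final score driven by the ML probability
SUSPICIOUS_THRESHOLD = 30  # scores are on a 0-100 scale
DANGEROUS_THRESHOLD = 60


def combine_score(heuristic_score: float, ml_probability: float) -> float:
    """Blend a 0-100 heuristic score with a 0-1 phishing probability into one 0-100 score."""
    if not 0 <= heuristic_score <= 100 or not 0 <= ml_probability <= 1:
        raise ValueError("inputs out of range")
    return (1 - ML_WEIGHT) * heuristic_score + ML_WEIGHT * 100 * ml_probability


def classify(risk_score: float) -> str:
    """Map a 0-100 risk score onto the three labels."""
    if risk_score >= DANGEROUS_THRESHOLD:
        return "Dangerous"
    if risk_score >= SUSPICIOUS_THRESHOLD:
        return "Suspicious"
    return "Safe"
```

With these assumed parameters, a heuristic score of 50 and a model probability of 0.9 combine to 66, which `classify` labels "Dangerous".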

The platform also provides detailed explanations of detected risks so users can make informed decisions before interacting with a website.

How We Built It

Frontend

  • HTML
  • CSS
  • JavaScript
  • Responsive UI with light and dark theme support

Backend

  • Python
  • Flask REST API

Security Analysis Components

  • SSL certificate validation
  • WHOIS domain intelligence
  • Domain age verification
  • Security header analysis
  • URL structure analysis
  • Port scanning checks
  • ASCII and Unicode inspection
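The SSL certificate validation step can be sketched with Python's built-in `ssl` module. The sketch assumes that a verified handshake plus an expiry check is enough to illustrate the idea. `fetch_peer_cert` and `days_until_expiry` are hypothetical helper names, and the project's own checks may go further.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_peer_cert(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Connect over TLS with full verification and return the decoded certificate.

    ssl.create_default_context() requires a trusted chain and checks the hostname,
    so an invalid certificate raises ssl.SSLCertVerificationError here.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def days_until_expiry(cert: dict, now: Optional[float] = None) -> float:
    """Days left before the certificate's notAfter date (negative once expired)."""
    expires = ssl.cert_time_to_seconds(cert["notAfter"])
    current = time.time() if now is None else now
    return (expires - current) / 86400
```

A call such as `days_until_expiry(fetch_peer_cert("example.com"))` needs network access. The expiry arithmetic can also be exercised offline by passing a certificate dict and an explicit `now`.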

Machine Learning

A Logistic Regression model was integrated to estimate phishing probability using URL-based security features. The model complements traditional heuristic checks and helps identify suspicious patterns that static rules may miss.
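At prediction time, a logistic regression model computes a weighted sum of the input features and passes it through the sigmoid function, p = 1 / (1 + e^-(w·x + b)). The sketch below shows only that step. The feature names, weights and bias are made-up placeholders; a real model would learn them from labelled URLs, for example with scikit-learn's `LogisticRegression`.

```python
import math

# Hypothetical weights and bias for illustration only; a trained model would learn these.
WEIGHTS = {
    "keyword_hits": 0.9,   # count of suspicious keywords in the URL
    "has_at_symbol": 1.5,  # 1 if the authority contains "@"
    "host_is_ip": 1.8,     # 1 if the host is an IP literal
    "non_ascii": 2.0,      # 1 if the hostname contains non-ASCII characters
    "no_https": 0.7,       # 1 if the URL does not use HTTPS
}
BIAS = -3.0


def phishing_probability(features: dict) -> float:
    """Logistic regression inference: sigmoid of the weighted feature sum plus bias."""
    z = BIAS + sum(weight * float(features.get(name, 0)) for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))
```

With every feature at zero, the output is sigmoid(-3), roughly 0.047. Two keyword hits, an `@`, an IP-literal host and no HTTPS push z to 2.8, giving roughly 0.94.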

Deployment

  • Git and GitHub for version control
  • Branching workflow using Dev and Beta branches
  • Vercel deployment for frontend hosting

Challenges We Ran Into

One of the biggest challenges was balancing accuracy with usability. Many legitimate websites intentionally omit certain security headers for compatibility or performance reasons. Initially, the system classified these sites as high risk, resulting in false positives. We refined the scoring mechanism to account for real-world scenarios and reduce unnecessary warnings.
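One way to soften header-based false positives is to weight each missing header by how much it matters and cap the total header penalty. That way, header gaps alone contribute at most a fixed share of the overall score. The header weights, the cap and `header_penalty` below are hypothetical values chosen to illustrate the approach.

```python
# Hypothetical penalties: headers whose absence matters more get larger weights.
HEADER_PENALTIES = {
    "strict-transport-security": 10,
    "content-security-policy": 8,
    "x-content-type-options": 3,
    "x-frame-options": 3,
    "referrer-policy": 2,
    "permissions-policy": 1,
}
# Cap so that header gaps alone cannot dominate the overall risk score.
MAX_HEADER_PENALTY = 15


def header_penalty(headers: dict) -> tuple[int, list[str]]:
    """Return (capped penalty, missing header names) for a response's headers."""
    present = {name.lower() for name in headers}  # header names are case-insensitive
    missing = [h for h in HEADER_PENALTIES if h not in present]
    raw = sum(HEADER_PENALTIES[h] for h in missing)
    return min(raw, MAX_HEADER_PENALTY), missing
```

A site that sends HSTS and a Content Security Policy but omits the four minor headers receives a penalty of 9. A site with no security headers at all is capped at 15 instead of 27.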

Another challenge was handling slow or unstable internet connections. Network-dependent checks such as WHOIS lookups and SSL certificate validation occasionally experienced delays or timeouts, which could affect analysis results. To improve reliability, we implemented error handling and status reporting that clearly tell users when network conditions might affect accuracy.
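A minimal sketch of this kind of status reporting wraps each network-dependent check so that failures become a reported status rather than an exception. The status names, the `run_check` helper and the choice of exceptions are assumptions. A real WHOIS client library may raise its own exception types that would also need handling.

```python
import socket
import ssl
from typing import Any, Callable


def run_check(name: str, check: Callable[[], Any]) -> dict:
    """Run one network-dependent check and report its status instead of raising."""
    try:
        return {"check": name, "status": "ok", "result": check()}
    except ssl.SSLCertVerificationError as exc:
        # A failed verification is evidence about the site, not a network problem.
        return {"check": name, "status": "failed", "result": None, "note": str(exc)}
    except (socket.timeout, TimeoutError):
        # Slow network: leave this signal out of the score and say so.
        return {"check": name, "status": "timeout", "result": None,
                "note": "Timed out; this signal was not counted in the score."}
    except OSError as exc:
        # DNS failures, refused connections, unreachable hosts, and similar errors.
        return {"check": name, "status": "unavailable", "result": None, "note": str(exc)}
```

Keeping certificate verification failures separate from timeouts matters here. An invalid certificate should raise the risk score, whereas a timeout should only lower confidence in the result.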

Domain spoofing detection also presented difficulties. Attackers increasingly use Unicode characters and visually similar domain names to deceive users. Detecting these patterns required additional processing through Punycode conversion and ASCII analysis.

What We Learned

Throughout the project, we gained practical experience in cybersecurity, web security standards, machine learning integration, and secure software development.

Key learnings included:

  • Understanding phishing attack techniques and detection methods
  • Working with SSL certificates and trust validation
  • Analyzing WHOIS and domain intelligence data
  • Building REST APIs using Flask
  • Integrating machine learning models into security applications
  • Designing meaningful risk scoring systems
  • Handling false positives and real-world edge cases
  • Creating user-friendly cybersecurity tools

Future Improvements

We plan to enhance the project by:

  • Integrating advanced AI-based phishing detection models
  • Expanding threat intelligence feeds
  • Adding real-time malware reputation services
  • Implementing browser extension support
  • Providing detailed threat reports and export functionality
  • Improving detection of newly registered phishing domains
  • Supporting large-scale URL reputation analysis

Impact

This project demonstrates how combining traditional cybersecurity techniques with machine learning can create a practical and accessible solution for detecting malicious websites. By providing transparent risk scores and clear explanations, the system helps users identify threats more effectively and browse the internet with greater confidence.
