Automated Suspect Profiling Tool

💡 Inspiration

In today's digital age, criminals leave extensive digital footprints across multiple platforms, but law enforcement agencies struggle to manually aggregate this scattered information. Traditional investigation methods are time-consuming, often taking weeks to compile a suspect's online presence. We were inspired by real-world cases where crucial evidence was hidden across social media, forums, and the dark web.

Our goal was to build an automated OSINT (Open Source Intelligence) tool that could gather, correlate, and analyze publicly available information from 183+ platforms in minutes rather than weeks, empowering investigators with actionable intelligence while maintaining ethical boundaries.

🔍 What it does

OSINT Profiler is a comprehensive criminal profiling system with the following capabilities:

Multi-Source Scanning: Automatically searches Twitter, Reddit, GitHub, 183+ social platforms (via Sherlock), dark web forums, and public databases
Intelligent Correlation: Cross-references findings to identify connections between usernames, emails, phone numbers, and locations across platforms
Phone Intelligence: Analyzes Indian (+91) and international phone numbers and tracks where they are mentioned on the web
Batch Processing: Scans multiple suspects in a single batch with automatic report generation
Professional Reporting: Generates law-enforcement-grade PDF reports and interactive network graphs that visualize relationships
Real-time Analysis: Scores intelligence value (0-100) and filters high-priority findings
Database Management: Stores all findings in SQLite for historical analysis and comparison
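The 0-100 intelligence-value score and the high-priority filter are not specified in detail. Here is a minimal sketch. It assumes an additive score built from the kinds of identifying data a finding contains. The signal names, weights, and the 70-point threshold are all hypothetical:

```python
from dataclasses import dataclass

# Hypothetical weights per signal type (they sum to 100); the tool's real weights are not published.
SIGNAL_WEIGHTS = {
    "email": 25,
    "phone": 25,
    "linked_account": 20,
    "location": 15,
    "username_match": 15,
}


@dataclass
class Finding:
    source: str        # platform the finding came from, e.g. "github"
    signals: set[str]  # kinds of identifying data the finding contains


def intelligence_score(finding: Finding) -> int:
    """Sum the weights of the signals present, capped at 100; unknown signals count as 0."""
    return min(sum(SIGNAL_WEIGHTS.get(s, 0) for s in finding.signals), 100)


def high_priority(findings: list[Finding], threshold: int = 70) -> list[Finding]:
    """Keep findings whose score meets the (hypothetical) high-priority threshold."""
    return [f for f in findings if intelligence_score(f) >= threshold]


findings = [
    Finding("github", {"email", "phone", "linked_account"}),  # 25 + 25 + 20 = 70
    Finding("reddit", {"username_match"}),                     # 15
]
print([f.source for f in high_priority(findings)])  # → ['github']
```

An additive score like this keeps every point traceable to a concrete piece of evidence. That helps when a report has to justify why a finding was prioritized.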

Users simply enter a username, email address, phone number, or domain, and the system automatically profiles the target's publicly visible digital footprint within minutes.
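A single input field accepts four kinds of identifiers, so the system has to decide which kind it received before choosing scanners. The writeup doesn't describe how the tool does this. The sketch below assumes simple regular-expression checks; `classify_target` and its patterns are hypothetical:

```python
import re

# Simplified patterns for illustration; real validation would be stricter.
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")
PHONE_RE = re.compile(r"^\+?[\d\s()-]{7,20}$")
DOMAIN_RE = re.compile(r"^(?!-)[a-z0-9-]{1,63}(\.[a-z0-9-]{1,63})*\.[a-z]{2,}$", re.IGNORECASE)


def classify_target(value: str) -> str:
    """Return 'email', 'phone', 'domain', or 'username' for a raw search input."""
    value = value.strip()
    if EMAIL_RE.match(value):
        return "email"
    # Require at least 7 digits so that short numeric usernames aren't treated as phones.
    if PHONE_RE.match(value) and sum(ch.isdigit() for ch in value) >= 7:
        return "phone"
    if DOMAIN_RE.match(value):
        return "domain"
    return "username"


for sample in ["alice@example.com", "+91 98765 43210", "example.org", "john_doe99"]:
    print(sample, "->", classify_target(sample))
```

These patterns are deliberately simple. For example, a dotted username such as `john.doe` would be classified as a domain. A fuller version might confirm domains with a DNS lookup before routing them; dnspython appears in the Built With list.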

⚙️ How we built it

Technology Stack:

Frontend: PyQt6 for a modern, responsive GUI with a tabbed interface
Backend: Python with a modular architecture (20+ specialized modules)
Database: SQLite with SQLAlchemy ORM for efficient data persistence
Scraping: BeautifulSoup and Requests, with fallback mechanisms and rate limiting
OSINT Tools: Sherlock (username enumeration), Nitter (Twitter), Ahmia (dark web)
Reporting: ReportLab (PDF generation), PyVis (interactive network graphs)
Analysis: Custom correlation engine with similarity matching and entity extraction
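Historical analysis and comparison require storing each finding with its target and the time it was collected. The project uses SQLAlchemy on top of SQLite. To keep this sketch dependency-free, it uses Python's built-in `sqlite3` module instead. The table layout and function names are hypothetical:

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS findings (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    target     TEXT    NOT NULL,  -- the username, email, phone, or domain that was scanned
    source     TEXT    NOT NULL,  -- platform the finding came from
    value      TEXT    NOT NULL,  -- e.g. a profile URL or an extracted entity
    score      INTEGER NOT NULL,  -- 0-100 intelligence value
    scanned_at TEXT    NOT NULL   -- ISO-8601 UTC timestamp
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the findings database; pass a file path to persist between runs."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_finding(conn: sqlite3.Connection, target: str, source: str, value: str, score: int) -> None:
    with conn:  # commits on success, rolls back if the insert raises
        conn.execute(
            "INSERT INTO findings (target, source, value, score, scanned_at) VALUES (?, ?, ?, ?, ?)",
            (target, source, value, score, datetime.now(timezone.utc).isoformat()),
        )


def history(conn: sqlite3.Connection, target: str) -> list[tuple[str, str, int]]:
    """Return (source, value, score) for every stored finding on a target, best first."""
    return conn.execute(
        "SELECT source, value, score FROM findings WHERE target = ? ORDER BY score DESC",
        (target,),
    ).fetchall()


conn = open_db()
save_finding(conn, "john_doe", "forum", "https://forum.example.com/u/john_doe", 80)
print(history(conn, "john_doe"))  # → [('forum', 'https://forum.example.com/u/john_doe', 80)]
```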

Architecture:

MVC pattern with separation of concerns
Worker threads for non-blocking UI during scans
Intelligent retry logic with multiple data source fallbacks
Regex-based entity extraction for emails, phones, URLs
Confidence scoring system for correlations (65%-100%)
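Regex-based entity extraction can be sketched as follows. The patterns are illustrative assumptions, not the tool's own. The phone handling keeps an explicit country code such as +91 and drops candidates that contain a date. It accepts a bare number only if it has exactly 10 digits. For production use, the `phonenumbers` library (listed under Built With) offers `PhoneNumberMatcher`, which is considerably more robust than a hand-written pattern.

```python
import re

EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
URL_RE = re.compile(r"https?://[^\s<>\"']+")
# Loose phone candidate: optional '+', then digits separated by spaces or hyphens.
PHONE_RE = re.compile(r"(?<![\d+])\+?\d[\d -]{6,16}\d(?!\d)")
# Dates such as 2024-01-15 or 15-01-2024 also fit PHONE_RE, so candidates containing one are dropped.
DATE_RE = re.compile(r"\b(?:\d{4}-\d{1,2}-\d{1,2}|\d{1,2}-\d{1,2}-\d{4})\b")


def _unique(items) -> list[str]:
    return list(dict.fromkeys(items))  # de-duplicate while keeping first-seen order


def extract_phones(text: str) -> list[str]:
    phones = []
    for raw in PHONE_RE.findall(text):
        if DATE_RE.search(raw):
            continue  # looks like a date, not a phone number
        digits = re.sub(r"\D", "", raw)
        if raw.startswith("+") and 8 <= len(digits) <= 15:  # keep the country code, e.g. +91
            phones.append("+" + digits)
        elif not raw.startswith("+") and len(digits) == 10:  # bare national number
            phones.append(digits)
    return _unique(phones)


def extract_entities(text: str) -> dict[str, list[str]]:
    return {
        "emails": _unique(EMAIL_RE.findall(text)),
        "urls": _unique(u.rstrip(".,);") for u in URL_RE.findall(text)),  # drop trailing punctuation
        "phones": extract_phones(text),
    }


bio = "Mail dev.alice@example.com, see https://github.com/alice. Call +91 98765 43210. Joined 2024-01-15 10:30."
print(extract_entities(bio))
```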

Development Process:

Built core scanning modules for each platform
Implemented database layer with ORM
Created GUI with real-time logging
Developed correlation engine to find cross-platform connections
Added professional PDF and graph generation
Implemented batch scanning and unified reporting
Extensive testing and bug fixing (false positives, phone formatting)
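Real-time logging during batch scans implies a background worker that reports progress while the interface stays responsive. The tool uses PyQt6's QThread, which needs a running Qt application. This sketch therefore reproduces the same pattern with Python's `threading` and `queue` modules, and the queue plays the role that Qt signals would play. The `scan` callable and event names are hypothetical. The 2-second default delay mirrors the inter-scan delay mentioned under Challenges.

```python
import queue
import threading
import time
from typing import Callable, Iterable


def start_batch_worker(
    targets: Iterable[str],
    scan: Callable[[str], dict],
    events: queue.Queue,
    delay: float = 2.0,
) -> threading.Thread:
    """Scan targets one at a time in a background thread, posting (kind, payload) events.

    kind is "progress", "result", "error", or "done"; the GUI reads the queue from its own thread.
    """

    def run() -> None:
        for index, target in enumerate(targets):
            if index:
                time.sleep(delay)  # pause between scans to stay under rate limits
            events.put(("progress", f"Scanning {target}..."))
            try:
                events.put(("result", scan(target)))
            except Exception as exc:  # one failing target should not abort the batch
                events.put(("error", f"{target}: {exc}"))
        events.put(("done", None))

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


def drain(events: queue.Queue) -> list:
    """Non-blocking read of all pending events, e.g. called periodically from a GUI timer."""
    pending = []
    while True:
        try:
            pending.append(events.get_nowait())
        except queue.Empty:
            return pending
```

With QThread, the worker would typically emit signals instead. Qt then delivers those signals to slots on the GUI thread, so no polling timer is needed.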

🚧 Challenges we ran into

Rate Limiting & API Restrictions: Public APIs and scraping endpoints frequently rate-limit requests. We addressed this by falling back across multiple Nitter instances, adding 2-second delays between scans, and using exponential backoff.
False Positive Correlations: Initially, our correlation engine flagged organizational accounts (NASA, SpaceX) as identity matches when they were merely mentioned in tweets. We fixed this by creating an exclusion list and refining the confidence-scoring algorithm.
Phone Number Normalization: Extracting phone numbers from text was challenging because numbers appear in many formats and dates can resemble them. We refined our regex to preserve country codes (such as +91) and exclude dates.
Dark Web Access: Ahmia searches require a Tor proxy. We added a graceful fallback with clear error messages for when Tor isn't available.
Cross-Platform Data Inconsistency: Different platforms return data in vastly different formats. We built platform-specific parsers with robust error handling.
GUI Responsiveness: Long-running scans blocked the UI. We implemented QThread workers to keep the interface responsive during batch operations.
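The fallback-plus-backoff strategy from the first challenge can be sketched as a loop over mirror instances, with an exponentially growing pause between rounds. The instance URLs, retry counts, and the `fetch` callable are placeholders. Apart from the 2-second scan delay, the tool's actual values aren't documented:

```python
import time
from typing import Callable, Optional, Sequence


class RateLimited(Exception):
    """Raised by a fetch function when an endpoint refuses the request (e.g. HTTP 429)."""


def fetch_with_fallback(
    path: str,
    instances: Sequence[str],
    fetch: Callable[[str], str],
    max_rounds: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Try each mirror in order; when every mirror fails, back off exponentially and retry.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between rounds.
    Returns None if no instance succeeds in any round.
    """
    for attempt in range(max_rounds):
        for base_url in instances:
            url = base_url.rstrip("/") + "/" + path.lstrip("/")
            try:
                return fetch(url)
            except (RateLimited, ConnectionError):
                continue  # this mirror is unavailable; try the next one
        if attempt < max_rounds - 1:
            sleep(base_delay * 2 ** attempt)
    return None
```

In practice, `fetch` would wrap `requests.get` and translate HTTP 429 responses and network failures into `RateLimited` or `ConnectionError`, because requests raises its own exception types. Injecting `sleep` makes the backoff testable without real waiting.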

🏆 Accomplishments that we're proud of

✅ 183+ Platform Coverage: Successfully integrated Sherlock to scan over 183 social platforms automatically

✅ Advanced Correlation Engine: Built from scratch; it detects bio matches, verifies locations, and recognizes cross-platform username patterns, with 75-100% confidence scoring

✅ Production-Ready Quality: Law enforcement-grade PDF reports with professional layouts, intelligence classification, and detailed findings

✅ Batch Intelligence: Automated sequential scanning of multiple suspects with unified reporting - a feature not commonly found in existing OSINT tools

✅ Interactive Visualizations: Network graphs that clearly show relationships and correlations with color-coded sources

✅ Clean Architecture: Modular design with 35+ Python files, 5000+ lines of code, following best practices

✅ Real-World Applicability: This isn't just an academic project - it's a functional tool that could be deployed for actual investigations

✅ Ethical Design: Built-in safeguards and clear legal disclaimers to prevent misuse

📚 What we learned

Technical Skills:

Advanced web scraping with anti-detection techniques
Multi-threaded programming for responsive UIs
Database design and ORM optimization
PDF generation with ReportLab styling
Network graph visualization with PyVis
Regex mastery for entity extraction

Cybersecurity Concepts:

OSINT methodologies and ethical boundaries
Correlation analysis and confidence scoring
Intelligence filtering and prioritization
Dark web navigation and Tor integration
Privacy implications of public data aggregation

Software Engineering:

Importance of modular architecture for maintainability
Fallback mechanisms for robust applications
User experience design for complex systems
Error handling and graceful degradation
Testing and debugging strategies

Lessons Learned:

Start with multiple data source fallbacks from day one
User feedback is crucial: our batch scanning feature came from recognizing that investigators often need to profile several suspects in one session
Performance matters: threading transformed the user experience
Documentation and code comments save hours during debugging

🚀 What's next for Automated Suspect Profiling Tool

Short-term Enhancements:

Machine Learning Integration: Train models to predict likely aliases and associated accounts based on behavioral patterns
Geolocation Mapping: Visualize suspect locations on interactive maps using extracted location data
Sentiment Analysis: Analyze tone and sentiment in social media posts to assess threat levels
Automated Alerts: Real-time monitoring for new activity from profiled suspects
Mobile App: Cross-platform deployment for field investigators

Long-term Vision:

API Development: RESTful API for integration with existing law enforcement systems
Blockchain Analysis: Add cryptocurrency wallet tracking and transaction analysis
Image Recognition: Facial recognition across social media to find additional accounts
Collaboration Features: Multi-user access with role-based permissions for investigation teams
AI-Powered Recommendations: Suggest additional search strategies based on initial findings
Compliance Framework: GDPR/legal compliance tools for proper authorization documentation
Cloud Deployment: Scalable cloud infrastructure for large-scale operations

Academic Extensions:

Publish research paper on correlation algorithms
Open-source core modules for OSINT community
Develop training curriculum for cybersecurity students
Create ethical OSINT certification program

Our vision is to make OSINT Profiler the industry-standard tool for ethical digital investigations, bridging the gap between scattered online data and actionable intelligence while maintaining the highest ethical standards.

Built With

ahmia-dark-web-search
beautifulsoup4
dns-mx-records
dnspython
git
github
github-api
json
lxml
nitter-api
phonenumbers
pyinstaller
pyqt6
python-3.10+
python-whois
pyvis
qt-designer
qthread
reddit
regex
reportlab
requests
sherlock
sqlalchemy
sqlite
threading
urllib3
whois-services

Updates

Sanika Patil started this project — Feb 07, 2026 03:34 AM EST
