💡 Inspiration
Criminals leave extensive digital footprints across many platforms, but law enforcement agencies struggle to aggregate this scattered information by hand. Traditional investigation methods are slow and can take weeks to compile a suspect's online presence. We were inspired by real-world cases where crucial evidence was spread across social media, forums, and the dark web.
Our goal was to build an automated OSINT (Open Source Intelligence) tool that could gather, correlate, and analyze publicly available information from 183+ platforms in minutes rather than weeks, empowering investigators with actionable intelligence while maintaining ethical boundaries.
🔍 What it does
OSINT Profiler is a comprehensive criminal profiling system that:
- Multi-Source Scanning: Automatically searches Twitter, Reddit, GitHub, 183+ social platforms (via Sherlock), dark web forums, and public databases
- Intelligent Correlation: Cross-references findings to identify connections between usernames, emails, phone numbers, and locations across platforms
- Phone Intelligence: Analyzes Indian (+91) and international phone numbers with web mention tracking
- Batch Processing: Scans multiple suspects simultaneously with automatic report generation
- Professional Reporting: Generates law enforcement-grade PDF reports and interactive network graphs visualizing relationships
- Real-time Analysis: Scores intelligence value (0-100) and filters high-priority findings
- Database Management: Stores all findings in SQLite for historical analysis and comparison
Users enter a username, email, phone number, or domain, and the system profiles the target's publicly visible digital footprint within minutes.
⚙️ How we built it
Technology Stack:
- Frontend: PyQt6 for a modern, responsive GUI with a tabbed interface
- Backend: Python with modular architecture (20+ specialized modules)
- Database: SQLite with SQLAlchemy ORM for efficient data persistence
- Scraping: BeautifulSoup, Requests with fallback mechanisms and rate limiting
- OSINT Tools: Sherlock (username enumeration), Nitter (Twitter), Ahmia (dark web)
- Reporting: ReportLab (PDF generation), PyVis (interactive network graphs)
- Analysis: Custom correlation engine with similarity matching and entity extraction
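To illustrate the similarity-matching idea behind the correlation engine, here is a minimal sketch that compares handles across platforms with the standard library's `difflib`. The function names, the normalization step, and the use of 65 as the cutoff are assumptions for illustration, not the project's actual engine.

```python
from difflib import SequenceMatcher


def normalize_handle(handle: str) -> str:
    """Lowercase a handle and drop separators that users vary between sites."""
    return "".join(ch for ch in handle.lower() if ch.isalnum())


def handle_similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score between two handles."""
    ratio = SequenceMatcher(None, normalize_handle(a), normalize_handle(b)).ratio()
    return round(ratio * 100, 1)


def correlate(handles: dict[str, str], threshold: float = 65.0) -> list[tuple[str, str, float]]:
    """Pair up platforms whose handles score at or above the threshold.

    `handles` maps a platform name to the handle found there (hypothetical shape).
    """
    platforms = sorted(handles)
    matches = []
    for i, first in enumerate(platforms):
        for second in platforms[i + 1:]:
            score = handle_similarity(handles[first], handles[second])
            if score >= threshold:
                matches.append((first, second, score))
    return matches
```

Calling `correlate({"github": "j.doe42", "reddit": "JDoe42", "twitter": "spacefan"})` would link only the GitHub and Reddit handles, since they normalize to the same string.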
Architecture:
- MVC pattern with separation of concerns
- Worker threads for non-blocking UI during scans
- Intelligent retry logic with multiple data source fallbacks
- Regex-based entity extraction for emails, phones, URLs
- Confidence scoring system for correlations (65%-100%)
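The regex-based entity extraction mentioned above could look roughly like the sketch below. The patterns are deliberately simple and are our illustration only; the project's real expressions are not shown in this writeup.

```python
import re

# Simplified patterns (assumptions): they cover common cases, not every valid form.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s<>\"')]+")


def extract_entities(text: str) -> dict[str, list[str]]:
    """Return unique emails and URLs in order of first appearance."""

    def unique(items):
        # dict.fromkeys keeps insertion order while dropping duplicates.
        return list(dict.fromkeys(items))

    # Strip sentence punctuation that the URL pattern may have swallowed.
    urls = unique(u.rstrip(".,;") for u in URL_RE.findall(text))
    # Lowercase emails so differently cased copies count once.
    emails = unique(e.lower() for e in EMAIL_RE.findall(text))
    return {"emails": emails, "urls": urls}
```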
Development Process:
- Built core scanning modules for each platform
- Implemented database layer with ORM
- Created GUI with real-time logging
- Developed correlation engine to find cross-platform connections
- Added professional PDF and graph generation
- Implemented batch scanning and unified reporting
- Tested extensively and fixed bugs (false positives, phone formatting)
🚧 Challenges we ran into
Rate Limiting & API Restrictions: Public APIs and scraping endpoints frequently rate-limit requests. We solved this by implementing multiple Nitter instances as fallbacks, adding 2-second delays between scans, and using exponential backoff.
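A minimal sketch of the fallback-plus-backoff pattern described above. The `fetch` callable, the mirror names, and the retry count are hypothetical; only the 2-second base delay comes from our setup. The network call is injected so the logic can be exercised without real requests.

```python
import time
from typing import Callable, Iterable, Optional


def fetch_with_fallback(
    mirrors: Iterable[str],
    fetch: Callable[[str], str],
    retries: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Try each mirror in turn, backing off exponentially between retries."""
    for mirror in mirrors:
        for attempt in range(retries):
            try:
                return fetch(mirror)
            except Exception:
                # Broad catch for brevity; real code would catch specific errors.
                # Wait base_delay, then 2x, 4x, ... before retrying this mirror.
                if attempt < retries - 1:
                    sleep(base_delay * (2 ** attempt))
    return None  # every mirror was exhausted
```

With the defaults, a failing mirror is tried three times with waits of 2 and 4 seconds in between before the next mirror is attempted.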
False Positive Correlations: Initially, our correlation engine flagged organizational accounts (NASA, SpaceX) as identity matches when they were merely Twitter mentions. We fixed this by creating an exclusion list and improving confidence scoring algorithms.
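As a rough sketch of the exclusion-list idea, the filter below drops candidates that point at known organizational accounts or that are only mentions. The candidate dictionary shape, the `relation` field, and the list entries are assumptions made for this example.

```python
# Illustrative entries only; a real list would be longer and maintained separately.
EXCLUDED_HANDLES = {"nasa", "spacex"}


def filter_correlations(candidates, excluded=EXCLUDED_HANDLES):
    """Keep only candidates that could plausibly be the same identity."""
    kept = []
    for candidate in candidates:
        handle = candidate["handle"].lstrip("@").lower()
        if handle in excluded:
            continue  # organizational account, never an identity match
        if candidate.get("relation") == "mention":
            continue  # a mention alone is not evidence of shared identity
        kept.append(candidate)
    return kept
```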
Phone Number Normalization: Extracting phone numbers from text was challenging due to various formats and date patterns. We refined our regex to preserve country codes (+91) and exclude dates.
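One way to approach this is to strip date-like tokens first and then match phone numbers, keeping any leading country code. The patterns below are a simplified sketch under that assumption, not the regex we shipped, and they will not cover every national format.

```python
import re

# Dates such as 12-05-2023 or 2023/05/12 are removed before phone matching.
DATE_RE = re.compile(
    r"\b(?:\d{1,2}[-/.]\d{1,2}[-/.]\d{2,4}|\d{4}[-/.]\d{1,2}[-/.]\d{1,2})\b"
)
# Optional +country code, then 10 to 12 digits with optional space or hyphen separators.
PHONE_RE = re.compile(r"(?<![\w+])(\+\d{1,3}[\s-]?)?(\d(?:[\s-]?\d){9,11})(?!\d)")


def extract_phones(text: str) -> list[str]:
    """Return normalized phone numbers, preserving a country code when present."""
    cleaned = DATE_RE.sub(" ", text)
    phones = []
    for match in PHONE_RE.finditer(cleaned):
        country = re.sub(r"\D", "", match.group(1) or "")
        digits = re.sub(r"\D", "", match.group(2))
        number = f"+{country}{digits}" if country else digits
        if number not in phones:
            phones.append(number)
    return phones
```

Removing dates first matters for inputs like "12/05/2023 1234", where the date and the following digits would otherwise add up to a phone-length run.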
Dark Web Access: Our Ahmia searches require a Tor proxy. We added a graceful fallback for when Tor isn't available, along with clear error messages.
Cross-Platform Data Inconsistency: Different platforms return data in vastly different formats. We built platform-specific parsers with robust error handling.
GUI Responsiveness: Long-running scans blocked the UI. We implemented QThread workers to keep the interface responsive during batch operations.
🏆 Accomplishments that we're proud of
✅ 183+ Platform Coverage: Successfully integrated Sherlock to scan 183+ social platforms automatically
✅ Advanced Correlation Engine: Built from scratch - identifies bio matches, location verification, and cross-platform username patterns with 75-100% confidence scoring
✅ Production-Ready Quality: Law enforcement-grade PDF reports with professional layouts, intelligence classification, and detailed findings
✅ Batch Intelligence: Automated sequential scanning of multiple suspects with unified reporting - a feature not commonly found in existing OSINT tools
✅ Interactive Visualizations: Network graphs that clearly show relationships and correlations with color-coded sources
✅ Clean Architecture: Modular design with 35+ Python files, 5000+ lines of code, following best practices
✅ Real-World Applicability: This isn't just an academic project - it's a functional tool that could be deployed for actual investigations
✅ Ethical Design: Built-in safeguards and clear legal disclaimers to prevent misuse
📚 What we learned
Technical Skills:
- Advanced web scraping with anti-detection techniques
- Multi-threaded programming for responsive UIs
- Database design and ORM optimization
- PDF generation with ReportLab styling
- Network graph visualization with PyVis
- Regex mastery for entity extraction
Cybersecurity Concepts:
- OSINT methodologies and ethical boundaries
- Correlation analysis and confidence scoring
- Intelligence filtering and prioritization
- Dark web navigation and Tor integration
- Privacy implications of public data aggregation
Software Engineering:
- Importance of modular architecture for maintainability
- Fallback mechanisms for robust applications
- User experience design for complex systems
- Error handling and graceful degradation
- Testing and debugging strategies
Lessons Learned:
- Start with multiple data source fallbacks from day one
- User feedback is crucial - our batch scanning feature came from recognizing investigators need to profile multiple suspects
- Performance matters - threading transformed user experience
- Documentation and code comments save hours during debugging
🚀 What's next for Automated Suspect Profiling Tool
Short-term Enhancements:
- Machine Learning Integration: Train models to predict likely aliases and associated accounts based on behavioral patterns
- Geolocation Mapping: Visualize suspect locations on interactive maps using extracted location data
- Sentiment Analysis: Analyze tone and sentiment in social media posts to assess threat levels
- Automated Alerts: Real-time monitoring for new activity from profiled suspects
- Mobile App: Cross-platform deployment for field investigators
Long-term Vision:
- API Development: RESTful API for integration with existing law enforcement systems
- Blockchain Analysis: Add cryptocurrency wallet tracking and transaction analysis
- Image Recognition: Facial recognition across social media to find additional accounts
- Collaboration Features: Multi-user access with role-based permissions for investigation teams
- AI-Powered Recommendations: Suggest additional search strategies based on initial findings
- Compliance Framework: GDPR/legal compliance tools for proper authorization documentation
- Cloud Deployment: Scalable cloud infrastructure for large-scale operations
Academic Extensions:
- Publish research paper on correlation algorithms
- Open-source core modules for OSINT community
- Develop training curriculum for cybersecurity students
- Create ethical OSINT certification program
Our vision is to make OSINT Profiler the industry-standard tool for ethical digital investigations, bridging the gap between scattered online data and actionable intelligence while maintaining the highest ethical standards.