Inspiration
- The increasing use of encrypted messaging apps by organized crime.
- Real cases where criminals were caught using writing style analysis (e.g., the Unabomber).
- The challenge of profiling when traditional metadata is absent or anonymized.
- The opportunity to combine linguistic style, emotional tone, and behavioral cues for deeper analysis.
What it does
How we built it
- Python for processing and ML
- spaCy, NLTK for NLP preprocessing
- Empath, Text2Emotion for psycholinguistic feature extraction
- scikit-learn, XGBoost, BERT for classification
- Streamlit for the interactive user interface
- Neo4j / NetworkX (optional) for visualizing criminal networks
🧱 Workflow
- Data Collection: Acquired and generated encrypted-style chat datasets (synthetic and anonymized).
- Preprocessing: Cleaned and normalized texts, interpreted emojis, handled coded slang.
- Feature Extraction:
  - Stylometric: word length, POS tags, punctuation, etc.
  - Psycholinguistic: emotion scores, power/dominance cues, social language.
- Role Classification: Trained ML models to predict criminal roles using labeled feature sets.
- Visualization: Developed an easy-to-use dashboard for real-time analysis and insights.
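The stylometric part of the feature extraction step can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the project's actual code: the function name `stylometric_features` and the chosen features are assumptions, and POS-tag features are left out because they would need a tagger such as spaCy.

```python
import re
import string


def stylometric_features(message: str) -> dict:
    """Compute a few simple writing-style features for one chat message.

    Hypothetical helper for illustration; the feature set is an assumption.
    """
    # Alphabetic words, allowing one internal apostrophe (e.g. "don't").
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", message)
    n_chars = len(message)
    n_words = len(words)
    punct = sum(1 for ch in message if ch in string.punctuation)
    upper = sum(1 for ch in message if ch.isupper())
    return {
        "word_count": n_words,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "punct_ratio": punct / n_chars if n_chars else 0.0,
        "upper_ratio": upper / n_chars if n_chars else 0.0,
        # Share of distinct words; a rough proxy for vocabulary variety.
        "type_token_ratio": len({w.lower() for w in words}) / n_words if n_words else 0.0,
    }


if __name__ == "__main__":
    print(stylometric_features("Drop is at 9. Bring the cash!"))
```

Each message yields one flat dictionary, so the features for many messages can be stacked into a table and combined with the psycholinguistic scores before training a classifier.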
Challenges we ran into
🚫 Lack of Real Labeled Data: Legal and ethical barriers made it hard to access actual encrypted criminal chat datasets.
🧠 Obfuscated & Coded Language: Use of slang, emojis, and indirect phrasing made language interpretation difficult.
🔄 Feature Fusion Complexity: Merging stylometric, psycholinguistic, and contextual features into one model was non-trivial.
📉 Model Interpretability: Explaining why a user was classified as a smuggler or supplier was crucial but hard without transparency tools.
🌐 Domain Adaptation: Generic NLP models struggled to adapt to criminal lingo without fine-tuning on domain-specific data.
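One simple way to approach the coded-language challenge is a lookup pass that maps known slang and emojis to plain words before any feature extraction. The sketch below assumes a small hand-made lexicon; `CODE_LEXICON`, its entries, and `normalize_message` are hypothetical names invented for this example, and a real lexicon would have to come from analysts or annotated data.

```python
import re

# Hypothetical lexicon for illustration only.
CODE_LEXICON = {
    "📦": "package",
    "💰": "money",
    "pkg": "package",
    "2moro": "tomorrow",
}

# Split into runs of word characters or single non-space symbols.
# Emojis made of several code points would need a dedicated emoji tokenizer.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def normalize_message(message: str, lexicon: dict = CODE_LEXICON) -> str:
    """Lowercase the message and replace tokens found in the lexicon."""
    tokens = _TOKEN_RE.findall(message.lower())
    return " ".join(lexicon.get(tok, tok) for tok in tokens)


if __name__ == "__main__":
    print(normalize_message("Pkg lands 2moro 📦💰"))
```

A dictionary lookup like this only covers terms that are already known; unseen slang still passes through unchanged, which is where fine-tuning on domain-specific data would come in.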
Accomplishments that we're proud of
🧠 Role Prediction Using Language Only: Accurately classified roles like supplier, smuggler, or middleman based purely on chat patterns.
✍️ Combined Stylometry & Psycholinguistics: Successfully fused writing style and psychological cues into a unified profiling system.
🔍 Decrypted Coded Communication Patterns: Handled slang, emojis, and metaphorical language to extract real behavioral signals.
📊 Built a Real-Time Profiling Dashboard: Created an interactive interface for investigators to visualize roles, risk levels, and linguistic fingerprints.
🧪 Created a Domain-Specific NLP Dataset: Generated a synthetic but realistic criminal chat dataset tailored for stylometric and behavioral analysis.
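Fusing stylometric and psycholinguistic features can be sketched as merging both groups into one flat record per user and then standardizing each column so that no group dominates by scale. This is an assumed approach, not the project's confirmed method; `fuse`, `standardize`, and the sample values are hypothetical, and the emotion scores would in practice come from a tool such as Empath or Text2Emotion.

```python
from statistics import mean, pstdev


def fuse(stylometric: dict, psycholinguistic: dict) -> dict:
    """Merge two feature groups into one flat dict with prefixed keys."""
    fused = {f"style_{k}": float(v) for k, v in stylometric.items()}
    fused.update({f"psy_{k}": float(v) for k, v in psycholinguistic.items()})
    return fused


def standardize(rows: list) -> list:
    """Z-score every feature across rows; returns lists ordered by sorted key."""
    keys = sorted(rows[0])
    stats = {}
    for k in keys:
        col = [r[k] for r in rows]
        sd = pstdev(col)
        # Constant columns get a divisor of 1.0 to avoid division by zero.
        stats[k] = (mean(col), sd if sd > 0 else 1.0)
    return [[(r[k] - stats[k][0]) / stats[k][1] for k in keys] for r in rows]


if __name__ == "__main__":
    # Illustrative values only, not real measurements.
    rows = [
        fuse({"avg_word_length": 3.0}, {"anger": 0.1}),
        fuse({"avg_word_length": 5.0}, {"anger": 0.3}),
    ]
    print(standardize(rows))
```

The resulting matrix has one row per user and one column per feature, which is the shape that classifiers such as those in scikit-learn or XGBoost expect as input.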
What we learned
- Stylometry matters: Writing style can help distinguish roles and even individuals.
- Psycholinguistic signals are subtle: Emotional cues and cognitive markers help expose a user’s intent.
- Coded language is prevalent: Emojis, slang, and metaphors are used to hide meaning—yet patterns still emerge.
- Multimodal features improve accuracy: Merging text structure with emotional and semantic insights yields better predictions.
What's next for NeoNarcoNLP
- Multilingual Support: Extend to regional languages and dark web slang.
- Real-World Dataset Integration: Collaborate with law enforcement (where ethical and legal) to work with real encrypted chat data.
- Role & Risk Scoring: Add features for threat level and behavioral intent classification.
- Chatbot Integration: Enable real-time suspect profiling via chat interfaces.
- Deployment as API: Offer as a secure tool for digital forensic teams and investigators.
Built With
- empath
- huggingface-transformers
- natural-language-processing
- nltk
- python
- scikit-learn
- spacy
- streamlit
- text2emotion