NeoNarcoNLP

Inspiration

The increasing use of encrypted messaging apps by organized crime.

Real cases where criminals were caught using writing style analysis (e.g., the Unabomber).
The challenge of profiling when traditional metadata is absent or anonymized.
The opportunity to combine linguistic style, emotional tone, and behavioral cues for deeper analysis. ## What it does

How we built it

Python for processing and ML

spaCy, NLTK for NLP preprocessing
Empath, Text2Emotion for psycholinguistic feature extraction
scikit-learn, XGBoost, BERT for classification
Streamlit for the interactive user interface
Neo4j / NetworkX (optional) for visualizing criminal networks

🧱 Workflow

Data Collection: Acquired and generated encrypted-style chat datasets (synthetic and anonymized).
Preprocessing: Cleaned and normalized texts, interpreted emojis, handled coded slang.
Feature Extraction:
- Stylometric: word length, POS tags, punctuation, etc.
- Psycholinguistic: emotion scores, power/dominance cues, social language.
Role Classification: Trained ML models to predict criminal roles using labeled feature sets.
Visualization: Developed an easy-to-use dashboard for real-time analysis and insights.

Challenges we ran into

🚫 Lack of Real Labeled Data Legal and ethical barriers made it hard to access actual encrypted criminal chat datasets.

🧠 Obfuscated & Coded Language Use of slang, emojis, and indirect phrasing made language interpretation difficult.

🔄 Feature Fusion Complexity Merging stylometric, psycholinguistic, and contextual features into one model was non-trivial.

📉 Model Interpretability Explaining why a user was classified as a smuggler or supplier was crucial—but hard without transparency tools.

🌐 Domain Adaptation Generic NLP models struggled to adapt to criminal lingo without fine-tuning on domain-specific data.

Accomplishments that we're proud of

🧠 Role Prediction Using Language Only Accurately classified roles like supplier, smuggler, or middleman based purely on chat patterns.

✍️ Combined Stylometry & Psycholinguistics Successfully fused writing style and psychological cues into a unified profiling system.

🔍 Decrypted Coded Communication Patterns Handled slang, emojis, and metaphorical language to extract real behavioral signals.

📊 Built a Real-Time Profiling Dashboard Created an interactive interface for investigators to visualize roles, risk levels, and linguistic fingerprints.

🧪 Created a Domain-Specific NLP Dataset Generated a synthetic but realistic criminal chat dataset tailored for stylometric and behavioral analysis.

What we learned

Stylometry matters: Writing style can help distinguish roles and even individuals.
Psycholinguistic signals are subtle: Emotional cues and cognitive markers help expose a user’s intent.
Coded language is prevalent: Emojis, slang, and metaphors are used to hide meaning—yet patterns still emerge.
Multimodal features improve accuracy: Merging text structure with emotional and semantic insights yields better predictions.