SentinelIQ: NLP Cyber Threat & Phishing Detection

Inspiration

With cyber threats evolving rapidly, traditional rule-based firewalls are no longer enough. Hackers use sophisticated linguistic obfuscation, zero-day phishing, and LLM prompt injections to bypass standard security filters. We were inspired to build an intelligent "linguistic firewall" that understands the semantic intent of text, stopping these advanced semantic attacks dead in their tracks using modern Natural Language Processing (NLP).

What it does

SentinelIQ is an advanced AI Cyber Threat Detection Platform. It acts as an intelligent pipeline that analyzes raw text, emails, URLs, and LLM prompts in real-time. It uses Deep Learning to classify whether the input is a phishing attempt, a prompt injection (jailbreak), or a safe payload. It also features an Explainable AI (XAI) dashboard that extracts Named Entities (NER) to highlight the exact trigger words, hidden IPs, or malicious domains that caused the threat alert.

How we built it

We built the backend using Python and FastAPI for high-speed asynchronous inferencing. At the core of our NLP pipeline, we utilized HuggingFace Transformers, PyTorch, and heavily optimized Regular Expressions for text sanitization and homoglyph detection. The frontend is a highly responsive dashboard built with React, TypeScript, and TailwindCSS, communicating seamlessly with our backend. The entire platform is deployed serverlessly across Vercel (frontend) and Render (backend).

Challenges we ran into

One of our biggest hurdles was deploying memory-intensive Deep Learning NLP models within the strict constraints of cloud free-tiers. We had to heavily optimize our Python environment (strictly enforcing Python 3.11 and specific torch wheel versions) and engineer a dynamic "Lightweight Mode" that gracefully falls back to rapid heuristic engines when server memory limits are reached. Additionally, resolving strict Pyright type-checking errors across our complex ML modules required extensive architectural refactoring.

Accomplishments that we're proud of

We successfully implemented a true end-to-end NLP pipeline from scratch—starting with raw RegEx cleaning, moving to Byte-Pair tokenization, and finishing with deep sequence classification. We are incredibly proud of achieving sub-200ms latency, proving that heavy semantic NLP analysis can be executed fast enough to act as a real-time web interceptor.

What we learned

We learned that traditional NLP methods (like aggressive stop-word removal) actually hurt cybersecurity models, as threat actors actively use common grammatical connectors to disguise payload paths. We learned how to leverage context-aware attention mechanisms to solve this. We also gained immense practical experience in CI/CD pipeline troubleshooting, bridging complex Python machine learning environments with modern React frontends.

What's next for SentinelIQ: NLP Cyber Threat & Phishing Detection

We plan to implement Extractive Text Summarization (using algorithms like LexRank) to automatically generate one-sentence threat summaries for busy security analysts. We also aim to expand our NLP models to natively support regional Indian dialects, protecting users against localized, non-English social engineering attacks.