Shadow-SaaS-Detector

Inspiration

The Problem is Real: Every IT department faces the same nightmare—employees secretly sign up for unauthorized SaaS tools. You discover Slack integrations storing customer data, shadow expense accounts, rogue HR software with access to SSNs. No one in IT knows about them. No audit trail. No control.

One organization we researched had $1,500/month in duplicate tools silently draining budget—Asana + Monday.com + Notion all doing the same job in different teams. But the real pain? The security exposure: An unauthorized password manager with access to 200+ employee passwords. A random recruiting app with SSNs. A free analytics tool processing customer data.

IT managers waste 20+ hours monthly manually hunting through spreadsheets and expense reports. When they finally find something, they have no playbook—just hope that revoking access doesn't break critical workflows.

Shadow SaaS Detector solves this: Find every unauthorized tool in minutes. Score the risk. Get a remediation playbook. See the savings. Act with confidence.

What it does

Shadow SaaS Detector is an enterprise-grade shadow IT discovery platform that finds unauthorized SaaS applications and tells you exactly what to do about them.

How it works:

Upload your organization's data (expense reports, browser history, employee roster)
Detect all unauthorized SaaS apps in seconds using AI-powered keyword matching
Score each app's risk: CRITICAL (immediate revocation), HIGH (review required), MEDIUM (consolidation candidate), LOW (monitor)
Analyze compliance violations (GDPR/CCPA/SOC 2/HIPAA), identify duplicate tools, calculate savings
Act using step-by-step playbooks, simulation tools, and audit logs

Core Features:

🔍 SaaS Detection Engine: Matches expenses + browser history against 500+ SaaS database
🤖 AI Risk Scoring: Google Gemini analyzes data access, security reputation, compliance risk (with rule-based fallback)
📊 Executive Dashboard: Real-time view of all shadow apps by risk level, department, category
⚠️ Threat Ticker: Live feed of critical risks as they're detected
🗺️ Attack Surface Map: Visualize which departments access risky tools and what data is exposed
💡 AI Insights: Get AI-powered analysis on risk, consolidation opportunities, and compliance violations
📋 Smart Consolidation: Identifies 3 identical tools where you could eliminate 2 (with ROI)
📜 Compliance Reports: Auto-generated GDPR/CCPA audit reports (exportable as HTML/PDF)
💰 Cost Simulator: Test impact of revoking apps before you do it
📋 Playbooks: Step-by-step remediation guides for each app
📄 Executive Brief: C-level one-pager with KPIs (shadow spend, critical risks, consolidation savings)

How we built it

Frontend (React 19 + Vite + TypeScript):

Dashboard: Multi-tab interface (Dashboard, Threat Map, Simulator, AI Insights, Demo)
Upload Wizard: Multi-step form for expenses, browser history, roster, Slack apps
Interactive Visualizations: Recharts for cost breakdown, risk distribution, department heat maps
Rich Components: Playbook modals, threat ticker, executive brief (with print support)
708 lines of intentional CSS: Dark theme, smooth animations, form styling
Code: src/components/ (Dashboard.tsx, AIInsights.tsx, Simulator.tsx, etc.)

Backend (Express.js + TypeScript + Google Generative AI):

Detection Engine (detector.ts): CSV parsing, SaaS database matching, browser history analysis
AI Risk Scoring (ai-risk-scorer.ts): Multi-prompt Gemini integration with rule-based fallback
AI Compliance (ai-compliance.ts): GDPR/CCPA/SOC 2/HIPAA violation detection
AI Consolidation (ai-consolidator.ts): Identifies category duplicates, calculates savings
Simulators (simulator.ts): Risk modeling, impact prediction
Playbook Engine (playbook.ts): Generates step-by-step remediation guides + email drafts
Audit Trails (playbook.ts): Logs all simulated revocations to JSON
API Routes: /api/upload, /api/ai/risk-assessment, /api/ai/consolidation, /api/ai/compliance, /api/playbook, /api/simulate

Data Processing:

SaaS Database: 500+ entries (saas_database.json) with keywords, risk levels, data permissions
Expenses: CSV parsing with expense → SaaS matching
Browser History: JSON parsing, domain extraction, SaaS correlation
Slack Apps: Real Slack workspace integration (OAuth flow ready)
Caching: In-memory cache for AI assessments (avoid API waste)

Testing & Quality:

Unit Tests: Vitest covering detector, simulator, playbook logic
E2E Tests: Playwright test suite for full user workflows
Type Safety: Full-stack TypeScript, no any types in core logic
Code Linting: ESLint configured for React + TypeScript

Challenges we ran into

Risk Scoring Without ML: Needed to score apps intelligently without training ML models. Solution: Multi-factor weighting (data access 40%, risk_level 30%, user attribution 20%, compliance violations 10%) + AI fallback when API available.
Handling Unstructured Expense Data: Expense descriptions vary ("Slack $100", "Monthly subscription: Slack", "Slack Pro Team Plan"). Solution: Keyword-based matching against SaaS database keywords field (case-insensitive, substring match).
Browser History Correlation: Browser history has millions of URLs; finding SaaS apps is expensive. Solution: Extract domain, match against known SaaS domains, cache results.
Compliance Framework Accuracy: GDPR/CCPA violations sound similar but have different requirements. Solution: Built separate checkers for each framework, rules-based with AI enhancement.
Graceful AI Fallback: Google API might timeout or be unavailable. Solution: Implemented rule-based fallback for every AI feature (risk, consolidation, compliance) that produces identical JSON output.
Data Privacy: Handling employee emails, SSNs, browser history safely. Solution: In-memory processing only, no logging of sensitive data, audit trail focuses on app decisions not employee details.

Accomplishments we're proud of

✅ Full-stack AI integration - Detection + Risk Scoring + Consolidation + Compliance all AI-augmented
✅ Graceful degradation - AI features work with or without API key (fallback logic for each)
✅ Real data processing - Handles messy CSVs, JSON, browser history, OAuth app lists
✅ Multi-framework compliance - GDPR, CCPA, SOC 2, HIPAA violation detection built-in
✅ Production-ready code - Full TypeScript, proper error handling, modular services
✅ Comprehensive testing - 17+ unit tests + Playwright E2E suite with auto-screenshots
✅ Beautiful, functional UI - Custom CSS animations, dark theme, responsive design
✅ Executive-grade reports - Exportable HTML/PDF compliance reports with styled tables & charts
✅ Simulation engine - Test revocation impact before executing (cost modeling, workflow impact)
✅ Audit trails - Every simulated action logged to JSON (audit_log.json, revokes_demo.json)
✅ Clean commits - 11 commits with clear messages showing iterative development

What we learned

Data-driven security decisions beat guessing: Exposing shadow IT as CSV/dashboard makes decision-making fast and evidence-based.
Graceful AI fallback is essential: Never hardcake LLM responses—always have a deterministic fallback for reliability.
IT managers care about consolidation ROI more than risk scores: Show "$50k/year savings possible" and they act faster than "CRITICAL RISK."
Compliance frameworks are rules engines: GDPR/CCPA can be modeled as rule sets, with AI enhancement for edge cases.
Full-stack TypeScript eliminates integration bugs: Type safety in data transformations (CSV → detection → risk → export) prevents silent failures.
Simulations reduce executive anxiety: "Here's what happens when we revoke access" is more persuasive than "This is risky."

What's next for shadow-saas-detector

Live integrations: OAuth connections to Google Workspace, Microsoft 365, Okta, GitHub for real-time SaaS discovery (currently CSV upload only)
Continuous monitoring: Recurring browser history sync instead of one-time uploads
Automated enforcement: Network-level app blocking + SSO controls + cloud access broker integration
Custom policies: IT team defines their own risk rules ("No AI tools without contract review")
Slack bot: @SaaSDetector check [app-name] for instant risk lookup
Industry benchmarking: "Compare your shadow spend to similar companies in your industry"
Cost tracking: Timeline view of shadow spend month-over-month
API for integrators: Partner ecosystem (MDR, ITSM, DLP vendors can leverage detection)

Built With

aiplaywright-(e2e)
express.js
generative
google
json-based
railway
react
recharts-(data-visualization)
typescript
vite
vitest

Updates

Pranjal J started this project — Mar 15, 2026 11:13 AM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.