Inspiration

With over $35 trillion in ESG-labeled funds globally, investors, regulators, and the public need objective tools to distinguish genuine environmental commitment from performative messaging. Companies publish sustainability press releases while being cited for EPA violations in the same period, and nobody connects the two at scale.

Manually researching greenwashing for a single company takes an ESG analyst 20-40 hours — searching EPA databases, cross-referencing corporate press releases, and compiling evidence. EnviroWash was built to automate that entire workflow for 368+ companies simultaneously.

What it does

EnviroWash automates the entire corporate greenwashing detection workflow: ingesting data from 11 sources, scoring events with AI, and pairing claims against reality. What would take an ESG analyst 20-40 hours per company is done automatically every 4 hours for 368+ companies, running on a free-tier Vercel deployment.

Every 4 hours, the automation pipeline:

  1. Ingests articles from 11 data sources (3 PR wire RSS feeds, 2 news APIs, 6 government databases)
  2. Deduplicates and clusters articles by company using AI (Claude Haiku)
  3. Scores each event with a novel dual-scoring system (Claude Sonnet + deterministic validation)
  4. Resolves company identities across sources via 5-step matching (exact → alias → EPA parent → fuzzy → create new)
  5. Pairs corporate claims with environmental reality, both within the same week and across months or years
  6. Calculates a Wash Index that quantifies the greenwashing gap
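
The resolver cascade in step 4 can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `Company` shape, the `normalize` helper, and the choice of word-token Jaccard are assumptions. Only the step order and the 0.8 fuzzy threshold come from this write-up.

```typescript
// Hypothetical record shape; field names are assumptions for illustration.
interface Company {
  id: string;
  name: string;
  aliases: string[];
  epaParentNames: string[]; // assumed: parent names as they appear in EPA records
}

type MatchStep = "exact" | "alias" | "epa_parent" | "fuzzy" | "created";

// Lowercase, strip punctuation, collapse whitespace.
const normalize = (s: string): string =>
  s.toLowerCase().replace(/[^a-z0-9 ]/g, " ").replace(/\s+/g, " ").trim();

// Jaccard similarity over word tokens: |A ∩ B| / |A ∪ B|.
function jaccard(a: string, b: string): number {
  const setA = new Set(normalize(a).split(" ").filter(Boolean));
  const setB = new Set(normalize(b).split(" ").filter(Boolean));
  if (setA.size === 0 && setB.size === 0) return 0;
  let shared = 0;
  setA.forEach((token) => {
    if (setB.has(token)) shared++;
  });
  return shared / (setA.size + setB.size - shared);
}

// Tries each step in order and reports which one produced the match.
function resolveCompany(
  rawName: string,
  known: Company[],
): { company: Company; step: MatchStep } {
  const target = normalize(rawName);

  const exact = known.find((c) => normalize(c.name) === target);
  if (exact) return { company: exact, step: "exact" };

  const alias = known.find((c) => c.aliases.some((a) => normalize(a) === target));
  if (alias) return { company: alias, step: "alias" };

  const parent = known.find((c) =>
    c.epaParentNames.some((p) => normalize(p) === target),
  );
  if (parent) return { company: parent, step: "epa_parent" };

  // Fuzzy step: keep the best-scoring canonical name, accept only at >= 0.8.
  let best: Company | undefined;
  let bestScore = 0;
  for (const c of known) {
    const score = jaccard(c.name, rawName);
    if (score > bestScore) {
      best = c;
      bestScore = score;
    }
  }
  if (best && bestScore >= 0.8) return { company: best, step: "fuzzy" };

  // Nothing matched: create a new record (the id scheme here is a placeholder).
  const created: Company = {
    id: `new-${known.length + 1}`,
    name: rawName.trim(),
    aliases: [],
    epaParentNames: [],
  };
  known.push(created);
  return { company: created, step: "created" };
}
```

With word tokens, "Volkswagen AG" and "Volkswagen" score only 0.5, so short name variants would need to be caught by the alias step rather than the fuzzy step. A character n-gram variant would behave differently.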

Cross-week temporal detection is the key automation innovation: a company's sustainability press release from January is automatically paired with an EPA violation that surfaces in June. The temporal gap increases pairing confidence — greenwashing patterns that span months are more damning than same-week coincidences.

Every Sunday, the week's data is frozen into an immutable snapshot — creating an auditable record that companies cannot retroactively alter.

Who needs this:

  • ESG analysts and investors — automated due diligence on $35T+ in ESG-labeled assets
  • Investigative journalists — data-backed corporate accountability with verifiable evidence
  • Regulators — automated monitoring of ESG disclosure compliance at scale
  • NGOs — reproducible greenwashing evidence for advocacy
  • Corporate compliance teams — proactive greenwashing risk identification

How we built it

  • Frontend: Next.js 16 + React 19 + Tailwind CSS v4 with Recharts for data visualization
  • Database: Supabase PostgreSQL with Row Level Security on all 7 tables and isolated envirowash schema
  • AI: Claude API — Haiku for clustering (cheap/fast), Sonnet for scoring (accurate)
  • Pipeline: 3 independent Vercel cron jobs (Ingest → Process → Freeze), each under 60 seconds
  • Scoring: Novel dual-scoring — G-Score (0-100) measures environmental reality across 5 weighted drivers; C-Score (0-100) measures claim prominence. Wash Index = (C × G) / 100 × gap_confidence
  • Company Resolution: 5-step resolver — exact match → alias → EPA parent → fuzzy (Jaccard ≥0.8) → create new
  • Security: RLS on all tables, CRON_SECRET authentication on pipeline endpoints, auth-protected admin dashboard, service role isolation
  • Testing: 267 automated tests with Vitest — scoring formulas (25+ per file), ingestion/dedup (47 tests), company resolution (16 tests), cross-week temporal, API parsing
  • Code Quality: TypeScript strict mode throughout, clean separation of ingestion/processing/scoring/display layers, consistent error handling with graceful degradation per source
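
The Wash Index formula from the Scoring bullet can be written out as a small function. This is a sketch under stated assumptions: the write-up gives the formula and the 0-100 range of both scores, while the [0, 1] range for `gap_confidence` and the clamping of out-of-range inputs are assumptions made here.

```typescript
// Restrict a value to the inclusive range [min, max].
const clamp = (value: number, min: number, max: number): number =>
  Math.min(max, Math.max(min, value));

// Wash Index = (C × G) / 100 × gap_confidence
// cScore: claim prominence (0-100); gScore: environmental reality (0-100).
// gapConfidence is assumed to lie in [0, 1]; the write-up does not state its range.
function washIndex(cScore: number, gScore: number, gapConfidence: number): number {
  const c = clamp(cScore, 0, 100);
  const g = clamp(gScore, 0, 100);
  const confidence = clamp(gapConfidence, 0, 1);
  return ((c * g) / 100) * confidence;
}
```

Because the two scores are multiplied, a loud claim with little documented harm stays low: `washIndex(90, 10, 1)` gives 9, while `washIndex(90, 90, 1)` gives 81.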

Key Metrics

  • 565 events scored across 143 weeks of data (Dec 2024 – Feb 2026)
  • 77 deduplicated wash pairs across 29 ranked companies (max Wash Index: 84.2) — greedy 1:1 matching ensures each event appears in at most one pair
  • 368 companies tracked across 8 sectors with AI-powered classification
  • 267 automated tests; zero production runtime errors
  • 23 pages (14 public + 9 admin dashboard)
  • 3 public REST API endpoints with filtering, pagination, and documented response shapes
  • Lighthouse: Performance 80 / Accessibility 100 / Best Practices 100 / SEO 100
  • CSV export, email alerts, RSS feed, social sharing, dark mode
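
The greedy 1:1 matching mentioned in the wash-pair metric could look something like the sketch below. The `CandidatePair` shape and the `score` field are hypothetical; the write-up only states that matching is greedy and that each event ends up in at most one pair.

```typescript
// Hypothetical candidate produced by an earlier pairing step.
interface CandidatePair {
  claimEventId: string;
  harmEventId: string;
  score: number; // assumed pairing strength; higher is better
}

// Take candidates from strongest to weakest, skipping any candidate
// that reuses an event already placed in a selected pair.
function greedyOneToOne(candidates: CandidatePair[]): CandidatePair[] {
  const usedEvents = new Set<string>();
  const selected: CandidatePair[] = [];
  const ranked = candidates.slice().sort((a, b) => b.score - a.score);
  for (const pair of ranked) {
    if (usedEvents.has(pair.claimEventId) || usedEvents.has(pair.harmEventId)) {
      continue;
    }
    usedEvents.add(pair.claimEventId);
    usedEvents.add(pair.harmEventId);
    selected.push(pair);
  }
  return selected;
}
```

Greedy selection is simple and deterministic, but it does not guarantee the highest total score across all pairs; an optimal assignment algorithm would be the alternative if that mattered.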

Challenges we ran into

  • Rate limiting: GDELT and GNews APIs have strict rate limits. We implemented batching delays and graceful degradation per data source — if one source fails, the other 10 continue.
  • Company resolution: The same company appears under many names ("Volkswagen AG", "VW Group", "Volkswagen"). Our 5-step resolver tries exact, alias, EPA parent company, and fuzzy matching in that order, and creates a new company record only when none of them match.
  • Scoring validation: AI models can hallucinate scores. We never trust Claude's final_score directly — AI provides component inputs, but final G-Score and C-Score are always recomputed locally using deterministic formulas.
  • Time budget: All pipeline operations must complete within Vercel's 60-second limit on the free tier. We split the pipeline into 3 independent cron jobs and maintain a 50-second budget with a 10-second buffer.
  • Cross-week pairing at scale: Pairing events across 143 weeks required efficient matching with ESG category constraints to avoid false positives while catching genuine months-spanning patterns.
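
One way to enforce the 50-second budget described above is to check a deadline before starting each unit of work and defer whatever is left to the next run. The sketch below is synchronous for brevity and uses hypothetical names (`runWithinBudget`, `BudgetResult`); the real pipeline is presumably asynchronous, and only the 50-second figure comes from this write-up.

```typescript
interface BudgetResult<T> {
  processed: T[];
  deferred: T[]; // left for the next cron run
}

// Processes items in order until the budget is spent.
// `now` is injectable so the clock can be faked in tests.
function runWithinBudget<T>(
  items: T[],
  handle: (item: T) => void,
  budgetMs: number = 50000,
  now: () => number = Date.now,
): BudgetResult<T> {
  const deadline = now() + budgetMs;
  const processed: T[] = [];
  const deferred: T[] = [];
  for (const item of items) {
    // Once one item has been deferred, defer everything after it too.
    if (deferred.length > 0 || now() >= deadline) {
      deferred.push(item);
      continue;
    }
    handle(item);
    processed.push(item);
  }
  return { processed, deferred };
}
```

The deadline is only checked before an item starts, so the last item can finish slightly past the budget. That overrun is what the 10-second buffer is there to absorb.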

Accomplishments that we're proud of

  • To our knowledge, the first platform to computationally pair corporate sustainability claims against EPA government data at scale
  • Cross-week temporal detection catches greenwashing patterns spanning months, not just same-week coincidences
  • The Wash Index formula only flags greenwashing when BOTH a loud claim AND significant harm exist — minimizing false positives
  • Deterministic validation means every score is reproducible and auditable
  • Full admin dashboard with score override, audit trail, and manual pipeline triggers
  • Production-quality UX: dark mode, animated charts, company comparison, sector heatmap, social sharing, CSV export, email alerts, RSS feed
  • Fully autonomous pipeline running 24/7 on Vercel free tier
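
Deterministic validation, as described in this write-up, means the model supplies component inputs and the final score is recomputed locally. A minimal sketch of that idea follows. The response shape, the weighted-average formula, the rounding, and the treatment of missing components as 0 are all assumptions; the actual five drivers and their weights are not given here.

```typescript
// Hypothetical shape of a model response.
interface AiScoreResponse {
  components: Record<string, number>; // per-driver inputs, each expected in 0-100
  finalScore?: number; // model-reported total, deliberately never read
}

interface DriverWeight {
  driver: string;
  weight: number;
}

// Recomputes a 0-100 score as a weighted average of clamped components.
function recomputeScore(response: AiScoreResponse, weights: DriverWeight[]): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const { driver, weight } of weights) {
    const raw = response.components[driver];
    // Missing or non-finite components count as 0 (an assumption).
    const value =
      typeof raw === "number" && Number.isFinite(raw)
        ? Math.min(100, Math.max(0, raw))
        : 0;
    weightedSum += weight * value;
    totalWeight += weight;
  }
  if (totalWeight <= 0) return 0;
  // Round to one decimal place.
  return Math.round((weightedSum / totalWeight) * 10) / 10;
}
```

The same inputs always produce the same score, regardless of what total the model reported, which is what makes the result reproducible.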

What we learned

  • Government data is the most powerful tool for accountability — EPA filings can't be edited or deleted by companies
  • Dual scoring (reality vs claims) is far more nuanced than simple sentiment analysis
  • Cross-week temporal analysis reveals patterns invisible to same-week analysis — some of the most egregious greenwashing spans months
  • Deterministic validation of AI outputs is essential for trust and reproducibility
  • Split pipeline architecture (separate ingestion from processing) dramatically improves reliability

What's next for EnviroWash

  • International expansion: Add EU CSRD reporting data, ESMA filings, UK Environment Agency, and CDP Climate Disclosures to detect greenwashing globally
  • Real-time push notifications: Alert subscribers instantly when new wash pairs are detected
  • Browser extension: Flag greenwashing claims in articles as users read them, powered by the EnviroWash API
  • ESG data provider integrations: Connect with MSCI, Sustainalytics, and Bloomberg ESG data for richer company profiles
  • Corporate accountability reports: Auto-generate quarterly PDF reports ranking companies by sector with trend analysis
  • Government agency partnerships: Provide bulk API access for regulators to integrate EnviroWash data into enforcement workflows
