Inspiration

Credit analysts, risk officers, and hedge fund managers spend weeks manually reading SEC 10-K filings to spot early signs of corporate distress. By the time a distress signal surfaces in market prices, it is already too late to act. We asked: what if a machine could read every 10-K the moment it is filed and flag risk before the market does?

What it does

Distress Radar is an interactive early-warning system that automatically ingests real SEC 10-K filings, extracts NLP-based risk signals from MD&A disclosures, combines them with market-proxy features, and scores each company on a 0–100 distress scale using an XGBoost model with SHAP explainability.

The live dashboard lets users:

  • Search any company by ticker or name
  • See a distress score and risk level (Critical / High / Low)
  • Understand why the model flagged a company via a SHAP feature explanation chart
  • Read the actual MD&A excerpt that drove the signal
  • Trace back to the SEC source filing with a direct link

How we built it

The full pipeline runs inside a Zerve notebook and is deployed as a live Streamlit app wired directly to notebook outputs via Zerve's deployment variable system.

Pipeline stages:

  1. Data ingestion — fetched real 10-K filings from SEC EDGAR for 18 companies including known distress cases (BBBY, SIVB, CVNA, PTON, LCID) and healthy blue-chips (AAPL, MSFT, JPM, JNJ)
  2. NLP feature extraction — computed hedge density, FinBERT sentiment, topic drift score, going-concern flag, average sentence length, and distress keyword count from MD&A text
  3. Market-proxy features — generated volatility, price decline, and volume spike proxies from ticker metadata and filing date with no external API required
  4. Model training — trained an XGBoost classifier on 10 features with temporal-aware splitting; achieved AUC = 1.0 on the validated demo dataset
  5. SHAP explainability — computed per-company SHAP values; top drivers were avg_sentence_length, distress_keyword_count, hedge_density, and days_since_filing
  6. Deployment — single Streamlit app deployed on Zerve, reading the enriched scored dataframe directly from the notebook block via Zerve's variable loader
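
The lexical features in stage 2 can be sketched with the standard library alone. This is a minimal illustration, not the project's actual extractor: the word lists, the going-concern pattern, and the function name extract_text_features are assumptions, and FinBERT sentiment and topic drift are left out because they need a model and a prior filing.

```python
import re

# Hypothetical word lists for illustration; the project's real lexicons are not given.
HEDGE_WORDS = {"may", "might", "could", "possibly", "uncertain", "approximately"}
DISTRESS_KEYWORDS = {"default", "bankruptcy", "impairment", "covenant", "liquidity", "restructuring"}
GOING_CONCERN = re.compile(r"\bsubstantial doubt\b|\bgoing concern\b", re.IGNORECASE)


def extract_text_features(mdna_text: str) -> dict:
    """Compute simple lexical risk signals from an MD&A excerpt."""
    # Naive sentence split on ., ! or ? followed by whitespace (abbreviations will over-split).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", mdna_text.strip()) if s]
    tokens = re.findall(r"[a-z']+", mdna_text.lower())
    n_tokens = len(tokens) or 1        # avoid division by zero on empty text
    n_sentences = len(sentences) or 1
    return {
        "hedge_density": sum(t in HEDGE_WORDS for t in tokens) / n_tokens,
        "distress_keyword_count": sum(t in DISTRESS_KEYWORDS for t in tokens),
        "avg_sentence_length": len(tokens) / n_sentences,
        "going_concern_flag": int(bool(GOING_CONCERN.search(mdna_text))),
    }


sample = (
    "We may not be able to refinance our debt and could default on a covenant. "
    "These conditions raise substantial doubt about our ability to continue as a going concern."
)
features = extract_text_features(sample)
print(features)
```

Keeping every feature a pure function of the MD&A text makes it straightforward to audit that none of them depends on the distress label.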

Challenges we ran into

  • Label leakage: early versions of topic drift and market features accidentally used the distress label in their computation, causing XGBoost to ignore all other features. We fixed this by rebuilding both functions to use only a ticker hash and the filing date.
  • Deployment variable wiring: Zerve's variable() loader only works in deployment context, not in notebook blocks. We kept hitting this error until we separated notebook code from deployment code.
  • Block name mismatch: the deployment was pointing at a non-existent block name until we read the actual DAG metadata and identified the correct source block.
  • 503 runtime errors: the Streamlit ranked table caused intermittent connection failures; resolved by adding numeric coercion guards and schema normalization before any widget rendering.
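
One way to read the label-leakage fix is that the proxy function never receives the label at all. The sketch below assumes that interpretation: the function name market_proxies, the SHA-256 hashing, and the scaling ranges are all illustrative choices, not the project's actual code, and the outputs are synthetic stand-ins rather than real market data.

```python
import hashlib
from datetime import date


def market_proxies(ticker: str, filing_date: date) -> dict:
    """Derive pseudo-market features from ticker and filing date only.

    The distress label is deliberately not a parameter, so it cannot leak in.
    """
    # hashlib gives a digest that is stable across runs, unlike the built-in
    # hash(), which is salted per process for strings.
    key = f"{ticker.upper()}|{filing_date.isoformat()}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Map the first three bytes to [0, 1]; the ranges below are illustrative.
    u1, u2, u3 = (b / 255 for b in digest[:3])
    return {
        "volatility_proxy": round(0.10 + 0.70 * u1, 4),
        "price_decline_proxy": round(0.60 * u2, 4),
        "volume_spike_proxy": round(1.0 + 4.0 * u3, 4),
    }


a = market_proxies("SIVB", date(2023, 2, 24))
b = market_proxies("sivb", date(2023, 2, 24))
print(a == b)  # same ticker and date always give the same proxies
```

Because the signature only accepts the ticker and the filing date, a reviewer can verify the absence of leakage by reading one line instead of tracing the whole function.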

Accomplishments we are proud of

  • Built a fully end-to-end pipeline from raw SEC filing to live interactive dashboard in one hackathon session
  • Successfully eliminated label leakage and confirmed multi-feature SHAP contributions across text and market features
  • Deployed a live public URL backed by a real notebook ML pipeline, not hardcoded data
  • The SVB (SIVB) case study scores 58/100 based on its 2023-02-24 filing, about two weeks before the bank failed on 2023-03-10, which makes it a compelling real-world validation

What we learned

  • Zerve's notebook-to-deployment variable system is a powerful pattern but requires exact block naming discipline
  • When SHAP assigns near-zero importance to most features, the cause is often label leakage through one dominant feature, not model failure
  • Keeping the deployment file schema-tolerant by normalizing missing columns is essential for production stability
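
The schema-tolerance idea can be sketched as a normalization pass that runs before any widget is rendered. This is a standard-library illustration over plain dicts; the column names, defaults, and helper names are assumptions, since the deployed app's real schema is not given.

```python
import math

# Hypothetical column names and defaults; the real schema is not given.
EXPECTED_COLUMNS = {
    "ticker": "UNKNOWN",
    "distress_score": 0.0,
    "hedge_density": 0.0,
    "distress_keyword_count": 0.0,
}


def to_float(value, default: float) -> float:
    """Coerce a value to a finite float, falling back to the default."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return default
    return number if math.isfinite(number) else default


def normalize_row(row: dict) -> dict:
    """Return a row with every expected column present and numeric fields coerced."""
    clean = {}
    for column, default in EXPECTED_COLUMNS.items():
        raw = row.get(column, default)
        if isinstance(default, float):
            clean[column] = to_float(raw, default)
        else:
            clean[column] = str(raw) if raw is not None else default
    # Clamp the score to the 0 to 100 scale shown on the dashboard.
    clean["distress_score"] = min(100.0, max(0.0, clean["distress_score"]))
    return clean


rows = [
    {"ticker": "SIVB", "distress_score": "58", "hedge_density": None},
    {"ticker": "AAPL", "distress_score": float("nan")},
]
normalized = [normalize_row(r) for r in rows]
print(normalized)
```

On a pandas dataframe the same guard is typically written with DataFrame.reindex(columns=...) to add missing columns and pandas.to_numeric(..., errors="coerce") followed by fillna to coerce values.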

What is next

  • Live ingestion for any public company ticker via real-time SEC EDGAR API
  • Expand to 500+ companies with quarterly re-scoring
  • Add peer-group comparison: flag when a company's language diverges from sector peers
  • Integrate financial ratio features alongside the NLP signals
  • Build an alert subscription system for risk analysts
