Distress Radar: Early Warning from SEC Filings

Inspiration

Credit analysts, risk officers, and hedge fund managers spend weeks manually reading SEC 10-K filings to spot early signs of corporate distress. By the time distress shows up in market prices, it is usually too late to act. We asked: what if a machine could read every 10-K the moment it is filed and flag risk before the market does?

What it does

Distress Radar is an interactive early-warning system that automatically ingests real SEC 10-K filings, extracts NLP-based risk signals from MD&A disclosures, combines them with market-proxy features, and scores each company on a 0–100 distress scale using an XGBoost model with SHAP explainability.

The live dashboard lets users:

Search any company by ticker or name
See a distress score and risk level (Critical / High / Low)
Understand why the model triggered via a SHAP feature explanation chart
Read the actual MD&A excerpt that drove the signal
Trace back to the SEC source filing with a direct link

How we built it

The full pipeline runs inside a Zerve notebook and is deployed as a live Streamlit app wired directly to notebook outputs via Zerve's deployment variable system.

Pipeline stages:

Data ingestion: fetched real 10-K filings from SEC EDGAR for 18 companies, including known distress cases (BBBY, SIVB, CVNA, PTON, LCID) and healthy blue-chips (AAPL, MSFT, JPM, JNJ)
NLP feature extraction: computed hedge density, FinBERT sentiment, topic drift score, going-concern flag, average sentence length, and distress keyword count from MD&A text
Market-proxy features: generated volatility, price decline, and volume spike proxies from ticker metadata and filing date, with no external API required
Model training: trained an XGBoost classifier on 10 features with temporal-aware splitting; achieved AUC = 1.0 on the demo dataset (filings from the 18 companies above)
SHAP explainability: computed per-company SHAP values; top drivers were avg_sentence_length, distress_keyword_count, hedge_density, and days_since_filing
Deployment: single Streamlit app deployed on Zerve, reading the enriched scored dataframe directly from the notebook block via Zerve's variable loader
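
To make the simpler text signals concrete, here is a minimal sketch of how hedge density, distress keyword count, average sentence length, and the going-concern flag could be computed. It is not the project's actual code: the word lists, the regular expressions, and the function name text_features are assumptions made for illustration, and FinBERT sentiment and topic drift are left out.

```python
import re

# Illustrative word lists (assumptions); the project's real lexicons are not
# given in this write-up.
HEDGE_WORDS = {"may", "might", "could", "possibly", "uncertain", "approximately"}
DISTRESS_KEYWORDS = ("default", "covenant", "liquidity", "impairment", "restructuring")
GOING_CONCERN = re.compile(r"substantial doubt|going concern", re.IGNORECASE)


def text_features(mdna: str) -> dict:
    """Compute four simple text signals from an MD&A excerpt."""
    words = re.findall(r"[a-z']+", mdna.lower())
    # Split on sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", mdna.strip()) if s]
    n_words = max(len(words), 1)  # avoid division by zero on empty text
    return {
        # Share of all words that are hedging terms.
        "hedge_density": sum(w in HEDGE_WORDS for w in words) / n_words,
        # Total occurrences of any distress keyword.
        "distress_keyword_count": sum(words.count(k) for k in DISTRESS_KEYWORDS),
        # Mean number of words per sentence.
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        # 1 if going-concern language appears anywhere, else 0.
        "going_concern_flag": int(bool(GOING_CONCERN.search(mdna))),
    }


sample = "We may default on our covenant. There is substantial doubt about liquidity."
features = text_features(sample)
```

Each value is a plain number, so the resulting dict can be added as a row to the feature table alongside the market-proxy columns.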

Challenges we ran into

Label leakage: early versions of topic drift and market features accidentally used the distress label in their computation, causing XGBoost to ignore all other features. Fixed by rebuilding both functions to use only ticker hash and filing date.
Deployment variable wiring: Zerve's variable() loader only works in deployment context, not in notebook blocks. We kept hitting this error until we separated notebook code from deployment code.
Block name mismatch: the deployment was pointing at a non-existent block name until we read the actual DAG metadata and identified the correct source block.
503 runtime errors: the Streamlit ranked table caused intermittent connection failures; resolved by adding numeric coercion guards and schema normalization before any widget rendering.
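
The label-leakage fix can be sketched as follows: a proxy function that takes only the ticker and filing date has no way to see the distress label. This is a hedged illustration, not the project's code. The names _unit_hash and market_proxies, the choice of SHA-256, and the [0, 1) scaling are all assumptions.

```python
import hashlib
from datetime import date


def _unit_hash(key: str) -> float:
    """Map a string to a reproducible float in [0, 1)."""
    # hashlib is used because Python's built-in hash() for strings is
    # salted per process and would give different values on every run.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:6], "big") / 2**48


def market_proxies(ticker: str, filing_date: date) -> dict:
    """Derive proxy features from ticker and filing date only.

    The signature has no label argument, so the distress label cannot
    leak into these features by construction.
    """
    seed = f"{ticker.upper()}|{filing_date.isoformat()}"
    return {
        "volatility_proxy": _unit_hash(seed + "|vol"),
        "price_decline_proxy": _unit_hash(seed + "|decline"),
        "volume_spike_proxy": _unit_hash(seed + "|volume"),
    }


svb = market_proxies("SIVB", date(2023, 2, 24))
```

Features built this way are deterministic placeholders: they carry no real market information, which is why real price and volume data would eventually need to replace them.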

Accomplishments we are proud of

Built a fully end-to-end pipeline from raw SEC filing to live interactive dashboard in one hackathon session
Successfully eliminated label leakage and confirmed multi-feature SHAP contributions across text and market features
Deployed a live public URL backed by a real notebook ML pipeline, not hardcoded data
The SVB case study shows a score of 58/100 from its 2023-02-24 filing, two weeks before regulators closed the bank on 2023-03-10: an encouraging real-world check, though it is a single case

What we learned

Zerve's notebook-to-deployment variable system is a powerful pattern but requires exact block naming discipline
When SHAP shows zero importance for nearly every feature, suspect label leakage into one of the inputs before blaming the model
Keeping the deployment file schema-tolerant by normalizing missing columns is essential for production stability
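
As an illustration of the schema-tolerance point, the sketch below fills in missing columns and coerces numeric ones before anything is rendered. The EXPECTED mapping, its defaults, and the function name normalize_schema are hypothetical and are not the project's real schema.

```python
import pandas as pd

# Hypothetical schema: column name -> (is_numeric, default value).
EXPECTED = {
    "ticker": (False, "UNKNOWN"),
    "distress_score": (True, 0.0),
    "hedge_density": (True, 0.0),
}


def normalize_schema(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with every expected column present and numerics coerced."""
    out = df.copy()
    for col, (is_numeric, default) in EXPECTED.items():
        if col not in out.columns:
            out[col] = default  # add a missing column filled with its default
        if is_numeric:
            # Unparseable values become NaN, then fall back to the default.
            out[col] = pd.to_numeric(out[col], errors="coerce").fillna(default)
        else:
            out[col] = out[col].fillna(default).astype(str)
    # Keep only the expected columns, in a fixed order.
    return out[list(EXPECTED)]


raw = pd.DataFrame({"ticker": ["SIVB", None], "distress_score": ["58", "n/a"]})
clean = normalize_schema(raw)
```

Running a step like this once, right after loading the dataframe, means the table and chart code downstream can assume fixed column names and numeric dtypes.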

What is next

Live ingestion for any public company ticker via real-time SEC EDGAR API
Expand to 500+ companies with quarterly re-scoring
Add peer-group comparison: flag when a company's language diverges from sector peers
Integrate financial ratio features alongside the NLP signals
Build an alert subscription system for risk analysts

Built With

cosine
finbert
hugging-face-transformers
pandas
plotly
python
scikit-learn
sec-edgar-api
shap
streamlit
tf-idf
xgboost
zerve

Updates

Deepak Jadhav started this project — Apr 29, 2026 01:59 PM EDT
