Inspiration
Every knowledge-worker vertical wakes up to the same problem: fragmented sources, delayed information, unstructured data. An M&A banker scrapes SEC EDGAR + press wires by hand. A litigator stitches together SCOTUSblog + court dockets + DOJ press. A VC associate reads TechCrunch + Crunchbase + a dozen newsletters. Each spends 2–4 hours every morning rebuilding the same picture. Tier-1 tools (CapIQ, PitchBook, Westlaw) cost $20K+/seat per vertical and don't speak to each other.
The insight: this is one workflow problem, not three. We wrote a persona mapping — five personas × four industries — and saw that the architecture should be the product, with the industry being a configuration knob.
What it does
Dealflow is the morning intelligence platform for deal professionals, litigators, and venture investors. One ingestion pipeline, three verticals.
- M&A. 57 sources — SEC EDGAR (10 form types), press wires, law-firm tombstones, PE/IB news, AI search. XBRL financial enrichment, EV multiples, 3-year stock chart vs. S&P 500, DCF, and a 1,000-trial Monte Carlo.
- Legal. 6 sources — SCOTUSblog, Justia, ABA, Reuters Legal, DOJ press, UK CMA. Inline IRAC analysis (Issue · Rule · Application · Conclusion) generated per case, with named precedents and a downstream-impact section.
- VC. 6 sources — TechCrunch, Crunchbase, VentureBeat, Axios Pro Rata, plus AI-search queries. Lead-investor mapping, post-money tracking, comparable rounds, plausible-exit candidates.
A header pill switches between the three. Underneath, one ClickHouse instance, one scheduler, and one FastAPI service power all three.
How we built it
| Layer | Stack |
|---|---|
| Frontend | Next.js 16, React 19, Tailwind, Recharts, TanStack Table |
| Backend | FastAPI, GPT-4o for extraction + briefs + IRAC, APScheduler for daily 7am UTC crons |
| Crawler | One GenericSpider dispatching to 6 parsers (HTML, RSS, EDGAR JSON, NewsAPI, Serper, Nimble) |
| Stores | SQLite (system of record), Chroma (RAG), ClickHouse Cloud (analytics + event firehose) |
| Sponsors | Nimble (replaces Tavily for web search + chat grounding), ClickHouse (deals mirror + pipeline runs + source-yield events + API request log), Senso (GEO + brand layer) |
| Deploy | Railway (API + scheduler + persistent disk), Vercel (frontend), ClickHouse Cloud |
The key abstraction is `industry_config.py`: one registry mapping each vertical to a CSV path, a stage list (M&A: rumor → … → completed; Legal: filed → … → closed; VC: sourced → … → exited), an extraction prompt set, and a UI label set. The DB columns stay generic (`acquirer`, `target`, `deal_value`) and are relabeled per industry at the UI layer. 80% of the runtime is shared; only the prompts, sources, and labels fork.
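As a rough illustration of the registry idea, here is a minimal sketch. The class name `IndustryConfig`, the `relabel` helper, the CSV paths, the prompt-set keys, and the `legal` / `vc` vertical keys are all hypothetical; only the `ma` tag, the generic column names, the stage endpoints, and the Plaintiff / Defendant / Lead investor labels come from the write-up.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class IndustryConfig:
    """Everything that forks per vertical; the rest of the runtime is shared."""
    sources_csv: str                 # hypothetical path to the source list
    stages: List[str]                # ordered pipeline stages
    prompt_set: str                  # hypothetical key for the extraction prompts
    labels: Dict[str, str] = field(default_factory=dict)  # DB column -> display label


# Intermediate stages are elided in the write-up, so only the endpoints appear here.
REGISTRY: Dict[str, IndustryConfig] = {
    "ma": IndustryConfig(
        sources_csv="sources/ma.csv",
        stages=["rumor", "completed"],
        prompt_set="ma",
        # The generic columns are already M&A-flavoured, so no relabeling here.
    ),
    "legal": IndustryConfig(
        sources_csv="sources/legal.csv",
        stages=["filed", "closed"],
        prompt_set="legal",
        labels={"acquirer": "Plaintiff", "target": "Defendant"},
    ),
    "vc": IndustryConfig(
        sources_csv="sources/vc.csv",
        stages=["sourced", "exited"],
        prompt_set="vc",
        labels={"acquirer": "Lead investor"},
    ),
}


def relabel(row: Dict[str, Any], industry: str) -> Dict[str, Any]:
    """Rename generic DB columns to the vertical's display labels.

    Columns without a registered label keep their generic name.
    """
    labels = REGISTRY[industry].labels
    return {labels.get(column, column): value for column, value in row.items()}
```

With this shape, adding a vertical means adding one registry entry, and the stored row never changes: `relabel({"acquirer": "A", "target": "B"}, "legal")` only swaps the keys shown to the user.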
An LLM orchestration agent picks which M&A sources to crawl each run from yield telemetry. The priority score is
$$\text{priority} = \text{base} \times \big(0.4 \cdot \overline{\text{yield}} + 0.4 \cdot \text{yield}_{\text{last}} + 0.2 \cdot \text{recency}\big)$$
so high-signal sources get pulled more often, and drifting sources get flagged after 3 consecutive zero-yield runs.
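The score and the drift rule above can be sketched in a few lines. This is an illustration under assumptions, not the project's code: the `SourceStats` container, the function names, and the idea that yields and recency are already normalised to the range 0 to 1 are all hypothetical; the weights and the three-run threshold come from the text.

```python
from dataclasses import dataclass
from typing import List

ZERO_YIELD_LIMIT = 3  # consecutive zero-yield runs before a source is flagged


@dataclass
class SourceStats:
    base: float          # per-source base weight
    yields: List[float]  # yield per run, oldest first (assumed normalised to [0, 1])
    recency: float       # recency term (assumed normalised to [0, 1])


def priority(stats: SourceStats) -> float:
    """priority = base * (0.4 * mean yield + 0.4 * last yield + 0.2 * recency)."""
    if stats.yields:
        mean_yield = sum(stats.yields) / len(stats.yields)
        last_yield = stats.yields[-1]
    else:
        # A source with no history contributes only its recency term.
        mean_yield = last_yield = 0.0
    return stats.base * (0.4 * mean_yield + 0.4 * last_yield + 0.2 * stats.recency)


def is_drifting(stats: SourceStats) -> bool:
    """True when the most recent ZERO_YIELD_LIMIT runs all yielded nothing."""
    tail = stats.yields[-ZERO_YIELD_LIMIT:]
    return len(tail) == ZERO_YIELD_LIMIT and all(y == 0 for y in tail)
```

For a source with `base = 1.0`, yields `[0.5, 1.0]`, and `recency = 0.5`, the mean yield is 0.75, so the score is 0.4 · 0.75 + 0.4 · 1.0 + 0.2 · 0.5 = 0.8.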
Challenges we ran into
- The M&A → multi-industry pivot mid-hackathon. Required adding an `industry` column to SQLite and ClickHouse, namespacing dedup keys to prevent cross-vertical collisions, gating the SEC XBRL enricher to M&A only, and routing the LLM-orchestration agent (which is M&A-coded) to static-spider mode for the other verticals. ~3,200 net lines across 30+ files, additive only: the existing 121 M&A records were auto-tagged `industry='ma'` by a `DEFAULT 'ma'` migration and nothing regressed.
- `NEXT_PUBLIC_API_URL` baked into the wrong Railway service. The frontend was pointed at a dead service for a stretch; the surface symptom was every dashboard returning 404 on data load. Fixed via Vercel env update + force-rebuild.
- Vercel's GitHub auto-deploy + Railway's GitHub creds both broke at different times and had to be manually re-bootstrapped via the CLI.
- A `router.push('/?…')` bug in `DealFilters` sent every filter change back to the marketing landing page from any dashboard. Fix was a one-line `usePathname()` swap, but it took clicking 30 filters to spot.
- Python 3.9 still doesn't support `X | None` type hints at runtime, even with `from __future__ import annotations`. The import only defers annotation evaluation, so the union still raises wherever it is actually evaluated. Cost us one cycle of stack traces.
- iCloud-synced project dir kept creating Finder duplicates (`WorkflowDiagram 2.tsx`) that broke the Next.js typecheck. Cleaned up with one `find` + `rm`.
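The additive migration and the namespaced dedup keys from the first challenge might look roughly like the sketch below. It assumes a SQLite table named `deals` and a key built from the industry, acquirer, and target; the table name, both function names, and the key recipe are hypothetical. Only the `industry` column and its `DEFAULT 'ma'` come from the write-up.

```python
import hashlib
import sqlite3


def add_industry_column(conn: sqlite3.Connection) -> None:
    """Additive migration: existing rows pick up the default tag 'ma'."""
    conn.execute("ALTER TABLE deals ADD COLUMN industry TEXT NOT NULL DEFAULT 'ma'")


def dedup_key(industry: str, acquirer: str, target: str) -> str:
    """Prefix the key with the vertical so identical party names in two
    verticals never collide. Names are normalised before hashing."""
    raw = "|".join(part.strip().lower() for part in (industry, acquirer, target))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


# Demo against an in-memory database with one pre-migration record.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE deals (id INTEGER PRIMARY KEY, acquirer TEXT, target TEXT, deal_value REAL)"
)
conn.execute("INSERT INTO deals (acquirer, target, deal_value) VALUES ('Acme', 'Globex', 1.0)")
add_industry_column(conn)

# Rows written before the migration are now tagged 'ma' without being rewritten.
tags = [row[0] for row in conn.execute("SELECT industry FROM deals")]
```

Because the column is added with a non-null default, no backfill statement is needed, which is what keeps the change additive.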
What we learned
- Prompts are configuration, not code. The cost of adding a second vertical was a CSV + one prompt set — not a fork of the codebase. The third vertical took two hours total. A fourth (CRE) is essentially free.
- Right tool for the right write. SQLite for the typed system-of-record, Chroma for retrieval, ClickHouse for the firehose. Trying to make any one store do all three is a category error.
- Generic data, industry-aware UI. We never renamed the `acquirer` column in the database; Plaintiff / Defendant / Lead investor are display labels sourced from a registry. This kept the schema migration to a single additive column.
- The architecture is the demo. Showing the same engine power three visibly different products beats any architecture slide.
What's next
- Fourth vertical: Commercial Real Estate — REIT filings + property listings + CMBS data. ~1 day of work.
- Per-vertical enrichment partners — PACER for legal court dockets; Crunchbase API for VC cap-table depth.
- Watchlists + email alerts — schema exists; SMTP wiring is the gap.
- Self-serve source admin: `/admin` UI for adding CSV sources instead of file edits.
Built With
- apscheduler
- beautiful-soup
- chroma
- chromadb
- clickhouse
- fastapi
- github
- gpt
- gpt-4o
- huggingface
- javascript
- json
- langchain
- newsapi
- next.js
- nimble
- numpy
- openai
- pydantic
- python
- railway
- react
- react-markdown
- recharts
- scipy
- sec-edgar
- senso
- sentence-transformers
- sentry
- serper
- sql
- sqlite
- tailwind
- tailwind-css
- tanstack-table
- typescript
- vercel
- yfinance