Japan FinSight - Catalyst Intelligence Platform

Inspiration

In 2023 the Tokyo Stock Exchange pushed 3,800 companies to publish capital efficiency improvement plans, arguably one of the largest mandated corporate transformations in modern history.

The opportunity: An estimated 20-30% filed weak, vague plans. That leaves 760-1,140 companies as prime activist targets for the next 5-7 years.

The problem: Finding them requires analyzing thousands of Japanese regulatory filings across three siloed data sources. Doing that by hand does not scale, and I could not find a tool that does it systematically.

I built the platform to find activist targets before the activists do.

What it does

Catalyst Intelligence Platform combines three data sources, plus a natural language query layer, to identify companies under reform pressure:

1. Reform Pressure Ranking - Scores all 2,327 TSE companies (0-100)

2. Cross-Shareholding Network Analysis - Maps reciprocal ownership (A owns B, B owns A). When one company faces activist pressure to sell, its counterparty comes under pressure to sell as well, which creates cascade effects.

3. Large Shareholding Tracker - Monitors 5%+ ownership filings (EDINET Doc 350) to identify activist campaigns.

4. Natural Language SQL - Ask questions in plain English:

  • "Show me non-compliant companies with activist pressure"
  • "Find cross-shareholdings > 5% where both companies have PBR < 1.0"

The edge: Multi-catalyst detection. The platform finds companies facing activist pressure, a weak TSE response, and a cross-holding unwind before those catalysts are priced in.
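
As a rough illustration of multi-catalyst detection, the sketch below intersects three sets of company codes, one per catalyst. The set names, the helper function, and the EDINET codes are all made up for the example; the platform itself does this in SQL over its own tables.

```python
# Hypothetical signal sets keyed by EDINET code (codes are invented).
activist_pressure = {"E00001", "E00002", "E00003"}
weak_tse_response = {"E00002", "E00003", "E00004"}
cross_holding_unwind = {"E00002", "E00003", "E00005"}


def multi_catalyst(*signals: set[str]) -> list[str]:
    """Return the codes present in every signal set, sorted for stable output."""
    if not signals:
        return []
    return sorted(set.intersection(*signals))


# Companies that trip all three catalysts at once.
triple_threat = multi_catalyst(
    activist_pressure, weak_tse_response, cross_holding_unwind
)
print(triple_threat)
```

A company that shows up in only one or two sets is still interesting, but only the intersection counts as a multi-catalyst hit.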

How I built it

Data: 2,327 TSE companies, EDINET Doc 350 (activist filings), EDINET Doc 120 (shareholdings)

Pipeline:

  1. Custom EDINET scrapers extract Japanese PDFs
  2. Gemini 2.5 Flash extracts structured data (investor names, ownership %, company names)
  3. PostgreSQL stores normalized data (activist_filings, shareholdings, tse_reform_status, corporate_entity)
  4. Reform pressure algorithm weights multiple catalysts
  5. Cross-shareholding detector finds bidirectional relationships via SQL self-joins
  6. Claude Sonnet 4.5 converts natural language to SQL
  7. Flask dashboard renders results with English company names
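
Step 5 can be sketched as follows. This is a minimal example, not the production query: it uses an in-memory SQLite database as a stand-in for PostgreSQL, and the column names (holder_code, issuer_code, pct) and sample rows are assumptions for illustration. Only the shareholdings table name comes from the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE shareholdings (
        holder_code TEXT NOT NULL,  -- company that owns the shares
        issuer_code TEXT NOT NULL,  -- company whose shares are owned
        pct REAL NOT NULL           -- ownership percentage
    )
    """
)
conn.executemany(
    "INSERT INTO shareholdings VALUES (?, ?, ?)",
    [
        ("A", "B", 6.0),
        ("B", "A", 5.5),  # reciprocal with the row above
        ("A", "C", 3.0),  # one-way holding, no match
        ("C", "D", 4.0),
        ("D", "C", 2.0),  # reciprocal with the row above
    ],
)

# Self-join: match each holding with the holding that runs the other way.
# The holder_code < issuer_code filter keeps one row per pair, so A/B is
# not reported a second time as B/A.
RECIPROCAL_SQL = """
    SELECT s1.holder_code AS company_a,
           s1.issuer_code AS company_b,
           s1.pct AS a_owns_b_pct,
           s2.pct AS b_owns_a_pct
    FROM shareholdings AS s1
    JOIN shareholdings AS s2
      ON s1.holder_code = s2.issuer_code
     AND s1.issuer_code = s2.holder_code
    WHERE s1.holder_code < s1.issuer_code
    ORDER BY company_a, company_b
"""

pairs = conn.execute(RECIPROCAL_SQL).fetchall()
for row in pairs:
    print(row)
```

The ordering filter is what avoids double-counting; without it every reciprocal pair would appear twice.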

Tech: Python, Flask, PostgreSQL, Gemini API, Claude API, BeautifulSoup, Pydantic, SQLAlchemy

Key innovation: Multi-catalyst scoring. To my knowledge, this is the first platform to combine activist filings + TSE compliance + cross-holdings in one interface.
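
To make the scoring idea concrete, here is a small sketch of a weighted 0-100 score. The weights, the signal names, and the 0.0-1.0 input scale are placeholders chosen for illustration, not the platform's actual values.

```python
from dataclasses import dataclass

# Hypothetical weights per catalyst; the real weighting is not shown here.
WEIGHTS = {"activist": 0.40, "tse_response": 0.35, "cross_holding": 0.25}


@dataclass(frozen=True)
class CatalystSignals:
    """Each signal is assumed to be pre-normalized to the range 0.0-1.0."""

    activist: float       # strength of activist involvement
    tse_response: float   # weakness of the TSE reform plan
    cross_holding: float  # exposure to a cross-holding unwind


def _clamp(x: float) -> float:
    """Keep a signal inside 0.0-1.0 so one bad input cannot skew the score."""
    return min(max(x, 0.0), 1.0)


def reform_pressure_score(signals: CatalystSignals) -> float:
    """Weighted average of the clamped signals, scaled to 0-100."""
    weighted = sum(
        weight * _clamp(getattr(signals, name))
        for name, weight in WEIGHTS.items()
    )
    return round(100.0 * weighted / sum(WEIGHTS.values()), 1)


example = CatalystSignals(activist=0.5, tse_response=1.0, cross_holding=0.0)
score = reform_pressure_score(example)
print(score)
```

Dividing by the sum of the weights keeps the result on a 0-100 scale even if the weights are later retuned and no longer add up to 1.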

Challenges we ran into

  1. EDINET extraction complexity - Doc 350/120 have inconsistent formats. Solution: Pydantic schemas + LLM extraction with retry logic
  2. Company name matching - Same company has different names across datasets. Solution: EDINET code as primary key + English name enrichment
  3. Cross-shareholding networks - Detecting bidirectional relationships without double-counting. Solution: SQL self-joins + bidirectional matching
  4. Reform pressure scoring - No ground truth for "weak responder." Solution: Weighted scoring based on activist investment framework
  5. SQL injection risk - LLM-generated SQL runs against the database. Solution: Whitelist SELECT queries only + parameterized execution
  6. Data completeness - Extraction ongoing. Solution: Dashboard shows data quality status
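
The SELECT-only whitelist from item 5 could look roughly like the sketch below. The function names, the keyword list, and the table columns are assumptions for illustration, and SQLite stands in for PostgreSQL. A keyword check like this is deliberately conservative and works best alongside a read-only database role.

```python
import re
import sqlite3

# Keywords that should never appear in a read-only query. Conservative:
# a SELECT that merely mentions one of these words is rejected too.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant)\b",
    re.IGNORECASE,
)


def is_safe_select(sql: str) -> bool:
    """Accept a single statement that starts with SELECT and nothing else."""
    stmt = sql.strip().rstrip(";").strip()
    if ";" in stmt:  # more than one statement
        return False
    if not re.match(r"(?i)^select\b", stmt):
        return False
    return FORBIDDEN.search(stmt) is None


def run_query(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list:
    """Run generated SQL only if it passes the whitelist; bind values as parameters."""
    if not is_safe_select(sql):
        raise ValueError("only single SELECT statements are allowed")
    return conn.execute(sql, params).fetchall()


# Demo data with hypothetical columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tse_reform_status (edinet_code TEXT, pbr REAL)")
conn.executemany(
    "INSERT INTO tse_reform_status VALUES (?, ?)",
    [("E00001", 0.7), ("E00002", 1.4)],
)

rows = run_query(
    conn,
    "SELECT edinet_code FROM tse_reform_status WHERE pbr < ?",
    (1.0,),
)
print(rows)
```

User-supplied values such as the PBR threshold go in as bound parameters and are never spliced into the SQL string.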

Accomplishments that we're proud of

✅ Analyzed 2,327 TSE companies with reform pressure scores
✅ Cross-shareholding network detection with cascade risk analysis
✅ Natural language SQL generates correct queries from plain English
✅ Multi-catalyst detection finds triple-threat companies
✅ Production dashboard deployed at japanfinsight.com/catalyst-intelligence
✅ English name enrichment for international accessibility

What we learned

Multi-catalyst analysis is the edge - Single metrics (just PBR or just TSE compliance) are commodity. Real alpha comes from the intersection of 3+ catalysts.

LLM extraction works at scale - Gemini 2.5 Flash handles Japanese financial PDFs at a cost of $0.01-0.05 per document.

Natural language reduces friction - Financial analysts want "show me X with Y" not SQL joins. Example queries are critical for discovery.

Cross-shareholding unwinding is underappreciated - Cascade effects can quickly free up share registers where as much as 40% of the shares are locked in cross-holdings. Most platforms miss this entirely.

Domain expertise drives scoring - An activist investment framework drawn from industry panels informed our weighting algorithm, which in our experience beats naive single metrics.

What's next for Japan FinSight

Near-term (1-3 months):

  • Backfill Doc 350/120 extraction (target: 1,000+ filings, complete shareholding network)
  • Add TSE reform plan quality scoring via LLM
  • Saved queries + email alerts for new activist filings

Monetization:

  • Institutional: API access, custom dashboards, and alerts at $999/mo
  • Research partnerships: Co-invest with activist funds

Vision: Become the Bloomberg Terminal for Japanese corporate governance—the platform for identifying and tracking reform opportunities in Japan's $6 trillion equity market.
