What it does

FinSight:

Processes S&P 500 company filings and global regulations

Splits documents into chunks and generates multilingual embeddings

Summarizes and extracts structured financial, risk, and regulatory insights

Builds semantic networks connecting companies and directives

Provides interactive 3D visualizations to explore relationships, risks, and strategic impacts
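
As a rough illustration of how a semantic network between companies and directives could be derived from embeddings, the sketch below links any pair whose cosine similarity clears a threshold. The function name `similarity_edges`, the ids, the toy vectors, and the threshold value are all assumptions made for this example; the write-up does not state how FinSight scores or selects its edges.

```python
import numpy as np


def similarity_edges(company_vecs, directive_vecs, threshold=0.5):
    """Return (company_id, directive_id, cosine_similarity) tuples above a threshold.

    Both arguments are dicts mapping an id to a 1-D embedding vector.
    Hypothetical helper, not FinSight's actual implementation.
    """
    edges = []
    for c_id, c_vec in company_vecs.items():
        c = np.asarray(c_vec, dtype=float)
        c_norm = np.linalg.norm(c)
        for d_id, d_vec in directive_vecs.items():
            d = np.asarray(d_vec, dtype=float)
            denom = c_norm * np.linalg.norm(d)
            if denom == 0.0:
                continue  # a zero vector has no defined cosine similarity
            sim = float(np.dot(c, d) / denom)
            if sim >= threshold:
                edges.append((c_id, d_id, sim))
    return edges


# Toy 2-D vectors standing in for real embeddings (illustrative only).
companies = {"ACME": [1.0, 0.0], "GLOBEX": [0.0, 1.0]}
directives = {"DIR-A": [0.9, 0.1], "DIR-B": [0.1, 0.9]}
edges = similarity_edges(companies, directives, threshold=0.8)
```

The resulting tuples have the shape that NetworkX's `Graph.add_weighted_edges_from` accepts, so they could be loaded into a graph directly. Raising the threshold is also one simple way to thin out a network that has too many edges to display.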

How it is built

  1. Python for data processing and orchestration
  2. Pandas for CSV handling and chunked processing
  3. NumPy for embedding aggregation and vector math
  4. Amazon Bedrock models: Claude Sonnet for summarization and structured extraction, Amazon Titan for embeddings
  5. NetworkX for building semantic networks, with Plotly and other 3D visualization libraries for rendering them
  6. Parallel processing to handle large datasets efficiently, with incremental CSV saving for fault tolerance
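
The chunked reading and incremental saving mentioned above might look roughly like the following sketch. It streams a CSV with pandas, applies a transform to each chunk, and appends the result to an output file, so work finished before a crash is already on disk. The function name `process_in_chunks`, the file names, the `text` column, and the toy transform are assumptions for illustration; parallelism and resume logic (skipping rows that were already processed) are left out, and the output file is assumed not to exist at the start.

```python
import os
import tempfile

import pandas as pd


def process_in_chunks(src_path, dst_path, transform, chunksize=1000):
    """Stream src_path in chunks, apply transform, append results to dst_path.

    Hypothetical helper: returns the number of rows written.
    """
    rows_written = 0
    for chunk in pd.read_csv(src_path, chunksize=chunksize):
        result = transform(chunk)
        # Write the header only when the output file does not exist yet.
        write_header = not os.path.exists(dst_path)
        result.to_csv(dst_path, mode="a", header=write_header, index=False)
        rows_written += len(result)
    return rows_written


def add_length(df):
    """Toy transform standing in for an LLM or embedding call."""
    return df.assign(length=df["text"].str.len())


# Tiny demonstration on temporary files (illustrative data only).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "filings.csv")
dst = os.path.join(workdir, "processed.csv")
pd.DataFrame({"text": ["a", "bb", "ccc"]}).to_csv(src, index=False)

rows = process_in_chunks(src, dst, add_length, chunksize=2)
processed = pd.read_csv(dst)
```

Only one chunk is held in memory at a time, which is what keeps this approach workable for files larger than available RAM.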

Challenges faced

The major challenge was data ingestion: LLMs have strict content and context limits, making it difficult to process large filings and regulatory texts in one go.

Handling multilingual embeddings (Chinese, Japanese, etc.) correctly.

Keeping memory usage low when processing CSV files larger than 1 GB.

Maintaining semantic order across chunked document embeddings.

Building interactive visualizations without clutter from thousands of nodes and edges.
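
To make the context-limit and ordering challenges concrete, here is a minimal sketch of splitting a long document into overlapping chunks that stay in document order. It counts characters as a stand-in for tokens, which is an assumption; the actual chunk size, overlap, and tokenization used in FinSight are not stated. The function name `chunk_text` and its defaults are hypothetical.

```python
def chunk_text(text, max_chars=2000, overlap=200):
    """Split text into overlapping chunks, returned in document order.

    Hypothetical helper: sizes are in characters, not model tokens.
    """
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("require max_chars > 0 and 0 <= overlap < max_chars")
    chunks = []
    step = max_chars - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


chunks = chunk_text("abcdefghij", max_chars=4, overlap=1)
# Pairing each chunk with its index keeps the original order recoverable
# even if chunks are later processed out of order.
indexed = list(enumerate(chunks))
```

Carrying the chunk index alongside each chunk is one way to restore the original sequence after parallel processing, before the per-chunk embeddings are combined.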

Accomplishments that we're proud of

Successfully processed large, multilingual financial and regulatory datasets

Built a robust structured data extraction pipeline that includes financial metrics, risks, mitigation strategies, competitive advantage, and compliance costs

Developed interactive semantic network visualizations showing relationships between companies and regulations

The pipeline is scalable, fault-tolerant, and fully parallelized

Lessons learned

Preprocessing and chunking large documents is crucial for LLM-based extraction

Embedding aggregation needs careful handling of empty or malformed data

Semantic network analysis can reveal hidden patterns and regulatory impacts not obvious from raw data

Incremental, memory-efficient processing is essential for large-scale datasets
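
The point about empty or malformed data during embedding aggregation can be sketched as follows. The example assumes chunk embeddings are stored as JSON strings (for instance in CSV cells) and that they are combined by mean pooling; both are assumptions, since the write-up does not describe the storage format or the aggregation method. The function name `aggregate_embeddings` is hypothetical.

```python
import json

import numpy as np


def aggregate_embeddings(raw_chunks, dim):
    """Mean-pool chunk embeddings, skipping empty or malformed entries.

    Hypothetical helper: returns None when no valid embedding is found.
    """
    valid = []
    for raw in raw_chunks:
        try:
            parsed = json.loads(raw) if isinstance(raw, str) else raw
            vec = np.asarray(parsed, dtype=float)
        except (TypeError, ValueError):
            continue  # unparseable JSON or non-numeric content
        if vec.shape != (dim,) or not np.all(np.isfinite(vec)):
            continue  # wrong dimensionality, missing value, NaN, or inf
        valid.append(vec)
    if not valid:
        return None
    return np.mean(valid, axis=0)


# Two valid vectors mixed with an empty cell, a missing value,
# and a vector of the wrong length (illustrative data only).
raw = ["[1.0, 3.0]", "", None, "[3.0, 5.0]", "[1.0]"]
pooled = aggregate_embeddings(raw, dim=2)
```

Returning `None` instead of a zero vector makes documents with no usable embeddings easy to detect downstream, instead of letting them silently appear as a point at the origin.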

What's next for FinSight

Integrate company and regulatory structured data to build a quantitative model for market analysis.

Extend coverage beyond S&P 500 to global companies and regulations.

Add automated alerting for high-impact regulatory changes.

Improve visualizations with time-series trends and predictive risk scoring.

Enable real-time updates from live filings and directives.

Enhance insight generation to produce actionable, data-driven recommendations for market strategies.

