What it does
FinSight:
- Processes S&P 500 company filings and global regulations
- Splits documents into chunks and generates multilingual embeddings
- Summarizes and extracts structured financial, risk, and regulatory insights
- Builds semantic networks connecting companies and directives
- Provides interactive 3D visualizations to explore relationships, risks, and strategic impacts
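A minimal sketch of the chunking step, assuming fixed-size character windows with overlap. The writeup does not say exactly how FinSight splits documents, so `chunk_text`, `max_chars`, and `overlap` are hypothetical names and defaults. Counting characters rather than whitespace-separated words keeps the splitter usable for Chinese and Japanese text, which does not put spaces between words.

```python
def chunk_text(text: str, max_chars: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into overlapping windows of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in [0, max_chars)")
    step = max_chars - overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks
```

Keeping each chunk's position, for example with `enumerate(chunk_text(doc))`, lets results that finish out of order during parallel processing be put back in document order. A production version would more likely count model tokens than characters, since LLM context limits are expressed in tokens.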
How is it built
- Python for data processing and orchestration
- Pandas for CSV handling and chunked processing
- NumPy for embedding aggregation and vector math
- Amazon Bedrock models for summarization, structured extraction, and embeddings (Claude Sonnet for text, Amazon Titan for embeddings)
- NetworkX for building the semantic networks, and Plotly with other 3D visualization libraries for rendering them
- Parallel processing to handle large datasets efficiently, with incremental CSV saving for fault tolerance
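The parallel processing and incremental CSV saving described above could look roughly like the following sketch. It assumes I/O-bound model calls (so threads are sufficient), a hypothetical `summarize` function standing in for the real Bedrock request, and an output CSV keyed by a hypothetical `doc_id` column. It uses only the standard library rather than Pandas to stay self-contained.

```python
import csv
import os
from concurrent.futures import ThreadPoolExecutor, as_completed

FIELDS = ["doc_id", "summary"]  # hypothetical output columns


def load_done_ids(path: str) -> set[str]:
    """Return IDs already saved, so a restarted run can skip them."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="", encoding="utf-8") as f:
        return {row["doc_id"] for row in csv.DictReader(f)}


def summarize(doc_id: str, text: str) -> dict:
    # Hypothetical placeholder for a Bedrock model call.
    return {"doc_id": doc_id, "summary": text[:50]}


def process_all(docs: dict[str, str], out_path: str, workers: int = 4) -> int:
    """Summarize unprocessed docs in parallel, appending each result to a CSV."""
    done = load_done_ids(out_path)
    todo = {k: v for k, v in docs.items() if k not in done}
    needs_header = not os.path.exists(out_path) or os.path.getsize(out_path) == 0
    written = 0
    with open(out_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if needs_header:
            writer.writeheader()
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(summarize, k, v) for k, v in todo.items()]
            for fut in as_completed(futures):
                try:
                    row = fut.result()
                except Exception as exc:  # failed docs stay unsaved and are retried next run
                    print(f"skipping failed document: {exc}")
                    continue
                writer.writerow(row)  # only this thread writes, so no locking is needed
                f.flush()  # push each row out of Python's buffer so a crash loses only in-flight work
                written += 1
    return written
```

Because rows are appended and flushed one at a time, rerunning `process_all` after a crash skips every `doc_id` already in the file and retries only missing or failed documents. Writing from the main thread rather than from the workers avoids interleaved rows without any locking. For reading 1GB+ inputs, `pandas.read_csv(..., chunksize=...)` returns an iterator of DataFrames, which keeps memory bounded in the same spirit.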
Challenges faced
The biggest challenge was data ingestion: LLMs have strict content and context-window limits, which makes it difficult to process large filings and regulatory texts in a single request. Other challenges included:
- Handling multilingual embeddings (Chinese, Japanese, etc.) correctly
- Keeping memory usage manageable with 1GB+ CSVs
- Maintaining semantic order across chunked document embeddings
- Building interactive visualizations that stay readable with thousands of nodes and edges
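One way to keep the company-directive network readable is to keep only each company's strongest links. The sketch below assumes one aggregated embedding vector per company and per directive, connects them by cosine similarity, and keeps at most `k` edges per company above a `min_sim` threshold. The function names and both parameters are hypothetical; the writeup does not state which pruning rule FinSight uses.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def build_pruned_edges(
    companies: dict[str, list[float]],
    directives: dict[str, list[float]],
    k: int = 5,
    min_sim: float = 0.3,
) -> list[tuple[str, str, float]]:
    """Keep each company's k most similar directives with similarity >= min_sim."""
    edges = []
    for c_name, c_vec in companies.items():
        scored = [(cosine(c_vec, d_vec), d_name) for d_name, d_vec in directives.items()]
        scored = [s for s in scored if s[0] >= min_sim]  # drop weak links
        scored.sort(key=lambda s: s[0], reverse=True)  # strongest first
        for sim, d_name in scored[:k]:
            edges.append((c_name, d_name, sim))
    return edges
```

The resulting `(company, directive, weight)` triples can be loaded into NetworkX with `Graph.add_weighted_edges_from` before layout and plotting. Capping each company at `k` edges bounds the total edge count at `k` times the number of companies, which keeps a 3D plot with thousands of nodes from becoming an unreadable tangle.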
Accomplishments that we're proud of
- Successfully processed large, multilingual financial and regulatory datasets
- Built a robust structured data extraction pipeline that covers financial metrics, risks, mitigation strategies, competitive advantage, and compliance costs
- Developed interactive semantic network visualizations showing relationships between companies and regulations
- The pipeline is scalable, fault-tolerant, and fully parallelized
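A sketch of how the structured extraction output might be validated, assuming the model is prompted to reply with a single JSON object whose keys match the five categories listed above. The `FilingInsights` dataclass, its field names and types, and `parse_insights` are hypothetical; the writeup does not show FinSight's actual schema or prompt.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FilingInsights:
    """Hypothetical schema mirroring the extracted categories."""
    financial_metrics: dict = field(default_factory=dict)
    risks: list = field(default_factory=list)
    mitigation_strategies: list = field(default_factory=list)
    competitive_advantage: str = ""
    compliance_costs: str = ""


# Expected JSON type for each field; values of the wrong type are dropped.
EXPECTED_TYPES = {
    "financial_metrics": dict,
    "risks": list,
    "mitigation_strategies": list,
    "competitive_advantage": str,
    "compliance_costs": str,
}


def parse_insights(raw: str) -> Optional[FilingInsights]:
    """Parse the JSON object in a model reply; return None if none is usable."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        return None  # no JSON object in the reply
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None  # malformed JSON
    # Keep known keys whose values have the expected type; ignore everything else.
    clean = {
        k: v for k, v in data.items()
        if k in EXPECTED_TYPES and isinstance(v, EXPECTED_TYPES[k])
    }
    return FilingInsights(**clean)
```

Returning `None` instead of raising lets a batch job log and retry a bad reply while other documents continue. Dropping values of the wrong type means a field the model formats unexpectedly falls back to an empty default instead of propagating into later analysis.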
Lessons learned
- Preprocessing and chunking large documents is crucial for LLM-based extraction
- Embedding aggregation needs careful handling of empty or malformed data
- Semantic network analysis can reveal hidden patterns and regulatory impacts not obvious from raw data
- Incremental, memory-efficient processing is essential for large-scale datasets
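The lesson about empty or malformed data in embedding aggregation can be made concrete with a mean-pooling sketch. It assumes chunk embeddings have already been parsed from the CSV into Python lists and that all valid vectors share a known dimension `dim`. Mean pooling is one common aggregation choice, not necessarily the one FinSight uses, and `mean_embedding` is a hypothetical name.

```python
import math
from typing import Optional, Sequence


def mean_embedding(chunks: Sequence[object], dim: int) -> Optional[list[float]]:
    """Average chunk embeddings, skipping empty or malformed vectors."""
    valid = []
    for vec in chunks:
        if not isinstance(vec, (list, tuple)) or len(vec) != dim:
            continue  # missing, empty, or wrong-dimension embedding
        try:
            floats = [float(x) for x in vec]
        except (TypeError, ValueError):
            continue  # non-numeric entries
        if not all(math.isfinite(x) for x in floats):
            continue  # NaN or inf would poison the mean
        valid.append(floats)
    if not valid:
        return None  # no usable chunks; distinct from a real zero vector
    # Column-wise average across all valid vectors.
    return [sum(col) / len(valid) for col in zip(*valid)]
```

With NumPy, the final averaging step is `np.mean(np.vstack(valid), axis=0)`. The filtering in front of it is what matters: without it, a single wrong-length vector makes `np.vstack` raise an error, and a single NaN makes the whole document embedding NaN.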
What's next for FinSight
- Integrate company and regulatory structured data to build a quantitative model for market analysis.
- Extend coverage beyond S&P 500 to global companies and regulations.
- Add automated alerting for high-impact regulatory changes.
- Improve visualizations with time-series trends and predictive risk scoring.
- Enable real-time updates from live filings and directives.
- Enhance insight generation to produce actionable, data-driven recommendations for market strategies.
Built With
- aws-bedrock
- claude-haiku
- plotly
- python
- scikit-learn
- vector-embedding
- vibes