We believe in a systematic, data-driven approach to understanding exponentially growing blockchain data. We want to democratize the ability to make sense of on-chain data by giving project builders, traders, collectors, and the rest of the community a source of truth and an end-to-end data infrastructure. We were inspired to build a powerful platform that can handle the data volumes of a high-throughput blockchain and provide high-quality data for signal discovery and secondary data development.
What it does
Sophon is a one-stop, community-driven data platform for Solana that supports arbitrary live queries, real-time complex analysis, and insight extraction.
How we built it
- Data Ingestion: we ingest on-chain data from Solana into our database.
- Data ETL: raw data is processed with Spark, in jobs orchestrated by Airflow.
- Data Discovery: we ingest the metadata into DataHub.
- Data Analytics: we store the data in Parquet format, ready to serve analytical queries. We also provide a platform for building intermediate tables.
- Data Dashboard: analyze and visualize data.
- GraphQL: gives developers a straightforward way to design and develop queries, with the ability to dictate exactly what they need from the server and receive that data in a predictable way.
- Data Platform Orchestration: we use Terraform and Kubernetes to orchestrate the platform.
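The ingest, transform, and store stages listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the record fields (`signature`, `slot`, `block_time`, `fee`), the sample values, and the function names are hypothetical, plain functions stand in for the Airflow tasks and Spark jobs, and an in-memory dictionary keyed by day stands in for day-partitioned Parquet files.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical raw records; the field names and values are made up for
# illustration and are not Sophon's actual schema.
RAW_RECORDS = [
    {"signature": "sigA", "slot": 100, "block_time": 1631577600, "fee": 5000},
    {"signature": "sigB", "slot": 100, "block_time": 1631577601, "fee": 5000},
    {"signature": "sigC", "slot": 101, "block_time": 1631664000, "fee": 10000},
]


def ingest():
    """Stand-in for the ingestion stage: return a copy of the raw records."""
    return list(RAW_RECORDS)


def transform(records):
    """Stand-in for the ETL stage: derive a UTC day column for partitioning."""
    transformed = []
    for record in records:
        day = datetime.fromtimestamp(
            record["block_time"], tz=timezone.utc
        ).strftime("%Y-%m-%d")
        transformed.append({**record, "day": day})
    return transformed


def store(records):
    """Stand-in for the analytics store: group records by day partition."""
    partitions = defaultdict(list)
    for record in records:
        partitions[record["day"]].append(record)
    return dict(partitions)


def run_pipeline():
    """Chain the three stages in order, as an orchestrator would."""
    return store(transform(ingest()))


partitions = run_pipeline()
for day in sorted(partitions):
    print(day, len(partitions[day]), "records")
```

Partitioning by day is one common choice for analytical workloads, since most queries filter on a time range; the real platform may partition differently.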
Challenges we ran into
- Scalability: Solana produces data at a scale of about 10k transactions per second, which amounts to 250 GB of processed data per day. Initially we had problems supporting this volume, so we built a platform that ingested and served 10 TB of data (9/14-10/15).
- For comparison, Dune Analytics uses Postgres to orchestrate and store its data, and Postgres could serve up to about 1 TB of data in total before falling apart. The Sophon platform, by contrast, scales horizontally as data volume grows.
- Data quality: Initially we had missing and duplicate data. We added data quality checks to improve observability and identify data gaps, and we built an Airflow pipeline to fix those gaps systematically.
- Robustness: any component may become slow or unresponsive, and debugging to find the root cause is nontrivial in a distributed environment.
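To make the data quality challenge concrete, here is a minimal sketch of the two kinds of checks mentioned above: finding duplicate records and finding gaps. It assumes transactions are identified by a signature string and blocks by an integer slot number; the function names are hypothetical and this is not Sophon's actual check. Note that on Solana a slot can legitimately produce no block, so a gap reported here is only a candidate that would still need to be confirmed against the chain.

```python
from collections import Counter


def find_duplicates(signatures):
    """Return the signatures that appear more than once, sorted."""
    counts = Counter(signatures)
    return sorted(sig for sig, n in counts.items() if n > 1)


def find_missing_slots(slots):
    """Return slot numbers absent between the lowest and highest ingested slot.

    Assumption: a gap is only a *candidate* for missing data, because some
    slots on chain contain no block at all.
    """
    seen = set(slots)
    if not seen:
        return []
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]


# Example with made-up values.
duplicates = find_duplicates(["sigA", "sigB", "sigA", "sigC"])
missing = find_missing_slots([100, 101, 104, 105])
print("duplicate signatures:", duplicates)
print("candidate missing slots:", missing)
```

A backfill pipeline could take the candidate slots as its work list and re-ingest only those ranges, rather than reprocessing everything.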
Accomplishments that we're proud of
- Sophon is the only platform able to handle Solana's data volume and its growing throughput.
- Sophon provides arbitrarily complex analytical queries, dashboard creation, and GraphQL indexing of enriched on-chain data.
- Sophon scales horizontally, enabling on-demand data streaming, real-time analytics, and anomaly detection.
What we learned
- How to crawl a data volume of 250 GB per day
- How to properly ingest and process large data volumes
- How to increase observability and ensure high data quality
- Platform orchestration
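As a rough sense of what these volumes mean for an ingestion pipeline, the figures quoted in this writeup (250 GB per day, 10k transactions per second) can be turned into a sustained throughput. This is back-of-the-envelope arithmetic only: it assumes decimal gigabytes (1 GB = 10^9 bytes) and treats 10k transactions per second as a sustained average, neither of which the figures above confirm.

```python
# Back-of-the-envelope throughput from the figures quoted in this writeup.
# Assumptions: decimal units (1 GB = 1e9 bytes) and a sustained average rate.
GB = 1e9
SECONDS_PER_DAY = 24 * 60 * 60

daily_volume_bytes = 250 * GB
transactions_per_second = 10_000

# Average write rate the pipeline must sustain around the clock.
bytes_per_second = daily_volume_bytes / SECONDS_PER_DAY

# Implied average size of one processed transaction record.
bytes_per_transaction = bytes_per_second / transactions_per_second

print(f"sustained rate: {bytes_per_second / 1e6:.2f} MB/s")
print(f"average processed size: {bytes_per_transaction:.0f} bytes per transaction")
```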
What's next for Sophon
- Further enrich the data and generate materialized tables for project-specific and account-specific insights.
- Partner with existing and emerging Solana projects to help build templates and guidelines for essential insights.
- Enable trading data for DeFi projects to allow quantitative strategy development, anomaly detection, and backtesting.
- Add stats and dashboards for gaming and NFT insights.
- Support secondary data development and machine learning capabilities.