FileDAG Storage V2


Inspiration

When files or objects are organized in a Merkle-DAG structure, multiple files or objects may share some data blocks. This has obvious benefits: it reduces data redundancy, especially in multi-version systems, and saves bandwidth when data is transmitted over the network. However, it also makes data management more complex and raises some foreseeable challenges, including:

- The management module for files or objects needs to be abstracted on top of the Merkle-DAG.
- It is nearly impossible to delete files or objects directly; data blocks that are no longer needed can only be released through garbage collection.
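
The block sharing described above can be illustrated with a minimal sketch. It assumes a flat list of SHA-256 block hashes per file rather than a real Merkle-DAG with linked nodes and CIDs, and the names `BlockStore`, `put_file`, and `BLOCK_SIZE` are hypothetical, not part of FileDAG Storage or IPFS.

```python
import hashlib

# Tiny chunk size chosen for illustration only (assumption);
# real systems use much larger blocks.
BLOCK_SIZE = 4


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def block_id(block: bytes) -> str:
    """Content address of a block: the hex SHA-256 digest of its bytes."""
    return hashlib.sha256(block).hexdigest()


class BlockStore:
    """Hypothetical content-addressed store: identical blocks are kept once."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def put_file(self, data: bytes) -> list[str]:
        """Store a file and return the ordered list of its block IDs."""
        ids = []
        for block in split_blocks(data):
            bid = block_id(block)
            self.blocks.setdefault(bid, block)  # no-op if already stored
            ids.append(bid)
        return ids

    def get_file(self, ids: list[str]) -> bytes:
        """Reassemble a file from its block IDs."""
        return b"".join(self.blocks[i] for i in ids)


store = BlockStore()
v1 = store.put_file(b"AAAABBBBCCCC")  # version 1 of a file
v2 = store.put_file(b"AAAABBBBDDDD")  # version 2 changes only the last block

# Six blocks were written, but only four distinct blocks are stored.
print(len(store.blocks))
```

Because the two versions share their first two blocks, deleting one version cannot simply delete its blocks, which is the management problem the list above points to.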

Storing data in a multi-user mode is even more complicated, since the current open-source implementation of the IPFS protocol does not support multi-user usage. Supporting multiple users and maintaining a larger DAG pool amplifies the benefits of reduced data redundancy and saved network bandwidth, but it also makes data management harder.

For commercial decentralized storage services, it is necessary to ensure data reliability, availability, and fault tolerance, and to meet the requirements of different application scenarios.

What it does

FileDAG Storage is a distributed storage service built on the IPFS technology stack, focusing more on data management, reliability, availability, fault tolerance, and clustering storage nodes.

To store large volumes of data on the Filecoin network, especially PiB-scale data, it is unavoidable to use the IPFS technology stack and build related tools, services, and infrastructure. The original IPFS implementation solves the problem of how individual users can use IPFS for content distribution and data sharing, but at relatively small data capacities. Using IPFS to store and distribute massive data raises many challenges, not limited to those mentioned above. FileDAG Storage aims to provide technical solutions that support data management, preprocessing, and transmission for clients and storage providers, as well as efficient data retrieval and availability for users.

How we built it

FileDAG Storage is built on the IPFS stack and a series of related technologies, and aims to meet the needs of real application scenarios in the Web3 world.

Challenges we ran into

FileDAG Storage aims to provide a solution to the challenges above, including:

- learn from object storage services how to manage data
- handle the release of data blocks through reference counting
- use distributed storage nodes to provide data availability, and use erasure coding to improve data reliability and fault tolerance
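
As an illustration of releasing blocks by reference counting, here is a minimal sketch. It assumes one counter per block ID that is incremented whenever an object references the block. `RefCountedPool`, `add_ref`, and `release` are hypothetical names and do not reflect the project's actual API.

```python
class RefCountedPool:
    """Hypothetical block pool that frees a block when its last reference is released."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}  # block ID -> block data
        self.refs: dict[str, int] = {}      # block ID -> number of references

    def add_ref(self, block_id: str, data: bytes) -> None:
        """Store the block if it is new and count one more reference to it."""
        self.blocks.setdefault(block_id, data)
        self.refs[block_id] = self.refs.get(block_id, 0) + 1

    def release(self, block_id: str) -> bool:
        """Drop one reference; return True if the block was actually freed."""
        remaining = self.refs[block_id] - 1  # KeyError for unknown blocks
        if remaining > 0:
            self.refs[block_id] = remaining
            return False
        del self.refs[block_id]
        del self.blocks[block_id]
        return True


pool = RefCountedPool()
pool.add_ref("b1", b"shared")      # referenced by object A
pool.add_ref("b1", b"shared")      # referenced by object B
freed_first = pool.release("b1")   # object A deleted: block must stay
freed_second = pool.release("b1")  # object B deleted: block can be freed
```

In this sketch, deleting an object only decrements counters, and the storage is reclaimed once no object references the block.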

Accomplishments that we're proud of

Milestone 1

Goal: build the fundamental data structures and the overall architecture of this project

Description:

Development of a single DAG Node:
- supports API of the block store
- provides basic storage service for DAG Pool
DAG Pool:
- multi-user access
- authentication mechanism
Object store:
- implements basic data structures, such as user, region, bucket, and object
- implements the user authentication API

Milestone 2 (nearly completed)

Goal: implement data management

Description:

DAG Pool:
- reference records of data blocks
- data pinning strategy
- interruptible garbage collection mechanism of DAG Pool
Object store:
- API of bucket-related operations
- API of object manipulation
- API of permission operation
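
To make the DAG Pool items above more concrete, the following sketch shows one way an interruptible garbage collection pass could combine reference records and pins. It assumes that a block is garbage when its reference count is zero and it is not pinned, and that interruption is signalled through a `threading.Event`. The function `collect_garbage` and its parameters are hypothetical and are not taken from the project's code.

```python
import threading


def collect_garbage(
    blocks: dict[str, bytes],
    ref_counts: dict[str, int],
    pinned: set[str],
    stop: threading.Event,
) -> list[str]:
    """Remove unreferenced, unpinned blocks until done or asked to stop.

    Returns the IDs removed in this pass. A later call simply continues
    with whatever garbage is still left, so an interrupted pass is safe.
    """
    removed = []
    for bid in sorted(blocks):  # sorted() takes a snapshot of the keys
        if stop.is_set():       # checked before every block: interruptible
            break
        if ref_counts.get(bid, 0) == 0 and bid not in pinned:
            del blocks[bid]
            removed.append(bid)
    return removed


blocks = {"a": b"1", "b": b"2", "c": b"3", "d": b"4"}
ref_counts = {"a": 1}  # "a" is still referenced by an object
pinned = {"b"}         # "b" is pinned and must be kept
stop = threading.Event()

stop.set()             # simulate an interruption request
interrupted = collect_garbage(blocks, ref_counts, pinned, stop)

stop.clear()           # resume later
resumed = collect_garbage(blocks, ref_counts, pinned, stop)
```

Checking the stop signal between blocks keeps each step small, so the pool can pause collection when it needs to serve requests and pick it up again later.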