Inspiration

The current CORTX integration with IPFS relies on the IPFS S3 data store plugin talking to the S3 object storage REST API implemented by the CORTX RADOS Gateway (RGW) server. This design is inefficient for several reasons. RGW implements a distributed object storage service oriented around objects and buckets, and exposes S3-compatible and Swift-compatible REST APIs to clients over HTTP.

Figure: Basic S3 server architecture. From Seagate | Meet the Architect – CORTX MOTR

IPFS, however, implements its own object storage scheme using content addressing, CIDs and Merkle DAGs to store structured hierarchical object data.

Figure: Storing a folder and files as a Merkle DAG. From Introducing Merkle DAGs.

Each node in an IPFS object graph is identified by a CID and persisted to storage as an immutable block using a simple pluggable key-value data store interface.

Figure: Storing IPFS data using key-value data stores. From: A Technical Guide to IPFS

Several implementations of this data store interface have been developed, including those using the LevelDB and BadgerDB key-value stores.

The on-disk storage and persistence layer in IPFS is thus optimized for simple, fast key-value stores, not full-blown distributed object storage solutions like RGW and S3. Since the RGW server uses its own storage scheme and ultimately talks to the CORTX Motr key-value store, using the RGW object storage service as a data store for IPFS CIDs involves multiple levels of redundancy. In addition, the IPFS S3 plugin only exposes configuration for a generic S3 server (bucket name, access key and so on), so there is no way for an IPFS server to expose or consume CORTX-specific configuration for data storage.

The best way for IPFS and Filecoin to take full advantage of the capabilities and scalability of CORTX would be to integrate IPFS directly with the Motr key-value store. This would remove the overhead of making HTTP REST calls and uploads for IPFS data storage and eliminate the need to run a CORTX RGW server node, improving performance and scalability and simplifying deployment of CORTX-integrated IPFS servers considerably.

What it does

go-ds-motr is an IPFS data store plugin implementation that uses the Go bindings to the CORTX Motr C API to store IPFS data directly in indexes in the Motr key-value store. This allows IPFS servers to use the full capabilities and scalability of CORTX, instead of relying on a generic S3 REST API and HTTP calls. When requested by the other IPFS subsystems, go-ds-motr stores and retrieves IPFS blocks from Motr through the native Motr client API, using Motr key ids derived from the IPFS CIDs. go-ds-motr can consume CORTX-specific configuration and parameters specified via the IPFS configuration file and can access the full range of native functionality exposed by the Motr client API.

In a simple cold-start benchmark, go-ds-motr added the same 93.83 MiB file in about 6.3 s, versus about 80.7 s for the S3 data store plugin, making it roughly 13 times faster in this run:

Adding a 93.83 MiB file to IPFS using the S3 data store from cold start:

[root@cortx-ova-rgw go-ds-motr]# time ../go-ipfs/cmd/ipfs/ipfs add "01 Track01.flac"                                                                                                           
added QmXUdQD5gHs483TCYFTEgFsve4J1sgfM4FGs9XLZzE3obv 01 Track01.flac                                                                                                                           
 93.83 MiB / 93.83 MiB [============================================================] 100.00%
real    1m20.728s                                                                                                                                                                              
user    0m0.437s                                                                                                                                                                               
sys     0m0.334s                                                                                                            

Adding a 93.83 MiB file to IPFS using go-ds-motr from cold start:

[root@venus go-ds-motr]# time ../go-ipfs/cmd/ipfs/ipfs add "01 Track01.flac"                                                                                                                   
added QmXUdQD5gHs483TCYFTEgFsve4J1sgfM4FGs9XLZzE3obv 01 Track01.flac                                                                                                                           
 93.83 MiB / 93.83 MiB [==============================================================================================================================================================] 100.00%
real    0m6.308s                                                                                                
user    0m0.615s                                                                                             
sys     0m0.117s

Installation

See the README on the project repo or documentation for the pull request to the CORTX main repo.

Benchmarking

You can run benchmark.sh from the go-ds-motr repo to get an idea of how performant the data store is.

How we built it

I used the Go bindings to the Motr C API to create 2 modules:

  • The CLI module provides interactive functions for testing connectivity to a Motr key-value store, creating indexes and other utility functions.
  • The motords module implements the IPFS data store plugin interface.

The IPFS data store interface consists of a set of functions such as Get, Put and Has that each data store must implement. Each function takes a unique key as input, which is relatively simple to translate to the Motr key-value API. I used the FNV-1 hash function to generate a 128-bit identifier for each IPFS CID key, and Go functions from the Motr mkv package to implement the corresponding data store functions.
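As a rough illustration of the key derivation step, here is a minimal sketch using the 128-bit FNV-1 hash from Go's standard hash/fnv package. The keyID type, the deriveKeyID function and the split into high and low 64-bit halves are assumptions made for this example and are not taken from the go-ds-motr source; the sample key is also made up.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// keyID is a hypothetical 128-bit identifier, held as two 64-bit halves.
// The real layout used by go-ds-motr may differ.
type keyID struct {
	Hi, Lo uint64
}

// deriveKeyID hashes an IPFS data store key with 128-bit FNV-1 and
// splits the 16-byte digest into its high and low 64-bit halves.
func deriveKeyID(key string) keyID {
	h := fnv.New128()    // FNV-1 (not FNV-1a), 128-bit
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	sum := h.Sum(nil)    // 16 bytes, big-endian
	return keyID{
		Hi: binary.BigEndian.Uint64(sum[:8]),
		Lo: binary.BigEndian.Uint64(sum[8:]),
	}
}

func main() {
	// The key below is a made-up example, not a real block key.
	id := deriveKeyID("/blocks/CIQEXAMPLE")
	fmt.Printf("%016x:%016x\n", id.Hi, id.Lo)
}
```

The same key always maps to the same identifier, so Get, Put and Has can recompute the id from the key on each call without keeping a separate mapping.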

The major challenge in building an IPFS data store plugin for Motr is that keys in an IPFS data store are hierarchical and can be queried. For example, an IPFS block key might look like /blocks/CIQFTFEEHEDF6KLBT32BFAGLXEZL4UWFNWM4LFTLMXQBCERZ6CMLX3Y.... and the IPFS data store must implement a Query function which can say 'find all keys that begin with /blocks/CIQ.. or /pins/..'. Motr, however, doesn't currently support any kind of native query or search facility. As best as I can tell, query functions in the current CORTX RGW S3 implementation using Motr are implemented by storing only the S3 metadata in a distributed object cache and querying the cache, synchronizing changes when needed.

So I used a similar approach: a LevelDB database sits alongside Motr and stores only the IPFS keys whose values are held in the Motr store, providing a query and search facility for those keys. Motr stores the data for each IPFS CID, while LevelDB acts as an independent metadata cache that supports querying on IPFS CIDs. Put and Delete operations write to both the Motr store and LevelDB, so LevelDB always records whether a block for a given CID exists in Motr. Since IPFS CIDs are immutable, the metadata needs no further synchronization once it has been written.
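The split between a value store and a key-only index can be sketched as follows. This is an illustration under stated assumptions, not the go-ds-motr code: the dualStore type and its methods are hypothetical, two in-memory maps stand in for the Motr index and the LevelDB database, and the order of the two writes is a choice made for this example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// dualStore is a hypothetical stand-in for the design described above.
// values plays the role of the Motr index (key -> block data);
// keys plays the role of the LevelDB database (keys only).
type dualStore struct {
	values map[string][]byte
	keys   map[string]struct{}
}

func newDualStore() *dualStore {
	return &dualStore{
		values: map[string][]byte{},
		keys:   map[string]struct{}{},
	}
}

// Put stores the block data, then records the key in the key index.
func (s *dualStore) Put(key string, value []byte) {
	s.values[key] = append([]byte(nil), value...) // copy to avoid aliasing
	s.keys[key] = struct{}{}
}

// Delete removes the key from the index, then removes the block data.
func (s *dualStore) Delete(key string) {
	delete(s.keys, key)
	delete(s.values, key)
}

// Get reads block data from the value store only.
func (s *dualStore) Get(key string) ([]byte, bool) {
	v, ok := s.values[key]
	return v, ok
}

// Query returns all keys with the given prefix in sorted order.
// It touches only the key index, never the block data.
func (s *dualStore) Query(prefix string) []string {
	var out []string
	for k := range s.keys {
		if strings.HasPrefix(k, prefix) {
			out = append(out, k)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	s := newDualStore()
	s.Put("/blocks/CIQAAA", []byte("block a"))
	s.Put("/blocks/CIQBBB", []byte("block b"))
	s.Put("/pins/state", []byte("pin"))

	fmt.Println(s.Query("/blocks/")) // prints [/blocks/CIQAAA /blocks/CIQBBB]
	s.Delete("/blocks/CIQAAA")
	fmt.Println(s.Query("/blocks/")) // prints [/blocks/CIQBBB]
}
```

A real LevelDB index keeps its keys sorted, so a prefix query there could be a range scan rather than the full pass over a map used in this sketch.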

This approach should not affect performance much, as no IPFS data is actually stored in or retrieved from LevelDB. It does, however, reduce reliability considerably, as the LevelDB CID index is file-based. Future implementations will use a more robust approach, following what the RGW S3 implementation does.

Challenges we ran into

The main challenge I ran into was setting up a Motr environment and building Motr: things didn't always go according to the instructions or docs. You have to read the scripts, see what they are doing, and figure out how to get them to progress to the next step. In open source, docs tend to lag behind code and scripts.

Accomplishments that we're proud of

This was my first significant Go project. I was glad I was able to learn Go and contribute code to the IPFS project.

What we learned

I learned a lot about Motr and interfacing with it via its native API. I learned Go and how to develop and build plugins for the go-ipfs server. I also learned some sysadmin things for Rocky Linux / CentOS.

What's next for go-ds-motr

More testing, benchmarking.

Built With

  • cortx
  • go
  • ipfs