Each Swarm node contributes local storage, and any service running in the Swarm can mount the shared volume, providing it with state.
Our hack aggregates the local storage of each Docker Swarm node and provides a shared volume that can be mounted by any Docker service to make it stateful. As nodes are added or removed, the storage rebalances and scales automatically, maintaining availability and adding capacity as the Swarm grows.
Docker tools used:
- Docker Swarm 2.0
- Docker volume plugin
- Docker Hub
While stateless services have many advantages, such as being deterministic and scaling well, many services require some form of state. A major limitation of Docker right now is that it does not facilitate the creation of stateful services. A host's storage can be used; however, this means that for state to persist, a service must always run on the same host. A number of volume plugins try to solve this with varying success, but none currently takes full advantage of Swarm 2.0 and Docker Services.
As we are building a secure, decentralised filesystem, we saw this as an opportunity. Our solution is designed to provide a strongly consistent POSIX filesystem that scales linearly, which is ideal for large deployments where nodes come and go without warning, as may be the case in a Docker Swarm.
What it does
Infinit provides a simple way to add hyper-converged storage to your Docker Swarm. Hyper-converged storage means that as a compute node is added to a Swarm, Infinit aggregates its storage to the cluster. This storage can be used by any service running on the Swarm to provide a place to store common files or share state. As nodes join or leave the Swarm, Infinit automatically scales the storage and rebalances or replicates blocks to ensure high availability and redundancy of the data stored on the Infinit volumes.
How we built it
Docker integration for Infinit was built on top of our decentralised filesystem, which is detailed on our website. To make Infinit work with Docker, we wrote a daemon (in C++) that exposes Infinit as a Docker volume plugin. The daemon, our Hub ("beyond" internally, a service that facilitates storing and transferring configuration information) and an Infinit service for contributing a node's local storage to the cluster were packaged into containers so that they can be easily distributed.
Challenges we ran into
We encountered a mix of issues while developing our Docker Swarm integration:
- We require Docker privileged mode so that we can mount volumes on host machines. This is provided by Docker Engine but is not accessible when creating a service. We patched Docker v1.12.0-rc4 to work around this: https://github.com/infinit/docker/tree/feature/privileged-service. Another workaround would be to use a mount helper container, but this would need to be launched manually on each Swarm node.
- Discovery does not work between containers running the same service. Worked around by launching our Hub on the Swarm manager.
- Host Ubuntu kernel issue: mounting an Infinit volume from Docker causes "value too large for defined data type" error. We installed a kernel on each Swarm node which does not have this issue: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.5.7-yakkety/
- We require enabling shared mounts for our Docker volume plugin to work as a Docker service (see: mount --make-shared).
- Docker does not always query volume plugins to get the latest list available. If a volume was deleted outside of Docker (i.e. not with the docker volume rm command), Docker thinks that it still exists. We use a cache buster when creating volumes to get around this.
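The shared-mount requirement in the list above can be checked before starting the plugin service. The following is a minimal sketch, not part of the Infinit tooling: `is_shared_mount` is a hypothetical helper that reads the optional fields of `/proc/self/mountinfo` and looks for a `shared:` tag on the given mount point.

```shell
# Hypothetical helper (not from the Infinit project).
# Returns 0 if the mount point has shared propagation, 1 otherwise.
# The optional second argument is the mountinfo file to read.
is_shared_mount() {
  mount_point=$1
  mountinfo=${2:-/proc/self/mountinfo}
  awk -v mp="$mount_point" '
    # Field 5 is the mount point; optional fields start at field 7
    # and run until the "-" separator.
    $5 == mp {
      for (i = 7; i <= NF && $i != "-"; i++)
        if ($i ~ /^shared:/) found = 1
    }
    END { exit(found ? 0 : 1) }
  ' "$mountinfo"
}
```

A possible use on each Swarm node, assuming `/run` is itself a mount point, is `is_shared_mount /run || sudo mount --make-shared /run`.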
Accomplishments that we're proud of
We are proud to have created an easy-to-deploy, scalable foundation for building stateful services on top of Docker Swarm. We think that this will be extremely useful for a large part of the Docker community.
What we learned
The challenges we encountered taught us more about kernel and mount subtleties. By working on this project, we learned a lot about the power of having orchestration and services built into Docker. We see this as extremely powerful, especially when combined with the ability to create stateful services.
What's next for Infinit - Docker 1.12 Hackathon
We aim to publicly release the Docker volume plugin that we developed during this hackathon soon so that everyone can benefit from stateful Docker services!
We are also hard at work preparing our filesystem code to be open sourced. Check out our projects on GitHub.
Steps to reproduce the demo
Set up three machines running Ubuntu 16.04 along with:
- A kernel that does not have the bug described in the challenges section.
- Our patched version of Docker.
Make a Swarm from these three machines with one Swarm Manager.
Create an overlay network to be used by Infinit.
$> docker network create --driver overlay infinit
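If the demo is set up more than once, the network creation step can be made safe to repeat. This is a sketch only: `ensure_overlay_network` is a hypothetical wrapper, and it relies on `docker network inspect` returning a non-zero status when the network does not exist.

```shell
# Hypothetical wrapper: create the overlay network only if it is missing.
ensure_overlay_network() {
  name=$1
  if docker network inspect "$name" >/dev/null 2>&1; then
    echo "network $name already exists"
  else
    docker network create --driver overlay "$name"
  fi
}
```

On the Swarm manager this would be called as `ensure_overlay_network infinit`.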
Start the first Infinit service -- the Hub (internally called "beyond") -- on the Swarm manager. This allows Infinit nodes to share initial configuration and fetch endpoints so that they can connect directly. This Hub image is pre-configured with a docker user and an Infinit network named docker.
$> docker service create --name svc_beyond \
     --constraint 'node.role == manager' \
     --publish 80:8080 \
     --network infinit \
     mefyl/beyond-swarm --host 0.0.0.0 --port 8080
Once the Hub has been launched, start the second service, which contributes the local Swarm node's storage to the network. These images are pre-configured to contribute to the network docker using the user docker. The network is configured with a replication factor of three, which is maintained automatically as Swarm nodes come and go. Passing INFINIT_BEYOND in the environment allows the storage nodes to find the Hub using the overlay network. As the service is run globally, the storage from all three Swarm nodes will be aggregated.
$> docker service create --name svc_storage --mode global \
     --network infinit \
     --env INFINIT_BEYOND=svc_beyond:8080 \
     --env INFINIT_RDV= \
     mefyl/infinit-swarm
Now that the network has storage, run the final service, the Infinit daemon, which provides the Docker volume plugin. This allows Infinit volumes to be created and mounted on any node in the Swarm. As before, passing INFINIT_BEYOND allows the service to find the Hub. Passing INFINIT_USER tells the daemon which Infinit user to perform actions as. Mounting the host's /run and /usr/lib/docker/plugins directories into the container ensures that the daemon can automatically manage Infinit mounts and provide the Docker volume plugin. The daemon is run with an Infinit user name and password, and it uses these credentials to fetch all the configuration information it needs from the Hub.
$> docker service create --name svc_plugin --mode global \
     --network infinit \
     --env INFINIT_RDV= \
     --env INFINIT_BEYOND=svc_beyond:8080 \
     --env INFINIT_USER=docker \
     --mount source=/run,target=/run,readonly=0,type=bind,bind-propagation=shared \
     --mount type=bind,source=/usr/lib/docker/plugins,target=/usr/lib/docker/plugin \
     bearclaw/infinit:docker-demo2 \
     infinit-daemon --run --login-user docker:SECRET
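To confirm on a node that the daemon has registered the plugin, one option is to look for its discovery file. Docker discovers plugins through socket files in /run/docker/plugins and spec or JSON files in /etc/docker/plugins and /usr/lib/docker/plugins. The sketch below assumes the file is named after the driver (`infinit`), which the write-up does not state. `find_plugin_file` and the `PLUGIN_DIRS` override are hypothetical names.

```shell
# Hypothetical helper: print the first discovery file found for a plugin.
# PLUGIN_DIRS (space-separated) overrides the default search directories.
find_plugin_file() {
  name=$1
  dirs=${PLUGIN_DIRS:-/run/docker/plugins /etc/docker/plugins /usr/lib/docker/plugins}
  for dir in $dirs; do
    for ext in sock spec json; do
      if [ -e "$dir/$name.$ext" ]; then
        printf '%s\n' "$dir/$name.$ext"
        return 0
      fi
    done
  done
  return 1
}
```

Running `find_plugin_file infinit` on a Swarm node would print the path if such a file exists and return a non-zero status otherwise.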
Once the daemon is running, volumes can be created using Docker. The options passed when creating the volume tell the daemon to create the volume on the docker network we already have and to not use caching so that the changes to the volume are live.
$> docker volume create \
     --driver infinit \
     --name docker@cachebuster$(date +%s) \
     --opt network=docker \
     --opt nocache
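The cache-buster suffix in the volume name can be produced by a small helper. This is only a sketch: `cachebuster_name` is a hypothetical function, and the `<base>@cachebuster<epoch>` pattern is copied from the command above, not from a documented Infinit naming rule.

```shell
# Hypothetical helper: append the current epoch seconds to a base name,
# so that names generated in different seconds differ.
cachebuster_name() {
  base=$1
  printf '%s@cachebuster%s\n' "$base" "$(date +%s)"
}

# Example (commented out; requires the Infinit volume plugin to be running):
# docker volume create --driver infinit \
#   --name "$(cachebuster_name docker)" \
#   --opt network=docker --opt nocache
```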
To demonstrate that all the nodes are sharing the same volume, create a new global service that periodically writes the node's hostname along with a timestamp.
$> docker service create --name svc_test --mode global \
     --mount type=volume,source=docker/docker,target=/shared \
     alpine sh -c \
     'while true; do touch /shared/$(hostname)-$(date +%F-%T); sleep 5; done'
Mount the shared volume in another container on the Swarm to check the contents of the shared volume, which is mounted at /shared.
$> docker run --rm -it \
     --volume-driver infinit -v docker/docker:/shared \
     alpine sh
# ls /shared
This should show a list of files written by each of the nodes, with each node adding a new file every five seconds.
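Rather than reading the listing by eye, the distinct writers can be extracted from the file names. The sketch below assumes the `<hostname>-YYYY-MM-DD-HH:MM:SS` naming produced by the test service. `list_writers` is a hypothetical helper.

```shell
# Hypothetical helper: print each distinct hostname prefix found in the
# file names of the given directory, one per line.
list_writers() {
  dir=$1
  ls "$dir" |
    sed -E 's/-[0-9]{4}-[0-9]{2}-[0-9]{2}-[0-9]{2}:[0-9]{2}:[0-9]{2}$//' |
    sort -u
}
```

Inside the container, `list_writers /shared | wc -l` gives the number of distinct writers, which can be compared against the number of Swarm nodes.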