Intro to Docker Swarm: Part 1 - Overview

What is Docker Swarm?

Docker Swarm [1] is a utility for creating a cluster of Docker hosts that can be interacted with as if it were a single host. I was introduced to it a few days before it was announced at DockerCon EU [2], at the Docker Global Hack Day [3] that I participated in at work. During the introduction to the hack day, a few really cool new technologies were announced [4], including Docker Swarm, Docker Machine, and Docker Compose. Since Ansible fills the role of Machine and Compose, Swarm stuck out as particularly interesting to me.

Docker Global Hack Day 2014

Victor Vieux and Andrea Luzzardi announced the concept and demonstrated the basic workings of Swarm during the intros, and made a statement that I found very interesting. They said that although the POC (proof of concept) was functional enough to demo, they were going to throw away all of that code and start from scratch. I thought that was great, and I try to keep it in mind when POC’ing a new technology.

The daemon is written in Go and, at this point in time (latest commit a0901ce8d6), is definitely alpha software. Things are moving at a very rapid pace, and the functionality and feature set vary almost daily. That being said, @vieux is extremely responsive with adding functionality and fixing bugs via GitHub Issues [5]. I would not recommend using it in production yet, but it is a very promising technology.

How does it work?

Interacting with and operating Swarm is (by design) very similar to dealing with a single Docker host. This allows interoperability with existing toolchains without too many modifications (the major one being splitting builds off of the Swarm cluster). Swarm is a daemon that runs on a Linux machine, bound to a network interface on the same port that a standalone Docker instance would use (http/2375 or https/2376). The Swarm daemon accepts connections from the standard Docker client (>=1.4.0) and proxies them back to the Docker daemons configured behind Swarm, which are also listening on the standard Docker ports. It can distribute the create commands based on a few different packing algorithms in combination with the tags (labels) that the Docker daemons have been started with. This makes it extremely simple to create a partitioned cluster of heterogeneous Docker hosts that is exposed as a single Docker endpoint.

Interacting with Swarm is ‘more or less’ the same as interacting with a single non-clustered Docker instance, but there are a few caveats. There is not 1:1 support for all Docker commands. This is due to both architectural and time-based reasons. Some commands are just not implemented yet, and I would imagine some might never be. Right now almost everything needed for running containers is available, including (amongst others):

  • docker run
  • docker create
  • docker inspect
  • docker kill
  • docker logs
  • docker start

This subset is the essential part of what is needed to begin playing with the tool at runtime. Here is an overview of how the technologies are used in the most basic configuration:

  • The Docker hosts are brought up with --label key=value listening on the network.
  • The Swarm daemon is brought up and pointed at a file containing a list of the Docker hosts that make up the cluster as well as the ports they are listening on.
  • Swarm reaches out to each of the Docker hosts and determines their tags, health, and amount of resources in order to maintain a list of the backends and their metadata.
  • The client interacts with Swarm via its network port (2375). You interact with Swarm the same way you would with Docker: create, destroy, run, attach, and get logs of running containers, amongst other things.
  • When a command is issued to Swarm, Swarm:
    • decides where to route the command based on the provided constraint tags, the health of the backends, and the scheduling algorithm.
    • executes the command against the proper Docker daemon
    • returns the result in the same format as Docker does
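
As a rough sketch of the second step above, the host list for file-based discovery can be a plain text file with one host:port entry per line. The IP addresses, the port, and the scratch file used below are assumptions for illustration, not values that Swarm requires:

```shell
#!/bin/bash
# Sketch: build a host list for Swarm's file-based discovery.
# ASSUMPTIONS: the IPs, the port, and the file location are illustrative only.

cluster_file="$(mktemp)"          # scratch file standing in for a real config path
docker_hosts="10.100.199.201 10.100.199.202 10.100.199.203"
docker_port=2375                  # plain HTTP port; a TLS setup would use 2376

# One "ip:port" entry per line
for ip in $docker_hosts; do
  printf '%s:%s\n' "$ip" "$docker_port"
done > "$cluster_file"

cat "$cluster_file"
```

The Swarm daemon would then be started with its discovery option pointing at that file (a file:// URL), listening on the port that clients expect a Docker daemon to use.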

Basic Docker Swarm Diagram

The Swarm daemon itself is only a scheduler and a router. It does not actually run the containers itself, meaning that if Swarm goes down, the containers it has provisioned are still up on the backend Docker hosts. In addition, since it doesn’t handle any of the network routing (network connections need to be routed directly to the backend Docker host), running containers will still be available even if the Swarm daemon dies. When Swarm recovers from such a crash, it is able to query the backends in order to rebuild its list of metadata.

Due to the design of Swarm, interaction with Swarm for all runtime activities is just about the same as it would be for any other Docker daemon, whether through the Docker client, docker-py, the docker-api gem, etc. Build commands have not yet been figured out, but you can get by for runtime today. Unfortunately, at this exact time Ansible does not seem to work with Swarm in TLS mode [6], though the issue appears to affect the Docker daemon itself, not just Swarm.

This concludes the 1st post regarding Docker Swarm. I apologize for the lack of technical detail, but it will be coming in subsequent posts in the form of architectures, snippets, and some hands-on activities :) Look out for Part 2: Docker Swarm Configuration Options and Requirements coming soon!

All of the research behind these blog posts was made possible due to the awesome company I work for: Rally Software in Boulder, CO. We get at least 1 hack week per quarter and it enables us to hack on awesome things like Docker Swarm. If you would like to cut to the chase and directly start playing with a Vagrant example, here is the repo that is the output of my Q1 2014 hack week efforts:

  1. https://github.com/docker/swarm 

  2. http://blog.docker.com/2015/01/dockercon-eu-introducing-docker-swarm/ 

  3. http://www.meetup.com/Docker-meetups/events/148163592/ 

  4. http://blog.docker.com/2014/12/announcing-docker-machine-swarm-and-compose-for-orchestrating-distributed-apps/ 

  5. https://github.com/docker/swarm/issues 

  6. https://github.com/ansible/ansible/issues/10032 

Docker [1] has one of the gentlest learning curves of any new technology to enter the mainstream in a long time. A developer can get up and running in a very short amount of time [2] and begin realizing value almost immediately, but the hard part comes when trying to secure the new technology for use in a production-like environment. Production has a much higher standard when it comes to availability, security, and repeatability. This can lead to problems, as the differences between the development and production environments are both:

  • Fairly complex to replicate in a secure way: It is not feasible to pass around the private key for your production certificate in the name of development environment automation. On the other hand, generating a full CA and all of the certificates and keys required can be daunting.
  • Functionally quite a large delta: The difference between an insecure, non-TLS environment and a TLS one can be significant. For instance, at this exact point in time it appears that Ansible does not yet support a TLS-enabled Docker host [3].

In order to attack this problem, we should attempt to replicate the prod environment when it is feasible, and especially if it is easy and cheap. To this end, let’s create the full certificate chain needed to run a secure Docker Swarm [4] cluster. I think you will find that it is both easy and cheap :)

The script

This is a bash script I used [5] that writes all of its output into the directory in which it is run. It accomplishes the following things:

  • Creates a Certificate Authority
  • Creates a cert/key for Docker Swarm (supporting both client and server auth)
  • Generates 3 certificates for the individual Docker hosts with SAN IPs

A custom OpenSSL config is required because we need to add a SAN (Subject Alternative Name) IP address entry to the certificate and CSR. Without it, Swarm will spit out the following error:

ERRO[0282] Get https://10.100.199.201:2376/v1.15/info: x509: cannot validate certificate for 10.100.199.201 because it doesn't contain any IP SANs

Disclaimer: I am no SSL wizard and so some of the settings in the openssl.cnf may be insecure, not needed, or even both. In addition, you can see that this is totally insecure as

  • the password is in the script
  • the passwords are removed from the keys
  • many other reasons

Please don’t use this exact script or the generated certs for production use!

gen_ssl.sh

#!/bin/bash

export OPENSSL_CONF=openssl.cnf


echo 'Creating CA (ca-key.pem, ca.pem)'
echo 01 > ca.srl
openssl genrsa -des3 -passout pass:password -out ca-key.pem 2048
openssl req -new -passin pass:password \
        -subj '/CN=Non-Prod Test CA/C=US' \
        -x509 -days 365 -key ca-key.pem -out ca.pem


echo 'Creating Swarm certificates (swarm-key.pem, swarm-cert.pem)'
openssl genrsa -des3 -passout pass:password -out swarm-key.pem 2048
openssl req -passin pass:password -subj '/CN=dockerswarm01' -new -key swarm-key.pem -out swarm-client.csr
echo 'extendedKeyUsage = clientAuth,serverAuth' > extfile.cnf
openssl x509 -passin pass:password -req -days 365 -in swarm-client.csr -CA ca.pem -CAkey ca-key.pem -out swarm-cert.pem -extfile extfile.cnf
openssl rsa -passin pass:password -in swarm-key.pem -out swarm-key.pem

# Set the default keys to be Swarm (key.pem and cert.pem are the names the
# Docker client looks for in DOCKER_CERT_PATH)
cp -p swarm-key.pem key.pem
cp -p swarm-cert.pem cert.pem

echo 'Creating host certificates (dockerhost01-3-key.pem, dockerhost01-3-cert.pem)'
for host in dockerhost01 dockerhost02 dockerhost03; do
  # Generate an encrypted key, create a CSR, and sign it with the CA using
  # the SAN extensions defined in openssl.cnf
  openssl genrsa -passout pass:password -des3 -out "${host}-key.pem" 2048
  openssl req -passin pass:password -subj "/CN=${host}" -new -key "${host}-key.pem" -out "${host}.csr"
  openssl x509 -passin pass:password -req -days 365 -in "${host}.csr" -CA ca.pem -CAkey ca-key.pem -out "${host}-cert.pem" -extfile openssl.cnf
  # Remove the passphrase from the key
  openssl rsa -passin pass:password -in "${host}-key.pem" -out "${host}-key.pem"
done

# We don't need the CSRs once the cert has been generated
rm -f *.csr

openssl.cnf

#
# OpenSSL example configuration file.
# This is mostly being used for generation of certificate requests.
#

# This definition stops the following lines choking if HOME isn't
# defined.
HOME			= .
RANDFILE		= $ENV::HOME/.rnd
oid_section		= new_oids
extensions		= v3_req

[ new_oids ]
tsa_policy1 = 1.2.3.4.1
tsa_policy2 = 1.2.3.4.5.6
tsa_policy3 = 1.2.3.4.5.7

####################################################################
[ ca ]
default_ca	= CA_default		# The default ca section

####################################################################
[ CA_default ]
dir		= ./tls		# Where everything is kept
certs		= $dir/certs		# Where the issued certs are kept
crl_dir		= $dir/crl		# Where the issued crl are kept
database	= $dir/index.txt	# database index file.
new_certs_dir	= $dir/newcerts		# default place for new certs.
certificate	= $dir/cacert.pem 	# The CA certificate
serial		= $dir/serial 		# The current serial number
crlnumber	= $dir/crlnumber
crl		= $dir/crl.pem 		# The current CRL
private_key	= $dir/private/cakey.pem# The private key
RANDFILE	= $dir/private/.rand	# private random number file
x509_extensions	= usr_cert		# The extentions to add to the cert
name_opt 	= ca_default		# Subject Name options
cert_opt 	= ca_default		# Certificate field options
default_days	= 365			# how long to certify for
default_crl_days= 30			# how long before next CRL
default_md	= default		# use public key default MD
preserve	= no			# keep passed DN ordering
policy		= policy_match

[ policy_match ]
countryName		= match
stateOrProvinceName	= match
organizationName	= match
organizationalUnitName	= optional
commonName		= supplied
emailAddress		= optional

[ policy_anything ]
countryName		= optional
stateOrProvinceName	= optional
localityName		= optional
organizationName	= optional
organizationalUnitName	= optional
commonName		= supplied
emailAddress		= optional

####################################################################
[ req ]
default_bits		= 1024
default_keyfile 	= privkey.pem
distinguished_name	= req_distinguished_name
attributes		= req_attributes
x509_extensions	= v3_ca	# The extentions to add to the self signed cert
string_mask = utf8only
req_extensions = v3_req # The extensions to add to a certificate request

[ req_distinguished_name ]
countryName			= Country Name (2 letter code)
countryName_default		= AU
countryName_min			= 2
countryName_max			= 2
stateOrProvinceName		= State or Province Name (full name)
stateOrProvinceName_default	= Some-State
localityName			= Locality Name (eg, city)
0.organizationName		= Organization Name (eg, company)
0.organizationName_default	= Internet Widgits Pty Ltd
organizationalUnitName		= Organizational Unit Name (eg, section)
commonName			= Common Name (e.g. server FQDN or YOUR name)
commonName_max			= 64
emailAddress			= Email Address
emailAddress_max		= 64

[ req_attributes ]
challengePassword		= A challenge password
challengePassword_min		= 4
challengePassword_max		= 20
unstructuredName		= An optional company name

[ usr_cert ]
basicConstraints=CA:FALSE
nsComment			= "OpenSSL Generated Certificate"
subjectKeyIdentifier=hash
authorityKeyIdentifier=keyid,issuer

[ v3_req ]
# Extensions to add to a certificate request
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
subjectAltName = @alt_names

[ v3_ca ]
subjectAltName = @alt_names
subjectKeyIdentifier=hash
authorityKeyIdentifier=keyid:always,issuer
basicConstraints = CA:true

[ crl_ext ]
authorityKeyIdentifier=keyid:always

[ alt_names ]
# The IPs of the Docker and Swarm hosts
IP.1 = 10.100.199.200
IP.2 = 10.100.199.201
IP.3 = 10.100.199.202
IP.4 = 10.100.199.203

Installation

Once you have generated the TLS keys and certificates, they must be installed on the target machines. I prefer to just copy the certificate and the key files into /etc/pki/tls/certs/ and /etc/pki/tls/private/ respectively. Once they are installed, you can then fire up your Docker and Swarm daemons like so:

Docker

/usr/bin/docker -d \
  --tlsverify \
  --tlscacert=/etc/pki/tls/certs/ca.pem \
  --tlscert=/etc/pki/tls/certs/dockerhost01-cert.pem \
  --tlskey=/etc/pki/tls/private/dockerhost01-key.pem \
  -H tcp://0.0.0.0:2376

Swarm

/usr/local/bin/swarm manage \
  --tlsverify \
  --tlscacert=/etc/pki/tls/certs/ca.pem \
  --tlscert=/etc/pki/tls/certs/swarm-cert.pem \
  --tlskey=/etc/pki/tls/private/swarm-key.pem  \
  --discovery file:///etc/swarm_config \
  -H tcp://0.0.0.0:2376

Using the TLS enabled Docker daemon

Now, in order to use the Docker daemon, you will have to present a client cert that was signed by the same CA as the certificate Docker/Swarm is using. We have generated one here (the Swarm cert and key, which the script also copies to cert.pem and key.pem), and more can be made if needed. Set the following environment variables in order to tell the Docker client what to use for the TLS config:

export DOCKER_HOST=tcp://dockerswarm01:2376
export DOCKER_CERT_PATH="$(pwd)"
export DOCKER_TLS_VERIFY=1

This will now enable the Docker client to communicate ‘securely’ with Docker Swarm and Docker Swarm to communicate securely with the Docker nodes behind it.
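
With DOCKER_CERT_PATH set, the Docker client looks in that directory for files named ca.pem, cert.pem, and key.pem. A minimal sketch of a pre-flight check before exporting the variables follows; check_cert_dir is a hypothetical helper written for this example, not a Docker command:

```shell
#!/bin/bash
# Sketch: confirm a directory holds the three files the Docker client expects.
# ASSUMPTION: check_cert_dir is a hypothetical helper, not part of Docker.

check_cert_dir() {
  # $1 = directory to inspect; returns 0 only if all three files exist
  local dir="$1" missing=0 f
  for f in ca.pem cert.pem key.pem; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Only point the client at the current directory if it looks complete
if check_cert_dir "$(pwd)"; then
  export DOCKER_CERT_PATH="$(pwd)"
  export DOCKER_TLS_VERIFY=1
fi
```

Run it from the directory where the certificates were generated; if any file is reported missing, the variables are left untouched.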

  1. https://www.docker.com 

  2. http://goo.gl/QlZ5qv 

  3. https://github.com/ansible/ansible/issues/10032 

  4. https://github.com/docker/swarm 

  5. https://github.com/technolo-g/docker-swarm-demo/blob/master/bin/gen_ssl.sh 

We all like to keep our code looking as neat as possible, but sometimes you also need to keep track of those small changes you’ve been making. A good way to ABC (Always Be Committin’) is to work in a branch and hack your way through the problem and then to clean up before submitting a PR.

Cut a new branch

What we are about to do here can be destructive as well as confusing. With tasks like that, it’s always nice to have a backup, so we are going to cut a new branch to work with, based on the branch where all of the nasty changes live:

git checkout -b squashed_feature

Rebase from master

This will give us a branch that we can then safely rebase onto master. This process will allow you to pick the commits you would like to squash and the ones you would like to keep. You can do this by running:

git rebase -i master

Squash commits

If you have merged master into your branch during your development process, you will be unable to use this method. I normally rebase my branch on master instead of merging it in, but workflows may vary. Once you begin the rebase process, your git editor will open a file that looks like:

pick 3e33836 Initial commit of consul-template role
pick 7a09b99 remove boilerplate
pick 935e5e4 Default openssl.cnf
pick 6277156 Add all ips as SAN
pick 8cc1e89 Add SAN IP to certificates
pick e5aa77c Have to repro with hosts
pick 9819551 Change from 5 to 3 dockerhosts to reduce time

# Rebase d33881f..9819551 onto d33881f
#
# Commands:
#  p, pick = use commit
#  r, reword = use commit, but edit the commit message
#  e, edit = use commit, but stop for amending
#  s, squash = use commit, but meld into previous commit
#  f, fixup = like "squash", but discard this commit's log message
#  x, exec = run command (the rest of the line) using shell
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
# Note that empty commits are commented out

In this view the oldest commit is at the top and the most recent at the bottom. To squash all of the commits into a single one, leave the first line as ‘pick’ and change all of the others to ‘squash’, since each squashed commit is melded into the one above it. Note: ‘s’ will also work instead of ‘squash’.

pick 3e33836 Initial commit of consul-template role
s 7a09b99 remove boilerplate
s 935e5e4 Default openssl.cnf
s 6277156 Add all ips as SAN
s 8cc1e89 Add SAN IP to certificates
s e5aa77c Have to repro with hosts
s 9819551 Change from 5 to 3 dockerhosts to reduce time

...

Rewrite

Be sure to squash these commits rather than deleting their lines: if you were to remove a line itself, the actual commit would be removed from history. Squashing instead produces a combined commit message like this:

# This is a combination of 7 commits.
# The first commit's message is:
Initial commit of consul-template role

# This is the 2nd commit message:

remove boilerplate

...

In the rebase list there is no need to change the commit messages. After saving and quitting the editor, a new file like the one above will open to allow you to edit the commit message. In that file, just remove all of the messages and replace them with the one that you want.
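
To rehearse the whole flow without touching real work, here is a sketch that builds a throwaway repository and performs the same squash non-interactively. The commit messages, the file name, and the sed expression used as the sequence editor are assumptions made for the demo; GIT_SEQUENCE_EDITOR and GIT_EDITOR are standard Git environment variables:

```shell
#!/bin/bash
# Sketch: rehearse the squash workflow in a throwaway repository.
# ASSUMPTIONS: commit messages, file names, and the sed one-liner are
# illustrative; only the git commands themselves are standard.
set -e

repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.email "demo@example.com"
git config user.name "Demo User"

echo "base" > notes.txt
git add notes.txt
git commit -q -m "Base commit on master"

# A messy feature branch with several small commits
git checkout -q -b feature
for n in 1 2 3; do
  echo "change $n" >> notes.txt
  git commit -q -am "WIP change $n"
done

# Work on a copy so the original branch survives as a backup
git checkout -q -b squashed_feature

# This approach assumes there are no merge commits between master and the branch
merges="$(git rev-list --merges --count master..HEAD)"
[ "$merges" -eq 0 ]

# Turn every "pick" after the first into "squash", and accept the combined
# commit message unchanged
GIT_SEQUENCE_EDITOR="sed -i.bak '2,\$s/^pick/squash/'" \
GIT_EDITOR=true \
git rebase -q -i master

# Replace the combined message with the one you actually want
git commit -q --amend -m "Add feature"

git log --oneline master..HEAD
```

The original feature branch is left untouched, so if the result is not what you expected you can delete squashed_feature and start over.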

You can now push your branch up and open a PR!

git push origin squashed_feature
