This is the second installment of an ongoing series on how to use Docker Swarm to deploy applications and allow services to automatically scale based on load. In the last article, we looked at how to monitor Swarm clusters using the TICK stack and how to visualize resource consumption, etc.
In this article, we’re going to explore launching services under Swarm, and automating external access to services using Docker Flow Proxy.
Docker Swarm services
Docker Swarm does a wonderful job of abstracting a service and allowing services to scale easily. It also makes it trivial for services on the same network to talk to each other. External access to services is provided via a multi-host ingress overlay network that connects containers running on all cluster nodes. That means that, for exposed services, every node in the cluster listens for incoming connections on the published port and forwards them, via the routing mesh, to one of the individual service containers.
This maps an external port on all machines to one service running on the cluster. See the figure below illustrating how this works:
This figure shows a three-node swarm cluster. There is a service called my-web with two service containers running on node1 and node2. Each of the nodes participates in the ingress network. The published port for my-web is port 8080, and is exposed on each of the swarm node machines. Any connection request to any of the swarm nodes on port 8080 will be automatically forwarded to one of the instances of my-web on port 80 via the swarm load balancer. This reserves the external port 8080 for the my-web service. (Please see this page for more information on how the ingress network works.)
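For concreteness, a service like the my-web example above can be published with a single command. This is just an illustrative sketch; the image name here is a stand-in:

```shell
# Publish port 8080 on every swarm node; the routing mesh forwards
# incoming connections to port 80 inside one of the two replicas.
docker service create \
  --name my-web \
  --replicas 2 \
  --publish published=8080,target=80 \
  nginx
```

After this, port 8080 is reserved on all nodes, whether or not a my-web container runs locally.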
For internal connectivity, services can simply connect to each other by name, provided they are on the same overlay network. The internal service discovery exposed via DNS and the routing mesh ensures that the connection request ends up on a container for the target service.
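As a quick sketch of this name-based discovery (the network, service, and image names below are made up), two services attached to the same overlay network can reach each other by service name alone:

```shell
docker network create -d overlay appnet
docker service create --name api --network appnet my-api-image
docker service create --name web --network appnet my-web-image

# Inside any "web" container, swarm DNS resolves the name "api"
# to a virtual IP that load-balances across the api containers:
#   curl http://api/
```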
For external access, while the ingress network simplifies access, each exposed service needs a separate reserved port on the ingress network. For a small number of externally exposed services, this isn’t a problem, but for larger clusters with lots of externally exposed services, it becomes a pain to manage individual ports. The other problem is that exposing services on non-standard ports to external clients may require opening up ports on their outgoing firewall to allow them to connect to the exposed ports. So we need a mechanism to multiplex more than one service behind an exposed port.
Enter Docker Flow Proxy. From the Docker Flow Proxy page:
“The goal of the Docker Flow Proxy project is to provide an easy way to reconfigure proxy every time a new service is deployed, or when a service is scaled. It does not try to “reinvent the wheel,” but to leverage the existing leaders and combine them through an easy-to-use integration. It uses HAProxy as a proxy and adds custom logic that allows on-demand reconfiguration.”
In other words, it allows you to use HAProxy to multiplex multiple services using a single exposed port. For example, you could potentially map your web application as well as your REST API over a single port without any real work. Configuration is simple, and the system scales well because of HAProxy.
Common multiplexing methods
HAProxy is fairly powerful when it comes to multiplexing requests between backends (services). Some common methods are described here:
- IP-based: This is useful if you have a system with multiple interfaces exposed on the same network, or you have more than one swarm node you can run HAProxy on. In either case, each IP address can be associated with a single service.
- Host-based: You could also potentially map multiple hostnames to one or more IP addresses. Each hostname maps to a single service. HAProxy can differentiate the requests based on the hostname and send them to the appropriate backend.
- URL-based: You could also use part of the URL path to map to different backend services.
- SNI-based: Sometimes you might prefer to have end-to-end encryption between the client and the service. This prevents HAProxy from seeing the actual traffic. While that precludes HAProxy from routing based on the URL contents, it can still read the SNI field from the TLS handshake without decrypting the stream, and use that to route traffic to the correct backend service.
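For reference, this is roughly what SNI-based routing looks like in a raw HAProxy configuration. The hostnames and backend names here are invented; DFP generates the equivalent configuration for you:

```
frontend tls_in
    bind *:443
    mode tcp
    tcp-request inspect-delay 5s
    tcp-request content accept if { req_ssl_hello_type 1 }
    use_backend bk_app1 if { req_ssl_sni -i app1.example.com }
    use_backend bk_app2 if { req_ssl_sni -i app2.example.com }
```

The frontend runs in TCP mode, waits for the TLS ClientHello, and picks a backend from the SNI hostname without ever decrypting the stream.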
Typically, to reconfigure HAProxy when you add a new service, you'd need to update the configuration file and then gracefully reload HAProxy so the new configuration takes effect. All of this is now handled automatically by DFP.
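For comparison, the manual version of that dance looks something like this (the configuration path is illustrative):

```shell
# Validate the edited configuration, then start a new HAProxy process
# that tells the old one (-sf) to finish serving its connections and exit.
haproxy -c -f /etc/haproxy/haproxy.cfg
haproxy -f /etc/haproxy/haproxy.cfg -sf $(pidof haproxy)
```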
So depending on how you plan to expose services, you can instruct DFP to route traffic per service.
Running Docker Flow Proxy
Docker Flow Proxy can be launched as a Docker stack. Here’s what the stack definition (DFP-stack.yml) looks like:
version: "3.2"

services:
  proxy:
    image: vfarcic/docker-flow-proxy
    ports:
      - 80:80
      - 443:443
    networks:
      - proxy
    environment:
      - LISTENER_ADDRESS=swarm-listener
      - MODE=swarm
      - CHECK_RESOLVERS=true
      - EXTRA_GLOBAL=stats socket /var/lib/hastats/haproxy.sock mode 600 level admin
      - STATS_USER=admin
      - STATS_PASS=password
      - STATS_URI=/stats
      - STATS_PORT=80
    volumes:
      - haproxy-stats:/var/lib/hastats
    deploy:
      mode: global

  swarm-listener:
    image: vfarcic/docker-flow-swarm-listener
    networks:
      - proxy
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - DF_NOTIFY_CREATE_SERVICE_URL=http://proxy:8080/v1/docker-flow-proxy/reconfigure
      - DF_NOTIFY_REMOVE_SERVICE_URL=http://proxy:8080/v1/docker-flow-proxy/remove
    deploy:
      placement:
        constraints: [node.role == manager]

networks:
  proxy:
    external: true

volumes:
  haproxy-stats:
    external: true
As you can see from the YAML file, the stack consists of two services:
- proxy (vfarcic/docker-flow-proxy)
- swarm-listener (vfarcic/docker-flow-swarm-listener)
The proxy service runs a controller app called docker-flow-proxy alongside HAProxy, which actually listens on the published ports (normally 80 and 443) and multiplexes connections to backend services.
The swarm-listener runs a single application that listens on the Docker control socket of a swarm manager for swarm events. When it finds events for adding or removing services, it checks whether the service carries the special labels it uses to (un)configure HAProxy. It then contacts all docker-flow-proxy instances and passes on the configuration changes to them. The docker-flow-proxy instances, in turn, re-create the HAProxy configuration file and reload HAProxy. This mechanism makes it trivial to add a new service to the cluster and have HAProxy route to it without any intervention. As long as the service is labeled correctly, it should all work automatically.
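The same reconfigure endpoint the listener calls (the notification URLs in the stack definition above) can also be hit by hand from any container attached to the proxy network. The service name and path in this sketch are made up:

```shell
# Ask DFP to route requests matching /my-app to the "my-app" service,
# which listens internally on port 8080.
curl "http://proxy:8080/v1/docker-flow-proxy/reconfigure?serviceName=my-app&servicePath=/my-app&port=8080"
```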
Before launching the stack, you need to create an overlay network. Let’s call this “proxy.” This is an external overlay network (not defined in the stack .yml file) that DFP uses for internal communication between the swarm-listener, the proxy, and the other services launched into the cluster. Any service that needs DFP to expose it externally must also attach to the proxy network. To create the proxy network, run this:
$ docker network create -d overlay proxy
Once this is done, you can launch the stack like so:
$ docker stack deploy -c DFP-stack.yml dfp
This spins up the stack and brings up the services we need to run Docker Flow Proxy.
Docker Flow Proxy has a few configuration knobs that can be set. (Take a look at this page for more details.) Most of the configuration deals with setting up environment variables for the proxy service. (See the stack for some examples of configuration knobs for DFP.)
Adding a stack
Now that the DFP stack is up and running, let’s add an example app stack to the mix. The voting app we’re using here is a modified example application that’s part of the Docker examples repo in GitHub. This is what the app looks like:
Here’s the Docker stack YAML file (voting-stack.yml) for the same:
version: "3"

services:
  redis:
    image: redis:alpine
    networks:
      - backend
    deploy:
      replicas: 1
      update_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: on-failure

  db:
    image: postgres:9.4
    volumes:
      - db-data:/var/lib/postgresql/data
    networks:
      - backend
    deploy:
      placement:
        constraints: [node.role == manager]

  vote:
    image: sebmoule/vote_vote
    networks:
      - backend
      - proxy
    depends_on:
      - redis
    deploy:
      replicas: 1
      update_config:
        parallelism: 2
      restart_policy:
        condition: on-failure
      labels:
        - com.df.notify=true
        - com.df.distribute=true
        - com.df.servicePath=/vote/
        - com.df.reqPathSearch=/vote/
        - com.df.reqPathReplace=/
        - com.df.port=80

  result:
    image: sebmoule/vote_result
    networks:
      - backend
      - proxy
    depends_on:
      - db
    deploy:
      replicas: 1
      update_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: on-failure
      labels:
        - com.df.notify=true
        - com.df.distribute=true
        - com.df.servicePath=/result/
        - com.df.reqPathSearch=/result/
        - com.df.reqPathReplace=/
        - com.df.port=80

  worker:
    image: dockersamples/examplevotingapp_worker
    networks:
      - backend
    deploy:
      mode: replicated
      replicas: 1
      labels: [APP=VOTING]
      restart_policy:
        condition: on-failure
        delay: 10s
        max_attempts: 3
        window: 120s
      placement:
        constraints: [node.role == manager]

networks:
  proxy:
    external: true
  backend:

volumes:
  db-data:
The stack consists of five services, two networks, and one data volume.
- redis: This service runs a single instance of the Redis server which acts as a distributed queue to temporarily hold all the votes cast via the voting app (see below). It’s only connected to the backend network.
- db: This service runs the PostgreSQL database, with the db-data volume used as its storage. Since we aren’t using a persistent volume driver, pinning the service to our master node and using a named volume on that node to actually store the data is an easy way to obtain persistence. It’s only connected to the backend network.
- vote: This service runs a Python Flask web application which lets the user vote for one of two choices. (In this case, it’s cats and dogs.) It’s attached to the backend network (to talk to the redis service) and to the proxy network (to be accessible from the DFP proxy service).
- result: This service runs a Node.js application which takes the cumulative result from Postgres and displays the current vote tally. This service is tied to both the backend network (to talk to the db service) and to the proxy network (to be accessible from the DFP proxy service).
- worker: This is a simple C# application service that reads votes from a Redis list and populates the votes database on the db service with the result. This service only connects on the backend network since both entities it talks to (redis and db) are accessible on the backend network.
To launch the service, save the YAML file above to “voting-stack.yml” and then start it like this:
$ docker stack deploy -c voting-stack.yml vote
Once the stack is launched, browse to http://&lt;any-swarm-node&gt;/vote/ to cast a vote, and to http://&lt;any-swarm-node&gt;/result/ to see the running tally.
DFP Service configuration
If you look at the voting application diagram, you’ll see two services that the DFP proxy service talks to. These are the services that need external access. They are both accessed over the same port (80), but at different URLs. The stanza below (for the voting application) shows how this works:
vote:
  image: sebmoule/vote_vote
  --- snip ---
  labels:
    - com.df.notify=true
    - com.df.distribute=true
    - com.df.servicePath=/vote/
    - com.df.reqPathSearch=/vote/
    - com.df.reqPathReplace=/
    - com.df.port=80
The labels here tell DFP how to set up access to the service over HAProxy. The important labels are com.df.port, which tells DFP that the service is listening on port 80; com.df.servicePath, which tells DFP to configure HAProxy to map a URL path prefix to the service backend (here on port 80); and com.df.reqPathSearch together with com.df.reqPathReplace, which tell DFP to have HAProxy rewrite the request path (replacing /vote/ with /) before the request is passed on to the backend service. While this request rewriting works fine when accessing a web service, you might need to leave it out when serving static pages, since the pages might reference other content at the rewritten URL, which won’t match subsequent incoming requests.
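Concretely, with those labels in place, any swarm node answers on port 80 (the node address below is a placeholder):

```shell
# HAProxy matches the /vote/ path prefix, rewrites it to /, and
# forwards the request to one of the vote containers on port 80.
curl http://<swarm-node>/vote/
```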
While URL-based multiplexing is straightforward for a simple example, it’s not the best or easiest way to deploy a bunch of applications to a swarm cluster. For example, if you wanted to run different versions of the same application on a single cluster, it would be difficult at best, since the URL fragments being mapped would clash.
A couple of ways of working around that are:
- Use hostname/domain-based multiplexing by mapping a new DNS name to the cluster for each new namespace.
- Use TLS for end-to-end encryption and multiplex based on SNI hostnames.
While writing this article, I came across a bug that caused some intermittent 503 errors. It appears to happen when I add and remove service stacks frequently. If you see this, please upgrade to the latest version of Docker first, and if that still doesn’t fix it, running this should:
$ docker service update --force dfp_proxy
So that concludes this introduction to Docker Flow Proxy. The next article is going to talk about how to tie the TICK stack with DFP’s HAProxy connection stats and then use that information to scale services based on incoming connection count.
- Docker Swarm: https://docs.docker.com/engine/swarm/
- HAProxy: http://www.haproxy.org/
- Docker Flow Proxy: https://proxy.dockerflow.com/
- Swarm Ingress Network: https://docs.docker.com/engine/swarm/ingress/
- TICK Stack: https://www.influxdata.com/time-series-platform/