In partnership with Google, Netflix open-sourced Spinnaker in November 2015. Spinnaker is a multi-cloud continuous delivery platform for releasing software changes with “high velocity” and “confidence.” It is designed with integration and extensibility in mind: it works with tools such as Jenkins and supports AWS, GCP, Kubernetes, OpenStack, and Cloud Foundry. The platform also provides the building blocks from which a deployment pipeline is composed. Spinnaker pipelines can be triggered by the completion of a Jenkins job or by a cron expression. As shown below, the pipeline is composed of multiple stages. This post briefly discusses the advantages of using Spinnaker for your deployments and the microservices that make up Spinnaker.
Core Principles for Spinnaker
Google’s Software Engineering Manager, Steven Kim, outlined the core principles of continuous delivery at Google Cloud Next ’17. He noted that continuous delivery cannot simply be better orchestration. It also means implementing the right principles: immutable infrastructure, deployment strategies, automation, and operational integration.
Spinnaker believes in deploying to the cloud using immutable infrastructure. There are two ways to change a server: mutable and immutable. Servers used to be maintained manually after initial creation. Manually maintaining servers has its drawbacks and can create “snowflake servers.” The main problem is that manual configuration leads to servers that are all configured slightly differently from one another, making the management and operation of them very difficult. Then there is the immutable option. New servers are created based on a “baked” image, which is generally automated. For any necessary modifications, this process will then be repeated. There are several advantages to this process. First, it prevents snowflake servers since each deployment is based on an image as opposed to manual configurations. Second, the infrastructure is versionable and can be easily replicated to a point in time for rollback/recovery.
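To make the versioning and rollback argument concrete, here is a minimal Python sketch. It is not Spinnaker code: the `Image` and `ServerGroup` classes and the package strings are hypothetical names invented for illustration. The only point is that a change always produces a new image, and a rollback is a redeploy of an older one.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Image:
    """A hypothetical baked image: immutable once created."""
    version: int
    packages: tuple  # frozen snapshot of what was baked in


@dataclass
class ServerGroup:
    """Keeps every baked image and a log of what was deployed."""
    images: list = field(default_factory=list)
    deployed: list = field(default_factory=list)

    def deploy(self, packages):
        # Never patch a running server: bake a new image and deploy it.
        image = Image(version=len(self.images) + 1, packages=tuple(packages))
        self.images.append(image)
        self.deployed.append(image)
        return image

    def current(self):
        return self.deployed[-1]

    def rollback(self, version):
        # Rolling back is just redeploying a previously baked image.
        # Raises StopIteration if the version was never baked.
        image = next(i for i in self.images if i.version == version)
        self.deployed.append(image)
        return image


group = ServerGroup()
group.deploy(["app==1.0"])
group.deploy(["app==1.1"])
group.rollback(1)
print(group.current())
```

Because an image is never modified after it is baked, version 1 after the rollback is exactly what ran before, which is what makes point-in-time recovery straightforward.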
Deployment Strategies: Metrics and Rollbacks
Spinnaker allows for safe deployments with both blue-green deployment and rolling blue-green deployment. However, with the version 1.1 release, Spinnaker enables even safer deployments with automated canary analysis (ACA).
Blue-green deployment reduces downtime risk by deploying to two identical production environments, called blue and green. At any time, 100% of the traffic is directed to one environment (say, green), while the other (blue) stands by should green fail. The main drawback is that a company needs to double its resources to run this sort of operation. With the second option, rolling blue-green deployment, traffic is cut over to the new stack incrementally, with validation gates in between where you can run robust smoke tests or functional probes. If something goes wrong, you can roll back immediately.
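The rolling cutover with validation gates can be sketched in a few lines of Python. This is only an illustration of the idea, not how Spinnaker implements it: the function name, the step percentages, and the `validate` callback (standing in for smoke tests or functional probes) are all assumptions.

```python
def rolling_blue_green(validate, steps=(25, 50, 75, 100)):
    """Shift traffic to the new stack in increments, gated by validation.

    `validate` is a hypothetical callable that receives the percentage of
    traffic currently on the new stack and returns True if the checks pass.
    Returns (final_percent_on_new_stack, history_of_traffic_shifts).
    """
    history = []
    for percent in steps:
        history.append(percent)      # cut this share of traffic over
        if not validate(percent):    # validation gate between increments
            history.append(0)        # roll back: all traffic to the old stack
            return 0, history
    final = history[-1] if history else 0
    return final, history


# A rollout where every gate passes, and one that fails at the 75% gate.
print(rolling_blue_green(lambda percent: True))
print(rolling_blue_green(lambda percent: percent < 75))
```

In the failing run, the old stack is still in place when the gate trips, so the rollback amounts to sending traffic back to it.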
Automated canary analysis (ACA) adds another layer of confidence to deployment. At Netflix, canary analysis used to be a manual process, where someone on the team would look at graphs and logs on the baseline and canary servers to compare metrics such as error codes, response times/latency, exceptions, and load average. This becomes increasingly difficult with hundreds or thousands of servers. With Spinnaker, this process is automated and summarized in a report. Because you can review the metrics quickly, you can safely deploy if the canary score meets the threshold, or roll back if it does not.
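As a rough illustration of what "meeting the canary score" means, the Python sketch below compares canary metrics against a baseline and turns the result into a deploy-or-rollback decision. It is a deliberately simplified stand-in, not Spinnaker's actual analysis: the metric names, the 10% tolerance, the 75-point pass threshold, and the assumption that lower values are always better are all made up for the example.

```python
def canary_score(baseline, canary, tolerance=0.10):
    """Percentage of metrics where the canary stays within tolerance.

    Assumes every metric is "lower is better" and that both dicts share
    the same keys. A metric passes if the canary value is no more than
    `tolerance` (10% by default) above the baseline value.
    """
    passed = 0
    for name, base_value in baseline.items():
        limit = base_value * (1 + tolerance)
        if canary[name] <= limit:
            passed += 1
    return 100.0 * passed / len(baseline)


def decide(score, pass_threshold=75.0):
    """Deploy if the score meets the threshold, otherwise roll back."""
    return "deploy" if score >= pass_threshold else "rollback"


# Hypothetical metric samples for the baseline and canary servers.
baseline = {"error_rate": 0.010, "latency_ms": 120.0, "load_avg": 1.5, "exceptions": 4}
canary = {"error_rate": 0.010, "latency_ms": 125.0, "load_avg": 1.6, "exceptions": 9}

score = canary_score(baseline, canary)
print(score, decide(score))
```

Here three of the four metrics stay within tolerance and the exception count does not, so the score sits exactly on the assumed threshold. A real analysis would compare distributions of samples rather than single values.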
Other Measures to Reduce Deployment Risk
Spinnaker offers many more ways to reduce deployment risk in addition to easy rollbacks and automated canary analysis. Netflix’s Tools Engineer, Tomas Lin, has highlighted some of the techniques and tools available in Spinnaker to ensure deployments are safe.
- Concurrent execution is limited. This is to reduce the risk of deploying two conflicting pipelines.
- Execution time restriction. You can restrict the execution time of particular stages in Spinnaker. This ensures that riskier stages happen only when there are people in the office who can intervene manually if something goes wrong, and that deployments happen when the servers are not at peak traffic.
- Manual judgement can be enabled to allow a real person (normally QA) to approve a process in the pipeline.
- Disabling pipelines. A pipeline can be disabled if it produces incorrect output.
- Precondition checks can be made in the pipeline, where the pipeline will stop if preconditions are not met.
- Chaos engineering. You can run automated experiments to test whether the fallback is working as expected with Chaos Automation Platform (ChAP). Additionally, you can test the survival of your application by intentionally destroying servers with Chaos Monkey (see Chaos Gorilla and Chaos Kong for large-scale failover).
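Several of the safeguards above (limited concurrency, execution time restrictions, and precondition checks) amount to a gate that runs before a stage starts. The Python sketch below shows that idea under stated assumptions: the function names, the 09:00 to 17:00 weekday window, and the reason strings are hypothetical and do not come from Spinnaker.

```python
from datetime import datetime, time


def in_execution_window(now, start=time(9, 0), end=time(17, 0), weekdays_only=True):
    """True if `now` falls inside the (assumed) office-hours window."""
    if weekdays_only and now.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return start <= now.time() < end


def may_run_stage(now, preconditions, already_running):
    """Return (allowed, reason) for starting a risky stage.

    `preconditions` maps a label to a zero-argument callable returning bool.
    `already_running` says whether another execution is in progress.
    """
    if already_running:
        return False, "another execution is in progress"
    if not in_execution_window(now):
        return False, "outside execution window"
    for name, check in preconditions.items():
        if not check():
            return False, f"precondition failed: {name}"
    return True, "ok"


# A Tuesday morning with a passing (hypothetical) cluster-size check.
checks = {"cluster_size": lambda: True}
print(may_run_stage(datetime(2024, 1, 2, 10, 0), checks, already_running=False))
```

The checks run cheapest first, so a stage blocked by a concurrent execution or by the clock never evaluates its preconditions.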
Spinnaker is composed of multiple open source microservices: Deck, Gate, CloudDriver, Orca, Rosco, front50, Igor, and Echo. It is worth familiarizing yourself with what each microservice does when working with Spinnaker. For Spinnaker to run properly, all of the services must be running. (There may be times when you have to debug why one of them is not working.)
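A small script can help confirm that every service is up. The sketch below is a starting point, not an official tool. It assumes the services run on localhost, that each one answers HTTP 200 on a `/health` endpoint, and that the ports listed are the ones in use. Apart from Rosco's 8087, which appears later in this post, the ports are assumptions that should be checked against your installation.

```python
from urllib.request import urlopen

# Assumed local ports; adjust these to match your installation.
SERVICES = {
    "gate": 8084,
    "clouddriver": 7002,
    "orca": 8083,
    "rosco": 8087,
    "front50": 8080,
    "igor": 8088,
    "echo": 8089,
}


def http_probe(url, timeout=3):
    """Return True if the URL answers with HTTP 200, False otherwise."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers URLError, HTTPError, timeouts, refused connections
        return False


def check_services(services=SERVICES, host="localhost", probe=http_probe):
    """Map each service name to True (healthy) or False (not healthy)."""
    return {
        name: probe(f"http://{host}:{port}/health")
        for name, port in services.items()
    }


def unhealthy(results):
    """Sorted names of the services that did not report healthy."""
    return sorted(name for name, ok in results.items() if not ok)
```

Calling `unhealthy(check_services())` returns the names of the services that need attention. The `probe` parameter exists so the check can be exercised without a running Spinnaker, for example by passing a function that returns canned results.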
The following is a simplistic overview of each service:
- Deck is a static AngularJS-based UI (Spinnaker UI).
- Gate exposes APIs for external consumers of Spinnaker (including Deck). It is analogous to the front door to Spinnaker. The REST API fronts the following services: CloudDriver, front50, Igor, and Orca.
- CloudDriver is the main integration point for cloud providers (AWS, GCE, Cloud Foundry, Azure etc.), and it is responsible for all cloud provider-specific read and write operations.
- Orca is the orchestration engine for Spinnaker. It is responsible for taking a pipeline or task definition and managing stages and tasks, coordinating with other Spinnaker services. Orca pipelines are composed of stages, which in turn are composed of tasks, and persist a running execution to Redis.
- Rosco is a Packer-based bakery. Rosco bakes machine images and doles them out to cloud providers. This service helps maintain the principle of immutable infrastructure. Note that Rosco exposes a REST API which can be experimented with via the Swagger UI: http://localhost:8087/swagger-ui.html.
- Front50 is Spinnaker’s datastore, which stores all application, pipeline, and notification metadata. It uses Cassandra by default; however, it is designed so that other datastores, such as S3, can be used instead.
- Igor facilitates the use of Jenkins in Spinnaker pipelines (a pipeline which can be triggered by a Jenkins job). Igor provides a single point of integration with Jenkins, Travis and Git repositories. Igor keeps track of the credentials for multiple Jenkins and/or Travis hosts and sends events to Echo whenever build information has changed.
- Echo handles outgoing events (email, Slack, etc.) and incoming events; it listens to Igor, front50, and Orca to trigger pipeline executions.
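To illustrate the Orca description above (pipelines made of stages, stages made of tasks, with execution state recorded as the pipeline runs), here is a small Python sketch. It is not Orca's implementation: the stage and task names, the status strings, and the `run_task` callback are hypothetical, and a plain dict stands in for the state that Orca persists to Redis.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def run_pipeline(stages, run_task):
    """Run stages in dependency order; each stage runs its tasks in sequence.

    `stages` maps a stage name to {"after": [stage names], "tasks": [task names]}.
    `run_task(stage, task)` is a hypothetical callback returning True on success.
    Returns a dict of stage name -> status, standing in for persisted state.
    """
    graph = {name: stage["after"] for name, stage in stages.items()}
    status = {name: "NOT_STARTED" for name in stages}
    for name in TopologicalSorter(graph).static_order():
        # A stage only runs if everything it depends on succeeded.
        if any(status[dep] != "SUCCEEDED" for dep in stages[name]["after"]):
            status[name] = "SKIPPED"
            continue
        status[name] = "RUNNING"
        # all() stops at the first failing task, so later tasks are not run.
        ok = all(run_task(name, task) for task in stages[name]["tasks"])
        status[name] = "SUCCEEDED" if ok else "FAILED"
    return status


# A hypothetical three-stage pipeline: bake an image, deploy it, test it.
pipeline = {
    "bake": {"after": [], "tasks": ["create_bake", "monitor_bake"]},
    "deploy": {"after": ["bake"], "tasks": ["create_server_group", "wait_for_instances"]},
    "smoke_test": {"after": ["deploy"], "tasks": ["run_checks"]},
}

print(run_pipeline(pipeline, lambda stage, task: True))
print(run_pipeline(pipeline, lambda stage, task: stage != "deploy"))
```

In the second run the deploy stage fails, so the smoke test is skipped rather than run against a stack that was never brought up.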
As described above, there are many advantages to using Spinnaker. Consider using it if you are deploying to different cloud providers. Spinnaker makes it easy to coordinate builds and deployment processes on multiple clouds through the same interface. Additionally, Spinnaker helps to standardize how pipelines are created, and it makes the steps by which code moves across environments transparent. The primary drawback of Spinnaker is that it is composed of multiple microservices, which can make managing the platform difficult. I advise using the available scripts to monitor the health of these microservices. If you are using a single cloud provider, it may be simpler to use that provider’s deployment tool, such as CodeDeploy (AWS) or Google Cloud Deployment Manager (GCP). If managing all the microservices is not a concern, I hope you give Spinnaker a try. Should you get stuck, Spinnaker has a very active community. Join the Slack channel and have fun!