The environment that hosts your QA lab is critical to a sustainable testing culture. For us at Blackboard, our traditional QA lab (which ran on a single virtual machine) was just not cutting it.
That’s why we started looking for a better way to meet our testing goals. The way forward turned out to be Mesos. Here’s the story of how we converted our QA lab to run on Mesos.
The QA Lab
When we first started looking at what we could do with our environments to get faster feedback, we were using a traditional VM (virtual machine). It could only support one Jenkins job at a time, which meant everything was queued. The agent had to be manually installed and maintained, and it re-ran all setup steps each time we executed a job. Setup alone took around 20 minutes (before we even ran the tests), which was too slow to meet our goal of running on every commit, and supporting every feature branch was out of the question.
Source: Ashley Hunsberger, adapted from Transformative Culture, Selenium Conference, 2017
Mesos and Marathon at a Glance
For anyone not familiar with Mesos and Marathon, here is a very high-level overview. Mesos is the “network OS” that aggregates individual servers into pools of resources, and Jenkins and Marathon run on top of it. Jenkins uses Mesos to build and test our products, and Marathon uses Mesos to run them. Docker provides the containers that our services and jobs run in.
You may be a visual learner, so this is what some of that looks like in context:
Source: Ashley Hunsberger, adapted from Transformative Culture, Selenium Conference, 2017
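To make the idea of pooled resources a little more concrete, here is a toy sketch in Python. It is not how Mesos actually schedules work (Mesos has its own resource-offer mechanism), and the server names, sizes, and the `Agent`, `pool_totals`, and `place` helpers are all made up for illustration. It only shows the idea of treating several servers as one pool and placing work wherever it fits.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """One server in the cluster (hypothetical names and sizes)."""
    name: str
    cpus: float
    mem_mb: int


def pool_totals(agents):
    """Sum the free resources of every server into one cluster-wide view."""
    return {
        "cpus": sum(a.cpus for a in agents),
        "mem_mb": sum(a.mem_mb for a in agents),
    }


def place(agents, cpus, mem_mb):
    """Reserve resources on the first server with enough room.

    Returns the server's name, or None if nothing in the pool fits.
    This first-fit rule is a simplification, not Mesos's real algorithm.
    """
    for agent in agents:
        if agent.cpus >= cpus and agent.mem_mb >= mem_mb:
            agent.cpus -= cpus
            agent.mem_mb -= mem_mb
            return agent.name
    return None


agents = [Agent("server-a", 4, 8192), Agent("server-b", 8, 16384)]
print(pool_totals(agents))     # cluster-wide view before any work is placed
print(place(agents, 6, 4096))  # a build job too large for the smaller server
print(place(agents, 2, 2048))  # a small service that fits on the first server
```

The caller never picks a server by name. It asks the pool for CPU and memory and gets back wherever the work landed, which is the part of the model that matters here.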
From the QA Lab to Mesos
The benefits of moving from our in-house QA Lab to Mesos were clear to us, and it was time to migrate. We started to containerize everything, and the migration opened up the possibility of per-branch pipelines. We could automatically reprovision our environments (remember, we used to do this manually), and time-consuming steps now ran only when one of our dependency packages changed, instead of every time we ran the job. What used to take 20 minutes went down to five.
Source: Ashley Hunsberger, adapted from Transformative Culture, Selenium Conference, 2017
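One common way to run expensive setup only when dependencies change is to fingerprint the dependency manifests and compare against the fingerprint from the last setup. The sketch below is an assumption about how such a check could look, not a description of our actual pipeline. The file names (`requirements.txt`, `.setup-stamp`), the pinned versions, and the `fingerprint` and `needs_setup` helpers are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(manifests):
    """Hash the names and contents of the dependency manifests in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(manifests):
        digest.update(path.name.encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


def needs_setup(manifests, stamp):
    """Return True when the manifests differ from the last recorded fingerprint.

    The new fingerprint is written to the stamp file right away, so this
    sketch assumes the setup that follows succeeds.
    """
    current = fingerprint(manifests)
    if stamp.exists() and stamp.read_text() == current:
        return False
    stamp.write_text(current)
    return True


with tempfile.TemporaryDirectory() as tmp:
    manifest = Path(tmp) / "requirements.txt"  # hypothetical manifest name
    stamp = Path(tmp) / ".setup-stamp"         # hypothetical stamp file
    manifest.write_text("selenium==3.4.0\n")   # placeholder dependency pin
    print(needs_setup([manifest], stamp))      # first run: nothing recorded yet
    print(needs_setup([manifest], stamp))      # manifest unchanged since last run
    manifest.write_text("selenium==3.5.0\n")
    print(needs_setup([manifest], stamp))      # manifest changed
```

With containers, the same effect often comes from image layer caching, where the layer that installs dependencies is rebuilt only when the manifest copied into it changes.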
A Glossary for Reference
Here are some terms that helped me along the way as I learned about Mesos. I hope they are useful for you, too!
- Jenkins – Our build server that coordinates everything required to build and test our applications.
- Jenkins DSL (Domain-Specific Language) – A simple Jenkins-specific language that we can use to define jobs programmatically and store them in version control.
- Jenkins Agent – A server that Jenkins uses to execute build tasks. Adding more agents is how Jenkins scales.
- Jenkins Master – The “brain” of Jenkins which coordinates all tasks across all Jenkins agents.
- Docker Containers – A lightweight way to provide an isolated environment for an application and all its dependencies to run in. The isolation is provided by the host's Linux kernel.
- Mesos – Mesos abstracts away all computers on a network to provide a unified overview of CPUs, memory, disks, etc. This allows you to treat a large collection of servers as a single large computer. Mesos is like the kernel of an OS but one that runs on many computers.
- Mesos Framework – A Mesos framework can submit various tasks and services to run on Mesos itself. For example, Jenkins has a Mesos Framework that allows it to use Mesos-based agents to run build jobs.
- Marathon – This is a Mesos framework that runs long-lived services on top of Mesos. Marathon runs individual services within Docker containers.
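To tie a few of these terms together: Marathon accepts app definitions as JSON through its REST API, and a Docker-based service is described by fields such as `id`, `cpus`, `mem`, `instances`, and `container`. The sketch below builds such a definition in Python. The app id, image name, resource sizes, and the `marathon_app` helper are placeholders for illustration, not values from our setup, and the definition is only printed rather than sent to a cluster.

```python
import json


def marathon_app(app_id, image, cpus=0.5, mem=512, instances=1):
    """Build a minimal Marathon app definition for a Docker-based service."""
    return {
        "id": app_id,
        "cpus": cpus,            # CPU share per instance
        "mem": mem,              # memory per instance, in megabytes
        "instances": instances,  # how many copies Marathon should keep running
        "container": {
            "type": "DOCKER",
            "docker": {"image": image},
        },
    }


# Hypothetical service and image names.
app = marathon_app("/qa/example-service", "example/service:latest", instances=2)
print(json.dumps(app, indent=2))
```

The printed JSON is the kind of document you would submit to Marathon's `/v2/apps` endpoint. Marathon then asks Mesos for resources and keeps the requested number of instances running.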