There are many ways to improve application availability. You can add more infrastructure to give your applications more resources, and hopefully get an availability boost from it. You can improve the monitoring and resolution processes of your IT Ops team to reduce downtime.
And you can also write better, more resilient code. Improving code is often not easy, but it is the most effective thing developers can do to improve availability.
This article examines several strategies developers can follow in order to build more highly available applications.
All configuration should be externalized as much as possible. The easiest way to do this is by leveraging environment variables, which can be picked up at runtime, so operational teams can deploy multiple instances in multiple environments without ever needing to adjust property files or repackage the app in any other way. Environment variables are available regardless of the language in use.
Environment variables are best for the things required to start and run an application, like server addresses, port numbers, and security credentials. These are items that operational teams will potentially want to adjust on every server, and they will be set as part of the deployment pipeline.
Sample Ways to Access Environment Variables
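A minimal sketch in Java: read a setting from the environment, falling back to a sensible default for local development. The variable names `DB_HOST` and `DB_PORT` are hypothetical examples, not a required convention.

```java
public class EnvConfigExample {

    // Returns the environment variable's value, or the fallback if it is
    // unset or empty. This keeps local development working while letting
    // operations override the value per environment.
    public static String getOrDefault(String name, String fallback) {
        String value = System.getenv(name);
        return (value == null || value.isEmpty()) ? fallback : value;
    }

    public static void main(String[] args) {
        String dbHost = getOrDefault("DB_HOST", "localhost");
        String dbPort = getOrDefault("DB_PORT", "5432");
        System.out.println("Connecting to " + dbHost + ":" + dbPort);
    }
}
```

Because the lookup happens at startup, operations can point each deployed instance at a different database simply by changing the environment, with no repackaging.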
You can also store configuration values in the database, but this is a less common approach. Databases are normally used to store data the application needs after it starts up, not values that must be available the moment the runtime starts.
Data Tier

Most applications are data-driven, and that data access and management layer can be a large source of high-availability problems. The largest sources of problems I’ve seen are related to locking in the data store.
By default, most data stores lock only the row or document that is explicitly being updated or deleted. A developer, however, can write an application in such a way that it locks more than that in order to protect data integrity, at the expense of availability. For example, an entire table or collection could be locked:
Updating the Schema
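For illustration, a sketch using MySQL syntax and a hypothetical `carts` table. While the lock is held, every other client's reads and writes to the table queue up behind it, which protects the data but hurts availability:

```sql
-- Locks the entire carts table: no other session can read or write it
-- until UNLOCK TABLES runs.
LOCK TABLES carts WRITE;
ALTER TABLE carts ADD COLUMN discount_code VARCHAR(32);
UNLOCK TABLES;
```

On a busy table, even a few seconds of exclusive locking like this can look like an outage to every other client.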
Exception handling for connections is extremely important in high-availability scenarios, and it is overlooked more often than any other problem I’ve encountered in my career. There are two main types of exceptions that need to be caught, handled, and set up to retry the connection:
- Stale connections, where the server has closed the connection but the client still thinks it’s open. A common scenario is a database server that has failed over to its secondary node; the client tries to reuse a connection from its pool, and the call fails.
- Failure to connect, through either a timeout or a refused connection. If a server crashes, failover may take a few seconds depending on the database platform in use. That can be long enough to disrupt availability.
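Both cases can be handled with the same pattern: catch the exception, wait briefly to give the failover time to complete, and retry. A minimal generic sketch in Java, assuming a fixed backoff; the `RetryingConnector` name is hypothetical, and real drivers and pools often provide equivalent behavior out of the box:

```java
import java.util.concurrent.Callable;

public class RetryingConnector {

    // Runs an action (e.g., opening a database connection) up to maxAttempts
    // times, sleeping between attempts so a failing server has time to
    // fail over before we give up.
    public static <T> T withRetry(Callable<T> action, int maxAttempts, long backoffMillis)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e; // stale connection, timeout, or connection refused
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis);
                }
            }
        }
        throw last; // all attempts exhausted; surface the final failure
    }
}
```

In production you would typically retry only on connection-related exception types and add jitter to the backoff, but the shape of the logic is the same.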
Whatever language you are using to connect to the database, there will usually be a connection manager available that supports pooling, which is an efficient way to reuse database connections.
If you are using Spring with Java, then this is what you would add to your datasource connection property:
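One possible configuration, assuming Spring Boot with the default HikariCP connection pool; property names differ for other pools, and the values here are only illustrative:

```properties
# Fail fast if a connection cannot be obtained, so retry logic can kick in
spring.datasource.hikari.connection-timeout=5000
# Validate connections before use so stale ones are discarded after a failover
spring.datasource.hikari.connection-test-query=SELECT 1
# Retire pooled connections periodically (10 minutes here)
spring.datasource.hikari.max-lifetime=600000
```

The key idea is that the pool, not your business code, detects and discards dead connections.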
Core Application Tier
The core of the application is where the business functionality is written and all the valuable work is done. To make it as scalable as possible, it is best to get as close to stateless as possible. In traditional applications, all the tiers lived in a single deployable unit, and being highly available and scalable was only really necessary for the largest of applications. In today's model of microservices, micro-frontends, and mobile applications, keeping session information and other state inside the application isn’t nearly as valuable, and it makes adding and removing instances of the application more complicated than it needs to be.
The idea of a stateless application is that no data about any request persists in the application once the request has finished. This means every request is self-contained. While at first this may not sound as efficient as keeping everything (like the client’s shopping cart) in memory, it provides a level of reproducibility and scalability that a stateful application just can’t achieve without getting into complex in-memory grid caches.
A stateless application, when receiving a request, will use the information in that request to retrieve everything it needs to process the transaction. The sample below shows the incoming HTTP headers that a stateless app could receive.
Request HTTP Headers

```
POST /api/v1/add-item-to-cart HTTP/1.1
Host: store.example.com
Authorization: Bearer token1234
Content-Type: application/x-www-form-urlencoded
Content-Length: 22
item_id=42&num_items=5
```

- Line 1 shows it is a POST, so it is sending data to the server.
- Line 2 shows the host it is being sent to.
- Line 3 is the authorization information; we will use the token to look up which user we are working with.
- Line 4 shows the type of the HTTP POST body; other options include application/xml and application/json.
- Line 5 shows the length of the content being sent in.
- Line 6 is the content being submitted, in this case adding 5 of item 42 to the cart.
In the background, the service will look up the user, then update the cart record in the data store, and then return a response that looks like the following, showing it was successful with status code 200.
Response HTTP Headers

```
HTTP/1.1 200 OK
Date: Tue, 15 May 2018 12:28:53 GMT
Server: Apache
Last-Modified: Tue, 15 May 2018 12:28:53 GMT
Content-Length: 8
Content-Type: text/html
Connection: close
Success!
```

- Line 1 is the status code (200 = success).
- Line 2 is the date the request was processed.
- Line 3 is the server that did the processing (configurable, and it should be hidden in production).
- Line 4 is the date the document was last modified; for dynamic content this should always be now.
- Line 5 is the length of the content in the response.
- Line 6 is the format of the response.
- Line 7 indicates whether this connection will stay open after the response.
- Line 8+ is the content.
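The flow described above (look up the user from the bearer token, update the cart in the data store, and return a status) can be sketched in Java. This is a hypothetical sketch: the `CartService` name is invented, and two in-memory maps stand in for the real data store. The point is that the handler keeps no per-request state of its own, so any instance can serve any request.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CartService {

    // Stand-ins for the real data store; in production these lookups
    // would be queries against a shared database.
    private final Map<String, String> usersByToken = new ConcurrentHashMap<>();
    private final Map<String, Map<Integer, Integer>> cartsByUser = new ConcurrentHashMap<>();

    public CartService() {
        usersByToken.put("token1234", "user-42"); // sample data
    }

    // Handles one request using only the values the request carries.
    // Nothing is remembered between calls, so instances are interchangeable.
    public int addItemToCart(String bearerToken, int itemId, int numItems) {
        String userId = usersByToken.get(bearerToken);
        if (userId == null) {
            return 401; // unknown token
        }
        cartsByUser.computeIfAbsent(userId, u -> new ConcurrentHashMap<>())
                   .merge(itemId, numItems, Integer::sum);
        return 200;
    }

    public int itemCount(String bearerToken, int itemId) {
        String userId = usersByToken.get(bearerToken);
        if (userId == null) {
            return 0;
        }
        Map<Integer, Integer> cart = cartsByUser.get(userId);
        return cart == null ? 0 : cart.getOrDefault(itemId, 0);
    }
}
```

Because all durable state lives behind the data store interface, adding or removing instances of this service requires no session migration.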
If performance is a concern when you are considering stateless applications, remember that the externalized-configuration model lets operations add and remove caching layers in front of your application and its dependencies (for example, memcached in front of MySQL), which can drastically improve performance without requiring code changes or introducing a single point of failure.
Developers need to be mindful and make a conscious effort to keep up with and follow best practices, so the applications they deliver can be deployed in a highly available configuration and take full advantage of the elastic infrastructure available on-demand in both public and private clouds.