We live in a world that becomes more connected with each passing day. Public cloud hosts like Amazon Web Services (AWS) provide platforms with a wide array of capabilities that quickly scale based on demand. As a result, we’ve seen an explosion of new applications and services that continue to change our daily lives for the better. Data is a critical component of all of these systems. They can ingest vast amounts of data, process or transform it, and then pass it on.
Event streaming platforms like Amazon Kinesis provide a way to handle these massive streams of data reliably. In this article, we'll explore Kinesis, describe how it works and what it can do, and explain how to monitor and manage your streams so that they deliver the performance and reliability your organization requires.
What is Amazon Kinesis, and when do you use it?
Amazon Kinesis is a managed streaming solution that’s available as a service on AWS. Kinesis can receive, buffer, and process various forms of data in real time, including video feeds, IoT data, and logging events from thousands of sources. You can use Kinesis to transmit data and events to machine learning systems, data analytics, business intelligence, and many other destinations.
Because Kinesis is a managed service built on AWS infrastructure, you don't have to provision or maintain the underlying hardware, and you can scale a stream's capacity as the volume or frequency of your data changes. Many different applications use Amazon Kinesis, including some that you might interact with daily (like Zillow, Netflix, and Lyft). Whenever an organization needs to aggregate data from multiple sources, Kinesis is up to the task.
Configuring Kinesis based on your needs
Like most managed AWS services, Kinesis can be configured to meet your cost and performance requirements. You set the stream's capacity based on the size and volume of data it is expected to process, and its data retention period based on how long records need to remain available to the stream's consumers.
Fig. 1: Creating a new Kinesis data stream in AWS
A Kinesis data stream consists of a collection of shards. A shard is the base unit of capacity within a Kinesis stream: each shard accepts up to 1,000 records (PUT events) and up to 1 MB of input data per second, and supports up to 2 MB of output data per second. You set the number of shards when you first create the stream, and you can adjust it later through a process known as resharding.
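Resharding to a distant target is not always a single jump. The AWS `UpdateShardCount` API limits how far one call can move the shard count; the sketch below assumes a single call can at most double the current count when scaling up, or reduce it to no less than half (rounded up here, to stay conservative) when scaling down. `plan_resharding` is a hypothetical helper, not an AWS function: each value it returns would be passed as `TargetShardCount` to one boto3 `update_shard_count(StreamName=..., TargetShardCount=..., ScalingType="UNIFORM_SCALING")` call, waiting for the stream to return to `ACTIVE` between calls.

```python
def plan_resharding(current: int, target: int) -> list:
    """Return the sequence of intermediate shard counts needed to reach `target`.

    Assumption: one UpdateShardCount call may at most double the shard count
    (scale up) or reduce it to no less than half (scale down, rounded up here).
    """
    if current < 1 or target < 1:
        raise ValueError("shard counts must be positive")
    steps = []
    while current != target:
        if target > current:
            current = min(target, current * 2)          # grow by at most 2x per call
        else:
            current = max(target, (current + 1) // 2)   # shrink to at least ceil(half)
        steps.append(current)
    return steps


print(plan_resharding(2, 10))   # [4, 8, 10]
print(plan_resharding(16, 3))   # [8, 4, 3]
```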
Kinesis streams buffer events, which prevents downstream processors from being overwhelmed and lets different consumers retrieve events when they are ready. The default data retention period for a Kinesis stream is 24 hours, but the stream's owner can extend it to a maximum of 365 days. As with adding shards, increasing the data retention period also increases charges.
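In the Kinesis API, the retention period is expressed in hours, so the limits above translate to 24 through 8,760 hours, and raising and lowering it are separate boto3 calls (`increase_stream_retention_period` and `decrease_stream_retention_period`, both taking `StreamName` and `RetentionPeriodHours`). The hypothetical helper below converts a target in days, checks it against those bounds, and picks which call to make. It only builds the request, so it runs without AWS credentials; the stream name is a placeholder.

```python
MIN_RETENTION_HOURS = 24         # default and minimum retention period
MAX_RETENTION_HOURS = 365 * 24   # 365 days = 8,760 hours


def retention_request(stream_name, current_hours, target_days):
    """Return (boto3 method name, kwargs) to move retention to `target_days`,
    or None if the stream is already at that retention period."""
    target_hours = round(target_days * 24)
    if not MIN_RETENTION_HOURS <= target_hours <= MAX_RETENTION_HOURS:
        raise ValueError(f"retention must be 24-8760 hours, got {target_hours}")
    if target_hours == current_hours:
        return None
    method = ("increase_stream_retention_period" if target_hours > current_hours
              else "decrease_stream_retention_period")
    return method, {"StreamName": stream_name, "RetentionPeriodHours": target_hours}


# Extend a placeholder stream from the 24-hour default to 7 days.
print(retention_request("orders-stream", 24, 7))
```

With boto3, the result would be applied as `getattr(client, method)(**kwargs)` on a `boto3.client("kinesis")`.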
Monitoring and observability
When you create a new Kinesis stream in the AWS console, you'll have access to several calculators that help you estimate how many shards you'll need and what they will cost. Even an accurately configured stream needs monitoring to ensure that it keeps performing well: insufficient capacity can cause processing delays, while excess capacity hurts your organization's bottom line.
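The console calculators do this arithmetic for you, but the underlying logic is simple enough to sketch. Assuming the per-shard limits given earlier (1 MB/s and 1,000 records/s in, 2 MB/s out, treated here as MiB), and assuming every consumer reads every record through the shared 2 MB/s read limit (that is, without enhanced fan-out), the shard count is set by whichever limit is reached first. `estimate_shards` is a hypothetical helper, not the console's exact formula.

```python
import math

MIB = 1024 * 1024
WRITE_BYTES_PER_SHARD = 1 * MIB     # ingest: 1 MB per second per shard
WRITE_RECORDS_PER_SHARD = 1000      # ingest: 1,000 records per second per shard
READ_BYTES_PER_SHARD = 2 * MIB      # egress: 2 MB per second per shard, shared by consumers


def estimate_shards(avg_record_bytes, records_per_sec, consumers):
    """Smallest shard count that satisfies all three per-shard limits."""
    write_bytes = avg_record_bytes * records_per_sec
    read_bytes = write_bytes * consumers  # assumes each consumer reads every record
    return max(
        1,
        math.ceil(write_bytes / WRITE_BYTES_PER_SHARD),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_bytes / READ_BYTES_PER_SHARD),
    )


print(estimate_shards(100, 1500, 1))   # 2: the record-count limit dominates
print(estimate_shards(2048, 3000, 5))  # 15: the shared read limit dominates
```

Because real traffic is bursty, an estimate like this is a floor rather than a target; the monitoring described below is what tells you whether it holds up.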
AWS provides two levels of monitoring. The first is basic, stream-level monitoring: metrics for the whole stream are written to CloudWatch every minute at no additional charge. These include the following (collected over a specified period):
- Counts of records received and read from the stream.
- The volume of data received and requested from the stream (in bytes).
- How long records remained in the stream before being read (in milliseconds).
- The latency involved in reading records from the stream in a single request.
- Counts of successful and failed attempts to write to or read from the stream.
- Rate exceptions and throttling events related to writing to and reading from the stream.
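Each item in this list corresponds to a metric in the `AWS/Kinesis` CloudWatch namespace, published with a `StreamName` dimension. The mapping below covers the common ones (it is not exhaustive), and `metric_query` is a hypothetical helper that builds the keyword arguments for boto3's CloudWatch `get_metric_statistics` call, so the request can be inspected without AWS credentials.

```python
from datetime import datetime, timedelta, timezone

# Stream-level metrics in the AWS/Kinesis namespace (a common subset).
STREAM_METRICS = {
    "records_in": "IncomingRecords",
    "records_out": "GetRecords.Records",
    "bytes_in": "IncomingBytes",
    "bytes_out": "GetRecords.Bytes",
    "iterator_age_ms": "GetRecords.IteratorAgeMilliseconds",  # how long records waited
    "read_latency_ms": "GetRecords.Latency",
    "write_success": "PutRecord.Success",
    "read_success": "GetRecords.Success",
    "write_throttled": "WriteProvisionedThroughputExceeded",
    "read_throttled": "ReadProvisionedThroughputExceeded",
}


def metric_query(stream_name, key, minutes=60, statistic="Sum", now=None):
    """Build kwargs for CloudWatch get_metric_statistics over the last `minutes`."""
    now = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Kinesis",
        "MetricName": STREAM_METRICS[key],
        "Dimensions": [{"Name": "StreamName", "Value": stream_name}],
        "StartTime": now - timedelta(minutes=minutes),
        "EndTime": now,
        "Period": 60,                 # stream-level metrics arrive once per minute
        "Statistics": [statistic],
    }
```

For example, `boto3.client("cloudwatch").get_metric_statistics(**metric_query("orders-stream", "iterator_age_ms", statistic="Maximum"))` would fetch the last hour of iterator age for a placeholder stream named `orders-stream`.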
The second level, enhanced monitoring, can be enabled for an additional charge. It adds shard-level metrics, which are also reported to CloudWatch at one-minute intervals. These metrics (collected for each shard over a specified period) include:
- Counts of records received and read.
- The volume of data received and read (in bytes).
- Read and write throughput exceptions.
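Enhanced monitoring is switched on per stream through the Kinesis `EnableEnhancedMonitoring` API (boto3 `enable_enhanced_monitoring`), which takes the stream name and a list of `ShardLevelMetrics`. Because shard-level metrics carry an additional charge, it can make sense to enable only the ones you will act on. The sketch below validates a requested subset against the shard-level metric names and builds the request; `enhanced_monitoring_request` is a hypothetical helper and `orders-stream` a placeholder.

```python
# Shard-level metric names accepted by the ShardLevelMetrics parameter
# (the API also accepts "ALL", which this sketch deliberately leaves out).
SHARD_LEVEL_METRICS = {
    "IncomingBytes", "IncomingRecords",        # data and records received
    "OutgoingBytes", "OutgoingRecords",        # data and records read
    "WriteProvisionedThroughputExceeded",      # write throughput exceptions
    "ReadProvisionedThroughputExceeded",       # read throughput exceptions
    "IteratorAgeMilliseconds",                 # how long records waited before being read
}


def enhanced_monitoring_request(stream_name, metrics):
    """Validate `metrics` and build kwargs for Kinesis enable_enhanced_monitoring."""
    unknown = set(metrics) - SHARD_LEVEL_METRICS
    if unknown:
        raise ValueError(f"unsupported shard-level metrics: {sorted(unknown)}")
    return {"StreamName": stream_name, "ShardLevelMetrics": sorted(set(metrics))}


# Enable only the throttling metrics on a placeholder stream.
print(enhanced_monitoring_request(
    "orders-stream",
    ["WriteProvisionedThroughputExceeded", "ReadProvisionedThroughputExceeded"],
))
```

The result would be passed to `boto3.client("kinesis").enable_enhanced_monitoring(**request)`; `disable_enhanced_monitoring` takes the same arguments to turn metrics off again.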
Knowing the volume of data moving through your stream, how long records wait before being read, and whether reads and writes are being throttled or failing is critical to understanding the health and efficiency of the stream's configuration. That said, you don't want to dedicate staff to manually watching metrics from Kinesis streams.
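One way to turn these metrics into a single health number is to compare what a stream actually ingested with what its shards can accept. The sketch below assumes you already have the per-minute `Sum` of `IncomingBytes` and `IncomingRecords` for the stream, and uses the per-shard write limits described earlier (1 MB/s, treated as MiB, and 1,000 records/s); `write_utilization` is a hypothetical helper.

```python
MIB = 1024 * 1024


def write_utilization(shards, incoming_bytes_per_min, incoming_records_per_min):
    """Fraction of provisioned write capacity used over a one-minute period.

    Uses the per-shard write limits: 1 MB/s and 1,000 records/s.
    """
    byte_capacity = shards * MIB * 60       # bytes per minute the shards can accept
    record_capacity = shards * 1000 * 60    # records per minute the shards can accept
    return max(incoming_bytes_per_min / byte_capacity,
               incoming_records_per_min / record_capacity)


print(write_utilization(4, 125_829_120, 60_000))  # 0.5
```

A one-minute figure smooths over sub-second bursts, so a stream can be throttled while averaging well below 1.0; the throughput-exceeded metrics remain the direct signal of throttling.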
It is possible to create alarms within CloudWatch to alert you when performance deteriorates or if your Kinesis stream gets backed up. However, CloudWatch is a generic tool that, while practical, is not the most intuitive or user-friendly tool when it comes to the nuances of Kinesis stream performance and optimization. Let’s explore how you can automate monitoring of your Kinesis streams and achieve actionable observability into how your streams are performing.
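For reference, a basic CloudWatch alarm for a backed-up stream can be defined in code. A backlog usually shows up as a rising `GetRecords.IteratorAgeMilliseconds`, because consumers are reading records that have waited longer and longer. The sketch below builds keyword arguments for boto3's CloudWatch `put_metric_alarm`; `iterator_age_alarm`, the one-minute threshold, the stream name, and the SNS topic ARN are hypothetical values you would choose for your own stream.

```python
def iterator_age_alarm(stream_name, max_age_ms, sns_topic_arn, minutes=5):
    """Build kwargs for CloudWatch put_metric_alarm that fire when consumers
    fall behind: the maximum iterator age exceeds `max_age_ms` for `minutes`
    consecutive one-minute periods."""
    return {
        "AlarmName": f"{stream_name}-iterator-age",
        "Namespace": "AWS/Kinesis",
        "MetricName": "GetRecords.IteratorAgeMilliseconds",
        "Dimensions": [{"Name": "StreamName", "Value": stream_name}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": minutes,
        "Threshold": float(max_age_ms),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],   # e.g. an SNS topic that notifies on-call
    }


# Placeholder stream name and topic ARN; alarm if consumers lag by more than a minute.
alarm = iterator_age_alarm(
    "orders-stream", 60_000, "arn:aws:sns:us-east-1:123456789012:kinesis-alerts"
)
```

`boto3.client("cloudwatch").put_metric_alarm(**alarm)` would then create or update the alarm. If the iterator age approaches the stream's retention period, records risk expiring before they are read, which is why this metric is worth alarming on.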
Amazon Kinesis and Sumo Logic
By combining the convenience and scalability of AWS Kinesis with an analytics platform like Sumo Logic, you get a preconfigured dashboard that turns metrics and event data into easy-to-read, actionable information.
Fig. 2: The preconfigured AWS Kinesis App from Sumo Logic
You can try it out for yourself by signing up for a free trial of Sumo Logic and configuring a source for the Sumo Logic App for Amazon Kinesis – Streams. Sumo Logic provides detailed documentation that walks you through the steps for configuring a metrics collector and starting the data flow from your AWS account to your Sumo Logic account.
Sumo Logic App for Amazon Kinesis – Streams
Once metrics are flowing into your account, just add the Amazon Kinesis – Streams App from the app catalog. You'll then see a dashboard like the one in Fig. 2, populated with metrics from your own AWS Kinesis streams. You can also view the raw metrics plotted on easy-to-read graphs.
Fig. 3: Raw AWS Kinesis metrics in Sumo Logic
The best way to experience the benefits of monitoring AWS Kinesis with Sumo Logic is to try it out for yourself with a free trial. You’ll also have access to a comprehensive support portal that includes documentation, an active user community, and a helpful knowledge base. Once you’ve connected the dashboard, you can go further by setting up alerts based on any of the metrics that your Sumo Logic account collects.