Automatic Diagnostics and Metrics



Being a Site Reliability Engineer (SRE) is not an easy job. You have to manage code deployment, configuration, and monitoring so that everything runs in production without problems. Triage, troubleshooting, remediation, and support are, for the most part, done manually. No matter how good you are, these processes are error-prone and require a lot of effort. Automating them is the goal of the new tooling movement around AIOps.

What is AIOps?

AIOps stands for Artificial Intelligence for IT Operations. It uses machine learning algorithms and other AI techniques to analyze big data from various IT and business operations tools, with the aim of speeding up service delivery, increasing IT efficiency, and delivering a superior user experience. AIOps also breaks away from siloed operations management.

In essence, AIOps applies machine learning algorithms to the vast amounts of operational data available in order to provide insights and enable a higher level of automation. IT Ops no longer needs to depend largely on human operators throughout the modern software development life cycle (SDLC). Solutions powered by AIOps draw their intelligence from a variety of data sources and give analytics platforms access to the stored data.

Simply said, AIOps delivers automatic diagnostics and metric-driven continuous improvement for the development (dev) and operations (ops) teams across the entire SDLC.

What are the main features of AIOps in helping SRE?

Correlate and Analyze Disparate Datasets

One of the techniques used in AIOps is topology analytics. With it, your SRE team can consume and correlate intelligence from multiple architectural layers. This makes it possible to identify the root cause of an issue and, where a fix is known, to remediate it automatically. That is much faster and more efficient than manually tracking symptoms and fixing them one by one.
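To make the idea of correlating alerts across architectural layers concrete, here is a minimal sketch. It assumes a hand-written dependency map (`DEPENDS_ON`) and invented component names; real AIOps products discover topology automatically and use far richer signals. The heuristic is simple: an alerting component is a probable root cause if nothing it depends on, directly or indirectly, is also alerting.

```python
from collections import deque

# Hypothetical topology: each component maps to the components it depends on.
DEPENDS_ON = {
    "checkout-ui": ["checkout-api"],
    "checkout-api": ["payments-db", "auth-service"],
    "auth-service": ["payments-db"],
    "payments-db": [],
}


def probable_root_causes(depends_on, alerting):
    """Return alerting components with no alerting (transitive) dependency."""
    alerting = set(alerting)
    roots = []
    for component in sorted(alerting):
        # Breadth-first walk over everything this component depends on.
        seen = set()
        queue = deque(depends_on.get(component, []))
        has_alerting_dependency = False
        while queue:
            dep = queue.popleft()
            if dep in seen:
                continue
            seen.add(dep)
            if dep in alerting:
                has_alerting_dependency = True
                break
            queue.extend(depends_on.get(dep, []))
        if not has_alerting_dependency:
            roots.append(component)
    return roots


alerts = ["checkout-ui", "checkout-api", "payments-db"]
print(probable_root_causes(DEPENDS_ON, alerts))
```

With these three alerts, the UI and the API both sit on top of the alerting database, so only the database is reported as a probable root cause. The other two alerts are treated as symptoms.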

Holistic Visibility of Your Digital Delivery Chain

By using AIOps, you can visualize two important parts of your digital delivery chain: user experience and network and application performance.

All this can be done in a holistic way through intuitive dashboards and reports.

Network performance can improve with AIOps because it eliminates manual tasks and streamlines workflows, which enhances collaboration and moves the team toward autonomous operations.

The end-users’ overall experience with the application will be improved by AIOps. With predictive insights and automated remediation, SREs can prevent issues or reduce the impact if they arise, so the users can continue working with the application.
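As an illustration of what a predictive insight can look like at its simplest, the sketch below flags metric samples that deviate sharply from their recent history. It uses a rolling z-score, which is a basic statistical stand-in and not the method of any particular AIOps product. The function name, window size, threshold, and latency values are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indexes of samples that deviate from the preceding window
    by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history cannot be scored; skip the sample.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical response times in milliseconds, with one spike at index 6.
latencies_ms = [100, 102, 98, 101, 99, 100, 250, 101]
print(find_anomalies(latencies_ms))
```

Only the spike is flagged. The sample after it is not, because the spike itself inflates the spread of the following window. That side effect is one reason production systems tend to use more robust models.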

Reduce Alarm Noise and Enhance Prediction

The SRE team's main task is to be customer-obsessed and to make sure users' engagement with the application is as expected. Monitoring is one of the services that supports this.

When an SRE monitors code manually with traditional tools, the work is time-consuming and error-prone, because redundant and false alerts (both false positives and false negatives), known as alarm noise, can be triggered. Machine learning is a major part of AIOps: the software can be trained continuously so that it learns to tell whether an alert is redundant, false, or something that needs to be dealt with immediately. This alert recognition improves with every monitoring cycle, which in turn sharpens the predictive insights available to your SRE team.
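The simplest form of noise reduction does not need machine learning at all. The sketch below is a rule-based stand-in for the trained models described above: it drops repeat alerts from the same source and check that arrive within a suppression window. The `Alert` fields, the function name, and the five-minute window are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    timestamp: float  # seconds since some reference point
    source: str       # component that raised the alert
    check: str        # name of the failing check


def suppress_duplicates(alerts, window_seconds=300):
    """Keep an alert only if no alert with the same (source, check)
    was kept within the previous `window_seconds`."""
    last_kept = {}
    kept = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.source, alert.check)
        previous = last_kept.get(key)
        if previous is None or alert.timestamp - previous >= window_seconds:
            kept.append(alert)
            last_kept[key] = alert.timestamp
    return kept


raw = [
    Alert(0, "db", "cpu"),
    Alert(60, "db", "cpu"),    # repeat within the window, suppressed
    Alert(120, "web", "5xx"),
    Alert(400, "db", "cpu"),   # outside the window, kept
]
for alert in suppress_duplicates(raw):
    print(alert)
```

A learned classifier would replace the fixed window with a decision based on past alert outcomes, but the interface stays the same: raw alerts in, a smaller set of actionable alerts out.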

Zero-Touch Automation

AIOps enables your SRE team to deliver a fully orchestrated and comprehensive service with just the push of a button. It can cover the entire stack, including traditional mainframes and modern cloud-native applications (microservices and serverless). The same applies to your process and remediation workflows, which improves your configuration process. Zero-touch automation at your service!
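One way to picture automated remediation is a runbook that maps alert types to actions and escalates anything it does not recognize. The sketch below is purely illustrative: the alert types, the service name, and the remediation functions are hypothetical, and the actions only return a description where a real system would call an orchestration API.

```python
def restart_service(alert):
    # Placeholder action: a real implementation would call an orchestrator.
    return f"restart {alert['service']}"


def scale_out(alert):
    # Placeholder action: a real implementation would add capacity.
    return f"scale out {alert['service']}"


# Hypothetical runbook: alert type -> remediation action.
RUNBOOK = {
    "process_down": restart_service,
    "high_cpu": scale_out,
}


def remediate(alert, runbook=RUNBOOK):
    """Run the matching remediation, or escalate when none is known."""
    action = runbook.get(alert["type"])
    if action is None:
        return f"escalate {alert['type']} on {alert['service']} to on-call"
    return action(alert)


print(remediate({"type": "high_cpu", "service": "checkout-api"}))
print(remediate({"type": "disk_full", "service": "checkout-api"}))
```

Keeping an explicit escalation path matters: automation handles the known cases without human touch, while unknown cases still reach a person.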

Continuous Improvement Through Operational Data

Every professional in the SDLC knows you can measure the quality of your software by running it against operational data, that is, the data end users actually produce. By using operational data in your DTAP (development, testing, acceptance, production) street when developing, testing, or deploying your environments, you can verify whether your software is capable of processing it. This is much better than using mock data, because you can never be sure the software will function correctly in production when it has only been exercised with non-production-like data.

By using operational data with AIOps you will continuously improve the SDLC with an adequate amount of resources from your dev and ops teams. These AIOps features will benefit the whole SDLC.

The following are some key benefits of AIOps:

Boost Service Levels

The predictive insights and holistic orchestration described above will boost service levels, because less time is spent analyzing and fixing issues, which improves the users' experience.
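Time spent fixing issues is commonly tracked as mean time to restore (MTTR). The sketch below shows one way such a metric could be computed from incident records. The function name and the incident timings are invented for illustration and are not measurements from any real system.

```python
from statistics import mean


def mean_time_to_restore(incidents):
    """Average minutes between detection and resolution.

    Each incident is a (detected_minute, resolved_minute) pair.
    """
    return mean([resolved - detected for detected, resolved in incidents])


# Hypothetical incident timings, in minutes.
before_automation = [(0, 60), (10, 50)]
after_automation = [(0, 20), (5, 35)]

print(mean_time_to_restore(before_automation))
print(mean_time_to_restore(after_automation))
```

Comparing the metric across periods or releases gives the team a concrete number to improve against.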

Enhance Operational Efficiency

Operational efficiency will increase because manual tasks are eliminated, workflows are streamlined, and collaboration across the whole SDLC is enhanced.

Enhanced Scalability and Agility

Automation and visualization through AIOps produce insights that can improve the scalability of your software and your SDLC team. As a side effect, they also increase the agility and speed of your DevOps projects.

AIOps will help the SRE by implementing the following features:

  • Disparate dataset correlation and analysis
  • Holistic visibility of the digital delivery chain
  • Enhanced prediction by reducing alarm noise
  • Zero-touch automation
  • Continuous improvement through operational data

In conclusion, AIOps benefits the SRE by implementing automatic diagnostics and metric-driven continuous improvement for dev and ops across the entire SDLC.

To learn more about how Broadcom's solutions can help fuel successful SRE adoption, read the white paper, Unlocking the Value of the SRE Model.

Cordny Nederkoorn is a software test engineer with over 10 years of experience in finance, e-commerce, and web development. He is also the founder of TestingSaaS, a social network about researching cloud applications with a focus on forensics, software testing, and security. Cordny is a regular contributor at Fixate IO.

