While we are a long way from implementing Skynet, using machine learning combined with automation to make real-time decisions is here. In a recent talk at Sumo Logic Illuminate, Dave Frampton, General Manager of Cloud SIEM and Security Analytics, discusses the future of security with Vijaya Kaza, Head of Engineering and Data Science for Trust & Safety and Chief Security Officer at Airbnb. Kaza describes how automation, machine learning and AI can strengthen a company’s overall security posture.
Today, attackers are more organized and their methods are getting more sophisticated. Automation, machine learning (ML) and artificial intelligence (AI) offer a smart approach to address sophisticated attacks at scale.
The case for getting the basics right
“It’s very important to get the basics right and be able to scale it for long-term success,” notes Kaza. These basics come in the form of security hygiene, such as patch management, a foundational security task that addresses the new vulnerabilities that emerge daily. Without workflow automation, patches would be applied less often, weakening overall security. When it comes to basics, more is better. For example, in the case of patch management, automation can go a step further and validate that a new patch works as expected. These small repetitive tasks, when collectively automated, help secure the basics and pay dividends down the road in terms of doing more with less. Automating other basics, such as creating templatized hardened configurations and development environments, helps reduce human error and drives higher levels of reliability.
Improve security by enhancing the user experience
Improving the user experience will have a big impact on the security posture of an organization. Automation to remove friction can make it easier for employees to take a more secure path to accomplish their day-to-day tasks. For example, security practitioners understand the importance of implementing “least-privilege access” for users in an organization. However, if there are too many manual steps involved in getting exceptions approved for greater access, it is tempting to over-provision access. Automating these workflows to streamline the user experience drives good security behavior.
Machine learning and artificial intelligence
When people hear about AI, some may see it as all hype with no application, while others see it as a way to make smart, reasoned decisions. “While we are still several generations away from removing humans from the ‘loop’ entirely in most security operations, many forms of AI, especially machine learning, are being used extensively already,” says Kaza.
Supervised machine learning
This type of machine learning uses a large amount of historical, high-quality data to train models that can detect problems before other detection methods do. Use cases include threat detection, anomaly detection, asset discovery, and posture assessment. Kaza describes supervised machine learning as an approach in which you feed labeled data into training so that the model learns from a baseline.
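As a rough illustration of learning from labeled data, the sketch below trains a nearest-centroid classifier on a handful of labeled login events and then classifies a new one. The feature names, the sample values, and the choice of classifier are assumptions made for this example; they are not taken from the talk.

```python
from math import dist
from statistics import fmean

# Hypothetical labeled history:
# (failed_logins_per_hour, distinct_source_ips) -> label
LABELED_EVENTS = [
    ((1.0, 1.0), "benign"),
    ((2.0, 1.0), "benign"),
    ((0.0, 2.0), "benign"),
    ((40.0, 12.0), "malicious"),
    ((55.0, 9.0), "malicious"),
    ((35.0, 15.0), "malicious"),
]


def train(examples):
    """Learn one centroid (mean feature vector) per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(model, features):
    """Return the label whose centroid is closest to the observation."""
    return min(model, key=lambda label: dist(model[label], features))


model = train(LABELED_EVENTS)
print(predict(model, (48.0, 10.0)))
```

The labels do the work here: the model only knows what "malicious" looks like because the historical examples say so, which is why the quality of that labeled data matters so much.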
Unsupervised machine learning
Unsupervised machine learning is a method in which no labeled data is presented during training, so the model must discover any organic patterns within the dataset on its own. Which method you choose depends on the problem you are trying to solve. For example, unsupervised machine learning can help identify bots by using a clustering algorithm to automatically differentiate bot behavior from human behavior.
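To make the clustering idea concrete, here is a minimal k-means sketch that groups unlabeled sessions into two clusters. The session features, the sample values, and the use of k-means specifically are assumptions for illustration. A real pipeline would also scale the features so that one of them does not dominate the distance.

```python
from math import dist
from statistics import fmean

# Hypothetical sessions: (requests_per_minute, seconds_between_clicks).
# No labels are supplied to the algorithm.
SESSIONS = [
    (3.0, 18.0), (120.0, 0.5), (5.0, 12.0), (150.0, 0.4),
    (4.0, 15.0), (135.0, 0.45), (6.0, 10.0), (110.0, 0.6),
]


def k_means(points, k=2, iterations=10):
    """Group points into k clusters by repeated assign-and-average steps."""
    # Deterministic start: spread the initial centroids across sorted points.
    ordered = sorted(points)
    step = max(k - 1, 1)
    centroids = [ordered[i * (len(ordered) - 1) // step] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its members,
        # keeping the old centroid if a cluster ended up empty.
        centroids = [
            tuple(fmean(col) for col in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters


centroids, clusters = k_means(SESSIONS)
for centroid, members in zip(centroids, clusters):
    print(centroid, len(members))
```

The algorithm only reports that two distinct groups exist. Deciding that the fast, high-volume group is the bot traffic is still an interpretation made by an analyst or a downstream rule.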
Scaling machine learning models
“Many companies are undergoing a digital transformation into the cloud era, which brings new opportunities and challenges,” says Kaza. As more sources of data become available, it can be difficult to ensure that the data needed to train the models is of high quality.
Creating and deploying an accurate machine learning model at scale requires several things: large, high-quality data sets; ML infrastructure with enough compute and storage capacity to process that data; and a mature ML practice and operations that include testing, monitoring, observability, experimentation and model analysis. Deploying a model is not a one-time operation. Models need constant tuning and retraining as new data becomes available, to guard against degradation and drift in performance.
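One way to watch for drift is to compare the distribution of a feature or model score at training time with what the model sees in production. The sketch below uses the Population Stability Index (PSI), a commonly used drift metric chosen here as an assumption; the talk does not name a specific technique, and the sample scores are invented.

```python
from math import log


def psi(expected, actual, bins=5, eps=1e-6):
    """Population Stability Index between a baseline and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def proportions(values):
        counts = [0] * bins
        for value in values:
            index = int((value - lo) / width)
            # Values outside the baseline range fall into the edge bins.
            counts[min(max(index, 0), bins - 1)] += 1
        # Floor at eps so empty bins do not produce log(0).
        return [max(count / len(values), eps) for count in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical model scores at training time and in production.
baseline = [i / 100 for i in range(100)]
live = [score + 0.3 for score in baseline]

print(psi(baseline, baseline))  # identical distributions
print(psi(baseline, live))      # shifted distribution
```

A frequently quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift worth investigating, but any threshold should be tuned to the model and data in question.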
Data quality is even more important than data quantity
Having larger sets of data allows your model to identify underlying patterns that may not be apparent to the human eye. But if the data is not of high quality, it is effectively useless.
Ensuring high data quality requires investing in data governance and data culture. It involves:
- Establishing clear ownership for data sources
- Writing down data definitions and reviewing them on an ongoing basis to ensure the data is consistent and free of duplicate or conflicting records
- Implementing the right data architecture and data platform to ensure the data is properly cleansed, prepared and made accessible to ML models on time.
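The check for duplicate and conflicting data can be sketched in a few lines. The record layout and the rule that the first field is the record key are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical asset inventory rows: (asset_id, owner, os_version)
RECORDS = [
    ("srv-01", "team-a", "ubuntu-22.04"),
    ("srv-02", "team-b", "ubuntu-20.04"),
    ("srv-01", "team-a", "ubuntu-22.04"),  # exact duplicate
    ("srv-03", "team-a", "debian-12"),
    ("srv-03", "team-c", "debian-12"),     # same key, different owner
]


def audit(records):
    """Return exact duplicate rows and keys that map to conflicting values."""
    seen = set()
    duplicates = []
    values_by_key = defaultdict(set)
    for row in records:
        if row in seen:
            duplicates.append(row)
        seen.add(row)
        values_by_key[row[0]].add(row[1:])
    conflicts = {
        key: sorted(values)
        for key, values in values_by_key.items()
        if len(values) > 1
    }
    return duplicates, conflicts


duplicates, conflicts = audit(RECORDS)
print(duplicates)
print(conflicts)
```

Running a check like this as part of the ingestion pipeline, rather than as a one-off cleanup, keeps bad records from reaching model training.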
A large quantity of data is usually a good thing, but it can lead to unintended problems. Security, privacy, bias and ethical considerations are very important. For example, a lack of diversity and a large imbalance in labeled data introduce bias and skew results. Another challenge is ensuring live data doesn’t fall into the wrong hands.
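Label imbalance is easy to measure before training begins. The following sketch reports each label's share of the data and derives inverse-frequency class weights, a common heuristic for compensating for imbalance. The label names and counts are hypothetical, and weighting is only one of several possible mitigations.

```python
from collections import Counter


def label_balance(labels):
    """Summarize label shares, the imbalance ratio, and class weights.

    Assumes at least one label is present.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {label: n / total for label, n in counts.items()}
    ratio = max(counts.values()) / min(counts.values())
    # Inverse-frequency weights: rarer classes get proportionally more weight.
    weights = {label: total / (len(counts) * n) for label, n in counts.items()}
    return shares, ratio, weights


# Hypothetical training labels with a heavy skew toward benign events.
labels = ["benign"] * 980 + ["malicious"] * 20
shares, ratio, weights = label_balance(labels)
print(shares)
print(ratio)
print(weights)
```

With a skew like this, a model that always answers "benign" would be right 98 percent of the time while catching nothing, which is why accuracy alone is a misleading measure on imbalanced security data.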
Where to start
Integrating machine learning into your core operations to solve problems is no easy task. It requires people, processes, technology, and investment to identify high-quality data sources that represent some function of the business, while building a culture that encourages experimentation and rewards innovation. This, combined with investing in the underlying infrastructure, will breed success.
What is the future of AI?
When asked about the future of AI, Kaza agrees that we are in the early stages. While much of the talk about AI is generally in the context of AI that is capable of simulating human reasoning and human behavior, there is much opportunity even with far less ambitious AI.
AI’s promise is in its ability to learn and get better over time with less human supervision. A future where the models are self-learning, and where manual data manipulation, analysis and labeling are no longer needed, could mean a very powerful application of AI for cybersecurity.
The future of AI is rooted in self-learning from structured and unstructured data in an unsupervised mode, enabling faster identification of potential problems, with limited human intervention.
Proactively manage risk
AI can self-learn and understand a dynamic environment, including the assets and users within it. As the environment changes, the models can learn about these changes, identify new risks, and make proactive recommendations for managing them. This level of learning can also lead to tailored recommendations based on specific roles and user behaviors, which can be of high value to the organization.
Combining AI and automation to solve future problems
As AI comes to understand an environment deeply, it could automatically generate tailored playbooks for faster remediation or automate tasks for self-healing.
In addition to managing risks, the AI can be leveraged to build threat simulations and test the efficiency of your security controls on an ongoing basis.
If you are embarking on a new ML project or automation of any sort, you should take a step back and understand the exact problem you want to solve. Don’t force a high-powered solution on a low-value problem. To maximize ROI, identify specific areas or metrics for improvement, such as reducing false alerts or the number of tickets your department receives. The goal should not be to eliminate human intervention, but rather to enable fast detection and remediation. Start small, scale out, and build on your momentum.