Big data is an older term that has regained momentum: combined with agile methodology, AI, machine learning (ML) and analytics, it now drives the trend known as data operations (DataOps). If you’re a systems architect, it pays to understand the philosophies, benefits and tools that power DataOps and improve your data-driven applications. This article serves as a guide to defining enterprise architectures that support the needs of an organization.
Inside Statistical Process Control (SPC)
The DataOps Manifesto defines a set of key principles to guide systems architects and implementation teams. In summary, they focus on the customer and analytics, driving positive change through a continuous improvement process that relies on both teamwork and automation to help drive efficiency and overall data and system quality. This is based on statistical process control (SPC), which is a quality-driven approach to using data and associated statistics to monitor, control and improve systems and processes. Quality, in this sense, translates to maximum application uptime, increased usefulness to the customer and reduced waste in terms of delivery time and resource usage.
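To make the SPC idea concrete, here is a minimal sketch of one common SPC tool, an individuals control chart, applied to a performance metric. The latency figures and function names are illustrative assumptions, not part of the DataOps Manifesto or any particular product. Sigma is estimated from the average moving range divided by 1.128, the standard constant for ranges of two consecutive points.

```python
from statistics import mean

# Standard control-chart constant (d2) for moving ranges of two points.
D2_FOR_N2 = 1.128


def control_limits(baseline):
    """Return (lower, center, upper) three-sigma limits for an individuals chart."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline points")
    center = mean(baseline)
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma = mean(moving_ranges) / D2_FOR_N2
    return center - 3 * sigma, center, center + 3 * sigma


def out_of_control(values, limits):
    """Return (index, value) pairs that fall outside the control limits."""
    lower, _, upper = limits
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical query latencies in milliseconds, taken while the system was stable.
baseline_ms = [120, 118, 125, 122, 119, 121, 124, 120, 123, 118]
limits = control_limits(baseline_ms)

# New observations to check against the baseline limits.
new_ms = [121, 119, 160, 122]
flagged = out_of_control(new_ms, limits)
print(limits, flagged)
```

Points outside the limits suggest the process has shifted and is worth investigating; points inside them are treated as normal variation. In practice you would also choose the baseline window carefully, since limits computed from an unstable period are not meaningful.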
Data is king in SPC, and most data-driven application architects go through a set of phases when they design systems. These phases include:
- Data Discovery: Knowing the sources of data used by applications, as well as identifying new sources of data about your applications and their environment. This includes performance statistics and other systems and process measurement data.
- Data Relationships: Understanding the relationships and hidden value in your data and their sources. This is the realm of big data and analytics, where you search for business meaning buried in the data that can be used to expand your offerings to customers while also improving the customer experience.
- Data Monitoring: Ensuring system uptime by monitoring your data-critical systems (e.g., data feeds, databases and analytics), and automating the first two phases above.
In line with SPC, performance data is used to ensure that your systems and their processes are stable, with the uptime and responsiveness needed to meet customer demand. To achieve this, the reach of DataOps extends from the data scientists and business analysts within your organization who solve business problems with data and algorithms, through to the database administrators and the developers who work with that data (e.g., ETL) and build applications from it.
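As one small illustration of the data monitoring phase, the sketch below checks whether data feeds have been updated recently enough. The feed names, timestamps and 30-minute threshold are hypothetical; a real deployment would read last-update times from its own metadata store or monitoring tool.

```python
from datetime import datetime, timedelta, timezone


def stale_feeds(last_updated, max_age, now=None):
    """Return the sorted names of feeds whose last update is older than max_age."""
    now = now or datetime.now(timezone.utc)
    return sorted(name for name, ts in last_updated.items() if now - ts > max_age)


# Fixed "current time" so the example is reproducible.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

# Hypothetical feeds and the time each one last delivered data.
last_updated = {
    "orders_feed": now - timedelta(minutes=5),
    "clickstream_feed": now - timedelta(hours=2),
}

late = stale_feeds(last_updated, max_age=timedelta(minutes=30), now=now)
print(late)
```

A check like this is typically run on a schedule, with the result feeding an alert or dashboard rather than being inspected by hand.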
From here, DataOps gives you insight into value discovery, turning these insights into new methods and tools, with an emphasis on continuous refinement. Tools and techniques will help you automate data mining, discover potential performance improvements, identify areas where you can increase efficiency and lower costs, and prepare for the future in terms of business and architectural needs.
Leveraging DataOps: Automation Is Key
Putting feedback loops in place helps to automate the entire data flow. That flow runs from business intelligence, where tools support data analytics processes, to data science, where new mathematical and architectural approaches drive a deeper understanding of the data, to developers and operations staff, where the data is aggregated into applications and further measured to drive new value back through this process.
Automation is used to build efficient feedback loops without overburdening your IT staff. In turn, automation tools are used not to replace your analysts, data scientists, developers or other IT personnel, but to enhance their jobs and make them more efficient at what they do. As data production and consumption continue to grow, your organization’s ability to discover hidden value in this data and drive it through to real system improvement will be a key competitive advantage.
While data visualization is important to effective DataOps, displaying graphs and pictorials can go only so far. Gaining actionable insight into metrics is what drives improvements in performance and efficiency. To avoid the error-prone manual work this can involve, organizations are adopting an AI-driven approach to DataOps. This helps discover hidden dependencies between applications and their resources (e.g., databases and other data sources) along with their users and their needs, providing unified visibility and operational intelligence to optimize your entire ecosystem.
AI can help you dive deeper into the effectiveness of your data, going beyond operational efficiencies to identify improvements to code, overall system architecture and the configuration tools and frameworks in place across the cloud or data center. Going further, AI-driven DataOps can help you optimize your consumption of cloud-based resources or identify key components to migrate (with recommended changes to deployment) when moving to the cloud.
The Big Picture: Cloud, Microservices, Serverless and DataOps
As a systems architect, you need to maintain a big-picture view of the data, systems and processes used within your organization to best serve your customers. This means you need to understand the entire application ecosystem to communicate and simplify your system and data operations. Tools help provide and maintain a 360-degree view into the performance, interactions and dependencies between components in your software systems and data workflows, along with all of the storage, compute and network infrastructure that supports them.
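One way to picture a dependency view is as a directed graph of components and the things they depend on. The sketch below is a simplified illustration with hypothetical component names, not the approach of any specific tool. It uses a breadth-first search to answer the question "if this data source degrades, what is affected?"

```python
from collections import deque


def downstream_of(depends_on, source):
    """Return every component that directly or indirectly depends on source."""
    # Invert the "depends on" edges so we can walk from a source to its dependents.
    dependents = {}
    for component, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(component)

    # Breadth-first search over the inverted graph.
    seen, queue = set(), deque([source])
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical ecosystem: each component lists what it depends on.
depends_on = {
    "etl_job": ["orders_db"],
    "sales_dashboard": ["etl_job"],
    "recommendation_api": ["etl_job", "clickstream_feed"],
    "billing_service": ["orders_db"],
}

impacted = downstream_of(depends_on, "orders_db")
print(sorted(impacted))
```

Here the map is written by hand to keep the example small; the value of tooling lies in discovering and maintaining such a map automatically as systems change.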
To realize the value of DataOps and improve visibility into the data and interactions within your systems, adopting newer architectural approaches can help. For instance, breaking apart monolithic applications using a microservices approach helps drive real value throughout your technology ecosystem: it improves the makeup and velocity of your development teams, improves operational efficiency, encourages a test-driven approach to development (since these services are more easily verifiable) and helps enable cloud deployment.
Other approaches, such as the use of containers and serverless architecture, help remove system dependencies from implementation, reducing the impact of deployment changes on your applications’ code. With all of these changes, the data orchestration and workflows within your applications (including your customers’ usage of this data, in some cases) become more apparent and measurable.
Call to Action: Leverage Tools and Frameworks
DataOps helps operations staff, app developers and enterprise architects reduce the complexity of delivering reliable application performance. Whether you’re new to DataOps or have been on board for some time, knowing the tools and frameworks to help drive both your architectures and operational control is key to success. Tools such as those from Unravel (see Figure 1) provide monitoring solutions that map dependencies between applications and their data sources, and all of the components in between.
The tools you use should automate the discovery of value and optimizations to be made, where possible, from the viewpoint of business owners, IT operations staff and the data scientists and developers defining and building the systems. All of this should be achievable regardless of the platforms, frameworks, languages and cloud providers you use to enable your applications and users. Finally, make sure your DataOps approach and tools support your organization in the future, with support for continuous improvement of overall architecture as well as data processes.