This article was first published on the StarCIO blog

It’s no secret to anyone working in technology that IT’s operating world is becoming more demanding and complex. 

Digital transformation, hybrid working, exponentially increasing data volumes, greater security risks, and expanding global regulations are all driving up business demands and expectations for reliable and robust technology operations. Business leaders expect IT teams to evolve their digital operational capabilities and support the speed and breadth of technological capabilities needed to compete.

The speed of change and the breadth of technologies drive IT operations’ challenges. Supporting innovation, multicloud environments, frequent application deployments, microservice architectures, high-performing customer experiences, machine learning model operations (ModelOps), and real-time data processing requirements are all complexities IT operations must address.

In The 2021 State of Digital Operations Management, produced by OpsRamp, survey respondents highlighted the paradox of improving operations while rationalizing technology complexities. IT leaders conceded that their biggest barriers to meeting organizational goals include keeping up with the pace of technology innovation (55%) and the pace of business innovation (35%). At the same time, they acknowledged complexities driven by legacy tools (42%), siloed organizations (40%), understaffed IT (30%), and a lack of skills (30%).

What’s the answer for IT to improve operations amid growing complexities? Neither saying “no” to the business nor consolidating onto homogeneous technology platforms is a viable strategy for controlling demand or managing the technical challenges.

Here are three steps IT operations should consider: 

  1. Consolidate Monitoring Tools Used in IT Operations
    In the OpsRamp survey, 83 percent of respondents state that they have eleven or more IT operations tools in use. For understaffed IT departments struggling to keep up with the skills required to support IT, having too many tools can be a drag on productivity and performance.

    One place to review and audit is where monitoring tools are deployed and utilized. Over two decades of supporting internet applications, IT has added tools to monitor user experiences, application performance, APIs, integrations, and databases. These tools capture data and send alerts on infrastructure issues, application errors, and business service disruptions, and they are far more effective than having end-users open incident tickets. But having too many tools, data sources, and uncorrelated alerts is also a problem.

    AIOps and applying machine learning in IT operations pave a path to tool consolidation, and in the OpsRamp survey, 63 percent plan to use AIOps as part of their IT tool consolidation strategy.

    How does IT tool consolidation happen with AIOps?
    An AIOps solution helps consolidate the data, alerts, and tools used during incident management. Instead of a bridge call of experts investigating issues with multiple tools, the team starts its triage using one AIOps tool, correlated data, and fewer independent alerts. Leaders can then streamline which monitoring tools to standardize and look to sunset redundant tools and data sources.

  2. Leverage Machine Learning to Improve Incident Management
    Consolidating the number of monitoring tools has a financial ROI and reduces the required IT skills, but IT leaders also use AIOps solutions to improve operational performance. In the OpsRamp survey, 70% of respondents looking to implement AIOps solutions aim to solve critical issues faster.

    When there is a major incident, count the number of people that join the bridge call or participate in the war room. How many alerts need investigating, and which monitoring tools are most useful for diagnosing root causes?

    AIOps solutions improve incident management by reducing noisy alerts, correlating events, delivering actionable inferences, and identifying probabilistic root causes.

    In other words, instead of requiring a bunch of people to gather and decipher all the alerts, a machine learning algorithm has already started the process of analyzing the data, correlating the alerts, and presenting information in a consistent way for incident management teams to review. The analysis can help reduce the amount of time required to resolve major incidents by as much as 95 percent.

    IT can achieve these dramatic improvements in resolving incidents by combining the machine learning and automation capabilities in AIOps solutions. For example, when tier-1 support teams review incidents where machine learning has correlated alerts into a high-likelihood root cause, the support team can trigger automated recovery tasks and close incident tickets faster.
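
To make the idea of correlating alerts concrete, here is a minimal sketch in Python. It is a simple rule-based stand-in and not the machine learning used by any AIOps product: it only groups alerts that hit the same resource within a short time window. The `Alert` fields, the tool and resource names, and the five-minute window are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    source: str        # hypothetical monitoring tool that raised the alert
    resource: str      # hypothetical service or host the alert refers to
    message: str
    timestamp: datetime


def correlate(alerts, window=timedelta(minutes=5)):
    """Group alerts on the same resource that arrive within `window`
    of the previous alert for that resource."""
    groups = []
    latest_group = {}  # resource -> index of its most recent group
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        idx = latest_group.get(alert.resource)
        if idx is not None and alert.timestamp - groups[idx][-1].timestamp <= window:
            groups[idx].append(alert)
        else:
            groups.append([alert])
            latest_group[alert.resource] = len(groups) - 1
    return groups


# Illustrative data only, not taken from the survey.
alerts = [
    Alert("apm", "checkout-api", "latency above threshold", datetime(2021, 4, 1, 9, 0)),
    Alert("infra", "checkout-api", "CPU saturation", datetime(2021, 4, 1, 9, 2)),
    Alert("logs", "checkout-api", "HTTP 500 spike", datetime(2021, 4, 1, 9, 3)),
    Alert("infra", "billing-db", "disk usage high", datetime(2021, 4, 1, 9, 4)),
    Alert("apm", "checkout-api", "latency above threshold", datetime(2021, 4, 1, 11, 0)),
]
incidents = correlate(alerts)
print(f"{len(alerts)} alerts -> {len(incidents)} correlated groups")
```

Even this crude rule hands the support team three grouped items instead of five separate alerts. A real AIOps solution would go further and correlate across resources, topology, and historical patterns.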

  3. Enable IT to Support Multicloud Digital Experiences
    Here are some of my recommendations on what IT should target in its operating charter:
  • Enhance business capabilities by delivering reliable, high-performing, and secure digital experiences to customers and employees
  • Provide technology agility and flexibility, as most organizations are operating hybrid clouds and many target multicloud capabilities
  • Automate repeatable tasks and orchestrate complex procedures to reduce risk, improve quality, address security, enhance communications, and free up people’s time
  • Leverage data in decision-making, servicing customers, reducing risks, and prioritizing initiatives
  • Simplify operations by using machine learning, integration, and automation capabilities

Achieving this charter requires standardizing on platforms that help balance the paradox. On the one hand, AIOps solutions, automation capabilities, and integrations improve IT productivity and reduce complexity. On the other, IT expands capabilities by supporting multicloud applications and proactively addressing issues that impact digital experiences. In the middle, IT operations focuses on selecting and improving key performance indicators driven by AIOps, including service level objectives, mean time to resolve (MTTR) incidents, and time savings driven by automation.
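
As a rough illustration of tracking these indicators, the following Python sketch computes MTTR and the share of incidents resolved within a target time. The incident timestamps, the 60-minute target, and the function names are hypothetical. In practice these figures would come from the incident management system.

```python
from datetime import datetime, timedelta


def mean_time_to_resolve(incidents):
    """Average of (resolved - opened) over incidents that have been resolved."""
    durations = [resolved - opened for opened, resolved in incidents if resolved is not None]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


def share_within_target(incidents, target):
    """Fraction of resolved incidents closed within the target duration."""
    durations = [resolved - opened for opened, resolved in incidents if resolved is not None]
    if not durations:
        return None
    return sum(1 for d in durations if d <= target) / len(durations)


# Illustrative (opened, resolved) pairs; None marks an incident still open.
incidents = [
    (datetime(2021, 4, 1, 9, 0), datetime(2021, 4, 1, 9, 30)),
    (datetime(2021, 4, 2, 13, 0), datetime(2021, 4, 2, 15, 0)),
    (datetime(2021, 4, 3, 8, 0), datetime(2021, 4, 3, 8, 45)),
    (datetime(2021, 4, 4, 10, 0), None),
]
mttr = mean_time_to_resolve(incidents)
within = share_within_target(incidents, timedelta(minutes=60))
print(f"MTTR: {mttr}, resolved within target: {within:.0%}")
```

Comparing these numbers before and after an AIOps rollout is one simple way to check whether the investment is moving the indicators the charter cares about.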

Proactive IT leaders should communicate the goals (digital experiences, technical agility), the strategy (AIOps and automation), and the service level objectives to align IT and execute an incremental roadmap. AIOps and automation are the key capabilities for IT to deliver against a growing business charter while rationalizing operational complexities.

I hope you will join me for a May 4th webinar with Sheen Khoury, the Chief Revenue Officer at OpsRamp, where we will discuss the survey results, the factors constraining organizational innovation, and the strategies and technology investments IT leaders can consider to continue their modernization journey, including real-world examples.
