Seven KPIs for AIOps

This article originally appeared in eWeek.

Leaders looking to measure the benefits of AIOps and build key performance indicators (KPIs) for both IT and business audiences should focus on key factors such as uptime, incident response, remediation time and predictive maintenance, so that potential outages affecting employees and customers can be prevented.

Business KPIs connected to AIOps include employee productivity, customer satisfaction and web site metrics such as conversion rate or lead generation. The bottom line: AIOps can help companies cut IT operations costs through automation and rapid analysis, and it can support revenue growth by keeping business processes running smoothly and delivering excellent user experiences.

[Image: service-centric AIOps dashboard]

These common KPIs can measure the impact of AIOps on business processes.

  1. Mean time to detect (MTTD)
    This KPI refers to how long it takes for an issue to be identified. AIOps can help companies drive down MTTD by using machine learning to detect patterns, filter out the noise and identify outages. Amid an avalanche of alerts, ITOps can understand the importance and scope of an issue, which leads to faster identification of an incident, reduced downtime and better performance of business processes.

  2. Mean time to acknowledge (MTTA)
    Once an issue has been detected, IT teams need to acknowledge the issue and determine who will address it. AIOps can use machine learning to automate that decision-making process and quickly make sure that the right teams are working on the problem.

  3. Mean time to restore/resolve (MTTR)
    When a key business process or application goes down, speedy restoration of service is essential. AIOps plays an important role here by using machine learning to determine whether the issue has been seen previously and, based on past experience, to recommend the most effective way to get the service back up and running.

  4. Service availability
    Service availability is often expressed as a percentage of uptime over a period of time, or as outage minutes per period. AIOps can help boost service availability through the application of predictive maintenance.

  5. Percentage of automated versus manual resolution
    Increasingly, organizations are leveraging intelligent automation to resolve issues without manual intervention. Machine learning models can be trained to identify patterns, such as scripts that were previously executed to remedy a problem, and then take the place of a human operator.

  6. User-reported versus monitoring-detected incidents
    IT operations should be able to detect and remediate a problem before the end user is even aware of it. For example, if application performance or web site performance is slowing down by milliseconds, ITOps wants to get an alert and fix the issue before the slowdown worsens and affects users. AIOps enables the use of dynamic thresholds to ensure that alerts are generated automatically and routed to the correct team for investigation or auto-remediated when policies dictate.

  7. Time savings and associated cost savings
    The use of AIOps, whether to perform automation or to identify and resolve issues more quickly, will result in savings in both operator time and business time to value. These savings have a direct impact on the bottom line.
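
As a rough illustration of how the first six KPIs above could be computed from incident records, here is a minimal Python sketch. The `Incident` record, its field names and the `compute_kpis` function are hypothetical and not taken from any particular AIOps product. The sketch also assumes that MTTR is measured from the start of the fault to restoration (some teams measure from detection instead) and that incidents do not overlap in time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative only."""
    started: datetime       # when the fault actually began
    detected: datetime      # when the issue was first identified
    acknowledged: datetime  # when a team took ownership
    resolved: datetime      # when service was restored
    user_reported: bool     # True if a user reported it before monitoring did
    auto_resolved: bool     # True if it was fixed without manual intervention


def _mean_minutes(deltas):
    """Average a list of timedeltas and return the result in minutes."""
    return mean(d.total_seconds() for d in deltas) / 60.0


def compute_kpis(incidents, period):
    """Return the KPIs for a list of incidents within a reporting period.

    Assumes at least one incident and no overlapping outages.
    """
    if not incidents:
        raise ValueError("at least one incident is required")
    n = len(incidents)
    # Total outage time: sum of (resolved - started) across incidents.
    downtime = sum((i.resolved - i.started for i in incidents), timedelta())
    return {
        "mttd_min": _mean_minutes([i.detected - i.started for i in incidents]),
        "mtta_min": _mean_minutes([i.acknowledged - i.detected for i in incidents]),
        "mttr_min": _mean_minutes([i.resolved - i.started for i in incidents]),
        "availability_pct": 100.0 * (1 - downtime / period),
        "automated_pct": 100.0 * sum(i.auto_resolved for i in incidents) / n,
        "user_reported_pct": 100.0 * sum(i.user_reported for i in incidents) / n,
    }
```

Calling `compute_kpis(incidents, timedelta(days=30))` would return the six figures for a 30-day reporting window. Tracking them period over period gives a baseline against which any improvement from AIOps can be judged, and the time saved can then be converted into the cost savings described in the seventh KPI.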

In summary

These KPIs can be correlated to business KPIs around user experience, application performance, customer satisfaction, improved e-commerce sales, employee productivity, and increased revenue. ITOps teams need the ability to quickly connect the dots between IT infrastructure and business metrics to prioritize spend and effort on real business needs. Hopefully, as machine learning matures, AIOps tools can recommend ways to improve business outcomes or provide insights as to why digital programs succeed or miss the mark.
