Suppress Excessive Alerts with Auto-Alert Suppression Management

Auto-alert suppression management in OpsRamp delivers first-response actions that reduce redundant and noisy alerts. With learning-based first-response policies, IT teams no longer have to create static rules for a target set of resources by configuring alarm thresholds, defining filter criteria, and specifying time intervals.

Part one of this blog series, "Cut Down Distractions, Reduce Stress and Focus on Critical Priorities with OpsRamp's First-Response Policies," discussed how OpsRamp provides both time-based and attribute-based suppression to filter out false positives and deliver only relevant alerts. This post shows how first-response policies address specific noise-reduction use cases and how to configure auto-alert suppression management in OpsRamp.

IT operations teams can use first-response policies for the following real-world scenarios:

  1. Continuous Delivery, Deployment, and Integration. Keeping track of the availability and health of modern digital services hosted on dynamic and distributed hybrid infrastructure is an ongoing challenge. Enterprise DevOps teams frequently test a digital service in both staging and pre-production environments before deploying it to production. First-response policies in OpsRamp help DevOps pros quickly categorize noisy alert streams across delivery pipelines and save time and effort by ignoring informational and seasonal metric-based alerts.
  2. Implementing Standard Changes. Every IT department has a standard set of change management processes that get executed during the operational lifecycle. First-response policies can automatically suppress alerts that occur during a change management process while making sure that critical performance alerts are not missed. 
  3. Eliminate Duplicate Alerts during IT Outages. An enterprise IT service is typically dependent on multiple IT infrastructure services. During an outage, the failure of a dedicated load balancer might affect the functioning of other underlying infrastructure components. In this scenario, network teams might get inundated with alert notifications from all of these components. Intelligent inferencing and auto-alert suppression policies in OpsRamp ensure that IT teams receive only contextual alerts that help pinpoint the root cause and speed up issue resolution.
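
To make the third scenario concrete, here is a minimal Python sketch of dependency-aware suppression. It is an illustration only, not OpsRamp's inferencing logic: the function name, the topology map, and the resource names (lb-01, web-01, and so on) are all hypothetical. The idea is simply to keep an alert only when none of the components it depends on is alerting as well.

```python
from typing import Dict, List, Set


def root_cause_alerts(alerting: Set[str], depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return the alerting components that have no alerting upstream dependency."""

    def has_alerting_upstream(node: str, seen: Set[str]) -> bool:
        # Walk the dependency chain upward; `seen` guards against cycles.
        for parent in depends_on.get(node, []):
            if parent in seen:
                continue
            seen.add(parent)
            if parent in alerting or has_alerting_upstream(parent, seen):
                return True
        return False

    return {c for c in alerting if not has_alerting_upstream(c, {c})}


# Hypothetical topology: each component maps to the components it depends on.
depends_on = {
    "web-01": ["lb-01"],
    "web-02": ["lb-01"],
    "app-01": ["web-01", "web-02"],
}

# During the outage, every component raises an alert.
alerting = {"lb-01", "web-01", "web-02", "app-01"}

# Only the load balancer alert survives; the downstream alerts are suppressed.
print(root_cause_alerts(alerting, depends_on))
```

In this toy example, the alerts from the web and application tiers are treated as symptoms of the load balancer failure, so the on-call team sees a single alert that points at the likely root cause.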

Configuring Auto-Alert Suppression Management in OpsRamp 

IT teams can configure auto-alert suppression management in OpsRamp to address the above use cases. Time-based auto-alert suppression identifies seasonal alerts and keeps them out of IT event streams. Attribute-based auto-alert suppression withholds alerts that match specific criteria and delivers only context-rich alerts to on-call support teams. Here’s how to configure first-response policies in OpsRamp:

  1. Navigate to the Setup tab in the OpsRamp portal. Select First Response under Service Level Management.
  2. Apply the auto-alert suppression policy to the relevant client resource. Click on “Add” to define an alert suppression rule.
  3. Under Filter Criteria, IT teams can define rules for a set of devices or their entire tenant. Configuration rules help in auto-suppressing alerts during the development, testing, and deployment phases of the DevOps lifecycle.
  4. In the Policy Definition section, you can select either time-based suppression (analyze seasonal patterns and suppress alerts without any human intervention) or attribute-based suppression (suppress alerts that match specific conditions). Attribute-based suppression requires IT teams to upload a CSV sheet with different alert attributes for accurate alert pattern recognition and detection.
  5. Site reliability engineers also have the option of downloading a sample CSV file, filling in the required alert attributes, and then uploading the CSV file to the OpsRamp portal.
  6. After uploading the CSV file, incident management teams can tweak and train machine learning models with auto-suppression criteria for specific alert occurrences.
  7. As IT operations teams gain a better understanding of the different factors impacting IT events, they can update the CSV file with more accurate and relevant information.
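
To give a feel for what attribute-based matching against an uploaded CSV involves, here is a small Python sketch. Everything in it is an assumption made for illustration: the column names (resource, metric, severity), the wildcard convention, and the helper functions are hypothetical and do not reflect the format of OpsRamp's sample CSV file or the way its machine learning models use it. The sketch treats each CSV row as a rule, where an empty cell matches any value and `*` acts as a wildcard.

```python
import csv
import io
from fnmatch import fnmatch
from typing import Dict, List

# Hypothetical CSV layout; the real sample file from the portal may differ.
SAMPLE_CSV = """resource,metric,severity
staging-web-01,cpu.utilization,warning
staging-*,,info
"""


def load_rules(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text into a list of rules, one dict per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{k: (v or "").strip() for k, v in row.items()} for row in reader]


def should_suppress(alert: Dict[str, str], rules: List[Dict[str, str]]) -> bool:
    """Suppress the alert if every non-empty attribute of some rule matches it."""
    for rule in rules:
        if all(
            not pattern or fnmatch(alert.get(field, ""), pattern)
            for field, pattern in rule.items()
        ):
            return True
    return False


rules = load_rules(SAMPLE_CSV)

staging_info = {"resource": "staging-db-02", "metric": "disk.free", "severity": "info"}
prod_warning = {"resource": "prod-web-01", "metric": "cpu.utilization", "severity": "warning"}

print(should_suppress(staging_info, rules))  # matched by the staging-* rule
print(should_suppress(prod_warning, rules))  # no rule matches, so it is delivered
```

A static matcher like this only covers the attribute lookup. Per the steps above, the platform goes further by using the uploaded attributes to train its models, which is why keeping the CSV file accurate matters.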
