How to Set Up Multi-Burn-Rate Window Alerts on Service Level Objectives

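The burn rate measures how fast an issue is consuming the error budget relative to the SLO. A burn rate of 1 uses up exactly the whole budget by the end of the SLO window; a burn rate of 2 uses it up in half that time.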
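
As a rough illustration of the definition above, the burn rate can be computed by dividing the observed error ratio by the error budget (1 minus the SLO target). The sketch below assumes a ratio-based SLO; the function names burn_rate and hours_to_exhaustion, and the 30-day window in the usage note, are hypothetical choices made for this example rather than part of any particular monitoring tool.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how many times faster than 'exactly on budget' errors occur.

    error_ratio: fraction of failed requests in the measured window (0..1).
    slo_target:  the SLO as a fraction, e.g. 0.999 for 99.9%.
    """
    error_budget = 1.0 - slo_target  # fraction of requests allowed to fail
    if error_budget <= 0:
        raise ValueError("slo_target must be below 1.0 to leave an error budget")
    return error_ratio / error_budget


def hours_to_exhaustion(rate: float, slo_window_hours: float) -> float:
    """Return how long the full budget lasts if this burn rate is sustained."""
    if rate <= 0:
        return float("inf")  # nothing is being burned
    return slo_window_hours / rate
```

For example, with a 99.9% SLO and 1.44% of requests failing, burn_rate(0.0144, 0.999) comes out at roughly 14.4, and hours_to_exhaustion for a 720-hour (30-day) window is about 50 hours.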

How you configure alerts on the burn rate determines:

  • whether alerts fire early enough for SREs to diagnose and fix problems before they deteriorate,
  • whether alerts are too reactive, firing only once the SLO is already in danger, or
  • whether alerts are so numerous and noisy that they just get ignored.

To evaluate an alerting strategy, consider four attributes:

  1. Precision is the proportion of alerts that were triggered by a significant event. It keeps the signal-to-noise ratio high: 100% precision means every alert was triggered by a significant event.
  2. Recall is the other side of that: the proportion of significant events that triggered an alert. 100% recall means every significant event triggered an alert.
  3. Detection time is how long it takes for a significant event to trigger an alert.
  4. Reset time is how long alerts continue to fire after the significant event has died down.
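
The four attributes above can be measured from a history of significant events and alerts. The following is a minimal sketch, not a definitive method: the Event and Alert records, the minute-based timestamps, and the rule that an alert belongs to an event when it fires between the event's start and end are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    start: float  # minute the significant event began
    end: float    # minute the significant event died down


@dataclass
class Alert:
    fired: float     # minute the alert started firing
    resolved: float  # minute the alert stopped firing


def alerts_for(event: Event, alerts: List[Alert]) -> List[Alert]:
    """Alerts attributed to an event (assumption: they fired during it)."""
    return [a for a in alerts if event.start <= a.fired <= event.end]


def precision(events: List[Event], alerts: List[Alert]) -> float:
    """Fraction of alerts that were triggered by a significant event."""
    if not alerts:
        raise ValueError("precision is undefined without any alerts")
    hits = sum(
        1 for a in alerts
        if any(e.start <= a.fired <= e.end for e in events)
    )
    return hits / len(alerts)


def recall(events: List[Event], alerts: List[Alert]) -> float:
    """Fraction of significant events that triggered at least one alert."""
    if not events:
        raise ValueError("recall is undefined without any events")
    detected = sum(1 for e in events if alerts_for(e, alerts))
    return detected / len(events)


def detection_time(event: Event, alerts: List[Alert]) -> Optional[float]:
    """Minutes from event start to the first alert, or None if undetected."""
    matched = alerts_for(event, alerts)
    if not matched:
        return None
    return min(a.fired for a in matched) - event.start


def reset_time(event: Event, alerts: List[Alert]) -> Optional[float]:
    """Minutes alerts kept firing after the event ended, or None if undetected."""
    matched = alerts_for(event, alerts)
    if not matched:
        return None
    return max(0.0, max(a.resolved for a in matched) - event.end)
```

With two events and two alerts where only one alert fires during an event, both precision and recall come out at 0.5, which shows how a single noisy alert and a single missed event each pull one of the two numbers down.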



