Definitions
- SLI - Service Level Indicator. The ratio of good events to the total number of events in a given time window. For example, all successful HTTP requests are counted as good events.
- SLO - Service Level Objective. A target on the SLI: the lower bound that the SLI must stay above over a given time window. Common window choices are one week, 28 days, or 30 days. For example, a 99.9 % 30-day availability SLO means that the availability SLI over any 30-day window must not fall below 99.9 %.
- Error Rate - The ratio of the number of bad events to the total number of events.
- SLO Error Rate - The upper bound on the error rate implied by the SLO. For example, the SLO error rate of a 99.9 % SLO is 0.1 %.
- Error Budget - The number of allowable bad events in a given SLO time window.
- Burn Rate - How fast the service consumes the error budget.
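The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the event counts are hypothetical, and the error budget is computed under the assumption that the total number of events in the SLO window is known (or estimated) in advance.

```python
def sli(good_events: int, total_events: int) -> float:
    """SLI: ratio of good events to total events in the window."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def error_rate(good_events: int, total_events: int) -> float:
    """Error rate: ratio of bad events to total events."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return (total_events - good_events) / total_events


def slo_error_rate(slo: float) -> float:
    """Upper bound on the error rate implied by an SLO given as a fraction."""
    return 1.0 - slo


def error_budget(slo: float, total_events: int) -> float:
    """Allowable number of bad events in the SLO window (assumes total is known)."""
    return slo_error_rate(slo) * total_events


# Hypothetical numbers: 1,000,000 requests in the window, 600 of them failed.
total, good = 1_000_000, 999_400
budget = error_budget(0.999, total)   # roughly 1000 allowable bad events
consumed = (total - good) / budget    # fraction of the budget already used
print(f"SLI={sli(good, total):.4f}, error rate={error_rate(good, total):.4f}")
print(f"budget={budget:.0f} bad events, consumed={consumed:.0%}")
```

With these made-up counts the service is still within its 99.9 % SLO, but has already used more than half of the window's error budget.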
Burn Rate Thresholds
$$\text{Burn Rate} = \text{Period} \times (\text{Error Budget}) \div \text{Alert Window Size}$$
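One way to read the burn rate formula is as a ratio of actual to even spending. The symbols $P$, $w$, and $f$ below are shorthand introduced for this sketch only: $P$ is the SLO period, $w$ is the alert window size in the same time unit, and $f$ is the fraction of the error budget consumed within that window. Spending the budget evenly would use only $w / P$ of it during the window, so:

$$\text{Burn Rate} = \frac{f}{w / P} = \frac{P \times f}{w}$$

Under this reading, a burn rate of 1 corresponds to exhausting the error budget exactly at the end of the SLO period, and any value above 1 exhausts it sooner.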
For example, consuming 2 % of a 30-day error budget in one hour corresponds to a burn rate of 14.4, where 720 is the number of hours in 30 days:
$$\text{Burn Rate} = 720 \times 0.02 \div 1 = 14.4$$
Consuming 5 % of a 30-day error budget in 6 hours corresponds to a burn rate of 6:
$$\text{Burn Rate} = 720 \times 0.05 \div 6 = 6$$
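The two worked examples can be checked with a short Python sketch. The function and constant names are hypothetical, and the budget percentages are written as fractions (2 % as 0.02, 5 % as 0.05). The `hours_to_exhaustion` helper is an added illustration of what a given burn rate implies, assuming the rate stays constant.

```python
PERIOD_HOURS = 30 * 24  # a 30-day SLO period expressed in hours (720)


def burn_rate(period_hours: float, budget_fraction: float, window_hours: float) -> float:
    """Burn rate = period * (fraction of budget consumed) / alert window size."""
    if period_hours <= 0 or window_hours <= 0:
        raise ValueError("period_hours and window_hours must be positive")
    return period_hours * budget_fraction / window_hours


def hours_to_exhaustion(period_hours: float, rate: float) -> float:
    """Hours until the whole budget is gone if the burn rate stays constant."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return period_hours / rate


fast = burn_rate(PERIOD_HOURS, 0.02, 1)  # 2 % of the budget in 1 hour
slow = burn_rate(PERIOD_HOURS, 0.05, 6)  # 5 % of the budget in 6 hours
print(f"fast burn rate: {fast:.1f}")     # prints "fast burn rate: 14.4"
print(f"slow burn rate: {slow:.1f}")     # prints "slow burn rate: 6.0"
print(f"budget gone in about {hours_to_exhaustion(PERIOD_HOURS, fast):.0f} hours at the fast rate")
```

At a sustained burn rate of 14.4, the full 30-day budget would be consumed in roughly 50 hours, which is why higher thresholds are typically paired with shorter alert windows.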