Introduction to SLA

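SLA (Service-Level Agreement), put simply, measures network performance parameters; when a measurement crosses a threshold, it can trigger actions in combination with track or EEM. For example:
1. Monitor the reachability of the next hop, and invalidate a static route if it becomes unreachable
2. Monitor a neighbor's interface address, and shut down the port if it is unreachable three times in a row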

An SLA application example

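When a customer's line quality is poor and cannot be improved, we need a way to reset the port directly once line quality reaches a certain threshold, using the link reset to improve things.
So how do we meet this requirement? This is where SLA comes in. How should it be deployed?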

Analyzing the first approach

ip sla 2
 icmp-echo 1.1.1.2
 timeout 3000
 frequency 10
<---send one probe every 10 s
ip sla schedule 2 life forever start-time now
<---start the SLA immediately and let it run forever
!
track 1 rtr 2
<---configure the track object; a track has two states, up and down
!
event manager session cli username "username"
event manager applet test_track_1
<---EEM configuration
 event track 1 state down
<---if track 1 goes down, run the actions below
 action 1.0 cli command "enable"
 action 2.0 cli command "conf t"
 action 3.0 cli command "int g4/3"
 action 3.1 cli command "shut"
 action 3.2 cli command "no shut"
 action 4.0 cli command "end"

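With the configuration above, the device sends one ping every 10 seconds. When a probe times out, track 1 goes down, which triggers shut/no shut on the port. Does this achieve the goal? In a real network, occasionally losing a single packet is normal and unavoidable, and the port is working properly when it happens. With the configuration above, however, the port is still reset, and service is affected.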
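
To watch this behavior on a live device, a few show commands help. The sketch below assumes an IOS release that uses the ip sla syntax shown above (older releases use show ip sla monitor statistics instead); output formats vary by release.

! latest return code and success/failure counters of SLA 2
show ip sla statistics 2
! current state of track 1 and its number of state changes
show track 1
! confirms that applet test_track_1 is registered against track 1
show event manager policy registered

If the change counter in show track increases after a single lost probe, the track has reacted to one isolated timeout, which is exactly the problem described here.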

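To avoid this kind of unnecessary service impact, the configuration needs to be tuned so that track 1 goes down only on a real network failure.
The most common criterion for a network failure is consecutive timeouts!
So the following commands are added (only the added commands are shown; the one modified command is the track statement, which now references SLA 4):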

 

ip sla 4
 icmp-echo 1.1.1.2
 timeout 1000
 frequency 10
ip sla schedule 4 life 5 start-time pending
<---not started immediately, and the lifetime is only 5 seconds
!
ip sla reaction-configuration 2 react timeout threshold-type consecutive 3 action-type trapandtrigger
<---after 3 consecutive timeouts, trigger another SLA and send an SNMP trap
ip sla reaction-trigger 2 4
<---three consecutive timeouts on SLA 2 trigger SLA 4
!
track 1 rtr 4
<---track SLA 4 rather than SLA 2. Why?

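If track 1 rtr 2 were configured, track 1 would go down every time SLA 2 timed out, and EEM would reset the port on every down transition, which is just as unreasonable as before. So track 1 rtr 4 is needed: SLA 4 is pending and is triggered only after SLA 2 has timed out three times in a row (10*3 + 5 = 35 s).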
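
Whether the reaction and the trigger were accepted can be checked before relying on them. A minimal sketch, assuming the operation numbers 2 and 4 used above and an IOS release with the ip sla syntax:

! threshold type, consecutive count and action type configured for SLA 2
show ip sla reaction-configuration 2
! target operation of the trigger configured on SLA 2
show ip sla reaction-trigger 2
! schedule settings of SLA 4 (life, start time)
show ip sla configuration 4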
 

OK, so is the problem solved?
 

 

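Does the first approach analyzed in the previous article actually work?
Testing shows that it does avoid the problem of EEM firing on a single lost packet.
But there is a catch: it needs an additional SLA (777 here) in the pending state, which starts only after SLA 17 has lost three packets in a row.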

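There are two cases:

1. The line is already in service. When the commands above are configured, 17 cannot lose three packets in a row, so 777 never starts and track 17 stays down. As a result EEM is never triggered, no matter how many packets are lost later. (Think about why.)

Workaround: after configuring the commands above, shut the upstream or downstream port for 30 s (one probe every 10 s) so that 777 starts, then no shut it. Only then does track 17 go up, and only then can EEM be triggered when the leased line fails. So applying these commands on a line that is already in service always means interrupting the primary line for at least 30 s.

2. The line is not yet in service. In this case, wait at least 30 s after configuring the commands above before bringing up the MSTP line; otherwise the same problem occurs.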
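
The workaround for a line that is already in service can be sketched as the session below. It assumes router R1 and interface s1/0 from the configuration further down; the wait is done by hand and interrupts traffic on the link.

R1#conf t
R1(config)#int s1/0
R1(config-if)#shutdown
! wait at least 30 s (3 probes x 10 s) so that 17 times out three times and triggers 777
R1(config-if)#no shutdown
R1(config-if)#end
! check that track 17 has gone Up before relying on the applet
R1#show track 17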

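Analysis and explanation of the problem:
I won't explain the commands here. For their meaning, see the previous article, 《关于IP SLA及与EEM联动的探讨<2>》 (A discussion of IP SLA and its interaction with EEM <2>).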

Config:

 

ip sla monitor 17
 type echo protocol ipIcmpEcho 12.1.1.2
 timeout 3000
 frequency 10
!
ip sla monitor reaction-configuration 17 react timeout threshold-type consecutive 3 action-type trapAndTrigger
ip sla monitor reaction-trigger 17 777
ip sla monitor schedule 17 life forever start-time now
!
ip sla monitor 777
 type echo protocol ipIcmpEcho 12.1.1.2
 timeout 1000
 frequency 10
!
ip sla monitor schedule 777 life 5 start-time pending
track 17 rtr 777
!
event manager session cli username "username"
event manager applet test_track_17
 event track 17 state down
 action 1.0 cli command "enable"
 action 2.0 cli command "conf t"
 action 3.0 cli command "int s1/0"
 action 3.1 cli command "shut"
 action 3.2 cli command "no shut"
 action 4.0 cli command "end"
!
R1#sh debugging
Track debugging is on
Embedded Event Manager:
  Debug EEM action cli debugging is on
IP SLA Monitor:
  TRACE debugging for all operations is on
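
The debug state shown by sh debugging can be reproduced with roughly the following exec commands. This is a sketch for the ip sla monitor syntax used on R1; on releases with the newer syntax the SLA debug is debug ip sla trace.

! object tracking state changes
debug track
! CLI commands issued by EEM applets
debug event manager action cli
! per-operation trace of every SLA probe
debug ip sla monitor trace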

Debug Information:

1. After the initial configuration the track is in the down state:

 

R1(config)#track 17 rtr 777
*Feb 25 10:15:06.979: Track: 17 Adding rtr object
*Feb 25 10:15:06.979: Track: Initialise
*Feb 25 10:15:06.983: Track: 17 New rtr 777, state Down
*Feb 25 10:15:06.987: Track: Starting process

R1#sh track
Track 17
  Response Time Reporter 777 state
  State is Down
    1 change, last change 00:01:19
  Latest operation return code: Unknown
  Tracked by:
     applet test_track_17

2. Shut down the local or remote port to activate 777; track 17 is down

 

*Feb 25 10:16:25.107: IP SLA Monitor(777) Scheduler: Starting an operation
*Feb 25 10:16:25.107: IP SLA Monitor(777) echo operation: Sending an echo operation
*Feb 25 10:16:26.107: IP SLA Monitor(777) echo operation: Timeout
*Feb 25 10:16:26.107: IP SLA Monitor(777) Scheduler: Updating result
*Feb 25 10:16:26.777: IP SLA Monitor(777) Scheduler: Ageout

R1#sh track
Track 17
  Response Time Reporter 777 state
  State is Down
    1 change, last change 00:01:55
  Latest operation return code: Timeout
  Tracked by:
     applet test_track_17

3. no shut the port so that 777 is activated again and the track goes up

 

*Feb 25 10:17:42.159: IP SLA Monitor(777) Scheduler: Starting an operation
*Feb 25 10:17:42.159: IP SLA Monitor(777) echo operation: Sending an echo operation
*Feb 25 10:17:42.171: IP SLA Monitor(777) echo operation: RTT=12
*Feb 25 10:17:42.175: IP SLA Monitor(777) Scheduler: Updating result
*Feb 25 10:17:42.175: IP SLA Monitor(777) Scheduler: Ageout
*Feb 25 10:17:46.983: Track: 17 Change #2 rtr 777, state Down->Up

R1#sh track
Track 17
  Response Time Reporter 777 state
  State is Up
    2 changes, last change 00:09:16
  Latest operation return code: OK
  Latest RTT (millisecs) 12
  Tracked by:
     applet test_track_17

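4. Shut down the local port to test whether the desired effect is achieved
Note: the timestamps below are not continuous with those above; the logs come from two separate test runs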

 

R1#config ter
Enter configuration commands, one per line.  End with CNTL/Z.
R1(config)#int s1/0
R1(config-if)#
R1(config-if)#shutdown
R1(config-if)#end
R1#
*Feb 25 09:47:27.911: %LINK-5-CHANGED: Interface Serial1/0, changed state to administratively down
*Feb 25 09:47:27.915: %ENTITY_ALARM-6-INFO: ASSERT INFO Se1/0 Physical Port Administrative State Down
*Feb 25 09:47:28.911: %LINEPROTO-5-UPDOWN: Line protocol on Interface Serial1/0, changed state to down

*Feb 25 09:47:31.775: IP SLA Monitor(17) Scheduler: Starting an operation
*Feb 25 09:47:31.775: IP SLA Monitor(17) echo operation: Sending an echo operation
*Feb 25 09:47:34.779: IP SLA Monitor(17) echo operation: Timeout
*Feb 25 09:47:34.779: IP SLA Monitor(17) Scheduler: Updating result

*Feb 25 09:47:41.775: IP SLA Monitor(17) Scheduler: Starting an operation
*Feb 25 09:47:41.779: IP SLA Monitor(17) echo operation: Sending an echo operation
*Feb 25 09:47:44.779: IP SLA Monitor(17) echo operation: Timeout
*Feb 25 09:47:44.779: IP SLA Monitor(17) Scheduler: Updating result

*Feb 25 09:47:51.775: IP SLA Monitor(17) Scheduler: Starting an operation
*Feb 25 09:47:51.775: IP SLA Monitor(17) echo operation: Sending an echo operation
*Feb 25 09:47:54.779: IP SLA Monitor(17) echo operation: Timeout
*Feb 25 09:47:54.779: IP SLA Monitor(17) Scheduler: Updating result

*Feb 25 09:47:54.827: IP SLA Monitor(777) Scheduler: Starting an operation
*Feb 25 09:47:54.827: IP SLA Monitor(777) echo operation: Sending an echo operation
*Feb 25 09:47:55.831: IP SLA Monitor(777) echo operation: Timeout
*Feb 25 09:47:55.831: IP SLA Monitor(777) Scheduler: Updating result
*Feb 25 09:47:55.835: IP SLA Monitor(777) Scheduler: Ageout

*Feb 25 09:47:56.231: Track: 17 Change #5 rtr 777, state Up->Down
*Feb 25 09:47:56.251: fh_schedule_callback: EEM callback policy EEM Policy Director has been scheduled to run

*Feb 25 09:47:56.275: %HA_EM-6-LOG: test_track_17 : DEBUG(cli_lib) : : CTL : cli_open called.
*Feb 25 09:47:56.291: %HA_EM-6-LOG: test_track_17 : DEBUG(cli_lib) : : OUT :
*Feb 25 09:47:56.291: %HA_EM-6-LOG: test_track_17 : DEBUG(cli_lib) : : OUT : R1>
*Feb 25 09:47:56.295: %HA_EM-6-LOG:
R1#test_track_17 : DEBUG(cli_lib) : : IN  : >enable
*Feb 25 09:47:56.311: %HA_EM-6-LOG: test_track_17 : DEBUG(cli_lib) : : OUT :
*Feb 25 09:47:56.311: %HA_EM-6-LOG: test_track_17 : DEBUG(cli_lib) : : OUT : R1#
*Feb 25 09:47:56.311: %HA_EM-6-LOG: test_track_17 : DEBUG(cli_lib) : : IN  : #conf t

*Feb 25 09:47:56.327: %HA_EM-6-LOG: test_track_17 : DEBUG(cli_lib) : : OUT :
*Feb 25 09:47:56.331: %HA_EM-6-LOG: test_track_17 : DEBUG(cli_lib) : : OUT :
Enter configuration commands, one per line.  End with CNTL/Z.
*Feb 25 09:47:56.335: %HA_EM-6-LOG: test_track_17 : DEBUG(cli_lib) : : OUT : R1(config)#
*Feb 25 09:47:56.339: %HA_EM-6-LOG: test_track_17 : DEBUG(cli_lib) : : IN  : #int s1/0

*Feb 25 09:47:56.355: %HA_EM-6-LOG: test_track_17 : DEBUG(cli_lib) : : OUT :
*Feb 25 09:47:56.355: %HA_EM-6-LOG: test_track_17 : DEBUG(cli_lib) : : OUT : R1(config-if)#
*Feb 25 09:47:56.355: %HA_EM-6-LOG: test_track_17 : DEBUG(cli_lib) : : IN  : #shut

*Feb 25 09:47:56.371: %HA_EM-6-LOG: test_track_17 : DEBUG(cli_lib) : : OUT :
*Feb 25 09:47:56.375: %HA_EM-6-LOG: test_track_17 : DEBUG(cli_lib) : : OUT : R1(config-if)#
*Feb 25 09:47:56.379: %HA_EM-6-LOG: test_track_17 : DEBUG(cli_lib) : : IN  : #no shut

*Feb 25 09:47:56.411: %HA_EM-6-LOG: test_track_17 : DEBUG(cli_lib) : : OUT :
*Feb 25 09:47:56.411: %HA_EM-6-LOG: test_track_17 : DEBUG(cli_lib) : : OUT : R1(config-if)#
*Feb 25 09:47:56.415: %HA_EM-6-LOG: test_track_17 : DEBUG(cli_lib) : : IN  : #end

*Feb 25 09:47:56.435: %SYS-5-CONFIG_I: Configured from console by name on vty1

*Feb 25 09:47:56.447: %HA_EM-6-LOG: test_track_17 : DEBUG(cli_lib) : : OUT :
*Feb 25 09:47:56.447: %HA_EM-6-LOG: test_track_17 : DEBUG(cli_lib) : : OUT : R1#
*Feb 25 09:47:56.451: %HA_EM-6-LOG: test_track_17 : DEBUG(cli_lib) : : IN  : #exit
*Feb 25 09:47:56.455: %HA_EM-6-LOG: test_track_17 : DEBUG(cli_lib) : : CTL : cli_close called.

*Feb 25 09:47:58.387: %LINK-3-UPDOWN: Interface Serial1/0, changed state to up
*Feb 25 09:47:58.391: %ENTITY_ALARM-6-INFO: CLEAR INFO Se1/0 Physical Port Administrative State Down
*Feb 25 09:47:59.399: %LINEPROTO-5-UPDOWN: Line protocol on Interface Serial1/0, changed state to up

R1#sh track
R1#sh track 17
Track 17
  Response Time Reporter 777 state
  State is Up
    6 changes, last change 00:15:41
  Latest operation return code: OK
  Latest RTT (millisecs) 3
  Tracked by:
     applet test_track_17

 

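How can this problem be avoided? In other words, how can the requirement be met cleanly?
In fact, track itself offers a very convenient way to do what the customer wants, as shown below.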

Config:

 

ip sla monitor 17
 type echo protocol ipIcmpEcho 12.1.1.2
 timeout 3000
 frequency 10
ip sla monitor schedule 17 life forever start-time now
!
track 1 rtr 17
event manager session cli username "username"
event manager applet test_track_17
 event track 1 state down
 action 1.0 cli command "enable"
 action 2.0 cli command "conf t"
 action 3.0 cli command "int s1/0"
 action 3.1 cli command "shut"
 action 3.2 cli command "no shut"
 action 4.0 cli command "end"
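
The sh track output in test 3 further down reports Delay down 30 secs, which is what the delay sub-command of a track object produces; the listing above does not show that line. A minimal sketch, assuming the track object carried a 30-second down delay:

track 1 rtr 17
 ! assumption: not present in the listing above, inferred from the show output
 delay down 30

With a down delay, the track goes Down only if the SLA keeps failing for the whole 30 s, so with a probe every 10 s a single lost packet no longer triggers the applet.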

Debug information:

1. default config and no action

 

R1#debug ip sla monitor trace
IP SLA Monitor TRACE debugging for all operations is on
R1#
R1#
R1#debug track
*Feb 25 11:06:04.091: IP SLA Monitor(17) Scheduler: Starting an operation
*Feb 25 11:06:04.091: IP SLA Monitor(17) echo operation: Sending an echo operation
*Feb 25 11:06:04.135: Track: 1 Adding rtr object
*Feb 25 11:06:04.139: Track: Initialise
*Feb 25 11:06:04.143: Track:
  Latest RTT (millisecs) 44
  Tracked by:
     applet test_track_17

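2. The link goes down and does not recover within 30 s
Note: the device used above ran into a problem, so a different device is used from here on. The configuration is essentially the same, except that ip sla monitor is replaced by ip sla.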

 

C7304_CE2#config ter
Enter configuration commands, one per line. End with CNTL/Z.
C7304_CE2(config)#int g0/0
C7304_CE2(config-if)#
  Latest RTT (millisecs) 1
  Tracked by:
    EEM applet test_track_17

3. The link goes down but recovers within 30 s

 

C7304_CE2#config ter
Enter configuration commands, one per line. End with CNTL/Z.
C7304_CE2(config)#int g0/0
C7304_CE2(config-if)#
C7304_CE2(config-if)#
02:51:49: IP SLAs (17) Scheduler: saaSchedulerEventWakeup
02:51:49: IP SLAs (17) Scheduler: Starting an operation
02:51:49: IP SLAs (17) echo operation: Sending an echo operation
02:51:49: IP SLAs (17) echo operation: RTT=1
02:51:49: IP SLAs (17) Scheduler: Updating result
C7304_CE2(config-if)#
C7304_CE2(config-if)#
C7304_CE2(config-if)#
C7304_CE2(config-if)#
C7304_CE2(config-if)#do sh track
Track 1
  Response Time Reporter 17 state
  State is Up
    7 changes, last change 00:00:53
  Delay down 30 secs
  Latest operation return code: OK
  Latest RTT (millisecs) 1
  Tracked by:
    EEM applet test_track_17
C7304_CE2(config-if)#
C7304_CE2(config-if)#
C7304_CE2(config-if)#end
C7304_CE2#un all
All possible debugging has been turned off
C7304_CE2#

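OK, problem solved. SLA and EEM can actually do a lot more. Two examples:
1. When floating static routes are used for backup across a layer-2 network, the routes may fail to switch over
7206-1(L3)---(L2)3500(L2)---(L2)3500(L2)---(L3)7206-2

There are two links between the 7206s, so two static routes are pointed at 7206-2.
One of them is a floating route, and the next hop of both routes is a layer-3 interface address on 7206-2.
The intent is that if 7206-2 has a problem, 7206-1 automatically switches to the floating static route.
In practice this does not work. 7206-1 and 7206-2 sit in the same subnet across the layer-2 switches, so as long as the port on 7206-1 does not go down, the floating static route is never activated.
In this case only the icmp echo of SLA achieves the goal: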

 

track 22 rtr 11
ip route 0.0.0.0 0.0.0.0 1.1.1.1 track 22
ip sla 11
 icmp-echo 1.1.1.1
 timeout 1000
 frequency 5
ip sla schedule 11 life forever start-time now
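
The listing above only shows the tracked primary route. For the failover to have somewhere to go, a floating static route with a higher administrative distance is also needed. A minimal sketch, in which the backup next hop 2.2.2.1 and the distance 250 are hypothetical values, not taken from the original setup:

! primary default route, installed only while track 22 is up
ip route 0.0.0.0 0.0.0.0 1.1.1.1 track 22
! floating backup (hypothetical next hop and distance), used once the primary is withdrawn
ip route 0.0.0.0 0.0.0.0 2.2.2.1 250

When the ICMP echo to 1.1.1.1 fails, track 22 goes down, the primary route is removed from the routing table, and the backup with the higher distance takes over.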

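2. Collecting information on a schedule
If a customer wants custom messages reported when CPU utilization exceeds 70%, or wants information about a port collected every 30 seconds, EEM is the tool for the job. It can meet such requirements easily; the configuration is omitted here.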
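
As a rough illustration of the two cases just mentioned, the applets below are a sketch rather than a tested configuration. The applet names, the syslog text, the interface g0/0 and the SNMP instance index .1 at the end of the OID (cpmCPUTotal5sec from CISCO-PROCESS-MIB; the index and the preferred CPU object differ by platform) are all assumptions.

! assumption: report a custom message when 5-second CPU utilization goes above 70%
event manager applet cpu_high
 event snmp oid 1.3.6.1.4.1.9.9.109.1.1.1.1.3.1 get-type exact entry-op gt entry-val 70 poll-interval 10
 action 1.0 syslog msg "CPU utilization is above 70 percent"
!
! assumption: collect the rate counters of g0/0 every 30 seconds and log them
event manager applet collect_if
 event timer watchdog time 30
 action 1.0 cli command "enable"
 action 2.0 cli command "show interface g0/0 | include rate"
 action 3.0 syslog msg "$_cli_result"

The watchdog timer re-arms itself after each run, so the second applet repeats every 30 seconds without further configuration.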