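5.7 Simulating Ping Between the External Router and the Servers
This section briefly analyzes how Ping works between the external router and the servers in the lab.
All routes have already been configured: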
R1:
#
#Configure routes to the servers; the next hop points to BL1
#
ip route-static 10.10.10.0 255.255.255.0 182.1.1.1
ip route-static 172.16.1.0 255.255.255.0 182.1.1.1
ip route-static 192.168.0.0 255.255.0.0 182.1.1.1
#
BL1:
Direct routes are imported into BGP:
#
bgp 100
router-id 4.4.4.4
peer 1.1.1.1 as-number 100
peer 1.1.1.1 connect-interface LoopBack0
#
ipv4-family unicast
peer 1.1.1.1 enable
#
ipv4-family vpn-instance vpn1
import-route direct <----------------------- imports the directly connected subnets into BGP
advertise l2vpn evpn
#
l2vpn-family evpn
policy vpn-target
peer 1.1.1.1 enable
peer 1.1.1.1 advertise irb
#
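Through BGP EVPN Type 5 prefix-route updates, BL1, Leaf1, and Leaf2 learn each other's routes within the same VPN (same L3VNI). At this point the server host routes have not been learned yet.
5.7.1 Ping Between R1 and Serv2
Ping between R1 and Serv2 is straightforward: it only requires route lookups.
(1) R1 Ping Serv2
1. R1 pings 10.10.10.10 (source address 182.1.1.2). The first ICMP request (icmp1) is sent to BL1 according to the static route.
2. BL1 looks up the vpn1 routing table. The route covering 10.10.10.10 has next hop 20.20.20.20, so the packet must be VXLAN-encapsulated: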
[BL1]dis ip routing-table vpn-instance vpn1 10.10.10.10
Proto: Protocol Pre: Preference
Route Flags: R - relay, D - download to fib, T - to vpn-instance, B - black hole route
------------------------------------------------------------------------------
Routing Table : vpn1
Summary Count : 1
Destination/Mask Proto Pre Cost Flags NextHop Interface
10.10.10.0/24 IBGP 255 0 RD 20.20.20.20 VXLAN
[BL1]
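BL1 therefore encapsulates the packet in VXLAN with L3VNI 100 and sends it to Leaf1.
3. Leaf1 looks up the vpn1 routing table (selected by L3VNI 100) and finds that the packet must leave through the gateway interface 10.10.10.1. If 10.10.10.1 does not yet have the MAC address of 10.10.10.10, it sends an ARP request (arp1) and receives the ARP reply. Meanwhile icmp1 times out.
4. R1 sends the next ICMP request (icmp2), which reaches 10.10.10.10 without problems.
5. The ICMP reply from 10.10.10.10 arrives at Leaf1. Leaf1 looks up the vpn1 routing table, finds 182.1.1.0/24 with next hop 40.40.40.40, encapsulates the packet in VXLAN, and sends it to BL1.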
[Leaf1]dis ip routing-table vpn-instance vpn1 182.1.1.0
Proto: Protocol Pre: Preference
Route Flags: R - relay, D - download to fib, T - to vpn-instance, B - black hole route
------------------------------------------------------------------------------
Routing Table : vpn1
Summary Count : 1
Destination/Mask Proto Pre Cost Flags NextHop Interface
182.1.1.0/24 IBGP 255 0 RD 40.40.40.40 VXLAN
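6. BL1 looks up the vpn1 routing table again and delivers the reply to R1 (182.1.1.2) over the directly connected subnet 182.1.1.0/24.
The Ping now succeeds.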
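The loss of the first ICMP request in steps 3 and 4 can be illustrated with a toy model. This is only a sketch: the Gateway class, the fake_arp helper, and the MAC address are hypothetical, and the model assumes the gateway drops a packet on an ARP miss instead of queueing it, which matches what the lab shows but is not taken from VRP documentation.

```python
class Gateway:
    """Toy gateway interface with an ARP cache (illustrative only)."""

    def __init__(self):
        self.arp_cache = {}  # maps IP address -> MAC address

    def forward(self, dst_ip, resolve):
        """Return True if the packet is forwarded, False if it is dropped."""
        if dst_ip not in self.arp_cache:
            # ARP miss: this packet is lost, but the address is resolved
            # so that later packets can be delivered.
            self.arp_cache[dst_ip] = resolve(dst_ip)
            return False
        return True


def fake_arp(ip):
    # Stand-in for the ARP request/reply exchange; the MAC is a made-up value.
    return "00:00:5e:00:53:01"


gw = Gateway()
# Three consecutive ICMP requests towards Serv2.
results = [gw.forward("10.10.10.10", fake_arp) for _ in range(3)]
print(results)  # [False, True, True]
```

Only the first request is lost; once the ARP entry exists, every later request is forwarded.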
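If Ping fails at first, an eNSP bug may be the cause; removing and re-creating the bridge domain (undo bridge-domain / bridge-domain) may help. The capture also shows the gateway interface 10.10.10.1 sending an ARP Announcement, which is worth a look: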
Frame 49: 70 bytes on wire (560 bits), 70 bytes captured (560 bits) on interface -, id 0
Ethernet II, Src: HuaweiTe_da:58:76 (70:7b:e8:da:58:76), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 20
000. .... .... .... = Priority: Best Effort (default) (0)
...0 .... .... .... = DEI: Ineligible
.... 0000 0001 0100 = ID: 20
Type: ARP (0x0806)
Padding: 0000000000000000000000000000
Trailer: 00000000000000000000
Address Resolution Protocol (ARP Announcement)
Hardware type: Ethernet (1)
Protocol type: IPv4 (0x0800)
Hardware size: 6
Protocol size: 4
Opcode: request (1)
[Is gratuitous: True]
[Is announcement: True]
Sender MAC address: HuaweiTe_da:58:76 (70:7b:e8:da:58:76)
Sender IP address: 10.10.10.1
Target MAC address: 00:00:00_00:00:00 (00:00:00:00:00:00)
Target IP address: 10.10.10.1
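The two flags that Wireshark shows in the capture above can be reproduced with a few lines of Python. This is a minimal sketch based on the RFC 5227 description of an ARP Announcement (an ARP request whose sender and target IP addresses are both the announced address); the ArpPacket class and the helper names are hypothetical, and the sketch does not claim to mirror Wireshark's own logic.

```python
from dataclasses import dataclass


@dataclass
class ArpPacket:
    opcode: int        # 1 = request, 2 = reply
    sender_mac: str
    sender_ip: str
    target_mac: str
    target_ip: str


def is_gratuitous(pkt):
    # A gratuitous ARP carries the same address as sender and target IP.
    return pkt.sender_ip == pkt.target_ip


def is_announcement(pkt):
    # An announcement is a gratuitous ARP sent as a request.
    return pkt.opcode == 1 and is_gratuitous(pkt)


# Field values copied from the capture above.
announcement = ArpPacket(1, "70:7b:e8:da:58:76", "10.10.10.1",
                         "00:00:00:00:00:00", "10.10.10.1")
# An ordinary request from the gateway for Serv2's MAC address.
ordinary = ArpPacket(1, "70:7b:e8:da:58:76", "10.10.10.1",
                     "00:00:00:00:00:00", "10.10.10.10")

print(is_announcement(announcement), is_announcement(ordinary))
```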
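(2) Serv2 Ping R1
Pinging 182.1.1.2 from Serv2 works in a similar way. Since all routes are in place, the Ping succeeds. The only difference is that if Serv2 does not have the MAC address of the gateway 10.10.10.1, it first sends an ARP request to obtain it.
5.7.2 Ping Between R1 and Serv1
Note that both Leaf1 and Leaf2 advertise the VPN route 172.16.1.0/24. When the corresponding VPN instance on BL1 receives these routes, it selects one of them as the best route.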
[BL1]dis bgp evpn all routing-table prefix-route 0:172.16.1.0:24
BGP local router ID : 4.4.4.4
Local AS number : 100
Total routes of Route Distinguisher(200:1): 1
BGP routing table entry information of 0:172.16.1.0:24:
Label information (Received/Applied): 100/NULL
From: 1.1.1.1 (1.1.1.1)
Route Duration: 0d00h23m56s
Relay IP Nexthop: 192.168.14.1
Relay Tunnel Out-Interface: VXLAN
Original nexthop: 20.20.20.20
Qos information : 0x0
Ext-Community: RT <200 : 10>, Tunnel Type <VxLan>, Router's MAC <707b-e8da-5876>
AS-path Nil, origin incomplete, MED 0, localpref 100, pref-val 0, valid, internal, best, select, pre 255, IGP cost 2
Originator: 2.2.2.2
Cluster list: 1.1.1.1
Route Type: 5 (Ip Prefix Route)
Ethernet Tag ID: 0, IP Prefix/Len: 172.16.1.0/24, ESI: 0000.0000.0000.0000.0000, GW IP Address: 0.0.0.0
Not advertised to any peer yet
Total routes of Route Distinguisher(300:1): 1 Note: advertised by vpn1 on Leaf2
BGP routing table entry information of 0:172.16.1.0:24:
Label information (Received/Applied): 100/NULL Note: L3VNI
From: 1.1.1.1 (1.1.1.1)
Route Duration: 0d00h23m56s
Relay IP Nexthop: 192.168.14.1
Relay Tunnel Out-Interface: VXLAN
Original nexthop: 30.30.30.30
Qos information : 0x0
Ext-Community: RT <200 : 10>, Tunnel Type <VxLan>, Router's MAC <707b-e82d-5cd3>
AS-path Nil, origin incomplete, MED 0, localpref 100, pref-val 0, valid, internal, best, select, pre 255, IGP cost 2
Originator: 3.3.3.3
Cluster list: 1.1.1.1
Route Type: 5 (Ip Prefix Route)
Ethernet Tag ID: 0, IP Prefix/Len: 172.16.1.0/24, ESI: 0000.0000.0000.0000.0000, GW IP Address: 0.0.0.0
Not advertised to any peer yet
EVPN-Instance __RD_1_400_1__:
Number of Ip Prefix Routes: 2
BGP routing table entry information of 0:172.16.1.0:24:
Route Distinguisher: 200:1
Remote-Cross route
Label information (Received/Applied): 100/NULL
From: 1.1.1.1 (1.1.1.1)
Route Duration: 0d00h23m57s
Relay Tunnel Out-Interface: VXLAN
Original nexthop: 20.20.20.20
Qos information : 0x0
Ext-Community: RT <200 : 10>, Tunnel Type <VxLan>, Router's MAC <707b-e8da-5876>
AS-path Nil, origin incomplete, MED 0, localpref 100, pref-val 0, valid, internal, best, select, pre 255
Originator: 2.2.2.2
Cluster list: 1.1.1.1
Route Type: 5 (Ip Prefix Route)
Ethernet Tag ID: 0, IP Prefix/Len: 172.16.1.0/24, ESI: 0000.0000.0000.0000.0000, GW IP Address: 0.0.0.0
Not advertised to any peer yet
BGP routing table entry information of 0:172.16.1.0:24:
Route Distinguisher: 300:1
Remote-Cross route
Label information (Received/Applied): 100/NULL
From: 1.1.1.1 (1.1.1.1)
Route Duration: 0d00h23m57s
Relay Tunnel Out-Interface: VXLAN
Original nexthop: 30.30.30.30
Qos information : 0x0
Ext-Community: RT <200 : 10>, Tunnel Type <VxLan>, Router's MAC <707b-e82d-5cd3>
AS-path Nil, origin incomplete, MED 0, localpref 100, pref-val 0, valid, internal, pre 255, not preferred for router ID
Originator: 3.3.3.3
Cluster list: 1.1.1.1
Route Type: 5 (Ip Prefix Route)
Ethernet Tag ID: 0, IP Prefix/Len: 172.16.1.0/24, ESI: 0000.0000.0000.0000.0000, GW IP Address: 0.0.0.0
Not advertised to any peer yet
[BL1]
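The reason "not preferred for router ID" in the output above can be sketched as a tie-break. The code below is a heavily simplified model, not the full VRP route selection procedure: it compares only local preference, AS-path length, and MED, and then falls back to the lowest Originator ID, which stands in for the router ID on reflected routes. The Path class and best_path function are hypothetical names.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Path:
    rd: str
    nexthop: str
    local_pref: int
    as_path_len: int
    med: int
    originator_id: str


def best_path(paths):
    """Pick the best path using a simplified subset of BGP selection rules."""
    def key(p):
        return (
            -p.local_pref,                                   # higher is better
            p.as_path_len,                                   # shorter is better
            p.med,                                           # lower is better
            int(ipaddress.IPv4Address(p.originator_id)),     # lower ID wins the tie
        )
    return min(paths, key=key)


# Attribute values taken from the two cross routes shown above.
from_leaf1 = Path("200:1", "20.20.20.20", 100, 0, 0, "2.2.2.2")
from_leaf2 = Path("300:1", "30.30.30.30", 100, 0, 0, "3.3.3.3")

best = best_path([from_leaf2, from_leaf1])
print(best.rd, best.nexthop)  # 200:1 20.20.20.20
```

All compared attributes are equal, so the path originated by 2.2.2.2 (Leaf1) wins over the one originated by 3.3.3.3 (Leaf2).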
This best route is installed into the routing table of the L3VPN instance vpn1:
[BL1]dis ip routing-table vpn-instance vpn1
Proto: Protocol Pre: Preference
Route Flags: R - relay, D - download to fib, T - to vpn-instance, B - black hole route
------------------------------------------------------------------------------
Routing Table : vpn1
Destinations : 8 Routes : 8
Destination/Mask Proto Pre Cost Flags NextHop Interface
10.10.10.0/24 IBGP 255 0 RD 20.20.20.20 VXLAN
10.10.10.1/32 IBGP 255 0 RD 20.20.20.20 VXLAN
172.16.1.0/24 IBGP 255 0 RD 20.20.20.20 VXLAN
172.16.1.1/32 IBGP 255 0 RD 20.20.20.20 VXLAN
182.1.1.0/24 Direct 0 0 D 182.1.1.1 GE1/0/1
182.1.1.1/32 Direct 0 0 D 127.0.0.1 GE1/0/1
182.1.1.255/32 Direct 0 0 D 127.0.0.1 GE1/0/1
255.255.255.255/32 Direct 0 0 D 127.0.0.1 InLoopBack0
[BL1]
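Because the servers in the eNSP lab are silent hosts, the host routes of servers behind the distributed gateways do not appear in the routing table yet. They are generated later, triggered by ARP.
(1) R1 Ping Serv1 (182.1.1.2 --> 172.16.1.10)
R1 pings Serv1. According to the vpn1 routing table on BL1, the next hop for 172.16.1.0/24 is 20.20.20.20 (Leaf1), so the ICMP packet reaches Leaf1. Leaf1 looks up its VPN routing table and decides to send the packet out through the distributed gateway 172.16.1.1. At this point 172.16.1.1 does not have the MAC address of 172.16.1.10, so it sends an ARP request and then receives the ARP reply. The reply triggers Leaf1 to advertise a BGP EVPN Type 2 IRB route, and the vpn1 routing tables on BL1 and Leaf2 both install the host route 172.16.1.10/32. The first ICMP request times out; all later ones succeed.
Packet capture on G1/0/1 of Leaf1:
The VPN routing table on BL1 now contains the route 172.16.1.10/32.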
[BL1]dis ip routing-table vpn-instance vpn1
Proto: Protocol Pre: Preference
Route Flags: R - relay, D - download to fib, T - to vpn-instance, B - black hole route
------------------------------------------------------------------------------
Routing Table : vpn1
Destinations : 9 Routes : 9
Destination/Mask Proto Pre Cost Flags NextHop Interface
10.10.10.0/24 IBGP 255 0 RD 20.20.20.20 VXLAN
10.10.10.1/32 IBGP 255 0 RD 20.20.20.20 VXLAN
172.16.1.0/24 IBGP 255 0 RD 20.20.20.20 VXLAN
172.16.1.1/32 IBGP 255 0 RD 20.20.20.20 VXLAN
172.16.1.10/32 IBGP 255 0 RD 20.20.20.20 VXLAN
182.1.1.0/24 Direct 0 0 D 182.1.1.1 GE1/0/1
182.1.1.1/32 Direct 0 0 D 127.0.0.1 GE1/0/1
182.1.1.255/32 Direct 0 0 D 127.0.0.1 GE1/0/1
255.255.255.255/32 Direct 0 0 D 127.0.0.1 InLoopBack0
[BL1]
The route also appears on Leaf2:
[Leaf2]dis ip routing-table vpn-instance vpn1
Proto: Protocol Pre: Preference
Route Flags: R - relay, D - download to fib, T - to vpn-instance, B - black hole route
------------------------------------------------------------------------------
Routing Table : vpn1
Destinations : 9 Routes : 9
Destination/Mask Proto Pre Cost Flags NextHop Interface
10.10.10.0/24 IBGP 255 0 RD 20.20.20.20 VXLAN
10.10.10.1/32 IBGP 255 0 RD 20.20.20.20 VXLAN
172.16.1.0/24 Direct 0 0 D 172.16.1.1 Vbdif300
172.16.1.1/32 Direct 0 0 D 127.0.0.1 Vbdif300
172.16.1.10/32 IBGP 255 0 RD 20.20.20.20 VXLAN
172.16.1.255/32 Direct 0 0 D 127.0.0.1 Vbdif300
182.1.1.0/24 IBGP 255 0 RD 40.40.40.40 VXLAN
182.1.1.1/32 IBGP 255 0 RD 40.40.40.40 VXLAN
255.255.255.255/32 Direct 0 0 D 127.0.0.1 InLoopBack0
[Leaf2]
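The BGP EVPN routing table on BL1 shows that the route was advertised by Leaf1 (RD 20:1) and imported, through the RT mechanism, into the internal EVPN instance that corresponds to the L3VPN: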
[BL1]dis bgp evpn all routing-table mac-route
Local AS number : 100
BGP Local router ID is 4.4.4.4
Status codes: * - valid, > - best, d - damped, x - best external, a - add path,
h - history, i - internal, s - suppressed, S - Stale
Origin : i - IGP, e - EGP, ? - incomplete
EVPN address family:
Number of Mac Routes: 3
Route Distinguisher: 20:1
Network(EthTagId/MacAddrLen/MacAddr/IpAddrLen/IpAddr) NextHop
*>i 0:48:0001-0001-0001:0:0.0.0.0 20.20.20.20
*>i 0:48:5489-98b3-20fb:32:172.16.1.10 20.20.20.20
Route Distinguisher: 30:1
Network(EthTagId/MacAddrLen/MacAddr/IpAddrLen/IpAddr) NextHop
*>i 0:48:0001-0001-0001:0:0.0.0.0 30.30.30.30
EVPN-Instance __RD_1_400_1__:
Number of Mac Routes: 1
Network(EthTagId/MacAddrLen/MacAddr/IpAddrLen/IpAddr) NextHop
*>i 0:48:5489-98b3-20fb:32:172.16.1.10 20.20.20.20
[BL1]
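The Network column above packs five fields into one string (EthTagId/MacAddrLen/MacAddr/IpAddrLen/IpAddr). The following sketch splits such a key into its fields, which makes it easy to tell a MAC-only route from one that also carries a host IP. The MacRoute type and both function names are hypothetical, and the parser assumes an IPv4 address in the last field.

```python
from typing import NamedTuple


class MacRoute(NamedTuple):
    eth_tag: int
    mac_len: int
    mac: str
    ip_len: int
    ip: str


def parse_mac_route(key):
    """Split 'EthTagId:MacAddrLen:MacAddr:IpAddrLen:IpAddr' into fields."""
    eth_tag, mac_len, mac, ip_len, ip = key.split(":", 4)
    return MacRoute(int(eth_tag), int(mac_len), mac, int(ip_len), ip)


def has_host_ip(route):
    # An IP address length of 0 means the route carries only a MAC address.
    return route.ip_len > 0


# Keys copied from the display output above.
mac_only = parse_mac_route("0:48:0001-0001-0001:0:0.0.0.0")
mac_ip = parse_mac_route("0:48:5489-98b3-20fb:32:172.16.1.10")

print(has_host_ip(mac_only), has_host_ip(mac_ip))
print(mac_ip.mac, mac_ip.ip)
```

In the output above, only the key whose IP address length is 32 appears in the EVPN instance tied to the L3VPN, which matches the host route 172.16.1.10/32 seen in the vpn1 routing table.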
Packet capture on G1/0/0 of Leaf1 (the interface facing the Spine):
BGP EVPN Type 2 IRB route update:
Frame 62: 181 bytes on wire (1448 bits), 181 bytes captured (1448 bits) on interface -, id 0
Ethernet II, Src: 38:7d:c8:03:01:00 (38:7d:c8:03:01:00), Dst: 38:7d:c8:04:01:01 (38:7d:c8:04:01:01)
Internet Protocol Version 4, Src: 2.2.2.2, Dst: 1.1.1.1
Transmission Control Protocol, Src Port: 179, Dst Port: 52675, Seq: 77, Ack: 77, Len: 127
Border Gateway Protocol - UPDATE Message
Marker: ffffffffffffffffffffffffffffffff
Length: 127
Type: UPDATE Message (2)
Withdrawn Routes Length: 0
Total Path Attribute Length: 104
Path attributes
Path Attribute - ORIGIN: INCOMPLETE
Path Attribute - AS_PATH: empty
Path Attribute - LOCAL_PREF: 100
Path Attribute - EXTENDED_COMMUNITIES
Flags: 0xc0, Optional, Transitive, Complete
Type Code: EXTENDED_COMMUNITIES (16)
Length: 32
Carried extended communities: (4 communities)
Route Target: 100:10 [Transitive 2-Octet AS-Specific]
Route Target: 200:10 [Transitive 2-Octet AS-Specific]
Encapsulation: VXLAN Encapsulation [Transitive Opaque]
EVPN Router MAC: Router MAC: 70:7b:e8:da:58:76 [Transitive EVPN]
Path Attribute - MP_REACH_NLRI
Flags: 0x90, Optional, Extended-Length, Non-transitive, Complete
Type Code: MP_REACH_NLRI (14)
Length: 51
Address family identifier (AFI): Layer-2 VPN (25)
Subsequent address family identifier (SAFI): EVPN (70)
Next hop: 14141414
IPv4 Address: 20.20.20.20
[Expert Info (Error/Malformed): Unknown Next Hop length (4 bytes)]
Number of Subnetwork points of attachment (SNPA): 0
Network Layer Reachability Information (NLRI)
EVPN NLRI: MAC Advertisement Route
Route Type: MAC Advertisement Route (2)
Length: 40
Route Distinguisher: 0000001400000001 (20:1)
ESI: 00:00:00:00:00:00:00:00:00:00
Ethernet Tag ID: 0
MAC Address Length: 48
MAC Address: HuaweiTe_b3:20:fb (54:89:98:b3:20:fb)
IP Address Length: 32
IPv4 address: 172.16.1.10
VNI: 8000
VNI: 100
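Two raw hex values in the capture above are easy to decode by hand: the Route Distinguisher 0000001400000001 and the next hop 14141414. The sketch below decodes both; decode_rd and decode_nexthop are hypothetical helper names, and the RD layout follows the three type encodings defined for BGP/MPLS VPN route distinguishers (a 2-byte type followed by a 6-byte value).

```python
import ipaddress
import struct


def decode_rd(hex_str):
    """Decode an 8-byte Route Distinguisher given as a hex string."""
    raw = bytes.fromhex(hex_str)
    (rd_type,) = struct.unpack("!H", raw[:2])
    if rd_type == 0:    # 2-byte AS number : 4-byte assigned number
        asn, number = struct.unpack("!HI", raw[2:])
        return f"{asn}:{number}"
    if rd_type == 1:    # IPv4 address : 2-byte assigned number
        ip, number = struct.unpack("!4sH", raw[2:])
        return f"{ipaddress.IPv4Address(ip)}:{number}"
    if rd_type == 2:    # 4-byte AS number : 2-byte assigned number
        asn, number = struct.unpack("!IH", raw[2:])
        return f"{asn}:{number}"
    raise ValueError(f"unknown RD type {rd_type}")


def decode_nexthop(hex_str):
    """Decode a 4-byte IPv4 next hop given as a hex string."""
    return str(ipaddress.IPv4Address(bytes.fromhex(hex_str)))


print(decode_rd("0000001400000001"))  # 20:1
print(decode_nexthop("14141414"))     # 20.20.20.20
```

The decoded values match the RD 20:1 and the next hop 20.20.20.20 (Leaf1) that Wireshark displays. The 4-byte next hop is a plain IPv4 address, so the "Unknown Next Hop length" expert note in the capture does not indicate a problem with the route itself.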
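Corresponding packet capture:
The first ICMP request times out. During this time 172.16.1.1 sends an ARP request for the MAC address of 172.16.1.10; after 172.16.1.1 receives the ARP reply, a BGP update is triggered (BGP EVPN Type 2 IRB route for 172.16.1.10/32).
All later requests succeed.
(2) Serv1 Ping R1 (172.16.1.10 --> 182.1.1.2)
When Ping is started from the server side, the process is similar. Before pinging, the server sends an ARP request for the MAC address of the gateway 172.16.1.1. Because 172.16.1.1 is a distributed gateway, this triggers a BGP EVPN Type 2 IRB route. The corresponding VPN instances on BL1 and Leaf2 then both learn the host route 172.16.1.10/32.
5.7.3 Ping Between R1 and Serv3
(1) R1 Ping Serv3 (182.1.1.2 --> 172.16.1.20)
The process is similar, but note that the first ICMP packet is sent to Leaf1, because the vpn1 routing table on BL1 lists 172.16.1.0/24 with next hop 20.20.20.20 (Leaf1).
After the ICMP packet reaches Leaf1, 172.16.1.1 sends an ARP request for the MAC address of 172.16.1.20. The ARP request is flooded to Leaf2 (according to the head-end peer-list) and finally reaches Serv3. Because 172.16.1.1 is a distributed gateway, the ARP reply is intercepted by the distributed gateway 172.16.1.1 configured on Leaf2 (in effect, Leaf2 snatches the reply before it can return to Leaf1). This also triggers Leaf2 to advertise a BGP EVPN Type 2 IRB route for 172.16.1.20/32. The first ICMP request times out.
BL1 (and Leaf1 as well) then learns the route 172.16.1.20/32 with next hop 30.30.30.30 (Leaf2). BL1 sends the following ICMP requests from R1 directly to Leaf2, and the Ping succeeds.
Packet capture (note L3VNI 100 and L2VNI 8000 in the traffic between the Leafs):
The VPN routing table on BL1 contains the host route 172.16.1.20/32: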
[BL1]dis ip routing-table vpn-instance vpn1
Proto: Protocol Pre: Preference
Route Flags: R - relay, D - download to fib, T - to vpn-instance, B - black hole route
------------------------------------------------------------------------------
Routing Table : vpn1
Destinations : 10 Routes : 10
Destination/Mask Proto Pre Cost Flags NextHop Interface
10.10.10.0/24 IBGP 255 0 RD 20.20.20.20 VXLAN
10.10.10.1/32 IBGP 255 0 RD 20.20.20.20 VXLAN
172.16.1.0/24 IBGP 255 0 RD 20.20.20.20 VXLAN
172.16.1.1/32 IBGP 255 0 RD 20.20.20.20 VXLAN
172.16.1.10/32 IBGP 255 0 RD 20.20.20.20 VXLAN
172.16.1.20/32 IBGP 255 0 RD 30.30.30.30 VXLAN
182.1.1.0/24 Direct 0 0 D 182.1.1.1 GE1/0/1
182.1.1.1/32 Direct 0 0 D 127.0.0.1 GE1/0/1
182.1.1.255/32 Direct 0 0 D 127.0.0.1 GE1/0/1
255.255.255.255/32 Direct 0 0 D 127.0.0.1 InLoopBack0
[BL1]
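Note that the table contains two host routes, 172.16.1.10/32 and 172.16.1.20/32. Traffic between the external network and the internal servers can therefore be sent to the Leaf that actually connects the server, which is one of the benefits of distributed gateways.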
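The effect of the two host routes can be reproduced with a small longest-prefix-match lookup. This is a sketch only: the lookup function and the dictionary-based table are hypothetical simplifications, and the prefixes and next hops are copied from the vpn1 routing table on BL1 shown above.

```python
import ipaddress


def lookup(table, dst):
    """Return the next hop of the longest prefix that contains dst, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(prefix), nexthop)
        for prefix, nexthop in table.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # The most specific (longest) prefix wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Before any host route is learned: only the subnet route via Leaf1.
before = {"172.16.1.0/24": "20.20.20.20"}

# After both Type 2 IRB routes have been received.
after = {
    "172.16.1.0/24": "20.20.20.20",
    "172.16.1.10/32": "20.20.20.20",
    "172.16.1.20/32": "30.30.30.30",
}

print(lookup(before, "172.16.1.20"))  # 20.20.20.20
print(lookup(after, "172.16.1.20"))   # 30.30.30.30
print(lookup(after, "172.16.1.10"))   # 20.20.20.20
```

Before the host route exists, traffic for 172.16.1.20 follows the /24 to Leaf1; afterwards the /32 steers it straight to Leaf2.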
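(2) Serv3 Ping R1 (172.16.1.20 --> 182.1.1.2)
The process is similar, but 172.16.1.20 first sends an ARP request for the MAC address of 172.16.1.1, which triggers Leaf2 to advertise the IRB route for 172.16.1.20.
As a result, no data is sent to Leaf1 during the whole process.
In a real environment, servers are usually not silent hosts when they come online: they actively send ARP requests for the gateway MAC address. The external network and the other Leafs therefore know which Leaf a given host IP sits behind, which improves forwarding efficiency.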