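Introduction to VRRP
The Virtual Router Redundancy Protocol (VRRP) is a fault-tolerance protocol that improves network reliability. When a host's next-hop device fails, VRRP promptly switches traffic to a backup device, which keeps network communication continuous and reliable.
(1) VRRP router
A VRRP router is a device that runs VRRP. It may belong to one or more virtual routers.
(2) Virtual router
A virtual router, also called a VRRP backup group, consists of one Master device and one or more Backup devices. Hosts on the shared LAN use it as their default gateway.
(3) Master router
The Master router (Virtual Router Master) is the VRRP device that forwards packets.
(4) Backup router
Backup routers (Virtual Router Backup) are the VRRP devices that are not forwarding packets. When the Master fails, they elect a new Master among themselves.
(5) VRID
The VRID is the identifier of a virtual router.
(6) Virtual IP address
The virtual IP address is the IP address of the virtual router. A virtual router can have one or more IP addresses, which are configured by the user.
(7) IP address owner
If a VRRP device uses the virtual router's IP address as a real interface address, that device is called the IP address owner. If the IP address owner is available, it normally becomes the Master.
(8) Virtual MAC address
The virtual MAC address is the MAC address that the virtual router derives from its VRID. When the virtual router answers ARP requests, it uses the virtual MAC address rather than the real MAC address of the interface.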
DBGvpp# show vrrp vr
[0] sw_if_index 1 VR ID 2 IPv4
state Initialize flags: preempt yes accept yes unicast no
priority: configured 200 adjusted 200
timers: adv interval 100 master adv 0 skew 0 master down 0
virtual MAC 00:00:5e:00:01:02 (fixed prefix + VRID)
addresses 2.2.2.222
peer addresses
tracked interfaces
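The virtual MAC in the output above (00:00:5e:00:01:02 for VR ID 2) follows the rule "fixed prefix + VRID". The sketch below illustrates that rule. The prefixes 00:00:5e:00:01 (IPv4) and 00:00:5e:00:02 (IPv6) come from RFC 5798; the function names `vrrp_virtual_mac` and `mac_to_string` are made up for this example and are not part of VPP.

```c
#include <stdint.h>
#include <stdio.h>

/* Fill mac[6] with the VRRP virtual MAC for a given VRID.
 * IPv4 virtual routers use the prefix 00:00:5e:00:01,
 * IPv6 virtual routers use 00:00:5e:00:02 (RFC 5798). */
static void
vrrp_virtual_mac (uint8_t vrid, int is_ipv6, uint8_t mac[6])
{
  mac[0] = 0x00;
  mac[1] = 0x00;
  mac[2] = 0x5e;
  mac[3] = 0x00;
  mac[4] = is_ipv6 ? 0x02 : 0x01;
  mac[5] = vrid;		/* the last byte is the VRID itself */
}

/* Format a MAC as "xx:xx:xx:xx:xx:xx"; out must hold at least 18 bytes. */
static void
mac_to_string (const uint8_t mac[6], char out[18])
{
  snprintf (out, 18, "%02x:%02x:%02x:%02x:%02x:%02x",
	    mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
}
```

Because the last byte is the VRID, at most 255 virtual routers per address family can coexist on one LAN segment without their virtual MAC addresses colliding.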
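VRRP currently has two versions: VRRPv2 (RFC 3768) and VRRPv3 (RFC 5798). VRRPv2 applies only to IPv4 networks, while VRRPv3 applies to both IPv4 and IPv6 networks.
Depending on the network type, VRRP is divided into VRRP for IPv4 and VRRP for IPv6 (VRRP6 for short). VRRP for IPv4 supports VRRPv2 and VRRPv3, whereas VRRP for IPv6 supports only VRRPv3.
The packet formats of VRRPv2 and VRRPv3 are shown in the following figures.
Figure: VRRPv2 packet format
Figure: VRRPv3 packet format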
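The two packet formats share their first four bytes and the position of the checksum. The sketch below reads those fixed fields from a raw buffer. It is an illustration based on the header layouts in RFC 3768 and RFC 5798, not the parser used by VPP. The type `vrrp_hdr_info_t` and the function `vrrp_hdr_parse` are hypothetical names, and the checksum is only extracted, not verified.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct
{
  uint8_t version;		/* 2 or 3 */
  uint8_t type;			/* 1 = ADVERTISEMENT */
  uint8_t vrid;
  uint8_t priority;
  uint8_t addr_count;		/* number of virtual IP addresses that follow */
  uint16_t adv_interval_cs;	/* advertisement interval in centiseconds */
  uint16_t checksum;		/* as carried in the packet, not verified here */
} vrrp_hdr_info_t;

/* Parse the fixed 8-byte part of a VRRP header.
 * Returns 0 on success, -1 if the buffer is too short or the version
 * is neither 2 nor 3. */
static int
vrrp_hdr_parse (const uint8_t * p, size_t len, vrrp_hdr_info_t * out)
{
  if (len < 8)
    return -1;

  out->version = p[0] >> 4;	/* high nibble */
  out->type = p[0] & 0x0f;	/* low nibble */
  out->vrid = p[1];
  out->priority = p[2];
  out->addr_count = p[3];

  if (out->version == 3)
    /* v3: 4 reserved bits + 12-bit Max Adver Int in centiseconds */
    out->adv_interval_cs = (uint16_t) (((p[4] << 8) | p[5]) & 0x0fff);
  else if (out->version == 2)
    /* v2: p[4] is Auth Type, p[5] is Adver Int in seconds */
    out->adv_interval_cs = (uint16_t) (p[5] * 100);
  else
    return -1;

  out->checksum = (uint16_t) ((p[6] << 8) | p[7]);
  return 0;
}
```

Normalizing the interval to centiseconds lets both versions be compared with the same timer code. The masking with 0x0fff mirrors how the VPP input path later in this article drops the reserved bits.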
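As networks spread and applications mature, value-added services such as IPTV and video conferencing are now widely deployed. The reliability of the underlying network has become a major concern, and uninterrupted transmission matters a great deal to end users.
When hosts on a live network reach external networks through a default gateway, a failure of that gateway cuts the attached hosts off from the outside and interrupts their services.
Figure: Default gateway on a LAN
VRRP solves this problem. It groups several devices into one virtual device and uses the virtual device's IP address as the default gateway, which provides a backup for the default gateway. When the gateway device fails, VRRP elects a new gateway device to carry the traffic and so keeps communication reliable. As shown in the figure below, when the Master fails, traffic sent to the default gateway is forwarded by a Backup device.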
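The VRRP state machine defines three states: Initialize, Master and Backup. Only a device in the Master state forwards packets sent to the virtual IP address. The following table describes the three states.
State | Description |
---|---|
Initialize | VRRP is unavailable in this state, and the device does not process VRRP advertisements. A device usually enters Initialize when it starts up or when it detects a fault. |
Master | A device in the Master state takes over all forwarding for the virtual router and periodically sends VRRP advertisements to the whole virtual router. |
Backup | A device in the Backup state does not forward for the virtual router. It periodically receives the Master's VRRP advertisements to determine whether the Master is working properly. |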
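Master election process
(1) Initialize
VRRP is unavailable in this state, and the device does not process VRRP packets.
- A device usually enters Initialize when VRRP has just been configured or when the device detects a fault.
- After the interface-up event arrives, a device with priority 255 becomes Master directly, while a device with priority below 255 first switches to the Backup state.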
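As a minimal sketch of the rule above, assuming a three-value state enum: the names `vr_state_t` and `vr_state_on_link_event` are invented for this example and do not appear in VPP.

```c
typedef enum
{
  VR_STATE_INIT,
  VR_STATE_BACKUP,
  VR_STATE_MASTER,
} vr_state_t;

/* Decide the state after a link event.
 * - link down: fall back to Initialize
 * - link up while in Initialize: priority 255 (the IP address owner)
 *   goes straight to Master, anything lower starts as Backup
 * - link up in any other state: no change */
static vr_state_t
vr_state_on_link_event (vr_state_t cur, int link_up, unsigned priority)
{
  if (!link_up)
    return VR_STATE_INIT;
  if (cur != VR_STATE_INIT)
    return cur;
  return (priority == 255) ? VR_STATE_MASTER : VR_STATE_BACKUP;
}
```

A device that starts as Backup can still become Master later, once its Master_Down timer expires without an advertisement from a better candidate.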
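(2) Master
A VRRP device in the Master state does the following:
- Sends VRRP advertisements at a fixed interval (Advertisement Interval).
- Answers ARP requests for the virtual IP address with the virtual MAC address.
- Forwards IP packets whose destination MAC address is the virtual MAC address.
- If it is the owner of the virtual IP address, accepts IP packets whose destination IP address is the virtual IP address; otherwise it discards them.
- If it receives an advertisement with a higher priority than its own, it becomes Backup immediately.
- If it receives an advertisement with the same priority and its local interface IP address is lower than the peer's interface IP address, it becomes Backup immediately.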
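(3) Backup
A VRRP device in the Backup state does the following:
- Receives the VRRP advertisements sent by the Master and uses them to judge whether the Master is working properly.
- Does not respond to ARP requests for the virtual IP address.
- Discards IP packets whose destination IP address is the virtual IP address.
- If it receives an advertisement whose priority is equal to or higher than its own, it resets the Master_Down_Interval timer and does not go on to compare IP addresses.
- Master_Down_Interval timer: if the Backup still has not received an advertisement when this timer expires, it switches to the Master state. The formula, in seconds as defined for VRRPv2, is Master_Down_Interval = (3 * Advertisement_Interval) + Skew_time, where Skew_time = (256 - Priority) / 256.
- If it receives an advertisement with a lower priority than its own and that priority is 0, it sets the timer to Skew_time. If the priority is not 0, it discards the advertisement and becomes Master immediately. (RFC 5798 only discards the advertisement here and lets the Master_Down timer expire, which is also what the VPP code later in this article does.)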
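The Master_Down_Interval formula can be worked through with a few lines of code. This is a sketch that follows the formula exactly as written above (seconds, VRRPv2 style); the function names are made up for the example.

```c
/* Skew_time in seconds: (256 - Priority) / 256 */
static double
vrrp2_skew_time (unsigned priority)
{
  return (256.0 - (double) priority) / 256.0;
}

/* Master_Down_Interval in seconds:
 * (3 * Advertisement_Interval) + Skew_time */
static double
vrrp2_master_down_interval (double adv_interval_s, unsigned priority)
{
  return 3.0 * adv_interval_s + vrrp2_skew_time (priority);
}
```

With a 1 second advertisement interval, a Backup with priority 200 waits 3 + 56/256 = 3.21875 s, while a Backup with priority 100 waits 3 + 156/256 = 3.609375 s. The higher-priority Backup therefore times out first and takes over before the others.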
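A VRRP device works in one of two modes:
Preempt mode: if a Backup device has a higher priority than the current Master, it actively switches itself to Master.
Non-preempt mode: as long as the Master has not failed, a Backup device does not become Master, even if it is later configured with a higher priority.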
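The two modes change how a Backup reacts to an advertisement. The sketch below condenses that decision into one function. It follows the Backup rules of RFC 5798 section 6.4.2, so a preempting Backup discards a lower-priority advertisement and waits for its timer instead of switching at once. The enum and the function name are assumptions made for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum
{
  BACKUP_ARM_SKEW,		/* Master is shutting down: wait only Skew_time */
  BACKUP_RESET_TIMER,		/* accept the Master: restart Master_Down timer */
  BACKUP_IGNORE_ADV,		/* preempting: discard, let the timer run out */
} backup_action_t;

/* Decide what a Backup does with a received advertisement. */
static backup_action_t
backup_on_advertisement (uint8_t local_prio, uint8_t adv_prio, bool preempt)
{
  /* priority 0 means the current Master is giving up its role */
  if (adv_prio == 0)
    return BACKUP_ARM_SKEW;

  /* non-preempt mode always defers to a live Master; in preempt mode
   * only an equal or better Master is accepted */
  if (!preempt || adv_prio >= local_prio)
    return BACKUP_RESET_TIMER;

  return BACKUP_IGNORE_ADV;
}
```

In non-preempt mode the function never returns `BACKUP_IGNORE_ADV`, which is why a higher-priority Backup stays Backup while the Master is healthy.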
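As shown in the figure below, routers A, B and C are configured with VRRP to form one virtual router. The virtual router's IP address can be the same as the real IP address of one of the devices (which in effect designates that device as the Master), or it can be a different address in the same subnet. This example uses the first approach: the virtual router's IP address is router A's IP address. Because the two addresses are the same, router A is the Master and routers B and C are Backups. The default gateway of Client13 is 10.10.0.1. As the Master, router A handles the packets that Client13 sends to the default gateway 10.10.0.1.
When the Master fails, routers B and C elect a new Master. The new Master starts answering ARP requests for the virtual IP address and sends VRRP advertisements periodically.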
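VRRP works as follows in detail:
(1) The devices in a VRRP backup group elect a Master according to priority. The Master sends gratuitous ARP packets to announce the virtual MAC address to the devices and hosts connected to it, and then takes over packet forwarding.
(2) The Master periodically sends VRRP advertisements to all Backup devices in the group to announce its configuration (priority and so on) and its working status.
(3) If the Master fails, the Backup devices in the group elect a new Master according to priority.
(4) When the backup group changes state, the Master role moves from one device to another. The new Master immediately sends gratuitous ARP packets that carry the virtual router's virtual MAC address and virtual IP address. These refresh the MAC entries of connected devices and hosts and steer user traffic to the new Master. The whole process is transparent to users.
(5) When the original Master recovers, it switches directly to the Master state if it is the IP address owner (priority 255). If its priority is below 255, it first switches to the Backup state, and its priority returns to the value configured before the failure.
(6) When a Backup has a higher priority than the Master, the Backup's working mode (preempt or non-preempt) decides whether a new Master is elected.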
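The outcome of the election can be sketched as a comparison over all candidates: the highest priority wins, and equal priorities are broken by the higher interface IP address. Real VRRP reaches this result in a distributed way through advertisements and timers, so the loop below is only a centralized simplification. The type `vr_candidate_t` and the function `vr_elect_master` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct
{
  uint8_t priority;		/* 255 is reserved for the IP address owner */
  uint32_t ip;			/* interface IPv4 address, host byte order */
} vr_candidate_t;

/* Return the index of the device that ends up as Master,
 * or -1 if there are no candidates. */
static int
vr_elect_master (const vr_candidate_t * c, size_t n)
{
  int best = -1;

  for (size_t i = 0; i < n; i++)
    {
      if (best < 0
	  || c[i].priority > c[best].priority
	  || (c[i].priority == c[best].priority && c[i].ip > c[best].ip))
	best = (int) i;
    }
  return best;
}
```

For example, with router A as the address owner (priority 255, 10.10.0.1) and two Backups of equal priority at 10.10.0.2 and 10.10.0.3, router A wins. If router A disappears, the device at 10.10.0.3 wins the tie.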
static clib_error_t *
vrrp_init (vlib_main_t * vm)
{
  vrrp_main_t *vmp = &vrrp_main;
  clib_error_t *error = 0;
  ip4_main_t *im4 = &ip4_main;
  ip4_add_del_interface_address_callback_t cb4;
  vlib_node_t *intf_output_node;

  clib_memset (vmp, 0, sizeof (*vmp));

  /*
   * Initialize route lookup first
   * (open question: is this initialization strictly required?)
   */
  if ((error = vlib_call_init_function (vm, ip4_lookup_init)) ||
      (error = vlib_call_init_function (vm, ip6_lookup_init)))
    return error;

  vmp->vlib_main = vm;
  vmp->vnet_main = vnet_get_main ();

  /*
   * 1) VRRP protocol packets are sent directly out of the interface
   * 2) Gratuitous ARP / ND packets are also sent directly out of the
   *    interface
   */
  intf_output_node = vlib_get_node_by_name (vm, (u8 *) "interface-output");
  vmp->intf_output_node_idx = intf_output_node->index;

  error = vrrp_plugin_api_hookup (vm);
  if (error)
    return error;

  /* Stores the VRRP key:
   *   typedef struct vrrp_vr_key
   *   {
   *     u32 sw_if_index;
   *     u8 vr_id;
   *     u8 is_ipv6;
   *   } vrrp_vr_key_t;
   */
  mhash_init (&vmp->vr_index_by_key, sizeof (u32), sizeof (vrrp_vr_key_t));

  /*
   * VR index
   */
  vmp->vrrp4_arp_lookup = hash_create (0, sizeof (uword));
  vmp->vrrp6_nd_lookup = hash_create_mem (0, sizeof (vrrp6_nd_key_t),
					  sizeof (uword));

  /*
   * Register the callback for interface IP address add/delete
   */
  cb4.function = vrrp_ip4_add_del_interface_addr;
  cb4.function_opaque = 0;
  vec_add1 (im4->add_del_interface_address_callbacks, cb4);

  /* ?? */
  vrrp_ip6_delegate_id = ip6_link_delegate_register (&vrrp_ip6_delegate_vft);

  return error;
}
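Of the vSwitch security options, promiscuous mode matters because the vSwitch does not forward according to a MAC address table; it forwards according to the list of virtual machine NIC addresses that VMware manages. The MAC address changes option ensures that packets leaving a virtual machine can only carry a source MAC address that VMware manages for a virtual machine NIC. With VRRP, however, several hosts share a virtual MAC address of the form 0000-5e00-xxxx to arbitrate the VRRP virtual IP. In VRRP scenarios, and in any other scenario that uses virtual MAC addresses, this security option causes the Ethernet frames to be dropped outright, so they never reach the physical switch.
Packet captures show that data packets are dropped after the switch receives them, so VPP never receives them, even though ARP learning succeeds.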
/* *INDENT-OFF* */
VLIB_CLI_COMMAND (vrrp_vr_add_command, static) =
{
  .path = "vrrp vr add",
  .short_help =
  "vrrp vr add <interface> [vr_id <n>] [ipv6] [priority <value>] [interval <value>] [no_preempt] [accept_mode] [unicast] [<ip_addr> ...]",
  .function = vrrp_vr_add_command_fn,
};
/* *INDENT-OFF* */
VLIB_CLI_COMMAND (vrrp_proto_start_stop_command, static) =
{
  .path = "vrrp proto",
  .short_help =
  "vrrp proto (start|stop) (<intf_name>|sw_if_index <sw_idx>) vr_id <n> [ipv6]",
  .function = vrrp_proto_start_stop_command_fn,
};
//(master)
vrrp vr add GigabitEthernet2/9/0 vr_id 1 priority 200 accept_mode 2.2.2.254
vrrp proto start GigabitEthernet2/9/0 vr_id 1
// (slave)
vrrp vr add GigabitEthernet2/6/0 vr_id 1 priority 100 no_preempt accept_mode 2.2.2.254
vrrp proto start GigabitEthernet2/6/0 vr_id 1
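typedef enum vrrp_vr_flags
{
  VRRP_VR_PREEMPT = 0x1,	/* preempt mode, enabled by default */
  VRRP_VR_ACCEPT = 0x2,		/* whether the virtual address is configured on the interface */
  VRRP_VR_UNICAST = 0x4,	/* unicast mode (no multicast); peer IPs must be
				   configured, similar to how OSPF/BGP form
				   unicast neighbors */
  VRRP_VR_IPV6 = 0x8,		/* IPv6, VRRPv3 */
} vrrp_vr_flags_t;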
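To show how these bits relate to the CLI keywords and to the `flags:` line printed by `show vrrp vr`, here is a small sketch. Both helper functions are hypothetical and written only for this example; the enum is repeated so that the block compiles on its own.

```c
#include <stdio.h>

typedef enum
{
  VRRP_VR_PREEMPT = 0x1,
  VRRP_VR_ACCEPT = 0x2,
  VRRP_VR_UNICAST = 0x4,
  VRRP_VR_IPV6 = 0x8,
} vrrp_vr_flags_t;

/* Hypothetical helper: turn CLI keywords into flag bits.
 * Preempt is on by default and is cleared by no_preempt. */
static unsigned
vr_flags_from_cli (int no_preempt, int accept_mode, int unicast, int ipv6)
{
  unsigned flags = VRRP_VR_PREEMPT;

  if (no_preempt)
    flags &= ~(unsigned) VRRP_VR_PREEMPT;
  if (accept_mode)
    flags |= VRRP_VR_ACCEPT;
  if (unicast)
    flags |= VRRP_VR_UNICAST;
  if (ipv6)
    flags |= VRRP_VR_IPV6;
  return flags;
}

/* Hypothetical helper: render the flags in the style of the CLI output. */
static int
vr_flags_format (unsigned flags, char *out, size_t out_len)
{
  return snprintf (out, out_len, "preempt %s accept %s unicast %s",
		   (flags & VRRP_VR_PREEMPT) ? "yes" : "no",
		   (flags & VRRP_VR_ACCEPT) ? "yes" : "no",
		   (flags & VRRP_VR_UNICAST) ? "yes" : "no");
}
```

Under these assumptions, the master configuration shown earlier (`accept_mode` only) yields 0x3, and the slave configuration (`no_preempt accept_mode`) yields 0x2.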
- Stores the basic VRRP configuration
- Stores the VRRP runtime configuration
- Stores the VRRP interfaces
- Configures multicast routes
static int vrrp_intf_enable_disable_mcast (u8 enable, u32 sw_if_index, u8 is_ipv6) {
}
typedef CLIB_PACKED (struct
typedef CLIB_PACKED (struct
typedef CLIB_PACKED (struct
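If a host wants to receive datagrams that other hosts send to a multicast group, it must first join that group. After that it can receive packets sent to the group address.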
int
vrrp_vr_multicast_group_join (vrrp_vr_t * vr)
{
  vlib_main_t *vm = vlib_get_main ();
  vlib_buffer_t *b;
  vlib_frame_t *f;
  vnet_main_t *vnm = vnet_get_main ();
  vrrp_intf_t *intf;
  u32 bi = 0, *to_next;
  int n_buffers = 1;
  u8 is_ipv6;
  u32 node_index;

  if (!vnet_sw_interface_is_up (vnm, vr->config.sw_if_index))
    return 0;

  if (vlib_buffer_alloc (vm, &bi, n_buffers) != n_buffers)
    {
      clib_warning ("Buffer allocation failed for %U", format_vrrp_vr_key,
		    vr);
      return -1;
    }

  is_ipv6 = vrrp_vr_is_ipv6 (vr);

  b = vlib_get_buffer (vm, bi);
  VLIB_BUFFER_TRACE_TRAJECTORY_INIT (b);
  b->flags |= VNET_BUFFER_F_LOCALLY_ORIGINATED;

  vnet_buffer (b)->sw_if_index[VLIB_RX] = 0;
  vnet_buffer (b)->sw_if_index[VLIB_TX] = vr->config.sw_if_index;

  intf = vrrp_intf_get (vr->config.sw_if_index);
  vnet_buffer (b)->ip.adj_index[VLIB_TX] = intf->mcast_adj_index[is_ipv6];

  /* Core of the multicast group join */
  if (is_ipv6)
    {
      vrrp_icmp6_mlr_pkt_build (vr, b);
      node_index = ip6_rewrite_mcast_node.index;
    }
  else
    {
      vrrp_igmp_pkt_build (vr, b);
      node_index = ip4_rewrite_mcast_node.index;
    }

  f = vlib_get_frame_to_node (vm, node_index);
  to_next = vlib_frame_vector_args (f);
  to_next[0] = bi;
  f->n_vectors = 1;

  vlib_put_frame_to_node (vm, node_index, f);

  return f->n_vectors;
}
static void vrrp_igmp_pkt_build (vrrp_vr_t * vr, vlib_buffer_t * b) {
}
vrrp_adv_l2_build_multicast (vrrp_vr_t * vr, vlib_buffer_t * b)
Builds the L2 multicast header.
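The destination MAC address in that L2 header follows from the multicast IP address. The sketch below shows the standard mappings: an IPv4 group maps to 01:00:5e plus its low 23 bits, and an IPv6 group maps to 33:33 plus its last four bytes. VRRP uses the groups 224.0.0.18 and ff02::12. The function names are illustrative and are not taken from VPP.

```c
#include <stdint.h>
#include <string.h>

#define VRRP4_MCAST_GROUP 0xE0000012u	/* 224.0.0.18 in host byte order */

/* IPv4 multicast group -> Ethernet multicast MAC:
 * 01:00:5e followed by the low 23 bits of the group address. */
static void
ip4_mcast_to_mac (uint32_t group, uint8_t mac[6])
{
  mac[0] = 0x01;
  mac[1] = 0x00;
  mac[2] = 0x5e;
  mac[3] = (uint8_t) ((group >> 16) & 0x7f);
  mac[4] = (uint8_t) ((group >> 8) & 0xff);
  mac[5] = (uint8_t) (group & 0xff);
}

/* IPv6 multicast group -> Ethernet multicast MAC:
 * 33:33 followed by the last 4 bytes of the 16-byte address. */
static void
ip6_mcast_to_mac (const uint8_t group[16], uint8_t mac[6])
{
  mac[0] = 0x33;
  mac[1] = 0x33;
  memcpy (mac + 2, group + 12, 4);
}
```

For VRRP this gives 01:00:5e:00:00:12 for IPv4 and 33:33:00:00:00:12 for IPv6.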
static void
vrrp4_garp_pkt_build (vrrp_vr_t * vr, vlib_buffer_t * b, ip4_address_t *ip4)
static void
vrrp6_na_pkt_build (vrrp_vr_t * vr, vlib_buffer_t * b, ip6_address_t * addr6)
int
vrrp_garp_or_na_send (vrrp_vr_t * vr)
{
  vlib_main_t *vm = vlib_get_main ();
  vrrp_main_t *vmp = &vrrp_main;
  vlib_frame_t *to_frame;
  u32 *bi = 0;
  u32 n_buffers;
  u32 *to_next;
  int i;

  if (vec_len (vr->config.peer_addrs))
    return 0;			/* unicast is used in routed environments - don't garp */

  n_buffers = vec_len (vr->config.vr_addrs);
  if (!n_buffers)
    {
      clib_warning ("Unable to send gratuitous ARP for VR %U - no addresses",
		    format_vrrp_vr_key, vr);
      return -1;
    }

  /* need to send a packet for each VR address */
  vec_validate (bi, n_buffers - 1);

  if (vlib_buffer_alloc (vm, bi, n_buffers) != n_buffers)
    {
      clib_warning ("Buffer allocation failed for %U", format_vrrp_vr_key,
		    vr);
      vec_free (bi);
      return -1;
    }

  to_frame = vlib_get_frame_to_node (vm, vmp->intf_output_node_idx);
  to_frame->n_vectors = 0;
  to_next = vlib_frame_vector_args (to_frame);

  for (i = 0; i < n_buffers; i++)
    {
      vlib_buffer_t *b;
      ip46_address_t *addr;

      addr = vec_elt_at_index (vr->config.vr_addrs, i);
      b = vlib_get_buffer (vm, bi[i]);
      VLIB_BUFFER_TRACE_TRAJECTORY_INIT (b);
      b->flags |= VNET_BUFFER_F_LOCALLY_ORIGINATED;

      vnet_buffer (b)->sw_if_index[VLIB_RX] = 0;
      vnet_buffer (b)->sw_if_index[VLIB_TX] = vr->config.sw_if_index;

      if (vrrp_vr_is_ipv6 (vr))
	vrrp6_na_pkt_build (vr, b, &addr->ip6);
      else
	vrrp4_garp_pkt_build (vr, b, &addr->ip4);

      vlib_buffer_reset (b);

      to_next[i] = bi[i];
      to_frame->n_vectors++;
    }

  vlib_put_frame_to_node (vm, vmp->intf_output_node_idx, to_frame);

  return 0;
}
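For orientation, the ARP payload of a gratuitous ARP for a virtual IP can be sketched as follows. The sender hardware address is the virtual MAC, and both the sender and target protocol addresses are the virtual IP. This is not a copy of `vrrp4_garp_pkt_build`: the choice of opcode 1 (request) and a zeroed target hardware address are assumptions of this example, and implementations differ on both. The function name `garp_payload_build` is made up.

```c
#include <stdint.h>
#include <string.h>

#define ARP_PAYLOAD_LEN 28	/* ARP for Ethernet + IPv4 */

/* Build the 28-byte ARP payload of a gratuitous ARP announcing that
 * the virtual IP (vip) is reachable at the virtual MAC (vmac). */
static void
garp_payload_build (const uint8_t vmac[6], const uint8_t vip[4],
		    uint8_t out[ARP_PAYLOAD_LEN])
{
  memset (out, 0, ARP_PAYLOAD_LEN);
  out[0] = 0x00;
  out[1] = 0x01;		/* hardware type: Ethernet */
  out[2] = 0x08;
  out[3] = 0x00;		/* protocol type: IPv4 */
  out[4] = 6;			/* hardware address length */
  out[5] = 4;			/* protocol address length */
  out[6] = 0x00;
  out[7] = 0x01;		/* opcode: request (assumption, see above) */
  memcpy (out + 8, vmac, 6);	/* sender MAC = virtual MAC */
  memcpy (out + 14, vip, 4);	/* sender IP = virtual IP */
  /* bytes 18..23: target MAC, left as zero in this sketch */
  memcpy (out + 24, vip, 4);	/* target IP = virtual IP */
}
```

Switches and hosts that see this frame update their tables so that the virtual IP and virtual MAC point at the port of the new Master.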
int
vrrp_adv_send (vrrp_vr_t * vr, int shutdown)
{
  vlib_main_t *vm = vlib_get_main ();
  vlib_frame_t *to_frame;
  int i, n_buffers = 1;
  u32 node_index, *to_next, *bi = 0;
  u8 is_unicast = vrrp_vr_is_unicast (vr);

  /* Send directly out of the interface */
  node_index = vrrp_adv_next_node (vr);

  if (is_unicast)
    n_buffers = vec_len (vr->config.peer_addrs);

  if (n_buffers < 1)
    {
      /* A unicast VR will not start without peers added so this should
       * not happen. Just avoiding a crash if it happened somehow.
       */
      clib_warning ("Unicast VR configuration corrupted for %U",
		    format_vrrp_vr_key, vr);
      return -1;
    }

  vec_validate (bi, n_buffers - 1);
  if (vlib_buffer_alloc (vm, bi, n_buffers) != n_buffers)
    {
      clib_warning ("Buffer allocation failed for %U", format_vrrp_vr_key,
		    vr);
      vec_free (bi);
      return -1;
    }

  to_frame = vlib_get_frame_to_node (vm, node_index);
  to_next = vlib_frame_vector_args (to_frame);

  for (i = 0; i < n_buffers; i++)
    {
      vlib_buffer_t *b;
      u32 bi0;
      /* Get the IPv4 or IPv6 multicast IP address */
      const ip46_address_t *dst = vrrp_adv_mcast_addr (vr);

      bi0 = vec_elt (bi, i);
      b = vlib_get_buffer (vm, bi0);
      VLIB_BUFFER_TRACE_TRAJECTORY_INIT (b);
      b->flags |= VNET_BUFFER_F_LOCALLY_ORIGINATED;
      vnet_buffer (b)->sw_if_index[VLIB_RX] = 0;
      /* Set the outgoing interface */
      vnet_buffer (b)->sw_if_index[VLIB_TX] = vr->config.sw_if_index;

      if (is_unicast)
	{
	  dst = vec_elt_at_index (vr->config.peer_addrs, i);
	  vnet_buffer (b)->sw_if_index[VLIB_TX] = ~0;
	}
      else
	/* Build the L2 multicast header */
	vrrp_adv_l2_build_multicast (vr, b);

      /* Add the L3 header */
      vrrp_adv_l3_build (vr, b, dst);
      /* Add the VRRP header */
      vrrp_adv_payload_build (vr, b, shutdown);

      vlib_buffer_reset (b);

      to_next[i] = bi0;
    }

  to_frame->n_vectors = n_buffers;

  vlib_put_frame_to_node (vm, node_index, to_frame);

  vec_free (bi);

  return 0;
}
VLIB_REGISTER_NODE (vrrp_periodic_node) = {
};
static uword vrrp_periodic_process (vlib_main_t * vm,
{
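  /* vlib_process_wait_for_event_or_clock first checks whether any bit in
   * non_empty_event_type_bitmap is set. If one is, an event is waiting to be
   * handled and the call returns immediately. Otherwise it sets the suspend
   * flag to mark the process as suspended and waits for an event or for the
   * clock. */
}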
void vrrp_vr_timer_timeout (u32 timer_index) {
}
Master_Down_Interval timer: if a Backup still has not received an advertisement when this timer expires, it switches to the Master state. The formula is Master_Down_Interval = (3 * Advertisement_Interval) + Skew_time, where Skew_time = (256 - Priority) / 256.
void vrrp_vr_timer_set (vrrp_vr_t * vr, vrrp_vr_timer_type_t type) {
}
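The `show vrrp vr` output earlier lists `adv interval 100`, `skew` and `master down`, which suggests that the timers are kept in centiseconds. RFC 5798 (VRRPv3) scales the skew by the Master's advertisement interval, so the integer version of the formula looks like the sketch below. The struct and the function are assumptions made for illustration; only the field names echo the `skew`, `master_adv_int` and `master_down_int` identifiers that appear in the VPP code in this article.

```c
#include <stdint.h>

typedef struct
{
  uint16_t master_adv_int;	/* Master's advertisement interval, centiseconds */
  uint16_t skew;		/* Skew_Time, centiseconds */
  uint16_t master_down_int;	/* Master_Down_Interval, centiseconds */
} vr_timers_t;

/* VRRPv3 timer computation (RFC 5798):
 *   Skew_Time            = ((256 - priority) * Master_Adver_Interval) / 256
 *   Master_Down_Interval = (3 * Master_Adver_Interval) + Skew_Time
 * master_adv_int must already be set, and is at most 4095 (12 bits). */
static void
vr_timers_compute (vr_timers_t * t, uint8_t priority)
{
  t->skew = (uint16_t) (((256u - priority) * t->master_adv_int) / 256u);
  t->master_down_int = (uint16_t) (3u * t->master_adv_int + t->skew);
}
```

With a 100 centisecond interval, priority 200 gives a skew of 21 and a Master_Down_Interval of 321 centiseconds, and priority 100 gives 60 and 360. Integer division truncates, so the results are slightly shorter than the exact fractions.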
void vrrp_vr_transition (vrrp_vr_t * vr, vrrp_vr_state_t new_state, void *data) {
}
static void
vrrp_input_process_master (vrrp_vr_t * vr, vrrp_header_t * pkt)
{
  /* received priority 0, another VR is shutting down. send an adv and
   * remain in the master state
   */
  if (pkt->priority == 0)
    {
      clib_warning ("Received shutdown message from a peer on VR %U",
		    format_vrrp_vr_key, vr);
      vrrp_adv_send (vr, 0);
      vrrp_vr_timer_set (vr, VRRP_VR_TIMER_ADV);
      return;
    }

  /* if either:
   * - received priority > adjusted priority, or
   * - received priority == adjusted priority and peer addr > local addr
   * allow the local VR to be preempted by the peer
   */
  if ((pkt->priority > vrrp_vr_priority (vr)) ||
      ((pkt->priority == vrrp_vr_priority (vr)) &&
       (vrrp_vr_addr_cmp (vr, pkt) < 0)))
    {
      vrrp_vr_transition (vr, VRRP_VR_STATE_BACKUP, pkt);
      return;
    }

  /* if we made it this far, either received priority < adjusted priority or
   * received == adjusted and local addr > peer addr. Ignore.
   */
  return;
}

/* RFC 5798 section 6.4.2 */
static void
vrrp_input_process_backup (vrrp_vr_t * vr, vrrp_header_t * pkt)
{
  vrrp_vr_config_t *vrc = &vr->config;
  vrrp_vr_runtime_t *vrt = &vr->runtime;

  /* master shutting down, ready for election */
  if (pkt->priority == 0)
    {
      clib_warning ("Master for VR %U is shutting down", format_vrrp_vr_key,
		    vr);
      vrt->master_down_int = vrt->skew;
      vrrp_vr_timer_set (vr, VRRP_VR_TIMER_MASTER_DOWN);
      return;
    }

  /* no preempt set or adv from a higher priority router, update timers */
  if (!(vrc->flags & VRRP_VR_PREEMPT) ||
      (pkt->priority >= vrrp_vr_priority (vr)))
    {
      vrt->master_adv_int = clib_net_to_host_u16 (pkt->rsvd_and_max_adv_int);
      vrt->master_adv_int &= ((u16) 0x0fff);	/* ignore rsvd bits */

      vrrp_vr_skew_compute (vr);
      vrrp_vr_master_down_compute (vr);
      vrrp_vr_timer_set (vr, VRRP_VR_TIMER_MASTER_DOWN);
      return;
    }

  /* preempt set and our priority > received, continue to wait on master down */
  return;
}

always_inline void
vrrp_input_process (vrrp_input_process_args_t * args)
{
  vrrp_vr_t *vr;

  vr = vrrp_vr_lookup_index (args->vr_index);

  if (!vr)
    {
      clib_warning ("Error retrieving VR with index %u", args->vr_index);
      return;
    }

  switch (vr->runtime.state)
    {
    case VRRP_VR_STATE_INIT:
      return;
    case VRRP_VR_STATE_BACKUP:
      /* this is usually the only state an advertisement should be received */
      vrrp_input_process_backup (vr, args->pkt);
      break;
    case VRRP_VR_STATE_MASTER:
      /* might be getting preempted. or have a misbehaving peer */
      clib_warning ("Received advertisement for master VR %U",
		    format_vrrp_vr_key, vr);
      vrrp_input_process_master (vr, args->pkt);
      break;
    default:
      clib_warning ("Received advertisement for VR %U in unknown state %d",
		    format_vrrp_vr_key, vr, vr->runtime.state);
      break;
    }

  return;
}