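First, a quick explanation of what IPMP is.
IP network multipathing (IPMP) provides physical-interface failure detection and transparent failover of network access for systems that have multiple interfaces on the same IP link. IPMP also provides outbound load spreading for systems with multiple interfaces. With IPMP, one or more physical interfaces can be configured into an IP multipathing group (IPMP group). Once IPMP is configured, the system automatically monitors the interfaces in the group for failures. Because of the way IPMP works, it feels a bit wasteful of IP addresses to me, and the configuration is fairly cumbersome. I still prefer the SUSE Linux approach, where bonding two NICs is all it takes: simple, convenient, and no wasted IPs, heh~~
IPMP decides whether an interface has failed by using ICMP echo requests and replies (what we usually just call ping). You normally put the interfaces that are meant to back each other up into the same IPMP group, and then configure a test IP address on each interface (this is where the IPs get wasted...). The test addresses are used to ping the default gateway: if the gateway answers, the interface is considered healthy; if not, it is considered failed, and the data (service) IP is moved to the standby interface, which takes over the work of the original one. Once the original interface recovers, the data IP automatically moves back. Below is an actual example from my experiment, which I ran in a virtual machine: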
IPMP group name: aa
192.168.100.222  data IP
192.168.100.223  test IP for NIC 1 (e1000g0)
192.168.100.224  test IP for NIC 2 (e1000g1)
192.168.100.102  default gateway
The NIC configuration files are as follows:
# more hostname.e1000g0

192.168.100.222 group aa netmask 255.255.255.0 broadcast 192.168.100.255 up \
addif 192.168.100.223 deprecated -failover netmask 255.255.255.0 broadcast 192.168.100.255 up
# more hostname.e1000g1  

192.168.100.224 group aa netmask 255.255.255.0 broadcast 192.168.100.255 deprecated -failover standby up
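A few notes on this configuration: the second interface is configured as a standby interface, so it gets no data IP, only a test IP. When the first interface fails, the data IP moves to a logical interface on the second NIC (e1000g1:1 in the output further down). A logical interface behaves like a real interface, so services are not affected.
addif: adds a logical interface to the physical interface; here it is used to put the test IP on its own logical interface (e1000g0:1).
deprecated: the test address is not used as the source address for ordinary outgoing packets, which keeps applications from using it.
-failover: the test address stays on its own interface and does not fail over when that interface fails.
standby: marks the interface as a standby interface, which carries data traffic only after another interface in the group has failed.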
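If you have to set this up on several hosts, the two hostname files follow a fixed pattern, so they can be generated. The Python sketch below is a hypothetical helper of my own (hostname_files is not a Solaris tool). It derives the broadcast address from the netmask with the standard ipaddress module and, as an extra sanity check that I added, refuses any address outside the gateway's subnet, since the test addresses have to reach the gateway directly.

```python
import ipaddress
from typing import Tuple


def hostname_files(group: str, netmask: str, gateway: str, data_ip: str,
                   test_active: str, test_standby: str) -> Tuple[str, str]:
    """Hypothetical helper: build the contents of hostname.<active> and
    hostname.<standby> for an active/standby IPMP pair with test addresses."""
    net = ipaddress.ip_network(f"{gateway}/{netmask}", strict=False)
    # Sanity check (my assumption, not an ifconfig rule): every address
    # should sit in the gateway's subnet so the probes can reach it.
    for ip in (data_ip, test_active, test_standby):
        if ipaddress.ip_address(ip) not in net:
            raise ValueError(f"{ip} is not in {net}")
    opts = f"netmask {netmask} broadcast {net.broadcast_address}"
    # Active NIC: data address, plus a deprecated, non-failover test
    # address on a logical interface (the trailing backslash continues the line).
    active = (f"{data_ip} group {group} {opts} up \\\n"
              f"addif {test_active} deprecated -failover {opts} up\n")
    # Standby NIC: only the test address, marked standby.
    standby = f"{test_standby} group {group} {opts} deprecated -failover standby up\n"
    return active, standby


if __name__ == "__main__":
    active, standby = hostname_files("aa", "255.255.255.0", "192.168.100.102",
                                     "192.168.100.222", "192.168.100.223",
                                     "192.168.100.224")
    print(active, end="")   # contents for /etc/hostname.e1000g0
    print(standby, end="")  # contents for /etc/hostname.e1000g1
```

With the addresses from this experiment, the output reproduces the two files shown above.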
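With this in place, a packet capture on the host shows the two test addresses, 192.168.100.223 and 192.168.100.224 (shown as e1000g0 and e1000g1 in the snoop output), continuously sending ICMP Echo requests to the gateway 192.168.100.102: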
# snoop 192.168.100.102
Using device /dev/e1000g (promiscuous mode)
     e1000g1 -> 192.168.100.102 ICMP Echo request (ID: 35075 Sequence number: 1381)
192.168.100.102 -> e1000g1      ICMP Echo reply (ID: 35075 Sequence number: 1381)
     e1000g0 -> 192.168.100.102 ICMP Echo request (ID: 35074 Sequence number: 1283)
192.168.100.102 -> e1000g0      ICMP Echo reply (ID: 35074 Sequence number: 1283)
     e1000g1 -> 192.168.100.102 ICMP Echo request (ID: 35075 Sequence number: 1382)
192.168.100.102 -> e1000g1      ICMP Echo reply (ID: 35075 Sequence number: 1382)
     e1000g1 -> 192.168.100.102 ICMP Echo request (ID: 35075 Sequence number: 1383)
192.168.100.102 -> e1000g1      ICMP Echo reply (ID: 35075 Sequence number: 1383)
     e1000g0 -> 192.168.100.102 ICMP Echo request (ID: 35074 Sequence number: 1284)
192.168.100.102 -> e1000g0      ICMP Echo reply (ID: 35074 Sequence number: 1284)
     e1000g1 -> 192.168.100.102 ICMP Echo request (ID: 35075 Sequence number: 1384)
192.168.100.102 -> e1000g1      ICMP Echo reply (ID: 35075 Sequence number: 1384)
     e1000g0 -> 192.168.100.102 ICMP Echo request (ID: 35074 Sequence number: 1285)
192.168.100.102 -> e1000g0      ICMP Echo reply (ID: 35074 Sequence number: 1285)
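Both NICs get replies, so both are healthy. The data IP therefore stays on the active interface e1000g0 itself, while the test IP 192.168.100.223 sits on the logical interface e1000g0:1: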

# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000 
e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 192.168.100.222 netmask ffffff00 broadcast 192.168.100.255
        groupname aa
        ether 0:c:29:8c:c1:2c 
e1000g0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
        inet 192.168.100.223 netmask ffffff00 broadcast 192.168.100.255
e1000g1: flags=69040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,STANDBY,INACTIVE> mtu 1500 index 3
        inet 192.168.100.224 netmask ffffff00 broadcast 192.168.100.255
        groupname aa
        ether 0:c:29:8c:c1:36 
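The hex word after flags= encodes the same information as the names in angle brackets, one bit per flag. For reading flag words by hand, here is a small decoder. I filled in the bit table only from the flag words that appear in this post's own ifconfig -a output (for example 9040843 = UP, BROADCAST, RUNNING, MULTICAST, DEPRECATED, IPv4, NOFAILOVER). I believe these values match the IFF_* constants in Solaris <net/if.h>, but treat the table as an assumption; any bit not seen here is reported as UNKNOWN.

```python
from typing import List

# Bit values inferred from the flag words printed by ifconfig -a in this post
# (assumed to match Solaris <net/if.h>; unseen bits are reported as UNKNOWN).
FLAG_BITS = {
    0x1: "UP",
    0x2: "BROADCAST",
    0x8: "LOOPBACK",
    0x40: "RUNNING",
    0x800: "MULTICAST",
    0x40000: "DEPRECATED",
    0x1000000: "IPv4",
    0x8000000: "NOFAILOVER",
    0x10000000: "FAILED",
    0x20000000: "STANDBY",
    0x40000000: "INACTIVE",
    0x2000000000: "VIRTUAL",
}
KNOWN_MASK = sum(FLAG_BITS)  # the bits are distinct, so summing equals OR-ing


def decode_flags(word: str) -> List[str]:
    """Turn an ifconfig flag word such as '9040843' into names, lowest bit first."""
    value = int(word, 16)
    names = [name for bit, name in sorted(FLAG_BITS.items()) if value & bit]
    leftover = value & ~KNOWN_MASK
    if leftover:
        names.append(f"UNKNOWN(0x{leftover:x})")
    return names


if __name__ == "__main__":
    for word in ("1000843", "9040843", "69040843"):
        print(word, ",".join(decode_flags(word)))
```

Listing the lowest bit first happens to reproduce the order ifconfig uses in the listings in this post, which makes it easy to compare the two.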

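Next, take e1000g0 down. In a VM this is easy: just disconnect the first NIC in VMware. Alternatively, you can force a failover with the command if_mpadm -d e1000g0 (and reattach the interface later with if_mpadm -r e1000g0).
Capture packets again: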
# snoop 192.168.100.102
Using device /dev/e1000g (promiscuous mode)
     e1000g1 -> 192.168.100.102 ICMP Echo request (ID: 35075 Sequence number: 1665)
192.168.100.102 -> e1000g1      ICMP Echo reply (ID: 35075 Sequence number: 1665)
     e1000g1 -> 192.168.100.102 ICMP Echo request (ID: 35075 Sequence number: 1666)
192.168.100.102 -> e1000g1      ICMP Echo reply (ID: 35075 Sequence number: 1666)
     e1000g1 -> 192.168.100.102 ICMP Echo request (ID: 35075 Sequence number: 1667)
192.168.100.102 -> e1000g1      ICMP Echo reply (ID: 35075 Sequence number: 1667)
     e1000g1 -> 192.168.100.102 ICMP Echo request (ID: 35075 Sequence number: 1668)
192.168.100.102 -> e1000g1      ICMP Echo reply (ID: 35075 Sequence number: 1668)
     e1000g1 -> 192.168.100.102 ICMP Echo request (ID: 35075 Sequence number: 1669)
192.168.100.102 -> e1000g1      ICMP Echo reply (ID: 35075 Sequence number: 1669)
     e1000g1 -> 192.168.100.102 ICMP Echo request (ID: 35075 Sequence number: 1670)
192.168.100.102 -> e1000g1      ICMP Echo reply (ID: 35075 Sequence number: 1670)
     e1000g1 -> 192.168.100.102 ICMP Echo request (ID: 35075 Sequence number: 1671)
192.168.100.102 -> e1000g1      ICMP Echo reply (ID: 35075 Sequence number: 1671)
     e1000g1 -> 192.168.100.102 ICMP Echo request (ID: 35075 Sequence number: 1672)
192.168.100.102 -> e1000g1      ICMP Echo reply (ID: 35075 Sequence number: 1672)
192.168.100.102 -> e1000g1      ICMP Echo reply (ID: 35075 Sequence number: 1673)
     e1000g1 -> 192.168.100.102 ICMP Echo request (ID: 35075 Sequence number: 1673)
     e1000g1 -> 192.168.100.102 ICMP Echo request (ID: 35075 Sequence number: 1674)
192.168.100.102 -> e1000g1      ICMP Echo reply (ID: 35075 Sequence number: 1674)
Now only e1000g1 is sending ICMP Echo requests to 192.168.100.102 (and getting replies); the probes from e1000g0 have stopped.
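Counting probes by eye works for a short capture, but for longer ones a small parser can pair each Echo request with its reply. This is a sketch that assumes exactly the one-line-per-packet text format snoop printed in this post; probe_summary is a hypothetical helper name. It keys on whatever source name snoop prints (e1000g0 and e1000g1 here). Pairing by (ID, sequence number) also copes with lines like sequence 1673 above, where the reply was printed before its request. A probe counted as unanswered is exactly the symptom I describe at the end of this post: requests going out, no replies coming back.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Matches snoop lines such as:
#      e1000g1 -> 192.168.100.102 ICMP Echo request (ID: 35075 Sequence number: 1381)
# 192.168.100.102 -> e1000g1      ICMP Echo reply (ID: 35075 Sequence number: 1381)
ECHO = re.compile(
    r"^\s*(?P<src>\S+)\s+->\s+(?P<dst>\S+)\s+ICMP Echo (?P<kind>request|reply)"
    r" \(ID: (?P<id>\d+) Sequence number: (?P<seq>\d+)\)"
)


def probe_summary(lines: Iterable[str], gateway: str) -> Dict[str, Tuple[int, int]]:
    """Return {prober: (requests_sent, requests_unanswered)} for probes to gateway."""
    sent = defaultdict(set)     # prober -> {(id, seq)} of requests to the gateway
    replied = defaultdict(set)  # prober -> {(id, seq)} of replies from the gateway
    for line in lines:
        m = ECHO.match(line)
        if not m:
            continue  # skip headers such as "Using device ..."
        key = (int(m["id"]), int(m["seq"]))
        if m["kind"] == "request" and m["dst"] == gateway:
            sent[m["src"]].add(key)
        elif m["kind"] == "reply" and m["src"] == gateway:
            replied[m["dst"]].add(key)
    # Set difference pairs requests with replies regardless of print order.
    return {p: (len(keys), len(keys - replied[p])) for p, keys in sent.items()}


if __name__ == "__main__":
    capture = """\
     e1000g1 -> 192.168.100.102 ICMP Echo request (ID: 35075 Sequence number: 1672)
192.168.100.102 -> e1000g1      ICMP Echo reply (ID: 35075 Sequence number: 1672)
192.168.100.102 -> e1000g1      ICMP Echo reply (ID: 35075 Sequence number: 1673)
     e1000g1 -> 192.168.100.102 ICMP Echo request (ID: 35075 Sequence number: 1673)
"""
    print(probe_summary(capture.splitlines(), "192.168.100.102"))  # {'e1000g1': (2, 0)}
```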

# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000 
e1000g0: flags=19000842<BROADCAST,RUNNING,MULTICAST,IPv4,NOFAILOVER,FAILED> mtu 0 index 2
        inet 0.0.0.0 netmask 0 
        groupname aa
        ether 0:c:29:8c:c1:2c 
e1000g0:1: flags=19040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,FAILED> mtu 1500 index 2
        inet 192.168.100.223 netmask ffffff00 broadcast 192.168.100.255
e1000g1: flags=29040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,STANDBY> mtu 1500 index 3
        inet 192.168.100.224 netmask ffffff00 broadcast 192.168.100.255
        groupname aa
        ether 0:c:29:8c:c1:36 
e1000g1:1: flags=21000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,STANDBY> mtu 1500 index 3
        inet 192.168.100.222 netmask ffffff00 broadcast 192.168.100.255
At this point 192.168.100.222 has moved to the logical interface e1000g1:1, and e1000g0 is flagged FAILED.

Now reconnect the first NIC and 192.168.100.222 moves back to e1000g0. In other words, once the failed interface recovers, IPMP fails the data IP back to it:
# snoop 192.168.100.102  
Using device /dev/e1000g (promiscuous mode)
     e1000g0 -> 192.168.100.102 ICMP Echo request (ID: 35074 Sequence number: 1743)
192.168.100.102 -> e1000g0      ICMP Echo reply (ID: 35074 Sequence number: 1743)
     e1000g1 -> 192.168.100.102 ICMP Echo request (ID: 35075 Sequence number: 1841)
192.168.100.102 -> e1000g1      ICMP Echo reply (ID: 35075 Sequence number: 1841)
     e1000g0 -> 192.168.100.102 ICMP Echo request (ID: 35074 Sequence number: 1744)
192.168.100.102 -> e1000g0      ICMP Echo reply (ID: 35074 Sequence number: 1744)
     e1000g1 -> 192.168.100.102 ICMP Echo request (ID: 35075 Sequence number: 1842)
192.168.100.102 -> e1000g1      ICMP Echo reply (ID: 35075 Sequence number: 1842)
^C

# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000 
e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 192.168.100.222 netmask ffffff00 broadcast 192.168.100.255
        groupname aa
        ether 0:c:29:8c:c1:2c 
e1000g0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
        inet 192.168.100.223 netmask ffffff00 broadcast 192.168.100.255
e1000g1: flags=69040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,STANDBY,INACTIVE> mtu 1500 index 3
        inet 192.168.100.224 netmask ffffff00 broadcast 192.168.100.255
        groupname aa
        ether 0:c:29:8c:c1:36 
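To sum up the cycle seen in this experiment (detect the failure, fail over to the standby, fail back after repair), here is a toy Python model of it. It is a simplified sketch, not how IPMP is actually implemented: the miss_limit threshold of 5 missed probes and the rule that a single answered probe repairs an interface are my own assumptions. On Solaris the real work is done by the in.mpathd daemon, whose detection time and failback behaviour are set in /etc/default/mpathd (FAILURE_DETECTION_TIME, FAILBACK).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interface:
    name: str
    standby: bool = False
    misses: int = 0      # consecutive unanswered probes
    failed: bool = False


def record_probe(iface: Interface, got_reply: bool, miss_limit: int = 5) -> None:
    """Record one probe result for an interface (toy model, not in.mpathd's logic)."""
    if got_reply:
        # Simplification: a single answered probe marks the interface repaired.
        iface.misses = 0
        iface.failed = False
    else:
        iface.misses += 1
        if iface.misses >= miss_limit:
            iface.failed = True


def data_address_owner(group: List[Interface]) -> Optional[str]:
    """Return the interface that should carry the data address.

    Healthy non-standby interfaces are preferred, so the address fails
    back as soon as the original interface is repaired.
    """
    for iface in group:
        if not iface.failed and not iface.standby:
            return iface.name
    for iface in group:
        if not iface.failed:
            return iface.name
    return None  # every interface in the group has failed


if __name__ == "__main__":
    e1000g0 = Interface("e1000g0")
    e1000g1 = Interface("e1000g1", standby=True)
    group = [e1000g0, e1000g1]
    print(data_address_owner(group))  # e1000g0
    for _ in range(5):                # the gateway stops answering e1000g0's probes
        record_probe(e1000g0, got_reply=False)
    print(data_address_owner(group))  # e1000g1 (failover)
    record_probe(e1000g0, got_reply=True)
    print(data_address_owner(group))  # e1000g0 (failback)
```

The preference for a healthy non-standby interface is what produces the failback seen in the last snoop and ifconfig output above.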

If anything is still unclear, you can download the documentation from Sun's website: http://docs.sun.com

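Of course I ran into some problems during the experiment, and one of them is still unsolved. At first I pointed the VM's default gateway at vmnet1, whose NIC IP is 192.168.100.1, but the experiment just would not work. A packet capture finally showed that there were only ping requests and no replies. Capturing on vmnet1 on my own PC, I could see the ping requests arriving too; my PC simply never answered them. Ugh... my PC was blocking ping. I checked that the firewall was off and shut down 360; only NOD32 was left, which cannot be shut down, so all I could do was disable its real-time protection, and it still did not work... In the end I had to start another VM at 192.168.100.102 to finish the experiment. If any experts know what is going on here, please let me know.
I am not very familiar with Solaris myself and am still learning, so I may have misunderstood some things here. Corrections from experts are very welcome. Thanks in advance!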