OpenStack VM DHCP Problems with Quantum? A Debugging Guideline and a Real Case

OpenStack is drawing a lot of attention as a platform for deploying cloud computing environments.
When using OpenStack in practical scenarios, the devil is in the details. One notorious problem is that a booted VM sometimes fails to get an IP address automatically via DHCP. Many people have run into similar problems and proposed several solutions, including restarting the Quantum-related services. However, such fixes may work in some special cases and fail in others.
So how do you find the culprit behind your specific problem? In this article, we present a guideline for locating the cause of a DHCP failure and demonstrate it with a real case.
Debug Guideline:
0) Start a DHCP request inside the VM using
sudo udhcpc
or another DHCP client.
1) Does the DHCP request reach the network node?
To find out, use tcpdump to capture packets on the data-network interface of both the compute node and the network node (eth1 here), using the following command:
tcpdump -ni eth1 port 67 and port 68
A DHCP request usually looks like
IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:82:ee:fe, length 286
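When a capture runs for a while, it helps to separate requests from replies mechanically. The Python sketch below is a hypothetical helper of our own, not part of Quantum or tcpdump; it assumes tcpdump's one-line summaries in the format of the sample request line, and collects the MACs that sent requests plus the (server, client) addresses of any replies.

```python
import re

# Assumed tcpdump summary for a client request, e.g.
# "IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:82:ee:fe, length 286"
REQUEST_RE = re.compile(r"BOOTP/DHCP,\s*Request from ([0-9a-f:]{17})", re.IGNORECASE)
# Assumed tcpdump summary for a server reply, e.g.
# "IP 50.50.1.3.67 > 50.50.1.7.68: BOOTP/DHCP, Reply, length 308"
REPLY_RE = re.compile(r"IP (\S+)\.67 > (\S+)\.68: BOOTP/DHCP,\s*Reply")


def summarize_capture(lines):
    """Return (set of requesting MACs, list of (server_ip, client_ip) replies)."""
    requesters = set()
    replies = []
    for line in lines:
        m = REQUEST_RE.search(line)
        if m:
            requesters.add(m.group(1).lower())
            continue
        m = REPLY_RE.search(line)
        if m:
            replies.append((m.group(1), m.group(2)))
    return requesters, replies
```

For example, save the output of tcpdump -l -ni eth1 port 67 and port 68 to a file on each node and pass its lines to summarize_capture(). If the VM's MAC shows up among the requests on the compute node but not on the network node, the request is lost on the way.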
2) If the DHCP request successfully reaches the network node, make sure the quantum-dhcp-agent sends a reply. You can verify this in the log file (/var/log/syslog) or, again, with tcpdump.
The log entries may look like
Jun 21 10:42:31 localhost dnsmasq-dhcp[541]: DHCPREQUEST(tap9c753e61-fc) 50.50.1.6 fa:16:3e:82:ee:fe
Jun 21 10:42:31 localhost dnsmasq-dhcp[541]: DHCPACK(tap9c753e61-fc) 50.50.1.6 fa:16:3e:82:ee:fe 50-50-1-6
and a DHCP reply usually looks like
IP 50.50.1.3.67 > 50.50.1.7.68: BOOTP/DHCP, Reply, length 308
If not, make sure the quantum-* services have started successfully on the network node, e.g.:
service quantum-dhcp-agent status
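Instead of scrolling through syslog by hand, you can filter the dnsmasq entries for one VM. This Python sketch is a hypothetical helper of our own; it assumes the dnsmasq-dhcp log format of the sample lines in step 2 (message type, interface in parentheses, optional IP, then MAC) and returns the DHCP message types logged for a given MAC, in order.

```python
import re

# Assumed dnsmasq syslog format, as in the sample lines of step 2:
#   "... dnsmasq-dhcp[541]: DHCPACK(tap9c753e61-fc) 50.50.1.6 fa:16:3e:82:ee:fe 50-50-1-6"
# The IP is optional because some messages (e.g. DHCPDISCOVER) may not carry one.
LOG_RE = re.compile(
    r"dnsmasq-dhcp\[\d+\]:\s*(DHCP[A-Z]+)\((\S+)\)\s+"
    r"(?:\d+\.\d+\.\d+\.\d+\s+)?([0-9a-f]{2}(?::[0-9a-f]{2}){5})",
    re.IGNORECASE,
)


def dhcp_events(lines, mac):
    """Return the DHCP message types dnsmasq logged for one MAC, in log order."""
    mac = mac.lower()
    events = []
    for line in lines:
        m = LOG_RE.search(line)
        if m and m.group(3).lower() == mac:
            events.append(m.group(1).upper())
    return events
```

For example, with open("/var/log/syslog") as log: print(dhcp_events(log, "fa:16:3e:82:ee:fe")). A list ending in DHCPACK means dnsmasq did answer; an empty list means the request never reached it.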
3) Use tcpdump again to make sure the DHCP reply gets back to the compute node.
4) If the DHCP reply reaches the compute node, capture on the VM's corresponding tap-* network interface to make sure the reply reaches the VM.
If it does not, check that the quantum-plugin-openvswitch-agent service is working properly on the compute node.
service quantum-plugin-openvswitch-agent status
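To pick the right tap-* interface for a VM, note that the tap name in the dnsmasq log (tap9c753e61-fc) is "tap" plus the start of a UUID, 14 characters in total. The Python sketch below assumes, based only on that observation, that a port's tap device is named "tap" + port UUID truncated to 14 characters; the helper names and the port UUID are hypothetical.

```python
# Assumption (inferred from names like "tap9c753e61-fc", not stated in the
# article): a port's tap device is "tap" + port UUID, cut to 14 characters.
DEV_NAME_LEN = 14


def tap_name(port_id: str) -> str:
    """Guess the tap interface name for a Quantum port UUID."""
    return ("tap" + port_id)[:DEV_NAME_LEN]


def tap_capture_command(port_id: str) -> str:
    """Build a tcpdump command that watches DHCP traffic on that tap device."""
    return f"tcpdump -ni {tap_name(port_id)} port 67 or port 68"


if __name__ == "__main__":
    # Hypothetical port UUID; use the real ID of the VM's port instead.
    print(tap_capture_command("9c753e61-fc1a-4b2e-9d3f-0123456789ab"))
```

If the guessed name does not exist on the compute node, fall back to listing the interfaces and matching on the port UUID prefix.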
5) If the problem keeps appearing on a particular machine, you may sometimes need to restart the whole node.
A real case
I once ran into a weird case.
In this case, everything seemed OK. The network node received the DHCP request and sent back an offer, and the compute node successfully received the DHCP offer. However, the VM usually still could not get an IP, and only occasionally got one!
I looked very carefully at the entire process and made sure all the services were running.
The only remaining suspect was Open vSwitch.
I checked the OpenFlow rules on br-int (the bridge the VM is attached to) using
ovs-ofctl dump-flows br-int
and they looked like this:
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=2219.925s, table=0, n_packets=0, n_bytes=85038, idle_age=3, priority=3,in_port=1,dl_vlan=2 actions=mod_vlan_vid:1,NORMAL
 cookie=0x0, duration=2231.487s, table=0, n_packets=0, n_bytes=120021, idle_age=3, priority=1 actions=NORMAL
 cookie=0x0, duration=2227.341s, table=0, n_packets=0, n_bytes=16868, idle_age=5, priority=2,in_port=1 actions=drop
They look quite normal, since all of these rules are generated by the quantum-plugin-openvswitch-agent service.
I also made sure the DHCP offer reached br-int by capturing packets on its data-network interface:
tcpdump -ni int-br-eth1 port 67 or port 68
By my reasoning, the DHCP offer should match rule #1 (the VLAN rule priority=3,in_port=1,dl_vlan=2, which rewrites the VLAN ID from 2 to 1) and be forwarded. However, after watching for a while, its n_packets counter did not increase, which means the DHCP offer did not match this rule.
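Watching n_packets by eye is error-prone. The Python sketch below is a hypothetical helper of our own: it compares two saved snapshots of ovs-ofctl dump-flows br-int output, taken some time apart, and reports how many packets each rule matched in between. Each rule is keyed by its text with the changing fields (duration, n_packets, n_bytes, idle_age, hard_age) stripped out.

```python
import re

# Per-flow fields that change between snapshots and must not be part of the key.
VOLATILE_RE = re.compile(r"\b(?:duration|n_packets|n_bytes|idle_age|hard_age)=[^,\s]+,?\s*")


def parse_flows(dump: str) -> dict:
    """Map each flow (with volatile fields removed) to its n_packets counter."""
    flows = {}
    for line in dump.splitlines():
        pkts = re.search(r"\bn_packets=(\d+)", line)
        if pkts is None:
            continue  # e.g. the "NXST_FLOW reply (xid=0x4):" header line
        key = VOLATILE_RE.sub("", line).strip()
        flows[key] = int(pkts.group(1))
    return flows


def packet_deltas(before: str, after: str) -> dict:
    """Packets matched by each flow between two dump-flows snapshots."""
    old = parse_flows(before)
    return {flow: count - old.get(flow, 0) for flow, count in parse_flows(after).items()}
```

A delta of 0 for the dl_vlan=2 rule while DHCP offers keep arriving on int-br-eth1 is exactly the symptom described here.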
Strange, right? Why does OVS not work as expected?
Based on my years of experience with OVS, I suspected that some HIDDEN rule was interfering with the processing. So I checked the rules in more detail, including the hidden ones:
ovs-appctl bridge/dump-flows br-int
HAHA, now something floats to the surface:
duration=151s, priority=180001, n_packets=0, n_bytes=0, priority=180001,arp,dl_dst=fe:86:a7:fd:c0:4f,arp_op=2,actions=NORMAL
duration=151s, priority=180003, n_packets=0, n_bytes=0, priority=180003,arp,dl_dst=00:1a:64:99:f2:72,arp_op=2,actions=NORMAL
duration=148s, priority=3, n_packets=0, n_bytes=0, priority=3,in_port=1,dl_vlan=2,actions=mod_vlan_vid:1,NORMAL
duration=151s, priority=180006, n_packets=0, n_bytes=0, priority=180006,arp,nw_src=10.0.1.197,arp_op=1,actions=NORMAL
duration=151s, priority=180004, n_packets=0, n_bytes=0, priority=180004,arp,dl_src=00:1a:64:99:f2:72,arp_op=1,actions=NORMAL
duration=151s, priority=180002, n_packets=0, n_bytes=0, priority=180002,arp,dl_src=fe:86:a7:fd:c0:4f,arp_op=1,actions=NORMAL
duration=151s, priority=15790320, n_packets=174, n_bytes=36869, priority=15790320,actions=NORMAL
duration=151s, priority=180005, n_packets=0, n_bytes=0, priority=180005,arp,nw_dst=10.0.1.197,arp_op=2,actions=NORMAL
duration=151s, priority=180008, n_packets=0, n_bytes=0, priority=180008,tcp,nw_src=10.0.1.197,tp_src=6633,actions=NORMAL
duration=151s, priority=180007, n_packets=0, n_bytes=0, priority=180007,tcp,nw_dst=10.0.1.197,tp_dst=6633,actions=NORMAL
duration=151s, priority=180000, n_packets=0, n_bytes=0, priority=180000,udp,in_port=65534,dl_src=fe:86:a7:fd:c0:4f,tp_src=68,tp_dst=67,actions=NORMAL
table_id=254, duration=165s, priority=0, n_packets=13, n_bytes=2146, priority=0,reg0=0x1,actions=controller(reason=no_match)
table_id=254, duration=165s, priority=0, n_packets=0, n_bytes=0, priority=0,reg0=0x2,drop
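To spot such rules mechanically, the Python sketch below (again a hypothetical helper of our own) lists every flow in the bridge/dump-flows output whose priority exceeds the highest priority the agent installed (3 in the ovs-ofctl dump above), sorted from the highest priority down, together with its n_packets count. Note that 15790320 is 0xF0F0F0 in hex.

```python
import re


def hidden_rules(dump: str, max_agent_priority: int):
    """Return (priority, n_packets, flow) for flows above max_agent_priority.

    Results are sorted from the highest priority down. Lines that lack a
    priority or an n_packets field are ignored.
    """
    found = []
    for line in dump.splitlines():
        prio = re.search(r"priority=(\d+)", line)
        pkts = re.search(r"n_packets=(\d+)", line)
        if prio and pkts and int(prio.group(1)) > max_agent_priority:
            found.append((int(prio.group(1)), int(pkts.group(1)), line.strip()))
    return sorted(found, reverse=True)
```

Feed it the text of ovs-appctl bridge/dump-flows br-int; any entry with a nonzero n_packets count is a hidden rule that is actually stealing traffic from the agent's rules.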
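See that? Packets are matching the rule with priority 15790320 (n_packets=174). It has a far higher priority than the agent's rules and simply forwards the VLAN-tagged packet with the NORMAL action, so the packet bypasses the mod_vlan_vid rule!
So where does this rule come from?
In some versions of OVS, when a bridge runs without a working controller, OVS "smartly" behaves like an ordinary L2 learning switch and adds some rules automatically. These rules are hidden, which is why ovs-ofctl dump-flows did not show them. Judging from the dump, the priority 15790320 (0xF0F0F0) rule is OVS's fail-open flow, and the 180000-range rules (DHCP from the local port, ARP, and TCP port 6633 to/from 10.0.1.197) look like OVS in-band control rules.
Now, how do we solve the problem?
We need to tell OVS not to be that "smart", using the command: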
ovs-vsctl set bridge br-int fail-mode=secure

All in all, this problem puzzled our team for several weeks. While solving it, I put together the guideline above, and I hope it is a little bit helpful.