OpenStack Neutron: troubleshooting and solving common problems

From: http://abregman.com/2016/01/06/openstack-neutron-troubleshooting-and-solving-common-problems/


Important note: this post is based on the great sessions “I Can’t Ping My VM! Learn How to Debug Neutron and Solve Common Problems” by Rossella Sblendido and “OpenStack Neutron Troubleshooting” by Assaf Muller, so the credit goes to them. I simply gathered the material here in written form and added a little description and some examples. Enjoy =)

Common problems classification

The problems you may experience can be divided into several categories:

  • Misconfiguration – you may experience issues due to incorrect settings in the config files used by neutron. Wrong usage of the configuration tools can also cause issues. In addition, a misconfigured underlying network will affect neutron functionality, as every packet eventually goes through the physical network. For example, an external network may be unreachable, or a firewall rule may be blocking traffic to or from your VMs. So if the underlying network isn’t working, neutron will also fail to work properly.
  • Bug in the code – you may have found a bug in the code. Chances are you are not the first to bump into it, so it’s worth checking the neutron bug tracker to see if someone has already reported it. If you can’t find the bug there, then you are probably the first one to catch it and you should report it so that the developers can start fixing it.

Issue #1: I can’t ping/ssh my VM using private IP

This is one of the most common issues out there, especially for anyone who is starting to explore the OpenStack world. To debug it, it helps to understand how a VM gets an IP in the first place.

How does a VM get an IP?

In order to answer that we need to introduce the DHCP agent. If you are familiar with networking, you know DHCP is a protocol for distributing different network parameters (including IP addresses).

The DHCP agent communicates with neutron-server over RPC. It ensures network isolation using namespaces, so every network has its own DHCP namespace. Inside this namespace there is a process called ‘dnsmasq’, and it’s the one that actually serves the DHCP parameters, including the IP address. The DHCP agent configures this dnsmasq using a lease file.
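
To make the namespace idea concrete, here is a minimal sketch that builds the command line needed to run something inside a network’s DHCP namespace. It assumes the qdhcp-<network-id> naming that neutron uses for DHCP namespaces; the helper names are made up for illustration.

```python
def dhcp_namespace(network_id):
    """Return the DHCP namespace name for a network (assumes the qdhcp- prefix)."""
    return "qdhcp-" + network_id


def in_dhcp_namespace(network_id, command):
    """Build an argv list that runs `command` inside the network's DHCP namespace."""
    return ["ip", "netns", "exec", dhcp_namespace(network_id)] + list(command)
```

The resulting list can be passed to subprocess.run on the node that hosts the DHCP agent (root privileges are needed), for example to list the interfaces inside the namespace with ["ip", "a"].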

Let’s look at the IP allocation process in more detail:

[Figure: vm_get_ip]

At the end of the process, the new IP is served and the VM gets its address.

Let’s follow the traffic in more detail. It’s important to understand how our packets flow, because it tells us where to look and hopefully lets us find the issue more quickly.

We have two default implementations in neutron: Open vSwitch and Linux bridge. Let’s start with Open vSwitch:

[Figure: openvswitch_flow]

A little explanation of what we can see in the drawing:

The firewall bridge is a Linux bridge. It’s there so that security groups, which are firewall rules, can be applied. They are implemented using iptables. You cannot apply iptables rules to an interface that’s connected to an Open vSwitch port, which is why we need the firewall bridge in the middle.

The integration bridge (br-int) is in charge of tagging and untagging the traffic that is coming from the VM and going to the VM, using the VLAN id associated with the network. Every network has a VLAN id, and this VLAN id is used internally in the compute host to isolate the traffic (that’s why it’s called the local VLAN id).

The tunnel bridge (br-tun) is the bridge in charge of the tunneling. It has the flows that translate the VLAN id assigned to the network into the segmentation id. If, for example, you are using a GRE tunnel, the GRE tunnel id would be the segmentation id assigned to the network.
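
The two-step mapping described above (network to local VLAN id on br-int, then local VLAN id to segmentation id on br-tun) can be sketched as a toy model. This is only an illustration of the idea, not how the OVS agent is implemented; the class name, the allocation order of local VLAN ids and the example values are assumptions.

```python
class ComputeHostBridges:
    """Toy model of one compute host's VLAN and segmentation id mappings."""

    def __init__(self):
        self._local_vlan_by_network = {}  # network name -> local VLAN id (br-int)
        self._segmentation_by_vlan = {}   # local VLAN id -> segmentation id (br-tun)
        self._next_vlan = 1               # assumption: ids handed out sequentially

    def add_network(self, network, segmentation_id):
        """Assign the next free local VLAN id to a network on this host."""
        vlan = self._next_vlan
        self._next_vlan += 1
        self._local_vlan_by_network[network] = vlan
        self._segmentation_by_vlan[vlan] = segmentation_id
        return vlan

    def tag_on_br_int(self, network):
        """br-int: traffic from a VM on `network` gets the local VLAN id."""
        return self._local_vlan_by_network[network]

    def translate_on_br_tun(self, local_vlan):
        """br-tun: the local VLAN id is translated into the segmentation id."""
        return self._segmentation_by_vlan[local_vlan]
```

Because the VLAN id is local, two compute hosts may use different local VLAN ids for the same network while sharing the same segmentation id on the wire.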

Now let’s see the flow when using Linux bridge:

[Figure: linux_bridge_flow]

In the Linux bridge implementation we have one Linux bridge for every network. You can see that we have net1 and the VM is connected to this network. We can also see that the interface plugged into the net1 bridge is eth0.100, meaning VLAN 100 is assigned to the net1 network.
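
The eth0.100 naming encodes both the physical interface and the VLAN id. A small sketch of how such a name can be split, assuming the usual <parent>.<vlan-id> convention for VLAN sub-interfaces (the function name is made up for illustration):

```python
def parse_vlan_interface(name):
    """Split a VLAN sub-interface name such as 'eth0.100' into (parent, vlan_id)."""
    parent, sep, vlan = name.rpartition(".")
    if not sep or not parent or not vlan.isdigit():
        raise ValueError("not a VLAN sub-interface name: %r" % name)
    return parent, int(vlan)
```

For the interface in the drawing, parse_vlan_interface("eth0.100") gives the parent interface eth0 and VLAN id 100.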

Debugging Steps

First of all, check if the instance is up, for example with nova list. It may sound trivial, but let’s not skip anything.

The output should be similar to this:

[Figure: nova_list]

In the above output we can see the instance is running. If it wasn’t running, we would want to peek in the logs to get a clue on what went wrong. Looking in the logs is always a wise step, as many issues should be reflected there.

Remember, at this point the issue can be caused by anything. It may not even be directly related to your OpenStack deployment, but rather to your hardware: for example, the host may not have enough disk space or memory for VMs to boot and run. You can verify this by checking free disk space and memory on the host.
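
A minimal Python sketch of such a resource check follows. It assumes a Linux host where /proc/meminfo is available; the function names are illustrative, and what counts as “enough” depends on your flavors.

```python
import shutil


def free_disk_gib(path="/"):
    """Return free space on the filesystem holding `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number} (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_memory_mib(meminfo_path="/proc/meminfo"):
    """Return MemAvailable in MiB (Linux only; absent on very old kernels)."""
    with open(meminfo_path) as f:
        return parse_meminfo(f.read())["MemAvailable"] / 1024
```

Run it on the compute host itself, and point free_disk_gib at the directory where instance disks are stored.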

Another common cause of this issue is the default security group rules. By default they do not allow incoming ICMP traffic (the protocol used by the ping command), so you may need to add a rule that allows ICMP before you are able to ping the machine.
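
As a rough illustration, the check “does this security group let ping in?” can be expressed over a list of rules. The sketch below assumes the rules are plain dicts using the field names of the Neutron security group rule API (direction, ethertype, protocol, remote_ip_prefix, remote_group_id); it is deliberately simplified and ignores ICMP type/code and IPv6.

```python
def allows_icmp_from_anywhere(rules):
    """Return True if some IPv4 ingress rule admits ICMP from any source."""
    for rule in rules:
        if rule.get("direction") != "ingress":
            continue
        if rule.get("ethertype", "IPv4") != "IPv4":
            continue
        # protocol None means "any protocol", which includes ICMP
        if rule.get("protocol") not in (None, "icmp"):
            continue
        # rules limited to members of another group do not cover arbitrary hosts
        if rule.get("remote_group_id") is not None:
            continue
        if rule.get("remote_ip_prefix") in (None, "0.0.0.0/0"):
            return True
    return False
```

A group that only has egress rules plus an ingress rule restricted to members of the same group returns False here, which matches the “can’t ping” symptom.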

As mentioned earlier, the physical underlying network may also cause the issues. Make sure you are able to ping between nodes in your environment.

Port binding

If the VM didn’t boot, check if you ran into a port binding failure on the VM port, or on the router or DHCP ports.

For a VM port, it will be logged as a port binding failure, so it will be easy for you to spot. For DHCP or router ports it’s not so easy, since the ports are created asynchronously, meaning you will not see the failure right away. Let’s take routers for example. When you create a router and add a new interface, the operation will succeed even if the ports created behind the scenes entered the binding failure state, because the binding happens asynchronously.

[Figure: binding_failure2]
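
Since the failure is not reported at creation time, one way to spot it is to look at the ports themselves. The sketch below assumes port data in the shape returned by the Neutron API, where a failed binding shows up as binding:vif_type set to binding_failed; the function name and sample values are hypothetical.

```python
def ports_with_binding_failure(ports):
    """Group the ids of ports whose binding failed by their device_owner."""
    failed = {}
    for port in ports:
        if port.get("binding:vif_type") == "binding_failed":
            owner = port.get("device_owner", "")
            failed.setdefault(owner, []).append(port["id"])
    return failed
```

Grouping by device_owner makes it easy to see whether the affected ports belong to a VM, a router interface or a DHCP server.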

There are two reasons this usually happens:

1. The OVS agent was dead when you added a new subnet or a new interface port to your router. This can be easily verified by listing the agents, for example with neutron agent-list.

In the ‘Open vSwitch agent’ line, under the ‘alive’ column, you would see ‘xxx’.

Another symptom of a dead OVS agent is a missing VLAN tag under the tap device. You can verify it by inspecting the port in the Open vSwitch configuration, for example with ovs-vsctl show.

At the moment, the only solution for this issue is to recreate the resource.

2. Misconfiguration in your agent or server config files. This usually happens when you are using non-default values in the configuration files.
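
The missing VLAN tag mentioned in the first reason can also be checked programmatically. The following sketch scans text in the layout printed by ovs-vsctl show and reports tap ports that carry no tag line. The sample text is made up for illustration, and the parser only relies on the “Port” and “tag:” lines.

```python
# Hypothetical excerpt in the layout of `ovs-vsctl show`; names are made up.
SAMPLE = '''
    Bridge br-int
        Port "tapaaaa"
            tag: 1
            Interface "tapaaaa"
        Port "tapbbbb"
            Interface "tapbbbb"
        Port br-int
            Interface br-int
                type: internal
'''


def port_tags(ovs_show_output):
    """Map each port name to its VLAN tag, or None when no tag line follows it."""
    tags = {}
    current = None
    for line in ovs_show_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Port "):
            current = stripped[len("Port "):].strip('"')
            tags[current] = None
        elif stripped.startswith("tag:") and current is not None:
            tags[current] = int(stripped.split(":", 1)[1])
    return tags


def untagged_ports(ovs_show_output, prefix="tap"):
    """Return the sorted names of ports starting with `prefix` that have no tag."""
    tags = port_tags(ovs_show_output)
    return sorted(n for n, t in tags.items() if t is None and n.startswith(prefix))
```

With the sample above, untagged_ports(SAMPLE) reports tapbbbb only; br-int itself is skipped because of the prefix filter.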

Did the VM receive an IP?

So now that you know how a VM gets an IP, check if it happened. To check whether your VM has an IP, you can simply list the network interfaces and their addresses from the VM console.

No IP? Check if the DHCP agent is up and running by listing the agents, for example with neutron agent-list.

The above command will list all the agents with their status. If the DHCP agent is up and running, you should see under the ‘alive’ column a smiley like this: :-)
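
If you want to automate this check, the table printed by the CLI can be parsed. The sketch below assumes the usual ASCII table layout with an agent_type, host and alive column, where :-) marks a live agent and xxx a dead one; the sample table and host names are made up.

```python
# Hypothetical agent list in the CLI's ASCII table layout; values are made up.
SAMPLE = """
+----+--------------------+-------+-------+
| id | agent_type         | host  | alive |
+----+--------------------+-------+-------+
| a1 | DHCP agent         | node1 | :-)   |
| a2 | Open vSwitch agent | node2 | xxx   |
+----+--------------------+-------+-------+
"""


def parse_cli_table(text):
    """Turn an ASCII table into a list of {column: value} dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def dead_agents(text):
    """Return (agent_type, host) for every agent not marked alive with ':-)'."""
    return [(row["agent_type"], row["host"])
            for row in parse_cli_table(text) if row.get("alive") != ":-)"]
```

With the sample above, dead_agents(SAMPLE) reports the Open vSwitch agent on node2, while the DHCP agent is considered alive.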
