InfiniBand IPoIB Debug FAQ

Here's an update to my initial attempt at an IPoIB FAQ:

Ping doesn't work between IPoIB nodes. What should I do?

First, verify that the ports are active.

This can be done via:

cat /sys/class/infiniband/mthca0/ports/1/state

This should indicate 4: ACTIVE

assuming the HCA is mthca0 and port 1 is the one plugged into the subnet
(switch, etc.).
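
When a node has more than one HCA or port, it can be easier to list every port state at once. The following is a minimal sketch, not part of the original FAQ: the function name ib_port_states and its optional root-directory argument are assumptions made for illustration, and it relies only on the sysfs layout shown above.

```shell
# Print the state of every port of every HCA found under sysfs.
# The optional first argument (an assumption, handy for testing)
# overrides the default root /sys/class/infiniband.
ib_port_states() {
    root="${1:-/sys/class/infiniband}"
    for state_file in "$root"/*/ports/*/state; do
        # Skip the unexpanded glob when no HCA is present.
        [ -r "$state_file" ] || continue
        port_dir="${state_file%/state}"      # .../mthca0/ports/1
        port="${port_dir##*/}"               # 1
        hca_dir="${port_dir%/ports/*}"       # .../mthca0
        hca="${hca_dir##*/}"                 # mthca0
        printf '%s port %s: %s\n' "$hca" "$port" "$(cat "$state_file")"
    done
}
```

Calling ib_port_states with no arguments prints one line per port; a port that is ready for traffic should report 4: ACTIVE.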

If the port is not active, there could be several reasons:

1. You need an SM (subnet manager) in your subnet to bring the ports to
active. Do you have an SM? It can be embedded in a switch or some other
IB hardware, or run on an end node (HCA), although OpenIB (gen2) does
not currently support this.

2. If you have an SM in your subnet, there might be a cabling problem
where the SM cannot "reach" your end node.

If the port is active, report the subnet configuration and which SM is
being used.

Do /sys/class/net/ib0/statistics/rx_packets and/or "tcpdump -i ib0"
show anything on the other nodes when you try to ping them?
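
One way to read the rx_packets counter is to sample it before and after a test window. This is only a sketch under stated assumptions: the helper name rx_delta and the NET_SYSFS override variable are hypothetical and exist purely for illustration; the counter path is the one given above.

```shell
# Report how many packets an interface received while a command ran.
# Usage: rx_delta <interface> <command> [args...]
# NET_SYSFS is a hypothetical override of /sys/class/net (for testing).
rx_delta() {
    iface="$1"; shift
    counter="${NET_SYSFS:-/sys/class/net}/$iface/statistics/rx_packets"
    before=$(cat "$counter") || return 1
    "$@" >/dev/null 2>&1                 # the test window
    after=$(cat "$counter") || return 1
    echo $((after - before))
}
```

For example, running "rx_delta ib0 sleep 10" on the node being pinged while another node pings it should print a non-zero number if packets are arriving at all.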

There are 2 levels of IPoIB debug which can be enabled when building:
IP-over-InfiniBand debugging and IP-over-InfiniBand data path debugging.
The latter has performance implications and should only be enabled when
all else fails. Enable the first level of IPoIB debug and then:

mount -t ipoib_debugfs none /ipoib_debugfs/
cat /ipoib_debugfs/ib0_mcg

Other things to verify and supply to help isolate the problem:

1. Verify the firmware version via

cat /sys/class/infiniband/mthca0/fw_ver

For PCI-X HCAs, version 3.2.0 is recommended. For PCIe HCAs, version
4.5.3 is recommended.

2. Make sure the IB modules are loaded:
/sbin/lsmod | grep ib_
should show ib_mthca (the HCA driver) as well as ib_ipoib. Other modules
will be listed too, but these are the two that need to be loaded; the
rest follow as dependencies.

3. Make sure there are no errors in /var/log/messages pertaining to ib_.

4. Indicate the IP configuration via
/sbin/ifconfig -a
and
ip addr show dev ib0
(assuming ib0 is the network interface being configured)

The second command is needed because ifconfig can only show the first
16 octets of the HW address (and the last two of those are actually
wrong, because the SIOCGIFHWADDR ioctl that it uses can only return 14
bytes). IPoIB has a 20-byte HW address; the four (or six?) bytes that
get cut off are the low-order bytes of the port GID, which is probably
where the port GIDs differ.

To see the real IB hardware address, you need to do something like "ip addr show dev ib0". For example,
    # ifconfig ib0
    ib0       Link encap:UNSPEC  HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
              BROADCAST MULTICAST  MTU:2044  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:128
              RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

    # ip addr show dev ib0
    5: ib0: <BROADCAST,MULTICAST> mtu 2044 qdisc noop qlen 128
        link/[32] 00:00:04:04:fe:80:00:00:00:00:00:00:00:02:c9:01:07:8c:e4:61 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff


5. Use
ip neigh show dev ib0
to display the ARP table for the IB interface ib0.
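
The 20-byte hardware addresses printed by "ip addr show" (and in neighbour entries) can be pulled apart to compare port GIDs between nodes. The sketch below assumes the standard IPoIB link-layer layout of one flags byte, a 3-byte QPN and the 16-byte port GID; the function name ipoib_hwaddr_split is hypothetical and used only for illustration.

```shell
# Split a 20-byte IPoIB link-layer address (colon-separated hex, as
# printed by "ip addr show") into its QPN and port GID.
# Assumed layout: 1 flags byte, 3 QPN bytes, 16 GID bytes.
ipoib_hwaddr_split() {
    echo "$1" | awk -F: '
        NF != 20 {
            print "expected 20 bytes, got " NF > "/dev/stderr"
            exit 1
        }
        {
            printf "qpn 0x%s%s%s\n", $2, $3, $4
            printf "gid "
            # Print the GID as eight 16-bit groups.
            for (i = 5; i <= 20; i += 2)
                printf "%s%s%s", $i, $(i + 1), (i < 19 ? ":" : "\n")
        }'
}
```

Applied to the address in the example output above, this prints the QPN as 0x000404 and the GID as fe80:0000:0000:0000:0002:c901:078c:e461, which shows the low-order GID bytes that ifconfig cuts off.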

Reposted from: https://www.cnblogs.com/super119/archive/2011/04/16/2017826.html
