Introduction to the Linux Neighbour Subsystem

1 Overview of the Linux network neighbour subsystem

 

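1.1 The neighbour subsystem

  • A MAC address identifies a network interface on a link. Before a host can send data to another host, it must know the MAC address to put in the frame. Upper-layer protocols do not care about MAC addresses, but the data link layer needs both the sender's and the receiver's MAC address for the frame to reach the right receiver. The neighbour subsystem translates an IP address (L3) into the corresponding MAC address (L2). If the destination host is not on the same LAN as the sender, the address that gets resolved is the MAC address of the next-hop gateway.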
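The on-link versus off-link decision described above can be sketched in a few lines of user-space C. This is only an illustration: `neigh_resolve_target` and its parameters are hypothetical names, not kernel functions, and real kernels make this decision through the routing table rather than a single netmask comparison.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): returns the IPv4 address, in host
 * byte order, whose MAC address has to be resolved before a packet can be
 * sent to dst.
 */
uint32_t neigh_resolve_target(uint32_t dst, uint32_t local_addr,
                              uint32_t netmask, uint32_t gateway)
{
    /* Same subnet: the destination is a direct neighbour. */
    if ((dst & netmask) == (local_addr & netmask))
        return dst;

    /* Different subnet: the neighbour to resolve is the next-hop gateway. */
    return gateway;
}
```

For a host at 192.168.1.2/24 with gateway 192.168.1.1, a packet to 192.168.1.10 resolves 192.168.1.10 itself, while a packet to 8.8.8.8 resolves the gateway.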

 

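2 Neighbour protocols and operations

Through a layer of encapsulation and abstraction, Linux implements a generic neighbour infrastructure. Each concrete neighbour protocol defines its own parameters, registers itself with the infrastructure, and uses the API the infrastructure provides. The neighbour protocols supported are ARP (IPv4) and NDISC (IPv6); older kernels also included DECnet. ARP itself has only two basic operations, ARPOP_REQUEST and ARPOP_REPLY (NDISC uses Neighbor Solicitation and Neighbor Advertisement messages for the same purpose).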

 

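2.1 Functions of the generic neighbour infrastructure

  Caches L3-to-L2 address mappings for each protocol

  Provides add, delete, update, and lookup operations on the cache

  Provides an ageing mechanism for the cache entries of each protocol

  Maintains, for each neighbour, a queue of packets waiting for address resolution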
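To make the cache operations above concrete, here is a minimal user-space sketch of an L3-to-L2 cache with lookup, add/update, and delete. It assumes a tiny fixed-size array and IPv4/Ethernet addresses; all `toy_*` names are hypothetical. The kernel's real implementation uses a hash table with reference counting, ageing, and per-entry packet queues, none of which is modelled here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_CACHE_SIZE 8

struct toy_entry {
    bool     in_use;
    uint32_t ip;        /* L3 address (IPv4, host byte order) */
    uint8_t  mac[6];    /* L2 address */
};

struct toy_cache {
    struct toy_entry slots[TOY_CACHE_SIZE];
};

/* Lookup: returns the entry for ip, or NULL if it is not cached. */
struct toy_entry *toy_lookup(struct toy_cache *c, uint32_t ip)
{
    for (int i = 0; i < TOY_CACHE_SIZE; i++)
        if (c->slots[i].in_use && c->slots[i].ip == ip)
            return &c->slots[i];
    return NULL;
}

/* Add or update: returns the entry, or NULL if the cache is full. */
struct toy_entry *toy_update(struct toy_cache *c, uint32_t ip,
                             const uint8_t mac[6])
{
    struct toy_entry *e = toy_lookup(c, ip);

    for (int i = 0; e == NULL && i < TOY_CACHE_SIZE; i++)
        if (!c->slots[i].in_use)
            e = &c->slots[i];   /* first free slot */
    if (e == NULL)
        return NULL;

    e->in_use = true;
    e->ip = ip;
    memcpy(e->mac, mac, 6);
    return e;
}

/* Delete: returns true if an entry for ip existed and was removed. */
bool toy_delete(struct toy_cache *c, uint32_t ip)
{
    struct toy_entry *e = toy_lookup(c, ip);

    if (e == NULL)
        return false;
    e->in_use = false;
    return true;
}
```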

 

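2.2 Source code and data structures

net/core/neighbour.c: implements all of the generic infrastructure

struct neigh_table: represents one concrete neighbour protocol; it is registered with the neighbour infrastructure through neigh_table_init

struct neigh_parms: neighbour protocol parameters, including the number of probes, the probe interval, the garbage-collection time, and the reachable (confirmation lifetime) time

struct neighbour: represents one concrete neighbour, including its MAC address, its state, the time it was last confirmed, its output function, and its queue of pending packets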

 

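3 The neighbour state machine

A neighbour entry has six basic states:

#define NUD_INCOMPLETE  0x01  // MAC address resolution is in progress; no reply received yet

#define NUD_REACHABLE   0x02  // the neighbour is known to be reachable

#define NUD_STALE       0x04  // the entry has not been confirmed for a while; reachability must be re-verified on next use

#define NUD_DELAY       0x08  // probing is being delayed

#define NUD_PROBE       0x10  // the neighbour is being probed

#define NUD_FAILED      0x20  // probing failed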

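From these six states, together with the two special states NUD_NOARP (0x40) and NUD_PERMANENT (0x80), the kernel derives three composite masks: NUD_IN_TIMER, NUD_VALID, and NUD_CONNECTED.

#define NUD_IN_TIMER   (NUD_INCOMPLETE|NUD_REACHABLE|NUD_DELAY|NUD_PROBE)

#define NUD_VALID      (NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE|NUD_PROBE|NUD_STALE|NUD_DELAY)

#define NUD_CONNECTED  (NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE)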
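Because each state is a single bit, membership in a composite mask is a plain bitwise AND. The following user-space sketch restates the constants (values as in the kernel headers) and wraps the tests in small helpers; the helper names `nud_in_timer`, `nud_is_valid`, and `nud_is_connected` are made up for this example.

```c
#include <stdbool.h>

#define NUD_INCOMPLETE 0x01
#define NUD_REACHABLE  0x02
#define NUD_STALE      0x04
#define NUD_DELAY      0x08
#define NUD_PROBE      0x10
#define NUD_FAILED     0x20
#define NUD_NOARP      0x40
#define NUD_PERMANENT  0x80

#define NUD_IN_TIMER  (NUD_INCOMPLETE | NUD_REACHABLE | NUD_DELAY | NUD_PROBE)
#define NUD_VALID     (NUD_PERMANENT | NUD_NOARP | NUD_REACHABLE | \
                       NUD_PROBE | NUD_STALE | NUD_DELAY)
#define NUD_CONNECTED (NUD_PERMANENT | NUD_NOARP | NUD_REACHABLE)

/* The state timer is running for this entry. */
bool nud_in_timer(unsigned int state)     { return (state & NUD_IN_TIMER) != 0; }

/* The cached L2 address may be used for transmission. */
bool nud_is_valid(unsigned int state)     { return (state & NUD_VALID) != 0; }

/* The L2 address is usable without any further verification. */
bool nud_is_connected(unsigned int state) { return (state & NUD_CONNECTED) != 0; }
```

For example, a STALE entry is valid (its address can still be used) but neither connected nor timer-driven, while an INCOMPLETE entry is timer-driven but not yet valid.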

 

 

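State changes are implemented by neigh_update, which not only updates the neighbour's state but also handles the data cached for that neighbour, such as packets queued while waiting for resolution.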

 

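4 Neighbour subsystem timers

The neighbour subsystem uses several timers. Some are created per neighbour protocol (per table), and others per neighbour entry.

4.1 State-transition timer

Its handler is neigh_timer_handler, which drives the neighbour's state transitions and, depending on the current state, triggers MAC address probes.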

static void neigh_timer_handler(unsigned long arg)
{
        unsigned long now, next;
        struct neighbour *neigh = (struct neighbour *)arg;
        unsigned int state;
        int notify = 0;

        write_lock(&neigh->lock);

        state = neigh->nud_state;
        now = jiffies;
        next = now + HZ;

        /* Nothing to do unless the entry is in one of the NUD_IN_TIMER states. */
        if (!(state & NUD_IN_TIMER))
                goto out;

        /* Handle REACHABLE, DELAY, and PROBE/INCOMPLETE in turn. */
        if (state & NUD_REACHABLE) {
                /* confirmed + reachable_time has not passed yet, so the entry has not expired. */
                if (time_before_eq(now,
                                   neigh->confirmed + neigh->parms->reachable_time)) {
                        neigh_dbg(2, "neigh %p is still alive\n", neigh);
                        next = neigh->confirmed + neigh->parms->reachable_time;
                } else if (time_before_eq(now,
                                          neigh->used +
                                          NEIGH_VAR(neigh->parms, DELAY_PROBE_TIME))) {
                        /*
                         * The confirmation has expired but the entry was used within
                         * the last DELAY_PROBE_TIME: enter DELAY. This postpones the
                         * ARP request and gives upper layers a chance to confirm
                         * reachability first.
                         */
                        neigh_dbg(2, "neigh %p is delayed\n", neigh);
                        neigh->nud_state = NUD_DELAY;
                        neigh->updated = jiffies;
                        neigh_suspect(neigh);
                        next = now + NEIGH_VAR(neigh->parms, DELAY_PROBE_TIME);
                } else {
                        /* Not used recently either: the entry becomes STALE. */
                        neigh_dbg(2, "neigh %p is suspected\n", neigh);
                        neigh->nud_state = NUD_STALE;
                        neigh->updated = jiffies;
                        neigh_suspect(neigh);
                        notify = 1;
                }
        } else if (state & NUD_DELAY) {
                if (time_before_eq(now,
                                   neigh->confirmed +
                                   NEIGH_VAR(neigh->parms, DELAY_PROBE_TIME))) {
                        /* A reachability confirmation arrived while in DELAY. */
                        neigh_dbg(2, "neigh %p is now reachable\n", neigh);
                        neigh->nud_state = NUD_REACHABLE;
                        neigh->updated = jiffies;
                        neigh_connect(neigh);
                        notify = 1;
                        next = neigh->confirmed + neigh->parms->reachable_time;
                } else {
                        /* No confirmation during the delay: start sending ARP requests. */
                        neigh_dbg(2, "neigh %p is probed\n", neigh);
                        neigh->nud_state = NUD_PROBE;
                        neigh->updated = jiffies;
                        atomic_set(&neigh->probes, 0);
                        notify = 1;
                        next = now + NEIGH_VAR(neigh->parms, RETRANS_TIME);
                }
        } else {
                /* NUD_PROBE|NUD_INCOMPLETE */
                next = now + NEIGH_VAR(neigh->parms, RETRANS_TIME);
        }

        /*
         * Excerpt: at this point the kernel also moves the entry to NUD_FAILED
         * once the probe limit is reached, and re-arms the timer with mod_timer().
         */

        if (neigh->nud_state & (NUD_INCOMPLETE | NUD_PROBE)) {
                neigh_probe(neigh);     /* sends an ARP request; drops neigh->lock itself */
        } else {
out:
                write_unlock(&neigh->lock);
        }

        if (notify)
                neigh_update_notify(neigh);

        neigh_release(neigh);
}
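The transitions in the handler are easier to follow in a stripped-down user-space model. The sketch below is not kernel code: `toy_neigh`, `toy_parms`, and `toy_timer_step` are hypothetical names, time is a plain tick counter compared with `<=` (so jiffies wrap-around is ignored), locking and notifications are left out, and incrementing `probes` stands in for `neigh_probe()`. It also includes the probe-limit check that leads to NUD_FAILED, which the excerpt above does not show.

```c
enum {
    NUD_INCOMPLETE = 0x01, NUD_REACHABLE = 0x02, NUD_STALE  = 0x04,
    NUD_DELAY      = 0x08, NUD_PROBE     = 0x10, NUD_FAILED = 0x20,
    NUD_IN_TIMER   = NUD_INCOMPLETE | NUD_REACHABLE | NUD_DELAY | NUD_PROBE
};

struct toy_neigh {
    unsigned int  state;
    unsigned long confirmed;   /* tick of the last reachability confirmation */
    unsigned long used;        /* tick of the last use */
    int           probes;      /* requests sent in the current probe cycle */
};

struct toy_parms {
    unsigned long reachable_time;
    unsigned long delay_probe_time;
    unsigned long retrans_time;
    int           max_probes;
};

/*
 * One timer expiry at tick `now`. Returns the tick at which the timer
 * should fire next, or 0 if the entry left the timer-driven states.
 */
unsigned long toy_timer_step(struct toy_neigh *n, const struct toy_parms *p,
                             unsigned long now)
{
    unsigned long next;

    if (!(n->state & NUD_IN_TIMER))
        return 0;

    if (n->state & NUD_REACHABLE) {
        if (now <= n->confirmed + p->reachable_time) {
            next = n->confirmed + p->reachable_time;   /* still alive */
        } else if (now <= n->used + p->delay_probe_time) {
            n->state = NUD_DELAY;                      /* used recently */
            next = now + p->delay_probe_time;
        } else {
            n->state = NUD_STALE;                      /* idle: stop the timer */
            return 0;
        }
    } else if (n->state & NUD_DELAY) {
        if (now <= n->confirmed + p->delay_probe_time) {
            n->state = NUD_REACHABLE;                  /* confirmed in time */
            next = n->confirmed + p->reachable_time;
        } else {
            n->state = NUD_PROBE;                      /* must probe */
            n->probes = 0;
            next = now + p->retrans_time;
        }
    } else {
        next = now + p->retrans_time;                  /* PROBE or INCOMPLETE */
    }

    if (n->state & (NUD_INCOMPLETE | NUD_PROBE)) {
        if (n->probes >= p->max_probes) {
            n->state = NUD_FAILED;                     /* gave up */
            return 0;
        }
        n->probes++;                                   /* "send" one request */
    }
    return next;
}
```

Stepping an entry that was confirmed long ago but used recently walks it through REACHABLE, DELAY, PROBE, and finally FAILED if no reply ever updates `confirmed`.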

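4.2 Garbage-collection timer

This is a periodic timer that keeps the neighbour cache from consuming too much memory when there are many entries.

Its handler is neigh_periodic_work. Garbage collection comes in two forms: asynchronous collection, performed periodically by this handler, and synchronous collection (neigh_forced_gc), performed on demand when a new entry is allocated while the table is over its thresholds.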
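As a rough illustration of what a periodic sweep does, the sketch below frees entries that nobody references and that are either failed or unused for longer than a stale time. It is loosely modelled on the checks in neigh_periodic_work as I understand them; the `gc_entry` structure, the `toy_gc_sweep` function, and the simplification that a reference count of 1 means "only the table holds it" are assumptions of this example, and wrap-around of the tick counter is ignored.

```c
#include <stdbool.h>

#define NUD_INCOMPLETE 0x01
#define NUD_REACHABLE  0x02
#define NUD_STALE      0x04
#define NUD_DELAY      0x08
#define NUD_PROBE      0x10
#define NUD_FAILED     0x20
#define NUD_PERMANENT  0x80
#define NUD_IN_TIMER   (NUD_INCOMPLETE | NUD_REACHABLE | NUD_DELAY | NUD_PROBE)

struct gc_entry {
    unsigned int  state;
    int           refcnt;   /* 1 means only the table references the entry */
    unsigned long used;     /* tick of the last use */
    bool          freed;
};

/* Sweeps n entries at tick `now`; returns how many were freed. */
int toy_gc_sweep(struct gc_entry *e, int n, unsigned long now,
                 unsigned long gc_staletime)
{
    int freed = 0;

    for (int i = 0; i < n; i++) {
        if (e[i].freed)
            continue;
        /* Permanent entries and entries with a running state timer are skipped. */
        if (e[i].state & (NUD_PERMANENT | NUD_IN_TIMER))
            continue;
        /* Free unreferenced entries that failed or have been idle too long. */
        if (e[i].refcnt == 1 &&
            (e[i].state == NUD_FAILED || now > e[i].used + gc_staletime)) {
            e[i].freed = true;
            freed++;
        }
    }
    return freed;
}
```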

 

 

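5 The ARP protocol

 

5.1 ARP packet format

Hardware type: Ethernet, Token Ring, and so on

Protocol type: the L3 protocol whose addresses are being resolved, typically IPv4 (0x0800)

Hardware address length: length of the MAC address

Protocol address length: length of the IP address

Opcode: one of the two operations, REQUEST or REPLY. The four fields that follow carry the sender's MAC and IP addresses and the target's MAC and IP addresses.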
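For the common Ethernet/IPv4 case, the fields above map onto a fixed 28-byte layout. The struct and helper below are a user-space sketch of that layout; the names `arp_eth_ipv4`, `arp_build_request`, and the `ARP_*` constants are local to this example (the kernel headers use ARPHRD_ETHER, ETH_P_IP, and ARPOP_REQUEST for the same values).

```c
#include <arpa/inet.h>   /* htons */
#include <stdint.h>
#include <string.h>

enum {
    ARP_HTYPE_ETHER = 1,       /* hardware type: Ethernet */
    ARP_PTYPE_IPV4  = 0x0800,  /* protocol type: IPv4 */
    ARP_OP_REQUEST  = 1,
    ARP_OP_REPLY    = 2
};

struct arp_eth_ipv4 {
    uint16_t htype;    /* hardware type */
    uint16_t ptype;    /* protocol type */
    uint8_t  hlen;     /* hardware address length (6) */
    uint8_t  plen;     /* protocol address length (4) */
    uint16_t oper;     /* opcode: request or reply */
    uint8_t  sha[6];   /* sender MAC */
    uint8_t  spa[4];   /* sender IP */
    uint8_t  tha[6];   /* target MAC */
    uint8_t  tpa[4];   /* target IP */
};

_Static_assert(sizeof(struct arp_eth_ipv4) == 28,
               "ARP over Ethernet/IPv4 is 28 bytes");

/* Fills p with a request asking "who has tpa? tell spa". */
void arp_build_request(struct arp_eth_ipv4 *p, const uint8_t sha[6],
                       const uint8_t spa[4], const uint8_t tpa[4])
{
    memset(p, 0, sizeof(*p));          /* target MAC stays zero: not known yet */
    p->htype = htons(ARP_HTYPE_ETHER);
    p->ptype = htons(ARP_PTYPE_IPV4);
    p->hlen  = 6;
    p->plen  = 4;
    p->oper  = htons(ARP_OP_REQUEST);
    memcpy(p->sha, sha, 6);
    memcpy(p->spa, spa, 4);
    memcpy(p->tpa, tpa, 4);
}
```

The multi-byte fields are stored in network byte order, which is why the opcode check in the kernel code compares against htons(ARPOP_REQUEST).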

 

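5.2 ARP options

The ARP options control which source IP address is used in ARP requests and which incoming ARP packets are filtered out.

arp_announce: controls how the source IP address is chosen when sending an ARP request

 0 : any local IP address may be used

 1 : where possible, choose an address in the same subnet as the target IP

 2 : always use the best local address for the target, preferring a primary address

 

arp_ignore: controls the conditions under which an incoming ARP request is answered

  0 : reply to ARP requests for any local address

  1 : reply only if the target IP address is configured on the interface that received the request

  2 : as for 1, and in addition the sender IP and the target IP must be in the same subnet

  3 : reply only if the target IP address does not have host scope

  4-7 : reserved

  8 : never reply

  >8 : unknown value; the request is accepted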
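The arp_ignore levels can be read as a small decision function. The sketch below is a simplified model, not the kernel's arp_ignore(): it assumes the target IP is already known to be a local address, and it takes three precomputed booleans (the `arp_req_ctx` structure and `arp_should_reply` are hypothetical names) instead of looking the addresses up on the interfaces.

```c
#include <stdbool.h>

/* Facts about one incoming ARP request whose target IP is a local address. */
struct arp_req_ctx {
    bool tip_on_rx_iface;   /* target IP is configured on the receiving interface */
    bool sip_same_subnet;   /* sender IP is in the same subnet as the target IP */
    bool tip_scope_host;    /* target IP has host scope (e.g. a loopback address) */
};

/* Returns true if a reply should be sent for the given arp_ignore level. */
bool arp_should_reply(int arp_ignore, const struct arp_req_ctx *c)
{
    switch (arp_ignore) {
    case 0:
        return true;                                    /* any local address */
    case 1:
        return c->tip_on_rx_iface;
    case 2:
        return c->tip_on_rx_iface && c->sip_same_subnet;
    case 3:
        return !c->tip_scope_host;
    case 8:
        return false;                                   /* never reply */
    default:
        return true;                                    /* reserved or unknown: accept */
    }
}
```

On a real system the current level can be inspected under /proc/sys/net/ipv4/conf/, for example in the all/arp_ignore file.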

 

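5.3 ARP handler functions

arp_rcv receives incoming ARP packets and, after validating them, hands them to arp_process, which handles ARPOP_REQUEST and ARPOP_REPLY and sends the appropriate response. The excerpts below come from that processing path.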

if (arp->ar_op == htons(ARPOP_REQUEST) &&                  /* an ARP request was received */
    ip_route_input_noref(skb, tip, sip, 0, dev) == 0) {    /* and the route lookup for it succeeded */
        rt = skb_rtable(skb);
        addr_type = rt->rt_type;

        if (addr_type == RTN_LOCAL) {
                int dont_send;

                /* Apply the arp_ignore setting. */
                dont_send = arp_ignore(in_dev, sip, tip);
                if (!dont_send && IN_DEV_ARPFILTER(in_dev))
                        dont_send = arp_filter(sip, tip, dev);
                if (!dont_send) {
                        /* Passively update the neighbour cache with the sender's address. */
                        n = neigh_event_ns(&arp_tbl, sha, &sip, dev);
                        if (n) {
                                /* Send the ARP reply. */
                                arp_send(ARPOP_REPLY, ETH_P_ARP, sip,
                                         dev, tip, sha, dev->dev_addr,
                                         sha);
                                neigh_release(n);
                        }
                }
        }
        /* ... rest of the request handling is omitted in this excerpt ... */
}

 

/* Update the ARP table. */
n = __neigh_lookup(&arp_tbl, &sip, dev, 0);
if (n) {
        int state = NUD_REACHABLE;
        int override;

        /* If several different ARP replies follow back-to-back,
           use the FIRST one. It is possible, if several proxy
           agents are active. Taking the first reply prevents
           arp thrashing and chooses the fastest router.
         */
        override = time_after(jiffies,
                              n->updated +
                              NEIGH_VAR(n->parms, LOCKTIME)) ||
                   is_garp;

        /* Broadcast replies and request packets
           do not assert neighbour reachability.
         */
        if (arp->ar_op != htons(ARPOP_REPLY) ||
            skb->pkt_type != PACKET_HOST)
                state = NUD_STALE;

        /* Update the neighbour's state. */
        neigh_update(n, sha, state,
                     override ? NEIGH_UPDATE_F_OVERRIDE : 0);
        neigh_release(n);
}
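The two decisions made in this update path, whether an existing address may be overridden and which state the entry should get, can be isolated in a small user-space sketch. All `toy_*` names are hypothetical. `toy_time_after` mirrors the idea behind the kernel's time_after() macro: comparing through a signed difference keeps the result correct when the tick counter wraps around.

```c
#include <stdbool.h>

#define TOY_NUD_REACHABLE 0x02
#define TOY_NUD_STALE     0x04

/* True if tick a is later than tick b, even across counter wrap-around. */
bool toy_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/*
 * An existing address may be overridden only once the lock time has passed
 * since the last update, or when the packet is a gratuitous ARP.
 */
bool toy_arp_override(unsigned long now, unsigned long updated,
                      unsigned long locktime, bool is_garp)
{
    return toy_time_after(now, updated + locktime) || is_garp;
}

/*
 * Only a reply that was unicast to this host confirms reachability;
 * requests and broadcast replies leave the entry STALE.
 */
unsigned int toy_arp_new_state(bool is_reply, bool unicast_to_us)
{
    return (is_reply && unicast_to_us) ? TOY_NUD_REACHABLE : TOY_NUD_STALE;
}
```

With a lock time of 100 ticks, a second reply arriving 50 ticks after the first does not replace the cached address, which is how the first of several back-to-back replies wins.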
