Computer Networks: The Network Layer


Reference:

  • Computer Networking: A Top-Down Approach, 7th edition
  • PPT of NEU(lx)

0. Overview of Network Layer
  • The network layer:

  • The network layer can be decomposed into two interacting parts: the data plane and the control plane.
    • Data plane functions
      • The per-router functions in the network layer that determine how a datagram (that is, a network-layer packet) arriving on one of a router’s input links is forwarded to one of that router’s output links.
    • Control plane functions
      • The network-wide logic that controls how a datagram is routed among routers along an end-to-end path from the source host to the destination host.
0.1 Forwarding and Routing: The Data and Control Planes
  • Forwarding and Routing

    • Forwarding:
      • When a packet arrives at a router’s input link, the router must move the packet to the appropriate output link. (Forwarding refers to the router-local action of transferring a packet from an input link interface to the appropriate output link interface.)
      • Forwarding is but one function implemented in the data plane.
      • Forwarding takes place at very short timescales (typically a few nanoseconds), and thus is typically implemented in hardware.
    • Routing:
      • The network layer must determine the route or path taken by packets as they flow from a sender to a receiver. The algorithms that calculate these paths are referred to as routing algorithms. (Routing refers to the network-wide process that determines the end-to-end paths that packets take from source to destination.)
      • Routing is implemented in the control plane of the network layer.
      • Routing takes place on much longer timescales (typically seconds), and as we will see is often implemented in software.
  • Forwarding table:

    • A router forwards a packet by examining the value of one or more fields in the arriving packet’s header, and then using these header values to index into its forwarding table. The value stored in the forwarding table entry for those values indicates the outgoing link interface at that router to which that packet is to be forwarded.
  • Control Plane: The traditional Approach

    Routing algorithms determine values in forwarding tables:

  • Control Plane: The SDN Approach

    A remote controller determines and distributes values in forwarding tables:

0.2 Network Service Model
  • Services that could be provided by the network layer
    • Guaranteed delivery
    • Guaranteed delivery with bounded delay
    • In-order packet delivery
    • Guaranteed minimal bandwidth
    • Security
1. Virtual circuit and datagram networks
  • Network layer connection and connection-less service

    • Datagram network provides network-layer connectionless service;
    • VC network provides network-layer connection service;
    • Analogous to the transport-layer services, but:
      • Service: host-to-host
      • No choice: the network provides one or the other (the two are mutually exclusive)
      • Implementation: in the core
  • Virtual circuits

    • Intro:

      • call setup and teardown for each call;
      • each packet carries a VC identifier (not the destination host address);
      • every router on the source-destination path maintains “state” for each passing connection;
      • link and router resources (bandwidth, buffers) may be allocated to the VC;
    • Implementation:

      • A VC consists of:

        1.Path from source to destination;

        2.VC numbers, one number for each link along path;

        3.Entries in forwarding tables in routers along path;

      • Packet belonging to VC carries a VC number.

      • The VC number can be changed on each link (this keeps the VC field in the packet header short and simplifies VC setup):

        • New VC number comes from forwarding table.
    • Forwarding table:

    • Signaling protocol:

      • Signaling messages: used to set up, maintain, and tear down a VC
      • used in ATM, frame-relay, X.25
      • not used in today’s Internet.


  • Datagram networks

    • Intro:

      • No call setup at network layer
      • Routers: no state about end-to-end connections
        • no network-level concept of “connection”
      • Packets forwarded using destination host address
        • packets between same source-dest pair may take different paths
    • Forwarding table:

      • 4 billion possible entries (one per 32-bit address): impractical, so each entry covers a range of addresses

      • Longest prefix matching

        Prefix                         Link interface
        11001000 00010111 00010        0
        11001000 00010111 00011000     1
        11001000 00010111 00011        2
        Otherwise                      3


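  • Comparison of virtual-circuit and datagram services:

    • Approach:
      • Virtual circuit: reliable communication should be guaranteed by the network
      • Datagram: reliable communication should be guaranteed by the user hosts
    • Connection setup:
      • Virtual circuit: required
      • Datagram: not needed
    • Destination address:
      • Virtual circuit: used only during connection setup; afterwards each packet carries a short VC number
      • Datagram: every packet carries the full destination address
    • Packet forwarding:
      • Virtual circuit: all packets of the same VC are forwarded along the same route
      • Datagram: each packet is routed and forwarded independently
    • Node failure:
      • Virtual circuit: every VC that passes through the failed node stops working
      • Datagram: the failed node may lose packets, and some routes may change
    • Packet order:
      • Virtual circuit: packets always arrive at the destination in sending order
      • Datagram: packets may arrive out of sending order
    • End-to-end error handling and flow control:
      • Virtual circuit: may be handled by the network or by the user hosts
      • Datagram: handled by the user hosts
  • Data flow as seen from the layers of the protocol stack: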

2. What’s Inside a Router

A high-level view of a generic router architecture and its components

  • Input ports
    • Line termination:
      • The physical layer function of terminating an incoming physical link at a router.
    • Data link processing:
      • The link-layer functions needed to interoperate with the link layer at the other side of the incoming link.
    • Lookup, forwarding, queuing.
      • Given the datagram's destination, look up the output port using the forwarding table in input-port memory
      • Queuing: occurs if datagrams arrive faster than the forwarding rate into the switch fabric
  • Switching fabric:
    • The switching fabric connects the router’s input ports to its output ports.
    • This switching fabric is completely contained within the router—a network inside of a network router!
  • Output ports:
    • An output port stores packets received from the switching fabric and transmits these packets on the outgoing link by performing the necessary link-layer and physical-layer functions.
    • Buffering required when datagrams arrive from fabric faster than the transmission rate.
    • Scheduling discipline chooses among queued datagrams for transmission.
  • Routing processor
    • In traditional routers, it executes the routing protocols, maintains routing tables and attached link state information, and computes the forwarding table for the router.
2.1 Input Port Processing and Destination-Based Forwarding
  • Input port processing:

  • Destination-Based Forwarding

    The forwarding table in the case of 32-bit IP addresses: Longest prefix matching rule

    Prefix                         Link interface
    11001000 00010111 00010        0
    11001000 00010111 00011000     1
    11001000 00010111 00011        2
    Otherwise                      3
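
The longest-prefix-matching rule can be sketched in a few lines of Python. This is only an illustration using the table above; the function names `to_bits` and `lookup` are made up for this example, and real routers do this lookup in hardware rather than with a linear scan.

```python
# Forwarding table from the text: (prefix bits, link interface).
FORWARDING_TABLE = [
    ("11001000 00010111 00010".replace(" ", ""), 0),
    ("11001000 00010111 00011000".replace(" ", ""), 1),
    ("11001000 00010111 00011".replace(" ", ""), 2),
]
DEFAULT_INTERFACE = 3  # the "Otherwise" row


def to_bits(dotted: str) -> str:
    """Convert a dotted-decimal IPv4 address to a 32-character bit string."""
    return "".join(f"{int(octet):08b}" for octet in dotted.split("."))


def lookup(dest: str) -> int:
    """Return the interface of the longest prefix that matches dest."""
    bits = to_bits(dest)
    best_len, best_interface = -1, DEFAULT_INTERFACE
    for prefix, interface in FORWARDING_TABLE:
        if bits.startswith(prefix) and len(prefix) > best_len:
            best_len, best_interface = len(prefix), interface
    return best_interface
```

For instance, 200.23.24.170 matches both the 21-bit prefix of interface 2 and the 24-bit prefix of interface 1, so the longer match (interface 1) is chosen.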
2.2 Switching
  • The switching fabric is at the very heart of a router, as it is through this fabric that the packets are actually switched (that is, forwarded) from an input port to an output port.

  • Three types of switching fabric:

    • Switching via memory

      • The simplest, earliest routers were traditional computers, with switching between input and output ports done under direct control of the CPU (routing processor).

      • Packet copied to system’s memory.

      • Speed limited by memory bandwidth (2 bus crossings per datagram)

    • Switching via a bus

      • An input port transfers a packet directly to the output port over a shared bus, without intervention by the routing processor.
      • Bus contention: switching speed limited by bus bandwidth.
      • Sufficient for routers that operate in small local area and enterprise networks.
    • Switching via an interconnection network (Crossbar)

      • A crossbar switch is an interconnection network consisting of 2N buses that connect N input ports to N output ports.
      • Overcome the bandwidth limitation of a single, shared bus.
2.3 Output Port Processing

Output port processing:

2.4 Where Does Queuing Occur ?

The location and extent of queueing (either at the input port queues or the output port queues) will depend on the traffic load, the relative speed of the switching fabric, and the line speed.

As these queues grow large, the router’s memory can eventually be exhausted, and packet loss occurs when no memory is available to store arriving packets.

  • Input queuing

    Multiple packets can be transferred in parallel, as long as their output ports are different. However, if two packets at the front of two input queues are destined for the same output queue, then one of the packets will be blocked and must wait at the input queue—the switching fabric can transfer only one packet to a given output port at a time.

    • Fabric slower than input ports combined -> queueing may occur at input queues.
    • Head-of-the-line (HOL) blocking: a queued datagram at the front of an input queue prevents the datagrams behind it from moving forward, even when their own output ports are free.
    • Queueing delay and loss due to input buffer overflow.
  • Output queuing


    • Buffering when arrival rate via switch exceeds output line speed.
    • Queueing delay and loss due to output port buffer.
2.5 Packet Scheduling

  • FIFO
  • Priority Queuing
  • Round Robin and Weighted Fair Queuing (WFQ)
3. The Internet Protocol (IP): IPv4, Addressing, IPv6, and more
  • Internet Protocol (IP)
    • The IP provides routing functions for datagrams traversing the network;
      • Each datagram has source and destination addresses;
      • IP determines if the datagram has reached its destination or if it must be forwarded;
        • If it must be forwarded, IP determines the next hop;
    • IP does not provide a reliability guarantee:
      • No assurance that a packet will reach its specified destination;
    • Responsible for fragmentation of datagrams;
3.1 IPv4 Datagram Format

IPv4 datagram format:

  • Version number: 4 bits

    • Specify the IP protocol version of the datagram.
    • By looking at the version number, the router can determine how to interpret the remainder of the IP datagram.
  • Header length: 4 bits

    • Determine where in the IP datagram the payload (e.g., the transport-layer segment being encapsulated in this datagram) actually begins.

    • Most IP datagrams do not contain options, so the typical IP datagram has a 20-byte header.

  • Type of service: 8 bits

    • The type of service (TOS) bits were included in the IPv4 header to allow different types of IP datagrams to be distinguished from each other.
      • For example, it might be useful to distinguish real-time datagrams (such as those used by an IP telephony application) from non-real-time traffic (for example, FTP).
      • The specific level of service to be provided is a policy issue determined and configured by the network administrator for that router.
  • Datagram length: 16 bits

    • The total length of the IP datagram (header plus data), measured in bytes (maximum size of an IP datagram: 65,535 bytes, but datagrams are rarely larger than 1,500 bytes, which lets them fit in the payload field of a maximally sized Ethernet frame).
  • Identifier (16 bits), flags (3 bits), fragmentation offset (13 bits):

    • All of them have to do with so-called IP fragmentation.
  • Time-to-live: 8 bits

    • Ensure that datagrams do not circulate forever (due to, for example, a long-lived routing loop) in the network.
    • This field is decremented by one each time the datagram is processed by a router:
      • If the TTL field reaches 0, a router must drop that datagram.
  • Upper-layer Protocol: 8 bits

    • This field is typically used only when an IP datagram reaches its final destination.
    • The value of this field indicates the specific transport-layer protocol to which the data portion of this IP datagram should be passed.
  • Header checksum: 16 bits

    • Aids a router in detecting bit errors in a received IP datagram.

  • Source and destination IP addresses: both of them are 32 bits

    • When a source creates a datagram, it inserts its IP address into the source IP address field and inserts the address of the ultimate destination into the destination IP address field.

    • Written in dotted-decimal notation, e.g., 205.150.58.7 (each byte ranges from 0 to 255, and 255 is 11111111 in binary)

      • Each byte of the address is written in its decimal form and is separated by a period (dot) from other bytes in the address.

  • Options:

    • E.g. timestamp, record route taken, specify list of routers to visit.
  • Data (payload):

    • In most circumstances, the data field of the IP datagram contains the transport-layer segment (TCP or UDP) to be delivered to the destination. However, the data field can carry other types of data, such as ICMP messages.

Note that an IP datagram has a total of 20 bytes of header (assuming no options). If the datagram carries a TCP segment, then each (non-fragmented) datagram carries a total of 40 bytes of header (20 bytes of IP header plus 20 bytes of TCP header) along with the application-layer message.
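
As a rough illustration of the field layout described above, the sketch below decodes the fixed 20-byte part of an IPv4 header with Python's `struct` module. The function name `parse_ipv4_header` and the sample header values are assumptions made for this example; options are ignored and the checksum is not verified.

```python
import ipaddress
import struct


def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are ignored)."""
    (version_ihl, tos, total_length, identifier, flags_offset,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": version_ihl >> 4,                # upper 4 bits
        "header_length": (version_ihl & 0x0F) * 4,  # field counts 32-bit words
        "tos": tos,
        "total_length": total_length,               # header plus data, in bytes
        "identifier": identifier,
        "flags": flags_offset >> 13,                # upper 3 bits
        "fragment_offset": flags_offset & 0x1FFF,   # lower 13 bits, 8-byte units
        "ttl": ttl,
        "protocol": protocol,                       # e.g. 6 = TCP, 17 = UDP
        "checksum": checksum,
        "source": str(ipaddress.IPv4Address(src)),
        "destination": str(ipaddress.IPv4Address(dst)),
    }


# Hypothetical header: 40-byte datagram carrying TCP, TTL 64, checksum left at 0.
sample = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 40, 0x1C46, 0, 64, 6, 0,
    ipaddress.IPv4Address("192.168.0.1").packed,
    ipaddress.IPv4Address("205.150.58.7").packed,
)
header = parse_ipv4_header(sample)
```

Here `header["header_length"]` is 20 and `header["total_length"]` is 40, matching the 20 + 20 bytes of IP and TCP header mentioned above.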

3.2 IPv4 Datagram Fragmentation
  • Maximum transmission unit (MTU):

    • The maximum amount of data that a link-layer frame can carry;
    • Different link types, different MTUs;
  • How to squeeze this oversized IP datagram into the payload field of the link-layer frame?

    • Fragment the payload in the IP datagram into two or more smaller IP datagrams, encapsulate each of these smaller IP datagrams in a separate link-layer frame; and send these frames over the outgoing link.

    • Each of these smaller datagrams is referred to as a fragment.

    • Fragments need to be reassembled before they reach the transport layer at the destination (reassembly happens only at the destination node).

      • Sticking to the principle of keeping the network core simple, the designers of IPv4 decided to put the job of datagram reassembly in the end systems rather than in network routers.

      • How to perform reassembly tasks?

        • When a router needs to fragment a datagram, each resulting datagram (that is, fragment) is stamped with the source address, destination address, and identification number of the original datagram.
          • Typically, the sending host increments the identification number for each datagram it sends.
        • When the destination receives a series of datagrams from the same sending host, it can examine the identification numbers of the datagrams to determine which of the datagrams are actually fragments of the same larger datagram.
        • Because IP is an unreliable service, one or more of the fragments may never arrive at the destination:
          • In order for the destination host to be absolutely sure it has received the last fragment of the original datagram, the last fragment has a flag bit set to 0, whereas all the other fragments have this flag bit set to 1.
          • Also, in order for the destination host to determine whether a fragment is missing (and also to be able to reassemble the fragments in their proper order), the offset field is used to specify where the fragment fits within the original IP datagram.
      • IP fragmentation and reassembly

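        • 4000-byte datagram (20-byte header + 3980 bytes of data), MTU = 1500 bytes:

          Each fragment carries at most 1500 - 20 = 1480 bytes of data, and the offset field counts in 8-byte units, so the second fragment has offset 1480/8 = 185.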
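
The arithmetic of this example can be checked with a short sketch. It assumes a 20-byte header with no options, and the function name `fragment` is invented for illustration; it computes only the length, offset, and more-fragments flag of each fragment, not the actual bytes.

```python
def fragment(total_length: int, mtu: int, header_len: int = 20) -> list:
    """Return (length, offset, more_fragments flag) for each fragment."""
    payload = total_length - header_len
    # Data per fragment must be a multiple of 8, since offsets count 8-byte units.
    max_data = (mtu - header_len) // 8 * 8
    fragments = []
    sent = 0
    while sent < payload:
        data = min(max_data, payload - sent)
        more = 1 if sent + data < payload else 0  # the last fragment has flag 0
        fragments.append((data + header_len, sent // 8, more))
        sent += data
    return fragments


result = fragment(4000, 1500)
```

For the 4000-byte datagram this yields three fragments of 1500, 1500, and 1040 bytes with offsets 0, 185, and 370; only the last one has the flag set to 0.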

3.3 IPv4 Addressing

IP address classes

  • IP address:
    • 32-bit identifier for host, router interface;
    • An IP address is technically associated with an interface, rather than with the host or router containing that interface.
  • Interface:
    • The boundary between the host and the physical link;
      • A host typically has one interface
    • The boundary between the router and any one of its links;
      • Routers typically have multiple interfaces.
  1. IP Address class:

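    • Class bits (leading bits of the address): A: 0; B: 10; …

    • Addresses per network: A: 2^24; B: 2^16; C: 2^8;

    • First-byte ranges:

      • A: 00000000 (0) - 01111111 (127)

      • B: 10000000 (128) - 10111111 (191)

      • C: 11000000 (192) - 11011111 (223)

    • Networks and hosts per class:

      Class   Max. networks   First usable network   Last usable network   Max. hosts per network
      A       2^7-2           1                      126                   2^24-2
      B       2^14-1          128.1                  191.255               2^16-2
      C       2^21-1          192.0.1                223.255.255           2^8-2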
    • Each of the three main address classes reserves a range of private addresses (used only inside a local network). The ranges are:

      • Class A: 10.0.0.0 - 10.255.255.255
      • Class B: 172.16.0.0 - 172.31.255.255
      • Class C: 192.168.0.0 - 192.168.255.255
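
A minimal sketch of the classful rules above: it derives the class from the first byte and checks membership in the three private ranges. The helper names `address_class` and `is_private` are assumptions for this example; the private ranges are written in prefix form (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), which cover exactly the ranges listed.

```python
import ipaddress

# The three private ranges listed above, in prefix notation.
PRIVATE_BLOCKS = [
    ipaddress.ip_network(block)
    for block in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def address_class(dotted: str) -> str:
    """Classify an IPv4 address by the value of its first byte."""
    first = int(dotted.split(".")[0])
    if first < 128:
        return "A"       # leading bit 0
    if first < 192:
        return "B"       # leading bits 10
    if first < 224:
        return "C"       # leading bits 110
    return "D or E"      # multicast / reserved, not covered in these notes


def is_private(dotted: str) -> bool:
    """True if the address falls in one of the private ranges above."""
    addr = ipaddress.ip_address(dotted)
    return any(addr in block for block in PRIVATE_BLOCKS)
```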
  2. Assigning IP Addresses:

  • Since every interface must have a unique IP address, there must be a central authority for assigning numbers.
  • That authority is the Internet Network Information Center, called the InterNIC.
  • The InterNIC assigns only network ids, the assignment of host ids is up to the system administrator.
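  1. Key properties of IP addresses:

    • IP addresses are hierarchical:
      • First, the address authority assigns only the network number; the organization that receives it assigns the host numbers itself (this simplifies address management);
      • Second, routers forward packets based only on the network number of the network the destination host is attached to (ignoring the host number), which greatly reduces the number of routing-table entries and therefore the storage the table needs;
    • An IP address actually identifies the interface between a host (or router) and a link:
      • Multihomed host: a host connected to two networks at the same time
      • A router should have at least two IP addresses with different network numbers:
    • LANs connected by repeaters or bridges still form a single network and share the same network number (net-id);
    • All networks that have been assigned a network number (net-id) are equal, whether they are small LANs or WANs covering a large geographic area;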
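  2. Host-specific routes:

    • Such a route specifies a route for one particular destination host;
    • Host-specific routes make it easier for network administrators to control and test the network, and they can also be used when certain security concerns apply;
  3. Default route:

    • A default route reduces the space the routing table occupies and the time needed to search it;
    • It is very useful when a network has only a few connections to the outside;
    • If a host sits on a small network that connects to the Internet through a single router, a default route is a very good fit;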
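  4. Packet forwarding algorithm

    (1) Extract the destination host's IP address D from the datagram header and derive the destination network address N;

    (2) If network N is directly attached to this router, deliver the datagram directly to destination host D; otherwise delivery is indirect, go to (3);

    (3) If the routing table contains a host-specific route for D, send the datagram to the next-hop router given in the table; otherwise, go to (4);

    (4) If the routing table contains a route to network N, send the datagram to the next-hop router given in the table; otherwise, go to (5);

    (5) If the routing table contains a default route, send the datagram to the default router given in the table; otherwise, go to (6);

    (6) Report a forwarding error.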
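
The six steps can be sketched as a single function. This is an illustration only: the function name `forward`, the way the routing table is split into three arguments, and the router labels used in the usage note are assumptions, and step (1) is folded into the membership tests performed by the `ipaddress` module.

```python
import ipaddress


def forward(dest, direct_networks, host_routes, network_routes, default_router=None):
    """Return ("direct", host) or ("next_hop", router) following steps (1)-(6)."""
    d = ipaddress.ip_address(dest)  # (1) destination address D
    # (2) Destination network is directly attached: direct delivery.
    if any(d in ipaddress.ip_network(net) for net in direct_networks):
        return ("direct", str(d))
    # (3) Host-specific route for D.
    if str(d) in host_routes:
        return ("next_hop", host_routes[str(d)])
    # (4) Route to the destination network N.
    for net, next_hop in network_routes.items():
        if d in ipaddress.ip_network(net):
            return ("next_hop", next_hop)
    # (5) Default route.
    if default_router is not None:
        return ("next_hop", default_router)
    # (6) No route at all: report a forwarding error.
    raise LookupError(f"no route to {dest}")
```

For example, with `direct_networks=["128.30.33.0/25"]`, `host_routes={"128.30.36.2": "R9"}`, `network_routes={"128.30.36.0/24": "R3"}` and `default_router="R1"`, a datagram for 128.30.36.12 goes to R3, while one for 128.30.36.2 goes to R9 because the host-specific route is checked first.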


IP Subnet Addressing:

Each interface on every host and router in the global Internet must have an IP address that is globally unique (except for interfaces behind NATs). A portion of an interface’s IP address will be determined by the subnet to which it is connected.

Locally IP addresses consist of three parts:

  • Network ID
  • Subnet ID
  • Host ID

Outside of the subnetted network the addresses are handled normally;

Inside the subnet, the network portion of the address is extended for local routing purpose;

Interface addresses and subnets:

  • In IP terms, this network interconnecting host interfaces and one router interface forms a subnet.
    • IP addressing assigns an address to upper-left subnet: 223.1.1.0/24, where the /24 (“slash-24”) notation, sometimes known as a subnet mask, indicates that the leftmost 24 bits of the 32-bit quantity define the subnet address.

Subnet definition

To determine the subnets, detach each interface from its host or router, creating islands of isolated networks, with interfaces terminating the end points of the isolated networks. Each of these isolated networks is called a subnet.

(1) Subnets and Subnet Masks:

  • Once the decision to subnet has been made, the local administrator must decide how many bits to allocate to the subnet ID;

  • A common choice is to split the 16-bit host ID of a class B address at the 8-bit boundary (8 bits of subnet ID, 8 bits of host ID);

  • A subnet mask is used to divide the local address into network and host portions;

  • Subnetting effectively hides the details of the internal network to external routers;

    (IP address) AND (subnet mask) = network address

    Default subnet masks: class A 255.0.0.0; class B 255.255.0.0; class C 255.255.255.0

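  • The subnet mask is an important attribute of a network or subnet:

    • When a router exchanges routing information with neighboring routers, it must tell them the subnet mask of its own network (or subnet);
    • Every entry in a router's routing table must give the subnet mask of the network in addition to the destination network address;
    • A router attached to two subnets has two network addresses and two subnet masks;
  • Example:

    • Given the IP address 141.14.72.24 and the subnet mask 255.255.192.0, find the network address: 72 AND 192 = 01001000 AND 11000000 = 01000000 = 64, so the network address is 141.14.64.0.

    • If the subnet mask is changed to 255.255.224.0, find the network address: 72 AND 224 = 01001000 AND 11100000 = 01000000 = 64, so the network address is again 141.14.64.0.

    • Different subnet masks can yield the same network address, but their effects differ: the two masks split the remaining bits differently between subnet ID and host ID.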
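
The bitwise AND in this example can be reproduced with the standard `ipaddress` module. This is a small sketch; the function name `network_address` is chosen for illustration.

```python
import ipaddress


def network_address(ip: str, mask: str) -> str:
    """Compute (IP address) AND (subnet mask) on the 32-bit integer values."""
    ip_int = int(ipaddress.IPv4Address(ip))
    mask_int = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(ip_int & mask_int))


first = network_address("141.14.72.24", "255.255.192.0")
second = network_address("141.14.72.24", "255.255.224.0")
```

Both calls give 141.14.64.0, which is the point of the example: the same network address can result from different masks.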

(2) Subnetting

  • Subnet Mask:

    • All bits that correspond to the network ID are set to 1;
    • All bits that correspond to the host ID are set to 0;
    • Frequently expressed in dotted decimal notation;
    • Shorthand way of expressing a subnet mask is to denote the number of bits that define the network ID as a network prefix using the network prefix notation: /<# of bits>;
  • Four Subnetting Steps:

    • To correctly subnet a given network address into subnet addresses, ask yourself the following questions:

      • How many bits do I need to borrow?

        • First, you need to know how many bits you have to work with;

        • Second, you must know either how many subnets you need or how many hosts per subnet you need;

        • Finally, you need to figure out the number of bits to borrow;

          Remember: under the classic rule you must borrow at least 2 bits for subnets and leave at least 2 bits for host addresses (borrowing 2 bits allows 2^2 - 2 = 2 subnets; two are subtracted to provide for the subnetwork and broadcast addresses);

        A simple formula:

        Host Bits = Bits Borrowed + Bits Left (HB = BB + BL)

        If you need x subnets, then:

        $2^{BB} - 2 \ge x$

        Likewise, if you need x hosts per subnet, then $2^{BL} - 2 \ge x$.

    • What’s the subnet mask?

      • Determine the subnet mask by adding up the decimal value of the bits we borrowed;

        11111111.11111111.11111111.11100000 = 255.255.255.224 (the three leading 1s of the last octet are the borrowed bits: 128 + 64 + 32 = 224)

    • What’s the “magic number” or multiplier?

      • Subtract the last non-zero octet from 256;

        256-224 = 32

      • Last non-zero octet:

        • Quickly calculate the last non-zero octet when given the number of bits borrowed;
        • Determine the number of bits borrowed given the last non-zero octet.
        • Determine the amount of bits left over for hosts and the number of host addresses available;
    • What are the first three subnetwork addresses?

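  • Subnetting example

    Suppose we have the network 192.168.0.0/24 and need two subnets. Under the classic rule (all-zeros and all-ones subnets excluded) we use /26 rather than /25, which gives two usable subnets: 192.168.0.64 and 192.168.0.128.

    For 192.168.0.0/24:

    network address 192.168.0.0, broadcast address 192.168.0.255

    For 192.168.0.0/26:

    network address 192.168.0.0, broadcast address 192.168.0.63 (not usable: its network address coincides with that of the /24, so this block is wasted)

    For 192.168.0.64/26:

    network address 192.168.0.64, broadcast address 192.168.0.127

    For 192.168.0.128/26:

    network address 192.168.0.128, broadcast address 192.168.0.191

    For 192.168.0.192/26:

    network address 192.168.0.192, broadcast address 192.168.0.255 (not usable: its broadcast address coincides with that of the /24)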
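
The four /26 blocks can be listed with the standard `ipaddress` module. This is a sketch under the classic rule stated in the text; the helper name `split` is an assumption, and dropping the first and last block is a convention of that rule rather than something the module enforces.

```python
import ipaddress


def split(parent: str, new_prefix: int) -> list:
    """Return (network address, broadcast address) for each subnet of parent."""
    return [
        (str(subnet.network_address), str(subnet.broadcast_address))
        for subnet in ipaddress.ip_network(parent).subnets(new_prefix=new_prefix)
    ]


blocks = split("192.168.0.0/24", 26)
# Classic rule from the text: drop the all-zeros and all-ones subnets.
usable = blocks[1:-1]
```

Here `usable` holds the 192.168.0.64 and 192.168.0.128 blocks, matching the two usable subnets named above.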


Slowing IP Address Depletion

  • Variable Length Subnet Masks (VLSM)

    Subnetting a subnet (re-subnetting)

  • Classless Interdomain Routing (CIDR): the Internet’s address assignment strategy
  • The subnet portion of an address can be of arbitrary length;

  • Format:a.b.c.d/x , where x is # bits in subnet portion of address

    • The x most significant bits of an address of the form a.b.c.d/x constitute the network portion of the IP address, and are often referred to as the prefix (or network prefix) of the address.
    • The remaining 32-x bits of an address can be thought of as distinguishing among the devices within the organization, all of which have the same network prefix
    • Other format:
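      • 10.0.0.0/10 can be written as 10/10 (the trailing zero octets of the dotted-decimal form are omitted)
      • 00001010 00* (the network prefix in binary; * stands for the remaining bits)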
  • Before CIDR was adopted, the network portions of an IP address were constrained to be 8, 16, or 24 bits in length, an addressing scheme known as classful addressing.

  • Address aggregation

    • Also called route aggregation, route summarization, or supernetting.

    • The ability to use a single prefix to advertise multiple networks.

    • Hierarchical addressing and route aggregation

      • When other routers forward packets, they apply the longest-prefix-matching rule to find the correct route;
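
Route aggregation can be illustrated with `ipaddress.collapse_addresses`, which merges adjacent prefixes into the shortest covering prefixes. The four /23 blocks below are hypothetical values chosen for this example.

```python
import ipaddress

# Four adjacent /23 blocks (illustrative addresses) owned by one provider.
blocks = [
    ipaddress.ip_network(f"200.23.{third}.0/23")
    for third in (16, 18, 20, 22)
]

# The provider can advertise one aggregate prefix instead of four.
aggregate = [str(net) for net in ipaddress.collapse_addresses(blocks)]
```

The four blocks collapse into the single prefix 200.23.16.0/21, so the outside world needs only one forwarding entry for all of them.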

  • Obtaining a Host Address: The Dynamic Host Configuration Protocol

Obtaining Addresses:

  • How does an ISP obtain a block of addresses?

IP addresses are managed under the authority of the Internet Corporation for Assigned Names and Numbers (ICANN).

The role of the nonprofit ICANN organization is not only to allocate IP addresses, but also to manage the DNS root servers. It also has the very contentious job of assigning domain names and resolving domain name disputes.

  • The Dynamic Host Configuration Protocol
  • A system administrator will typically manually configure the IP addresses into the router (Hard-coded by system admin in a file). Host addresses can also be configured manually, but typically this is done using the Dynamic Host Configuration Protocol (DHCP).

  • A network administrator can configure DHCP so that a given host receives the same IP address each time it connects to the network, or a host may be assigned a temporary IP address that will be different each time the host connects to the network.

  • Plug-and-play or zeroconf (zero-configuration) protocol:

    • Why? DHCP’s ability to automate the network-related aspects of connecting a host into a network.
  • DHCP is a client-server protocol:

3.4 Network Address Translation (NAT)
  • A private network (a realm with private addresses)
    • A realm with private addresses refers to a network whose addresses only have meaning to devices within that network.
  • Motivation: Local network uses just one IP address as far as outside world is concerned;
    • Range of addresses not needed from ISP: just one IP address for all devices;
    • Can change addresses of devices in local network without notifying outside world;
    • Can change ISP without changing addresses of devices in local network;
    • Devices inside local net not explicitly addressable, visible by outside world (a security plus);
  • NAT Implementation: NAT router must
    • Outgoing datagrams: replace (source IP address, port #) of every outgoing datagram with (NAT IP address, new port #); remote clients/servers will respond using (NAT IP address, new port #) as the destination address.
    • Remember (in NAT translation table) every (source IP address, port #) to (NAT IP address, new port #) translation pair;
    • Incoming datagrams: replace (NAT IP address, new port #) in dest fields of every incoming datagram with corresponding (source IP address, port #) stored in NAT table;
  • Network address translation:
    • 16-bit port-number field:

      • Up to $2^{16}$ (more than 60,000) simultaneous connections with a single WAN-side address;
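
The translation table described above can be sketched as a small class. Everything here is illustrative: the class name `NatRouter`, the sequential port allocation starting at 5001, and the addresses in the usage note are assumptions, and real NAT devices also track protocol, timeouts, and the remote endpoint.

```python
class NatRouter:
    """Toy NAT translation table keyed by (LAN IP, LAN port)."""

    def __init__(self, wan_ip: str, first_port: int = 5001):
        self.wan_ip = wan_ip
        self.next_port = first_port
        self.out_map = {}  # (lan_ip, lan_port) -> wan_port
        self.in_map = {}   # wan_port -> (lan_ip, lan_port)

    def outgoing(self, src_ip: str, src_port: int) -> tuple:
        """Rewrite the source of an outgoing datagram and remember the pair."""
        key = (src_ip, src_port)
        if key not in self.out_map:
            port = self.next_port
            self.next_port += 1
            self.out_map[key] = port
            self.in_map[port] = key
        return (self.wan_ip, self.out_map[key])

    def incoming(self, dst_port: int):
        """Map the destination port of an incoming datagram back to the LAN
        side, or return None when no translation entry exists."""
        return self.in_map.get(dst_port)
```

For example, after `nat = NatRouter("138.76.29.7")`, the call `nat.outgoing("10.0.0.1", 3345)` returns `("138.76.29.7", 5001)`, and `nat.incoming(5001)` maps the reply back to `("10.0.0.1", 3345)`.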
3.5 IPv6
  • 128-bit address, written in hexadecimal notation (eight 16-bit groups separated by colons)
    • 2001:0503:0C27:0000:0000:0000:0000:0000
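
As a quick illustration, Python's `ipaddress` module can show the same address in its full and shortened written forms (leading zeros dropped, and one run of all-zero groups replaced by `::`). The variable names are chosen for this example.

```python
import ipaddress

addr = ipaddress.IPv6Address("2001:0503:0C27:0000:0000:0000:0000:0000")

short_form = addr.compressed  # shortest standard textual form
full_form = addr.exploded     # all eight groups, four hex digits each
bit_length = addr.max_prefixlen
```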
4. ICMP: The Internet Control Message Protocol
  • Used by hosts & routers to communicate network-level information;

    • Error reporting: unreachable host, network, port, protocol;
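    • Echo request/reply (used by ping);
      • Ping:
        • Main purpose: test connectivity between two hosts;
        • Uses ICMP echo request and echo reply messages;
        • Ping is an example of an application using network-layer ICMP directly, without going through transport-layer TCP or UDP;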
  • Network-layer “above” IP:

    • ICMP messages carried in IP datagrams;
  • ICMP message: Type, Code, plus first 8 bytes of IP datagram causing error;

  • ICMP message format:

  • Traceroute and ICMP:

    • Source sends series of UDP segments to dest:
      • First has TTL =1
      • Second has TTL=2, etc.
      • Unlikely port number
    • When the nth datagram arrives at the nth router:
      • The router discards the datagram and sends the source an ICMP message (type 11, code 0) that includes the router's name and IP address
    • When ICMP message arrives, source calculates RTT;
    • Traceroute does this 3 times for each TTL to compute avg;

    Stopping criterion:

    • UDP segment eventually arrives at destination host;
    • Destination returns ICMP “port unreachable” packet (type 3, code 3);
    • When source gets this ICMP, stops;
5. Routing algorithms

Routing Algorithm classification:

  • Global or decentralized information?
    • Global:
      • All routers have complete topology and link cost info;
      • “link state” algorithms;
    • Decentralized:
      • Router knows physically-connected neighbors, link costs to neighbors;
      • Iterative process of computation, exchange of info with neighbors;
      • “distance vector” algorithms;
  • Static or dynamic?
    • Static:
      • Routes change slowly over time;
    • Dynamic:
      • Routes change more quickly;
        • Periodic update (time-triggered);
        • In response to link cost changes (event-triggered);
  • Link state:

    • Dijkstra’s algorithm

      • Net topology, link costs known to all nodes
        • Accomplished via “link state broadcast”;
        • All nodes have same info;
      • Computes least cost paths from one node (“source”) to all other nodes
        • Gives forwarding table for that node;
      • Iterative: after k iterations, know least cost path to k dest.’s;
      • Notation:
        • c(x,y): link cost from node x to y; = ∞ if not direct neighbors
        • D(v): current value of cost of path from source to dest. v
        • p(v): predecessor node along path from source to v
        • N’: set of nodes whose least cost path definitively known

      Initialization:
        N’ = {u}
        for all nodes v
          if v adjacent to u
            then D(v) = c(u,v)
          else D(v) = ∞

      Loop
        find w not in N’ such that D(w) is a minimum
        add w to N’
        update D(v) for all v adjacent to w and not in N’:
          D(v) = min( D(v), D(w) + c(w,v) )
        /* new cost to v is either old cost to v or known
           shortest path cost to w plus cost from w to v */
      until all nodes in N’
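
The pseudocode above translates almost line for line into Python. The sketch below keeps the same names (D, p, N') and uses a linear scan to pick the minimum, as the pseudocode does; the six-node graph is a hypothetical example, and the adjacency-dict representation is an assumption of this sketch.

```python
def dijkstra(graph: dict, source: str):
    """Link-state shortest paths; graph[x][y] is the cost c(x, y)."""
    INF = float("inf")
    # Initialization: D(v) = c(u, v) for neighbors of the source, else infinity.
    D = {v: graph[source].get(v, INF) for v in graph}
    D[source] = 0
    p = {v: (source if v in graph[source] else None) for v in graph}
    n_prime = {source}
    while len(n_prime) < len(graph):
        # Find w not in N' such that D(w) is a minimum.
        w = min((v for v in graph if v not in n_prime), key=lambda v: D[v])
        n_prime.add(w)
        # Update D(v) for all v adjacent to w and not in N'.
        for v, cost in graph[w].items():
            if v not in n_prime and D[w] + cost < D[v]:
                D[v] = D[w] + cost
                p[v] = w
    return D, p


# Hypothetical undirected graph with symmetric link costs.
graph = {
    "u": {"v": 2, "x": 1, "w": 5},
    "v": {"u": 2, "x": 2, "w": 3},
    "w": {"u": 5, "v": 3, "x": 3, "y": 1, "z": 5},
    "x": {"u": 1, "v": 2, "w": 3, "y": 1},
    "y": {"x": 1, "w": 1, "z": 2},
    "z": {"w": 5, "y": 2},
}
D, p = dijkstra(graph, "u")
```

Following the predecessors in `p` back from a destination gives the least-cost path, and its first hop is the forwarding-table entry for that destination at the source.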

    • Discussion:

      • Algorithm complexity: with n nodes, $O(n^2)$
      • Oscillations possible:
        • e.g., link cost = amount of carried traffic;
  • Distance Vector

    • Bellman-Ford equation (dynamic programming)

      • Define:
        • $d_x(y)$ := cost of the least-cost path from x to y
        • Then $d_x(y) = \min_v \{ c(x,v) + d_v(y) \}$, where the minimum is taken over all neighbors v of x
    • Distance Vector Algorithm:

      • $D_x(y)$: estimate of the least cost from x to y;
      • Distance vector: $D_x = [D_x(y) : y \in N]$;
      • Node x knows the cost to each neighbor v: c(x,v);
      • Node x maintains $D_x = [D_x(y) : y \in N]$;
      • Node x also maintains its neighbors’ distance vectors:
        • For each neighbor v, x maintains $D_v = [D_v(y) : y \in N]$;
    • Basic idea:

      • Each node periodically sends its own distance vector estimate to neighbors;

      • When a node x receives new DV estimate from neighbor, it updates its own DV using B-F equation:

        $D_x(y) \leftarrow \min_v \{ c(x,v) + D_v(y) \}$ for each node $y \in N$

      • Under minor, natural conditions, the estimate $D_x(y)$ converges to the actual least cost $d_x(y)$.
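
One local iteration of the update rule can be sketched as follows. The function name `dv_update` and the three-node example are assumptions for illustration; a full implementation would also send the new vector to the neighbors whenever it changes.

```python
def dv_update(x: str, costs: dict, neighbor_vectors: dict, nodes: list) -> dict:
    """Recompute D_x from the neighbors' vectors using the Bellman-Ford equation.

    costs[v] is c(x, v) for each neighbor v; neighbor_vectors[v][y] is D_v(y).
    """
    INF = float("inf")
    new_vector = {}
    for y in nodes:
        if y == x:
            new_vector[y] = 0
        else:
            new_vector[y] = min(
                (costs[v] + neighbor_vectors[v].get(y, INF) for v in costs),
                default=INF,
            )
    return new_vector


# Hypothetical three-node network: c(x,y) = 2, c(x,z) = 7, c(y,z) = 1.
D_x = dv_update(
    "x",
    costs={"y": 2, "z": 7},
    neighbor_vectors={"y": {"x": 2, "y": 0, "z": 1}, "z": {"x": 7, "y": 1, "z": 0}},
    nodes=["x", "y", "z"],
)
```

Node x learns that reaching z through y costs 2 + 1 = 3, which beats the direct link of cost 7.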

    Iterative, asynchronous: each local iteration caused by:

    • local link cost change;

    • DV update message from neighbor;

    Distributed:

    • Each node notifies neighbors only when its DV changes;
      • Neighbors then notify their neighbors if necessary;

    • Link cost changes:

      • Node detects local link cost change;

      • Updates routing info, recalculates distance vector;

      • If DV changes, notify neighbors;

        Good news (smaller link cost) travels fast;

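        Bad news travels slowly: the “count to infinity” problem;

        See textbook for details

        Poisoned reverse: if z routes through y to reach x, z tells y that its distance to x is infinite, so y will not route to x via z


    Comparison of LS and DV algorithms: omitted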


  • Hierarchical Routing

    • Aggregate routers into regions, “autonomous systems” (AS)
    • Routers in same AS run same routing protocol;
      • “intra-AS” routing protocol;
      • routers in different AS can run different intra-AS routing protocol;
    • Gateway router
      • Direct link to router in another AS
    • Interconnected ASes:
      • Forwarding table is configured by both intra- and inter-AS routing algorithms
        • Intra-AS sets entries for internal dests;
        • Inter-AS & Intra-AS set entries for external dests;
    • Inter-AS tasks:
      • learn reachability info from routers in other ASes;
      • propagate this reachability info to all routers in its own AS;
      • choosing among multiple ASes;
        • Hot potato routing: choose the gateway that has the smallest least cost;
6. Intra-AS Routing in the Internet

Intra-AS routing protocols are also known as Interior Gateway Protocols (IGP)

Most common Intra-AS routing protocols:

  • RIP: Routing Information Protocol;

  • OSPF: Open Shortest Path First;

  • IGRP: Interior Gateway Routing Protocol (Cisco proprietary);

  • RIP:

    • Distance vector algorithm;

    • Included in BSD-UNIX Distribution in 1982;

    • Distance metric: # of hops (max = 15 hops);

    • RIP advertisements (RIP通告)

      • Distance vectors: exchanged among neighbors every 30 sec via a Response Message (also called an advertisement);
      • Each advertisement: list of up to 25 destination nets within AS;
    • Update:

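      When a RIP message arrives from a neighboring router with address X:

      (1) First modify every entry in the message: set the “next hop” field to X and add 1 to every “distance” field;

      (2) For each entry in the modified message, repeat the following steps:

      If the destination network is not in the routing table, add the entry to the routing table;
      Otherwise (the destination network is already in the routing table):
        If the next hop of the table entry is also X, replace the table entry with the received entry;
        Otherwise (the next hop is a different router):
          If the received distance is smaller than the distance in the routing table, update the entry;
          Otherwise, do nothing.

      (3) If no routing update arrives from a neighboring router within 3 minutes, mark that neighbor as unreachable by setting the distance to 16 (a distance of 16 means unreachable);

      (4) Return;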
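Steps (1) and (2) above can be sketched as follows. This is an illustrative sketch, not a RIP implementation: the table layout `{destination: (distance, next_hop)}` and the name `rip_process` are assumptions, and capping the distance at 16 is an added safeguard that the steps above do not spell out. The 3-minute timeout of step (3) is left out.

```python
RIP_INFINITY = 16  # a distance of 16 means unreachable

def rip_process(table, sender, advertised):
    """Merge an advertisement from neighbor `sender` into the routing table.

    table:      {destination: (distance, next_hop)}, modified in place
    advertised: {destination: distance as seen by the sender}
    """
    for dest, dist in advertised.items():
        # Step (1): next hop becomes the sender, distance grows by one hop.
        new_dist = min(dist + 1, RIP_INFINITY)
        current = table.get(dest)
        if current is None:
            table[dest] = (new_dist, sender)   # unknown destination: add it
        elif current[1] == sender:
            table[dest] = (new_dist, sender)   # same next hop: always replace
        elif new_dist < current[0]:
            table[dest] = (new_dist, sender)   # different next hop: only if shorter
    return table

# Hypothetical table and advertisement from neighbor C.
table = {"N1": (7, "A"), "N2": (2, "C"), "N6": (8, "F"), "N8": (4, "E"), "N9": (4, "F")}
rip_process(table, "C", {"N2": 4, "N3": 8, "N6": 4, "N8": 3, "N9": 5})
```

Replacing the entry when the next hop is unchanged matters even if the new distance is larger: the sender's view is the most recent information about that path.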


    • Link Failure and Recovery:

      If no advertisement is heard after 180 sec, the neighbor/link is declared dead:

      • routes via neighbor invalidated;

      • new advertisements sent to neighbors;

      • neighbors in turn send out new advertisements (if tables changed);

      • link failure info quickly propagates to entire net;

      • poison reverse is used to prevent ping-pong loops (infinite distance = 16 hops);

    • RIP Table processing

      • RIP routing tables managed by application-level process called route-d (daemon);

      • Advertisements sent in UDP packets, periodically repeated;

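    • RIP2 message format:

      • The routing part of a RIP2 message consists of a number of route entries, each of which takes 20 bytes;
      • The address family identifier (also called the address class) field indicates which address protocol is in use;
      • The route tag field carries the autonomous system number (ASN), which allows for the possibility that RIP receives routing information from outside its own AS; the remaining fields give a network address, that network's subnet mask, the next-hop router address, and the distance to that network.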
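The 20-byte route entry described above can be sketched with Python's `struct` module. The field widths follow the RIP2 entry layout (2-byte address family identifier, 2-byte route tag, then four 4-byte fields); the helper names `pack_entry` and `unpack_entry` and the sample addresses are assumptions for illustration.

```python
import socket
import struct

# Network byte order: AFI (2), route tag (2), address (4), mask (4), next hop (4), metric (4)
RIP2_ENTRY = struct.Struct("!HH4s4s4sI")

def pack_entry(afi, route_tag, network, mask, next_hop, metric):
    """Encode one RIP2 route entry as 20 bytes."""
    return RIP2_ENTRY.pack(
        afi,
        route_tag,
        socket.inet_aton(network),
        socket.inet_aton(mask),
        socket.inet_aton(next_hop),
        metric,
    )

def unpack_entry(data):
    """Decode 20 bytes back into readable fields."""
    afi, tag, net, mask, hop, metric = RIP2_ENTRY.unpack(data)
    return (afi, tag, socket.inet_ntoa(net), socket.inet_ntoa(mask),
            socket.inet_ntoa(hop), metric)

# Address family 2 denotes IP; the tag and addresses are made-up sample values.
entry = pack_entry(2, 65001, "10.1.0.0", "255.255.0.0", "192.0.2.1", 3)
fields = unpack_entry(entry)
```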
  • OSPF

    • “open”: publicly available

    • Uses Link State algorithm

      • LS packet dissemination;
      • Topology map at each node;
      • Route computation using Dijkstra’s algorithm;
    • OSPF advertisement carries one entry per neighbor router;

    • Advertisements are disseminated to the entire AS (via flooding);

      • Carried in OSPF messages directly over IP (rather than TCP or UDP);

      • OSPF uses reliable flooding;

    • OSPF “advanced” features (not in RIP):

      • Security: all OSPF messages are authenticated (to prevent malicious intrusion);

      • Multiple same-cost paths allowed (only one path in RIP);

      • For each link, multiple cost metrics for different TOS (type of service);

        • e.g., satellite link cost set “low” for best effort, “high” for real time;
      • Integrated unicast and multicast support:

        • Multicast OSPF (MOSPF) uses the same topology database as OSPF;
      • Hierarchical OSPF in large domains:

        • Two-level hierarchy: local area and backbone:
          • Link-state advertisements are flooded only within an area;
          • Each node has the detailed topology of its own area and knows only the direction (shortest path) to nets in other areas;
        • Area border routers:
          • “summarize” distances to nets in their own area and advertise them to other area border routers;
        • Backbone routers: run OSPF routing limited to the backbone;
        • Boundary routers: connect to other ASes;
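The route computation step that OSPF performs on its topology map can be sketched with a heap-based version of Dijkstra's algorithm. This is a generic sketch on a small hypothetical topology, not OSPF's actual data structures; the adjacency-dictionary layout and the name `dijkstra` are assumptions.

```python
import heapq

def dijkstra(graph, source):
    """Least-cost paths from `source` over a link-state topology map.

    graph: {node: {neighbor: link cost}}
    Returns (dist, prev), where prev[v] is the predecessor of v on its path.
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev

# Hypothetical six-router topology with symmetric link costs.
topology = {
    "u": {"v": 2, "x": 1, "w": 5},
    "v": {"u": 2, "x": 2, "w": 3},
    "w": {"u": 5, "v": 3, "x": 3, "y": 1, "z": 5},
    "x": {"u": 1, "v": 2, "w": 3, "y": 1},
    "y": {"x": 1, "w": 1, "z": 2},
    "z": {"w": 5, "y": 2},
}
dist, prev = dijkstra(topology, "u")
```

Following `prev` back from a destination gives the path, and the first hop on that path is what a router would install in its forwarding table.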
7. Inter-AS Routing in the Internet: BGP
  • BGP (Border Gateway Protocol): the de facto standard inter-AS routing protocol;

  • BGP provides each AS a means to:

    1. Obtain subnet reachability information from neighboring ASes;
    2. Propagate the reachability information to all routers internal to the AS;
    3. Determine “good” routes to subnets based on reachability information and policy;
  • Allows a subnet to advertise its existence to the rest of the Internet: “I am here”;

  • BGP basics:

    • Pairs of routers (BGP peers) exchange routing info over semi-permanent TCP connections called BGP sessions (eBGP sessions between ASes, iBGP sessions within an AS);
    • Note that BGP sessions do not correspond to physical links;
    • An AS can aggregate prefixes in its advertisements;
  • BGP messages

    • Exchanged using TCP;
    • Type:
      • OPEN: opens TCP connection to peer and authenticates sender;
      • UPDATE: advertises new path (or withdraws old);
      • KEEPALIVE: keeps connection alive in absence of UPDATES; also ACKs OPEN request;
      • NOTIFICATION: reports errors in previous msg; also used to close connection;
  • Routing policy (路由策略)

    • A, B, C are provider networks;

    • X, W, Y are customers (of provider networks);

    • X is dual-homed: attached to two networks:

      • X does not want to route from B via X to C;
      • … so X will not advertise to B a route to C;
    • A advertises to B the path AW;

    • B advertises to X the path BAW;

    • Should B advertise to C the path BAW?

      • No way! B gets no “revenue” for routing CBAW since neither W nor C is B’s customer;
      • B wants to force C to route to W via A;
      • B wants to route only to/from its customers!
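The export decisions in this example follow one rule of thumb: an AS advertises a route only if it was learned from a customer or is being advertised to a customer. The sketch below encodes that rule. It is a simplification of real BGP policy, and the name `should_advertise` is hypothetical.

```python
def should_advertise(customers, learned_from, advertise_to):
    """Export a route only if traffic on it would be to or from a customer.

    customers:    set of neighboring ASes that are customers of this AS
    learned_from: neighbor that announced the route to this AS
    advertise_to: neighbor this AS is considering announcing the route to
    """
    return learned_from in customers or advertise_to in customers

# B's only customer is X; B learned the path AW from A.
b_to_x = should_advertise({"X"}, learned_from="A", advertise_to="X")
b_to_c = should_advertise({"X"}, learned_from="A", advertise_to="C")

# X is a dual-homed customer with no customers of its own,
# so it does not pass a route learned from C on to B.
x_to_b = should_advertise(set(), learned_from="C", advertise_to="B")
```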

Intra-AS: can focus on performance;

Inter-AS: policy may dominate over performance;


8. Broadcast and multicast routing
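  • Basic concept of IP multicast:

    • Multicast can markedly reduce the consumption of network resources;
  • Characteristics of IP multicast:

    • Multicast uses group addresses: IP uses class D addresses to support multicast:
      • A multicast address can be used only as a destination address, never as a source address;
    • Permanent group addresses are assigned by the Internet Assigned Numbers Authority (IANA);
    • Group membership is dynamic (a host can belong to multiple multicast groups);
    • Multicast is carried out using hardware;
  • IP multicast requires two kinds of protocols:

    • For routers to learn multicast group membership, the Internet Group Management Protocol (IGMP) is needed;
      • IGMP has local scope:
        • IGMP is not a protocol that manages all multicast group members across the Internet;
        • IGMP does not know how many members an IP multicast group has, nor which networks those members are spread across;
        • IGMP lets a multicast router attached to a local area network know whether any host on that LAN (strictly speaking, a process on a host) has joined or left a multicast group;
    • A multicast router attached to a LAN must also cooperate with other multicast routers on the Internet to deliver multicast datagrams to all group members at minimum cost, which requires a multicast routing protocol;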
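The class D rule above is easy to check in code. The sketch below uses Python's standard `ipaddress` module and the class D range 224.0.0.0/4; the function names are hypothetical.

```python
import ipaddress

CLASS_D = ipaddress.ip_network("224.0.0.0/4")  # 224.0.0.0 through 239.255.255.255

def is_class_d(addr):
    """True if `addr` is an IPv4 multicast (class D) address."""
    return ipaddress.ip_address(addr) in CLASS_D

def valid_multicast_datagram(src, dst):
    """A multicast datagram has a class D destination and a non-multicast source."""
    return is_class_d(dst) and not is_class_d(src)

ok = valid_multicast_datagram("192.168.1.10", "224.0.0.1")
bad = valid_multicast_datagram("224.0.0.1", "224.0.0.2")
```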

End of Chapter 3
