4. Network Layer

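Data plane:
local, per-router function (activity within a single router); determines how a datagram arriving on a router input port is forwarded to a router output port (the forwarding function)
Control plane:
network-wide logic (activity among routers)
determines how a datagram is routed from source to destination (and therefore which output port each router should choose)
two approaches: traditional routing algorithms implemented in routers, and SDN (software-defined networking) implemented in remote servers
Key network performance metrics: delay, loss rate, bandwidth, reliability
Network service models: the ATM (Asynchronous Transfer Mode) model and the Internet model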

4.1 Router Organization (TCAM-based)

Input port:

line termination (physical layer):

  • bit-level reception

link layer protocol (receive) (data link layer processing):

  • e.g. Ethernet

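lookup, forwarding, queuing:

  • destination-based forwarding: based only on destination IP address (traditional); uses longest prefix matching (the output port is chosen from the address prefix, which saves work), often performed in TCAM (Ternary Content-Addressable Memory)
  • generalized forwarding: based on any set of header field values

Longest prefix matching: when looking for the forwarding table entry for a given destination address, use the longest address prefix that matches the destination address.
Why longest-prefix rather than fixed-length matching: the most specific entry sends the datagram as directly as possible, while shorter prefixes cover many destinations with one entry, so lookups stay fast and the table stays reasonably simple.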
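Longest prefix matching can be illustrated with a small lookup routine. This is only a sketch: the forwarding table entries and port numbers below are made up for illustration, and a real router does this in hardware (e.g. TCAM) rather than with a linear scan.

```python
import ipaddress

# Hypothetical forwarding table: (prefix, output port). All values are invented.
FORWARDING_TABLE = [
    (ipaddress.ip_network("200.23.16.0/21"), 0),
    (ipaddress.ip_network("200.23.24.0/24"), 1),
    (ipaddress.ip_network("200.23.24.0/21"), 2),
    (ipaddress.ip_network("0.0.0.0/0"), 3),  # default route, matches everything
]


def lookup(dst: str) -> int:
    """Return the output port of the longest prefix that contains dst."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, port) for net, port in FORWARDING_TABLE if addr in net]
    # Several prefixes may match; the one with the most prefix bits wins.
    _, port = max(matches, key=lambda m: m[0].prefixlen)
    return port


print(lookup("200.23.24.170"))  # matched by the /24, the /21 and the default route
print(lookup("200.23.25.1"))    # matched by 200.23.24.0/21 and the default route
```

The first address matches three entries, and the /24 entry is chosen because it is the most specific one.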

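switch fabric:

  • Switching via memory: speed limited by memory bandwidth (2 bus crossings per datagram)

  • Switching via a bus: switching speed limited by bus bandwidth

  • Switching via interconnection network: overcomes bus bandwidth limitations

input port queuing:

  • output port contention: several datagrams are to be transferred to the same output port at the same time, so only one can cross the fabric and the others wait in the input queues
  • fabric slower than the input ports combined: datagrams arrive at the input ports faster than the fabric can switch them, so they queue at the input; a datagram at the head of the queue blocks those behind it (head-of-the-line blocking)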

Output port:

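The structure is largely the same as the input port, except that the output port does not perform lookup and forwarding.
buffering is required when datagrams arrive from the fabric faster than the transmission rate (buffering prevents loss during short periods in which the switching rate exceeds the output rate; datagrams are still lost once the backlog exceeds the buffer)

buffer size:

  • RFC 3439 rule of thumb: average buffering equal to a "typical" RTT (e.g. 250 ms) times the link capacity C
  • more recent recommendation: with N flows and link capacity C, buffering equal to RTT * C / sqrt(N)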
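A quick calculation shows how much the number of flows changes the recommended buffer size. The sketch below assumes the two sizing rules commonly quoted with RFC 3439 (RTT times C, and RTT times C divided by sqrt(N)); the function name and the example link parameters are invented for illustration.

```python
import math


def buffer_bits(rtt_s: float, capacity_bps: float, n_flows: int = 1) -> float:
    """Buffer size in bits: RTT * C / sqrt(N). With N = 1 this is the plain RTT * C rule."""
    return rtt_s * capacity_bps / math.sqrt(n_flows)


# Hypothetical link: 250 ms RTT, 10 Gbps capacity.
classic = buffer_bits(0.25, 10e9)           # RTT * C
shared = buffer_bits(0.25, 10e9, 10_000)    # same link shared by 10,000 flows
print(f"RTT*C rule: {classic / 8e6:.1f} MB")
print(f"with N flows: {shared / 8e6:.3f} MB")
```

With many flows sharing the link, the second rule asks for a buffer that is smaller by a factor of sqrt(N).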

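output port queuing and scheduling mechanisms:

  • FIFO (first in, first out) scheduling: when the buffer is full, the discard policy may be tail drop, priority drop, or random drop

  • priority scheduling: the highest-priority non-empty queue is always served first

  • Round Robin (RR) scheduling: packets are sorted into classes and the scheduler serves the classes cyclically, sending one packet from each in turn

  • Weighted Fair Queuing (WFQ): generalized round robin; the amount of service each class receives per cycle is proportional to its weight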
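The round-robin idea can be sketched in a few lines. Note the assumption: this is weighted round robin counted in packets, which only approximates WFQ (real WFQ accounts for packet sizes); the function name and packet labels are hypothetical, and weights are assumed to be positive integers.

```python
from collections import deque


def weighted_round_robin(queues, weights):
    """Return the send order when class i may send up to weights[i] packets per cycle.

    With all weights equal to 1 this is plain round robin.
    """
    queues = [deque(q) for q in queues]
    order = []
    while any(queues):  # a deque is truthy while it still holds packets
        for q, w in zip(queues, weights):
            for _ in range(w):
                if q:
                    order.append(q.popleft())
    return order


print(weighted_round_robin([["a1", "a2", "a3", "a4"], ["b1", "b2"]], [2, 1]))
print(weighted_round_robin([["a1", "a2", "a3", "a4"], ["b1", "b2"]], [1, 1]))
```

In the first call class "a" gets two transmissions for every one of class "b", matching the 2:1 weight ratio.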

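4.2 IP (Internet Protocol)

Besides data, the network also carries a lot of control traffic and retransmitted packets; a control message may live for at most 15 hops
Arrival and departure rates must match, otherwise queues build up and packets are dropped
Topics: IP datagram format + transmission rules (conventions) + addressing + ICMP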

IPv4

IP packet format:

Packet:

Overhead of one IP datagram carrying TCP: 20 bytes of TCP header + 20 bytes of IP header + application-layer header
TTL (Time to live):
  • max number of remaining hops
  • decremented at each router (by one per router traversed)
  • TTL = 0: datagram is discarded
  • can be used to count hops to destination

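IP Fragmentation & Reassembly:

MTU (Maximum Transmission Unit): the largest packet a link can carry; different link types have different MTUs
MTU values of common links: Ethernet: 1500 bytes; P2P: 4470 bytes
Theoretical maximum: 65535 bytes
Theoretical minimum: 68 bytes
The purpose of fragmentation and reassembly is to accommodate the different MTUs of different link types
  • large IP datagram divided (fragmented) into several datagrams within net
  • reassembled only at final destination
  • IP header bits used to identify, order related fragments

The flag in the IP header marks whether this fragment is the last one: flag = 1 means it is not the last fragment, flag = 0 means it is the last fragment
The offset in the IP header gives the starting position of this fragment's data within the complete data, in units of 8 bytes: offset = (data bytes carried by all preceding fragments) / 8. When the preceding fragments are all full-size with 20-byte headers, each of them contributes (length field - 20) / 8 to the offset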
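The length, offset, and flag values of each fragment can be worked out mechanically. The following is a minimal sketch assuming a fixed 20-byte header with no options; the function name and the tuple layout are choices made for this example.

```python
def fragment(total_length: int, mtu: int, header_len: int = 20):
    """Split a datagram of total_length bytes (header included) for a link with the given MTU.

    Returns a list of (length, offset, flag) tuples, where offset is in 8-byte units
    and flag is 1 for every fragment except the last.
    """
    payload = total_length - header_len
    # Every fragment except the last must carry a multiple of 8 data bytes.
    max_data = (mtu - header_len) // 8 * 8
    fragments = []
    sent = 0
    while sent < payload:
        data = min(max_data, payload - sent)
        flag = 1 if sent + data < payload else 0
        fragments.append((data + header_len, sent // 8, flag))
        sent += data
    return fragments


for length, offset, flag in fragment(4000, 1500):
    print(f"length={length} offset={offset} flag={flag}")
```

For a 4000-byte datagram on a 1500-byte MTU link, each full fragment carries 1480 data bytes, so the offsets advance in steps of 1480 / 8 = 185.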

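addressing:

dotted-decimal notation
address allocation
  • IANA: Internet Assigned Numbers Authority
  • ICANN: Internet Corporation for Assigned Names and Numbers
  • ASO (Address Supporting Organization)
classful addressing

All-zeros and all-ones network numbers and host numbers are reserved for special purposes
Network number all 0s:
Network number all 1s:
Host number all 0s: the address of the local network itself
Host number all 1s: broadcast address
Special IP Address

network mask (subnet mask, used to extract the network number)

IP address & network mask => network ID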
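The mask operation is a bitwise AND on the 32-bit address. A minimal sketch using Python's standard ipaddress module (the function name and example addresses are made up):

```python
import ipaddress


def network_id(ip: str, mask: str) -> str:
    """Return the network ID obtained by ANDing an IPv4 address with its mask."""
    ip_int = int(ipaddress.IPv4Address(ip))
    mask_int = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(ip_int & mask_int))


print(network_id("223.1.1.130", "255.255.255.0"))
print(network_id("192.168.5.77", "255.255.255.192"))
```

In the second call the last octet is 77 (01001101) and the mask octet is 192 (11000000), so only the top two bits survive and the result ends in 64.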

interface
  • routers typically have multiple interfaces
  • a host typically has one interface
  • an IP address is associated with each interface

CIDR (Classless Inter-Domain Routing): classless addressing
Addresses the overly coarse granularity of the class A-E scheme
subnets
An IP address can be divided into a subnet part (high-order bits) and a host part (low-order bits)
A subnet is the set of device interfaces that:
  • have the same subnet part of the IP address
  • can physically reach each other without an intervening router
DHCP (Dynamic Host Configuration Protocol)
plug-and-play; runs over UDP
function

allows a host to dynamically obtain its IP address from a network server when it joins the network

steps (note which addresses are used at each step)
  • host broadcasts “DHCP discover” msg (is there a DHCP server?)
  • DHCP server responds with “DHCP offer” msg (yes, and here is an address you may use)
  • host requests IP address: “DHCP request” msg (I will use this address)
  • DHCP server sends address: “DHCP ack” msg (confirmed)

NAT (Network Address Translation)
goal

local network uses just one IP address as far as outside world is concerned (the subnet shares one public IP address, hosts inside use private IP addresses, and addresses are translated when traffic enters or leaves the subnet)

private address ranges:
10.0.0.0 ----- 10.255.255.255
172.16.0.0 ----- 172.31.255.255
192.168.0.0 ----- 192.168.255.255
implementation
  • outgoing datagrams: replace (source IP address, port #) of every outgoing datagram with (NAT IP address, new port #) (the internal sender's IP address and port are rewritten to the NAT's IP address and a NAT port)
  • incoming datagrams: replace (NAT IP address, new port #) in destination fields of every incoming datagram with corresponding (source IP address, port #) stored in NAT table (the NAT's IP address and port in the destination fields are rewritten back to the internal IP address and port)
  • remember (in NAT translation table) every (source IP address, port #) to (NAT IP address, new port #) translation pair
In essence, NAT port numbers stand in for the (host IP address, port) pairs inside the subnet
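The translation table can be sketched as two dictionaries, one per direction. This is an illustrative toy, not how any particular NAT is implemented: the class name, method names, starting port, and addresses are all assumptions, and entries never expire here.

```python
import itertools


class Nat:
    """Toy NAT: maps (private IP, port) pairs to ports on one public address."""

    def __init__(self, public_ip: str, first_port: int = 5001):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)  # next free public port
        self.out = {}   # (private IP, port) -> public port
        self.back = {}  # public port -> (private IP, port)

    def outgoing(self, src_ip: str, src_port: int):
        """Rewrite the source of an outgoing datagram, creating a table entry if needed."""
        key = (src_ip, src_port)
        if key not in self.out:
            port = next(self._ports)
            self.out[key] = port
            self.back[port] = key
        return self.public_ip, self.out[key]

    def incoming(self, dst_port: int):
        """Return the internal (IP, port) for an incoming datagram, or None if unmapped."""
        return self.back.get(dst_port)


nat = Nat("138.76.29.7")
print(nat.outgoing("10.0.0.1", 3345))
print(nat.incoming(5001))
```

A statically configured port forward would simply be an entry placed in both dictionaries ahead of time.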

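controversial
  • routers should only process up to layer 3 (a router should handle network-layer functions only; NAT blurs that responsibility)
  • violates end-to-end argument (NAT acts as an intermediary, so for example P2P application designers have to take NAT into account)
  • address shortage should instead be solved by IPv6
  • port numbers should address processes, not hosts
problem & its solution
  • client wants to connect to server with address 10.0.0.1 (an outside host needs to open a connection to one specific host inside the subnet, but all inside hosts share a single public IP address)
  • statically configure NAT to forward incoming connection requests at given port to server (a static NAT table entry: the NAT port that corresponds to the inside host is fixed)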
ICMP(Internet Control Message Protocol)

function

used by hosts & routers to communicate network-level information (mainly error reporting)

Error Report
  • Destination Unreachable: 3
  • Time Exceeded (TTL = 0): 11
  • Parameter Problem: 12
Data Control
  • Source Quench: 4
  • Redirect: 5 (D-R1-R2-S,S-R2S-R1)
Query (request/response)
  • Echo request/reply: 8/0
  • Timestamp request/reply: 13/14
  • Router solicitation/advertisement: 10/9
  • Address Mask request/reply: 17/18

format

ICMP messages are carried in IP datagrams (in the data area of the IP datagram)
traceroute (traces the route, reporting the address of each router on the path and the delay to it)
implementation

Source sends a series of UDP segments to the destination; the first has TTL = 1, the second TTL = 2, and so on

When the nth datagram arrives at the nth router: the router discards the datagram and sends the source an ICMP message (type 11, code 0); the message includes the router's name and IP address

When the ICMP message arrives, the source calculates the RTT

Traceroute does this 3 times for each TTL value
stopping criteria
  • UDP segment eventually arrives at the destination host
  • destination returns an ICMP "port unreachable" message (type 3, code 3); when the source gets this ICMP message, it stops
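The TTL mechanics behind traceroute can be simulated without touching the network. The sketch below is a pure simulation under simplifying assumptions: the path is a hypothetical list of node names whose last element is the destination host, and no real sockets, timing, or repeated probes are involved.

```python
def simulate_traceroute(path, max_ttl=30):
    """Simulate probes with TTL = 1, 2, ... along path (last element is the destination).

    Returns a list of (ttl, responding node, ICMP reply) tuples.
    """
    hops = []
    last = len(path) - 1
    for ttl in range(1, max_ttl + 1):
        remaining = ttl
        for i, node in enumerate(path):
            if i == last:
                # Probe reached the destination: the unused UDP port triggers "port unreachable".
                hops.append((ttl, node, "ICMP type 3 code 3"))
                return hops
            remaining -= 1  # each router decrements the TTL by one
            if remaining == 0:
                # TTL expired at this router: it drops the probe and reports back.
                hops.append((ttl, node, "ICMP type 11 code 0"))
                break
    return hops


for hop in simulate_traceroute(["R1", "R2", "H"]):
    print(hop)
```

Each additional unit of TTL lets the probe travel one router further, which is why the replies arrive in path order.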

IPv6

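Motivation: solves the address shortage

format

IPv6 datagrams may not be fragmented by intermediate routers

changes

  • Checksum: removed entirely to reduce processing time at each hop
  • Options: allowed, but outside of header, indicated by Next Header field
  • ICMPv6: new version of ICMP

transition from IPv4 to IPv6

Tunneling: an IPv4 header is wrapped around the IPv6 datagram, so the IPv6 datagram travels as the payload of an IPv4 datagram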

OpenFlow

Matches on header fields from three protocol layers and applies the corresponding actions
Consists of OpenFlow switches and a controller (control server)

match

header fields, including Link layer, Network layer, Transport layer

operation

  • Forward packet to port(s) (as a router)
  • Encapsulate and forward to controller
  • Drop packet (as a firewall)
  • Send to normal processing pipeline (as a switch)
  • Modify fields (as NAT)
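The match-plus-action idea can be sketched as a small flow table. Everything here is illustrative: the field names (eth_dst, ip_dst, tcp_dst), the action strings, and the priority scheme are assumptions for this example, not the OpenFlow wire format.

```python
from dataclasses import dataclass


@dataclass
class FlowEntry:
    match: dict        # header field -> required value; absent fields act as wildcards
    action: str
    priority: int = 0


def apply_flow_table(table, packet):
    """Return the action of the highest-priority entry whose match fields all agree."""
    hits = [e for e in table
            if all(packet.get(k) == v for k, v in e.match.items())]
    if not hits:
        return "send_to_controller"  # table miss: hand the packet to the controller
    return max(hits, key=lambda e: e.priority).action


TABLE = [
    FlowEntry({"tcp_dst": 22}, "drop", priority=10),                     # firewall-like rule
    FlowEntry({"ip_dst": "10.0.0.5"}, "forward:3", priority=5),          # router-like rule
    FlowEntry({"eth_dst": "aa:bb:cc:dd:ee:ff"}, "forward:1", priority=1)  # switch-like rule
]

print(apply_flow_table(TABLE, {"ip_dst": "10.0.0.5", "tcp_dst": 22}))
print(apply_flow_table(TABLE, {"ip_dst": "10.0.0.5", "tcp_dst": 80}))
print(apply_flow_table(TABLE, {"ip_dst": "1.2.3.4"}))
```

One table can thus act as a firewall, a router, and a switch at once, depending only on which header fields each entry matches.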

AS

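Hot potato routing: when sending a datagram toward another AS, first consider the shortest path out of the local AS rather than the shortest path to the destination

BGP uses TCP port 179

Because of intra-domain requirements, the "shortest" path used for inter-domain communication is an adjusted one: not necessarily the shortest in absolute terms, but the shortest under the given conditions

In routing, the protocol matters more than the algorithm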

4.3 Routing Algorithm

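Routing algorithms compute and update the contents of the forwarding table, and thereby control which output port a datagram is forwarded to

classification

centralized (has complete information about all link costs in the network)
decentralized (each node knows only the costs of its directly attached links)
static (routes are adjusted manually)
dynamic (routes change as network traffic load or link costs change)
load-sensitive (link costs dynamically reflect the current congestion level of the underlying link)
load-insensitive (link costs do not explicitly reflect the congestion level)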

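link state (based on Dijkstra’s algorithm)

condition

  • net topology, link costs known to all nodes (the weights of all links are known globally)

result

  • computes least cost paths from one node to all other nodes

iterations:

  • equal to the number of destinations: after k iterations, the least cost paths to k destinations are known

steps

  • list the current distance (weight) from the source to each node not yet in the path set, and add the node with the smallest distance
  • update the current distances to the remaining nodes, then repeat the previous step
  • stop when all nodes have been added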
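The steps above can be sketched with a priority queue. The six-node topology below is a made-up example, and the heap-based variant shown is one common way to pick the minimum-distance node; the function and variable names are choices for this sketch.

```python
import heapq


def dijkstra(graph, source):
    """Return (dist, prev): least costs from source and each node's predecessor on its path."""
    dist = {source: 0}
    prev = {}
    done = set()             # nodes whose least-cost path is final
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)   # closest node not yet finalized
        if u in done:
            continue                 # stale heap entry, skip it
        done.add(u)
        for v, cost in graph[u].items():
            nd = d + cost
            if v not in dist or nd < dist[v]:
                dist[v] = nd         # found a cheaper path to v through u
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


# Hypothetical undirected topology: (node, node, link cost).
LINKS = [("u", "v", 2), ("u", "x", 1), ("u", "w", 5), ("v", "x", 2), ("v", "w", 3),
         ("x", "w", 3), ("x", "y", 1), ("w", "y", 1), ("w", "z", 5), ("y", "z", 2)]
graph = {}
for a, b, c in LINKS:
    graph.setdefault(a, {})[b] = c
    graph.setdefault(b, {})[a] = c

dist, prev = dijkstra(graph, "u")
print(dist)
```

The forwarding table at the source follows from prev: walking back from a destination to the source gives the path, and the first hop on it is the output link.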

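problem: route flapping (routing oscillation; it can occur in any algorithm that uses a congestion-based or delay-based link metric)

arises when link cost equals the amount of carried traffic (i.e. the total distinct traffic carried on that link; the cost in one direction is independent of the cost in the reverse direction)
situation

Route oscillation happens because every node pursues the lowest-cost path at the same time. All nodes crowd onto one path while another is left idle. At the next computation, since link cost equals carried load, the idle path now has the lowest cost and the crowded path the highest, so the nodes all switch to the previously idle path and the previously crowded path becomes idle in turn. The cycle repeats, and the chosen path keeps swinging back and forth.
solution
  • make link costs independent of the traffic they carry
  • ensure that routers do not all run the routing algorithm at the same moment (even if they use the same period)
In practice, however, routers that run the algorithm with the same period but at different times have been observed to "self-synchronize" anyway
The remedy is to randomize the time at which each router sends its link advertisements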

distance-vector

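Bellman-Ford equation: d_x(y) = min over all neighbors v of x of { c(x,v) + d_v(y) }

condition

  • knows cost to each neighbor v
  • maintains its neighbors' distance vectors (each neighbor's distances to the destinations, kept in a distance table)

each local iteration is caused by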

  • local link cost change
  • receive update message from neighbor

steps
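One local iteration of the distance-vector computation, i.e. a single application of the Bellman-Ford equation at one node, might look like the sketch below. The function name, argument layout, and the three-node example costs are assumptions made for illustration; a real protocol would also send the new vector to the neighbors whenever it changes.

```python
INF = float("inf")


def dv_update(me, link_cost, neighbor_dv):
    """Recompute node me's distance vector: d_me(y) = min over neighbors v of c(me, v) + d_v(y).

    link_cost:   {neighbor: cost of the direct link to it}
    neighbor_dv: {neighbor: {destination: distance advertised by that neighbor}}
    """
    destinations = set(link_cost)
    for vector in neighbor_dv.values():
        destinations.update(vector)
    destinations.discard(me)

    new_dv = {me: 0}
    for y in destinations:
        candidates = []
        for v, c in link_cost.items():
            # A neighbor is at distance 0 from itself even if it has not advertised that yet.
            d_v_y = 0 if v == y else neighbor_dv.get(v, {}).get(y, INF)
            candidates.append(c + d_v_y)
        new_dv[y] = min(candidates, default=INF)
    return new_dv


# Hypothetical example: c(x,y) = 2, c(x,z) = 7, c(y,z) = 1.
print(dv_update("x", {"y": 2, "z": 7},
                {"y": {"x": 2, "y": 0, "z": 1}, "z": {"x": 7, "y": 1, "z": 0}}))
```

Here x learns that reaching z through y (2 + 1) is cheaper than its direct link of cost 7.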

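problem: routing loop (occurs when a node's cost to the destination increases while its neighbor's distance vector was computed from the old cost)

The neighbor's distance vector still holds the old, smaller cost. Before notifying its neighbors, the node first recomputes its own distance vector, and that computation relies on the neighbor's stale cost
situation
44 iterations before algorithm stabilizes

solution
  • poisoned reverse: if z's distance to the destination was computed through y, z tells y that its distance to the destination is infinite
  • split horizon: if z's distance to the destination was computed through y, z does not advertise that distance to y at all
  • triggered updates: by default advertisements are sent periodically; sending one as soon as a cost changes reduces loops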

comparison of LS & DV

Message complexity

  • LS: with n nodes, E links, O(n * E) msgs
  • DV: exchange between neighbors only

Speed of convergence

  • LS: O(n^2) algorithm requires O(n * E) msgs
  • DV: convergence time varies

Robustness

  • LS: node can advertise incorrect link cost; each node computes only its own table
  • DV: node can advertise incorrect path cost; each node’s table is used by others (an error propagates through the whole network)

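4.4 Intra-AS Routing Protocol: OSPF

AS (autonomous system): introduced to cope with the scale of the network and to let each ISP administer its own network; routers in the same AS run the same routing algorithm and have information about one another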

common

  • RIP: Routing Information Protocol
  • OSPF: Open Shortest Path First (IS-IS protocol essentially same as OSPF)
  • IGRP: Interior Gateway Routing Protocol (Cisco proprietary for decades, until 2016)

OSPF(Open Shortest Path First)

operation

  • uses link-state algorithm
  • router floods OSPF link-state advertisements to all other routers in the entire AS (carried in OSPF messages directly over IP rather than over TCP or UDP)

advanced

  • all OSPF messages authenticated (to prevent malicious intrusion)
  • multiple same-cost paths allowed (only one path in RIP), so several paths can be used together
  • integrated uni- & multi-cast support
  • for each link, multiple cost metrics for different ToS (different costs for different types of service)
  • Hierarchical OSPF in large domains

Hierarchical OSPF

two-level hierarchy: local area, backbone.
link-state advertisements are flooded only within an area
each node has detailed area topology; it only knows the direction (shortest path) to nets in other areas.
  • area border routers (connect a non-backbone area to the backbone): summarize distances to nets in own area, advertise to other area border routers.
  • backbone routers: run OSPF routing limited to backbone.
  • boundary routers: connect to other ASes

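4.5 Inter-AS Routing Protocol: BGP

BGP (Border Gateway Protocol) determines the direction in which traffic leaves the AS, i.e. which gateway toward which neighboring AS to use

Path advertisement

In BGP, each pair of routers exchanges routing information over a semi-permanent TCP connection on port 179

BGP connection

eBGP (external BGP: a BGP connection that spans two ASes)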

obtain subnet reachability information from neighboring ASes

iBGP (internal BGP: a BGP connection between routers in the same AS)

propagate reachability information to all AS-internal routers.

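Path attributes

AS-PATH

list of ASes through which prefix advertisement has passed (the sequence of ASes to traverse)

NEXT-HOP

indicates specific internal-AS router to next-hop AS (the gateway of the neighboring AS on the path that connects to this AS)

Advertise Steps

  • a gateway router sends an eBGP message to the gateway router of the neighboring AS
  • the gateway router that receives the eBGP message sends iBGP messages to all other routers in its AS
  • a gateway router that receives the iBGP message goes on to send eBGP messages to the gateway routers of the other ASes it connects to
A gateway router may advertise multiple paths into its AS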

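Policy-based routing (special requirements can be configured)

  • gateway receiving route advertisement uses import policy to accept/decline path (e.g., never route through AS Y)
  • AS policy also determines whether to advertise path to other neighboring ASes
Message type
OPEN: comparable to a HELLO message; used for neighbor discovery.
KEEPALIVE: checks the neighbor's state and confirms that the TCP connection is healthy.
UPDATE: updates routes; BGP uses triggered, incremental updates.
NOTIFICATION: reports an error condition.
ROUTE-REFRESH: requests a route update, asking the neighbor to resend its routes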

BGP route selection

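Hot Potato Routing

choose local gateway that has least intra-domain cost (pick the gateway with the cheapest intra-domain path; inter-domain cost is not considered)

achieving policy via advertisements

ISP only wants to route traffic to/from its customer networks (an ISP that does not want to carry other ISPs' traffic configures special policies; it will give up the "shortest" path in order to avoid certain ASes)

steps (criteria applied in order)

  • local preference value attribute: policy decision
  • shortest AS-PATH (fewest AS hops, i.e. shortest inter-domain path)
  • closest NEXT-HOP router: hot potato routing (shortest intra-domain path)
  • additional criteria (BGP identifier)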
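The ordered selection criteria can be expressed as a single sort key. This is a simplified sketch: the Route fields, their names, and the string comparison of router IDs as the final tie-breaker are assumptions for illustration, and real BGP implementations apply more rules than these four.

```python
from dataclasses import dataclass


@dataclass
class Route:
    prefix: str
    local_pref: int             # higher is preferred (policy decision)
    as_path: list               # shorter is preferred
    igp_cost_to_next_hop: int   # lower is preferred (hot potato)
    router_id: str              # final tie-breaker in this sketch


def best_route(routes):
    """Pick the route that wins on local preference, then AS-PATH length, then IGP cost."""
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path),
                                      r.igp_cost_to_next_hop, r.router_id))


# Hypothetical candidate routes to the same prefix.
r1 = Route("138.16.64.0/22", 100, ["AS2", "AS1"], 5, "1.1.1.1")
r2 = Route("138.16.64.0/22", 100, ["AS3", "AS4", "AS1"], 1, "2.2.2.2")
r3 = Route("138.16.64.0/22", 100, ["AS5", "AS1"], 3, "3.3.3.3")

print(best_route([r1, r2, r3]).router_id)
```

All three routes tie on local preference, r2 loses on AS-PATH length despite its low IGP cost, and r3 beats r1 because its NEXT-HOP is closer.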
Why different Intra-, Inter-AS routing

Policy:
inter-AS: admin wants control over how its traffic is routed and who routes through its net (policy dominates)
intra-AS: single admin, so no policy decisions needed (little impact)
Scale:
hierarchical routing saves table size, reduces update traffic (a concern between ASes, not within one)
Performance:
intra-AS: can focus on performance
inter-AS: policy may dominate over performance

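4.6 SDN (Software Defined Networking)

Reason

  • easier network management: avoid router misconfigurations, greater flexibility of traffic flows
  • table-based forwarding (recall OpenFlow API) allows programming routers (centralized programming takes less effort than distributed programming and is easier to maintain)
  • open (non-proprietary) implementation of control plane (packet switches and the SDN controller, i.e. data plane and control plane, are physically separate and need not come from a single vendor, which allows a more diverse ecosystem)
  • overcomes a limitation of traditional traffic engineering, where paths are chosen by cost alone: with a global view of the network, paths that meet specific requirements can be found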

Construction

data plane switches

  • fast, simple, commodity switches implementing generalized data-plane forwarding in hardware (the switches only forward)
  • switch flow table computed, installed by controller
  • protocol for communicating with controller (OpenFlow)
  • API for table-based switch control (OpenFlow) (the software interface that defines what the controller may modify)

SDN controller

  • maintains network state information (for the whole network)
  • interacts with network control applications “above” via northbound API
  • interacts with network switches “below” via southbound API
implemented as distributed system for performance, scalability, fault-tolerance, robustness (the network state is logically centralized, but in practice the implementation is usually distributed)

Network-control apps

  • implement control functions using lower-level services, API provided by SDN controller

Layers of the SDN controller

  • Interface layer to network control apps: abstractions API
  • Network-wide state management layer: state of network links, switches, services: a distributed database
  • communication layer: communicates between SDN controller and controlled switches

OpenFlow protocol (a typical example of the SDN approach)

The OpenFlow protocol runs over TCP, with default port 6653

range

between controller, switch

controller-to-switch messages

  • features: controller queries switch features, switch replies
  • configure: controller queries/sets switch configuration parameters
  • modify-state: add, delete, modify flow entries in the OpenFlow tables
  • packet-out: controller can send this packet out of a specific switch port

switch-to-controller messages

  • packet-in (a packet that matches no flow table entry is handed to the controller): transfer packet (and its control) to controller. See packet-out message from controller
  • flow-removed: flow table entry deleted at switch
  • port status: inform controller of a change on a port.

ODL (OpenDaylight) controller

ONOS controller

4.7 Network management and SNMP

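SNMP is an application-layer protocol in the TCP/IP protocol suite

Infrastructure for network management

  • managing server: where the network administrator initiates management operations
  • managed device: contains many manageable components (e.g. network interfaces) and parameters
  • data: data associated with the managed device, including configuration data, operational data, and device statistics
  • network management agent: a software process running on the managed device; it receives commands from the managing server and manages the device directly
  • network management protocol: runs between the managing server and the managed devices

SNMP (Simple Network Management Protocol)

MIB (Management Information Base)

function

  • request/response mode

  • trap mode

message types

  • Get-Request, Get-Next-Request, Get-Response: request MIB object values from the managed device
  • Set-Request: sets MIB object values on the managed device
  • InformRequest: informs another managing server of MIB information
  • Response: the managed device's reply to the server's request
  • Trap: trap message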

message formats
