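Data plane:
local, per-router function (activity within a single router)
determines how a datagram arriving on a router input port is forwarded to a router output port (concerned with the path from input port to output port)
forwarding function
Control plane:
network-wide logic (activity among routers)
determines how a datagram is routed from source to destination (concerned with which port a router chooses when forwarding)
two approaches: traditional routing algorithms implemented in routers, or SDN (software-defined networking) implemented in remote servers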
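Key network performance metrics: delay, packet loss rate, bandwidth, reliability
Network service models: ATM (Asynchronous Transfer Mode) model, Internet model
4.1 Router Organization (TCAM-based)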
Input port:
line termination (physical layer):
- bit-level reception
link layer protocol (receive) (data link processing, link layer):
- e.g. Ethernet
lookup, forwarding, queuing:
- destination-based forwarding: based only on destination IP address (traditional); uses longest prefix matching, typically performed in TCAM (Ternary Content Addressable Memory)
- generalized forwarding: based on any set of header field values
longest prefix matching: when looking for the forwarding table entry for a given destination address, use the longest address prefix that matches the destination address (the output port is selected by prefix, which saves work)
Reason for longest-prefix rather than fixed-length matching: resolve the lookup in a single step where possible, gaining speed while also reducing complexity
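The longest-prefix rule can be illustrated with a small software lookup. This is only a sketch: the table entries and port numbers are hypothetical, and a real router performs the match in hardware (TCAM) rather than with a linear scan.

```python
import ipaddress

# Hypothetical forwarding table: (prefix, output port). Values are illustrative only.
FORWARDING_TABLE = [
    (ipaddress.ip_network("200.23.16.0/21"), 0),
    (ipaddress.ip_network("200.23.24.0/24"), 1),
    (ipaddress.ip_network("200.23.24.0/21"), 2),
    (ipaddress.ip_network("0.0.0.0/0"), 3),  # default route matches everything
]


def lookup(dst: str) -> int:
    """Return the output port of the longest prefix that contains dst."""
    addr = ipaddress.ip_address(dst)
    # Collect every matching entry as (prefix length, port) ...
    matches = [(net.prefixlen, port) for net, port in FORWARDING_TABLE if addr in net]
    # ... and keep the one with the longest prefix.
    return max(matches)[1]
```

An address such as 200.23.24.10 matches the /24, the second /21, and the default route; the /24 wins because it is the most specific.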
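switch fabric:
- Switching via memory: speed limited by memory bandwidth (2 bus crossings per datagram)
- Switching via a bus: switching speed limited by bus bandwidth
- Switching via interconnection network: overcomes bus bandwidth limitations
input port queuing:
- many datagrams are transferred to the same output port at the same time (output port contention: the fabric delivers datagrams faster than the output port can send them)
- the fabric is slower than the combined arrival rate at the input ports, so datagrams queue at the input; a datagram blocked at the front of the queue holds up the ones behind it (head-of-line blocking)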
Output port:
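The structure is largely the same as the input port, except that the output port buffer performs no lookup or forwarding
buffering required when datagrams arrive from fabric faster than the transmission rate (buffering prevents loss when the switching rate briefly exceeds the output rate, but datagrams are still dropped once the backlog exceeds the buffer)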
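buffering size:
- RFC 3439 rule of thumb: buffering equal to a typical RTT (say 250 ms) times link capacity C
- more recent recommendation: buffering equal to RTT × C / √N, with N flows and link capacity C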
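As a rough worked example of buffer sizing, the sketch below computes the commonly cited rule of thumb RTT × C and the more recent RTT × C / √N recommendation. The function name and the sample numbers (250 ms RTT, 10 Gbps link, 10,000 flows) are assumptions chosen for illustration.

```python
import math


def buffer_bits(rtt_s: float, capacity_bps: float, n_flows: int = 1) -> float:
    """Buffer size in bits: RTT * C / sqrt(N); N = 1 gives the plain RTT * C rule."""
    return rtt_s * capacity_bps / math.sqrt(n_flows)


# Illustrative numbers only: 250 ms RTT on a 10 Gbps link.
classic = buffer_bits(0.25, 10e9)           # RTT * C
shared = buffer_bits(0.25, 10e9, 10_000)    # RTT * C / sqrt(N) with N = 10,000 flows
```

With these numbers the classic rule asks for 2.5 Gbit of buffering, while the flow-aware rule asks for 25 Mbit, a hundred times less.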
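output port queuing scheduling mechanisms:
- FIFO (first in, first out) scheduling; discard policy when the buffer is full: tail drop, priority drop, or random drop
- priority scheduling: the highest-priority class with queued packets is sent first
- Round Robin (RR) scheduling: classes are served cyclically, each sending one packet in turn
- Weighted Fair Queuing (WFQ): generalized round robin; the amount of service each class receives per cycle is proportional to its weight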
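The round-robin disciplines above can be sketched with a packet-count weighted round robin. This is a simplification rather than true WFQ, which accounts for packet sizes; the function name and queue contents are hypothetical. With all weights equal to 1 it reduces to plain round robin.

```python
from collections import deque


def weighted_round_robin(queues, weights):
    """Serve up to weights[i] packets from queue i in each cycle; return the send order."""
    queues = [deque(q) for q in queues]
    order = []
    while any(queues):                      # stop once every class queue is empty
        for q, w in zip(queues, weights):
            for _ in range(w):
                if q:
                    order.append(q.popleft())
    return order
```

For example, two classes with weights 2 and 1 send in the pattern two packets from the first class, then one from the second, until the queues drain.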
4.2 IP(Internet Protocol)
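Besides user data, the network also carries a large amount of control information and retransmitted packets; the lifetime of control messages is bounded (e.g. RIP limits a route to 15 hops)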
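Receive and send rates must match; otherwise queues build up and packets are dropped
IP packet format + transmission rules (conventions) + addressing + ICMP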
IPv4
IP packet format:
Packet:
Overhead of one IP packet: 20 bytes TCP header + 20 bytes IP header + application-layer header
TTL (Time to live):
- maximum number of remaining hops
- decremented by one at each router
- TTL=0 : discard
- can be used to count hops to destination
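IP Fragmentation & Reassembly:
MTU (Maximum Transmission Unit): the largest packet a link can carry; different link types have different MTUs
Common link MTUs: Ethernet: 1500 bytes; P2P: 4470 bytes
Theoretical maximum: 65535 bytes
Theoretical minimum: 68 bytes
The purpose of IP fragmentation and reassembly is to accommodate the MTUs of different link types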
- large IP datagram divided (fragmented) into several datagrams within net
- re-assembled only at final destination
- IP header bits used to identify, order related fragments
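The flag (MF, more fragments) in the IP header marks whether this fragment is the last one: flag = 1 means it is not the last fragment, flag = 0 means it is the last fragment
The offset field in the IP header gives the position at which this fragment's data starts within the original datagram's data, in units of 8 bytes. Each fragment carries (value of length field - 20) data bytes, so the next fragment's offset = this fragment's offset + (value of length field - 20) / 8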
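The length, flag, and offset values of each fragment can be worked out as follows. This is a minimal sketch assuming a 20-byte header with no options; the function name and tuple layout are hypothetical.

```python
def fragment(total_length: int, mtu: int, header_len: int = 20):
    """Split a datagram (total_length includes the header) for a link with the given MTU.

    Returns a list of (length, mf_flag, offset) tuples, offset in 8-byte units.
    """
    payload = total_length - header_len
    # Data carried by every fragment except the last must be a multiple of 8 bytes.
    max_data = (mtu - header_len) // 8 * 8
    fragments = []
    sent = 0
    while sent < payload:
        data = min(max_data, payload - sent)
        more = 1 if sent + data < payload else 0   # MF flag: 0 only on the last fragment
        fragments.append((data + header_len, more, sent // 8))
        sent += data
    return fragments
```

A 4000-byte datagram crossing a link with a 1500-byte MTU becomes three fragments: two of 1500 bytes (offsets 0 and 185) and a final one of 1040 bytes (offset 370).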
addressing:
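dotted-decimal notation
address allocation
- IANA: Internet Assigned Numbers Authority
- ICANN: Internet Corporation for Assigned Names and Numbers
- ASO (Address Supporting Organization)
classful addressing
Network numbers and host numbers consisting of all 0s or all 1s are reserved for other uses
Network number all 0s:
Network number all 1s:
Host number all 0s: the address of the network itself (network address)
Host number all 1s: broadcast address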
Special IP Address
network mask (subnet mask, used to extract the network ID)
IP address & network mask => network ID
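The AND operation can be tried out with Python's standard `ipaddress` module. The helper name and the sample address are assumptions for illustration.

```python
import ipaddress


def network_id(ip: str, mask: str) -> str:
    """Bitwise-AND an IPv4 address with its network mask to obtain the network ID."""
    ip_int = int(ipaddress.IPv4Address(ip))
    mask_int = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(ip_int & mask_int))
```

For instance, `network_id("223.1.1.4", "255.255.255.0")` keeps the first three bytes and zeroes the host byte.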
interface
- routers typically have multiple interfaces
- host typically has one interface
- IP addresses associated with each interface
CIDR (Classless Inter-Domain Routing)
Addresses the overly coarse granularity of the class A-E scheme
subnets
IP address can be divided into a subnet part (high-order bits) and a host part (low-order bits)
- device interfaces with same subnet part of IP address
- can physically reach each other without intervening router
DHCP(Dynamic Host Configuration Protocol)
plug-and-play; runs over UDP
function
allow host to dynamically obtain its IP address from network server when it joins network
steps (note the addresses used at each step)
- host broadcasts “DHCP discover” msg (is there a DHCP server?)
- DHCP server responds with “DHCP offer” msg (yes, and here is an address you can use)
- host requests IP address: “DHCP request” msg (I confirm I will use this address)
- DHCP server sends address: “DHCP ack” msg (acknowledged)
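NAT (Network Address Translation)
goal
local network uses just one IP address as far as outside world is concerned (the subnet shares a single public IP address; hosts inside use private IP addresses, which are translated as datagrams enter or leave the subnet)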
10.0.0.0 ----- 10.255.255.255
172.16.0.0 ----- 172.31.255.255
192.168.0.0 ----- 192.168.255.255
implement
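- outgoing datagrams: replace (source IP address, port #) of every outgoing datagram with (NAT IP address, new port #) (when sending to the outside network, the internal sender's IP address and port are rewritten to the NAT's IP address and port)
- incoming datagrams: replace (NAT IP address, new port #) in destination fields of every incoming datagram with corresponding (source IP address, port #) stored in NAT table (when receiving from the outside network, the NAT's IP address and port in the destination fields are rewritten to the internal IP address and port)
- remember (in NAT translation table) every (source IP address, port #) to (NAT IP address, new port #) translation pair (a table that maps ports on the NAT's IP address to internal IP addresses and ports)
In essence, the NAT port number stands in for the (internal host IP address, port number) pair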
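The translation table described above can be modeled with two dictionaries. This is a toy sketch: the class name, the starting port 5001, and the addresses are hypothetical, and real NAT devices also track the protocol and expire idle entries.

```python
class Nat:
    """Toy NAT translation table keyed on (private IP, port)."""

    def __init__(self, public_ip: str, first_port: int = 5001):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out = {}    # (private IP, private port) -> public port
        self.back = {}   # public port -> (private IP, private port)

    def outgoing(self, src_ip: str, src_port: int):
        """Rewrite the source of an outgoing datagram, creating a mapping if needed."""
        key = (src_ip, src_port)
        if key not in self.out:
            self.out[key] = self.next_port
            self.back[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.out[key])

    def incoming(self, dst_port: int):
        """Return the internal (IP, port) for an incoming datagram, or None if unmapped."""
        return self.back.get(dst_port)
```

An incoming datagram whose destination port has no entry returns `None`, which is exactly why an outside client cannot open a connection to an internal server without a statically configured mapping.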
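controversial
- routers should only process up to layer 3 (a router should handle only network-layer functions; NAT blurs that responsibility)
- violates end-to-end argument (NAT acts as a relay in the middle, so designers of e.g. P2P applications must take NAT into account)
- address shortage should instead be solved by IPv6
- port numbers should be used to address processes, not hosts
problem & its solution
- client wants to connect to server with address 10.0.0.1 (an outside host needs to open a connection to one specific host inside the subnet, but all hosts in the subnet share a single public IP address)
- statically configure NAT to forward incoming connection requests at given port to server (static NAT table entry: a fixed port on the NAT's IP address is mapped to that internal host)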
ICMP(Internet Control Message Protocol)
function
used by hosts & routers to communicate network-level information (e.g. error reporting)
Error Report
- Destination Unreachable: 3
- Time Exceeded (TTL = 0): 11
- Parameter Problem: 12
Data Control
- Source Quench: 4
- Redirect: 5 (D-R1-R2-S,S-R2S-R1)
Query (request/response)
- Echo request/reply: 8/0
- Timestamp request/reply: 13/14
- Router solicitation/advertisement: 10/9
- Address mask request/reply: 17/18
format
ICMP messages carried in IP datagrams (in the data portion of the IP datagram)
traceroute (reports the address of each router along the path and the delay to it)
implementation
Source sends a series of UDP segments to the destination with TTL increasing from 1: the first has TTL = 1, the second TTL = 2, and so on
When the nth datagram arrives at the nth router: the router discards the datagram and sends the source an ICMP message (type 11, code 0) that includes the router's name and IP address
When the ICMP message arrives, the source calculates the RTT
Traceroute does this 3 times for each TTL
stopping criteria
- UDP segment eventually arrives at destination host
- Destination returns ICMP port unreachable message (type 3, code 3); when the source gets this ICMP message, it stops
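The probing loop can be simulated without touching the network. In this sketch the path is a hypothetical list of router names ending with the destination host; a real traceroute sends UDP probes and times the ICMP replies.

```python
def send_probe(path, ttl):
    """Simulate one probe: return (responding node, (ICMP type, code))."""
    for router in path[:-1]:          # every node except the last is a router
        ttl -= 1                      # each router decrements TTL
        if ttl == 0:
            return router, (11, 0)    # TTL expired in transit
    return path[-1], (3, 3)           # reached the host: port unreachable


def traceroute(path, max_ttl=30):
    """Send probes with TTL = 1, 2, ... until the destination answers."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        node, icmp = send_probe(path, ttl)
        hops.append((ttl, node))
        if icmp == (3, 3):            # stopping criterion
            break
    return hops
```

Each TTL value reveals exactly one more hop, and the loop ends when the reply type switches from 11 to 3.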
IPv6
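Addresses the shortage of IPv4 addresses
format
IPv6 does not allow fragmentation at intermediate routers
changes
- Checksum: removed entirely to reduce processing time at each hop
- Options: allowed, but outside of header, indicated by Next Header field
- ICMPv6: new version of ICMP
transition from IPv4 to IPv6
Tunneling: the IPv6 datagram is carried as the payload of an IPv4 datagram (an IPv4 header is wrapped around the IPv6 header)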
OpenFlow
Matches header fields from three protocol layers and applies the corresponding action
Consists of OpenFlow switches and a controller server
match
header fields, including Link layer, Network layer, Transport layer
operation
- Forward packet to port(s)(as router)
- Encapsulate and forward to controller
- Drop packet(as Firewall)
- Send to normal processing pipeline(as switch)
- Modify Fields(as NAT)
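The match-plus-action idea can be sketched as a priority-ordered table lookup. The field names (`ip_dst`, `tcp_dst`), priorities, and action strings below are hypothetical simplifications and are not the actual OpenFlow field names or wire format; only exact matches are modeled, with missing fields acting as wildcards.

```python
# Hypothetical flow table: (priority, match fields, action).
FLOW_TABLE = [
    (200, {"ip_dst": "10.0.0.9", "tcp_dst": 22}, "drop"),         # firewall-like rule
    (100, {"ip_dst": "10.0.0.9"}, "forward:2"),                   # router-like rule
    (0, {}, "send_to_controller"),                                # table miss
]


def process(packet: dict) -> str:
    """Return the action of the highest-priority entry whose fields all match."""
    for _, match, action in sorted(FLOW_TABLE, key=lambda entry: -entry[0]):
        if all(packet.get(field) == value for field, value in match.items()):
            return action
    return "drop"   # unreachable here because the priority-0 entry matches everything
```

The same table can behave as a firewall, a router, or a switch simply by changing which header fields are matched and which action is attached.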
AS
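Hot potato routing: when sending a datagram toward another AS, first choose the shortest path out of the local AS rather than the shortest path to the destination
BGP uses TCP port 179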
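Because of requirements inside each domain, inter-domain routing uses an adjusted notion of "shortest": the chosen path is not necessarily the actual shortest one, but it is the best available under the given constraints
A routing protocol is more than its routing algorithm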
4.3 Routing Algorithm
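Routing algorithms control which output port a datagram is forwarded to by computing and updating the contents of the forwarding table
classification
centralized (has cost information for every link in the network)
decentralized (has cost information only for its directly attached links)
static (routes are adjusted manually)
dynamic (routes change as traffic load or link costs change)
load-sensitive (link costs dynamically reflect the current congestion level of the underlying link)
load-insensitive (link costs do not explicitly reflect the congestion level)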
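link state (based on Dijkstra's algorithm)
condition
- net topology, link costs known to all nodes (global knowledge of the costs between nodes)
result
- computes least-cost paths from one node to all other nodes
iterations:
- one per destination: after k iterations, the least-cost paths to k destinations are known
steps
- list the current distance (cost) to every node and add the node with the smallest distance to the path set
- update the current distances to the remaining nodes, then repeat the previous step
- stop once every node has been added to the path set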
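The steps above correspond to Dijkstra's algorithm. Below is a minimal sketch using a priority queue; the six-node topology and its link costs are an assumed example, and the function names are hypothetical.

```python
import heapq

# Assumed example topology: undirected links as (node, node, cost).
EDGES = [("u", "v", 2), ("u", "x", 1), ("u", "w", 5), ("v", "x", 2), ("v", "w", 3),
         ("x", "w", 3), ("x", "y", 1), ("w", "y", 1), ("w", "z", 5), ("y", "z", 2)]


def build_graph(edges):
    """Turn the edge list into an adjacency map {node: {neighbor: cost}}."""
    graph = {}
    for a, b, cost in edges:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def dijkstra(graph, source):
    """Return the least cost from source to every reachable node."""
    dist = {source: 0}
    done = set()                      # nodes whose least cost is final
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)    # closest node not yet finalized
        if u in done:
            continue
        done.add(u)
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd          # found a cheaper path to v through u
                heapq.heappush(heap, (nd, v))
    return dist
```

Each pop from the heap finalizes one node, which matches the "one iteration per destination" behavior noted above.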
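problem: route oscillation (can occur in any algorithm that uses a congestion-based or delay-based link metric)
suppose link cost equals amount of carried traffic (the cost of a link equals the load it carries, i.e. the total distinct traffic on it, and the cost in one direction is independent of the cost in the reverse direction)
situation
Path selection flaps because every node seeks the least-cost path at the same time: all traffic floods onto one path while another sits idle. At the next computation, because link cost equals carried load, the idle path now has the lowest cost and the crowded path the highest, so every node switches to the idle path and the previously crowded path becomes idle. The cycle repeats and the chosen path keeps flipping back and forth
solution
- force link costs not to depend on the traffic they carry
- ensure that routers do not all run the routing algorithm at the same instant (even if they use the same period)
In practice, however, routers running the algorithm with the same period at different instants were still found to "self-synchronize"
Randomize the time at which each router sends its link advertisements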
distance-vector
Bellman-Ford equation
condition
- knows cost to each neighbor v
- maintains its neighbors' distance vectors (each neighbor advertises its distance vector to every destination, and the node keeps them in a distance table)
iterative
- local link cost change
- receive update message from neighbor
steps
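Each node repeatedly applies the Bellman-Ford equation d_x(y) = min over neighbors v of { c(x,v) + d_v(y) } to the vectors its neighbors last advertised. The sketch below simulates this synchronously for all nodes at once, which is a simplification of the real asynchronous protocol; the three-node topology and the function name are assumptions.

```python
INF = float("inf")


def distance_vector(costs):
    """costs[x][v] is the cost of the direct link x-v. Returns converged D[x][y]."""
    nodes = list(costs)
    # Initially each node knows only the cost to itself and to its direct neighbors.
    D = {x: {y: (0 if x == y else costs[x].get(y, INF)) for y in nodes} for x in nodes}
    changed = True
    while changed:
        changed = False
        advertised = {x: dict(D[x]) for x in nodes}   # vectors as last sent to neighbors
        for x in nodes:
            for y in nodes:
                if x == y:
                    continue
                # Bellman-Ford: try reaching y through every neighbor v.
                best = min(costs[x][v] + advertised[v][y] for v in costs[x])
                if best != D[x][y]:
                    D[x][y] = best
                    changed = True                    # x would advertise a new vector
    return D
```

With links x-y = 2, y-z = 1, and x-z = 7, node x learns that reaching z through y costs 3 instead of 7.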
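problem: routing loop (arises when a node's cost to the destination increases while its neighbor's distance vector was computed from the old cost)
The neighbor's distance vector still holds the old, smaller cost; before advertising to its neighbors the node must first recompute its own distance vector, and that computation relies on the neighbor's stale cost
situation
44 iterations before algorithm stabilizes
solution
- poisoned reverse: if z's distance to the destination was computed from y's distance vector, z "lies" to y by advertising an infinite distance to that destination
- split horizon: if z's distance to the destination was computed from y's distance vector, z does not advertise its distance to that destination to y at all
- triggered update: by default advertisements are sent periodically; instead, send an advertisement as soon as a cost changes, which reduces loops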
comparison of LS & DV
Message complexity
- LS: with n nodes, E links, O(nE) messages sent
- DV: exchange between neighbors only
Speed of convergence
- LS: O(n^2) algorithm requires O(nE) messages
- DV: convergence time varies
Robustness
- LS: node can advertise incorrect link cost; each node computes only its own table
- DV: node can advertise incorrect path cost; each node's table is used by others (errors propagate through the whole network)
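4.4 Intra-AS Routing Protocol: OSPF
AS (autonomous system): introduced to cope with the scale of the network and with each ISP's need for administrative autonomy; routers in the same AS run the same routing algorithm and have information about one another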
common
- RIP: Routing Information Protocol
- OSPF: Open Shortest Path First (IS-IS protocol essentially same as OSPF)
- EIGRP: Enhanced Interior Gateway Routing Protocol (Cisco proprietary for decades; published as RFC 7868 in 2016)
OSPF(Open Shortest Path First)
operation
- uses link-state algorithm
- router floods OSPF link-state advertisements to all other routers in entire AS (carried in OSPF messages directly over IP rather than TCP or UDP)
advanced
- all OSPF messages authenticated (to prevent malicious intrusion)
- multiple same-cost paths allowed (only one path in RIP); traffic can be spread across them
- integrated uni- & multi-cast support
- for each link, multiple cost metrics for different ToS (different costs for different types of service)
- Hierarchical OSPF in large domains
Hierarchical OSPF
two-level hierarchy: local area, backbone.
link-state advertisements flooded only within an area
each node has detailed area topology; only knows direction (shortest path) to nets in other areas.
- area border routers (connect the backbone to non-backbone areas): summarize distances to nets in own area, advertise to other area border routers.
- backbone routers: run OSPF routing limited to backbone.
- boundary routers: connect to other ASes
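4.5 Inter-AS Routing Protocol: BGP
BGP (Border Gateway Protocol) determines the direction in which traffic leaves the AS, i.e. which gateway to a neighboring AS to use
Path advertisement
In BGP each pair of routers exchanges routing information over a semi-permanent TCP connection on port 179
BGP connection
eBGP (external BGP: a BGP connection that spans two ASes)
obtain subnet reachability information from neighboring ASes
iBGP (internal BGP)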
propagate reachability information to all AS-internal routers.
Path attributes(BGP属性)
AS-PATH
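list of ASes through which prefix advertisement has passed (the sequence of ASes the path traverses)
NEXT-HOP
indicates specific internal-AS router to next-hop AS (the gateway of the neighboring AS on the path that connects to this AS)
Advertise Steps
- a gateway router first sends an eBGP message to the gateway router of the neighboring AS
- the gateway router that receives the eBGP message sends iBGP messages to all other routers in its AS
- a gateway router that receives the iBGP message goes on to send eBGP messages to the gateway routers of the other ASes it connects to
A gateway router may advertise multiple paths into its AS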
Policy-based routing (special requirements can be configured)
- gateway receiving route advertisement uses import policy to accept/decline path (e.g., never route through AS Y)
- AS policy also determines whether to advertise path to other neighboring ASes
Message type
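OPEN message: comparable to a HELLO message; used for neighbor discovery.
KEEPALIVE message: used to check neighbor status and confirm that the TCP connection is alive.
UPDATE message: used for route updates; BGP uses triggered, incremental updates.
NOTIFICATION message: used to report error conditions.
ROUTEREFRESH message: used to request a route update, asking the neighbor to resend its routes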
BGP route selection
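Hot Potato Routing
choose local gateway that has least intra-domain cost (pick the gateway with the cheapest intra-domain path, ignoring inter-domain cost)
achieving policy via advertisements (policy configuration)
ISP only wants to route traffic to/from its customer networks (an ISP that does not want to carry other ISPs' traffic configures a policy, giving up the "shortest" path if necessary to avoid certain ASes)
steps (elimination rules, applied in order)
- local preference value attribute: policy decision
- shortest AS-PATH (fewest AS hops, i.e. the shortest inter-domain path)
- closest NEXT-HOP router: hot potato routing (shortest intra-domain path)
- additional criteria (BGP identifier)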
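The ordered selection rules can be expressed as a single sort key. The dictionary fields below (`local_pref`, `as_path`, `igp_cost`, `router_id`) and the sample routes are hypothetical; real BGP implementations apply more tie-breakers than the four listed above.

```python
def select_route(routes):
    """Pick the best route: highest local preference, then shortest AS-PATH,
    then lowest intra-domain cost to NEXT-HOP, then lowest BGP identifier."""
    def rank(route):
        return (-route["local_pref"], len(route["as_path"]),
                route["igp_cost"], route["router_id"])
    return min(routes, key=rank)


# Hypothetical candidate routes to the same prefix.
ROUTES = [
    {"via": "A", "local_pref": 100, "as_path": ["AS2", "AS3"], "igp_cost": 5, "router_id": 1},
    {"via": "B", "local_pref": 100, "as_path": ["AS4"], "igp_cost": 9, "router_id": 2},
    {"via": "C", "local_pref": 200, "as_path": ["AS5", "AS6", "AS7"], "igp_cost": 1, "router_id": 3},
]
```

Route C wins despite its long AS-PATH because local preference is compared first, which is how policy overrides path length.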
Why different Intra-, Inter-AS routing
Policy: (ISP policy)
inter-AS: admin wants control over how its traffic is routed and who routes through its net (policy dominates)
intra-AS: single admin, so no policy decisions needed (little impact)
Scale: (scale determines scalability)
hierarchical routing saves table size, reduces update traffic (a concern between ASes, not within one)
Performance:
intra-AS: can focus on performance (performance comes first within an AS)
inter-AS: policy may dominate over performance (ISP policy comes first between ASes)
4.6 SDN(Software Defined Networking)
Reason
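- easier network management: avoid router misconfigurations, greater flexibility of traffic flows
- table-based forwarding (recall OpenFlow API) allows programming routers (centralized programming requires less code than a distributed approach and is easier to maintain)
- open (non-proprietary) implementation of control plane (packet switches and the SDN controller are physically separate (data plane vs. control plane) and no longer need to come from a single vendor, which fosters a more diverse ecosystem)
- overcomes the limitation of traditional traffic engineering, where paths are chosen by cost alone; a global view makes it possible to find paths that meet specific requirements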
Construction
data plane switches
- fast, simple, commodity switches implementing generalized data-plane forwarding in hardware (they only perform forwarding)
- switch flow table computed, installed by controller
- protocol for communicating with controller (OpenFlow)
- API for table-based switch control (OpenFlow) (the software interface that defines what the controller can modify)
SDN controller
- maintain network state information (holds the state of the whole network)
- interacts with network control applications “above” via northbound API
- interacts with network switches “below” via southbound API
implemented as distributed system for performance, scalability, fault-tolerance, robustness (network state is logically centralized, but in practice the controller is usually implemented as a distributed system)
Network-control apps
- implement control functions using lower-level services, API provided by SDN controller
SDN controller layers
- Interface layer to network control apps: abstractions API
- Network-wide state management layer: state of network links, switches, services: a distributed database
- communication layer: communicate between SDN controller and controlled switches
OpenFlow protocol (a canonical example of the SDN approach)
The OpenFlow protocol runs over TCP, with default port number 6653
range
between controller, switch
controller-to-switch messages
- features: controller queries switch features, switch replies
- configure: controller queries/sets switch configuration parameters
- modify-state: add, delete, modify flow entries in the OpenFlow tables
- packet-out: controller can send this packet out of specific switch port
switch-to-controller messages
- packet-in (a packet that matches no flow table entry is handed to the controller): transfer packet (and its control) to controller. See packet-out message from controller
- flow-removed: flow table entry deleted at switch
- port status: inform controller of a change on a port.
ODL(OpenDaylight)controller
ONOS controller
4.7 Network management and SNMP
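An application-layer protocol in the TCP/IP protocol suite
Infrastructure for network management
- managing server: where the network administrator initiates management operations
- managed device: contains many manageable components (such as network interfaces) and parameters
- data: data associated with the managed device, including configuration data, operational data, and device statistics
- network management agent: software process running on the managed device; receives commands from the managing server and manages the device directly
- network management protocol: runs between the managing server and the managed devices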
SNMP(Simple Network Management Protocol)
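MIB (Management Information Base)
function
- request/response mode
- trap mode (trap messages)
message types
- Get-Request, Get-Next-Request, Get-Response: retrieve MIB object values from the managed device
- Set-Request: set MIB object values on the managed device
- InformRequest: inform another managing server of MIB information
- Response: the managed device's reply to a request from the managing server
- Trap: trap message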