1000 feet view
5000 feet view
Terminology
- LOA-CFA: Letter of Authorization and Connecting Facility Assignment. An authorization document that the Customer provides to the COLO facility to have a hardwired connection created between the Customer network and AWS
- Small Form-factor Pluggable (SFP): a compact, hot-pluggable transceiver used on both ends of the fiber cable. The transceiver converts electrical signals into light pulses, which travel over the communication medium to the next network node, where the reverse conversion occurs
- MMR: Meet Me Room. The common room where AWS has its demarcation point and where ports are available on the patch panel for Customers to connect into
- Rack: a metal frame chassis that holds, stacks, organizes, secures and protects various computer network and server hardware devices
- A Side: from the customer's perspective, this is the Customer/provider side of the Patch Panel in the Meet Me Room
- Patch Panel: a mounted hardware assembly that contains ports used to connect and manage incoming and outgoing LAN cables
- Z Side: from the customer's perspective, this is the AWS side of the Patch Panel in the Meet Me Room
- Cross-Connect: any connection between facilities provided as separate units by the datacenter. The line that runs from the Customer to the AWS rack is a cross-connect
- Single Mode Fiber (SMF): fiber optic cable with a small-diameter core that allows only one mode of light to propagate. The smaller core means less modal dispersion, so it can carry light signals over longer distances without degradation
- Loopback Tester: a device that can be inserted into an SFP or patch panel port to determine whether a physical port or segment of fiber optic cabling is functioning properly. It creates a physical loop on the SFP or cable, effectively testing both the transmit and receive strands
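1. Benefits
- Lower network cost: for bandwidth-heavy workloads, transferring data directly to AWS is cheaper than transferring it over the internet
- Privacy of a dedicated connection: private and high bandwidth. With multiple private virtual interfaces or one transit virtual interface, you can also connect to multiple VPCs
- Consistent network performance: latency over the internet can vary depending on how the data is routed, whereas DX uses a dedicated connection and is more consistent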
2. VLANs and DX
In DX, VLANs are used to create multiple virtual networks inside one physical link. This allows Customers to have multiple Virtual Interfaces inside one physical Connection
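To make the VLAN idea concrete, here is a minimal sketch of the 4-byte 802.1Q (Dot1Q) tag that distinguishes traffic of different virtual networks on one link. The function names and the VIF names and VLAN IDs in the example are hypothetical, chosen only for illustration; they are not taken from any AWS tooling.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks a frame as 802.1Q-tagged


def build_dot1q_tag(vlan_id: int, priority: int = 0) -> bytes:
    """Return the 4-byte 802.1Q tag for the given VLAN ID."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be between 1 and 4094")
    if not 0 <= priority <= 7:
        raise ValueError("priority must be between 0 and 7")
    # TCI = 3-bit priority, 1-bit drop-eligible indicator (0 here), 12-bit VLAN ID
    tci = (priority << 13) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def vlan_id_of(tag: bytes) -> int:
    """Extract the VLAN ID from a 4-byte 802.1Q tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return tci & 0x0FFF


# Two hypothetical virtual interfaces sharing one physical connection,
# each identified by its own VLAN ID.
vif_vlans = {"vif-a": 101, "vif-b": 102}
vif_tags = {name: build_dot1q_tag(vlan) for name, vlan in vif_vlans.items()}
```

Because every frame carries its VLAN ID in the tag, the devices on both ends can keep the traffic of each Virtual Interface separate even though it shares the same fiber.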
3. DX limits
4. DX Resiliency model
Two kinds of failure: device failure, location failure
Choose according to the workload:
- Maximum Resiliency: for critical workloads
  - Dual DX Connection in Different Locations
- High Resiliency: for critical workloads
  - Single DX Connection in Different Locations
- Development and Test: for non-critical production workloads
  - Dual DX Connection at a Single Location
- Classic
  - Single DX Connection
  - DX Connection with VPN as a Backup
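The models above differ only in how many connections there are and how they are spread across locations. The sketch below classifies a set of connections on that basis alone. The function name and the location labels are hypothetical, and it assumes that two connections in the same location terminate on separate devices; it is an illustration of the list above, not an AWS API.

```python
from collections import Counter


def resiliency_model(connection_locations: list[str]) -> str:
    """Classify DX connections using the models listed in these notes.

    connection_locations holds one entry per DX connection, naming the
    DX location it terminates in (the names are hypothetical labels).
    """
    if not connection_locations:
        raise ValueError("at least one connection is required")
    per_location = Counter(connection_locations)
    if len(per_location) >= 2:
        # Different locations: survives a location failure.
        if all(count >= 2 for count in per_location.values()):
            # Dual connections in each location also survive a device failure.
            return "Maximum Resiliency"
        return "High Resiliency"
    if len(connection_locations) >= 2:
        # Dual connections at a single location (assumed to be on separate
        # devices): survives a device failure but not a location failure.
        return "Development and Test"
    return "Classic"
```

For example, `resiliency_model(["loc-1", "loc-2"])` falls under High Resiliency, while adding a second connection in each location moves it to Maximum Resiliency.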
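5. The three layers of DX troubleshooting (also the three layers of the DX service)
1) Layer 1
- Physical Connection DOWN
- How to check:
  - Use K2 to check whether the Connection Status is UP or DOWN
  - Use K2 to check the router and look at the signal strength (an Rx/Tx signal between -14.4 and 2.50 is normal). When the connection is down, the ARP information reports no MAC address entries
- Possible causes:
  - Have the customer check that the SFP (Small Form-Factor Pluggable Transceiver Module) is working; it must be 1000BASE-LX for 1 Gbps connections or 10GBASE-LR for 10 Gbps connections
  - Have the customer check that port auto-negotiation is disabled; port speed and full-duplex mode must be set manually on their interface
  - Confirm that the Rx signal strength is good (-14.4 to 2.5). If it is out of range, there may be a cabling issue. The customer needs to contact their APN (AWS Partner Network) provider to verify cabling/signaling. A TT-DX needs to be opened
  - Make sure the customer's interface is not disabled, e.g. on Cisco it is not in the shutdown state
- Resolution:
  - Replace the SFP
  - Confirm that the customer can access the management interface of the device in the customer cage, so that the customer and AWS can troubleshoot together; if the customer has a partner in the customer cage, have them contact the partner to carry out the troubleshooting steps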
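- How to check:
  - CRC errors: the K2 VPC Workbench Router Query shows many input and/or output errors. Refresh the page to see whether the errors are still increasing, or check the CloudWatch (CW) metric AWS/DX -> ConnectionCRCErrorCount
- Cause: there may be a faulty component in the transmission media at either end of the connection
- Resolution:
  - Have the SFPs swapped and the fiber cleaned on both sides of the connection (AWS and customer/partner)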
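The Layer 1 checks above can be sketched as a small helper. The signal range is the one given in these notes; treating the values as dBm, the function names, and the idea of comparing two CRC counter samples are assumptions made for illustration, not part of K2 or any AWS API.

```python
# Normal Rx/Tx range from the notes above (unit assumed to be dBm).
SIGNAL_MIN = -14.4
SIGNAL_MAX = 2.5


def signal_in_range(level: float) -> bool:
    """Return True if an optical signal level is within the normal range."""
    return SIGNAL_MIN <= level <= SIGNAL_MAX


def layer1_findings(connection_up: bool, rx: float, tx: float,
                    crc_before: int, crc_after: int) -> list[str]:
    """Collect Layer 1 findings from values read off the router.

    crc_before and crc_after are two samples of the error counter,
    taken some time apart, to see whether errors are still increasing.
    """
    findings = []
    if not connection_up:
        findings.append("connection DOWN: check SFP type, auto-negotiation, interface state")
    if not signal_in_range(rx):
        findings.append("Rx out of range: possible cabling issue")
    if not signal_in_range(tx):
        findings.append("Tx out of range: check SFP")
    if crc_after > crc_before:
        findings.append("CRC errors increasing: swap SFPs and clean fiber on both sides")
    return findings
```

An empty result means none of the Layer 1 symptoms listed above is present, so the next step would be Layer 2.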
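2) Layer 2 (Data Link)
If Layer 1 is UP (Rx signal between -14.4 and 2.5) but there is still a connectivity issue, check Layer 2,
i.e. whether the MAC address can be learned. With a healthy Layer 2, you should be able to see the MAC address of the router with the IP matching what the Customer has configured as the Customer Peer IP
If the ARP information reports no MAC address entries and the customer cannot ping the local address either:
- Possible causes:
  - The customer has configured the wrong VLAN ID on the CGW, or may not be tagging traffic at all
  - A switch or other device between the CGW and the AWS DX router may not be tagging traffic properly with Dot1Q encapsulation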
3) Layer 3 (Network)
With a healthy Layer 3, the customer can ping the VIF on the AWS side, and vice versa
- Check VIF status: K2
- Check BGP status: K2
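The three layers are checked bottom-up: a higher layer is only worth looking at once the one below it is healthy. A minimal sketch of that order follows. The function and parameter names are hypothetical; the inputs stand for observations an engineer would read from the tooling mentioned above.

```python
def first_failing_layer(rx_signal: float, peer_mac_learned: bool,
                        vif_up: bool, bgp_established: bool) -> str:
    """Return the lowest layer that looks unhealthy, or "OK" if none does."""
    # Layer 1: the Rx signal must be within the normal range from the notes.
    if not -14.4 <= rx_signal <= 2.5:
        return "Layer 1"
    # Layer 2: the MAC address of the customer peer must have been learned.
    if not peer_mac_learned:
        return "Layer 2"
    # Layer 3: the VIF must be up and the BGP session established.
    if not (vif_up and bgp_established):
        return "Layer 3"
    return "OK"
```

For instance, a good Rx signal with no learned MAC address points at Layer 2, which matches the VLAN tagging causes listed earlier.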
DX Architecture
- DX + VPN
- DX + TGW
- VPN over DX
- LAG: Link Aggregation Group. Uses the Link Aggregation Control Protocol (LACP) to aggregate multiple connections at a single AWS DX endpoint, allowing you to treat them as a single managed connection
- Direct Connect Gateway: only routes traffic from AWS DX VIFs to a VGW (associated with a VPC)
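6. Some common questions
- Cross-Connect process?
After a DX Connection is ordered, it can take up to 72 hours for a port to be allocated. Sometimes AWS opens an outbound case to ask for more information and make sure the request is valid. The customer needs to provide the data center and the network provider (open a case to provide these)
- Allowlisted IPs for a public VIF
It must be confirmed that the customer owns the public IP;
if the customer uses public IP addresses assigned by an ISP, they need to ask the ISP to send a Letter of Authorization on ISP letterhead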
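The next hop is the VGW; on the customer side, the routes are learned over BGP
Route propagation in a VPC: allows a VGW to automatically propagate routes to the route tables
NOVICE, and blackfoot for the IGW.
The firewall sits in the middle and is transparent: the source and destination of the IP packet are unchanged, and the instance does not know its packets were inspected by the firewall
The IGW translates it to the public IP
Before the gateway route table feature existed, an instance was configured with a public IP, packets were delivered to eni0, and eni0 routed them to the IGW; this is different (blackfoot would drop them)
This changes once a route table is associated with the IGW
Outbound traffic is checked against the mapping
It looks like a NAT instance, but it needs its own public IP, and asymmetric routing is not supported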
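A DNS request sent to .2 is intercepted by the underlying network; an underlying server resolves it on the internet on the instance's behalf and relays the answer.
Xen: each instance runs a bind
Nitro: an instance runs dnsmasq
bind and dnsmasq are both DNS resolvers
This is now directly visible in the interface analyser
performance: below
connectivity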