Recommendation for the Real Application Cluster Interconnect and Jumbo Frames

To BottomTo Bottom

In this Document

Purpose
 Scope
 Details
 Configuration
 Testing
 Performance
 Known Bugs
 Recommendation
 References


Applies to:

Oracle Database - Enterprise Edition - Version 9.2.0.1 to 12.1.0.1 [Release 9.2 to 12.1]
Information in this document applies to any platform.
Oracle Server Enterprise Edition - Version: 9.2.0.1 to 11.2


Purpose

This note covers the current recommendation for the Real Application Cluster Interconnect and Jumbo Frames

Scope

This article points out the issues surrounding Ethernet Jumbo Frame usage for the Oracle Real Application Cluster (RAC) Interconnect. In Oracle Real Application Clusters, the Cluster Interconnect is designed to run on a dedicated, or stand-alone network. The Interconnect is designed to carry the communication between the nodes in the Cluster needed to check for the Clusters condition and to synchronize the various memory caches used by the database.

Ethernet is a widely used networking technology for Cluster Interconnects. Ethernet's variable frame size of 46-1500 bytes is the transfer unit between the all Ethernet participants, such as the hosts and switches. The upper bound, in this case 1500, is called MTU (Maximum Transmission Unit). When an application sends a message greater than 1500 bytes (MTU), it is fragmented into 1500 byte, or smaller, frames from one end-point to another. In Oracle RAC, the setting of DB_BLOCK_SIZE multiplied by the MULTI_BLOCK_READ_COUNT determines the maximum size of a message for the Global Cache and the PARALLEL_EXECUTION_MESSAGE_SIZE determines the maximum size of a message used in Parallel Query. These message sizes can range from 2K to 64K or more, and hence will get fragmented more so with a lower/default MTU.

Jumbo Frames introduces the ability for an Ethernet frame to exceed its IEEE 802 specified Maximum Transfer Unit of 1500 bytes up to a maximum of 9000 bytes. Even though Jumbo Frames is widely available in most NICs and data-center class managed switches it is not an IEEE approved standard. While the benefits are clear, Jumbo Frames interoperability is not guaranteed with some existing networking devices. Though Jumbo Frames can be implemented for private Cluster Interconnects, it requires very careful configuration and testing to realize its benefits. In many cases, failures or inconsistencies can occur due to incorrect setup, bugs in the driver or switch software, which can result in sub-optimal performance and network errors.

Details

Configuration

In order to make Jumbo Frames work properly for a Cluster Interconnect network, careful configuration in the host, its Network Interface Card and switch level is required:
  • The host's network adapter must be configured with a persistent MTU size of 9000 (which will survive reboots).
For example, ifconfig -mtu 9000 followed by ifconfig -a to show the setting completed.
  • Certain NIC's require additional hardware configuration.
For example, some Intel NIC's require special descriptors and buffers to be configured for Jumbo Frames to work properly.
    • The LAN switches must also be properly configured to increase the MTU for Jumbo Frame support. Ensure the changes made are permanent (survives a power cycle) and that both "Jumbo" refer to same size, recommended 9000 (some switches do not support this size).

 

  • Because of the lack of standards with Jumbo Frames the interoperability between switches can be problematic and requires advanced networking skills to troubleshoot.
  • Remember that the smallest MTU used by any device in a given network path determines the maximum MTU (the MTU ceiling) for all traffic travelling along that path.

Failing to properly set these parameters in all nodes of the Cluster and Switches can result in unpredictable errors as well as a degradation in performance. 

Testing

Request your network and system administrator along with vendors to fully test the configuration using standard tools such as SPRAY or NETCAT and show that there is an improvement not degradation when using Jumbo Frames.  Other basic ways to check it's configured correctly on Linux/Unix are using:

Traceroute: Notice the 9000 packet goes through with no error, while the 9001 fails, this is a correct configuration that supports a message of up to 9000 bytes with no fragmentation:

[node01] $ traceroute -F node02-priv 9000
traceroute to node02-priv (10.10.10.2), 30 hops max, 9000 byte packets
1 node02-priv (10.10.10.2) 0.232 ms 0.176 ms 0.160 ms

[node01] $ traceroute -F node02-priv 9001
traceroute to node02-priv (10.10.10.2), 30 hops max, 9001 byte packets
traceroute: sendto: Message too long
1 traceroute: wrote node02-priv 9001 chars, ret=-1

 * Note: Due to Oracle Bugzilla 7182 (must have logon privileges) -- also known as RedHat Bugzilla 464044 -- older than EL4.7 traceroute may not work correctly for this purpose.
 * Note: Some versions of tracroute, e.g. traceroute 2.0.1 shipped with EL5, add the header size on top of what is specified when using the -F flag (same as ping behavior below). Newer versions of traceroute, like 2.0.14 (shipped with OL6) have the old behavior of traceroute version 1 (size of packet is exactly as what is specified with the -F flag).

Ping: With ping we have to take into account an overhead of about 28 bytes per packet, so 8972 bytes go through with no errors, while 8973 fail, this is a correct configuration that supports a message of up to 9000 bytes with no fragmentation:

[node01]$ ping -c 2 -M do -s 8972 node02-priv
PING node02-priv (10.10.10.2) 1472(1500) bytes of data.
1480 bytes from node02-priv (10.10.10.2): icmp_seq=0 ttl=64 time=0.220 ms
1480 bytes from node02-priv (10.10.10.2): icmp_seq=1 ttl=64 time=0.197 ms

[node01]$ ping -c 2 -M do -s 8973 node02-priv
From node02-priv (10.10.10.1) icmp_seq=0 Frag needed and DF set (mtu = 9000)
From node02-priv (10.10.10.1) icmp_seq=0 Frag needed and DF set (mtu = 9000)
--- node02-priv ping statistics ---
0 packets transmitted, 0 received, +2 errors
For Solaris platform, the similar ping command is:
$ ping -c 2 -s  node02-priv 8972

* Note: Ping reports fragmentation errors, due to exceeding the MTU size.

Performance

For RAC Interconnect traffic, devices correctly configured for Jumbo Frame improves performance by reducing the TCP, UDP, and Ethernet overhead that occurs when large messages have to be broken up into the smaller frames of standard Ethernet. Because one larger packet can be sent, inter-packet latency between various smaller packets is eliminated. The increase in performance is most noticeable in scenarios requiring high throughput and bandwidth and when systems are CPU bound.


When using Jumbo Frames, fewer buffer transfers are required which is part of the reduction for fragmentation and reassembly in the IP stack, and thus has an impact in reducing the latency of a an Oracle block transfer.

As illustrated in the configuration section, any incorrect setup may prevent instances from starting up or can have a very negative effect on the performance.

Known Bugs

In some versions of Linux there are specific bugs in Intel's Ethernet drivers and the UDP code path in conjunction with Jumbo Frames that could affect the performance.  Check for and use the latest version of these drivers to be sure you are not running into these older bugs.


The following bugzilla bugs 162197, 125122 are limited to RHEL3.

 

 

 

Recommendation

There is some complexity involved in configuring Jumbo Frames, which is highly hardware and OS specific.  The lack of a specific standard may present OS and hardware bugs. Even with these considerations, Oracle recommends using Jumbo Frames for private Cluster Interconnects.

Since there is no official standard for Jumbo Frames, this configuration should be properly load tested by Customers. Any indication of packet loss, socket buffer or DMA overflows, TX and RX error in adapters should be noted and checked with the hardware and operating system vendors.


The recommendation in this Note is strictly for Oracle private interconnect only, it does not apply to other NAS or iSCSI vendor tested and validated Jumbo Frames configured networks.

Oracle VM didn't support Jumbo in all ovm2 versions and all ovm3.0, however starting with ovm3.1.1 it's supported, refer to:

http://www.oracle.com/us/technologies/virtualization/ovm-3-1-whats-new-1634275.pdf

 

To procedure to change MTU, refer to note 283684.1 -  "Changing private network MTU only"

References

NOTE:1085885.1 - CRS root.sh Script Failing on Second Node When MTU Larger than 1500
NOTE:1166925.1 - Oracle VM: Jumbo Frame on Oracle VM
NOTE:300388.1 - Instances Unable To Start If MTU Size Is Different for Cluster_interconnect
NOTE:1290585.1 - Solaris: Wrong MTU Size for VIP or HAIP
NOTE:283684.1 - How to Modify Private Network Information in Oracle Clusterware
BUG:13332363 - BUG 9795321 HAS REAPPEARED IN 11.2.0.3

Document Details

 
Email link to this document Open document in new window Printable Page
 
 
 
 
 
 
 BULLETIN
 PUBLISHED
 09/05/2014
 09/05/2014
   
 

Related Products

 
Oracle Database - Enterprise Edition
   
 

Document References

 
   
 

Recently Viewed

 
   
   
 
Copyright (c) 2014, Oracle. All rights reserved. Legal Notices and Terms of Use Privacy Statement
 
 
 
 
Restore PaneRestore PaneCritical ErrorWarningConfirmationCollapse PaneInformationWarningCritical ErrorCollapse PaneInformationConfirmationErrorError
深度学习是机器学习的一个子领域,它基于人工神经网络的研究,特别是利用多层次的神经网络来进行学习和模式识别。深度学习模型能够学习数据的高层次特征,这些特征对于图像和语音识别、自然语言处理、医学图像分析等应用至关重要。以下是深度学习的一些关键概念和组成部分: 1. **神经网络(Neural Networks)**:深度学习的基础是人工神经网络,它是由多个层组成的网络结构,包括输入层、隐藏层和输出层。每个层由多个神经元组成,神经元之间通过权重连接。 2. **前馈神经网络(Feedforward Neural Networks)**:这是最常见的神经网络类型,信息从输入层流向隐藏层,最终到达输出层。 3. **卷积神经网络(Convolutional Neural Networks, CNNs)**:这种网络特别适合处理具有网格结构的数据,如图像。它们使用卷积层来提取图像的特征。 4. **循环神经网络(Recurrent Neural Networks, RNNs)**:这种网络能够处理序列数据,如时间序列或自然语言,因为它们具有记忆功能,能够捕捉数据中的时间依赖性。 5. **长短期记忆网络(Long Short-Term Memory, LSTM)**:LSTM 是一种特殊的 RNN,它能够学习长期依赖关系,非常适合复杂的序列预测任务。 6. **生成对抗网络(Generative Adversarial Networks, GANs)**:由两个网络组成,一个生成器和一个判别器,它们相互竞争,生成器生成数据,判别器评估数据的真实性。 7. **深度学习框架**:如 TensorFlow、Keras、PyTorch 等,这些框架提供了构建、训练和部署深度学习模型的工具和库。 8. **激活函数(Activation Functions)**:如 ReLU、Sigmoid、Tanh 等,它们在神经网络中用于添加非线性,使得网络能够学习复杂的函数。 9. **损失函数(Loss Functions)**:用于评估模型的预测与真实值之间的差异,常见的损失函数包括均方误差(MSE)、交叉熵(Cross-Entropy)等。 10. **优化算法(Optimization Algorithms)**:如梯度下降(Gradient Descent)、随机梯度下降(SGD)、Adam 等,用于更新网络权重,以最小化损失函数。 11. **正则化(Regularization)**:技术如 Dropout、L1/L2 正则化等,用于防止模型过拟合。 12. **迁移学习(Transfer Learning)**:利用在一个任务上训练好的模型来提高另一个相关任务的性能。 深度学习在许多领域都取得了显著的成就,但它也面临着一些挑战,如对大量数据的依赖、模型的解释性差、计算资源消耗大等。研究人员正在不断探索新的方法来解决这些问题。
【4层】3100平米综合办公楼毕业设计(含计算书、建筑结构图) 、1资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md或论文文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。 5、资源来自互联网采集,如有侵权,私聊博主删除。 6、可私信博主看论文后选择购买源代码。 1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md或论文文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。 5、资源来自互联网采集,如有侵权,私聊博主删除。 6、可私信博主看论文后选择购买源代码。 、1资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md或论文文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。 5、资源来自互联网采集,如有侵权,私聊博主删除。 6、可私信博主看论文后选择购买源代码。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值