Onload Limitations

Introduction

This chapter outlines configurations 
that Onload does not accelerate 
and ways in which Onload may change behavior 
of the system and applications. 
It is a key goal of Onload 
to be fully compatible with the behavior 
of the regular kernel stack, 
but there are some cases where behavior deviates.

Resources

Onload uses certain physical resources on the network adapter. 
If these resources are exhausted, 
it is not possible to create new Onload stacks 
and not possible to accelerate new sockets or applications. 
The onload_stackdump utility should be used 
to monitor hardware resources. 
Physical resources include: 
Virtual NICs
Virtual NICs provide the interface 
by which a user level application sends and receives network traffic. 
When these are exhausted 
it is not possible to create new Onload stacks, 
meaning new applications cannot be accelerated. 
However, 
Solarflare network adapters support large numbers of Virtual NICs, 
and this resource is not typically the first to become unavailable.
Endpoints
Onload represents sockets and pipes as structures called endpoints. 
The maximum number of accelerated endpoints permitted 
by each Onload stack is set with the EF_MAX_ENDPOINTS variable. 
The stack limit can be reached sooner than expected 
when syn‐receive states (the number of half‐open connections) also 	
consume endpoint buffers. 
Four syn‐receive states consume one endpoint. 
The maximum number of syn‐receive states can be limited 
using the EF_TCP_SYNRECV_MAX variable.
Filters
Filters are used to deliver packets 
received from the wire to the appropriate application. 
When filters are exhausted 
it is not possible to create new accelerated sockets. 
The general recommendation is 
that applications do not allocate more than 4096 filters; 
in practice, this means an application should not create more than 4096 outgoing connections.
The limit does not apply to inbound connections 
to a listening socket.
Buffer Table
The buffer table provides address protection and translation for DMA buffers. 	
When all buffer resources are exhausted 
it is not possible to create new Onload stacks, 
and existing stacks are not able to allocate more DMA buffers.
When hardware resources are exhausted, 
normal operation of the system should continue, 
but it will not be possible to accelerate new sockets or applications.
TX, RX Ring Buffer Size
Onload does not obey the RX and TX ring sizes set in the kernel, 
but instead uses the values specified 
by EF_RXQ_SIZE and EF_TXQ_SIZE, which both default to 512.
Devices
The efrm driver used by Onload supports a maximum of 64 devices.

Changes to Behavior

Multithreaded Applications Termination

As Onload handles networking 
in the context of the calling application's thread 
it is recommended 
that applications ensure all threads exit cleanly 
when the process terminates. 
In particular, the exit() function causes all threads to exit immediately 
- even those in critical sections. 
This can cause a thread currently within the Onload stack, 
holding the per-stack lock, to terminate 
without releasing this shared lock. 
This is particularly important for shared stacks, 
where a process sharing the stack could 'hang' 
when Onload locks are not released.
An unclean exit can prevent the Onload kernel components 
from cleanly closing the application's TCP connections. 
In this case a message similar to the following will be observed:
[onload] Stack [0] released with lock stuck
and any pending TCP connections will be reset. 
To prevent this, 
applications should always ensure that all threads exit cleanly.

Thread Cancellation

Unexpected behavior can result 
when an accelerated application uses a pthread_cancel function. 
There is increased risk from multi‐threaded applications 
or a PTHREAD_CANCEL_ASYNCHRONOUS thread 
calling a non‐async safe function. 
Onload users are strongly advised 
that applications should not use pthread_cancel functions.

Packet Capture

Packets delivered to an application 
via the accelerated path are not visible to the OS kernel. 
As a result, 
diagnostic tools such as tcpdump and wireshark 
do not capture accelerated packets. 
The Solarflare-supplied onload_tcpdump does support capture 
of UDP and TCP packets from Onload stacks 
- refer to onload_tcpdump on page 379 for details.

Firewalls

Packets delivered to an application 
via the accelerated path are not visible to the OS kernel. 
As a result, 
these packets are not visible to the kernel firewall (iptables) 
and therefore firewall rules will not be applied to accelerated traffic. 
The onload_iptables feature can be used 
to enforce Linux iptables rules 
as hardware filters on the Solarflare adapter; 
refer to onload_iptables on page 384.
NOTE: Hardware filtering on the network adapter will ensure 
that accelerated applications receive traffic only on ports 
to which they are bound.

System Tools ‐ Socket Visibility

With the exception of ‘listening’ sockets, 
TCP sockets accelerated by Onload are not visible to the netstat tool. 
UDP sockets are visible to netstat.
Accelerated sockets appear in the /proc directory 
as symbolic links to /dev/onload. 
Tools that rely on /proc will probably not identify 
the associated file descriptors as being sockets. 
Refer to Onload and File Descriptors, 
Stacks and Sockets on page 74 for more details.
Accelerated sockets can be inspected 
in detail with the Onload onload_stackdump tool, 
which exposes considerably more information 
than the regular system tools. 
For details of onload_stackdump refer to onload_stackdump on page 324.

Signals

If an application receives a SIGSTOP signal, 
it is possible for the processing 
of network events to be stalled in an Onload stack 
used by the application. 
This happens 
if the application is holding a lock inside the stack 
when it is stopped; 
if the application remains stopped for a long time, 
TCP connections may time out.
	
A signal which terminates an application can prevent threads 
from exiting cleanly. 
Refer to Multithreaded Applications Termination on page 169 
for more information.
Undefined content may result 
when a signal handler uses the third argument (ucontext) 
and the signal has been postponed by Onload. 
To avoid this, 
use the Onload module option safe_signals_and_exit=0, 
or use EF_SIGNALS_NOPOSTPONE to prevent specific signals 
from being postponed by Onload.

Onload and IP_MULTICAST_TTL

Onload acts in accordance with RFC 791 
with respect to the IP_MULTICAST_TTL setting. 
Under Onload, if IP_MULTICAST_TTL=0, 
packets are never transmitted on the wire.
This differs from the Linux kernel 
where the following behavior has been observed:
Kernel ‐ IP_MULTICAST_TTL 0 
‐ if there is a local listener, packets will not be transmitted on the wire.
Kernel ‐ IP_MULTICAST_TTL 0 
‐ if there is NO local listener, packets will always be transmitted on the wire.

Source/Policy Based Routing

1. OpenOnload 201710 / EnterpriseOnload 6.0 / Cloud Onload 201811
The Onload 201710, EnterpriseOnload 6.0 and Cloud Onload 201811 	
releases include support 
for source based policy routing 
for unicast and multicast packets. 
The following are supported:
•Source IP address
•Destination IP address
•Outgoing interface (SO_BINDTODEVICE)
•TOS (Type of Service)
Policy rules based on other criteria 
are not supported 
and will be ignored by Onload.
2. Earlier Onload versions
Earlier Onload versions do not support source based or policy based routing. 
Whereas the Linux kernel will select a route and interface 
based on routing metrics, 
Onload will select any of the valid routes and Onload interfaces 
to a destination that are available.
The EF_TCP_LISTEN_REPLIES_BACK environment variable 
provides a pseudo source‐based routing solution. 
This option forces a reply to an incoming SYN 
to ignore routes and reply to the originating network interface.

Enabling this option allows new TCP connections to be set up, 
but does not guarantee 
that all replies from an Onloaded application will go 
via the receiving Solarflare interface. 
To guarantee this, some re-ordering of the routing table may be needed, 
or an explicit route (via the Solarflare interface) 
should be added to the routing table.

Routing Table Metrics

From version 201606, 
Onload supports routing table metrics. 
Therefore, 
if two entries in the routing table would route traffic 
to the destination address, 
the entry with the best metric is selected, 
even if that means routing over a non-Solarflare interface.

Multipath routes

Onload does not support a multipath route simultaneously 
via Onload accelerated and non‐Onload‐accelerated interfaces. 
The paths in a multipath route should either all be acceleratable, 
or all be non‐acceleratable.

Reverse Path Filtering

Onload does not support Reverse Path Filtering. 
When Onload cannot route traffic to a remote endpoint 
over a Solarflare interface (no suitable route table entry), 
the traffic will be handled via the kernel.

SO_REUSEPORT

Onload vs. kernel behavior is described in Chapter 7 on page 90.

Thread Safe

By default, Onload assumes 
that applications are multithread-safe, 
i.e. that a file descriptor is not concurrently modified by different threads. 
This is different from kernel behaviour, 
where concurrent modification is safe, 
so users should set EF_FDS_MT_SAFE=0 
if the application cannot guarantee this.
Similar consideration should be given 
when using epoll(), where the default concurrency controls are disabled in Onload; 
in that case users should set EF_EPOLL_MT_SAFE=0.

Control of Duplicated Sockets

When a socket has been duplicated, for example using fork(), 
the parent fd may be controlled by the kernel 
while the child fd is controlled by Onload. 
Changes made by the kernel using fcntl() to modify flags 
such as O_NONBLOCK will not be reflected in the Onload socket.

UDP sockets shutdown()

When a kernel UDP socket is unconnected, 
a shutdown() call will prompt a blocking recv() operation on the socket 
to successfully complete. 
When an Onload UDP socket is unconnected, 
a shutdown() call does not successfully complete a blocking recv() call, 
and thereafter the socket fd cannot be reused.
When a UDP socket is connected, 
kernel and Onload behavior is the same: 
a shutdown() call will prompt a blocking recv() operation 
to complete successfully.

Limits to Acceleration

IP Fragmentation

Fragmented IP traffic is not accelerated 
by Onload on the receive side, 
and is instead received transparently via the kernel stack. 
IP fragmentation is rarely seen with TCP, 
because the TCP/IP stacks segment messages 
into MTU‐sized IP datagrams. 
With UDP, 
datagrams are fragmented by IP 
if they are too large for the configured MTU. 
Refer to Fragmented UDP on page 130 for a description of Onload behavior.

Broadcast Traffic

Broadcast sends and receives function as normal 
but will not be accelerated. 
Multicast traffic can be accelerated.

IPv6 Traffic

IPv6 traffic functions as normal but will not be accelerated.
If the kernel also does not support IPv6, 
the following error message is output:
sock_create(10, <1 or 2>, 0) failed (-97)
where:
•-97 is the error code EAFNOSUPPORT 
(Address family not supported by protocol)
•the other numbers indicate an IPv6 TCP or UDP socket.
One possible cause of this error is using Java, 
which often creates IPv6 sockets alongside IPv4 ones.

TCP NOP Options

Onload will silently discard packets 
that include IP header No Operation (NOP) options. 
Discards will not increment drop packet counters.
Onload will process packets 
that include NOP options in the TCP header, 
but the options themselves will be ignored.

Raw Sockets

Raw Socket sends and receives function 
as normal but will not be accelerated.

Socketpair and UNIX Domain Sockets

Onload will intercept, 
but does not accelerate the socketpair() system call. 
Sockets created with socketpair() will be handled by the kernel. 
Onload also does not accelerate UNIX domain sockets.

UDP sendfile()

The UDP sendfile() method is not currently accelerated by Onload. 
When an Onload accelerated application calls sendfile(), 
this is handled seamlessly by the kernel.

Statically Linked Applications

Onload will not accelerate statically linked applications. 
This is due to the method 
by which Onload intercepts libc function calls (using LD_PRELOAD).

Local Port Address

Onload is limited to OOF_LOCAL_ADDR_MAX number 
of local interface addresses. 
A local address can identify a physical port 
or a VLAN, 
and multiple addresses can be assigned to a single interface 
where each address contributes to the maximum value. 
Users can allocate additional local interface addresses 
by increasing the compile time constant OOF_LOCAL_ADDR_MAX 
in the /src/lib/efthrm/oof_impl.h file 
and rebuilding Onload. 
In onload‐201205 OOF_LOCAL_ADDR_MAX was replaced 
by the onload module option max_layer2_interfaces.

Bonding, Link aggregation

•Onload will only accelerate traffic over 802.3ad and active‐backup bonds.
•Onload will not accelerate traffic 
if a bond contains any slave interfaces 
that are not Solarflare network devices.
•Adding a non-Solarflare network device to a bond 
that is currently accelerated by Onload 
may produce unexpected behavior, such as connections being reset.
•Acceleration of bonded interfaces in Onload 
requires a kernel configured 
with sysfs support 
and a bonding module version of 3.0.0 or later.
In cases where Onload will not accelerate the traffic 
it will continue to work via the OS network stack.

VLANs

•Onload will only accelerate traffic over VLANs 
where the master device is either a Solarflare network device 
or a bonded interface that is itself accelerated. 
i.e. if the VLAN's master is accelerated, then so is the VLAN interface itself.
•Nested VLAN tags are not accelerated, but will function as normal.
•The ifconfig command will return inconsistent statistics 
on VLAN interfaces (not master interface).
•When a Solarflare VLAN tagged interface is subsequently placed in a bond, 
the interface will continue to be accelerated, 
but the bond is not accelerated.
•Using SFN7000, SFN8000 and X2 series adapters 
with the low‐latency firmware variant, 
the following limitation applies:
Hardware filters installed by Onload on the adapter 
will only act on the IP address and port, 
but not the VLAN identifier. 
Therefore if the same IP address:port combination 
exists on different VLAN interfaces, 
only the first interface to install the filter will receive the traffic.
This limitation does not apply to SFN7000, SFN8000 and X2 series adapters 
using the full‐feature firmware variant.
In cases where Onload will not accelerate the traffic 
it will continue to work via the OS network stack.
For more information and details 
and configuration options 
refer to the Solarflare Server Adapter User Guide section 
‘Setting Up VLANs’.

Ethernet Bridge Configuration

Onload does not currently support acceleration of interfaces 
added to an Ethernet bridge configured/added with the Linux brctl command.

TCP RTO During Overload Conditions

Using Onload, 
under very high load conditions 
an increased frequency of TCP retransmission timeouts (RTOs) 
might be observed. 
This can occur 
when a thread servicing the stack is descheduled 
from the CPU 
whilst still holding the stack lock, 
thus preventing another thread from accessing/polling the stack. 
A stack not being serviced 
means that ACKs are not received in a timely manner for packets sent, 
resulting in RTOs for the unacknowledged packets 
and increased jitter on the Onload stack.
Enabling the per stack environment variable EF_INT_DRIVEN 
can reduce the likelihood of this behavior 
and reduce jitter by ensuring the stack is serviced promptly. 

TCP with Jumbo Frames

When using jumbo frames with TCP, 
Onload will limit the MSS to 2048 bytes 
to ensure that segments do not exceed the size 
of internal packet buffers.
This should present no problems 
unless the remote end of a connection is unable 
to negotiate this lower MSS value.

Transmission Path ‐ Packet Loss

Occasionally Onload needs to send a packet, 
which would normally be accelerated, via the kernel. 
This occurs when there is no entry for the destination address 
in the ARP table, 
or to prevent an ARP table entry from becoming stale.
By default, the Linux sysctl, unres_qlen, 
will enqueue 3 packets per unresolved address 
when waiting for an ARP reply, 
and on a server subject to a very high UDP or TCP traffic load 
this can result in packet loss 
on the transmit path and packets being discarded.
The unres_qlen value can be identified 
using the following command:
sysctl -a | grep unres_qlen
net.ipv4.neigh.eth2.unres_qlen = 3
net.ipv4.neigh.eth0.unres_qlen = 3
net.ipv4.neigh.lo.unres_qlen = 3
net.ipv4.neigh.default.unres_qlen = 3
Changes to the queue lengths can be made permanent 
in the /etc/sysctl.conf file. 
Solarflare recommend setting the unres_qlen value to at least 50.
If packet discards are suspected, 
this extremely rare condition can be detected 
via the cp_defer counter 
reported by the onload_stackdump lots command 
for UDP sockets, 
or via the unresolved_discards counter 
in the Linux /proc/net/stat/arp_cache file.
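To make the change persistent, lines like the following can be added to /etc/sysctl.conf and applied with sysctl -p (the per-interface entry shown is illustrative; adjust it to the interfaces in use):

```
# Raise the unresolved-neighbour queue length as recommended above
net.ipv4.neigh.default.unres_qlen = 50
net.ipv4.neigh.eth2.unres_qlen = 50
```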

TCP ‐ Unsupported Routing, Timed out Connections

If TCP packets are received over an Onload accelerated interface, 
but Onload cannot find a suitable Onload accelerated return route, 
no response will be sent 
resulting in the connection timing out.

Application Clustering

For details of Application Clustering, 
refer to Application Clustering on page 90.
•Onload matches the Linux kernel implementation: 
clustering is not supported 
for multicast traffic, 
where setting SO_REUSEPORT has the same effect 
as SO_REUSEADDR.
•Calling connect() on a TCP socket 
which was previously subject to a bind() call is not currently supported. 
This will be supported in a future release.
•An application cluster will not persist over adapter/server/driver reset. 	
Before restarting the server 
or resetting the adapter the Onload applications should be terminated.
•The environment variable EF_CLUSTER_RESTART determines 
the behavior of the cluster 
when the application process is restarted 
- refer to EF_CLUSTER_RESTART in Parameter Reference on page 210.
•If the number of sockets in a cluster is less than EF_CLUSTER_SIZE, 
a portion of the received traffic will be lost.
•There is little benefit 
when clustering involves a TCP loopback listening socket, 
as connections will not be distributed amongst all threads. 
A non-loopback listening socket 
- which might occasionally get some loopback connections - 
can benefit from Application Clustering.

Duplicate IP or MAC addresses

Onload does not support multiple interfaces 
with the same IP address or MAC address.

epoll ‐ Known Issues

Onload supports different implementations of epoll 
controlled by the EF_UL_EPOLL environment variable 
‐ see Multiplexed I/O on page 83 for configuration details.
There are various limitations and differences 
in Onload vs. kernel behavior 
‐ refer to Chapter 7 on page 83 for details.
•When using EF_UL_EPOLL=1 or 3, 
it has been identified 
that the behavior of epoll_wait() differs from the kernel 
when the EPOLLONESHOT event is requested, 
resulting in two ‘wakeups’ being observed, 
one from the kernel 
and one from Onload. 
This behavior is apparent 
on SOCK_DGRAM and SOCK_STREAM sockets 
for all combinations of EPOLLONESHOT, EPOLLIN and EPOLLOUT events. 	
This applies for all types of accelerated sockets. 
•EF_EPOLL_CTL_FAST is enabled 
by default, 
and this modifies the semantics of epoll. 
In particular, 
it buffers up calls to epoll_ctl() 
and only applies them when epoll_wait() is called. 
This can break applications 
that do epoll_wait() in one thread 
and epoll_ctl() in another thread. 
The issue only affects EF_UL_EPOLL=2; 
the solution is to set EF_EPOLL_CTL_FAST=0 if this is a problem. 
The described condition does not occur 
if EF_UL_EPOLL=1 or EF_UL_EPOLL=3.
•When EF_EPOLL_CTL_FAST is enabled 
and an application tests the readiness of an epoll file descriptor 
without actually calling epoll_wait() 
- for example by doing epoll within epoll() or epoll within select(), 
or when one thread calls select() or epoll_wait() 
while another thread calls epoll_ctl() - 
then EF_EPOLL_CTL_FAST should be disabled. 
This applies 
when using EF_UL_EPOLL=1, 2 or 3.
If the application monitors the state of the epoll file descriptor indirectly, 
e.g. by monitoring the epoll fd with poll(), 
then EF_EPOLL_CTL_FAST can cause issues and should be set to zero.
To force Onload to follow the kernel behavior 
when using the epoll_wait() call, 
the following variables should be set:
EF_UL_EPOLL=2
EF_EPOLL_CTL_FAST=0
EF_EPOLL_CTL_HANDOFF=0 (when using EF_UL_EPOLL=1)
•A socket should be removed 
from an epoll set 
only when all references to the socket are closed.
With EF_UL_EPOLL=1 (default) or EF_UL_EPOLL=3, 
a socket is removed from the epoll set if the file descriptor is closed, 
even if other references to the socket exist. 
This can cause problems 
if file descriptors are duplicated 
using dup(), dup2() or fork(). 
For example:
s = socket(AF_INET, SOCK_STREAM, 0);
s2 = dup(s);
epoll_ctl(epoll_fd, EPOLL_CTL_ADD, s, ...);
close(s); 
/* with Onload, the socket referenced by s is removed from the epoll
 * set here, even though s2 still refers to it */
The workaround is to set EF_UL_EPOLL=2.
•When Onload is unable to accelerate a connected socket, 
e.g. because no route to the destination exists 
which uses a Solarflare interface, 
the socket will be handed off to the kernel 
and is removed from the epoll set. 
Because the socket is no longer in the epoll set, 
attempts to modify the socket with epoll_ctl() will fail 
with the ENOENT (descriptor not present) error. 
The described condition does not occur 
if EF_UL_EPOLL=1 or 3.
•If an epoll file descriptor 
is passed to the read() or write() functions, 
these will return a different error code than that reported 
by the kernel stack. This issue exists for all implementations of epoll.
•When EPOLLET is used and the event is ready, 
epoll_wait() is triggered by ANY event on the socket 
instead of the requested event. 
This issue should not affect application correctness.

•Users should be aware 
that if a server is overclocked, the epoll_wait() timeout interval 
will lengthen as the CPU frequency increases, 
resulting in unexpected timeout values. 
This has been observed on Intel based systems 
and when the Onload epoll implementation is EF_UL_EPOLL=1 or 3. 
Using EF_UL_EPOLL=2 this behavior is not observed.
•On a spinning thread, 
if epoll acceleration is disabled by setting EF_UL_EPOLL=0, 
sockets on this thread will be handed off to the kernel,
but latency will be worse than expected kernel socket latency.		
•To ensure that non-accelerated file descriptors are checked 
in poll() and select() functions, 
the following options should be disabled (set to zero): 
EF_SELECT_FAST and EF_POLL_FAST.
•When using poll() and select() calls, 
to ensure that non-accelerated file descriptors are checked 
when there are no events on any accelerated descriptors, 
set EF_POLL_FAST_USEC and EF_SELECT_FAST_USEC, 
both to zero.

Nested Epoll Sets

When an epoll set includes accelerated sockets 
and is nested inside another epoll set, 
the outer set may not always be notified 
of socket readiness, or, after a socket becomes ready, 
it may not be possible to clear its ready state. 
This limitation is known to affect EF_UL_EPOLL=3.

Spinning ‐ Timing Issues

Onload users should consider
that as different software is being run, 
timings will be affected 
which can result in unexpected scheduling behaviour 
and memory use. 
Spinning applications, in particular, require a dedicated core per spinning Onload thread.

Configuration Issues

Mixed Adapters Sharing a Broadcast Domain
Onload should not be used 
when Solarflare and non-Solarflare interfaces 
in the same network server are configured 
in the same broadcast domain.