I've been looking at Linux tuning params and see some configs where SACK is turned off. Can anyone explain this?
This would be tuning for a busy web server.
A basic TCP ACK says "I received all bytes up to X." Selective ACK allows you to say "I received bytes X-Y, and V-Z."
So, for instance, if a host sent you 10,000 bytes and bytes 3000-5000 were lost in transit, a plain ACK could only say "I got everything up to 2999." The other end would have to send bytes 3000-10000 again. SACK can say "I got everything up to 2999, and also 5001-10000," so the host resends only bytes 3000-5000.
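To make the bookkeeping concrete, here is the same scenario as a back-of-the-envelope shell calculation, treating the transfer as bytes 1-10000 with bytes 3000-5000 lost (illustrative numbers only, not a protocol trace):

```shell
# Hypothetical transfer: 10,000 bytes sent, bytes 3000-5000 lost in transit.
total=10000
lost_start=3000
lost_end=5000

# Plain cumulative ACK: the receiver can only acknowledge up to the first
# gap, so everything from the gap onward gets resent.
ack_resend=$(( total - lost_start + 1 ))

# SACK: the receiver reports the ranges it already holds, so only the
# missing range gets resent.
sack_resend=$(( lost_end - lost_start + 1 ))

echo "plain ACK resends $ack_resend bytes; SACK resends $sack_resend bytes"
```

On a clean link the difference is noise; on a lossy, high-delay link it is most of your retransmitted traffic.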
This is great over a high-bandwidth, lossy (or high-delay) link. The problem is that it can cause severe performance issues in specific circumstances. Plain TCP ACKs make the server treat a high-bandwidth, lossy connection with kid gloves (send 500 bytes, wait, send 500 bytes, wait, and so on). SACK lets it adapt to the loss and delay because it knows exactly which bytes were actually lost.
Here is where bad things can happen. An attacker can force your server to keep a massive retransmission queue for a long time, then make it walk that whole damn queue over and over and over again. This can peg the CPU, eat up RAM, and consume more bandwidth than it should. In a nutshell, a lightweight system can mount a DoS against a much beefier server.
If your server is robust and doesn't serve large files, you're pretty well insulated against this.
If you're mostly serving an intranet or other low-latency group of users, SACK buys you nothing and can be turned off for security reasons with no performance loss.
If you're on a low-bandwidth link (say 1Mbps or less as a completely arbitrary rule of thumb), SACK can cause problems in normal operations by saturating your connection and should be turned off.
Ultimately, it's up to you. Consider what you're serving, to whom, from what, and weigh the degree of your risk against the performance effects of SACK.
There is a great overview of SACK and its vulnerability here.
Another reason that TCP SACK is often disabled is that there is an amazing amount of network gear out there that fails to handle this option correctly. We see this all the time with a high-speed file transfer product that we provide that uses TCP. The most
common issue is that of gateway devices that do things like randomize sequence numbers for TCP packets transiting through the device from internal networks to external, but that don't "un-randomize" the TCP SACK options that might be sent from the remote end.
If the actual SACK values are not translated back to the proper values by these devices, then the TCP session will never complete in the face of packet loss when the remote end tries to use SACK to get the selective ACK benefits.
Probably this would be less of an issue if people were to more aggressively apply preventive software maintenance to this gear, but they tend not to.
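If you suspect one of these gateways, one way to check is to capture the handshake and the subsequent ACKs on both sides of the device and compare what each side sees. The interface and port below are placeholders for your own setup:

```shell
# -vv prints TCP options. Watch for "sackOK" in the SYN/SYN-ACK, then for
# "sack N {start:end}" blocks in later ACKs once loss occurs.
# eth0 and port 22 are placeholders; adjust for your traffic.
tcpdump -ni eth0 -vv 'tcp port 22'
```

If a gateway randomizes sequence numbers but not the SACK blocks, the ranges captured on the inside will be nowhere near the sequence space of the actual connection, which is the smoking gun.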
I can confirm from bitter experience that tcp_sack = 1 causes stalled data transfers over sftp/rsync/scp etc. with files in excess of around 12 MB when using certain Cisco ASA firewall appliances. It stalled every single time.
We were transferring over a dedicated 100 Mbps link between host A and host B in two different data centres, both using Cisco firewall and switch hardware, with CentOS on the hosts.
This can be mitigated somewhat by modifying buffer sizes. For example, I could not transfer a 1 GB file via sftp from host A to host B unless I set the sftp buffer to 2048, but I could regardless if host B was pulling the file from A.
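For reference, the buffer setting in question is sftp's -B flag (request buffer size, 32768 bytes by default); the host and file names here are placeholders:

```shell
# Push with a 2048-byte request buffer instead of the 32768-byte default.
# user@hostB and bigfile.bin are placeholders for your own setup.
sftp -B 2048 user@hostB <<'EOF'
put bigfile.bin
EOF
```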
Experiments with the same file using rsync and send/receive buffer tuning allowed me to get up to around 70 MB of a 1 GB file pushed from A to B.
However, the ultimate answer was to disable tcp_sack on host A: initially by setting tcp_sack = 0 in the running kernel on the fly, and ultimately by adding it to my /etc/sysctl.conf.
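For anyone wanting the exact steps, this is roughly what that looks like on a Linux box (run as root):

```shell
# Check the current setting (1 means SACK is enabled, the default).
sysctl net.ipv4.tcp_sack

# Disable it in the running kernel, no reboot needed.
sysctl -w net.ipv4.tcp_sack=0

# Make it persistent across reboots.
echo "net.ipv4.tcp_sack = 0" >> /etc/sysctl.conf
sysctl -p
```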