Iptables Tutorial 1.2.1(2)

Chapter 7. The state machine

This chapter will deal with the state machine and explain it in detail. After reading through it, you should have a complete understanding of how the State machine works. We will also go through a large set of examples on how states are dealt with within the state machine itself. These should clarify everything in practice.


7.1. Introduction

The state machine is a special part within iptables that should really not be called the state machine at all, since it is really a connection tracking machine. However, most people recognize it under the first name. Throughout this chapter I will use these names more or less as if they were synonymous. This should not be overly confusing. Connection tracking is done to let the Netfilter framework know the state of a specific connection. Firewalls that implement this are generally called stateful firewalls. A stateful firewall is generally much more secure than non-stateful firewalls since it allows us to write much tighter rule-sets.

Within iptables, packets can be related to tracked connections in four different so called states. These are known as NEWESTABLISHEDRELATED and INVALID. We will discuss each of these in more depth later. With the --state match we can easily control who or what is allowed to initiate new sessions.

All of the connection tracking is done by special framework within the kernel called conntrack. conntrack may be loaded either as a module, or as an internal part of the kernel itself. Most of the time, we need and want more specific connection tracking than the default conntrack engine can maintain. Because of this, there are also more specific parts of conntrack that handles the TCP, UDP or ICMP protocols among others. These modules grab specific, unique, information from the packets, so that they may keep track of each stream of data. The information that conntrack gathers is then used to tell conntrack in which state the stream is currently in. For example, UDP streams are, generally, uniquely identified by their destination IP address, source IP address, destination port and source port.

In previous kernels, we had the possibility to turn on and off defragmentation. However, since iptables and Netfilter were introduced and connection tracking in particular, this option was gotten rid of. The reason for this is that connection tracking can not work properly without defragmenting packets, and hence defragmenting has been incorporated into conntrack and is carried out automatically. It can not be turned off, except by turning off connection tracking. Defragmentation is always carried out if connection tracking is turned on.

All connection tracking is handled in the PREROUTING chain, except locally generated packets which are handled in the OUTPUT chain. What this means is that iptables will do all recalculation of states and so on within the PREROUTING chain. If we send the initial packet in a stream, the state gets set to NEW within the OUTPUT chain, and when we receive a return packet, the state gets changed in the PREROUTING chain to ESTABLISHED, and so on. If the first packet is not originated by ourself, the NEW state is set within the PREROUTING chain of course. So, all state changes and calculations are done within the PREROUTING and OUTPUT chains of the nat table.


7.2. The conntrack entries

Let's take a brief look at a conntrack entry and how to read them in /proc/net/ip_conntrack. This gives a list of all the current entries in your conntrack database. If you have the ip_conntrack module loaded, a cat of /proc/net/ip_conntrack might look like:

tcp 6 117 SYN_SENT src=192.168.1.6 dst=192.168.1.9 sport=32775 \
dport=22 [UNREPLIED] src=192.168.1.9 dst=192.168.1.6 sport=22 \
dport=32775 [ASSURED] use=2

This example contains all the information that the conntrack module maintains to know which state a specific connection is in. First of all, we have a protocol, which in this case is tcp. Next, the same value in normal decimal coding. After this, we see how long this conntrack entry has to live. This value is set to 117 seconds right now and is decremented regularly until we see more traffic. This value is then reset to the default value for the specific state that it is in at that relevant point of time. Next comes the actual state that this entry is in at the present point of time. In the above mentioned case we are looking at a packet that is in the SYN_SENT state. The internal value of a connection is slightly different from the ones used externally with iptables. The value SYN_SENT tells us that we are looking at a connection that has only seen a TCP SYN packet in one direction. Next, we see the source IP address, destination IP address, source port and destination port. At this point we see a specific keyword that tells us that we have seen no return traffic for this connection. Lastly, we see what we expect of return packets. The information details the source IP address and destination IP address (which are both inverted, since the packet is to be directed back to us). The same thing goes for the source port and destination port of the connection. These are the values that should be of any interest to us.

The connection tracking entries may take on a series of different values, all specified in the conntrack headers available in linux/include/netfilter-ipv4/ip_conntrack*.h files. These values are dependent on which sub-protocol of IP we use. TCP, UDP or ICMP protocols take specific default values as specified in linux/include/netfilter-ipv4/ip_conntrack.h. We will look closer at this when we look at each of the protocols; however, we will not use them extensively through this chapter, since they are not used outside of the conntrack internals. Also, depending on how this state changes, the default value of the time until the connection is destroyed will also change.

 

Recently there was a new patch made available in iptables patch-o-matic, called tcp-window-tracking. This patch adds, among other things, all of the above timeouts to special sysctl variables, which means that they can be changed on the fly, while the system is still running. Hence, this makes it unnecessary to recompile the kernel every time you want to change the timeouts.

These can be altered via using specific system calls available in the /proc/sys/net/ipv4/netfilter directory. You should in particular look at the /proc/sys/net/ipv4/netfilter/ip_ct_* variables.

When a connection has seen traffic in both directions, the conntrack entry will erase the [UNREPLIED] flag, and then reset it. The entry that tells us that the connection has not seen any traffic in both directions, will be replaced by the [ASSURED] flag, to be found close to the end of the entry. The [ASSURED] flag tells us that this connection is assured and that it will not be erased if we reach the maximum possible tracked connections. Thus, connections marked as [ASSURED] will not be erased, contrary to the non-assured connections (those not marked as [ASSURED]). How many connections that the connection tracking table can hold depends upon a variable that can be set through the ip-sysctl functions in recent kernels. The default value held by this entry varies heavily depending on how much memory you have. On 128 MB of RAM you will get 8192 possible entries, and at 256 MB of RAM, you will get 16376 entries. You can read and set your settings through the /proc/sys/net/ipv4/ip_conntrack_max setting.

A different way of doing this, that is more efficient, is to set the hashsize option to the ip_conntrack module once this is loaded. Under normal circumstances ip_conntrack_max equals 8 * hashsize. In other words, setting the hashsize to 4096 will result in ip_conntrack_max being set to 32768 conntrack entries. An example of this would be:

work3:/home/blueflux# modprobe ip_conntrack hashsize=4096
work3:/home/blueflux# cat /proc/sys/net/ipv4/ip_conntrack_max
32768
work3:/home/blueflux#

7.3. User-land states

As you have seen, packets may take on several different states within the kernel itself, depending on what protocol we are talking about. However, outside the kernel, we only have the 4 states as described previously. These states can mainly be used in conjunction with the state match which will then be able to match packets based on their current connection tracking state. The valid states are NEWESTABLISHEDRELATED and INVALID. The following table will briefly explain each possible state.

Table 7-1. User-land states

StateExplanation
NEWThe NEW state tells us that the packet is the first packet that we see. This means that the first packet that the conntrack module sees, within a specific connection, will be matched. For example, if we see a SYN packet and it is the first packet in a connection that we see, it will match. However, the packet may as well not be a SYN packet and still be considered NEW. This may lead to certain problems in some instances, but it may also be extremely helpful when we need to pick up lost connections from other firewalls, or when a connection has already timed out, but in reality is not closed.
ESTABLISHEDThe ESTABLISHED state has seen traffic in both directions and will then continuously match those packets. ESTABLISHED connections are fairly easy to understand. The only requirement to get into an ESTABLISHED state is that one host sends a packet, and that it later on gets a reply from the other host. The NEW state will upon receipt of the reply packet to or through the firewall change to the ESTABLISHED state. ICMP reply messages can also be considered as ESTABLISHED, if we created a packet that in turn generated the reply ICMP message.
RELATEDThe RELATED state is one of the more tricky states. A connection is considered RELATED when it is related to another already ESTABLISHED connection. What this means, is that for a connection to be considered as RELATED, we must first have a connection that is considered ESTABLISHED. The ESTABLISHED connection will then spawn a connection outside of the main connection. The newly spawned connection will then be considered RELATED, if the conntrack module is able to understand that it is RELATED. Some good examples of connections that can be considered as RELATED are the FTP-data connections that are considered RELATED to the FTP control port, and the DCC connections issued through IRC. This could be used to allow ICMP error messages, FTP transfers and DCC's to work properly through the firewall. Do note that most TCP protocols and some UDP protocols that rely on this mechanism are quite complex and send connection information within the payload of the TCP or UDP data segments, and hence require special helper modules to be correctly understood.
INVALIDThe INVALID state means that the packet can't be identified or that it does not have any state. This may be due to several reasons, such as the system running out of memory or ICMP error messages that do not respond to any known connections. Generally, it is a good idea to DROP everything in this state.

These states can be used together with the --state match to match packets based on their connection tracking state. This is what makes the state machine so incredibly strong and efficient for our firewall. Previously, we often had to open up all ports above 1024 to let all traffic back into our local networks again. With the state machine in place this is not necessary any longer, since we can now just open up the firewall for return traffic and not for all kinds of other traffic.


7.4. TCP connections

In this section and the upcoming ones, we will take a closer look at the states and how they are handled for each of the three basic protocols TCP, UDP and ICMP. Also, we will take a closer look at how connections are handled per default, if they can not be classified as either of these three protocols. We have chosen to start out with the TCP protocol since it is a stateful protocol in itself, and has a lot of interesting details with regard to the state machine in iptables.

A TCP connection is always initiated with the 3-way handshake, which establishes and negotiates the actual connection over which data will be sent. The whole session is begun with a SYN packet, then a SYN/ACK packet and finally an ACK packet to acknowledge the whole session establishment. At this point the connection is established and able to start sending data. The big problem is, how does connection tracking hook up into this? Quite simply really.

As far as the user is concerned, connection tracking works basically the same for all connection types. Have a look at the picture below to see exactly what state the stream enters during the different stages of the connection. As you can see, the connection tracking code does not really follow the flow of the TCP connection, from the users viewpoint. Once it has seen one packet(the SYN), it considers the connection as NEW. Once it sees the return packet(SYN/ACK), it considers the connection as ESTABLISHED. If you think about this a second, you will understand why. With this particular implementation, you can allow NEW and ESTABLISHED packets to leave your local network, only allow ESTABLISHED connections back, and that will work perfectly. Conversely, if the connection tracking machine were to consider the whole connection establishment as NEW, we would never really be able to stop outside connections to our local network, since we would have to allow NEW packets back in again. To make things more complicated, there are a number of other internal states that are used for TCP connections inside the kernel, but which are not available for us in User-land. Roughly, they follow the state standards specified within RFC 793 - Transmission Control Protocol on pages 21-23. We will consider these in more detail further along in this section.

 

As you can see, it is really quite simple, seen from the user's point of view. However, looking at the whole construction from the kernel's point of view, it's a little more difficult. Let's look at an example. Consider exactly how the connection states change in the /proc/net/ip_conntrack table. The first state is reported upon receipt of the first SYN packet in a connection.

tcp 6 117 SYN_SENT src=192.168.1.5 dst=192.168.1.35 sport=1031 \
dport=23 [UNREPLIED] src=192.168.1.35 dst=192.168.1.5 sport=23 \
dport=1031 use=1

As you can see from the above entry, we have a precise state in which a SYN packet has been sent, (the SYN_SENT flag is set), and to which as yet no reply has been sent (witness the [UNREPLIED] flag). The next internal state will be reached when we see another packet in the other direction.

tcp 6 57 SYN_RECV src=192.168.1.5 dst=192.168.1.35 sport=1031 \
dport=23 src=192.168.1.35 dst=192.168.1.5 sport=23 dport=1031 \
use=1

Now we have received a corresponding SYN/ACK in return. As soon as this packet has been received, the state changes once again, this time to SYN_RECV. SYN_RECV tells us that the original SYN was delivered correctly and that the SYN/ACK return packet also got through the firewall properly. Moreover, this connection tracking entry has now seen traffic in both directions and is hence considered as having been replied to. This is not explicit, but rather assumed, as was the [UNREPLIED] flag above. The final step will be reached once we have seen the final ACK in the 3-way handshake.

tcp 6 431999 ESTABLISHED src=192.168.1.5 dst=192.168.1.35 \
sport=1031 dport=23 src=192.168.1.35 dst=192.168.1.5 \
sport=23 dport=1031 [ASSURED] use=1

In the last example, we have gotten the final ACK in the 3-way handshake and the connection has entered the ESTABLISHED state, as far as the internal mechanisms of iptables are aware. Normally, the stream will be ASSURED by now.

A connection may also enter the ESTABLISHED state, but not be [ASSURED]. This happens if we have connection pickup turned on (Requires the tcp-window-tracking patch, and the ip_conntrack_tcp_loose to be set to 1 or higher). The default, without the tcp-window-tracking patch, is to have this behaviour, and is not changeable.

When a TCP connection is closed down, it is done in the following way and takes the following states.

 

 

As you can see, the connection is never really closed until the last ACK is sent. Do note that this picture only describes how it is closed down under normal circumstances. A connection may also, for example, be closed by sending a RST(reset), if the connection were to be refused. In this case, the connection would be closed down immediately.

When the TCP connection has been closed down, the connection enters the TIME_WAIT state, which is per default set to 2 minutes. This is used so that all packets that have gotten out of order can still get through our rule-set, even after the connection has already closed. This is used as a kind of buffer time so that packets that have gotten stuck in one or another congested router can still get to the firewall, or to the other end of the connection.

If the connection is reset by a RST packet, the state is changed to CLOSE. This means that the connection per default has 10 seconds before the whole connection is definitely closed down. RST packets are not acknowledged in any sense, and will break the connection directly. There are also other states than the ones we have told you about so far. Here is the complete list of possible states that a TCP stream may take, and their timeout values.

Table 7-2. Internal states

StateTimeout value
NONE30 minutes
ESTABLISHED5 days
SYN_SENT2 minutes
SYN_RECV60 seconds
FIN_WAIT2 minutes
TIME_WAIT2 minutes
CLOSE10 seconds
CLOSE_WAIT12 hours
LAST_ACK30 seconds
LISTEN>2 minutes

These values are most definitely not absolute. They may change with kernel revisions, and they may also be changed via the proc file-system in the /proc/sys/net/ipv4/netfilter/ip_ct_tcp_* variables. The default values should, however, be fairly well established in practice. These values are set in seconds. Early versions of the patch used jiffies (which was a bug).

 

Also note that the User-land side of the state machine does not look at TCP flags (i.e., RST, ACK, and SYN are flags) set in the TCP packets. This is generally bad, since you may want to allow packets in the NEW state to get through the firewall, but when you specify the NEW flag, you will in most cases mean SYN packets.

This is not what happens with the current state implementation; instead, even a packet with no bit set or an ACK flag, will count as NEW. This can be used for redundant firewalling and so on, but it is generally extremely bad on your home network, where you only have a single firewall. To get around this behavior, you could use the command explained in the State NEW packets but no SYN bit set section of the Common problems and questions appendix. Another way is to install the tcp-window-tracking extension from patch-o-matic, and set the /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_loose to zero, which will make the firewall drop all NEW packets with anything but the SYN flag set.


7.5. UDP connections

UDP connections are in themselves not stateful connections, but rather stateless. There are several reasons why, mainly because they don't contain any connection establishment or connection closing; most of all they lack sequencing. Receiving two UDP datagrams in a specific order does not say anything about the order in which they were sent. It is, however, still possible to set states on the connections within the kernel. Let's have a look at how a connection can be tracked and how it might look in conntrack.

 

As you can see, the connection is brought up almost exactly in the same way as a TCP connection. That is, from the user-land point of view. Internally, conntrack information looks quite a bit different, but intrinsically the details are the same. First of all, let's have a look at the entry after the initial UDP packet has been sent.

udp 17 20 src=192.168.1.2 dst=192.168.1.5 sport=137 dport=1025 \
[UNREPLIED] src=192.168.1.5 dst=192.168.1.2 sport=1025 \
dport=137 use=1

As you can see from the first and second values, this is an UDP packet. The first is the protocol name, and the second is protocol number. This is just the same as for TCP connections. The third value marks how many seconds this state entry has to live. After this, we get the values of the packet that we have seen and the future expectations of packets over this connection reaching us from the initiating packet sender. These are the source, destination, source port and destination port. At this point, the [UNREPLIED] flag tells us that there's so far been no response to the packet. Finally, we get a brief list of the expectations for returning packets. Do note that the latter entries are in reverse order to the first values. The timeout at this point is set to 30 seconds, as per default.

udp 17 170 src=192.168.1.2 dst=192.168.1.5 sport=137 \
dport=1025 src=192.168.1.5 dst=192.168.1.2 sport=1025 \
dport=137 [ASSURED] use=1

At this point the server has seen a reply to the first packet sent out and the connection is now considered as ESTABLISHED. This is not shown in the connection tracking, as you can see. The main difference is that the [UNREPLIED] flag has now gone. Moreover, the default timeout has changed to 180 seconds - but in this example that's by now been decremented to 170 seconds - in 10 seconds' time, it will be 160 seconds. There's one thing that's missing, though, and can change a bit, and that is the [ASSURED] flag described above. For the [ASSURED] flag to be set on a tracked connection, there must have been a legitimate reply packet to the NEW packet.

udp 17 175 src=192.168.1.5 dst=195.22.79.2 sport=1025 \
dport=53 src=195.22.79.2 dst=192.168.1.5 sport=53 \
dport=1025 [ASSURED] use=1

At this point, the connection has become assured. The connection looks exactly the same as the previous example. If this connection is not used for 180 seconds, it times out. 180 Seconds is a comparatively low value, but should be sufficient for most use. This value is reset to its full value for each packet that matches the same entry and passes through the firewall, just the same as for all of the internal states.


7.6. ICMP connections

ICMP packets are far from a stateful stream, since they are only used for controlling and should never establish any connections. There are four ICMP types that will generate return packets however, and these have 2 different states. These ICMP messages can take the NEW and ESTABLISHED states. The ICMP types we are talking about are Echo request and reply, Timestamp request and reply, Information request and reply and finally Address mask request and reply. Out of these, the timestamp request and information request are obsolete and could most probably just be dropped. However, the Echo messages are used in several setups such as pinging hosts. Address mask requests are not used often, but could be useful at times and worth allowing. To get an idea of how this could look, have a look at the following image.

 

As you can see in the above picture, the host sends an echo request to the target, which is considered as NEW by the firewall. The target then responds with a echo reply which the firewall considers as state ESTABLISHED. When the first echo request has been seen, the following state entry goes into the ip_conntrack.

icmp 1 25 src=192.168.1.6 dst=192.168.1.10 type=8 code=0 \
id=33029 [UNREPLIED] src=192.168.1.10 dst=192.168.1.6 \
type=0 code=0 id=33029 use=1

This entry looks a little bit different from the standard states for TCP and UDP as you can see. The protocol is there, and the timeout, as well as source and destination addresses. The problem comes after that however. We now have 3 new fields called type, code and id. They are not special in any way, the type field contains the ICMP type and the code field contains the ICMP code. These are all available in ICMP types appendix. The final id field, contains the ICMP ID. Each ICMP packet gets an ID set to it when it is sent, and when the receiver gets the ICMP message, it sets the same ID within the new ICMP message so that the sender will recognize the reply and will be able to connect it with the correct ICMP request.

The next field, we once again recognize as the [UNREPLIED] flag, which we have seen before. Just as before, this flag tells us that we are currently looking at a connection tracking entry that has seen only traffic in one direction. Finally, we see the reply expectation for the reply ICMP packet, which is the inversion of the original source and destination IP addresses. As for the type and code, these are changed to the correct values for the return packet, so an echo request is changed to echo reply and so on. The ICMP ID is preserved from the request packet.

The reply packet is considered as being ESTABLISHED, as we have already explained. However, we can know for sure that after the ICMP reply, there will be absolutely no more legal traffic in the same connection. For this reason, the connection tracking entry is destroyed once the reply has traveled all the way through the Netfilter structure.

In each of the above cases, the request is considered as NEW, while the reply is considered as ESTABLISHED. Let's consider this more closely. When the firewall sees a request packet, it considers it as NEW. When the host sends a reply packet to the request it is considered ESTABLISHED.

 

Note that this means that the reply packet must match the criterion given by the connection tracking entry to be considered as established, just as with all other traffic types.

ICMP requests has a default timeout of 30 seconds, which you can change in the /proc/sys/net/ipv4/netfilter/ip_ct_icmp_timeout entry. This should in general be a good timeout value, since it will be able to catch most packets in transit.

Another hugely important part of ICMP is the fact that it is used to tell the hosts what happened to specific UDP and TCP connections or connection attempts. For this simple reason, ICMP replies will very often be recognized as RELATED to original connections or connection attempts. A simple example would be the ICMP Host unreachable or ICMP Network unreachable. These should always be spawned back to our host if it attempts an unsuccessful connection to some other host, but the network or host in question could be down, and hence the last router trying to reach the site in question will reply with an ICMP message telling us about it. In this case, the ICMP reply is considered as a RELATED packet. The following picture should explain how it would look.

 

 

In the above example, we send out a SYN packet to a specific address. This is considered as a NEW connection by the firewall. However, the network the packet is trying to reach is unreachable, so a router returns a network unreachable ICMP error to us. The connection tracking code can recognize this packet as RELATED. thanks to the already added tracking entry, so the ICMP reply is correctly sent to the client which will then hopefully abort. Meanwhile, the firewall has destroyed the connection tracking entry since it knows this was an error message.

The same behavior as above is experienced with UDP connections if they run into any problem like the above. All ICMP messages sent in reply to UDP connections are considered as RELATED. Consider the following image.

 

 

This time an UDP packet is sent to the host. This UDP connection is considered as NEW. However, the network is administratively prohibited by some firewall or router on the way over. Hence, our firewall receives a ICMP Network Prohibited in return. The firewall knows that this ICMP error message is related to the already opened UDP connection and sends it as a RELATED packet to the client. At this point, the firewall destroys the connection tracking entry, and the client receives the ICMP message and should hopefully abort.


7.7. Default connections

In certain cases, the conntrack machine does not know how to handle a specific protocol. This happens if it does not know about that protocol in particular, or doesn't know how it works. In these cases, it goes back to a default behavior. The default behavior is used on, for example, NETBLT, MUX and EGP. This behavior looks pretty much the same as the UDP connection tracking. The first packet is considered NEW, and reply traffic and so forth is considered ESTABLISHED.

When the default behavior is used, all of these packets will attain the same default timeout value. This can be set via the /proc/sys/net/ipv4/netfilter/ip_ct_generic_timeout variable. The default value here is 600 seconds, or 10 minutes. Depending on what traffic you are trying to send over a link that uses the default connection tracking behavior, this might need changing. Especially if you are bouncing traffic through satellites and such, which can take a long time.


7.8. Complex protocols and connection tracking

Certain protocols are more complex than others. What this means when it comes to connection tracking, is that such protocols may be harder to track correctly. Good examples of these are the ICQ, IRC and FTP protocols. Each and every one of these protocols carries information within the actual data payload of the packets, and hence requires special connection tracking helpers to enable it to function correctly.

This is a list of the complex protocols that has support inside the linux kernel, and which kernel version it was introduced in.

Table 7-3. Complex protocols support

Protocol nameKernel versions
FTP2.3
IRC2.3
TFTP2.5
Amanda2.5
  • FTP

  • IRC

  • TFTP

Let's take the FTP protocol as the first example. The FTP protocol first opens up a single connection that is called the FTP control session. When we issue commands through this session, other ports are opened to carry the rest of the data related to that specific command. These connections can be done in two ways, either actively or passively. When a connection is done actively, the FTP client sends the server a port and IP address to connect to. After this, the FTP client opens up the port and the server connects to that specified port from a random unprivileged port (>1024) and sends the data over it.

The problem here is that the firewall will not know about these extra connections, since they were negotiated within the actual payload of the protocol data. Because of this, the firewall will be unable to know that it should let the server connect to the client over these specific ports.

The solution to this problem is to add a special helper to the connection tracking module which will scan through the data in the control connection for specific syntaxes and information. When it runs into the correct information, it will add that specific information as RELATED and the server will be able to track the connection, thanks to that RELATED entry. Consider the following picture to understand the states when the FTP server has made the connection back to the client.

 

Passive FTP works the opposite way. The FTP client tells the server that it wants some specific data, upon which the server replies with an IP address to connect to and at what port. The client will, upon receipt of this data, connect to that specific port, from its own port 20(the FTP-data port), and get the data in question. If you have an FTP server behind your firewall, you will in other words require this module in addition to your standard iptables modules to let clients on the Internet connect to the FTP server properly. The same goes if you are extremely restrictive to your users, and only want to let them reach HTTP and FTP servers on the Internet and block all other ports. Consider the following image and its bearing on Passive FTP.

 

 

Some conntrack helpers are already available within the kernel itself. More specifically, the FTP and IRC protocols have conntrack helpers as of writing this. If you can not find the conntrack helpers that you need within the kernel itself, you should have a look at the patch-o-matic tree within user-land iptables. The patch-o-matic tree may contain more conntrack helpers, such as for the ntalk or H.323 protocols. If they are not available in the patch-o-matic tree, you have a number of options. Either you can look at the CVS source of iptables, if it has recently gone into that tree, or you can contact the Netfilter-devel mailing list and ask if it is available. If it is not, and there are no plans for adding it, you are left to your own devices and would most probably want to read the Rusty Russell's Unreliable Netfilter Hacking HOW-TO which is linked from the Other resources and links appendix.

Conntrack helpers may either be statically compiled into the kernel, or as modules. If they are compiled as modules, you can load them with the following command

modprobe ip_conntrack_ftp
modprobe ip_conntrack_irc
modprobe ip_conntrack_tftp
modprobe ip_conntrack_amanda

Do note that connection tracking has nothing to do with NAT, and hence you may require more modules if you are NAT'ing connections as well. For example, if you were to want to NAT and track FTP connections, you would need the NAT module as well. All NAT helpers starts with ip_nat_ and follow that naming convention; so for example the FTP NAT helper would be named ip_nat_ftp and the IRC module would be named ip_nat_irc. The conntrack helpers follow the same naming convention, and hence the IRC conntrack helper would be named ip_conntrack_irc, while the FTP conntrack helper would be named ip_conntrack_ftp.


Chapter 8. Saving and restoring large rule-sets

The iptables package comes with two more tools that are very useful, specially if you are dealing with larger rule-sets. These two tools are called iptables-save and iptables-restore and are used to save and restore rule-sets to a specific file-format that looks quite a bit different from the standard shell code that you will see in the rest of this tutorial.

 

iptables-restore can be used together with scripting languages. The big problem is that you will need to output the results into the stdin of iptables-restore. If you are creating a very big ruleset (several thousand rules) this might be a very good idea, since it will be much faster to insert all the new rules. For example, you would then run make_rules.sh | iptables-restore.


8.1. Speed considerations

One of the largest reasons for using the iptables-save and iptables-restore commands is that they will speed up the loading and saving of larger rule-sets considerably. The main problem with running a shell script that contains iptables rules is that each invocation of iptables within the script will first extract the whole rule-set from the Netfilter kernel space, and after this, it will insert or append rules, or do whatever change to the rule-set that is needed by this specific command. Finally, it will insert the new rule-set from its own memory into kernel space. Using a shell script, this is done for each and every rule that we want to insert, and for each time we do this, it takes more time to extract and insert the rule-set.

To solve this problem, there is the iptables-save and restore commands. The iptables-save command is used to save the rule-set into a specially formatted text-file, and the iptables-restore command is used to load this text-file into kernel again. The best parts of these commands is that they will load and save the rule-set in one single request. iptables-save will grab the whole rule-set from kernel and save it to a file in one single movement. iptables-restore will upload that specific rule-set to kernel in a single movement for each table. In other words, instead of dropping the rule-set out of kernel some 30,000 times, for really large rule-sets, and then upload it to kernel again that many times, we can now save the whole thing into a file in one movement and then upload the whole thing in as little as three movements depending on how many tables you use.

As you can understand, these tools are definitely something for you if you are working on a huge set of rules that needs to be inserted. However, they do have drawbacks that we will discuss more in the next section.


8.2. Drawbacks with restore

As you may have already wondered, can iptables-restore handle any kind of scripting? So far, no, it cannot and it will most probably never be able to. This is the main flaw in using iptables-restore since you will not be able to do a huge set of things with these files. For example, what if you have a connection that has a dynamically assigned IP address and you want to grab this dynamic IP every-time the computer boots up and then use that value within your scripts? With iptables-restore, this is more or less impossible.

One possibility to get around this is to make a small script which grabs the values you would like to use in the script, then sed the iptables-restore file for specific keywords and replace them with the values collected via the small script. At this point, you could save it to a temporary file, and then use iptables-restore to load the new values. This causes a lot of problems however, and you will be unable to use iptables-save properly since it would probably erase your manually added keywords in the restore script. It is, in other words, a clumsy solution.

A second possibility is to do as previously described. Make a script that outputs rules in iptables-restore format, and then feed them on standard input of iptables-restore. For very large rulesets this would be to be preferred over running iptables itself, since it has a bad habit of taking a lot of processing power on very large rulesets as previously described in this chapter.

Another solution is to load the iptables-restore scripts first, and then load a specific shell script that inserts more dynamic rules in their proper places. Of course, as you can understand, this is just as clumsy as the first solution. iptables-restore is simply not very well suited for configurations where IP addresses are dynamically assigned to your firewall or where you want different behaviors depending on configuration options and so on.

Another drawback with iptables-restore and iptables-save is that it is not fully functional as of writing this. The problem is simply that not a lot of people use it as of today and hence there are not a lot of people finding bugs, and in turn some matches and targets will simply be inserted badly, which may lead to some strange behaviors that you did not expect. Even though these problems exist, I would highly recommend using these tools which should work extremely well for most rule-sets as long as they do not contain some of the new targets or matches that it does not know how to handle properly.


8.3. iptables-save

The iptables-save command is, as we have already explained, a tool to save the current rule-set into a file that iptables-restore can use. This command is quite simple really, and takes only two arguments. Take a look at the following example to understand the syntax of the command.

 

iptables-save [-c] [-t table]

The -c argument tells iptables-save to keep the values specified in the byte and packet counters. This could for example be useful if we would like to reboot our main firewall, but not lose byte and packet counters which we may use for statistical purposes. Issuing a iptables-save command with the -c argument would then make it possible for us to reboot without breaking our statistical and accounting routines. The default value is, of course, to not keep the counters intact when issuing this command.

The -t argument tells the iptables-save command which tables to save. Without this argument the command will automatically save all tables available into the file. The following is an example on what output you can expect from the iptables-save command if you do not have any rule-set loaded.

# Generated by iptables-save v1.2.6a on Wed Apr 24 10:19:17 2002
*filter
:INPUT ACCEPT [404:19766]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [530:43376]
COMMIT
# Completed on Wed Apr 24 10:19:17 2002
# Generated by iptables-save v1.2.6a on Wed Apr 24 10:19:17 2002
*mangle
:PREROUTING ACCEPT [451:22060]
:INPUT ACCEPT [451:22060]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [594:47151]
:POSTROUTING ACCEPT [594:47151]
COMMIT
# Completed on Wed Apr 24 10:19:17 2002
# Generated by iptables-save v1.2.6a on Wed Apr 24 10:19:17 2002
*nat
:PREROUTING ACCEPT [0:0]
:POSTROUTING ACCEPT [3:450]
:OUTPUT ACCEPT [3:450]
COMMIT
# Completed on Wed Apr 24 10:19:17 2002

This contains a few comments starting with a # sign. Each table is marked like *, for example *mangle. Then within each table we have the chain specifications and rules. A chain specification looks like : [:]. The chain-name may be for example PREROUTING, the policy is described previously and can, for example, be ACCEPT. Finally the packet-counter and byte-counters are the same counters as in the output from iptables -L -v. Finally, each table declaration ends in a COMMIT keyword. The COMMIT keyword tells us that at this point we should commit all rules currently in the pipeline to kernel.

The above example is pretty basic, and hence I believe it is nothing more than proper to show a brief example which contains a very small Iptables-save ruleset. If we would run iptables-save on this, it would look something like this in the output:

# Generated by iptables-save v1.2.6a on Wed Apr 24 10:19:55 2002
*filter
:INPUT DROP [1:229]
:FORWARD DROP [0:0]
:OUTPUT DROP [0:0]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i eth1 -m state --state NEW,RELATED,ESTABLISHED -j ACCEPT
-A OUTPUT -m state --state NEW,RELATED,ESTABLISHED -j ACCEPT
COMMIT
# Completed on Wed Apr 24 10:19:55 2002
# Generated by iptables-save v1.2.6a on Wed Apr 24 10:19:55 2002
*mangle
:PREROUTING ACCEPT [658:32445]
:INPUT ACCEPT [658:32445]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [891:68234]
:POSTROUTING ACCEPT [891:68234]
COMMIT
# Completed on Wed Apr 24 10:19:55 2002
# Generated by iptables-save v1.2.6a on Wed Apr 24 10:19:55 2002
*nat
:PREROUTING ACCEPT [1:229]
:POSTROUTING ACCEPT [3:450]
:OUTPUT ACCEPT [3:450]
-A POSTROUTING -o eth0 -j SNAT --to-source 195.233.192.1
COMMIT
# Completed on Wed Apr 24 10:19:55 2002

As you can see, each command has now been prefixed with the byte and packet counters since we used the -c argument. Except for this, the command-line is quite intact from the script. The only problem now, is how to save the output to a file. Quite simple, and you should already know how to do this if you have used linux at all before. It is only a matter of piping the command output on to the file that you would like to save it as. This could look like the following:

iptables-save -c > /etc/iptables-save

The above command will in other words save the whole rule-set to a file called /etc/iptables-save with byte and packet counters still intact.


8.4. iptables-restore

The iptables-restore command is used to restore the iptables rule-set that was saved with the iptables-save command. It takes all the input from standard input and can't load from files as of writing this, unfortunately. This is the command syntax for iptables-restore:

 

iptables-restore [-c] [-n]

The -c argument restores the byte and packet counters and must be used if you want to restore counters that were previously saved with iptables-save. This argument may also be written in its long form --counters.

The -n argument tells iptables-restore to not overwrite the previously written rules in the table, or tables, that it is writing to. The default behavior of iptables-restore is to flush and destroy all previously inserted rules. The short -n argument may also be replaced with the longer format --noflush.

To load a rule-set with the iptables-restore command, we could do this in several ways, but we will mainly look at the simplest and most common way here.

cat /etc/iptables-save | iptables-restore -c

The following will also work:

iptables-restore -c < /etc/iptables-save

This would cat the rule-set located within the /etc/iptables-save file and then pipe it to iptables-restore which takes the rule-set on the standard input and then restores it, including byte and packet counters. It is that simple to begin with. This command could be varied until oblivion and we could show different piping possibilities, however, this is a bit out of the scope of this chapter, and hence we will skip that part and leave it as an exercise for the reader to experiment with.

The rule-set should now be loaded properly to kernel and everything should work. If not, you may possibly have run into a bug in these commands.


Chapter 9. How a rule is built

This chapter and the upcoming three chapters will discuss at length how to build your own rules. A rule could be described as the directions the firewall will adhere to when blocking or permitting different connections and packets in a specific chain. Each line you write that's inserted in a chain should be considered a rule. We will also discuss the basic matches that are available, and how to use them, as well as the different targets and how we can construct new targets of our own (i.e.,new sub chains).

This chapter will deal with the raw basics of how a rule is created and how you write it and enter it so that it will be accepted by the userspace program iptables, the different tables, as well as the commands that you can issue to iptables. After that we will in the next chapter look at all the matches that are available to iptables, and then get more into detail of each type of target and jump.


9.1. Basics of the iptables command

As we have already explained, each rule is a line that the kernel looks at to find out what to do with a packet. If all the criteria - or matches - are met, we perform the target - or jump - instruction. Normally we would write our rules in a syntax that looks something like this:

 

iptables [-t table] command [match] [target/jump]

There is nothing that says that the target instruction has to be the last function in the line. However, you would usually adhere to this syntax to get the best readability. Anyway, most of the rules you'll see are written in this way. Hence, if you read someone else's script, you'll most likely recognize the syntax and easily understand the rule.

If you want to use a table other than the standard table, you could insert the table specification at the point at which [table] is specified. However, it is not necessary to state explicitly what table to use, since by default iptables uses the filter table on which to implement all commands. Neither do you have to specify the table at just this point in the rule. It could be set pretty much anywhere along the line. However, it is more or less standard to put the table specification at the beginning.

One thing to think about though: The command should always come first, or alternatively directly after the table specification. We use 'command' to tell the program what to do, for example to insert a rule or to add a rule to the end of the chain, or to delete a rule. We shall take a further look at this below.

The match is the part of the rule that we send to the kernel that details the specific character of the packet, what makes it different from all other packets. Here we could specify what IP address the packet comes from, from which network interface, the intended IP address, port, protocol or whatever. There is a heap of different matches that we can use that we will look closer at further on in this chapter.

Finally we have the target of the packet. If all the matches are met for a packet, we tell the kernel what to do with it. We could, for example, tell the kernel to send the packet to another chain that we've created ourselves, and which is part of this particular table. We could tell the kernel to drop the packet dead and do no further processing, or we could tell the kernel to send a specified reply to the sender. As with the rest of the content in this section, we'll look closer at it further on in the chapter.


9.2. Tables

The -t option specifies which table to use. Per default, the filter table is used. We may specify one of the following tables with the -t option. Do note that this is an extremely brief summary of some of the contents of the Traversing of tables and chains chapter.

Table 9-1. Tables

TableExplanation
natThe nat table is used mainly for Network Address Translation. "NAT"ed packets get their IP addresses altered, according to our rules. Packets in a stream only traverse this table once. We assume that the first packet of a stream is allowed. The rest of the packets in the same stream are automatically "NAT"ed or Masqueraded etc, and will be subject to the same actions as the first packet. These will, in other words, not go through this table again, but will nevertheless be treated like the first packet in the stream. This is the main reason why you should not do any filtering in this table, which we will discuss at greater length further on. The PREROUTING chain is used to alter packets as soon as they get in to the firewall. The OUTPUT chain is used for altering locally generated packets (i.e., on the firewall) before they get to the routing decision. Finally we have the POSTROUTING chain which is used to alter packets just as they are about to leave the firewall.
mangleThis table is used mainly for mangling packets. Among other things, we can change the contents of different packets and that of their headers. Examples of this would be to change the TTLTOS or MARK. Note that the MARK is not really a change to the packet, but a mark value for the packet is set in kernel space. Other rules or programs might use this mark further along in the firewall to filter or do advanced routing on; tc is one example. The table consists of five built in chains, the PREROUTING, POSTROUTING, OUTPUT, INPUT and FORWARD chains. PREROUTING is used for altering packets just as they enter the firewall and before they hit the routing decision. POSTROUTING is used to mangle packets just after all routing decisions have been made. OUTPUT is used for altering locally generated packets after they enter the routing decision. INPUT is used to alter packets after they have been routed to the local computer itself, but before the user space application actually sees the data. FORWARD is used to mangle packets after they have hit the first routing decision, but before they actually hit the last routing decision. Note that mangle can't be used for any kind of Network Address Translation or Masquerading, the nat table was made for these kinds of operations.
filterThe filter table should be used exclusively for filtering packets. For example, we could DROPLOGACCEPT or REJECT packets without problems, as we can in the other tables. There are three chains built in to this table. The first one is named FORWARD and is used on all non-locally generated packets that are not destined for our local host (the firewall, in other words). INPUT is used on all packets that are destined for our local host (the firewall) and OUTPUT is finally used for all locally generated packets.

The above details should have explained the basics about the three different tables that are available. They should be used for totally different purposes, and you should know what to use each chain for. If you do not understand their usage, you may well dig a pit for yourself in your firewall, into which you will fall as soon as someone finds it and pushes you into it. We have already discussed the requisite tables and chains in more detail within the Traversing of tables and chains chapter. If you do not understand this fully, I advise you to go back and read through it again.


9.3. Commands

In this section we will cover all the different commands and what can be done with them. The command tells iptables what to do with the rest of the rule that we send to the parser. Normally we would want either to add or delete something in some table or another. The following commands are available to iptables:

Table 9-2. Commands

Command-A--append
Exampleiptables -A INPUT ...
ExplanationThis command appends the rule to the end of the chain. The rule will in other words always be put last in the rule-set and hence be checked last, unless you append more rules later on.
Command-D--delete
Exampleiptables -D INPUT --dport 80 -j DROPiptables -D INPUT 1
ExplanationThis command deletes a rule in a chain. This could be done in two ways; either by entering the whole rule to match (as in the first example), or by specifying the rule number that you want to match. If you use the first method, your entry must match the entry in the chain exactly. If you use the second method, you must match the number of the rule you want to delete. The rules are numbered from the top of each chain, starting with number 1.
Command-R--replace
Exampleiptables -R INPUT 1 -s 192.168.0.1 -j DROP
ExplanationThis command replaces the old entry at the specified line. It works in the same way as the --delete command, but instead of totally deleting the entry, it will replace it with a new entry. The main use for this might be while you're experimenting with iptables.
Command-I--insert
Exampleiptables -I INPUT 1 --dport 80 -j ACCEPT
ExplanationInsert a rule somewhere in a chain. The rule is inserted as the actual number that we specify. In other words, the above example would be inserted as rule 1 in the INPUT chain, and hence from now on it would be the very first rule in the chain.
Command-L--list
Exampleiptables -L INPUT
ExplanationThis command lists all the entries in the specified chain. In the above case, we would list all the entries in the INPUT chain. It's also legal to not specify any chain at all. In the last case, the command would list all the chains in the specified table (To specify a table, see the Tables section). The exact output is affected by other options sent to the parser, for example the -n and -v options, etc.
Command-F--flush
Exampleiptables -F INPUT
ExplanationThis command flushes all rules from the specified chain and is equivalent to deleting each rule one by one, but is quite a bit faster. The command can be used without options, and will then delete all rules in all chains within the specified table.
Command-Z--zero
Exampleiptables -Z INPUT
ExplanationThis command tells the program to zero all counters in a specific chain, or in all chains. If you have used the -v option with the -L command, you have probably seen the packet counter at the beginning of each field. To zero this packet counter, use the -Z option. This option works the same as -L, except that -Z won't list the rules. If -L and -Z is used together (which is legal), the chains will first be listed, and then the packet counters are zeroed.
Command-N--new-chain
Exampleiptables -N allowed
ExplanationThis command tells the kernel to create a new chain of the specified name in the specified table. In the above example we create a chain called allowed. Note that there must not already be a chain or target of the same name.
Command-X--delete-chain
Exampleiptables -X allowed
ExplanationThis command deletes the specified chain from the table. For this command to work, there must be no rules that refer to the chain that is to be deleted. In other words, you would have to replace or delete all rules referring to the chain before actually deleting the chain. If this command is used without any options, all chains but those built in to the specified table will be deleted.
Command-P--policy
Exampleiptables -P INPUT DROP
ExplanationThis command tells the kernel to set a specified default target, or policy, on a chain. All packets that don't match any rule will then be forced to use the policy of the chain. Legal targets are DROP and ACCEPT (There might be more, mail me if so).
Command-E--rename-chain
Exampleiptables -E allowed disallowed
ExplanationThe -E command tells iptables to change the first name of a chain, to the second name. In the example above we would, in other words, change the name of the chain from allowed to disallowed. Note that this will not affect the actual way the table will work. It is, in other words, just a cosmetic change to the table.

You should always enter a complete command line, unless you just want to list the built-in help for iptables or get the version of the command. To get the version, use the -v option and to get the help message, use the -h option. As usual, in other words. Next comes a few options that can be used with various different commands. Note that we tell you with which commands the options can be used and what effect they will have. Also note that we do not include any options here that affect rules or matches. Instead, we'll take a look at matches and targets in a later section of this chapter.

Table 9-3. Options

Option-v--verbose
Commands used with--list--append--insert--delete--replace
ExplanationThis command gives verbose output and is mainly used together with the --list command. If used together with the --list command, it outputs the interface address, rule options and TOS masks. The --list command will also include a bytes and packet counter for each rule, if the --verbose option is set. These counters uses the K (x1000), M (x1,000,000) and G (x1,000,000,000) multipliers. To overrule this and get exact output, you can use the -x option, described later. If this option is used with the --append--insert--delete or --replace commands, the program will output detailed information on how the rule was interpreted and whether it was inserted correctly, etc.
Option-x--exact
Commands used with--list
ExplanationThis option expands the numerics. The output from --list will in other words not contain the K, M or G multipliers. Instead we will get an exact output from the packet and byte counters of how many packets and bytes that have matched the rule in question. Note that this option is only usable in the --list command and isn't really relevant for any of the other commands.
Option-n--numeric
Commands used with--list
ExplanationThis option tells iptables to output numerical values. IP addresses and port numbers will be printed by using their numerical values and not host-names, network names or application names. This option is only applicable to the --list command. This option overrides the default of resolving all numerics to hosts and names, where this is possible.
Option--line-numbers
Commands used with--list
ExplanationThe --line-numbers command, together with the --list command, is used to output line numbers. Using this option, each rule is output with its number. It could be convenient to know which rule has which number when inserting rules. This option only works with the --list command.
Option-c--set-counters
Commands used with--insert--append--replace
ExplanationThis option is used when creating a rule or modifying it in some way. We can then use the option to initialize the packet and byte counters for the rule. The syntax would be something like --set-counters 20 4000, which would tell the kernel to set the packet counter to 20 and byte counter to 4000.
Option--modprobe
Commands used withAll
ExplanationThe --modprobe option is used to tell iptables which module to use when probing for modules or adding them to the kernel. It could be used if your modprobe command is not somewhere in the search path etc. In such cases, it might be necessary to specify this option so the program knows what to do in case a needed module is not loaded. This option can be used with all commands.

Chapter 10. Iptables matches

In this chapter we'll talk a bit more about matches. I've chosen to narrow down the matches into five different subcategories. First of all we have the generic matches, which can be used in all rules. Then we have the TCP matches which can only be applied to TCP packets. We have UDP matches which can only be applied to UDP packets, and ICMP matches which can only be used on ICMP packets. Finally we have special matches, such as the state, owner and limit matches and so on. These final matches have in turn been narrowed down to even more subcategories, even though they might not necessarily be different matches at all. I hope this is a reasonable breakdown and that all people out there can understand it.

As you may already understand if you have read the previous chapters, a match is something that specifies a special condition within the packet that must be true (or false). A single rule can contain several matches of these kinds. For example, we may want to match packets that come from a specific host on a our local area network, and on top of that only from specific ports on that host. We could then use matches to tell the rule to only apply the target - or jump specification - on packets that have a specific source address, that come in on the interface that connects to the LAN and the packets must be one of the specified ports. If any one of these matches fails (e.g., the source address isn't correct, but everything else is true), the whole rule fails and the next rule is tested on the packet. If all matches are true, however, the target specified by the rule is applied.


10.1. Generic matches

This section will deal with Generic matches. A generic match is a kind of match that is always available, whatever kind of protocol we are working on, or whatever match extensions we have loaded. No special parameters at all are needed to use these matches; in other words. I have also included the --protocol match here, even though it is more specific to protocol matches. For example, if we want to use a TCP match, we need to use the --protocol match and send TCP as an option to the match. However, --protocol is also a match in itself, since it can be used to match specific protocols. The following matches are always available.

Table 10-1. Generic matches

Match-p--protocol
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -A INPUT -p tcp
ExplanationThis match is used to check for certain protocols. Examples of protocols are TCP, UDP and ICMP. The protocol must either be one of the internally specified TCP, UDP or ICMP. It may also take a value specified in the /etc/protocols file, and if it can't find the protocol there it will reply with an error. The protocl may also be an integer value. For example, the ICMP protocol is integer value 1, TCP is 6 and UDP is 17. Finally, it may also take the value ALL. ALL means that it matches only TCP, UDP and ICMP. If this match is given the integer value of zero (0), it means ALL protocols, which in turn is the default behavior, if the --protocol match is not used. This match can also be inversed with the ! sign, so --protocol ! tcp would mean to match UDP and ICMP.
Match-s--src--source
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -A INPUT -s 192.168.1.1
ExplanationThis is the source match, which is used to match packets, based on their source IP address. The main form can be used to match single IP addresses, such as 192.168.1.1. It could also be used with a netmask in a CIDR "bit" form, by specifying the number of ones (1's) on the left side of the network mask. This means that we could for example add /24 to use a 255.255.255.0 netmask. We could then match whole IP ranges, such as our local networks or network segments behind the firewall. The line would then look something like 192.168.0.0/24. This would match all packets in the 192.168.0.x range. Another way is to do it with a regular netmask in the 255.255.255.255 form (i.e., 192.168.0.0/255.255.255.0). We could also invert the match with an ! just as before. If we were, in other words, to use a match in the form of --source ! 192.168.0.0/24, we would match all packets with a source address not coming from within the 192.168.0.x range. The default is to match all IP addresses.
Match-d--dst--destination
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -A INPUT -d 192.168.1.1
ExplanationThe --destination match is used for packets based on their destination address or addresses. It works pretty much the same as the --source match and has the same syntax, except that the match is based on where the packets are going to. To match an IP range, we can add a netmask either in the exact netmask form, or in the number of ones (1's) counted from the left side of the netmask bits. Examples are: 192.168.0.0/255.255.255.0 and 192.168.0.0/24. Both of these are equivalent. We could also invert the whole match with an ! sign, just as before. --destination ! 192.168.0.1 would in other words match all packets except those destined to the 192.168.0.1 IP address.
Match-i--in-interface
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -A INPUT -i eth0
ExplanationThis match is used for the interface the packet came in on. Note that this option is only legal in the INPUT, FORWARD and PREROUTING chains and will return an error message when used anywhere else. The default behavior of this match, if no particular interface is specified, is to assume a string value of +. The + value is used to match a string of letters and numbers. A single + would, in other words, tell the kernel to match all packets without considering which interface it came in on. The + string can also be appended to the type of interface, so eth+ would be all Ethernet devices. We can also invert the meaning of this option with the help of the ! sign. The line would then have a syntax looking something like -i ! eth0, which would match all incoming interfaces, except eth0.
Match-o--out-interface
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -A FORWARD -o eth0
ExplanationThe --out-interface match is used for packets on the interface from which they are leaving. Note that this match is only available in the OUTPUT, FORWARD and POSTROUTING chains, the opposite in fact of the --in-interface match. Other than this, it works pretty much the same as the --in-interface match. The + extension is understood as matching all devices of similar type, so eth+ would match all eth devices and so on. To invert the meaning of the match, you can use the ! sign in exactly the same way as for the --in-interface match. If no --out-interface is specified, the default behavior for this match is to match all devices, regardless of where the packet is going.
Match-f--fragment
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -A INPUT -f
ExplanationThis match is used to match the second and third part of a fragmented packet. The reason for this is that in the case of fragmented packets, there is no way to tell the source or destination ports of the fragments, nor ICMP types, among other things. Also, fragmented packets might in rather special cases be used to compound attacks against other computers. Packet fragments like this will not be matched by other rules, and hence this match was created. This option can also be used in conjunction with the ! sign; however, in this case the ! sign must precede the match, i.e. ! -f. When this match is inverted, we match all header fragments and/or unfragmented packets. What this means, is that we match all the first fragments of fragmented packets, and not the second, third, and so on. We also match all packets that have not been fragmented during transfer. Note also that there are really good defragmentation options within the kernel that you can use instead. As a secondary note, if you use connection tracking you will not see any fragmented packets, since they are dealt with before hitting any chain or table in iptables.

10.2. Implicit matches

This section will describe the matches that are loaded implicitly. Implicit matches are implied, taken for granted, automatic. For example when we match on --protocol tcp without any further criteria. There are currently three types of implicit matches for three different protocols. These are TCP matchesUDP matches and ICMP matches. The TCP based matches contain a set of unique criteria that are available only for TCP packets. UDP based matches contain another set of criteria that are available only for UDP packets. And the same thing for ICMP packets. On the other hand, there can be explicit matches that are loaded explicitly. Explicit matches are not implied or automatic, you have to specify them specifically. For these you use the -m or --match option, which we will discuss in the next section.


10.2.1. TCP matches

These matches are protocol specific and are only available when working with TCP packets and streams. To use these matches, you need to specify --protocol tcp on the command line before trying to use them. Note that the --protocol tcp match must be to the left of the protocol specific matches. These matches are loaded implicitly in a sense, just as the UDP and ICMP matches are loaded implicitly. The other matches will be looked over in the continuation of this section, after the TCP match section.

Table 10-2. TCP matches

Match--sport--source-port
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -A INPUT -p tcp --sport 22
ExplanationThe --source-port match is used to match packets based on their source port. Without it, we imply all source ports. This match can either take a service name or a port number. If you specify a service name, the service name must be in the /etc/services file, since iptables uses this file in which to find. If you specify the port by its number, the rule will load slightly faster, since iptables don't have to check up the service name. However, the match might be a little bit harder to read than if you use the service name. If you are writing a rule-set consisting of a 200 rules or more, you should definitely use port numbers, since the difference is really noticeable. (On a slow box, this could make as much as 10 seconds' difference, if you have configured a large rule-set containing 1000 rules or so). You can also use the --source-port match to match any range of ports, --source-port 22:80 for example. This example would match all source ports between 22 and 80. If you omit specifying the first port, port 0 is assumed (is implicit). --source-port :80 would then match port 0 through 80. And if the last port specification is omitted, port 65535 is assumed. If you were to write --source-port 22:, you would have specified a match for all ports from port 22 through port 65535. If you invert the port range, iptables automatically reverses your inversion. If you write --source-port 80:22, it is simply interpreted as --source-port 22:80. You can also invert a match by adding a ! sign. For example, --source-port ! 22 means that you want to match all ports but port 22. The inversion could also be used together with a port range and would then look like --source-port ! 22:80, which in turn would mean that you want to match all ports but ports 22 through 80. Note that this match does not handle multiple separated ports and port ranges. For more information about those, look at the multiport match extension.
Match--dport--destination-port
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -A INPUT -p tcp --dport 22
ExplanationThis match is used to match TCP packets, according to their destination port. It uses exactly the same syntax as the --source-port match. It understands port and port range specifications, as well as inversions. It also reverses high and low ports in port range specifications, as above. The match will also assume values of 0 and 65535 if the high or low port is left out in a port range specification. In other words, exactly the same as the --source-port syntax. Note that this match does not handle multiple separated ports and port ranges. For more information about those, look at the multiport match extension.
Match--tcp-flags
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -p tcp --tcp-flags SYN,FIN,ACK SYN
ExplanationThis match is used to match on the TCP flags in a packet. First of all, the match takes a list of flags to compare (a mask) and secondly it takes list of flags that should be set to 1, or turned on. Both lists should be comma-delimited. The match knows about the SYN, ACK, FIN, RST, URG, PSH flags, and it also recognizes the words ALL and NONE. ALL and NONE is pretty much self describing: ALL means to use all flags and NONE means to use no flags for the option. --tcp-flags ALL NONE would in other words mean to check all of the TCP flags and match if none of the flags are set. This option can also be inverted with the ! sign. For example, if we specify ! SYN,FIN,ACK SYN, we would get a match that would match packets that had the ACK and FIN bits set, but not the SYN bit. Also note that the comma delimitation should not include spaces. You can see the correct syntax in the example above.
Match--syn
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -p tcp --syn
ExplanationThe --syn match is more or less an old relic from the ipchains days and is still there for backward compatibility and for and to make transition one to the other easier. It is used to match packets if they have the SYN bit set and the ACK and RST bits unset. This command would in other words be exactly the same as the --tcp-flags SYN,RST,ACK SYN match. Such packets are mainly used to request new TCP connections from a server. If you block these packets, you should have effectively blocked all incoming connection attempts. However, you will not have blocked the outgoing connections, which a lot of exploits today use (for example, hacking a legitimate service and then installing a program or suchlike that enables initiating an existing connection to your host, instead of opening up a new port on it). This match can also be inverted with the ! sign in this, ! --syn, way. This would match all packets with the RST or the ACK bits set, in other words packets in an already established connection.
Match--tcp-option
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -p tcp --tcp-option 16
ExplanationThis match is used to match packets depending on their TCP options. A TCP Option is a specific part of the header. This part consists of 3 different fields. The first one is 8 bits long and tells us which Options are used in this stream, the second one is also 8 bits long and tells us how long the options field is. The reason for this length field is that TCP options are, well, optional. To be compliant with the standards, we do not need to implement all options, but instead we can just look at what kind of option it is, and if we do not support it, we just look at the length field and can then jump over this data. This match is used to match different TCP options depending on their decimal values. It may also be inverted with the ! flag, so that the match matches all TCP options but the option given to the match. For a complete list of all options, take a closer look at the Internet Engineering Task Force who maintains a list of all the standard numbers used on the Internet.

10.2.2. UDP matches

This section describes matches that will only work together with UDP packets. These matches are implicitly loaded when you specify the --protocol UDP match and will be available after this specification. Note that UDP packets are not connection oriented, and hence there is no such thing as different flags to set in the packet to give data on what the datagram is supposed to do, such as open or closing a connection, or if they are just simply supposed to send data. UDP packets do not require any kind of acknowledgment either. If they are lost, they are simply lost (Not taking ICMP error messaging etc into account). This means that there are quite a lot less matches to work with on a UDP packet than there is on TCP packets. Note that the state machine will work on all kinds of packets even though UDP or ICMP packets are counted as connectionless protocols. The state machine works pretty much the same on UDP packets as on TCP packets.

Table 10-3. UDP matches

Match--sport--source-port
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -A INPUT -p udp --sport 53
ExplanationThis match works exactly the same as its TCP counterpart. It is used to perform matches on packets based on their source UDP ports. It has support for port ranges, single ports and port inversions with the same syntax. To specify a UDP port range, you could use 22:80 which would match UDP ports 22 through 80. If the first value is omitted, port 0 is assumed. If the last port is omitted, port 65535 is assumed. If the high port comes before the low port, the ports switch place with each other automatically. Single UDP port matches look as in the example above. To invert the port match, add a ! sign, --source-port ! 53. This would match all ports but port 53. The match can understand service names, as long as they are available in the /etc/services file. Note that this match does not handle multiple separated ports and port ranges. For more information about this, look at the multiport match extension.
Match--dport--destination-port
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -A INPUT -p udp --dport 53
ExplanationThe same goes for this match as for --source-port above. It is exactly the same as for the equivalent TCP match, but here it applies to UDP packets. It matches packets based on their UDP destination port. The match handles port ranges, single ports and inversions. To match a single port you use, for example, --destination-port 53, to invert this you would use --destination-port ! 53. The first would match all UDP packets going to port 53 while the second would match packets but those going to the destination port 53. To specify a port range, you would, for example, use --destination-port 9:19. This example would match all packets destined for UDP port 9 through 19. If the first port is omitted, port 0 is assumed. If the second port is omitted, port 65535 is assumed. If the high port is placed before the low port, they automatically switch place, so the low port winds up before the high port. Note that this match does not handle multiple ports and port ranges. For more information about this, look at the multiport match extension.

10.2.3. ICMP matches

These are the ICMP matches. These packets are even more ephemeral, that is to say short lived, than UDP packets, in the sense that they are connectionless. The ICMP protocol is mainly used for error reporting and for connection controlling and suchlike. ICMP is not a protocol subordinated to the IP protocol, but more of a protocol that augments the IP protocol and helps in handling errors. The headers of ICMP packets are very similar to those of the IP headers, but differ in a number of ways. The main feature of this protocol is the type header, that tells us what the packet is for. One example is, if we try to access an unaccessible IP address, we would normally get an ICMP host unreachable in return. For a complete listing of ICMP types, see the ICMP types appendix. There is only one ICMP specific match available for ICMP packets, and hopefully this should suffice. This match is implicitly loaded when we use the --protocol ICMP match and we get access to it automatically. Note that all the generic matches can also be used, so that among other things we can match on the source and destination addresses.

Table 10-4. ICMP matches

Match--icmp-type
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -A INPUT -p icmp --icmp-type 8
Explanation

This match is used to specify the ICMP type to match. ICMP types can be specified either by their numeric values or by their names. Numerical values are specified in RFC 792. To find a complete listing of the ICMP name values, do an iptables --protocol icmp --help, or check the ICMP types appendix. This match can also be inverted with the ! sign in this, --icmp-type ! 8, fashion. Note that some ICMP types are obsolete, and others again may be "dangerous" for an unprotected host since they may, among other things, redirect packets to the wrong places. The type and code may also be specified by their typename, numeric type, and type/code as well. For example --icmp-type network-redirect--icmp-type 8 or --icmp-type 8/0. For a complete listing of the names, type iptables -p icmp --help.

 

Please note that netfilter uses ICMP type 255 to match all ICMP types. If you try to match this ICMP type, you will wind up with matching all ICMP types.


10.3. Explicit matches

Explicit matches are those that have to be specifically loaded with the -m or --match option. State matches, for example, demand the directive -m state prior to entering the actual match that you want to use. Some of these matches may be protocol specific . Some may be unconnected with any specific protocol - for example connection states. These might be NEW (the first packet of an as yet unestablished connection), ESTABLISHED (a connection that is already registered in the kernel), RELATED (a new connection that was created by an older, established one) etc. A few may just have been evolved for testing or experimental purposes, or just to illustrate what iptables is capable of. This in turn means that not all of these matches may at first sight be of any use. Nevertheless, it may well be that you personally will find a use for specific explicit matches. And there are new ones coming along all the time, with each new iptables release. Whether you find a use for them or not depends on your imagination and your needs. The difference between implicitly loaded matches and explicitly loaded ones, is that the implicitly loaded matches will automatically be loaded when, for example, you match on the properties of TCP packets, while explicitly loaded matches will never be loaded automatically - it is up to you to discover and activate explicit matches.


10.3.1. AH/ESP match

These matches are used for the IPSEC AH and ESP protocols. IPSEC is used to create secure tunnels over an insecure Internet connection. The AH and ESP protocols are used by IPSEC to create these secure connections. The AH and ESP matches are really two separate matches, but are both described here since they look very much alike, and both are used in the same function.

I will not go into detail to describe IPSEC here, instead look at the following pages and documents for more information:

There is also a ton more documentation on the Internet on this, but you are free to look it up as needed.

To use the AH/ESP matches, you need to use -m ah to load the AH matches, and -m esp to load the ESP matches.

 

In 2.2 and 2.4 kernels, Linux used something called FreeS/WAN for the IPSEC implementation, but as of Linux kernel 2.5.47 and up, Linux kernels have a direct implementation of IPSEC that requires no patching of the kernel. This is a total rewrite of the IPSEC implementation on Linux.

Table 10-5. AH match options

Match--ahspi
Kernel2.5 and 2.6
Exampleiptables -A INPUT -p 51 -m ah --ahspi 500
ExplanationThis matches the AH Security Parameter Index (SPI) number of the AH packets. Please note that you must specify the protocol as well, since AH runs on a different protocol than the standard TCP, UDP or ICMP protocols. The SPI number is used in conjunction with the source and destination address and the secret keys to create a security association (SA). The SA uniquely identifies each and every one of the IPSEC tunnels to all hosts. The SPI is used to uniquely distinguish each IPSEC tunnel connected between the same two peers. Using the --ahspi match, we can match a packet based on the SPI of the packets. This match can match a whole range of SPI values by using a : sign, such as 500:520, which will match the whole range of SPI's.

Table 10-6. ESP match options

Match--espspi
Kernel2.5 and 2.6
Exampleiptables -A INPUT -p 50 -m esp --espspi 500
ExplanationThe ESP counterpart Security Parameter Index (SPI) is used exactly the same way as the AH variant. The match looks exactly the same, with the esp/ah difference. Of course, this match can match a whole range of SPI numbers as well as the AH variant of the SPI match, such as --espspi 200:250 which matches the whole range of SPI's.

10.3.2. Conntrack match

The conntrack match is an extended version of the state match, which makes it possible to match packets in a much more granular way. It let's you look at information directly available in the connection tracking system, without any "frontend" systems, such as in the state match. For more information about the connection tracking system, take a look at the The state machine chapter.

There are a number of different matches put together in the conntrack match, for several different fields in the connection tracking system. These are compiled together into the list below. To load these matches, you need to specify -m conntrack.

Table 10-7. Conntrack match options

Match--ctstate
Kernel2.5 and 2.6
Exampleiptables -A INPUT -p tcp -m conntrack --ctstate RELATED
Explanation

This match is used to match the state of a packet, according to the conntrack state. It is used to match pretty much the same states as in the original state match. The valid entries for this match are:

  • INVALID

  • ESTABLISED

  • NEW

  • RELATED

  • SNAT

  • DNAT

The entries can be used together with each other separated by a comma. For example, -m conntrack --ctstate ESTABLISHED,RELATED. It can also be inverted by putting a ! in front of --ctstate. For example: -m conntrack ! --ctstate ESTABLISHED,RELATED, which matches all but the ESTABLISHED and RELATED states.

Match--ctproto
Kernel2.5 and 2.6
Exampleiptables -A INPUT -p tcp -m conntrack --ctproto TCP
Explanation

This matches the protocol, the same as the --protocol does. It can take the same types of values, and is inverted using the ! sign. For example, -m conntrack ! --ctproto TCP matches all protocols but the TCP protocol.

Match--ctorigsrc
Kernel2.5 and 2.6
Exampleiptables -A INPUT -p tcp -m conntrack --ctorigsrc 192.168.0.0/24
Explanation

--ctorigsrc matches based on the original source IP specification of the conntrack entry that the packet is related to. The match can be inverted by using a ! between the --ctorigsrc and IP specification, such as --ctorigsrc ! 192.168.0.1. It can also take a netmask of the CIDR form, such as --ctorigsrc 192.168.0.0/24.

Match--ctorigdst
Kernel2.5 and 2.6
Exampleiptables -A INPUT -p tcp -m conntrack --ctorigdst 192.168.0.0/24
Explanation

This match is used exactly as the --ctorigsrc, except that it matches on the destination field of the conntrack entry. It has the same syntax in all other respects.

Match--ctreplsrc
Kernel2.5 and 2.6
Exampleiptables -A INPUT -p tcp -m conntrack --ctreplsrc 192.168.0.0/24
Explanation

The --ctreplsrc match is used to match based on the original conntrack reply source of the packet. Basically, this is the same as the --ctorigsrc, but instead we match the reply source expected of the upcoming packets. This target can, of course, be inverted and address a whole range of addresses, just the same as the the previous targets in this class.

Match--ctrepldst
Kernel2.5 and 2.6
Exampleiptables -A INPUT -p tcp -m conntrack --ctrepldst 192.168.0.0/24
Explanation

The --ctrepldst match is the same as the --ctreplsrc match, with the exception that it matches the reply destination of the conntrack entry that matched the packet. It too can be inverted, and accept ranges, just as the --ctreplsrc match.

Match--ctstatus
Kernel2.5 and 2.6
Exampleiptables -A INPUT -p tcp -m conntrack --ctstatus RELATED
Explanation

This matches the status of the connection, as described in the The state machine chapter. It can match the following statuses.

  • NONE - The connection has no status at all.

  • EXPECTED - This connection is expected and was added by one of the expectation handlers.

  • SEEN_REPLY - This connection has seen a reply but isn't assured yet.

  • ASSURED - The connection is assured and will not be removed until it times out or the connection is closed by either end.

This can also be inverted by using the ! sign. For example -m conntrack ! --ctstatus ASSURED which will match all but the ASSURED status.

Match--ctexpire
Kernel2.5 and 2.6
Exampleiptables -A INPUT -p tcp -m conntrack --ctexpire 100:150
Explanation

This match is used to match on packets based on how long is left on the expiration timer of the conntrack entry, measured in seconds. It can either take a single value and match against, or a range such as in the example above. It can also be inverted by using the ! sign, such as this -m conntrack ! --ctexpire 100. This will match every expiration time, which does not have exactly 100 seconds left to it.


10.3.3. DSCP match

This match is used to match on packets based on their DSCP (Differentiated Services Code Point) field. This is documented in the RFC 2638 - A Two-bit Differentiated Services Architecture for the Internet RFC. The match is explicitly loaded by specifying -m dscp. The match can take two mutually exclusive options, described below.

Table 10-8. DSCP match options

Match--dscp
Kernel2.5 and 2.6
Exampleiptables -A INPUT -p tcp -m dscp --dscp 32
Explanation

This option takes a DSCP value in either decimal or in hex. If the option value is in decimal, it would be written like 32 or 16, et cetera. If written in hex, it should be prefixed with 0x, like this: 0x20. It can also be inverted by using the ! character, like this: -m dscp ! --dscp 32.

Match--dscp-class
Kernel2.5 and 2.6
Exampleiptables -A INPUT -p tcp -m dscp --dscp-class BE
Explanation

The --dscp-class match is used to match on the DiffServ class of a packet. The values can be any of the BE, EF, AFxx or CSx classes as specified in the various RFC's. This match can be inverted just the same way as the --dscp option.

Note

Please note that the --dscp and --dscp-class options are mutually exclusive and can not be used in conjunction with each other.


10.3.4. ECN match

The ECN match is used to match on the different ECN fields in the TCP and IPv4 headers. ECN is described in detail in the RFC 3168 - The Addition of Explicit Congestion Notification (ECN) to IP RFC. The match is explicitly loaded by using -m ecn in the command line. The ECN match takes three different options as described below.

Table 10-9. ECN match options

Match--ecn
Kernel2.4, 2.5 and 2.6
Exampleiptables -A INPUT -p tcp -m ecn --ecn-tcp-cwr
Explanation

This match is used to match the CWR (Congestion Window Received) bit, if it has been set. The CWR flag is set to notify the other endpoint of the connection that they have received an ECE, and that they have reacted to it. Per default this matches if the CWR bit is set, but the match may also be inversed using an exclamation point.

Match--ecn-tcp-ece
Kernel2.4, 2.5 and 2.6
Exampleiptables -A INPUT -p tcp -m ecn --ecn-tcp-ece
Explanation

This match can be used to match the ECE (ECN-Echo) bit. The ECE is set once one of the endpoints has received a packet with the CE bit set by a router. The endpoint then sets the ECE in the returning ACK packet, to notify the other endpoint that it needs to slow down. The other endpoint then sends a CWR packet as described in the --ecn-tcp-cwr explanation. This matches per default if the ECE bit is set, but may be inversed by using an exclamation point.

Match--ecn-ip-ect
Kernel2.4, 2.5 and 2.6
Exampleiptables -A INPUT -p tcp -m ecn --ecn-ip-ect 1
Explanation

The --ecn-ip-ect match is used to match the ECT (ECN Capable Transport) codepoints. The ECT codepoints has several types of usage. Mainly, they are used to negotiate if the connection is ECN capable by setting one of the two bits to 1. The ECT is also used by routers to indicate that they are experiencing congestion, by setting both ECT codepoints to 1. The ECT values are all available in the in the ECN Field in IP table below.

The match can be inversed using an exclamation point, for example ! --ecn-ip-ect 2 which will match all ECN values but the ECT(0) codepoint. The valid value range is 0-3 in iptables. See the above table for their values.

Table 10-10. ECN Field in IP

Iptables valueECTCE[Obsolete] RFC 2481 names for the ECN bits.
000Not-ECT, ie. non-ECN capable connection.
101ECT(1), New naming convention of ECT codepoints in RFC 3168.
210ECT(0), New naming convention of ECT codepoints in RFC 3168.
311CE (Congestion Experienced), Used to notify endpoints of congestion

10.3.5. Helper match

This is a rather unorthodox match in comparison to the other matches, in the sense that it uses a little bit specific syntax. The match is used to match packets, based on which conntrack helper that the packet is related to. For example, let's look at the FTP session. The Control session is opened up, and the ports/connection is negotiated for the Data session within the Control session. The ip_conntrack_ftp helper module will find this information, and create a related entry in the conntrack table. Now, when a packet enters, we can see which protocol it was related to, and we can match the packet in our ruleset based on which helper was used.

Table 10-11. Helper match options

Match--helper
Kernel2.4, 2.5 and 2.6
Exampleiptables -A INPUT -p tcp -m helper --helper ftp-21
Explanation

The --helper option is used to specify a string value, telling the match which conntrack helper to match. In the basic form, it may look like --helper irc. This is where the syntax starts to change from the normal syntax. We can also choose to only match packets based on which port that the original expectation was caught on. For example, the FTP Control session is normally transferred over port 21, but it may as well be port 954 or any other port. We may then specify upon which port the expectation should be caught on, like --helper ftp-954.


10.3.6. IP range match

The IP range match is used to match IP ranges, just as the --source and --destination matches are able to do as well. However, this match adds a different kind of matching in the sense that it is able to match in the manner of from IP - to IP, which the --source and --destination matches are unable to. This may be needed in some specific network setups, and it is rather a bit more flexible.

Table 10-12. IP range match options

Match--src-range
Kernel2.4, 2.5 and 2.6
Exampleiptables -A INPUT -p tcp -m iprange --src-range 192.168.1.13-192.168.2.19
Explanation

This matches a range of source IP addresses. The range includes every single IP address from the first to the last, so the example above includes everything from 192.168.1.13 to 192.168.2.19. The match may also be inverted by adding an !. The above example would then look like -m iprange ! --src-range 192.168.1.13-192.168.2.19, which would match every single IP address, except the ones specified.

Match--dst-range
Kernel2.4, 2.5 and 2.6
Exampleiptables -A INPUT -p tcp -m iprange --dst-range 192.168.1.13-192.168.2.19
Explanation

The --dst-range works exactly the same as the --src-range match, except that it matches destination IP's instead of source IP's.


10.3.7. Length match

The length match is used to match packets based on their length. It is very simple. If you want to limit packet length for some strange reason, or want to block ping-of-death-like behaviour, use the length match.

Table 10-13. Length match options

Match--length
Kernel2.4, 2.5 and 2.6
Exampleiptables -A INPUT -p tcp -m length --length 1400:1500
Explanation

The example --length will match all packets with a length between 1400 and 1500 bytes. The match may also be inversed using the ! sign, like this: -m length ! --length 1400:1500 . It may also be used to match only a specific length, removing the : sign and onwards, like this: -m length --length 1400. The range matching is, of course, inclusive, which means that it includes all packet lengths in between the values you specify.


10.3.8. Limit match

The limit match extension must be loaded explicitly with the -m limit option. This match can, for example, be used to advantage to give limited logging of specific rules etc. For example, you could use this to match all packets that do not exceed a given value, and after this value has been exceeded, limit logging of the event in question. Think of a time limit: You could limit how many times a certain rule may be matched in a certain time frame, for example to lessen the effects of DoS syn flood attacks. This is its main usage, but there are more usages, of course. The limit match may also be inverted by adding a ! flag in front of the limit match. It would then be expressed as -m limit ! --limit 5/s.This means that all packets will be matched after they have broken the limit.

To further explain the limit match, it is basically a token bucket filter. Consider having a leaky bucket where the bucket leaks X packets per time-unit. X is defined depending on how many matching packets we get, so if we get 3 packets, the bucket leaks 3 packets per that time-unit. The --limit option tells us how many packets to refill the bucket with per time-unit, while the --limit-burst option tells us how big the bucket is in the first place. So, setting --limit 3/minute --limit-burst 5, and then receiving 5 matches will empty the bucket. After 20 seconds, the bucket is refilled with another token, and so on until the --limit-burst is reached again or until they get used.

Consider the example below for further explanation of how this may look.

  1. We set a rule with -m limit --limit 5/second --limit-burst 10/second. The limit-burst token bucket is set to 10 initially. Each packet that matches the rule uses a token.

  2. We get packet that matches, 1-2-3-4-5-6-7-8-9-10, all within a 1/1000 of a second.

  3. The token bucket is now empty. Once the token bucket is empty, the packets that qualify for the rule otherwise no longer match the rule and proceed to the next rule if any, or hit the chain policy.

  4. For each 1/5 s without a matching packet, the token count goes up by 1, upto a maximum of 10. 1 second after receiving the 10 packets, we will once again have 5 tokens left.

  5. And of course, the bucket will be emptied by 1 token for each packet it receives.

Table 10-14. Limit match options

Match--limit
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -A INPUT -m limit --limit 3/hour
ExplanationThis sets the maximum average match rate for the limit match. You specify it with a number and an optional time unit. The following time units are currently recognized: /second /minute /hour /day. The default value here is 3 per hour, or 3/hour. This tells the limit match how many times to allow the match to occur per time unit (e.g. per minute).
Match--limit-burst
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -A INPUT -m limit --limit-burst 5
ExplanationThis is the setting for the burst limit of the limit match. It tells iptables the maximum number of packets to match within the given time unit. This number gets decremented by one for every time unit (specified by the --limit option) in which the event does not occur, back down to the lowest possible value, 1. If the event is repeated, the counter is again incremented, until the count reaches the burst limit. And so on. The default --limit-burst value is 5. For a simple way of checking out how this works, you can use the example Limit-match.txt one-rule-script. Using this script, you can see for yourself how the limit rule works, by simply sending ping packets at different intervals and in different burst numbers. All echo replies will be blocked until the threshold for the burst limit has again been reached.

10.3.9. MAC match

The MAC (Ethernet Media Access Control) match can be used to match packets based on their MAC source address. As of writing this documentation, this match is a little bit limited, however, in the future this may be more evolved and may be more useful. This match can be used to match packets on the source MAC address only as previously said.

 

Do note that to use this module we explicitly load it with the -m mac option. The reason that I am saying this is that a lot of people wonder if it should not be -m mac-source, which it should not.

Table 10-15. MAC match options

Match--mac-source
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -A INPUT -m mac --mac-source 00:00:00:00:00:01
ExplanationThis match is used to match packets based on their MAC source address. The MAC address specified must be in the form XX:XX:XX:XX:XX:XX, else it will not be legal. The match may be reversed with an ! sign and would look like --mac-source ! 00:00:00:00:00:01. This would in other words reverse the meaning of the match, so that all packets except packets from this MAC address would be matched. Note that since MAC addresses are only used on Ethernet type networks, this match will only be possible to use for Ethernet interfaces. The MAC match is only valid in the PREROUTING, FORWARD and INPUT chains and nowhere else.

10.3.10. Mark match

The mark match extension is used to match packets based on the marks they have set. A mark is a special field, only maintained within the kernel, that is associated with the packets as they travel through the computer. Marks may be used by different kernel routines for such tasks as traffic shaping and filtering. As of today, there is only one way of setting a mark in Linux, namely the MARK target in iptables. This was previously done with the FWMARK target in ipchains, and this is why people still refer to FWMARK in advanced routing areas. The mark field is currently set to an unsigned integer, or 4294967296 possible values on a 32 bit system. In other words, you are probably not going to run into this limit for quite some time.

Table 10-16. Mark match options

Match--mark
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -t mangle -A INPUT -m mark --mark 1
ExplanationThis match is used to match packets that have previously been marked. Marks can be set with the MARK target which we will discuss in the next section. All packets traveling through Netfilter get a special mark field associated with them. Note that this mark field is not in any way propagated, within or outside the packet. It stays inside the computer that made it. If the mark field matches the mark, it is a match. The mark field is an unsigned integer, hence there can be a maximum of 4294967296 different marks. You may also use a mask with the mark. The mark specification would then look like, for example, --mark 1/1. If a mask is specified, it is logically AND ed with the mark specified before the actual comparison.

10.3.11. Multiport match

The multiport match extension can be used to specify multiple destination ports and port ranges. Without the possibility this match gives, you would have to use multiple rules of the same type, just to match different ports.

 

You can not use both standard port matching and multiport matching at the same time, for example you can't write: --sport 1024:63353 -m multiport --dport 21,23,80. This will simply not work. What in fact happens, if you do, is that iptables honors the first element in the rule, and ignores the multiport instruction.

Table 10-17. Multiport match options

Match--source-port
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -A INPUT -p tcp -m multiport --source-port 22,53,80,110
ExplanationThis match matches multiple source ports. A maximum of 15 separate ports may be specified. The ports must be comma delimited, as in the above example. The match may only be used in conjunction with the -p tcp or -p udp matches. It is mainly an enhanced version of the normal --source-port match.
Match--destination-port
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -A INPUT -p tcp -m multiport --destination-port 22,53,80,110
ExplanationThis match is used to match multiple destination ports. It works exactly the same way as the above mentioned source port match, except that it matches destination ports. It too has a limit of 15 ports and may only be used in conjunction with -p tcp and -p udp.
Match--port
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -A INPUT -p tcp -m multiport --port 22,53,80,110
ExplanationThis match extension can be used to match packets based both on their destination port and their source port. It works the same way as the --source-port and --destination-port matches above. It can take a maximum of 15 ports and can only be used in conjunction with -p tcp and -p udp. Note that the --port match will only match packets coming in from and going to the same port, for example, port 80 to port 80, port 110 to port 110 and so on.

10.3.12. Owner match

The owner match extension is used to match packets based on the identity of the process that created them. The owner can be specified as the process ID either of the user who issued the command in question, that of the group, the process, the session, or that of the command itself. This extension was originally written as an example of what iptables could be used for. The owner match only works within the OUTPUT chain, for obvious reasons: It is pretty much impossible to find out any information about the identity of the instance that sent a packet from the other end, or where there is an intermediate hop to the real destination. Even within the OUTPUT chain it is not very reliable, since certain packets may not have an owner. Notorious packets of that sort are (among other things) the different ICMP responses. ICMP responses will never match.

Table 10-18. Owner match options

Match--uid-owner
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -A OUTPUT -m owner --uid-owner 500
ExplanationThis packet match will match if the packet was created by the given User ID (UID). This could be used to match outgoing packets based on who created them. One possible use would be to block any other user than root from opening new connections outside your firewall. Another possible use could be to block everyone but the http user from sending packets from the HTTP port.
Match--gid-owner
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -A OUTPUT -m owner --gid-owner 0
ExplanationThis match is used to match all packets based on their Group ID (GID). This means that we match all packets based on what group the user creating the packets is in. This could be used to block all but the users in the network group from getting out onto the Internet or, as described above, only to allow members of the http group to create packets going out from the HTTP port.
Match--pid-owner
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -A OUTPUT -m owner --pid-owner 78
ExplanationThis match is used to match packets based on the Process ID (PID) that was responsible for them. This match is a bit harder to use, but one example would be only to allow PID 94 to send packets from the HTTP port (if the HTTP process is not threaded, of course). Alternatively we could write a small script that grabs the PID from a ps output for a specific daemon and then adds a rule for it. For an example, you could have a rule as shown in the Pid-owner.txt example.
Match--sid-owner
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -A OUTPUT -m owner --sid-owner 100
ExplanationThis match is used to match packets based on the Session ID used by the program in question. The value of the SID, or Session ID of a process, is that of the process itself and all processes resulting from the originating process. These latter could be threads, or a child of the original process. So, for example, all of our HTTPD processes should have the same SID as their parent process (the originating HTTPD process), if our HTTPD is threaded (most HTTPDs are, Apache and Roxen for instance). To show this in example, we have created a small script called Sid-owner.txt. This script could possibly be run every hour or so together with some extra code to check if the HTTPD is actually running and start it again if necessary, then flush and re-enter our OUTPUT chain if needed.

10.3.13. Packet type match

The packet type match is used to match packets based on their type. I.e., are they destined to a specific person, to everyone or to a specific group of machines or users. These three groups are generally called unicast, broadcast and multicast, as discussed in the TCP/IP repetition chapter. The match is loaded by using -m pkttype.

Table 10-19. Packet type match options

Match--pkt-type
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -A OUTPUT -m pkttype --pkt-type unicast
Explanation

The --pkt-type match is used to tell the packet type match which packet type to match. It can either take unicast , broadcast or multicast as an argument, as in the example. It can also be inverted by using a ! like this: -m pkttype --pkt-type ! broadcast, which will match all other packet types.


10.3.14. Recent match

The recent match is a rather large and complex matching system, which allows us to match packets based on recent events that we have previously matched. For example, if we would see an outgoing IRC connection, we could set the IP addresses into a list of hosts, and have another rule that allows identd requests back from the IRC server within 15 seconds of seeing the original packet.

Before we can take a closer look at this match, let's try and explain a little bit how it works. First of all, we use several different rules to accomplish the use of the recent match. The recent match uses several different lists of recent events. The default list being used is the DEFAULT list. We create a new entry in a list with the set option, so once a rule is entirely matched (the set option is always a match), we also add an entry in the recent list specified. The list entry contains a timestamp, and the source IP address used in the packet that triggered the set option. Once this has happened, we can use a series of different recent options to match on this information, as well as update the entries timestamp, et cetera.

Finally, if we would for some reason want to remove a list entry, we would do this using the remove match from the recent module. All rules using the recent match, must load the recent module (-m recent) as usual. Before we go on with an example of the recent match, let's take a look at all the options.

Table 10-20. Recent match options

Match--name
Kernel2.4, 2.5 and 2.6
Exampleiptables -A OUTPUT -m recent --name examplelist
Explanation

The name option gives the name of the list to use. Per default the DEFAULT list is used, which is probably not what we want if we are using more than one list.

Match--set
Kernel2.4, 2.5 and 2.6
Exampleiptables -A OUTPUT -m recent --set
Explanation

This creates a new list entry in the named recent list, which contains a timestamp and the source IP address of the host that triggered the rule. This match will always return success, unless it is preceded by a ! sign, in which case it will return failure.

Match--rcheck
Kernel2.4, 2.5 and 2.6
Exampleiptables -A OUTPUT -m recent --name examplelist --rcheck
Explanation

The --rcheck option will check if the source IP address of the packet is in the named list. If it is, the match will return true, otherwise it returns false. The option may be inverted by using the ! sign. In the later case, it will return true if the source IP address is not in the list, and false if it is in the list.

Match--update
Kernel2.4, 2.5 and 2.6
Exampleiptables -A OUTPUT -m recent --name examplelist --update
Explanation

This match is true if the source combination is available in the specified list and it also updates the last-seen time in the list. This match may also be reversed by setting the ! mark in front of the match. For example, ! --update.

Match--remove
Kernel2.4, 2.5 and 2.6
Exampleiptables -A INPUT -m recent --name example --remove
Explanation

This match will try to find the source address of the packet in the list, and returns true if the packet is there. It will also remove the corresponding list entry from the list. The command is also possible to inverse with the ! sign.

Match--seconds
Kernel2.4, 2.5 and 2.6
Exampleiptables -A INPUT -m recent --name example --check --seconds 60
Explanation

This match is only valid together with the --check and --update matches. The --seconds match is used to specify how long since the "last seen" column was updated in the recent list. If the last seen column was older than this amount in seconds, the match returns false. Other than this the recent match works as normal, so the source address must still be in the list for a true return of the match.

Match--hitcount
Kernel2.4, 2.5 and 2.6
Exampleiptables -A INPUT -m recent --name example --check --hitcount 20
Explanation

The --hitcount match must be used together with the --check or --update matches and it will limit the match to only include packets that have seen at least the hitcount amount of packets. If this match is used together with the --seconds match, it will require the specified hitcount packets to be seen in the specific timeframe. This match may also be reversed by adding a ! sign in front of the match. Together with the --seconds match, this means that a maximum of this amount of packets may have been seen during the specified timeframe. If both of the matches are inversed, then a maximum of this amount of packets may have been seen during the last minumum of seconds.

Match--rttl
Kernel2.4, 2.5 and 2.6
Exampleiptables -A INPUT -m recent --name example --check --rttl
Explanation

The --rttl match is used to verify that the TTL value of the current packet is the same as the original packet that was used to set the original entry in the recent list. This can be used to verify that people are not spoofing their source address to deny others access to your servers by making use of the recent match.

Match--rsource
Kernel2.4, 2.5 and 2.6
Exampleiptables -A INPUT -m recent --name example --rsource
Explanation

The --rsource match is used to tell the recent match to save the source address and port in the recent list. This is the default behavior of the recent match.

Match--rdest
Kernel2.4, 2.5 and 2.6
Exampleiptables -A INPUT -m recent --name example --rdest
Explanation

The --rdest match is the opposite of the --rsource match in that it tells the recent match to save the destination address and port to the recent list.

I have created a small sample script of how the recent match can be used, which you can find in the Recent-match.txt section.

Briefly, this is a poor replacement for the state engine available in netfilter. This version was created with a http server in mind, but will work with any TCP connection. First we have created two chains named http-recent and http-recent-final. The http-recent chain is used in the starting stages of the connection, and for the actual data transmission, while the http-recent-final chain is used for the last and final FIN, FIN/ACK handshake.

 

This is a very bad replacement for the built in state engine and can not handle all of the possibilities that the state engine can handle. However, it is a good example of what can be done with the recent match without being too specific. Do not use this example in a real world environment. It is slow, handles special cases badly, and should generally never be used more than as an example.

For example, it does not handle closed ports on connection, asyncronuous FIN handshake (where one of the connected parties closes down, while the other continues to send data), etc.

Let's follow a packet through the example ruleset. First a packet enters the INPUT chain, and we send it to the http-recent chain.

  1. The first packet should be a SYN packet, and should not have the ACK,FIN or RST bits set. Hence it is matched using the --tcp-flags SYN,ACK,FIN,RST SYN line. At this point we add the connection to the httplist using -m recent --name httplist --set line. Finally we accept the packet.

  2. After the first packet we should receive a SYN/ACK packet to acknowledge that the SYN packet was received. This can be matched using the --tcp-flags SYN,ACK,FIN,RST SYN,ACK line. FIN and RST should be illegal at this point as well. At this point we update the entry in the httplist using -m recent --name httplist --update and finally we ACCEPT the packet.

  3. By now we should get a final ACK packet, from the original creater of the connection, to acknowledge the SYN/ACK sent by the server. SYN, FIN and RST are illegal at this point of the connection, so the line should look like --tcp-flags SYN,ACK,FIN,RST ACK. We update the list in exactly the same way as in the previous step, and ACCEPT it.

  4. At this point the data tranmission can start. The connection should never contain any SYN packet now, but it will contain ACK packets to acknowledge the data packets that are sent. Each time we see any packet like this, we update the list and ACCEPT the packets.

  5. The transmission can be ended in two ways, the simplest is the RST packet. RST will simply reset the connection and it will die. With FIN, the other endpoint answers with a FIN,ACK, and this closes down the connection so that the original source of the FIN can no longer send any data. The receiver of the FIN, will still be able to send data, hence we send the connection to a "final" stage chain to handle the rest.

  6. In the http-recent-final chain we check if the packet is still in the httplist, and if so, we send it to the http-recent-final1 chain. In that chain we remove the connection from the httplist and add it to the http-recent-final list instead. If the connection has already been removed and moved over to the http-recent-final list, we send te packet to the http-recent-final2 chain.

  7. In the final http-recent-final2 chain, we wait for the non-closed side to finish sending its data, and to close the connection from their side as well. Once this is done, the connection is completely removed.

As you can see the recent list can become quite complex, but it will give you a huge set of possibilities if need be. Still, try and remember not to reinvent the wheel. If the ability you need is already implemented, try and use it instead of trying to create your own solution.


10.3.15. State match

The state match extension is used in conjunction with the connection tracking code in the kernel. The state match accesses the connection tracking state of the packets from the conntracking machine. This allows us to know in what state the connection is, and works for pretty much all protocols, including stateless protocols such as ICMP and UDP. In all cases, there will be a default timeout for the connection and it will then be dropped from the connection tracking database. This match needs to be loaded explicitly by adding a -m state statement to the rule. You will then have access to one new match called state. The concept of state matching is covered more fully in the The state machine chapter, since it is such a large topic.

Table 10-21. State matches

Match--state
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -A INPUT -m state --state RELATED,ESTABLISHED
Explanation

This match option tells the state match what states the packets must be in to be matched. There are currently 4 states that can be used. INVALIDESTABLISHEDNEW and RELATEDINVALID means that the packet is associated with no known stream or connection and that it may contain faulty data or headers. ESTABLISHED means that the packet is part of an already established connection that has seen packets in both directions and is fully valid. NEW means that the packet has or will start a new connection, or that it is associated with a connection that has not seen packets in both directions. Finally, RELATED means that the packet is starting a new connection and is associated with an already established connection. This could for example mean an FTP data transfer, or an ICMP error associated with a TCP or UDP connection. Note that the NEW state does not look for SYN bits in TCP packets trying to start a new connection and should, hence, not be used unmodified in cases where we have only one firewall and no load balancing between different firewalls. However, there may be times where this could be useful. For more information on how this could be used, read the The state machine chapter.


10.3.16. TCPMSS match

The tcpmss match is used to match a packet based on the Maximum Segment Size in TCP. This match is only valid for SYN and SYN/ACK packets. For a more complete explanation of the MSS value, see the TCP options appendix, the RFC 793 - Transmission Control Protocol and the RFC 1122 - Requirements for Internet Hosts - Communication Layers documents. This match is loaded using -m tcpmss and takes only one option.

Table 10-22. TCPMSS match options

Match--mss
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -A INPUT -p tcp --tcp-flags SYN,ACK,RST SYN -m tcpmss --mss 2000:2500
Explanation

The --mss option tells the tcpmss match which Maximum Segment Sizes to match. This can either be a single specific MSS value, or a range of MSS values separated by a :. The value may also be inverted as usual using the ! sign, as in the following example:

-m tcpmss ! --mss 2000:2500

. This example will match all MSS values, except for values in the range 2000 through 2500.


10.3.17. TOS match

The TOS match can be used to match packets based on their TOS field. TOS stands for Type Of Service, consists of 8 bits, and is located in the IP header. This match is loaded explicitly by adding -m tos to the rule. TOS is normally used to inform intermediate hosts of the precedence of the stream and its content (it doesn't really, but it informs of any specific requirements for the stream, such as it having to be sent as fast as possible, or it needing to be able to send as much payload as possible). How different routers and administrators deal with these values depends. Most do not care at all, while others try their best to do something good with the packets in question and the data they provide.

Table 10-23. TOS matches

Match--tos
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -A INPUT -p tcp -m tos --tos 0x16
ExplanationThis match is used as described above. It can match packets based on their TOS field and their value. This could be used, among other things together with the iproute2 and advanced routing functions in Linux, to mark packets for later usage. The match takes a hex or numeric value as an option, or possibly one of the names resulting from 'iptables -m tos -h'. At the time of writing it contained the following named values: Minimize-Delay 16 (0x10), Maximize-Throughput 8 (0x08), Maximize-Reliability 4 (0x04), Minimize-Cost 2 (0x02), and Normal-Service 0 (0x00). Minimize-Delay means to minimize the delay in putting the packets through - example of standard services that would require this include telnet, SSH and FTP-control. Maximize-Throughput means to find a path that allows as big a throughput as possible - a standard protocol would be FTP-data. Maximize-Reliability means to maximize the reliability of the connection and to use lines that are as reliable as possible - a couple of typical examples are BOOTP and TFTP. Minimize-Cost means minimizing the cost of packets getting through each link to the client or server; for example finding the route that costs the least to travel along. Examples of normal protocols that would use this would be RTSP (Real Time Stream Control Protocol) and other streaming video/radio protocols. Finally, Normal-Service would mean any normal protocol that has no special needs.

10.3.18. TTL match

The TTL match is used to match packets based on their TTL (Time To Live) field residing in the IP headers. The TTL field contains 8 bits of data and is decremented once every time it is processed by an intermediate host between the client and recipient host. If the TTL reaches 0, an ICMP type 11 code 0 (TTL equals 0 during transit) or code 1 (TTL equals 0 during reassembly) is transmitted to the party sending the packet and informing it of the problem. This match is only used to match packets based on their TTL, and not to change anything. The latter, incidentally, applies to all kinds of matches. To load this match, you need to add an -m ttl to the rule.

Table 10-24. TTL matches

Match--ttl
Kernel2.3, 2.4, 2.5 and 2.6
Exampleiptables -A OUTPUT -m ttl --ttl 60
ExplanationThis match option is used to specify the TTL value to match. It takes a numeric value and matches this value within the packet. There is no inversion and there are no other specifics to match. It could, for example, be used for debugging your local network - e.g. LAN hosts that seem to have problems connecting to hosts on the Internet - or to find possible ingress by Trojans etc. The usage is relatively limited, however; its usefulness really depends on your imagination. One example would be to find hosts with bad default TTL values (could be due to a badly implemented TCP/IP stack, or simply to misconfiguration).

10.3.19. Unclean match

The unclean match takes no options and requires no more than explicitly loading it when you want to use it. Note that this option is regarded as experimental and may not work at all times, nor will it take care of all unclean packages or problems. The unclean match tries to match packets that seem malformed or unusual, such as packets with bad headers or checksums and so on. This could be used to DROP connections and to check for bad streams, for example; however you should be aware that it could possibly break legal connections.


Chapter 11. Iptables targets and jumps

The target/jumps tells the rule what to do with a packet that is a perfect match with the match section of the rule. There are a couple of basic targets, the ACCEPT and DROP targets, which we will deal with first. However, before we do that, let us have a brief look at how a jump is done.

The jump specification is done in exactly the same way as in the target definition, except that it requires a chain within the same table to jump to. To jump to a specific chain, it is of course a prerequisite that that chain exists. As we have already explained, a user-defined chain is created with the -N command. For example, let's say we create a chain in the filter table called tcp_packets, like this:

iptables -N tcp_packets

We could then add a jump target to it like this:

iptables -A INPUT -p tcp -j tcp_packets

We would then jump from the INPUT chain to the tcp_packets chain and start traversing that chain. When/If we reach the end of that chain, we get dropped back to the INPUT chain and the packet starts traversing from the rule one step below where it jumped to the other chain (tcp_packets in this case). If a packet is ACCEPTed within one of the sub chains, it will be ACCEPT'ed in the superset chain also and it will not traverse any of the superset chains any further. However, do note that the packet will traverse all other chains in the other tables in a normal fashion. For more information on table and chain traversing, see the Traversing of tables and chains chapter.

Targets on the other hand specify an action to take on the packet in question. We could for example, DROP or ACCEPT the packet depending on what we want to do. There are also a number of other actions we may want to take, which we will describe further on in this section. Jumping to targets may incur different results, as it were. Some targets will cause the packet to stop traversing that specific chain and superior chains as described above. Good examples of such rules are DROP and ACCEPT. Rules that are stopped, will not pass through any of the rules further on in the chain or in superior chains. Other targets, may take an action on the packet, after which the packet will continue passing through the rest of the rules. A good example of this would be the LOGULOG and TOS targets. These targets can log the packets, mangle them and then pass them on to the other rules in the same set of chains. We might, for example, want this so that we in addition can mangle both the TTL and the TOS values of a specific packet/stream. Some targets will accept extra options (What TOS value to use etc), while others don't necessarily need any options - but we can include them if we want to (log prefixes, masquerade-to ports and so on). We will try to cover all of these points as we go through the target descriptions. Let us have a look at what kinds of targets there are.


11.1. ACCEPT target

This target needs no further options. As soon as the match specification for a packet has been fully satisfied, and we specify ACCEPT as the target, the rule is accepted and will not continue traversing the current chain or any other ones in the same table. Note however, that a packet that was accepted in one chain might still travel through chains within other tables, and could still be dropped there. There is nothing special about this target whatsoever, and it does not require, nor have the possibility of, adding options to the target. To use this target, we simply specify -j ACCEPT.

 

Works under Linux kernel 2.3, 2.4, 2.5 and 2.6.


11.2. CLASSIFY target

The CLASSIFY target can be used to classify packets in such a way that can be used by a couple of different qdiscs (Queue Disciplines). For example, atm, cbq, dsmark, pfifo_fast, htb and the prio qdiscs. For more information about qdiscs and traffic controlling, visit the Linux Advanced Routing and Traffic Control HOW-TO webpage.

The CLASSIFY target is only valid in the POSTROUTING chain of the mangle table.

Table 11-1. CLASSIFY target options

Option--set-class
Exampleiptables -t mangle -A POSTROUTING -p tcp --dport 80 -j CLASSIFY --set-class 20:10
ExplanationThe CLASSIFY target only takes one argument, the --set-class. This tells the target how to class the packet. The class takes 2 values separated by a coma sign, like this MAJOR:MINOR. Once again, if you want more information on this, check the Linux Advanced Routing and Traffic Control HOW-TO webpage.
 

Works under Linux kernel 2.5 and 2.6.


11.3. DNAT target

The DNAT target is used to do Destination Network Address Translation, which means that it is used to rewrite the Destination IP address of a packet. If a packet is matched, and this is the target of the rule, the packet, and all subsequent packets in the same stream will be translated, and then routed on to the correct device, host or network. This target can be extremely useful, for example,when you have a host running your web server inside a LAN, but no real IP to give it that will work on the Internet. You could then tell the firewall to forward all packets going to its own HTTP port, on to the real web server within the LAN. We may also specify a whole range of destination IP addresses, and the DNAT mechanism will choose the destination IP address at random for each stream. Hence, we will be able to deal with a kind of load balancing by doing this.

Note that the DNAT target is only available within the PREROUTING and OUTPUT chains in the nat table, and any of the chains called upon from any of those listed chains. Note that chains containing DNAT targets may not be used from any other chains, such as the POSTROUTING chain.

Table 11-2. DNAT target

Option--to-destination
Exampleiptables -t nat -A PREROUTING -p tcp -d 15.45.23.67 --dport 80 -j DNAT --to-destination 192.168.1.1-192.168.1.10
ExplanationThe --to-destination option tells the DNAT mechanism which Destination IP to set in the IP header, and where to send packets that are matched. The above example would send on all packets destined for IP address 15.45.23.67 to a range of LAN IP's, namely 192.168.1.1 through 10. Note, as described previously, that a single stream will always use the same host, and that each stream will randomly be given an IP address that it will always be Destined for, within that stream. We could also have specified only one IP address, in which case we would always be connected to the same host. Also note that we may add a port or port range to which the traffic would be redirected to. This is done by adding, for example, an :80 statement to the IP addresses to which we want to DNAT the packets. A rule could then look like --to-destination 192.168.1.1:80 for example, or like --to-destination 192.168.1.1:80-100 if we wanted to specify a port range. As you can see, the syntax is pretty much the same for the DNAT target, as for the SNAT target even though they do two totally different things. Do note that port specifications are only valid for rules that specify the TCP or UDP protocols with the --protocol option.

Since DNAT requires quite a lot of work to work properly, I have decided to add a larger explanation on how to work with it. Let's take a brief example on how things would be done normally. We want to publish our website via our Internet connection. We only have one IP address, and the HTTP server is located on our internal network. Our firewall has the external IP address $INET_IP, and our HTTP server has the internal IP address $HTTP_IP and finally the firewall has the internal IP address $LAN_IP. The first thing to do is to add the following simple rule to the PREROUTING chain in the nat table:

iptables -t nat -A PREROUTING --dst $INET_IP -p tcp --dport 80 -j DNAT \
--to-destination $HTTP_IP

Now, all packets from the Internet going to port 80 on our firewall are redirected (or DNAT'ed) to our internal HTTP server. If you test this from the Internet, everything should work just perfect. So, what happens if you try connecting from a host on the same local network as the HTTP server? It will simply not work. This is a problem with routing really. We start out by dissecting what happens in a normal case. The external box has IP address $EXT_BOX, to maintain readability.

  1. Packet leaves the connecting host going to $INET_IP and source $EXT_BOX.

  2. Packet reaches the firewall.

  3. Firewall DNAT's the packet and runs the packet through all different chains etcetera.

  4. Packet leaves the firewall and travels to the $HTTP_IP.

  5. Packet reaches the HTTP server, and the HTTP box replies back through the firewall, if that is the box that the routing database has entered as the gateway for $EXT_BOX. Normally, this would be the default gateway of the HTTP server.

  6. Firewall Un-DNAT's the packet again, so the packet looks as if it was replied to from the firewall itself.

  7. Reply packet travels as usual back to the client $EXT_BOX.

Now, we will consider what happens if the packet was instead generated by a client on the same network as the HTTP server itself. The client has the IP address $LAN_BOX, while the rest of the machines maintain the same settings.

  1. Packet leaves $LAN_BOX to $INET_IP.

  2. The packet reaches the firewall.

  3. The packet gets DNAT'ed, and all other required actions are taken, however, the packet is not SNAT'ed, so the same source IP address is used on the packet.

  4. The packet leaves the firewall and reaches the HTTP server.

  5. The HTTP server tries to respond to the packet, and sees in the routing databases that the packet came from a local box on the same network, and hence tries to send the packet directly to the original source IP address (which now becomes the destination IP address).

  6. The packet reaches the client, and the client gets confused since the return packet does not come from the host that it sent the original request to. Hence, the client drops the reply packet, and waits for the "real" reply.

The simple solution to this problem is to SNAT all packets entering the firewall and leaving for a host or IP that we know we do DNAT to. For example, consider the above rule. We SNAT the packets entering our firewall that are destined for $HTTP_IP port 80 so that they look as if they came from $LAN_IP. This will force the HTTP server to send the packets back to our firewall, which Un-DNAT's the packets and sends them on to the client. The rule would look something like this:

iptables -t nat -A POSTROUTING -p tcp --dst $HTTP_IP --dport 80 -j SNAT \
--to-source $LAN_IP

Remember that the POSTROUTING chain is processed last of the chains, and hence the packet will already be DNAT'ed once it reaches that specific chain. This is the reason that we match the packets based on the internal address.

 

This last rule will seriously harm your logging, so it is really advisable not to use this method, but the whole example is still a valid one. What will happen is this, packet comes from the Internet, gets SNAT'ed and DNAT'ed, and finally hits the HTTP server (for example). The HTTP server now only sees the request as if it was coming from the firewall, and hence logs all requests from the internet as if they came from the firewall.

This can also have even more severe implications. Take an SMTP server on the LAN, that allows requests from the internal network, and you have your firewall set up to forward SMTP traffic to it. You have now effectively created an open relay SMTP server, with horrenduously bad logging!

One solution to this problem is to simply make the SNAT rule even more specific in the match part, and to only work on packets that come in from our LAN interface. In other words, add a --src $LAN_IP_RANGE to the whole command as well. This will make the rule only work on streams that come in from the LAN, and hence will not affect the Source IP, so the logs will look correct, except for streams coming from our LAN.

You will, in other words, be better off solving these problems by either setting up a separate DNS server for your LAN, or to actually set up a separate DMZ, the latter being preferred if you have the money.

You think this should be enough by now, and it really is, unless considering one final aspect to this whole scenario. What if the firewall itself tries to access the HTTP server, where will it go? As it looks now, it will unfortunately try to get to its own HTTP server, and not the server residing on $HTTP_IP. To get around this, we need to add a DNAT rule in the OUTPUT chain as well. Following the above example, this should look something like the following:

iptables -t nat -A OUTPUT --dst $INET_IP -p tcp --dport 80 -j DNAT \
--to-destination $HTTP_IP

Adding this final rule should get everything up and running. All separate networks that do not sit on the same net as the HTTP server will run smoothly, all hosts on the same network as the HTTP server will be able to connect and finally, the firewall will be able to do proper connections as well. Now everything works and no problems should arise.

 

Everyone should realize that these rules only affect how the packet is DNAT'ed and SNAT'ed properly. In addition to these rules, you may also need extra rules in the filter table (FORWARD chain) to allow the packets to traverse through those chains as well. Don't forget that all packets have already gone through the PREROUTING chain, and should hence have their destination addresses rewritten already by DNAT.

 

Works under Linux kernel 2.3, 2.4, 2.5 and 2.6.


11.4. DROP target

The DROP target does just what it says, it drops packets dead and will not carry out any further processing. A packet that matches a rule perfectly and is then Dropped will be blocked. Note that this action might in certain cases have an unwanted effect, since it could leave dead sockets around on either host. A better solution in cases where this is likely would be to use the REJECT target, especially when you want to block port scanners from getting too much information, such as on filtered ports and so on. Also note that if a packet has the DROP action taken on it in a subchain, the packet will not be processed in any of the main chains either in the present or in any other table. The packet is in other words totally dead. As we've seen previously, the target will not send any kind of information in either direction, nor to intermediaries such as routers.

 

Works under Linux kernel 2.3, 2.4, 2.5 and 2.6.


11.5. DSCP target

This is a target that changes the DSCP (Differentiated Services Field) marks inside a packet. The DSCP target is able to set any DSCP value inside a TCP packet, which is a way of telling routers the priority of the packet in question. For more information about DSCP, look at the RFC 2474 - Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers RFC document.

Basically, DSCP is a way of differentiating different services into separate categories, and based on this, give them different priority through the routers. This way, you can give interactive TCP sessions (such as telnet, SSH, POP3) a very high fast connection, that may not be very suitable for large bulk transfers. If on the other hand the connection is one of low importance (SMTP, or whatever you classify as low priority), you could send it over a large bulky network with worse latency than the other network, that is cheaper to utilize than the faster and lower latency connections.

Table 11-3. DSCP target options

Option--set-dscp
Exampleiptables -t mangle -A FORWARD -p tcp --dport 80 -j DSCP --set-dscp 1
ExplanationThis sets the DSCP value to the specified value. The values can be set either via class, see below, or with the --set-dscp, which takes either an integer value, or a hex value.
Option--set-dscp-class
Exampleiptables -t mangle -A FORWARD -p tcp --dport 80 -j DSCP --set-dscp-class EF
ExplanationThis sets the DSCP field according to a predefined DiffServ class. Some of the possible values are EF, BE and the CSxx and AFxx values available. You can find more information at Implementing Quality of Service Policies with DSCP site. Do note that the --set-dscp-class and --set-dscp commands are mutually exclusive, which means you can not use both of them in the same command!
 

Works under Linux kernel 2.3, 2.4, 2.5 and 2.6.


11.6. ECN target

This target can be great, used in the correct way. Simply put, the ECN target can be used to reset the ECN bits from the IPv4 header, or to put it correctly, reset them to 0 at least. Since ECN is a relatively new thing on the net, there are problems with it. For example, it uses 2 bits that are defined in the original RFC for the TCP protocol to be 0. Some routers and other internet appliances will not forward packets that have these bits set to 1. If you want to make use of at least parts of the ECN functionality from your hosts, you could for example reset the ECN bits to 0 for specific networks that you know you are having troubles reaching because of ECN.

 

Please do note that it isn't possible to turn ECN on in the middle of a stream. It isn't allowed according to the RFC's, and it isn't possible anyways. Both endpoints of the stream must negotiate ECN. If we turn it on, then one of the hosts is not aware of it, and can't respond properly to the ECN notifications.

Table 11-4. ECN target options

Option--ecn-tcp-remove
Exampleiptables -t mangle -A FORWARD -p tcp --dport 80 -j ECN --ecn-tcp-remove
ExplanationThe ECN target only takes one argument, the --ecn-tcp-remove argument. This tells the target to remove the ECN bits inside the TCP headers. Read above for more information.
 

Works under Linux kernel 2.5 and 2.6.


11.7. LOG target options

The LOG target is specially designed for logging detailed information about packets. These could, for example, be considered as illegal. Or, logging can be used purely for bug hunting and error finding. The LOG target will return specific information on packets, such as most of the IP headers and other information considered interesting. It does this via the kernel logging facility, normally syslogd. This information may then be read directly with dmesg, or from the syslogd logs, or with other programs or applications. This is an excellent target to use to debug your rule-sets, so that you can see what packets go where and what rules are applied on what packets. Note as well that it could be a really great idea to use the LOG target instead of the DROP target while you are testing a rule you are not 100% sure about on a production firewall, since a syntax error in the rule-sets could otherwise cause severe connectivity problems for your users. Also note that the ULOG target may be interesting if you are using really extensive logging, since the ULOG target has support for direct logging to MySQL databases and suchlike.

 

Note that if you get undesired logging direct to consoles, this is not an iptables or Netfilter problem, but rather a problem caused by your syslogd configuration - most probably /etc/syslog.conf. Read more in man syslog.conf for information about this kind of problem.

You may also need to tweak your dmesg settings. dmesg is the command that changes which errors from the kernel that should be shown on the console. dmesg -n 1 should prevent all messages from showing up on the console, except panic messages. The dmesg message levels matches exactly the syslogd levels, and it only works on log messages from the kernel facility. For more information, see man dmesg.

The LOG target currently takes five options that could be of interest if you have specific information needs, or want to set different options to specific values. They are all listed below.

Table 11-5. LOG target options

Option--log-level
Exampleiptables -A FORWARD -p tcp -j LOG --log-level debug
ExplanationThis is the option to tell iptables and syslog which log level to use. For a complete list of log levels read the syslog.conf manual. Normally there are the following log levels, or priorities as they are normally referred to: debug, info, notice, warning, warn, err, error, crit, alert, emerg and panic. The keyword error is the same as err, warn is the same as warning and panic is the same as emerg. Note that all three of these are deprecated, in other words do not use error, warn and panic. The priority defines the severity of the message being logged. All messages are logged through the kernel facility. In other words, setting kern.=info /var/log/iptables in your syslog.conf file and then letting all your LOG messages in iptables use log level info, would make all messages appear in the /var/log/iptables file. Note that there may be other messages here as well from other parts of the kernel that uses the info priority. For more information on logging I recommend you to read the syslog and syslog.conf man-pages as well as other HOWTOs etc.
Option--log-prefix
Exampleiptables -A INPUT -p tcp -j LOG --log-prefix "INPUT packets"
ExplanationThis option tells iptables to prefix all log messages with a specific prefix, which can then easily be combined with grep or other tools to track specific problems and output from different rules. The prefix may be up to 29 letters long, including white-spaces and other special symbols.
Option--log-tcp-sequence
Exampleiptables -A INPUT -p tcp -j LOG --log-tcp-sequence
ExplanationThis option will log the TCP Sequence numbers, together with the log message. The TCP Sequence numbers are special numbers that identify each packet and where it fits into a TCP sequence, as well as how the stream should be reassembled. Note that this option constitutes a security risk if the logs are readable by unauthorized users, or by the world for that matter. As does any log that contains output from iptables.
Option--log-tcp-options
Exampleiptables -A FORWARD -p tcp -j LOG --log-tcp-options
ExplanationThe --log-tcp-options option logs the different options from the TCP packet headers and can be valuable when trying to debug what could go wrong, or what has actually gone wrong. This option does not take any variable fields or anything like that, just as most of the LOG options don't.
Option--log-ip-options
Exampleiptables -A FORWARD -p tcp -j LOG --log-ip-options
ExplanationThe --log-ip-options option will log most of the IP packet header options. This works exactly the same as the --log-tcp-options option, but instead works on the IP options. These logging messages may be valuable when trying to debug or track specific culprits, as well as for debugging - in just the same way as the previous option.
 

Works under Linux kernel 2.3, 2.4, 2.5 and 2.6.


11.8. MARK target

The MARK target is used to set Netfilter mark values that are associated with specific packets. This target is only valid in the mangle table, and will not work outside there. The MARK values may be used in conjunction with the advanced routing capabilities in Linux to send different packets through different routes and to tell them to use different queue disciplines (qdisc), etc. For more information on advanced routing, check out the Linux Advanced Routing and Traffic Control HOW-TO. Note that the mark value is not set within the actual package, but is a value that is associated within the kernel with the packet. In other words, you can not set a MARK for a packet and then expect the MARK still to be there on another host. If this is what you want, you will be better off with the TOS target which will mangle the TOS value in the IP header.

Table 11-6. MARK target options

Option--set-mark
Exampleiptables -t mangle -A PREROUTING -p tcp --dport 22 -j MARK --set-mark 2
ExplanationThe --set-mark option is required to set a mark. The --set-mark match takes an integer value. For example, we may set mark 2 on a specific stream of packets, or on all packets from a specific host and then do advanced routing on that host, to decrease or increase the network bandwidth, etc.
 

Works under Linux kernel 2.3, 2.4, 2.5 and 2.6.


11.9. MASQUERADE target

The MASQUERADE target is used basically the same as the SNAT target, but it does not require any --to-source option. The reason for this is that the MASQUERADE target was made to work with, for example, dial-up connections, or DHCP connections, which gets dynamic IP addresses when connecting to the network in question. This means that you should only use the MASQUERADE target with dynamically assigned IP connections, which we don't know the actual address of at all times. If you have a static IP connection, you should instead use the SNAT target.

When you masquerade a connection, it means that we set the IP address used on a specific network interface instead of the --to-source option, and the IP address is automatically grabbed from the information about the specific interface. The MASQUERADE target also has the effect that connections are forgotten when an interface goes down, which is extremely good if we, for example, kill a specific interface. If we would have used the SNAT target, we may have been left with a lot of old connection tracking data, which would be lying around for days, swallowing up useful connection tracking memory. This is, in general, the correct behavior when dealing with dial-up lines that are probably assigned a different IP every time they are brought up. In case we are assigned a different IP, the connection is lost anyways, and it is more or less idiotic to keep the entry around.

It is still possible to use the MASQUERADE target instead of SNAT even though you do have a static IP, however, it is not favorable since it will add extra overhead, and there may be inconsistencies in the future which will thwart your existing scripts and render them "unusable".

Note that the MASQUERADE target is only valid within the POSTROUTING chain in the nat table, just as the SNAT target. The MASQUERADE target takes one option specified below, which is optional.

Table 11-7. MASQUERADE target

Option--to-ports
Exampleiptables -t nat -A POSTROUTING -p TCP -j MASQUERADE --to-ports 1024-31000
ExplanationThe --to-ports option is used to set the source port or ports to use on outgoing packets. Either you can specify a single port like --to-ports 1025 or you may specify a port range as --to-ports 1024-3000. In other words, the lower port range delimiter and the upper port range delimiter separated with a hyphen. This alters the default SNAT port-selection as described in the SNAT target section. The --to-ports option is only valid if the rule match section specifies the TCP or UDP protocols with the --protocol match.
 

Works under Linux kernel 2.3, 2.4, 2.5 and 2.6.


11.10. MIRROR target

 

Be warned, the MIRROR is dangerous and was only developed as an example code of the new conntrack and NAT code. It can cause dangerous things to happen, and very serious DDoS/DoS will be possible if used improperly. Avoif using it at all costs! It was removed from 2.5 and 2.6 kernels due to it's bad security implications!

The MIRROR target is an experimental and demonstration target only, and you are warned against using it, since it may result in really bad loops hence, among other things, resulting in serious Denial of Service. The MIRROR target is used to invert the source and destination fields in the IP header, and then to retransmit the packet. This can cause some really funny effects, and I'll bet that, thanks to this target, not just one red faced cracker has cracked his own box by now. The effect of using this target is stark, to say the least. Let's say we set up a MIRROR target for port 80 at computer A. If host B were to come from yahoo.com, and try to access the HTTP server at host A, the MIRROR target would return the yahoo host's own web page (since this is where the request came from).

Note that the MIRROR target is only valid within the INPUT, FORWARD and PREROUTING chains, and any user-defined chains which are called from those chains. Also note that outgoing packets resulting from the MIRROR target are not seen by any of the normal chains in the filter, nat or mangle tables, which could give rise to loops and other problems. This could make the target the cause of unforeseen headaches. For example, a host might send a spoofed packet to another host that uses the MIRROR command with a TTL of 255, at the same time spoofing its own packet, so as to seem as if it comes from a third host that uses the MIRROR command. The packet will then bounce back and forth incessantly, for the number of hops there are to be completed. If there is only 1 hop, the packet will jump back and forth 240-255 times. Not bad for a cracker, in other words, to send 1500 bytes of data and eat up 380 kbyte of your connection. Note that this is a best case scenario for the cracker or script kiddie, whatever we want to call them.

 

Works under Linux kernel 2.3 and 2.4. It was removed from 2.5 and 2.6 kernels due to it's inherent insecurity. Do not use this target!


11.11. NETMAP target

NETMAP is a new implementation of the SNAT and DNAT targets where the host part of the IP address isn't changed. It provides a 1:1 NAT function for whole networks which isn't available in the standard SNAT and DNAT functions. For example, lets say we have a network containing 254 hosts using private IP addresses (a /24 network), and we just got a new /24 network of public IP's. Instead of walking around and changing the IP of each and every one of the hosts, we would be able to simply use the NETMAP target like -j NETMAP -to 10.5.6.0/24 and voila, all the hosts are seen as 10.5.6.x when they leave the firewall. For example, 192.168.0.26 would become 10.5.6.26.

Table 11-8. NETMAP target options

Option--to
Exampleiptables -t mangle -A PREROUTING -s 192.168.1.0/24 -j NETMAP --to 10.5.6.0/24
ExplanationThis is the only option of the NETMAP target. In the above example, the 192.168.1.x hosts will be directly translated into 10.5.6.x.
 

Works under Linux kernel 2.5 and 2.6.


11.12. QUEUE target

The QUEUE target is used to queue packets to User-land programs and applications. It is used in conjunction with programs or utilities that are extraneous to iptables and may be used, for example, with network accounting, or for specific and advanced applications which proxy or filter packets. We will not discuss this target in depth, since the coding of such applications is out of the scope of this tutorial. First of all it would simply take too much time, and secondly such documentation does not have anything to do with the programming side of Netfilter and iptables. All of this should be fairly well covered in the Netfilter Hacking HOW-TO.

 

Works under Linux kernel 2.3, 2.4, 2.5 and 2.6.


11.13. REDIRECT target

The REDIRECT target is used to redirect packets and streams to the machine itself. This means that we could for example REDIRECT all packets destined for the HTTP ports to an HTTP proxy like squid, on our own host. Locally generated packets are mapped to the 127.0.0.1 address. In other words, this rewrites the destination address to our own host for packets that are forwarded, or something alike. The REDIRECT target is extremely good to use when we want, for example, transparent proxying, where the LAN hosts do not know about the proxy at all.

Note that the REDIRECT target is only valid within the PREROUTING and OUTPUT chains of the nat table. It is also valid within user-defined chains that are only called from those chains, and nowhere else. The REDIRECT target takes only one option, as described below.

Table 11-9. REDIRECT target

Option--to-ports
Exampleiptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 8080
Explanation

The --to-ports option specifies the destination port, or port range, to use. Without the --to-ports option, the destination port is never altered. This is specified, as above, --to-ports 8080 in case we only want to specify one port. If we would want to specify a port range, we would do it like --to-ports 8080-8090, which tells the REDIRECT target to redirect the packets to the ports 8080 through 8090. Note that this option is only available in rules specifying the TCP or UDP protocol with the --protocol matcher, since it wouldn't make any sense anywhere else.

 

Works under Linux kernel 2.3, 2.4, 2.5 and 2.6.


11.14. REJECT target

The REJECT target works basically the same as the DROP target, but it also sends back an error message to the host sending the packet that was blocked. The REJECT target is as of today only valid in the INPUT, FORWARD and OUTPUT chains or their sub chains. After all, these would be the only chains in which it would make any sense to put this target. Note that all chains that use the REJECT target may only be called by the INPUT, FORWARD, and OUTPUT chains, else they won't work. There is currently only one option which controls the nature of how this target works, though this may in turn take a huge set of variables. Most of them are fairly easy to understand, if you have a basic knowledge of TCP/IP.

Table 11-10. REJECT target

Option--reject-with
Exampleiptables -A FORWARD -p TCP --dport 22 -j REJECT --reject-with tcp-reset
ExplanationThis option tells the REJECT target what response to send to the host that sent the packet that we are rejecting. Once we get a packet that matches a rule in which we have specified this target, our host will first of all send the associated reply, and the packet will then be dropped dead, just as the DROP target would drop it. The following reject types are currently valid: icmp-net-unreachable, icmp-host-unreachable, icmp-port-unreachable, icmp-proto-unreachable, icmp-net-prohibited and icmp-host-prohibited. The default error message is to send a port-unreachable to the host. All of the above are ICMP error messages and may be set as you wish. You can find further information on their various purposes in the appendix ICMP types. Finally, there is one more option called tcp-reset, which may only be used together with the TCP protocol. The tcp-reset option will tell REJECT to send a TCP RST packet in reply to the sending host. TCP RST packets are used to close open TCP connections gracefully. For more information about the TCP RST read RFC 793 - Transmission Control Protocol. As stated in the iptables man page, this is mainly useful for blocking ident probes which frequently occur when sending mail to broken mail hosts, that won't otherwise accept your mail.
 

Works under Linux kernel 2.3, 2.4, 2.5 and 2.6.


11.15. RETURN target

The RETURN target will cause the current packet to stop traveling through the chain where it hit the rule. If it is the subchain of another chain, the packet will continue to travel through the superior chains as if nothing had happened. If the chain is the main chain, for example the INPUT chain, the packet will have the default policy taken on it. The default policy is normally set to ACCEPTDROP or similar.

For example, let's say a packet enters the INPUT chain and then hits a rule that it matches and that tells it to --jump EXAMPLE_CHAIN. The packet will then start traversing the EXAMPLE_CHAIN, and all of a sudden it matches a specific rule which has the --jump RETURN target set. It will then jump back to the INPUT chain. Another example would be if the packet hit a --jump RETURN rule in the INPUT chain. It would then be dropped to the default policy as previously described, and no more actions would be taken in this chain.

 

Works under Linux kernel 2.3, 2.4, 2.5 and 2.6.


11.16. SAME target

The SAME target works almost in the same fashion as the SNAT target, but it still differs. Basically, the SAME target will try to always use the same outgoing IP address for all connections initiated by a single host on your network. For example, say you have one /24 network (192.168.1.0) and 3 IP addresses (10.5.6.7-9). Now, if 192.168.1.20 went out through the .7 address the first time, the firewall will try to keep that machine always going out through that IP address.

Table 11-11. SAME target options

Option--to
Exampleiptables -t mangle -A PREROUTING -s 192.168.1.0/24 -j SAME --to 10.5.6.7-10.5.6.9
ExplanationAs you can see, the --to argument takes 2 IP addresses bound together by a - sign. These IP addresses, and all in between, are the IP addresses that we NAT to using the SAME algorithm.
Option--nodst
Exampleiptables -t mangle -A PREROUTING -s 192.168.1.0/24 -j SAME --to 10.5.6.7-10.5.6.9 --nodst
ExplanationUnder normal action, the SAME target is calculating the followup connections based on both destination and source IP addresses. Using the --nodst option, it uses only the source IP address to find out which outgoing IP the NAT function should use for the specific connection. Without this argument, it uses a combination of the destination and source IP address.
 

Works under Linux kernel 2.5 and 2.6.


11.17. SNAT target

The SNAT target is used to do Source Network Address Translation, which means that this target will rewrite the Source IP address in the IP header of the packet. This is what we want, for example, when several hosts have to share an Internet connection. We can then turn on ip forwarding in the kernel, and write an SNAT rule which will translate all packets going out from our local network to the source IP of our own Internet connection. Without doing this, the outside world would not know where to send reply packets, since our local networks mostly use the IANA specified IP addresses which are allocated for LAN networks. If we forwarded these packets as is, no one on the Internet would know that they were actually from us. The SNAT target does all the translation needed to do this kind of work, letting all packets leaving our LAN look as if they came from a single host, which would be our firewall.

The SNAT target is only valid within the nat table, within the POSTROUTING chain. This is in other words the only chain in which you may use SNAT. Only the first packet in a connection is mangled by SNAT, and after that all future packets using the same connection will also be SNATted. Furthermore, the initial rules in the POSTROUTING chain will be applied to all the packets in the same stream.

Table 11-12. SNAT target options

Option--to-source
Exampleiptables -t nat -A POSTROUTING -p tcp -o eth0 -j SNAT --to-source 194.236.50.155-194.236.50.160:1024-32000
ExplanationThe --to-source option is used to specify which source the packet should use. This option, at its simplest, takes one IP address which we want to use for the source IP address in the IP header. If we want to balance between several IP addresses, we can use a range of IP addresses, separated by a hyphen. The --to--source IP numbers could then, for instance, be something like in the above example: 194.236.50.155-194.236.50.160. The source IP for each stream that we open would then be allocated randomly from these, and a single stream would always use the same IP address for all packets within that stream. We can also specify a range of ports to be used by SNAT. All the source ports would then be confined to the ports specified. The port bit of the rule would then look like in the example above, :1024-32000. This is only valid if -p tcp or -p udp was specified somewhere in the match of the rule in question. iptables will always try to avoid making any port alterations if possible, but if two hosts try to use the same ports, iptables will map one of them to another port. If no port range is specified, then if they're needed, all source ports below 512 will be mapped to other ports below 512. Those between source ports 512 and 1023 will be mapped to ports below 1024. All other ports will be mapped to 1024 or above. As previously stated, iptables will always try to maintain the source ports used by the actual workstation making the connection. Note that this has nothing to do with destination ports, so if a client tries to make contact with an HTTP server outside the firewall, it will not be mapped to the FTP control port.
 

Works under Linux kernel 2.3, 2.4, 2.5 and 2.6.


11.18. TCPMSS target

The TCPMSS target can be used to alter the MSS (Maximum Segment Size) value of TCP SYN packets that the firewall sees. The MSS value is used to control the maximum size of packets for specific connections. Under normal circumstances, this means the size of the MTU (Maximum Transfer Unit) value, minus 40 bytes. This is used to overcome some ISP's and servers that block ICMP fragmentation needed packets, which can result in really weird problems which can mainly be described such that everything works perfectly from your firewall/router, but your local hosts behind the firewall can't exchange large packets. This could mean such things as mail servers being able to send small mails, but not large ones, web browsers that connect but then hang with no data received, and ssh connecting properly, but scp hangs after the initial handshake. In other words, everything that uses any large packets will be unable to work.

The TCPMSS target is able to solve these problems, by changing the size of the packets going out through a connection. Please note that we only need to set the MSS on the SYN packet since the hosts take care of the MSS after that. The target takes two arguments.

Table 11-13. TCPMSS target options

Option--set-mss
Exampleiptables -t mangle -A POSTROUTING -p tcp --tcp-flags SYN,RST SYN -o eth0 -j TCPMSS --set-mss 1460
ExplanationThe --set-mss argument explicitly sets a specific MSS value of all outgoing packets. In the example above, we set the MSS of all SYN packets going out over the eth0 interface to 1460 bytes -- normal MTU for ethernet is 1500 bytes, minus 40 bytes is 1460 bytes. MSS only has to be set properly in the SYN packet, and then the peer hosts take care of the MSS automatically.
Option--clamp-mss-to-pmtu
Exampleiptables -t mangle -A POSTROUTING -p tcp --tcp-flags SYN,RST SYN -o eth0 -j TCPMSS --clamp-mss-to-pmtu
ExplanationThe --clamp-mss-to-pmtu automatically sets the MSS to the proper value, hence you don't need to explicitly set it. It is automatically set to PMTU (Path Maximum Transfer Unit) minus 40 bytes, which should be a reasonable value for most applications.
 

Works under Linux kernel 2.5 and 2.6.


11.19. TOS target

The TOS target is used to set the Type of Service field within the IP header. The TOS field consists of 8 bits which are used to help in routing packets. This is one of the fields that can be used directly within iproute2 and its subsystem for routing policies. Worth noting, is that if you handle several separate firewalls and routers, this is the only way to propagate routing information within the actual packet between these routers and firewalls. As previously noted, the MARK target - which sets a MARK associated with a specific packet - is only available within the kernel, and can't be propagated with the packet. If you feel a need to propagate routing information for a specific packet or stream, you should therefore set the TOS field, which was developed for this.

There are currently a lot of routers on the Internet which do a pretty bad job at this, so as of now it may prove to be a bit useless to attempt TOS mangling before sending the packets on to the Internet. At best the routers will not pay any attention to the TOS field. At worst, they will look at the TOS field and do the wrong thing. However, as stated above, the TOS field can most definitely be put to good use if you have a large WAN or LAN with multiple routers. You then in fact have the possibility of giving packets different routes and preferences, based on their TOS value - even though this might be confined to your own network.

 

The TOS target is only capable of setting specific values, or named values on packets. These predefined TOS values can be found in the kernel include files, or more precisely, the Linux/ip.h file. The reasons are many, and you should actually never need to set any other values; however, there are ways around this limitation. To get around the limitation of only being able to set the named values on packets, you can use the FTOS patch available at the Paksecured Linux Kernel patches site maintained by Matthew G. Marsh. However, be cautious with this patch! You should not need to use any other than the default values, except in extreme cases.

 

Note that this target is only valid within the mangle table and can't be used outside it.

 

Also note that some old versions (1.2.2 or below) of iptables provided a broken implementation of this target which did not fix the packet checksum upon mangling, hence rendering the packets bad and in need of retransmission. That in turn would most probably lead to further mangling and the connection never working.

The TOS target only takes one option as described below.

Table 11-14. TOS target

Option--set-tos
Exampleiptables -t mangle -A PREROUTING -p TCP --dport 22 -j TOS --set-tos 0x10
ExplanationThe --set-tos option tells the TOS mangler what TOS value to set on packets that are matched. The option takes a numeric value, either in hex or in decimal value. As the TOS value consists of 8 bits, the value may be 0-255, or in hex 0x00-0xFF. Note that in the standard TOS target you are limited to using the named values available (which should be more or less standardized), as mentioned in the previous warning. These values are Minimize-Delay (decimal value 16, hex value 0x10), Maximize-Throughput (decimal value 8, hex value 0x08), Maximize-Reliability (decimal value 4, hex value 0x04), Minimize-Cost (decimal value 2, hex 0x02) or Normal-Service (decimal value 0, hex value 0x00). The default value on most packets is Normal-Service, or 0. Note that you can, of course, use the actual names instead of the actual hex values to set the TOS value; in fact this is generally to be recommended, since the values associated with the names may be changed in future. For a complete listing of the "descriptive values", do an iptables -j TOS -h. This listing is complete as of iptables 1.2.5 and should hopefully remain so for a while.
 

Works under Linux kernel 2.3, 2.4, 2.5 and 2.6.


11.20. TTL target

 

This patch requires the TTL patch from the patch-o-matic tree available in the base directory from http://www.netfilter.org/.

The TTL target is used to modify the Time To Live field in the IP header. One useful application of this is to change all Time To Live values to the same value on all outgoing packets. One reason for doing this is if you have a bully ISP which don't allow you to have more than one machine connected to the same Internet connection, and who actively pursues this. Setting all TTL values to the same value, will effectively make it a little bit harder for them to notice that you are doing this. We may then reset the TTL value for all outgoing packets to a standardized value, such as 64 as specified in the Linux kernel.

For more information on how to set the default value used in Linux, read the ip-sysctl.txt, which you may find within the Other resources and links appendix.

The TTL target is only valid within the mangle table, and nowhere else. It takes 3 options as of writing this, all of them described below in the table.

Table 11-15. TTL target

Option--ttl-set
Exampleiptables -t mangle -A PREROUTING -i eth0 -j TTL --ttl-set 64
ExplanationThe --ttl-set option tells the TTL target which TTL value to set on the packet in question. A good value would be around 64 somewhere. It's not too long, and it is not too short. Do not set this value too high, since it may affect your network and it is a bit immoral to set this value to high, since the packet may start bouncing back and forth between two mis-configured routers, and the higher the TTL, the more bandwidth will be eaten unnecessarily in such a case. This target could be used to limit how far away our clients are. A good case of this could be DNS servers, where we don't want the clients to be too far away.
Option--ttl-dec
Exampleiptables -t mangle -A PREROUTING -i eth0 -j TTL --ttl-dec 1
ExplanationThe --ttl-dec option tells the TTL target to decrement the Time To Live value by the amount specified after the --ttl-dec option. In other words, if the TTL for an incoming packet was 53 and we had set --ttl-dec 3, the packet would leave our host with a TTL value of 49. The reason for this is that the networking code will automatically decrement the TTL value by 1, hence the packet will be decremented by 4 steps, from 53 to 49. This could for example be used when we want to limit how far away the people using our services are. For example, users should always use a close-by DNS, and hence we could match all packets leaving our DNS server and then decrease it by several steps. Of course, the --set-ttl may be a better idea for this usage.
Option--ttl-inc
Exampleiptables -t mangle -A PREROUTING -i eth0 -j TTL --ttl-inc 1
ExplanationThe --ttl-inc option tells the TTL target to increment the Time To Live value with the value specified to the --ttl-inc option. This means that we should raise the TTL value with the value specified in the --ttl-inc option, and if we specified --ttl-inc 4, a packet entering with a TTL of 53 would leave the host with TTL 56. Note that the same thing goes here, as for the previous example of the --ttl-dec option, where the network code will automatically decrement the TTL value by 1, which it always does. This may be used to make our firewall a bit more stealthy to trace-routes among other things. By setting the TTL one value higher for all incoming packets, we effectively make the firewall hidden from trace-routes. Trace-routes are a loved and hated thing, since they provide excellent information on problems with connections and where it happens, but at the same time, it gives the hacker/cracker some good information about your upstreams if they have targeted you. For a good example on how this could be used, see the Ttl-inc.txt script.
 

Works under Linux kernel 2.3, 2.4, 2.5 and 2.6.


11.21. ULOG target

The ULOG target is used to provide user-space logging of matching packets. If a packet is matched and the ULOG target is set, the packet information is multicasted together with the whole packet through a netlink socket. One or more user-space processes may then subscribe to various multicast groups and receive the packet. This is in other words a more complete and more sophisticated logging facility that is only used by iptables and Netfilter so far, and it contains much better facilities for logging packets. This target enables us to log information to MySQL databases, and other databases, making it much simpler to search for specific packets, and to group log entries. You can find the ULOGD user-land applications at the ULOGD project page.

Table 11-16. ULOG target

Option--ulog-nlgroup
Exampleiptables -A INPUT -p TCP --dport 22 -j ULOG --ulog-nlgroup 2
ExplanationThe --ulog-nlgroup option tells the ULOG target which netlink group to send the packet to. There are 32 netlink groups, which are simply specified as 1-32. If we would like to reach netlink group 5, we would simply write --ulog-nlgroup 5. The default netlink group used is 1.
Option--ulog-prefix
Exampleiptables -A INPUT -p TCP --dport 22 -j ULOG --ulog-prefix "SSH connection attempt: "
ExplanationThe --ulog-prefix option works just the same as the prefix value for the standard LOG target. This option prefixes all log entries with a user-specified log prefix. It can be 32 characters long, and is definitely most useful to distinguish different log-messages and where they came from.
Option--ulog-cprange
Exampleiptables -A INPUT -p TCP --dport 22 -j ULOG --ulog-cprange 100
ExplanationThe --ulog-cprange option tells the ULOG target how many bytes of the packet to send to the user-space daemon of ULOG. If we specify 100 as above, we would copy 100 bytes of the whole packet to user-space, which would include the whole header hopefully, plus some leading data within the actual packet. If we specify 0, the whole packet will be copied to user-space, regardless of the packets size. The default value is 0, so the whole packet will be copied to user-space.
Option--ulog-qthreshold
Exampleiptables -A INPUT -p TCP --dport 22 -j ULOG --ulog-qthreshold 10
ExplanationThe --ulog-qthreshold option tells the ULOG target how many packets to queue inside the kernel before actually sending the data to user-space. For example, if we set the threshold to 10 as above, the kernel would first accumulate 10 packets inside the kernel, and then transmit it outside to the user-space as one single netlink multi part message. The default value here is 1 because of backward compatibility, the user-space daemon did not know how to handle multi-part messages previously.
 

Works under Linux kernel 2.3, 2.4, 2.5 and 2.6.


  • 1
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 2
    评论
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值