SIP is structured as a layered protocol, which means that its
behavior is described in terms of a set of fairly independent
processing stages with only a loose coupling between each stage. The
protocol behavior is described as layers for the purpose of
presentation, allowing the description of functions common across
elements in a single section. It does not dictate an implementation
in any way. When we say that an element "contains" a layer, we mean
it is compliant with the set of rules defined by that layer.

Not every element specified by the protocol contains every layer.
Furthermore, the elements specified by SIP are logical elements, not
physical ones. A physical realization can choose to act as different
logical elements, perhaps even on a transaction-by-transaction basis.

The lowest layer of SIP is its syntax and encoding. Its encoding is
specified using an augmented Backus-Naur Form grammar (BNF). The
complete BNF is specified in Section 25; an overview of a SIP
message's structure can be found in Section 7.
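
As a rough illustration of the message structure that Section 7
outlines, the following Python sketch splits a raw message into its
start-line, header fields, and body. It is a minimal sketch under
stated assumptions, not a conforming parser: the function name
split_sip_message and the sample message are hypothetical, and the
code ignores header line folding, compact header forms, and other
details that the full grammar covers.

```python
def split_sip_message(raw: bytes):
    """Split a raw SIP message into (start_line, headers, body).

    Hypothetical helper for illustration only; it assumes CRLF line
    endings and one header field per line.
    """
    # An empty line (CRLF CRLF) separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")

    # The first line is the start-line (request line or status line).
    start_line = lines[0]

    # Each remaining line is "Name: value"; split on the first colon only.
    headers = []
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return start_line, headers, body


# Made-up sample request used only to exercise the sketch.
sample = (
    b"OPTIONS sip:bob@example.com SIP/2.0\r\n"
    b"Via: SIP/2.0/UDP client.example.com;branch=z9hG4bK776asdhds\r\n"
    b"Max-Forwards: 70\r\n"
    b"Content-Length: 0\r\n"
    b"\r\n"
)
start_line, headers, body = split_sip_message(sample)
```

Splitting on the first colon keeps header values that themselves
contain colons (such as SIP URIs) intact.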
The second layer is the transport layer. It defines how a client
sends requests and receives responses and how a server receives
requests and sends responses over the network. All SIP elements
contain a transport layer. The transport layer is described in
Section 18.

The third layer is the transaction layer. Transactions are a
fundamental component of SIP. A transaction is a request sent by a
client transaction (using the transport layer) to a server
transaction, along with all responses to that request sent from the
server transaction back to the client. The transaction layer handles
application-layer retransmissions, matching of responses to requests,
and application-layer timeouts. Any task that a user agent client
(UAC) accomplishes takes place using a series of transactions.
Discussion of transactions can be found in Section 17. User agents
contain a transaction layer, as do stateful proxies. Stateless
proxies do not contain a transaction layer. The transaction layer
has a client component (referred to as a client transaction) and a
server component (referred to as a server transaction), each of which
is represented by a finite state machine that is constructed to
process a particular request.
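
The idea of a client transaction as a state machine built for one
request can be sketched in Python as follows. This is a simplified
illustration, not the state machine defined in Section 17: the class
names ClientTransaction and TransactionLayer are hypothetical, the
state names are borrowed from the non-INVITE client transaction, and
matching is reduced to comparing the branch parameter of the top Via
header field and the method in CSeq. Retransmissions and the
individual timers are deliberately left out; a single on_timeout
method stands in for any timer expiry.

```python
from enum import Enum


class State(Enum):
    TRYING = "Trying"
    PROCEEDING = "Proceeding"
    COMPLETED = "Completed"
    TERMINATED = "Terminated"


class ClientTransaction:
    """Simplified non-INVITE client transaction for one request."""

    def __init__(self, branch: str, method: str):
        self.branch = branch
        self.method = method
        self.state = State.TRYING

    def matches(self, via_branch: str, cseq_method: str) -> bool:
        # Simplified matching: same top Via branch and same CSeq method.
        return via_branch == self.branch and cseq_method == self.method

    def on_response(self, status: int) -> None:
        # Provisional (1xx) responses lead to Proceeding; final
        # responses (200-699) lead to Completed. Responses arriving
        # in Completed or Terminated are ignored.
        if self.state in (State.TRYING, State.PROCEEDING):
            self.state = State.PROCEEDING if status < 200 else State.COMPLETED

    def on_timeout(self) -> None:
        # Stand-in for any timer expiry (assumption: timers are merged).
        self.state = State.TERMINATED


class TransactionLayer:
    """Hypothetical container that routes responses to transactions."""

    def __init__(self):
        self.transactions = []

    def send_request(self, branch: str, method: str) -> ClientTransaction:
        # A new client transaction is constructed for each request.
        tx = ClientTransaction(branch, method)
        self.transactions.append(tx)
        return tx

    def receive_response(self, via_branch: str, cseq_method: str, status: int):
        # Return the matching transaction, or None if nothing matches.
        for tx in self.transactions:
            if tx.matches(via_branch, cseq_method):
                tx.on_response(status)
                return tx
        return None


layer = TransactionLayer()
tx = layer.send_request("z9hG4bK776asdhds", "OPTIONS")
layer.receive_response("z9hG4bK776asdhds", "OPTIONS", 100)
state_after_provisional = tx.state
layer.receive_response("z9hG4bK776asdhds", "OPTIONS", 200)
state_after_final = tx.state
stray = layer.receive_response("z9hG4bKother", "OPTIONS", 200)
```

A response that matches no transaction returns None here; what an
element does with such a response is outside the scope of this
sketch.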