SIP: A Simple Session Establishment Example
Figure 2.1 shows the SIP message exchange between two SIP-enabled devices. The two devices could be SIP phones, hand-helds, palmtops, or cell phones. It is assumed that both devices are connected to an IP network such as the Internet and know each other's IP address.
The calling party, Tesla, begins the message exchange by sending a SIP INVITE message to the called party, Marconi. The INVITE contains the details of the type of session or call that is requested. It could be a simple voice (audio) session, a multimedia session such as a video conference, or it could be a gaming session.
The INVITE message contains the following fields:
INVITE sip:email@example.com SIP/2.0
Via: SIP/2.0/UDP lab.high-voltage.org:5060;branch=z9hG4bKfw19b
Max-Forwards: 70
To: G. Marconi <sip:Marconi@radio.org>
From: Nikola Tesla <sip:firstname.lastname@example.org>;tag=76341
Call-ID: email@example.com
CSeq: 1 INVITE
Subject: About That Power Outage...
Contact: <sip:firstname.lastname@example.org>
Content-Type: application/sdp
Content-Length: 158

v=0
o=Tesla 2890844526 2890844526 IN IP4 lab.high-voltage.org
s=Phone Call
c=IN IP4 100.101.102.103
t=0 0
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000
The fields listed in the INVITE message are called header fields. They have the form Header: Value CRLF. The first line of the request message, called the start line, lists the method, which is INVITE, the Request-URI, then the SIP version number (2.0), all separated by spaces. Each line of a SIP message is terminated by a CRLF. The Request-URI is a special form of SIP URI and indicates the resource to which the request is being sent, also known as the request target. SIP URIs are discussed in more detail in later sections.
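As a rough illustration of the start line and the Header: Value CRLF structure described above, the following Python sketch splits a request into its parts. The function name parse_request and the Request-URI sip:callee@example.invalid are placeholders introduced here, not part of the original example, and the sketch ignores details such as folded header lines and compact header names.

```python
def parse_request(raw: str):
    """Split a SIP request into start-line parts, header fields, and body."""
    # A blank line (CRLF CRLF) separates the header fields from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")

    # Start line: method, Request-URI, and SIP version, separated by spaces.
    method, request_uri, version = lines[0].split(" ")

    # Every remaining line has the form "Header: Value".
    headers = []
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return method, request_uri, version, headers, body


# Hypothetical Request-URI; the header values follow the INVITE shown earlier.
raw_invite = (
    "INVITE sip:callee@example.invalid SIP/2.0\r\n"
    "Via: SIP/2.0/UDP lab.high-voltage.org:5060;branch=z9hG4bKfw19b\r\n"
    "Max-Forwards: 70\r\n"
    "CSeq: 1 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

method, request_uri, version, headers, body = parse_request(raw_invite)
print(method, request_uri, version)
for name, value in headers:
    print(f"{name} -> {value}")
```

Splitting each header line on the first colon only is what keeps the colon inside the Via value (before the port number) from being mistaken for the name separator.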
The first header field following the start line shown is a Via header field. Each SIP device that originates or forwards a SIP message stamps its own address in a Via header field, usually written as a host name that can be resolved into an IP address using a DNS query. The Via header field contains the SIP version number (2.0), a "/", then UDP for UDP transport, a space, then the hostname or address, a colon, then a port number, in this example the "well-known" SIP port number 5060. Transport of SIP using TCP, UDP, TLS, and SCTP and the use of port numbers are covered later in this chapter. The branch parameter is a transaction identifier. Responses relating to this request can be correlated because they will contain this same transaction identifier.
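The layout of the Via header field can be made concrete with a small Python sketch. The function name parse_via and the dictionary keys are illustrative choices; the sketch assumes a host name or IPv4 address (it does not handle bracketed IPv6 literals) and returns None for the port when none is given.

```python
def parse_via(value: str) -> dict:
    """Break a Via header field value into its components."""
    # "SIP/2.0/UDP" comes first, separated from the rest by a space.
    sent_protocol, _, rest = value.strip().partition(" ")
    protocol, version, transport = sent_protocol.split("/")

    # The host and port come next; parameters follow, separated by ";".
    sent_by, *raw_params = [part.strip() for part in rest.split(";")]
    host, _, port = sent_by.partition(":")
    params = dict(p.partition("=")[::2] for p in raw_params if p)

    return {
        "protocol": protocol,
        "version": version,
        "transport": transport,
        "host": host,
        "port": int(port) if port else None,
        "params": params,
    }


via = parse_via("SIP/2.0/UDP lab.high-voltage.org:5060;branch=z9hG4bKfw19b")
print(via["transport"], via["host"], via["port"], via["params"]["branch"])
```

A response can then be matched to its request by comparing the branch value in via["params"].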
The next header field shown is the Max-Forwards header field, which is initialized to some large integer and decremented by each SIP server that receives and forwards the request, providing simple loop detection.
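To see why a decrementing counter is enough to stop a routing loop, consider this minimal Python sketch. The function name forward is hypothetical, and the rejection with 483 Too Many Hops reflects the behavior RFC 3261 specifies for a request that arrives with a value of 0; it is not shown in the example call flow.

```python
def forward(max_forwards: int) -> int:
    """Return the Max-Forwards value to place in the forwarded request."""
    if max_forwards <= 0:
        # A server must not forward the request any further.
        raise RuntimeError("483 Too Many Hops")
    return max_forwards - 1


# Simulate a request caught in a loop between misconfigured servers.
value = 70
hops = 0
try:
    while True:
        value = forward(value)
        hops += 1
except RuntimeError as exc:
    print(f"stopped after {hops} hops: {exc}")
```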
The next header fields are the To and From header fields, which show the destination and the originator of the SIP request, respectively. When a name label is used, as in this example, the SIP URI is enclosed in angle brackets and used for routing the request. The name label could be displayed during alerting, for example, but is not used by the protocol.
The Call-ID header field is an identifier used to keep track of a particular SIP session. The originator of the request creates a locally unique string, then usually adds an "@" and its host name to make it globally unique. In addition to the Call-ID, each party in the session also contributes a random identifier, unique for each call. These identifiers, called tags, are included in the To and From header fields as the session is established. The initial INVITE shown contains a From tag but no To tag.
The user agent that generates the initial INVITE to establish the session generates the unique Call-ID and From tag. In the response to the INVITE, the user agent answering the request will generate the To tag. The combination of the local tag (contained in the From header field), remote tag (contained in the To header field), and the Call-ID uniquely identifies the established session, known as a "dialog". This dialog identifier is used by both parties to identify this call because they could have multiple calls set up between them. Subsequent requests within the established session will use this dialog identifier, as will be shown in the following examples.
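The dialog identifier can be pictured as a three-part key. In the Python sketch below, the class name DialogId and the Call-ID value are placeholders made up for illustration, while the two tags are the ones used in this example. Each party stores the same three values, but what is "local" to one is "remote" to the other.

```python
from typing import NamedTuple


class DialogId(NamedTuple):
    call_id: str
    local_tag: str
    remote_tag: str


CALL_ID = "hypothetical-call-id@example.invalid"  # placeholder value
FROM_TAG = "76341"   # generated by Tesla in the INVITE
TO_TAG = "a53e42"    # generated by Marconi in the response

# Tesla sent the INVITE, so his local tag is the From tag.
tesla_view = DialogId(CALL_ID, local_tag=FROM_TAG, remote_tag=TO_TAG)
# Marconi answered it, so his local tag is the To tag.
marconi_view = DialogId(CALL_ID, local_tag=TO_TAG, remote_tag=FROM_TAG)


def same_dialog(a: DialogId, b: DialogId) -> bool:
    """True when two parties' identifiers describe the same dialog."""
    return (
        a.call_id == b.call_id
        and a.local_tag == b.remote_tag
        and a.remote_tag == b.local_tag
    )


print(same_dialog(tesla_view, marconi_view))
```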
The next header field shown is the CSeq, or command sequence. It contains a number, followed by the method name, INVITE in this case. This number is incremented for each new request sent. In this example, the command sequence number is initialized to 1, but it could start at another integer value.
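A short Python sketch of how a sender might handle the CSeq value follows. The helper names parse_cseq and next_cseq are hypothetical, and the sketch assumes the sender keeps a single counter for the requests it originates.

```python
def parse_cseq(value: str):
    """Split a CSeq value such as '1 INVITE' into (number, method)."""
    number, method = value.split()
    return int(number), method


def next_cseq(current_number: int, method: str) -> str:
    """Build the CSeq value for the next new request from the same sender."""
    return f"{current_number + 1} {method}"


number, method = parse_cseq("1 INVITE")
print(number, method)
print(next_cseq(number, "INVITE"))
```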
The Via header fields plus the Max-Forwards, To, From, Call-ID, and CSeq header fields represent the minimum required header field set in any SIP request message. Other header fields can be included as optional additional information, or information needed for a specific request type. A Contact header field is also required in this INVITE message, which contains the SIP URI of Tesla's communication device, known as a user agent (UA); this URI can be used to route messages directly to Tesla. The optional Subject header field is present in this example. It is not used by the protocol, but could be displayed during alerting to aid the called party in deciding whether to accept the call. The same sort of useful prioritization and screening commonly performed using the Subject and From header fields in an e-mail message is also possible with a SIP INVITE request. Additional header fields are present in this INVITE message, which contain the media information necessary to set up the call.
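The minimum header field set lends itself to a simple check. The Python sketch below uses a hypothetical helper, missing_header_fields, and compares names case-insensitively; it does not recognize compact header forms.

```python
REQUIRED_IN_ANY_REQUEST = ("Via", "Max-Forwards", "To", "From", "Call-ID", "CSeq")


def missing_header_fields(method: str, header_names) -> list:
    """Return the required header fields that are absent from a request."""
    present = {name.lower() for name in header_names}
    required = list(REQUIRED_IN_ANY_REQUEST)
    if method == "INVITE":
        # An INVITE must also carry a Contact header field.
        required.append("Contact")
    return [name for name in required if name.lower() not in present]


invite_fields = [
    "Via", "Max-Forwards", "To", "From", "Call-ID", "CSeq",
    "Subject", "Contact", "Content-Type", "Content-Length",
]
print(missing_header_fields("INVITE", invite_fields))
print(missing_header_fields("INVITE", [f for f in invite_fields if f != "Contact"]))
```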
The Content-Type and Content-Length header fields indicate that the message body is SDP and contains 158 octets of data. The basis for the octet count of 158 is shown in Table 2.1, where the CR LF at the end of each line is shown as <CRLF> and the octet count for each line is shown on the right-hand side. A blank line separates the message body from the header field list, which ends with the Content-Length header field. In this case, there are seven lines of SDP data describing the media attributes that the caller Tesla desires for the call. This media information is needed because SIP makes no assumptions about the type of media session to be established; the caller must specify exactly what type of session (audio, video, gaming) he wishes to establish. The SDP field names are listed in Table 2.2 and will be discussed in detail in Section 7.1, but a quick review of the lines shows the basic information necessary to establish a session.
Table 2.1 (octet count of the SDP message body)
Line                                                              Octets
v=0<CRLF>                                                         5
o=Tesla 2890844526 2890844526 IN IP4 lab.high-voltage.org<CRLF>   59
s=Phone Call<CRLF>                                                14
c=IN IP4 100.101.102.103<CRLF>                                    26
t=0 0<CRLF>                                                       7
m=audio 49170 RTP/AVP 0<CRLF>                                     25
a=rtpmap:0 PCMU/8000<CRLF>                                        22
Total                                                             158
Table 2.2 (SDP field names)
SDP line                                                    Field name
v=0                                                         Protocol version
o=Tesla 2890844526 2890844526 IN IP4 lab.high-voltage.org   Origin containing name
s=Phone Call                                                Session name
c=IN IP4 100.101.102.103                                    Connection information
t=0 0                                                       Timing
m=audio 49170 RTP/AVP 0                                     Media description
a=rtpmap:0 PCMU/8000                                        Attribute
Table 2.2 includes the:
Connection IP address (100.101.102.103);
Media format (audio);
Port number (49170);
Media transport protocol (RTP);
Media encoding (PCM μ-Law);
Sampling rate (8,000 Hz).
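Both the Content-Length value of 158 and the items listed above can be checked mechanically. The Python sketch below rebuilds the SDP body of the INVITE, counts its octets with a CR LF after each line, and pulls out the listed values. The function name summarize_audio_sdp is hypothetical, and the parsing assumes a body with a single audio stream in which each field type appears only once.

```python
# SDP body of the INVITE, one entry per line (values from the example).
sdp_lines = [
    "v=0",
    "o=Tesla 2890844526 2890844526 IN IP4 lab.high-voltage.org",
    "s=Phone Call",
    "c=IN IP4 100.101.102.103",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0",
    "a=rtpmap:0 PCMU/8000",
]

# Each line ends with CR LF, so it contributes len(line) + 2 octets.
octets_per_line = [len(line.encode("ascii")) + 2 for line in sdp_lines]
content_length = sum(octets_per_line)


def summarize_audio_sdp(lines) -> dict:
    """Extract the basic session details from a one-stream audio SDP body."""
    fields = {}
    for line in lines:
        key, _, value = line.partition("=")
        fields[key] = value  # assumes each field type appears once

    _net_type, _addr_type, address = fields["c"].split()
    media, port, transport, _payload_type = fields["m"].split()
    _payload, encoding = fields["a"].split(":", 1)[1].split()
    codec, clock_rate = encoding.split("/")
    return {
        "address": address,
        "media": media,
        "port": int(port),
        "transport": transport,
        "codec": codec,
        "clock_rate_hz": int(clock_rate),
    }


print(octets_per_line, content_length)
print(summarize_audio_sdp(sdp_lines))
```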
INVITE is an example of a SIP request message. There are five other methods or types of SIP requests currently defined in the SIP specification RFC 3261 and others in extension RFCs. The next message in Figure 2.1 is a 180 Ringing message sent in response to the INVITE. This message indicates that the called party Marconi has received the INVITE and that alerting is taking place. The alerting could be ringing a phone, flashing a message on a screen, or any other method of attracting the attention of the called party, Marconi.
The 180 Ringing is an example of a SIP response message. Responses are numerical and are classified by the first digit of the number. A 180 response is an "informational class" response, identified by the first digit being a 1. Informational responses are used to convey noncritical information about the progress of the call. Many SIP response codes were based on HTTP version 1.1 response codes with some extensions and additions. Anyone who has ever browsed the World Wide Web has likely received a "404 Not Found" response from a Web server when a requested page was not found. 404 Not Found is also a valid SIP "client error class" response, sent in reply to a request for an unknown user. The other classes of SIP responses are covered in Chapter 5.
The response code number in SIP alone determines the way the response is interpreted by the server or the user. The reason phrase, Ringing in this case, is suggested in the standard, but any text can be used to convey more information. For instance, 180 Hold your horses, I'm trying to wake him up! is a perfectly valid SIP response and has the same meaning as a 180 Ringing response.
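The point that only the number matters can be shown with a short Python sketch. The function names are hypothetical. The class names for 1xx, 2xx, and 4xx follow the text, while those for 3xx, 5xx, and 6xx are taken from RFC 3261.

```python
RESPONSE_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
    6: "global failure",
}


def parse_status_line(line: str):
    """Split a response start line into (version, code, reason phrase)."""
    # Split on the first two spaces only; the reason phrase may contain spaces.
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


def response_class(code: int) -> str:
    """Classify a response by the first digit of its code."""
    return RESPONSE_CLASSES[code // 100]


for line in (
    "SIP/2.0 180 Ringing",
    "SIP/2.0 180 Hold your horses, I'm trying to wake him up!",
    "SIP/2.0 404 Not Found",
):
    _version, code, reason = parse_status_line(line)
    print(code, response_class(code), "-", reason)
```

Both 180 lines are classified identically because the reason phrase is never consulted.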
The 180 Ringing response has the following structure:
SIP/2.0 180 Ringing
Via: SIP/2.0/UDP lab.high-voltage.org:5060;branch=z9hG4bKfw19b;received=100.101.102.103
To: G. Marconi <sip:email@example.com>;tag=a53e42
From: Nikola Tesla <sip:firstname.lastname@example.org>;tag=76341
Call-ID: email@example.com
CSeq: 1 INVITE
Contact: <sip:firstname.lastname@example.org>
Content-Length: 0
The message was created by copying many of the header fields from the INVITE message, including the Via, To, From, Call-ID, and CSeq, then adding a response start line containing the SIP version number, the response code, and the reason phrase. This approach simplifies the message processing for responses.
The Via header field contains the original branch parameter but also has an additional received parameter. This parameter contains the literal IP address from which the request was received (100.101.102.103), which typically is the same address that the host name in the Via (lab.high-voltage.org) resolves to using DNS.
Note that the To and From header fields are not reversed in the response message as one might expect them to be. Even though this message is sent to Tesla from Marconi, the header fields read the opposite. This is because the To and From header fields in SIP are defined to indicate the direction of the request, not the direction of the message. Since Tesla initiated this request, all responses will read To: Marconi From: Tesla.
The To header field now contains a tag that was generated by Marconi. All future requests and responses in this session or dialog will contain both the tag generated by Tesla and the tag generated by Marconi.
The response also contains a Contact header field, which contains an address at which Marconi can be contacted directly once the session is established.
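The construction of the response described above can be sketched in Python as follows. The function name build_response and the URIs ending in example.invalid are placeholders for illustration. The sketch copies the five header fields, adds the To tag, and appends a Contact; it leaves out the received parameter that the responder adds to the Via.

```python
COPIED_FIELDS = ("Via", "To", "From", "Call-ID", "CSeq")


def build_response(request_headers, code, reason, to_tag, contact):
    """Build a bodiless response from the header fields of a request."""
    lines = [f"SIP/2.0 {code} {reason}"]
    for name, value in request_headers:
        if name not in COPIED_FIELDS:
            continue  # e.g., Max-Forwards and Subject are not copied
        if name == "To" and ";tag=" not in value:
            # The responder contributes its own tag to the To header field.
            value = f"{value};tag={to_tag}"
        lines.append(f"{name}: {value}")
    lines.append(f"Contact: {contact}")
    lines.append("Content-Length: 0")
    return "\r\n".join(lines) + "\r\n\r\n"


invite_headers = [
    ("Via", "SIP/2.0/UDP lab.high-voltage.org:5060;branch=z9hG4bKfw19b"),
    ("Max-Forwards", "70"),
    ("To", "G. Marconi <sip:Marconi@radio.org>"),
    ("From", "Nikola Tesla <sip:caller@example.invalid>;tag=76341"),
    ("Call-ID", "hypothetical-call-id@example.invalid"),
    ("CSeq", "1 INVITE"),
    ("Subject", "About That Power Outage..."),
]

ringing = build_response(
    invite_headers, 180, "Ringing",
    to_tag="a53e42", contact="<sip:callee@host.example.invalid>",
)
print(ringing)
```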
When the called party Marconi decides to accept the call (i.e., the phone is answered), a 200 OK response is sent. This response also indicates that the type of media session proposed by the caller is acceptable. The 200 OK is an example of a "success class" response. The 200 OK message body contains Marconi's media information:
SIP/2.0 200 OK
Via: SIP/2.0/UDP lab.high-voltage.org:5060;branch=z9hG4bKfw19b;received=100.101.102.103
To: G. Marconi <sip:email@example.com>;tag=a53e42
From: Nikola Tesla <sip:firstname.lastname@example.org>;tag=76341
Call-ID: email@example.com
CSeq: 1 INVITE
Contact: <sip:firstname.lastname@example.org>
Content-Type: application/sdp
Content-Length: 155

v=0
o=Marconi 2890844528 2890844528 IN IP4 tower.radio.org
s=Phone Call
c=IN IP4 220.127.116.11
t=0 0
m=audio 60000 RTP/AVP 0
a=rtpmap:0 PCMU/8000
This response is constructed the same way as the 180 Ringing response and contains the same To tag and Contact URI. The media capabilities, however, must be communicated in an SDP message body added to the response. From the same SDP fields as Table 2.2, the SDP contains:
End-point IP address (220.127.116.11);
Media format (audio);
Port number (60000);
Media transport protocol (RTP);
Media encoding (PCM μ-Law);
Sampling rate (8,000 Hz).
The final step is to confirm the media session with an "acknowledgment" request. The confirmation means that Tesla has successfully received Marconi's response. This exchange of media information allows the media session to be established using another protocol, RTP in this example.
ACK sip:email@example.com SIP/2.0
Via: SIP/2.0/UDP lab.high-voltage.org:5060;branch=z9hG4bK321g
Max-Forwards: 70
To: G. Marconi <sip:firstname.lastname@example.org>;tag=a53e42
From: Nikola Tesla <sip:email@example.com>;tag=76341
Call-ID: firstname.lastname@example.org
CSeq: 1 ACK
Content-Length: 0
The command sequence, CSeq, has the same number as the INVITE, but the method is set to ACK. At this point, the media session begins using the media information carried in the SIP messages. The media session takes place using another protocol, typically RTP. The branch parameter in the Via header field contains a different transaction identifier from the one in the INVITE, since an ACK sent to acknowledge a 200 OK is considered a separate transaction.
This message exchange shows that SIP is an end-to-end signaling protocol. A SIP network or SIP server is not required for the protocol to be used. Two end points running a SIP protocol stack and knowing each other's IP addresses can use SIP to set up a media session between them. Although less obvious, this example also shows the client-server nature of the SIP protocol. When Tesla originates the INVITE request, he is acting as a SIP client. When Marconi responds to the request, he is acting as a SIP server. After the media session is established, Marconi originates the BYE request and acts as the SIP client, while Tesla acts as the SIP server when he responds. This is why a SIP-enabled device must contain both SIP server and SIP client software: during a typical session, both are needed. This is quite different from other client-server Internet protocols such as HTTP or FTP. The Web browser is always an HTTP client, and the Web server is always an HTTP server, and similarly for FTP. In SIP, an end point will switch back and forth during a session between being a client and a server.
In Figure 2.1, a BYE request is sent by Marconi to terminate the media session:
BYE sip:email@example.com SIP/2.0
Via: SIP/2.0/UDP tower.radio.org:5060;branch=z9hG4bK392kf
Max-Forwards: 70
To: Nikola Tesla <sip:firstname.lastname@example.org>;tag=76341
From: G. Marconi <sip:email@example.com>;tag=a53e42
Call-ID: firstname.lastname@example.org
CSeq: 1 BYE
Content-Length: 0
The Via header field in this example is populated with Marconi's host address and contains a new transaction identifier since the BYE is considered a separate transaction from the INVITE or ACK transactions shown previously. The To and From header fields reflect that this request is originated by Marconi, as they are reversed from the messages in the previous transaction. Tesla, however, is able to identify the dialog using the presence of the same local and remote tags and Call-ID as the INVITE, and tear down the correct media session.
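How Tesla might locate the right session when the BYE arrives can be sketched in Python as a table lookup. The names tag_of, dialog_key_for_incoming, and active_dialogs, along with the Call-ID and URIs ending in example.invalid, are placeholders made up for this sketch. The key point is that for a received request, the To tag is the local tag and the From tag is the remote tag.

```python
import re

# Matches a ";tag=..." parameter; a simplification that assumes the
# URI inside the angle brackets carries no tag parameter of its own.
TAG_RE = re.compile(r";\s*tag=([^;\s>]+)")


def tag_of(header_value: str):
    """Return the tag parameter of a To or From value, or None."""
    match = TAG_RE.search(header_value)
    return match.group(1) if match else None


def dialog_key_for_incoming(headers: dict):
    """Build (Call-ID, local tag, remote tag) for a request we received."""
    return (headers["Call-ID"], tag_of(headers["To"]), tag_of(headers["From"]))


# Tesla's table of established dialogs, keyed from his point of view.
active_dialogs = {
    ("hypothetical-call-id@example.invalid", "76341", "a53e42"):
        "audio session with Marconi",
}

bye_headers = {
    "To": "Nikola Tesla <sip:caller@example.invalid>;tag=76341",
    "From": "G. Marconi <sip:callee@example.invalid>;tag=a53e42",
    "Call-ID": "hypothetical-call-id@example.invalid",
    "CSeq": "1 BYE",
}

# Removing the entry stands in for tearing down the media session.
ended = active_dialogs.pop(dialog_key_for_incoming(bye_headers))
print("tearing down:", ended)
```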
Notice that all the branch IDs shown in the example so far begin with the string z9hG4bK. This is a special string that indicates that the branch ID has been calculated using strict rules defined in RFC 3261 and is therefore usable as a transaction identifier.
The confirmation response to the BYE is a 200 OK:
SIP/2.0 200 OK
Via: SIP/2.0/UDP tower.radio.org:5060;branch=z9hG4bK392kf;received=22.214.171.124
To: Nikola Tesla <sip:email@example.com>;tag=76341
From: G. Marconi <sip:firstname.lastname@example.org>;tag=a53e42
Call-ID: email@example.com
CSeq: 1 BYE
Content-Length: 0
The response echoes the CSeq of the original request: 1 BYE.
The z9hG4bK string at the start of a branch ID is needed because user agents that predate RFC 3261 may have constructed branch IDs that are not suitable as transaction identifiers. In that case, a client must construct its own transaction identifier using the To tag, From tag, Call-ID, and CSeq.
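The choice between the two kinds of transaction identifier can be sketched in Python as follows. The function name transaction_key and the tuple layout are illustrative only. The fallback uses just the four values named in the text, and the full matching rules in RFC 3261 consider additional fields.

```python
MAGIC_COOKIE = "z9hG4bK"


def transaction_key(branch, to_tag, from_tag, call_id, cseq):
    """Pick a transaction identifier for a message."""
    if branch and branch.startswith(MAGIC_COOKIE):
        # The branch follows the RFC 3261 rules and can be used directly.
        return ("branch", branch)
    # Older branch values: fall back to a key built from other fields.
    return ("legacy", to_tag, from_tag, call_id, cseq)


# Placeholder Call-ID; the tags and branch come from the BYE example.
CALL_ID = "hypothetical-call-id@example.invalid"
print(transaction_key("z9hG4bK392kf", "76341", "a53e42", CALL_ID, "1 BYE"))
print(transaction_key("1234", "76341", "a53e42", CALL_ID, "1 BYE"))
```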