The following is quoted from RFC 3960, which describes the situations in which media clipping can occur.
General Reason:
Media clipping occurs when the user (or the machine generating media)
believes that the media session is already established, but the
establishment process has not finished yet. The user starts speaking
(i.e., generating media) and the first few syllables or even the
first few words are lost.
1:
When the offer/answer exchange takes place in the 200 (OK) response
and in the ACK, media clipping is unavoidable. The called user
starts speaking at the same time the 200 (OK) is sent, but the UAS
cannot send any media until the answer from the User Agent Client
(UAC) arrives in the ACK.
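The timing in case 1 can be sketched as a small calculation. This is only an illustrative model, not something taken from the RFC: the function name, its parameters, and the assumption that the callee speaks continuously from `speech_start_ms` are all hypothetical.

```python
def delayed_offer_clipping_ms(speech_start_ms: float, ack_arrival_ms: float) -> float:
    """Estimate how many milliseconds of callee speech are lost.

    Assumed model: the offer travels in the 200 (OK) and the answer in the ACK,
    so the UAS may only start sending media once the ACK has arrived.
    """
    # Speech produced before the ACK arrives cannot be sent and is clipped.
    return max(0.0, ack_arrival_ms - speech_start_ms)


# Callee starts speaking when the 200 (OK) is sent (t = 0 ms) and the ACK
# arrives one round trip later (t = 120 ms): the first 120 ms are lost.
lost = delayed_offer_clipping_ms(speech_start_ms=0.0, ack_arrival_ms=120.0)  # 120.0
```

Under this model the clipped interval is roughly one signalling round trip, which is why the text calls clipping unavoidable here.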
2:
Another form of media clipping (not related to early media either)
occurs in the caller-to-callee direction. When the callee picks up
and starts speaking, the UAS sends a 200 (OK) response with an
answer, in parallel with the first media packets. If the first media
packets arrive at the UAC before the answer and the caller starts
speaking, the UAC cannot send media until the 200 (OK) response from
the UAS arrives.
In this case, media clipping rarely happens.
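A rough sketch of case 2 may help show why clipping is uncommon in practice. The model and all names below are assumptions made for illustration: the caller is taken to start speaking a fixed `reaction_ms` after hearing the callee's first media packet, and the UAC is taken to be able to send as soon as the 200 (OK) arrives.

```python
def caller_clipping_ms(first_rtp_arrival_ms: float,
                       ok_arrival_ms: float,
                       reaction_ms: float) -> float:
    """Estimate how many milliseconds of caller speech are lost.

    Assumed model: the caller hears the callee when the first RTP packet
    arrives, starts talking reaction_ms later, and the UAC cannot send
    media before the 200 (OK) with the answer has arrived.
    """
    caller_speech_start_ms = first_rtp_arrival_ms + reaction_ms
    return max(0.0, ok_arrival_ms - caller_speech_start_ms)


# The 200 (OK) arrives 40 ms after the first media packet.
instant = caller_clipping_ms(50.0, 90.0, reaction_ms=0.0)    # 40.0
typical = caller_clipping_ms(50.0, 90.0, reaction_ms=200.0)  # 0.0
```

With these illustrative numbers, clipping only appears when the caller reacts faster than the gap between the first media packet and the 200 (OK), which is consistent with the observation that this form of clipping is rare.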
On the other hand, media clipping does not appear in the most common
offer/answer exchange (an INVITE with an offer and a 200 (OK) with an
answer). UACs are ready to play incoming media packets as soon as
they send an offer, because they cannot count on the reception of the
200 (OK) to start playing out media for the caller; SIP signalling
and media packets typically traverse different paths, and so, media
packets may arrive before the 200 (OK) response.
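The behaviour described in the quoted passages can be modelled as a minimal state sketch for the common INVITE-with-offer case. The class and its method names are hypothetical and do not correspond to any real SIP stack; they only encode two rules from the text: the UAC can play incoming media once it has sent its offer, and it can send media only after the answer has arrived.

```python
from dataclasses import dataclass


@dataclass
class UacMediaState:
    """Toy model of a UAC's media readiness (illustrative only)."""
    offer_sent: bool = False
    answer_received: bool = False

    def send_invite_with_offer(self) -> None:
        self.offer_sent = True

    def receive_200_ok_with_answer(self) -> None:
        self.answer_received = True

    @property
    def can_play_incoming(self) -> bool:
        # Media may arrive before the 200 (OK), so the UAC listens
        # as soon as the offer is out.
        return self.offer_sent

    @property
    def can_send_outgoing(self) -> bool:
        # Sending requires the answer carried in the 200 (OK).
        return self.answer_received


uac = UacMediaState()
uac.send_invite_with_offer()
early = (uac.can_play_incoming, uac.can_send_outgoing)   # (True, False)
uac.receive_200_ok_with_answer()
later = (uac.can_play_incoming, uac.can_send_outgoing)   # (True, True)
```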
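In summary, there are two common cases of media clipping.
The first is when the INVITE message carries no SDP; in this case media clipping is unavoidable.
The second occurs when the 200 (OK) response reaches the caller later than the first RTP packets sent by the callee.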