Question:
If I understand DTLS-SRTP correctly, DTLS is used to exchange keys and then the endpoints switch to SRTP for encryption. What is the benefit of this setup versus just sending RTP over DTLS? Is it just about compatibility with existing SRTP stacks?
Answer:
It's all about encryption overhead; how much the extra data the encryption method extends the packet by.
DTLS has a noticeable amount of overhead; the DTLS header alone is 13 bytes, and then you have the IV/nonce, and the tag; this overhead can be more than the actual VoIP payload. In contrast, SRTP was specifically designed to minimize this overhead; except for the tag (which is optional; IMHO, bad idea to omit it, but some people insisted), there is no overhead compared to RTP.
You might ask "what's the big deal about encryption overhead? Doesn't the internet not care that much about packet sizes?" Well, yes, if you're talking about wired internet connections, actually, this overhead might not be that significant. However, for wireless, yes, people do worry about it, because:
Because of power; the more bytes you have, the more bytes need to be transmitted (and if you're on a battery, well, that's a concern)
Because wireless is a shared media, so the more bytes you broadcast, that's less bandwidth everyone else connecting to the same AP gets.