These two articles say that “fancy” classification features like packet interarrival times are coming to commercial firewalls. Cisco calls it Encrypted Traffic Analytics (ETA). It is based on some published research and a free software prototype.
https://www.theregister.co.uk/2017/06/22/ciscos_encrypted_traffic_fingerprinting_turned_into_product/ (https://web.archive.org/web/20170624235303/https://www.theregister.co.uk/2017/06/22/ciscos_encrypted_traffic_fingerprinting_turned_into_product/)
That academic work, or something an awful lot like it, has now appeared in a product as Encrypted Traffic Analysis (ETA), which watches three characteristics that Cisco says provide enough information to spot malware. Those three factors are:
- The initial data packet in the connection;
- The sequence of packet lengths and times, which Cisco’s post
says “offers vital clues into traffic contents beyond the
beginning of the encrypted flow”; and
- Byte distribution across packet payloads within a flow, a
detection process that improves over time, because it helps
build machine learning models.
https://newsroom.cisco.com/feature-content?type=webcontent&articleId=1853370 (https://web.archive.org/web/20170624234826/https://newsroom.cisco.com/feature-content?type=webcontent&articleId=1853370)
The resulting technique, called Encrypted Traffic Analytics (ETA), involves looking for telltale signs in three features of encrypted data. The first is the initial data packet of the connection. This by itself may contain valuable data about the rest of the content. Then there is the sequence of packet
lengths and times, which offers vital clues into traffic contents beyond the beginning of the encrypted flow. Finally, ETA checks the byte distribution across the payloads of the packets within the flow being analyzed. Since this network-based detection process is aided by machine learning, its efficacy improves over time.
This research from November 2016 seems the most directly related:
“Enhanced telemetry for encrypted threat analytics”
David McGrew and Blake Anderson
http://gen.lib.rus.ec/scimag/?s=10.1109%2FICNP.2016.7785325
In our implementation, we keep two arrays per flow: an array of
sizes (in bytes) of the packets, and an array of times representing the number of milliseconds since the previous packet was observed.
To represent the byte distribution of a flow, we use a length-256 array of counters, 1 counter per byte value… We export the Shannon entropy of the flow computed over the full byte distribution. Also, we compute the streaming mean/variance of the byte distribution…
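As a rough illustration of the per-flow state described in that excerpt, here is a minimal Python sketch. The class name `FlowStats` and its methods are hypothetical and are not taken from Joy or the paper. Recording a gap of 0 for the first packet is an assumption, and the streaming mean/variance is left out.

```python
import math


class FlowStats:
    """Hypothetical per-flow state: packet sizes, inter-arrival gaps, byte counts."""

    def __init__(self):
        self.sizes = []               # payload size of each packet, in bytes
        self.gaps_ms = []             # milliseconds since the previous packet
        self.byte_counts = [0] * 256  # one counter per byte value
        self._last_ts = None

    def add_packet(self, timestamp_s, payload):
        # Assumption: the first packet of a flow gets a gap of 0 ms.
        if self._last_ts is None:
            gap = 0
        else:
            gap = round((timestamp_s - self._last_ts) * 1000)
        self._last_ts = timestamp_s
        self.sizes.append(len(payload))
        self.gaps_ms.append(gap)
        for b in payload:
            self.byte_counts[b] += 1

    def entropy(self):
        """Shannon entropy (bits per byte) over the full byte distribution."""
        total = sum(self.byte_counts)
        if total == 0:
            return 0.0
        return -sum((c / total) * math.log2(c / total)
                    for c in self.byte_counts if c)
```

Well-encrypted payloads should come out close to 8 bits per byte, while plaintext protocols usually land noticeably lower, which is presumably why the entropy is worth exporting.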
They mention a classification tool called Joy, which seems to have been
a precursor/testbed for ETA.
https://github.com/cisco/joy
Relation to Cisco ETA
Joy has helped support the research that paved the way for Cisco’s Encrypted Traffic Analytics (ETA), but it is not directly integrated into any of the Cisco products or services that implement ETA. The classifiers in Joy were trained on a small dataset several years ago, and do not represent the classification methods or performance of ETA.
This paper from July 2016 is mainly about TLS features but also mentions
flow features.
“Deciphering Malware’s use of TLS (without Decryption)”
Blake Anderson, Subharthi Paul, David McGrew
https://arxiv.org/abs/1607.01639
We used an open source project to collect the data and transform
it to a JSON format that contained the typical network 5-tuple,
the sequence of packet lengths and interarrival times, the byte
distribution, and the unencrypted TLS handshake information
The machine learning classifiers are built using
traditional flow features, traditional “side-channel” features,
and features collected from the unencrypted TLS handshake
messages.
- Flow Metadata: These features include the number of inbound
bytes, outbound bytes, inbound packets, outbound packets; the
source and destination ports; and the total duration of the flow
in seconds.
- Sequence of Packet Lengths and Times: In our open source
implementation, the SPLT elements are collected for the first 50
packets of a flow. Zero-length payloads (such as ACKs) and
retransmissions are ignored.
- Byte Distribution: The byte distribution is a length-256
array that keeps a count for each byte value encountered in the
payloads of the packets for each packet in the flow.
- Unencrypted TLS Header Information: The TLS version, the
ordered list of offered ciphersuites, and the list of supported
TLS extensions are collected from the client hello message. The
selected ciphersuite and selected TLS extensions are collected
from the server hello message. The server’s certificate is
collected from the certificate message. The client’s public key
length is collected from the client key exchange message, and is
the length of the RSA ciphertext or DH/ECDH public key,
depending on the ciphersuite. Similar to the sequence of packet
lengths and times, the sequence of record lengths, times, and
types is collected from TLS sessions.
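The SPLT collection rule in that list (first 50 packets, skipping zero-length payloads and retransmissions) could be sketched roughly as follows. The function name, the `(timestamp, seq, length)` tuple layout, and the "already seen this sequence number" test for retransmissions are all assumptions made for illustration; the real tool may track TCP state differently.

```python
MAX_SPLT = 50  # the paper collects SPLT elements for the first 50 packets


def collect_splt(packets, limit=MAX_SPLT):
    """Return a list of (payload_length, gap_ms) pairs for one direction of a flow.

    packets: iterable of (timestamp_s, tcp_seq, payload_len) tuples
             (a hypothetical layout, not the tool's real input format).
    """
    seen_seqs = set()
    splt = []
    last_ts = None
    for ts, seq, length in packets:
        if length == 0:
            continue  # zero-length payloads such as bare ACKs are ignored
        if seq in seen_seqs:
            continue  # simplification: a repeated sequence number is a retransmission
        seen_seqs.add(seq)
        # Assumption: gaps are measured between kept packets; the first gap is 0.
        gap_ms = 0 if last_ts is None else round((ts - last_ts) * 1000)
        last_ts = ts
        splt.append((length, gap_ms))
        if len(splt) >= limit:
            break
    return splt
```

Measuring the gap between kept packets rather than between all packets is a design choice here; the excerpt does not say which one the authors use.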
This paper from October 2016 is kind of a synthesis of the above papers,
combining them with plaintext and DNS contextual features.
“Identifying Encrypted Malware Traffic with Contextual Flow Data”
Blake Anderson and David McGrew
http://gen.lib.rus.ec/scimag/?s=10.1145%2F2996758.2996768
Features based on observable metadata, such as the sequence of
packet lengths and inter-arrival times, were used, and were
modeled as Markov chains. We exclude TCP retransmissions from
our data by having our tool track the TCP sequence number. The
packet lengths were taken to be the sizes of the UDP, TCP, or
ICMP packet payloads. If the packet was not one of those three
types, then the length was set to the size of the IP packet. The
inter-arrival times had a millisecond resolution.
For both the lengths and times, the values were discretized into
equally sized bins. The length data Markov chain had 10 bins of
150 bytes each. A 1500 byte MTU was assumed, and any packets
observed with a size greater than 1350 bytes were put into the
same bin. The timing data Markov chain used 50 millisecond bins
and 10 bins for 100 total features. Any inter-packet time
greater than 450ms fell into the same bin.
Another form of observable metadata, the byte distribution, was
represented as a length-256 array that keeps a count for each
byte value encountered in the payloads of the packets of the
flow being analyzed.
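To make the binning concrete, here is a small Python sketch of the discretization and the resulting 10 × 10 transition matrix (100 features) for either lengths or times. The function names are hypothetical, and normalizing each row into transition probabilities is an assumption; the excerpt only says that the sequences were modeled as Markov chains.

```python
def bin_index(value, bin_width, n_bins=10):
    """Map a length (bytes) or time (ms) to a bin; the last bin absorbs large values."""
    return min(int(value // bin_width), n_bins - 1)


def markov_features(values, bin_width, n_bins=10):
    """Flattened transition matrix (n_bins * n_bins features) for one sequence."""
    counts = [[0] * n_bins for _ in range(n_bins)]
    bins = [bin_index(v, bin_width, n_bins) for v in values]
    # Count transitions between the bins of consecutive packets.
    for a, b in zip(bins, bins[1:]):
        counts[a][b] += 1
    features = []
    for row in counts:
        total = sum(row)
        # Assumption: rows are normalized to probabilities; empty rows stay at 0.
        features.extend((c / total if total else 0.0) for c in row)
    return features


# Lengths use 150-byte bins, times use 50 ms bins, as in the excerpt.
length_features = markov_features([100, 1400, 1400, 100], bin_width=150)
time_features = markov_features([0, 20, 480, 30], bin_width=50)
```

Each call yields 100 features, so a flow contributes 200 features from lengths and times together before the byte distribution is added.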
I recently wrote about the extent to which packet sizes and timing can
theoretically be obfuscated in obfs4.
https://people.torproject.org/~dcf/obfs4-timing/
https://lists.torproject.org/pipermail/tor-dev/2017-June/012310.html
I tried to make a constant bitrate mode that sends 500 bytes
every 100 ms, in which the size and timing of obfs4 packets is
independent of the TLS packets underneath. It turns out that
obfs4 mostly makes this possible, with one exception: a gap in
the client traffic while the client waits for the server’s
handshake response, during which time the client cannot send
anything because it doesn’t yet know the shared key.
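The idea behind a constant bitrate mode can be sketched as a toy scheduler: every 100 ms it emits exactly 500 bytes, filling with padding whenever there is not enough queued data. This is only a model of the send schedule under simplifying assumptions, not obfs4 code. The function name and the `arrivals` input format are hypothetical, and framing overhead and the handshake gap are not modeled.

```python
FRAME_SIZE = 500   # bytes sent per interval, from the post
INTERVAL_MS = 100  # send interval in milliseconds, from the post


def cbr_schedule(arrivals, n_ticks, frame_size=FRAME_SIZE, interval_ms=INTERVAL_MS):
    """Simulate a constant-bitrate sender.

    arrivals: dict mapping tick index -> number of application bytes that
              become available at that tick (hypothetical input format).
    Returns one (send_time_ms, data_len, padding_len) tuple per tick.
    """
    backlog = 0
    frames = []
    for tick in range(n_ticks):
        backlog += arrivals.get(tick, 0)
        data = min(backlog, frame_size)  # send as much real data as fits
        backlog -= data
        # Pad the rest so every frame has the same size on the wire.
        frames.append((tick * interval_ms, data, frame_size - data))
    return frames


frames = cbr_schedule({0: 1200, 3: 50}, n_ticks=5)
```

In this model an observer sees the same size and timing no matter when the underlying data arrives; the cost is added latency when the backlog grows and wasted bandwidth when it is empty.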