【Paper Reading Note】Brief Introduction of Encrypted Traffic Classification Using PEAN

Brief Introduction of Encrypted Traffic Classification

[Data Preprocessing-Oriented] [Essay Reading & Understanding Record] [2023.4] [Author : LWC]

Core

PEAN (Packet-level End-to-end Attentive Network) : A novel “multimodal deep learning framework” for encrypted traffic classification (abbreviated EFC in this note).

※ Mechanism of PEAN

  • INPUT : raw bytes & length sequence of the packets.
  • OUTPUT : traffic classification result.
  • Self-attention mechanism : better at learning the inter-relationships between network packets.
  • Unsupervised pre-training : enhances the model's representation ability.
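
As a rough illustration of the two inputs, here is a minimal sketch that turns the packets of one flow into a fixed-shape raw-byte matrix and a length sequence. It assumes each packet is available as a `bytes` object; the function name `build_inputs` and the limits `max_packets` and `max_bytes` are hypothetical values chosen for the example, not taken from the PEAN code.

```python
from typing import List, Tuple


def build_inputs(
    packets: List[bytes],
    max_packets: int = 10,   # hypothetical: how many packets per flow to keep
    max_bytes: int = 400,    # hypothetical: how many bytes per packet to keep
) -> Tuple[List[List[int]], List[int]]:
    """Return (raw-byte matrix, length sequence) for the first packets of a flow."""
    selected = packets[:max_packets]

    # Length sequence: original size of each packet, before truncation.
    lengths = [len(p) for p in selected]

    # Raw bytes: truncate each packet to max_bytes, then zero-pad to max_bytes.
    raw = []
    for p in selected:
        head = list(p[:max_bytes])
        raw.append(head + [0] * (max_bytes - len(head)))

    # Pad missing packets so that every flow has the same shape.
    while len(raw) < max_packets:
        raw.append([0] * max_bytes)
        lengths.append(0)

    return raw, lengths
```

Both outputs always have `max_packets` rows, so flows of different sizes can be batched together.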

Motivation and Challenges

Nowadays, people are more concerned about “data security”, which has led to the emergence of encrypted protocols (such as SSL/TLS). As a result, traditional payload-based classification methods such as DPI no longer work once packets are encrypted.

Machine learning is a promising alternative, but conventional approaches rely on “hand-designed flow features”, which may ignore important packet details and therefore classify poorly. (PEAN may offer a better way to obtain flow features.)

Challenges of EFC are as follows :

  • Encrypted packets cannot be classified from their content.
  • No effective way to integrate the different kinds of information (such as IP and TCP headers).

Basic Knowledge

SSL/TLS handshake process

[Figure: SSL/TLS handshake process]

It should be emphasized that not every captured flow contains the complete set of “handshake packets”. There are therefore three cases that our capture system needs to handle :

  • Fully Complete Handshake Packet.
  • Partially Complete Handshake Packet.
  • No Handshake Packet.

Preprocessing of Input Traffic Data [Important]

This part is what we actually need to do in this project: PEAN itself is ready-made and open source on GitHub, so all we need to do is “preprocess” the data that goes into it.

Traffic Data Structure

First, let us analyse what the input traffic data looks like.
$$R = [P^1, P^2, \dots, P^n]$$
R represents the “network traffic” set. In this set, every P represents one “packet”, which can in turn be expressed as a triple:
$$P^i = (X^i, B^i, T^i)$$
B represents the “byte content”, T represents the “start time”, and X is a 5-tuple which contains the source, destination and protocol features. They can be expanded as :
$$B^i = [byte^1, byte^2, \dots, byte^q], \quad \text{0x00} \le byte \le \text{0xff}$$

$$T^i > 0$$

$$X^i = \langle SrcIP^i, DstIP^i, SrcPort^i, DstPort^i, Protocol^i \rangle$$

R can also be expressed in terms of “flows”. A “flow” f is the set of packets P that share the same X. One flow contains many packets, so the l-th flow can be written as:
$$f_l = [P_l^1, P_l^2, \dots, P_l^m]$$
Since every packet belongs to exactly one flow, the flows together contain all packets, and R can thus be expressed as :
$$R = [f_1, f_2, \dots, f_k]$$
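
The structure above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Packet` class, the `FiveTuple` alias and the `group_into_flows` function are hypothetical names introduced here, and packets are ordered by start time inside each flow, which the note does not state explicitly.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (SrcIP, DstIP, SrcPort, DstPort, Protocol)
FiveTuple = Tuple[str, str, int, int, str]


@dataclass
class Packet:
    x: FiveTuple  # X^i: the 5-tuple
    b: bytes      # B^i: byte content, each byte in 0x00..0xff
    t: float      # T^i: start time, > 0


def group_into_flows(traffic: List[Packet]) -> Dict[FiveTuple, List[Packet]]:
    """Split R = [P^1, ..., P^n] into flows keyed by the shared 5-tuple X."""
    flows: Dict[FiveTuple, List[Packet]] = defaultdict(list)
    # Assumption: packets inside a flow are kept in order of start time.
    for packet in sorted(traffic, key=lambda p: p.t):
        flows[packet.x].append(packet)
    return dict(flows)
```

Every packet lands in exactly one flow, so the sizes of all flows add up to n.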

Preprocessing procedure

The preprocessing procedure is as follows :

  • Bi-directional Flow Extraction : Put packets with the same 5-tuple into the same group. [Tool : SplitCap]
  • TLS Traffic Filtering : Keep only TLS-encrypted traffic and filter out all other types. [Tool : tshark]
  • Traffic Type Selection : Use only 19 kinds of mainstream traffic.
  • Labeling : Use DNS records & the TLS Server Name Indication (SNI) to label each network flow.
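
Two of these steps can be illustrated in plain Python. This is a hedged sketch, not how SplitCap or tshark work internally: `bidirectional_key` shows one common way to make both directions of a connection share a key, and `label_from_sni` shows suffix matching of an SNI host name against a label table. Both function names and the `suffix_to_label` table are hypothetical.

```python
from typing import Dict, Optional, Tuple

# (SrcIP, DstIP, SrcPort, DstPort, Protocol)
FiveTuple = Tuple[str, str, int, int, str]


def bidirectional_key(x: FiveTuple) -> FiveTuple:
    """Return the same key for A->B and B->A packets of one connection."""
    src_ip, dst_ip, src_port, dst_port, proto = x
    # Order the two endpoints deterministically so that direction is ignored.
    low, high = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return (low[0], high[0], low[1], high[1], proto)


def label_from_sni(sni: Optional[str], suffix_to_label: Dict[str, str]) -> Optional[str]:
    """Map a TLS SNI host name to a traffic label, or None if nothing matches."""
    if not sni:
        return None  # e.g. the flow has no handshake packets
    host = sni.lower().rstrip(".")
    for suffix, label in suffix_to_label.items():
        # Match the domain itself or any of its subdomains.
        if host == suffix or host.endswith("." + suffix):
            return label
    return None
```

Flows for which `label_from_sni` returns `None` would need another source of labels, such as the DNS records mentioned in the list.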

Overall Architecture of PEAN [Not Emphasis]

PEAN is open source, but we still need to learn a little about its architecture.
[Figure: overall architecture of PEAN]

Evaluation Metrics

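First we need to learn some terminology used when evaluating a neural network. TP, TN, FP and FN stand for true positives, true negatives, false positives and false negatives:
[Figure: definitions of TP, TN, FP and FN]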

Here are the evaluation metrics we need to calculate in order to assess PEAN.

  • Accuracy : The proportion of “correctly classified samples” to “all samples”.

$$Accuracy = \frac{\sum_{i=1}^N TP_i}{\sum_{i=1}^N (TP_i + FN_i)}$$

  • TPR-avg : The average over all N categories of the per-category “True Positive Rate” (TPR), which is the same quantity as recall.

$$TPR_i = \frac{TP_i}{TP_i + FN_i} = Recall_i$$

  • FPR-avg : The average over all N categories of the per-category “False Positive Rate” (FPR).

$$FPR_i = \frac{FP_i}{FP_i + TN_i}$$

  • F1_macro : Average F1 value of all categories, where the F1 value of category i combines its precision and its recall.
    $$F1_{macro} = \frac{1}{N} \sum_{i=1}^N F1_i = \frac{1}{N} \sum_{i=1}^N 2 \times \frac{Precision_i \times Recall_i}{Precision_i + Recall_i}, \quad Precision_i = \frac{TP_i}{TP_i + FP_i}, \quad Recall_i = \frac{TP_i}{TP_i + FN_i}$$

  • FTF : A weighted sum over all categories that combines the per-category TPR and FPR.

$$FTF = \sum_{i=1}^N w_i \frac{TPR_i}{1 + FPR_i}$$
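
To make the metrics concrete, here is a minimal sketch that computes all of them from a multi-class confusion matrix. It rests on several assumptions: accuracy follows the verbal definition (correctly classified samples over all samples), F1 uses the standard per-category precision and recall, and the weight w_i is taken to be the share of samples that belong to category i, because this note does not define it. The function name `evaluate` is hypothetical.

```python
from typing import Dict, List


def evaluate(cm: List[List[int]]) -> Dict[str, float]:
    """cm[i][j] = number of samples of true category i predicted as category j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tpr, fpr, f1, weights = [], [], [], []

    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                        # category i, predicted as other
        fp = sum(cm[r][i] for r in range(n)) - tp   # other, predicted as category i
        tn = total - tp - fn - fp

        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0

        tpr.append(recall)
        fpr.append(fp / (fp + tn) if fp + tn else 0.0)
        f1.append(2 * precision * recall / (precision + recall)
                  if precision + recall else 0.0)
        # ASSUMPTION: w_i is the fraction of all samples that belong to category i.
        weights.append((tp + fn) / total)

    return {
        "accuracy": sum(cm[i][i] for i in range(n)) / total,
        "tpr_avg": sum(tpr) / n,
        "fpr_avg": sum(fpr) / n,
        "f1_macro": sum(f1) / n,
        "ftf": sum(w * t / (1 + f) for w, t, f in zip(weights, tpr, fpr)),
    }
```

For example, `evaluate([[8, 2], [1, 9]])` describes two categories with 10 samples each, of which 17 out of 20 are classified correctly.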
