【Paper Reading Note】Brief Introduction of Encrypted Traffic Classification Using PEAN

GalaxyerKw

Last modified: 2023-04-16 14:48:36

Category: Paper Reading Note. Tags: convolutional neural network, neural network, traffic identification

First published: 2023-04-16 14:46:04

Article link: https://blog.csdn.net/GalaxyerKw/article/details/130182445

Brief Introduction of Encrypted Traffic Classification

[Data Preprocessing-Oriented] [Essay Reading & Understanding Record] [2023.4] [Author : LWC]

Core

PEAN (Packet-level End-to-end Attentive Network) : A novel “multimodal deep learning framework” for encrypted traffic classification (abbreviated EFC in this note).

※ Mechanism of PEAN:

INPUT : the raw bytes and the length sequence of the packets in a flow.
OUTPUT : the traffic classification result.
Self-attention mechanism : learns the inter-relationships between network packets better.
Unsupervised pre-training : enhances the model's representation ability.
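
As a rough illustration of the two inputs listed above, the sketch below turns the packets of one flow into a fixed-size byte matrix and a length sequence. The function name `flow_to_inputs` and the limits `n_packets` and `n_bytes` are hypothetical choices made for this note, not values taken from PEAN.

```python
from typing import List, Tuple


def flow_to_inputs(
    packets: List[bytes], n_packets: int = 10, n_bytes: int = 400
) -> Tuple[List[List[int]], List[int]]:
    """Build (raw-byte matrix, length sequence) from the first packets of a flow.

    n_packets and n_bytes are assumed limits, chosen only for illustration.
    """
    byte_rows: List[List[int]] = []
    lengths: List[int] = []
    for payload in packets[:n_packets]:
        row = list(payload[:n_bytes])        # truncate long packets
        row += [0] * (n_bytes - len(row))    # zero-pad short packets
        byte_rows.append(row)
        lengths.append(len(payload))         # original length, before truncation
    # Pad flows that contain fewer than n_packets packets.
    while len(byte_rows) < n_packets:
        byte_rows.append([0] * n_bytes)
        lengths.append(0)
    return byte_rows, lengths


byte_rows, lengths = flow_to_inputs(
    [b"\x16\x03\x01", b"\x17\x03\x03\x00\x20"], n_packets=3, n_bytes=4
)
```

Fixed sizes are used here because a neural network normally expects inputs of a constant shape; how PEAN itself truncates and pads should be checked against its repository.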

Motivation and Challenges

Nowadays, people are more concerned about “data security”, which has led to the emergence of encrypted protocols (such as SSL/TLS). As a result, traditional payload-based classification methods and deep packet inspection (DPI) no longer work once packets are encrypted.

Machine learning is a promising alternative, but classical approaches rely on “hand-designed flow features”, which may ignore important packet-level details and therefore classify poorly. (PEAN aims to obtain flow features in a better way, by learning them from the packets directly.)

The challenges of EFC are as follows:

Encrypted packets cannot be classified from their content.
There is no effective way to integrate all kinds of information (such as IP and TCP headers).

Basic Knowledge

SSL/TLS handshake process

[Figure: SSL/TLS handshake process]

It should be stressed that not every captured flow contains the complete set of “handshake packets”. There are therefore three cases that our capture system needs to handle:

Fully Complete Handshake Packet.
Partially Complete Handshake Packet.
No Handshake Packet.
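
The three cases above could be told apart roughly as follows. This is only a sketch: the rule that a handshake counts as “complete” once both a ClientHello and a ServerHello are seen is an assumption of this note, and `handshake_category` is a hypothetical helper. The numeric codes 1 and 2 are the TLS handshake message types for ClientHello and ServerHello.

```python
from typing import Iterable

# TLS handshake message type codes.
CLIENT_HELLO = 1
SERVER_HELLO = 2

# Assumption of this sketch: these two messages define a "complete" handshake.
REQUIRED = {CLIENT_HELLO, SERVER_HELLO}


def handshake_category(observed_types: Iterable[int]) -> str:
    """Classify a flow by which required handshake messages were captured."""
    seen = set(observed_types) & REQUIRED
    if seen == REQUIRED:
        return "full"
    if seen:
        return "partial"
    return "none"
```

For example, a flow whose capture started after the ClientHello would contain only type 2 and fall into the “partial” case.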

Preprocessing of Input Traffic Data [Important]

This part is what we actually need to do in this project: PEAN is ready-made and open source on GitHub, so all we need to do is “preprocess” the data that goes into it.

Traffic Data Structure

First, let us look at what the input traffic data looks like.
$R = [P^1, P^2, \ldots, P^n]$
R represents the set of captured “network traffic”. Each P in this set is one “packet”, which can be written as a triple:
$P^i = (X^i, B^i, T^i)$
B is the “byte content”, T is the “start time”, and X is a 5-tuple holding the source, destination and protocol fields. They expand as:
$B^i = [byte^1, byte^2, \ldots, byte^q], \quad \mathrm{0x00} \le byte^j \le \mathrm{0xff}$

$T^i > 0$

$X^i = \langle SrcIP^i, DstIP^i, SrcPort^i, DstPort^i, Protocol^i \rangle$

R can also be expressed in terms of “flows”. A “flow” f is the set of packets P that share the same X. One flow contains many packets, so the l-th flow can be written as:
$f_{l} = [P_{l}^1, P_{l}^2, \ldots, P_{l}^m]$
Since every packet belongs to exactly one flow, R can also be written as:
$R = [f_{1}, f_{2}, \ldots, f_{k}]$
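
The definitions above map naturally onto a small data structure. The sketch below is an illustration only: the names `Packet`, `flow_key` and `group_flows` are hypothetical, and the direction-independent key is an assumption made so that both directions of a connection land in the same flow, in the spirit of bi-directional flow extraction.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

FiveTuple = Tuple[str, str, int, int, str]


@dataclass
class Packet:
    x: FiveTuple  # X^i: (SrcIP, DstIP, SrcPort, DstPort, Protocol)
    b: bytes      # B^i: byte content
    t: float      # T^i: start time, > 0


def flow_key(x: FiveTuple) -> FiveTuple:
    """Return a key that is the same for both directions of a connection."""
    src_ip, dst_ip, src_port, dst_port, proto = x
    first, second = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return (first[0], second[0], first[1], second[1], proto)


def group_flows(r: List[Packet]) -> Dict[FiveTuple, List[Packet]]:
    """Split the traffic R into flows f_1..f_k, each ordered by start time."""
    flows: Dict[FiveTuple, List[Packet]] = defaultdict(list)
    for p in r:
        flows[flow_key(p.x)].append(p)
    for packets in flows.values():
        packets.sort(key=lambda p: p.t)
    return dict(flows)


r = [
    Packet(("10.0.0.1", "1.1.1.1", 5000, 443, "TCP"), b"\x16", 2.0),
    Packet(("1.1.1.1", "10.0.0.1", 443, 5000, "TCP"), b"\x16", 1.0),
    Packet(("10.0.0.2", "1.1.1.1", 5001, 443, "TCP"), b"\x17", 3.0),
]
flows = group_flows(r)
```

Here the first two packets travel in opposite directions of one connection and end up in one flow, while the third packet forms a flow of its own.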

Preprocessing procedure

The preprocessing procedure is as follows:

Bi-directional Flow Extraction : Put packets with the same 5-tuple into the same group. [Tool : SplitCap]
TLS Traffic Filtering : Keep only TLS-encrypted traffic and filter out all other types. [Tool : tshark]
Traffic Type Selection : Use only 19 kinds of mainstream traffic.
Labeling : Use DNS records and the TLS Server Name Indication (SNI) to label each network flow.
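
The last two steps (labeling and type selection) can be sketched in a few lines. Everything named here is hypothetical: the `SNI_LABELS` table, its domains and the helper functions are placeholders for illustration, and a real table would have to be built from the dataset's own DNS records and SNI values.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping from domain suffix to traffic label.
SNI_LABELS: Dict[str, str] = {
    "example-video.com": "video",
    "example-mail.com": "mail",
}


def label_from_sni(sni: Optional[str]) -> Optional[str]:
    """Return the label whose domain suffix matches the SNI, or None."""
    if not sni:
        return None
    host = sni.lower().rstrip(".")
    for suffix, label in SNI_LABELS.items():
        if host == suffix or host.endswith("." + suffix):
            return label
    return None


def select_top_classes(
    labeled: List[Tuple[str, str]], k: int
) -> List[Tuple[str, str]]:
    """Keep only (flow_id, label) pairs whose label is among the k most frequent."""
    counts = Counter(label for _, label in labeled)
    keep = {label for label, _ in counts.most_common(k)}
    return [(flow_id, label) for flow_id, label in labeled if label in keep]
```

With the 19 traffic types mentioned above, the call would be `select_top_classes(labeled, 19)`. Flows for which `label_from_sni` returns `None` would simply be left out of `labeled`.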

Overall Architecture of PEAN [Not Emphasis]

PEAN is open source, but we still need to learn a little about its architecture.
[Figure: overall architecture of PEAN]

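Evaluation Metrics

First we need some terminology for evaluating a classifier. For a class i, TP (true positives), TN (true negatives), FP (false positives) and FN (false negatives) are defined as follows:
[Figure: definitions of TP, TN, FP and FN]

Here are the evaluation metrics we need to calculate in order to assess PEAN.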

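Accuracy : The proportion of “correctly classified samples” to “all samples”, where M is the total number of samples.

$Accuracy = \frac{1}{M} \sum_{i=1}^N TP_{i}$

Precision : The proportion of samples predicted as class i that really belong to class i. It is needed for the F1 score.

$Precision_{i} = \frac{TP_i}{TP_{i} + FP_{i}}$

TPR-avg : TPR is short for “True Positive Rate”, also called recall. TPR-avg is its mean over all N classes.

$TPR_{i} = \frac{TP_i}{TP_{i} + FN_{i}} = Recall_{i}$

FPR-avg : FPR is short for “False Positive Rate”. FPR-avg is its mean over all N classes.

$FPR_{i} = \frac{FP_i}{FP_{i} + TN_{i}}$

F1_macro : Average F1 value of all categories.
$F1_{macro} = \frac{1}{N} \sum_{i=1}^N F1_i = \frac{1}{N} \sum_{i=1}^N 2 \times \frac{Precision_{i} \times Recall_{i}}{Precision_{i} + Recall_{i}}$
FTF : Can be calculated from TPR and FPR, with $w_{i}$ the weight of class i.

$FTF = \sum_{i=1}^N w_{i} \frac{TPR_{i}}{1 + FPR_{i}}$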
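
To make the metrics concrete, here is a minimal sketch that computes them from a confusion matrix in plain Python. The function name `evaluate` is hypothetical. Per-class precision is $TP_i / (TP_i + FP_i)$ and recall equals $TPR_i$. The weight $w_i$ is assumed here to be the share of samples that belong to class i, which the note does not state explicitly.

```python
from typing import Dict, List


def evaluate(cm: List[List[int]]) -> Dict[str, float]:
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tpr, fpr, f1, weights = [], [], [], []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp
        fp = sum(cm[r][i] for r in range(n)) - tp
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        tpr.append(recall)
        fpr.append(fp / (fp + tn) if fp + tn else 0.0)
        denom = precision + recall
        f1.append(2 * precision * recall / denom if denom else 0.0)
        weights.append(sum(cm[i]) / total)  # assumed w_i: share of class i
    return {
        "accuracy": sum(cm[i][i] for i in range(n)) / total,
        "tpr_avg": sum(tpr) / n,
        "fpr_avg": sum(fpr) / n,
        "f1_macro": sum(f1) / n,
        "ftf": sum(w * t / (1 + f) for w, t, f in zip(weights, tpr, fpr)),
    }


metrics = evaluate([[8, 2], [1, 9]])
```

For this two-class example, 17 of the 20 samples lie on the diagonal, so the accuracy is 0.85.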
