Android Multi-label Classification Research Based on Text Contents
WANG Yan
1
WANG Yan (1995-), male, research interest: natural language processing
ZHANG Hua
1
ZHANG Hua (1978-), female, associate professor, doctoral supervisor, main research interest: network security
CUI Dong
1
1. State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876
Abstract: In recent years, a growing number of malicious applications have appeared on mobile application platforms, often disguised as social, communication, and gaming applications. Classifying applications by function before detecting malware can improve detection accuracy. Application function classification requires a large number of high-quality samples, but category tags vary widely across app stores, so samples of the same function type cannot be obtained quickly and efficiently. This paper proposes a method that builds a multi-class classification model from the text of application descriptions and uses the category of each description to guide the classification of the application. The method collects the descriptions of an application from different app stores, predicts the category of each description with the classification model, and obtains the application category by voting. The model, which we name CRNN, is based on CNN and RNN; its F1-score is about 3% higher than that of text classification models such as textCNN and LSTM, while its training and prediction time and memory consumption are only 6% higher than those of the textCNN and LSTM models. In addition, this paper constructs a data set that can be used for application classification. The data set is labeled by classifying application descriptions, yielding each application description together with its category.
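The voting step described in the abstract can be illustrated with a minimal sketch. It assumes a simple majority vote over the categories predicted for each store's description, with ties going to the category seen first; the abstract does not specify the tie-breaking rule, so that choice is an assumption. The names `vote_category`, `classify_app`, and `keyword_predict` are hypothetical, and `keyword_predict` is only a stand-in for the trained CRNN model.

```python
from collections import Counter


def vote_category(predicted_labels):
    """Return the most frequent label; ties go to the label seen first (assumed rule)."""
    if not predicted_labels:
        raise ValueError("need at least one predicted label")
    counts = Counter(predicted_labels)
    best = max(counts.values())
    for label in predicted_labels:
        if counts[label] == best:
            return label


def classify_app(descriptions_by_store, predict):
    """Predict a category for each store's description, then vote.

    descriptions_by_store: dict mapping store name -> description text.
    predict: callable mapping description text -> category label
             (hypothetical stand-in for the description classifier).
    """
    labels = [predict(text) for text in descriptions_by_store.values()]
    return vote_category(labels)


def keyword_predict(text):
    """Toy predictor used only to make the sketch runnable; not the paper's model."""
    text = text.lower()
    if "chat" in text or "friends" in text:
        return "social"
    if "play" in text or "level" in text:
        return "game"
    return "other"


if __name__ == "__main__":
    descriptions = {
        "store_a": "Chat with friends and share photos.",
        "store_b": "Play new levels every day and chat with your team.",
        "store_c": "Stay in touch with friends around the world.",
    }
    print(classify_app(descriptions, keyword_predict))
```

Swapping `keyword_predict` for a real description classifier leaves the voting logic unchanged, which is the point of keeping the two functions separate.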