Encrypted traffic identification method oriented to category imbalance

A traffic identification and data set balancing technique for encrypted traffic identification under category imbalance. It addresses the uneven distribution of data streams, the imbalance in the number of samples per category, and the resulting decline in recognition performance.

Active Publication Date: 2020-10-23
NANJING UNIV OF INFORMATION SCI & TECH

AI Technical Summary

Problems solved by technology

However, the distribution of data streams across encrypted applications in real networks is highly uneven. For example, audio and video streams carried over encrypted protocols account for far more traffic than instant-messaging and plain web encrypted streams, and encryption protocols such as SSH and IPsec carry far fewer data streams than HTTPS.
Category imbalance in network application traffic refers to an imbalance in the number of samples per category in the data set. When trained on such data, classification algorithms may ignore the flow samples of minority categories, resulting in underfitting, or pay excessive attention to the differences among minority categories, resulting in overfitting. Either way, recognition performance is reduced.
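
The effect described above can be illustrated with a minimal sketch. The label counts (95 HTTPS flows, 5 SSH flows) and the always-predict-majority classifier are hypothetical and chosen only for illustration; they are not taken from the patent. The point is that overall accuracy can look high while the minority category is never recognized.

```python
from collections import Counter

def imbalance_ratio(labels):
    """Ratio of the largest class count to the smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

def per_class_recall(y_true, y_pred):
    """Recall per class: correctly predicted samples / true samples."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: hits[cls] / totals[cls] for cls in totals}

# Hypothetical flow labels: 95 HTTPS flows and 5 SSH flows.
y_true = ["https"] * 95 + ["ssh"] * 5
# A degenerate classifier that always predicts the majority category.
y_pred = ["https"] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = per_class_recall(y_true, y_pred)

print(imbalance_ratio(y_true))  # majority count / minority count
print(accuracy)                 # overall accuracy
print(recall)                   # recall for each category
```

Per-category recall, rather than overall accuracy, is what exposes the ignored minority category here.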



Examples


Embodiment Construction

[0040] The present invention is now described in further detail with reference to the accompanying drawings.

[0041] It should be noted that terms such as "upper", "lower", "left", "right", "front", and "rear" used in this description are only for clarity and do not limit the practicable scope of the present invention. Changes or adjustments to these relative relationships, without substantive changes to the technical content, shall also be regarded as within the practicable scope of the present invention.

[0042] The present invention provides a method for identifying encrypted traffic under category imbalance. To address the category imbalance of sample data sets, the difficulty of feature extraction, and feature redundancy, the original data set is first balanced by an improved SMOTE algorithm based on density estimation. Features commonly used in the field of network traffic recognition are then extracted, and a variational autoencoder model is used to automatically extrac...
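
As background for the balancing step, the following is a minimal sketch of plain SMOTE, which creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. It is not the improved density-estimation variant described in the patent, whose details are not given in this excerpt. The function name `smote`, its parameters, and the two-dimensional feature values are all hypothetical.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Plain SMOTE: interpolate between a minority sample and one of its
    k nearest minority neighbours to create n_new synthetic samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of base among the other minority samples
        neighbours = sorted(
            (s for s in minority if s is not base),
            key=lambda s: math.dist(base, s),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Hypothetical 2-D flow features for a minority category.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_samples = smote(minority, n_new=6)
print(len(new_samples))
```

Because every synthetic point lies on a segment between two existing minority samples, plain SMOTE never leaves the region already covered by the minority category. A density-aware variant would presumably change how base samples or neighbours are chosen, but that is an assumption about the patented method, not something this excerpt states.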



Abstract

The invention discloses an encrypted traffic identification method oriented to category imbalance. The method comprises the following steps: S1, acquiring a data set; S2, balancing the data set: processing the original experimental data set with an improved SMOTE algorithm based on density estimation; S3, data preprocessing: reading each data stream, truncating the data, and normalizing it; S4, optimizing the feature set: automatically extracting features with a variational autoencoder model, extracting features commonly used in the field of network traffic identification, and obtaining the optimal feature set with a tree-model-based feature selection method; S5, identifying traffic: feeding the optimal feature set into CGA-RF, a random forest classifier improved with a genetic algorithm, to identify the target encrypted traffic; and S6, analyzing the resulting metrics and optimizing the encrypted traffic identification method. The method achieves a high recognition rate and a low false alarm rate, and is suitable for encrypted traffic identification where the data set is category-imbalanced and features are difficult to extract.
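
Step S3 (read, truncate, normalize) can be sketched as follows. The abstract does not state the truncation length, the padding rule, or the normalization formula, so the fixed length of 8 bytes, the zero-padding of short flows, and the division by 255 are assumptions made for illustration, and `preprocess_flow` is a hypothetical helper name.

```python
def preprocess_flow(payload, length=8):
    """Truncate or zero-pad a flow's raw bytes to `length` bytes,
    then scale each byte value to the range [0, 1]."""
    # Assumption: flows shorter than `length` are padded with zero bytes.
    fixed = payload[:length].ljust(length, b"\x00")
    # Assumption: normalization divides each byte value by 255.
    return [byte / 255 for byte in fixed]

# A flow shorter than the target length is padded.
short = preprocess_flow(b"\x16\x03\x01")
# A flow longer than the target length is truncated.
long_ = preprocess_flow(bytes(range(256)))

print(len(short), len(long_))
```

Fixing the input length this way gives every flow the same dimensionality, which models such as an autoencoder require of their input.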

Description

Technical field

[0001] The invention relates to the field of encrypted traffic identification, in particular to an encrypted traffic identification method for category imbalance.

Background

[0002] With the rapid development of network technology, more and more network applications use encryption protocols to ensure the secure transmission of information, and encrypted traffic makes up a growing share of real network traffic. However, because encrypted traffic conceals its content, it often becomes a carrier for network attacks. In recent years, network security incidents have intensified, in part because network security issues have not received enough attention. Network attacks often use encrypted network traffic as a carrier to continuously attack systems and networks. Existing network attacks, mainly in the form of botnets, advanced persistent threats, and Trojan horses, often use related concealment techniques to bypass securit...


Application Information

IPC(8): H04L12/851, G06N3/12, G06K9/62
CPC: H04L47/2441, H04L47/2483, G06N3/126, G06F18/24323, G06F18/214, Y02D30/50
Inventors: 翟江涛, 吉小鹏, 崔永富, 林鹏, 石怀峰
Owner: NANJING UNIV OF INFORMATION SCI & TECH