A mobile application program identification method based on K-means clustering and a random forest algorithm

A technique combining K-means clustering and the random forest algorithm, applied in the field of information security, that reduces misjudgments, avoids misjudging interference samples, and improves accuracy.

Inactive Publication Date: 2019-05-07
NANJING UNIV OF POSTS & TELECOMM
Cites: 0; Cited by: 13

AI Technical Summary

Problems solved by technology

[0004] The technical problem to be solved by the present invention is to overcome the deficiencies of the prior art by providing a mobile application identification method based on K-means clustering and the random forest algorithm. The method filters interference samples by combining a clustering algorithm with information entropy, which avoids the misjudgment of interference samples that the classifier has not adequately learned.
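
The idea of filtering interference samples with clustering plus information entropy can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the use of Shannon entropy in bits, and the `max_entropy` threshold of 0.5 are all assumptions made for the example. It presumes each sample already has a cluster id (for example from K-means) and a known application label.

```python
import math
from collections import Counter, defaultdict

def cluster_entropy(labels):
    """Shannon entropy (in bits) of the application labels inside one cluster."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def filter_impure_clusters(cluster_ids, app_labels, max_entropy=0.5):
    """Return the indices of samples whose cluster has entropy <= max_entropy.

    max_entropy is a hypothetical threshold; lower entropy means higher purity.
    """
    by_cluster = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        by_cluster[cid].append(idx)
    kept = []
    for members in by_cluster.values():
        if cluster_entropy([app_labels[i] for i in members]) <= max_entropy:
            kept.extend(members)
    return sorted(kept)

# Cluster 0 contains a single application; cluster 1 is an even mix of two.
cluster_ids = [0, 0, 0, 1, 1, 1, 1]
app_labels = ["mail", "mail", "mail", "mail", "chat", "chat", "mail"]
kept = filter_impure_clusters(cluster_ids, app_labels)
```

Here the mixed cluster is dropped and only the samples of the pure cluster remain, which is the kind of filtered dataset a classifier would then be trained on.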



Examples


Embodiment Construction

[0040] Referring to Figure 1, this embodiment provides a mobile application identification method based on K-means clustering and the random forest algorithm. As shown in Figure 1, the method includes the following steps:

[0041] Step 1: Represent the encrypted data stream as a grouped time series

[0042] The encrypted data stream is discretized and expressed in the form of three grouped time series. The specific steps are as follows:

[0043] 1.1. Discretize continuous encrypted network traffic in units of bursts. A burst is a series of packets in which the time interval between adjacent packets is less than a certain threshold;

[0044] 1.2. Separate multiple encrypted data streams from each burst. In a burst, packets related to the same pair of quadruples form a data stream;

[0045] 1.3. Each data stream is represented by three grouped time series. The three time series are: (1) the sequence arranged in chronological order by the packet length of each packet flowing in the data stream; (2) the ...
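
Steps 1.1 and 1.2 above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the patent's code: the packet dictionary layout, the `gap_threshold` value of 1.0 second, and the reading of "quadruple" as source address, source port, destination address, and destination port (matched regardless of direction) are all hypothetical choices made for the example.

```python
from collections import defaultdict

def split_into_bursts(packets, gap_threshold=1.0):
    """Step 1.1: start a new burst whenever the gap between adjacent
    packets is not less than gap_threshold (an assumed value, in seconds)."""
    bursts = []
    current = []
    for pkt in sorted(packets, key=lambda p: p["time"]):
        if current and pkt["time"] - current[-1]["time"] >= gap_threshold:
            bursts.append(current)
            current = []
        current.append(pkt)
    if current:
        bursts.append(current)
    return bursts

def flow_key(pkt):
    """Direction-independent key built from the two (address, port) endpoints."""
    a = (pkt["src"], pkt["sport"])
    b = (pkt["dst"], pkt["dport"])
    return (a, b) if a <= b else (b, a)

def split_into_flows(burst):
    """Step 1.2: group the packets of one burst by their endpoint quadruple."""
    flows = defaultdict(list)
    for pkt in burst:
        flows[flow_key(pkt)].append(pkt)
    return list(flows.values())

def make_packet(time, src, sport, dst, dport, length):
    return {"time": time, "src": src, "sport": sport,
            "dst": dst, "dport": dport, "length": length}

packets = [
    make_packet(0.0, "10.0.0.2", 5000, "1.1.1.1", 443, 120),
    make_packet(0.1, "1.1.1.1", 443, "10.0.0.2", 5000, 1400),
    make_packet(0.2, "10.0.0.2", 5001, "2.2.2.2", 443, 90),
    make_packet(3.0, "10.0.0.2", 5000, "1.1.1.1", 443, 60),
]
bursts = split_into_bursts(packets)
flows_in_first_burst = split_into_flows(bursts[0])
```

In this toy capture the 2.8-second pause separates the traffic into two bursts, and the first burst contains two data streams because two different connections are active in it.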


Abstract

The mobile application identification method based on K-means clustering and the random forest algorithm comprises the following steps. First, the encrypted data stream within a time period is discretized into a plurality of data streams according to the characteristics of a TCP session, and each data stream is represented by an input grouped time series, an output grouped time series, and an input-and-output grouped time series. Mathematical statistics are then computed on the three time series corresponding to each data stream to obtain the statistical characteristics of the data packets. Next, cluster analysis is performed on the statistical characteristics of the encrypted data streams using the K-means clustering algorithm. The purity of each cluster obtained from the cluster analysis is scored by an entropy calculation, and the samples in clusters of lower purity are filtered out. Finally, the filtered clusters are used as a dataset to build a model with the random forest algorithm, so that the mobile application type of the encrypted streams is identified. The method combines supervised and unsupervised learning, and can accurately identify different mobile application types in encrypted traffic containing multiple application types.
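
As a rough illustration of the feature-extraction step in the abstract, the sketch below derives the input, output, and input-and-output series from one data stream and summarizes each with a few statistics. The signed-length encoding (negative for incoming, positive for outgoing) and the particular statistics chosen (minimum, maximum, mean, population standard deviation) are assumptions for the example; the source does not list which statistics are used.

```python
import statistics

def three_series(signed_lengths):
    """Split one data stream into input, output, and input-and-output series.

    Assumed encoding: negative length = incoming packet, positive = outgoing.
    """
    incoming = [-n for n in signed_lengths if n < 0]
    outgoing = [n for n in signed_lengths if n > 0]
    both = [abs(n) for n in signed_lengths]
    return incoming, outgoing, both

def series_stats(series):
    """Summarize one series; an empty series yields zeros (an assumed convention)."""
    if not series:
        return [0.0, 0.0, 0.0, 0.0]
    return [float(min(series)), float(max(series)),
            statistics.fmean(series), statistics.pstdev(series)]

def flow_features(signed_lengths):
    """Concatenate the statistics of the three series into one feature vector."""
    features = []
    for series in three_series(signed_lengths):
        features.extend(series_stats(series))
    return features

flow = [120, -1400, -1400, 90, -60]
features = flow_features(flow)
```

Each data stream then becomes a fixed-length vector (twelve values here), which is the form that a clustering algorithm such as K-means and a classifier such as a random forest expect as input.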

Description

Technical field

[0001] The invention belongs to the technical field of information security, and in particular relates to a mobile application identification method based on K-means clustering and random forest algorithm.

Background technique

[0002] In recent years, as the hardware performance of smart mobile devices has been greatly improved and software functions have become increasingly rich, the usage of smart mobile devices has also continued to grow. People carry smart phones with them at all times, and use them to complete basic voice calls and SMS communications, as well as daily communication activities such as e-mails and social networking related to the Internet. These portable devices also store a large amount of sensitive information related to user privacy. Most mobile applications today are encrypted using the SSL / TLS protocol. Even so, attackers can indirectly deduce users' sensitive information through the analysis of encrypted traffic.

[0003] At the ...


Application Information

IPC(8): G06K9/62
Inventors: 陈丹伟, 朱迪
Owner: NANJING UNIV OF POSTS & TELECOMM