Packet Sequence Clustering Method for Unknown Binary Private Protocol

A message sequence clustering technique for private protocols, applied in the field of information technology. It addresses three problems of existing methods: difficult modeling, failure to consider the semantic correlation between words, and the high dimensionality of the feature vectors that represent message sequences.

Active Publication Date: 2021-05-14
南京赛宁信息技术有限公司

AI Technical Summary

Problems solved by technology

Sequence clustering algorithms based on probabilistic models are often difficult to model, and they are effective only when clustering long sequences.
Among keyword-based sequence clustering algorithms, the classic example is the Apriori algorithm. Its problem is that it produces a large number of overlapping frequent items, which makes the feature vector representing a message sequence very high-dimensional.
Because this method ignores the keyword length of the protocol message sequence and does not consider the semantic correlation between preceding and following words when building the message embedding representation, it cannot accurately measure the similarity between message sequences, and the clustering effect is poor.



Examples


Detailed Description of the Embodiments

[0031] The specific steps of the present invention are further described below with reference to Figure 1.

[0032] Step 1: collect unknown binary private protocol message sequences using a data capture method.

[0033] (1a) Set the network card of the server-side capture device to promiscuous mode so that it can monitor wireless communication data, then start communication entity A and communication entity B and have them establish a communication connection;

[0034] (1b) Use Wireshark to capture the message sequence communication data between communication entities A and B and save it as a pcap file, yielding the unknown binary private protocol message sequence, which includes link layer, transport layer, and application layer data.
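As an illustration of what the saved capture from step (1b) looks like to a program, the following is a minimal sketch that reads raw frames out of a classic pcap file using only the Python standard library. It assumes the classic libpcap file layout (24-byte global header, 16-byte per-record header) and does not handle pcapng, which recent Wireshark versions write by default. The function names `read_pcap_records` and `build_demo_pcap` are hypothetical helpers introduced here, not part of the patent.

```python
import io
import struct

# Magic numbers of the classic pcap format, mapped to struct byte-order prefixes.
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}


def read_pcap_records(stream):
    """Yield the raw bytes of each captured frame in a classic pcap stream."""
    global_header = stream.read(24)
    if len(global_header) < 24:
        raise ValueError("truncated pcap global header")
    endian = PCAP_MAGICS.get(global_header[:4])
    if endian is None:
        raise ValueError("not a classic pcap file (pcapng is not handled here)")
    while True:
        record_header = stream.read(16)
        if len(record_header) < 16:
            return  # end of file, or a truncated trailing record
        # Fields: ts_sec, ts_frac, captured length, original length.
        _sec, _frac, incl_len, _orig_len = struct.unpack(endian + "IIII", record_header)
        frame = stream.read(incl_len)
        if len(frame) < incl_len:
            return
        yield frame


def build_demo_pcap(frames):
    """Build a tiny little-endian pcap image in memory (demo data only)."""
    header = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
    body = b"".join(struct.pack("<IIII", 0, 0, len(f), len(f)) + f for f in frames)
    return header + body


demo = build_demo_pcap([b"\x01\x02\x03", b"\xff\x00"])
frames = list(read_pcap_records(io.BytesIO(demo)))
```

For a real capture, the same generator could be fed `open(path, "rb")`. Each yielded frame still carries the link layer and transport layer headers mentioned in step (1b).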

[0035] Step 2: preprocess the collected unknown binary private protocol message sequences.

[0036] (2a) Parse the captured unknown binary private protocol message sequence according to the structure of the ne...


Abstract

The invention discloses a message sequence clustering method for unknown binary private protocols. It mainly solves the problem in the prior art that the similarity between protocol message sequences cannot be accurately measured during protocol reverse engineering. The implementation scheme is: 1) collect unknown binary private protocol message sequences; 2) preprocess the collected message sequences; 3) extract multi-scale N-gram features from the preprocessed message sequences; 4) reduce the dimensionality of the multi-scale N-gram features by variance-based selection; 5) build an embedding representation of each message sequence from the reduced multi-scale N-gram features; 6) determine the optimal number of clusters K from the message sequence embedding representations; 7) cluster the message sequences using the optimal number of clusters K. The invention fully mines the latent semantic information of the message sequences, can accurately measure the similarity between them, improves clustering accuracy, and can be used to cluster unknown binary private protocols.
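Steps 3) and 4) of the abstract can be sketched as follows. This is a minimal illustration rather than the patented procedure: the scales `(1, 2)`, the use of raw counts as feature values, and keeping a fixed number of highest-variance columns are all assumptions made here, and the helper names are hypothetical. The embedding, the choice of K, and the clustering itself (steps 5 to 7) are not shown because the excerpt does not describe them.

```python
from collections import Counter
from statistics import pvariance


def multiscale_ngrams(message: bytes, scales=(1, 2)) -> Counter:
    """Count every byte n-gram of each length in `scales` within one message."""
    counts = Counter()
    for n in scales:
        for i in range(len(message) - n + 1):
            counts[message[i:i + n]] += 1
    return counts


def feature_matrix(messages, scales=(1, 2)):
    """Return (vocabulary, rows): one count vector per message over all n-grams seen."""
    per_message = [multiscale_ngrams(m, scales) for m in messages]
    vocabulary = sorted(set().union(*per_message))
    rows = [[counts[gram] for gram in vocabulary] for counts in per_message]
    return vocabulary, rows


def select_by_variance(vocabulary, rows, keep):
    """Keep the `keep` feature columns whose values vary most across messages."""
    columns = list(zip(*rows))
    variances = [pvariance(column) for column in columns]
    ranked = sorted(range(len(vocabulary)), key=lambda j: variances[j], reverse=True)
    kept = sorted(ranked[:keep])  # restore vocabulary order for readability
    kept_vocabulary = [vocabulary[j] for j in kept]
    reduced_rows = [[row[j] for j in kept] for row in rows]
    return kept_vocabulary, reduced_rows


# Toy messages (made up): a shared first byte followed by a type-specific payload.
messages = [b"\xaa\x01\x01", b"\xaa\x02\x02", b"\xaa\x01\x01"]
vocabulary, rows = feature_matrix(messages)
kept_vocabulary, reduced_rows = select_by_variance(vocabulary, rows, keep=2)
```

In this toy data the shared byte `0xaa` occurs once in every message, so its variance is zero and it is discarded, while the n-grams that differ between message types survive. That is the intuition behind variance-based selection: features that are constant across messages carry no information for separating clusters.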

Description

Technical Field

[0001] The invention belongs to the field of information technology and further provides a message sequence clustering method that can be used to cluster unknown binary private protocols.

Background

[0002] A network protocol is a specification by which entities in a network communicate; it explicitly defines the data format and the related synchronization issues when communicating entities exchange information. Besides the standardized communication protocols, networks also carry a large number of unknown private protocols. Message sequence clustering is the first task in protocol reverse engineering: it separates the messages of each type of private protocol as far as possible according to the similarity between message sequences, after which field format inference and state machine inference are performed.

[0003] Packet sequence clustering of private protocols, that is, the core issu...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): H04L29/06, G06K9/62
Inventors: 杨超, 吴继超
Owner: 南京赛宁信息技术有限公司