Malicious PDF document intelligent detection method and system based on feature aggregation

A technology for intelligent document detection, applied to neural learning methods, instruments, biological neural network models, etc. It addresses problems such as poor generalization ability, reverse imitation attacks, and underfitting of classification models, with the aims of improving accuracy and usability, reducing training pressure, and improving efficiency.

Pending Publication Date: 2021-11-26
ARMY ENG UNIV OF PLA

AI Technical Summary

Problems solved by technology

[0004] Existing malicious PDF document detection methods have the following disadvantages: the features generalize poorly, and when training samples are few the classification model is prone to underfitting, which degrades the performance of the detection system; unprocessed raw features are vulnerable to reverse imitation attacks by attackers, which lowers the detection rate of the system; and the feature space has a high dimension, so training the deep learning model is inefficient and depends on the configuration of the system.


Embodiment Construction

[0049] The present invention will be further described below with reference to the accompanying drawings. The following examples are for more clearly explaining the technical solutions of the present invention without limiting the scope of the invention.

[0050] As shown in Figure 1, a malicious PDF document detection method based on feature agglomeration includes:

[0051] A document is input and parsed to extract its content features and structural features; the features are aggregated; the aggregated features are fed into the 1D-CNN model for training or for detection and classification.
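The last step of this pipeline can be illustrated with a minimal sketch of a 1D convolutional forward pass over one aggregated feature vector. The source does not disclose the network architecture, so everything here is an assumption for illustration: a single convolution kernel, ReLU, global max pooling, and a sigmoid output thresholded at 0.5. The function names and all weights are hypothetical; only the convention that 1 means malicious and 0 means benign comes from the abstract.

```python
import math

def conv1d(x, kernel, bias=0.0):
    """Valid (no padding) 1D cross-correlation with stride 1."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]

def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]

def classify(features, kernel, bias, w_out, b_out, threshold=0.5):
    """Return 1 (malicious) or 0 (benign) for one aggregated feature vector.

    The layer layout and the weights are hypothetical; a trained model
    would learn them from labelled documents.
    """
    hidden = relu(conv1d(features, kernel, bias))
    pooled = max(hidden)  # global max pooling over the feature axis
    prob = 1.0 / (1.0 + math.exp(-(w_out * pooled + b_out)))
    return int(prob >= threshold)

# Hypothetical, untrained weights used only to exercise the code path.
label_a = classify([0.0, 1.0, 0.0, 1.0], kernel=[1.0, 1.0], bias=0.0,
                   w_out=2.0, b_out=-1.0)
label_b = classify([0.0, 0.0, 0.0, 0.0], kernel=[1.0, 1.0], bias=0.0,
                   w_out=2.0, b_out=-1.0)
```

In practice the same forward pass would be embedded in a training loop so that the kernel and output weights are fitted rather than chosen by hand.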

[0052] The content features are statistical features extracted by parsing the content of the PDF document. They include the number of pages, whether the document is encrypted, whether it contains the tag JavaScript, the tag AA, the tag OpenAction, or the tag AcroForm, whether JBIG2 compression is used, ...
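As a rough illustration of the content features listed in [0052], the sketch below scans the raw bytes of a PDF for the corresponding name tokens. It is not the parser used by the invention. The mapping of "encrypted" to /Encrypt and of "JBIG2 compression" to /JBIG2Decode is an assumption, as are the function and dictionary names. A plain byte scan also misses tokens hidden inside compressed object streams, which a real parser would have to decode first.

```python
import re

# Hypothetical mapping from feature name to the PDF name token it looks for.
# /Encrypt and /JBIG2Decode are assumptions about how the checks in [0052]
# map onto raw PDF keywords.
TAGS = {
    "encrypted": b"/Encrypt",
    "javascript": b"/JavaScript",
    "aa": b"/AA",
    "openaction": b"/OpenAction",
    "acroform": b"/AcroForm",
    "jbig2": b"/JBIG2Decode",
}

def content_features(data: bytes) -> dict:
    """Extract simple statistical features from raw PDF bytes."""
    feats = {}
    # Count page objects: "/Type /Page" that is not the start of "/Pages".
    feats["pages"] = len(re.findall(rb"/Type\s*/Page(?![A-Za-z])", data))
    for name, token in TAGS.items():
        # The lookahead stops /AA from matching a longer name such as /AAPL.
        pattern = re.escape(token) + rb"(?![A-Za-z0-9])"
        feats[name] = int(re.search(pattern, data) is not None)
    return feats

# A tiny hand-written fragment, not a complete or valid PDF file.
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj << /Type /Catalog /Pages 2 0 R /OpenAction 4 0 R >> endobj\n"
    b"2 0 obj << /Type /Pages /Kids [3 0 R] /Count 1 >> endobj\n"
    b"3 0 obj << /Type /Page /Parent 2 0 R >> endobj\n"
    b"4 0 obj << /S /JavaScript /JS (app.alert(1)) >> endobj\n"
)
features = content_features(sample)
```

For the fragment above, the scan reports one page and flags the JavaScript and OpenAction tags, while the remaining indicators stay at 0.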

Abstract

The invention discloses a malicious PDF document intelligent detection method and system based on feature aggregation. The method comprises the following steps: acquiring a to-be-detected PDF document; analyzing the PDF document, extracting content features and structural features of the document from the PDF document, merging the content features and the structural features, and performing feature clustering by taking the minimum variance of feature clusters as a target to obtain aggregated features; inputting the aggregation features into a pre-trained convolutional neural network model, and if the output is 1, determining that the document is a malicious document; if the output is 0, judging the document to be a benign document. The invention has the advantages that the dimensionality of the features is reduced, the training pressure of a deep learning model is relieved, and the efficiency of the system is improved; according to the aggregation features of the input document, the convolutional neural network model is utilized to detect and classify the document or automatically train the parameters of the model, so that the accuracy and usability of the system are improved.
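The clustering step described in the abstract can be sketched as follows. The sketch reads "minimum variance of feature clusters" as Ward-style agglomerative clustering over feature columns, and it pools each resulting group by its mean; both are assumptions, since the excerpt does not specify the linkage or the pooling rule. The function names and the toy matrix are hypothetical.

```python
from itertools import combinations

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if a and b are merged."""
    na, nb = len(a["members"]), len(b["members"])
    dist2 = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
    return na * nb / (na + nb) * dist2

def agglomerate_features(X, n_clusters):
    """Group the columns of X (rows are documents) into n_clusters groups."""
    n_features = len(X[0])
    # Every feature column starts as its own cluster.
    clusters = [{"members": [j], "centroid": [row[j] for row in X]}
                for j in range(n_features)]
    while len(clusters) > n_clusters:
        # Pick the pair whose merge increases the variance the least.
        i, k = min(combinations(range(len(clusters)), 2),
                   key=lambda p: ward_cost(clusters[p[0]], clusters[p[1]]))
        a, b = clusters[i], clusters[k]
        na, nb = len(a["members"]), len(b["members"])
        merged = {
            "members": a["members"] + b["members"],
            "centroid": [(na * x + nb * y) / (na + nb)
                         for x, y in zip(a["centroid"], b["centroid"])],
        }
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, k)]
        clusters.append(merged)
    return sorted(sorted(c["members"]) for c in clusters)

def transform(row, groups):
    """Replace each group of original features by its mean (assumed pooling)."""
    return [sum(row[j] for j in g) / len(g) for g in groups]

# Toy data: columns 0 and 1 move together, as do columns 2 and 3.
X = [[1.0, 1.1, 5.0, 5.2],
     [2.0, 2.1, 0.0, 0.1],
     [3.0, 2.9, 4.0, 4.1]]
groups = agglomerate_features(X, n_clusters=2)
reduced = transform(X[0], groups)
```

Here four original features collapse into two aggregated ones, which is the dimensionality reduction the abstract credits with easing the training load. An attacker who tweaks a single raw feature also shifts the pooled value less, which is one plausible reading of the robustness claim.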

Description

Technical field

[0001] The present invention relates to an intelligent malicious PDF document detection method and system based on feature agglomeration, and belongs to the field of information security technology.

Background technique

[0002] Traditional malicious PDF document detection is based primarily on signature identification and heuristic rules. Its advantage is a low false positive rate, but it can only detect malicious samples that already exist in the virus database. It responds slowly to unknown malicious documents, and an attacker can evade it by crafting new malicious documents.

[0003] In recent years, malicious PDF document detection based on machine learning has been widely used. Compared with traditional signature matching, it can find new malicious documents promptly, and the model is easy to update. According to the feature extraction method, it can be divided into dynamic detection and static detection. Dynamic detection requires a docume...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F21/56, G06K9/62, G06N3/04, G06N3/08
CPC: G06F21/562, G06N3/04, G06N3/08, G06F18/231
Inventor: 王金双, 俞远哲, 孙蒙, 邹霞
Owner ARMY ENG UNIV OF PLA