Intrusion detection method and system based on unbalanced data set conversion using sampling and feature reduction
An intrusion detection technology for unbalanced data, applied in the fields of instruments, character and pattern recognition, computer components, etc. It addresses problems such as unbalanced data sets reducing the detection accuracy for network intrusion risks.
Examples
Embodiment 1
[0074] Embodiment 1 of the present invention provides an intrusion detection method based on unbalanced data set conversion using sampling and feature reduction. As shown in Figure 1, the method includes the following steps:
[0075] S1: Obtain an unbalanced data set from the network log data; the unbalanced data set is a minority class sample set;
[0076] S2: Oversample the minority class sample set to form a new minority class sample set, which serves as the new unbalanced data set. The oversampling applies the S-NKSMOTE algorithm to the minority class sample set. Referring to Figure 2, specifically:
[0077] S21: Obtain the k nearest neighbor samples of a sample x in the minority class sample set;
[0078] Here, the k nearest neighbor samples are the k samples closest to the sample x in the kernel space; the value of k is configurable, for example 100 or 500;
[0079] S22: Compare the number of minority class samples in the k nearest neighbor samples wit...
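The neighbor search in step S21 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the text does not name a kernel, so the Gaussian (RBF) kernel, the `gamma` value, and all function names here are assumptions. The distance in kernel space uses the standard identity d(a, b)^2 = K(a, a) - 2K(a, b) + K(b, b).

```python
import math

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel. The kernel choice and gamma are assumptions."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq)

def kernel_distance(a, b, kernel=rbf_kernel):
    """Distance between the images of a and b in the kernel-induced space."""
    sq = kernel(a, a) - 2.0 * kernel(a, b) + kernel(b, b)
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors

def k_nearest_in_kernel_space(x, samples, k, kernel=rbf_kernel):
    """Return the indices of the k samples closest to x, nearest first."""
    order = sorted(
        range(len(samples)),
        key=lambda i: kernel_distance(x, samples[i], kernel),
    )
    return order[:k]

# Toy data (hypothetical): the first point plays the role of sample x.
samples = [(0.0, 0.0), (0.1, 0.0), (2.0, 2.0), (0.0, 0.3), (5.0, 5.0)]
neighbors = k_nearest_in_kernel_space(samples[0], samples[1:], k=2)
print(neighbors)  # indices into samples[1:]
```

The candidate list passed in excludes x itself, so x is never returned as its own neighbor. Sorting all candidates is fine for a sketch; with k in the hundreds and large logs, a partial selection or an index structure would be a more practical choice.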
Embodiment 2
[0098] Embodiment 2 of the present invention provides an intrusion detection method based on unbalanced data set conversion using sampling and feature reduction. The detection method includes the following steps:
[0099] S1: Obtain an unbalanced data set from the network log data; the unbalanced data set consists of a majority class sample set and a minority class sample set;
[0100] S2: Sample the unbalanced data set to obtain a new unbalanced data set. Referring to Figure 4, step S2 includes:
[0101] S210: Acquire the boundary sample sets of the majority class sample set and the minority class sample set;
[0102] Referring to Figure 5, step S210 specifically includes the following, where all distances mentioned below are distances in the kernel space:
[0103] S211: Calculate the distance between each majority class sample in the majority class sample set and its nearest minority class sample;
[0104] S212: Calculate the distance between each minorit...
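Step S211 can be sketched as below. This is an illustration under stated assumptions: an RBF kernel with a hypothetical `gamma` stands in for the unspecified kernel, and the helper names and toy data are invented for the example. For an RBF kernel K(a, a) = 1, so the squared kernel-space distance simplifies to 2 - 2K(a, b).

```python
import math

def kernel_distance(a, b, gamma=0.5):
    """Kernel-space distance under an RBF kernel (kernel choice is an assumption)."""
    sq_euclid = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    # K(a, a) = K(b, b) = 1 for RBF, so d^2 = 2 - 2 * K(a, b).
    return math.sqrt(max(2.0 - 2.0 * math.exp(-gamma * sq_euclid), 0.0))

def nearest_opposite_distances(group, other):
    """For each sample in group, the distance to its nearest sample in other."""
    return [min(kernel_distance(s, o) for o in other) for s in group]

# Toy data (hypothetical).
majority = [(0.0, 0.0), (1.0, 0.0), (4.0, 4.0)]
minority = [(1.5, 0.0), (0.0, 3.0)]

# S211: each majority class sample against its nearest minority class sample.
maj_to_min = nearest_opposite_distances(majority, minority)
print(maj_to_min)
```

Majority samples with small values in `maj_to_min` lie close to the minority class, which is the kind of information a boundary sample set would plausibly be built from.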
Embodiment 3
[0144] Embodiment 3 further defines the oversampling process on the basis of Embodiment 1, specifically:
[0145] Calculate the distance between each minority class sample in the minority class sample set and the center sample, and oversample the minority class sample set according to the calculated distances to obtain a new minority class sample set. This specifically includes the following steps:
[0146] Calculate the distance between each sample in the minority class sample set and the center sample;
[0147] Sort the distances in ascending order and arrange them into an R'×T' matrix;
[0148] Both R' and T' are preset values, which can be the same or different, and can take values such as 50, 100 or 200;
[0149] Starting from the first row, use the S-NKSMOTE algorithm to oversample the samples corresponding to each row, following the oversampling method of Embodiment 1 and Figure 2;
[0150] After the samples in each row of the matrix are oversampled, the sample set formed after oversampling is input into ...
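The distance ordering in paragraphs [0146] to [0148] can be sketched as follows. Several points are assumptions, since the visible text does not settle them: the center sample is taken to be the coordinate-wise mean, the distance is plain Euclidean, the matrix is filled row by row, and the sample count is required to equal R'×T'. The function names are hypothetical, and the S-NKSMOTE step itself is not implemented here.

```python
import math

def center_of(samples):
    """Coordinate-wise mean, used as the center sample (an assumption)."""
    dim = len(samples[0])
    return tuple(sum(s[d] for s in samples) / len(samples) for d in range(dim))

def rows_by_center_distance(samples, rows, cols):
    """Sort sample indices by ascending distance to the center and
    arrange them row by row into a rows x cols layout."""
    if len(samples) != rows * cols:
        raise ValueError("this sketch expects exactly rows * cols samples")
    center = center_of(samples)
    order = sorted(
        range(len(samples)),
        key=lambda i: math.dist(samples[i], center),  # Euclidean (assumption)
    )
    return [order[r * cols:(r + 1) * cols] for r in range(rows)]

# Toy minority class sample set (hypothetical), with R' = 2 and T' = 3.
minority = [(0.0, 0.0), (5.0, 0.0), (1.0, 1.0), (3.0, 3.0), (2.0, 1.0), (2.0, 5.0)]
matrix = rows_by_center_distance(minority, rows=2, cols=3)
print(matrix)  # [[4, 2, 3], [0, 1, 5]]
```

Each row holds sample indices rather than distances, so an oversampler can be applied row by row starting from `matrix[0]`, which contains the samples nearest the center.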