Unbalanced data set conversion method and system based on sampling and feature reduction
An unbalanced data set conversion method and system, applied in fields such as instruments, character and pattern recognition, and computer components. It addresses two problems: the classifier training process is complex, and the differing importance of classifiers is not considered. The stated effect is improved accuracy.
Examples
Embodiment 1
[0072] Embodiment 1 of the present invention provides an unbalanced data set conversion method based on sampling and feature reduction. As shown in Figure 1, the method includes the following steps:
[0073] S1: Obtain an unbalanced data set, the unbalanced data set includes a majority class sample set and a minority class sample set;
[0074] S2: Sample the unbalanced data set to obtain a new unbalanced data set. This includes using the S-NKSMOTE algorithm to oversample the minority class sample set (see Figure 2), specifically:
[0075] S21 Obtain k nearest neighbor samples of the sample x in the minority class sample set;
[0076] Here, the k nearest neighbor samples are the k samples closest to the sample x in the kernel space. The value of k is configurable, for example 100 or 500;
[0077] S22: Compare the number of minority class samples in the k nearest neighbor samples with the number of majority class samples, when the number of minority class sa...
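Steps S21 and S22 can be illustrated with a minimal sketch. It is not the S-NKSMOTE algorithm itself, which the text above does not fully specify. It only shows one way to find the k nearest neighbors in a kernel space and to count how many belong to each class. The Gaussian (RBF) kernel, the gamma value, the label convention (1 = minority, 0 = majority), and all function names are assumptions made for illustration.

```python
import math


def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel. The kernel choice and gamma are assumptions."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq)


def kernel_distance(a, b, kernel=rbf_kernel):
    """Distance between a and b in the feature space induced by the kernel:
    d(a, b)^2 = K(a, a) - 2 K(a, b) + K(b, b)."""
    sq = kernel(a, a) - 2.0 * kernel(a, b) + kernel(b, b)
    return math.sqrt(max(sq, 0.0))  # clamp tiny negatives caused by rounding


def k_nearest_in_kernel_space(x, samples, labels, k, kernel=rbf_kernel):
    """Return (neighbor indices, minority count, majority count) for x.

    `samples` is the candidate pool and should not contain x itself.
    Labels use the assumed convention 1 = minority, 0 = majority.
    """
    order = sorted(range(len(samples)),
                   key=lambda i: kernel_distance(x, samples[i], kernel))
    neighbors = order[:k]
    minority = sum(1 for i in neighbors if labels[i] == 1)
    return neighbors, minority, len(neighbors) - minority


# Toy data: four candidate samples around one minority sample x.
samples = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (5.0, 5.0)]
labels = [1, 0, 1, 0]
x = (0.0, 0.1)
neighbors, n_minority, n_majority = k_nearest_in_kernel_space(x, samples, labels, k=3)
```

The two counts returned here are the quantities that step S22 compares. What the method does with the result of that comparison is not shown, because the source text is truncated at that point.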
Embodiment 2
[0094] Embodiment 2 of the present invention provides an unbalanced data set conversion method based on sampling and feature reduction, the method includes the following steps:
[0095] S1: Obtain an unbalanced data set, the unbalanced data set includes a majority class sample set and a minority class sample set;
[0096] S2: Sample the unbalanced data set to obtain a new unbalanced data set. The specific procedure for step S2 is shown in Figure 4 and includes:
[0097] S210: Acquire boundary sample sets of the majority class sample set and the minority class sample set;
[0098] Referring to Figure 5, step S210 specifically comprises the following steps, where every distance mentioned below is a distance in the kernel space:
[0099] S211: For each majority class sample in the majority class sample set, calculate the distance to its nearest minority class sample;
[0100] S212: Calculate the distance between each minority class sample and its nearest majority class sample in the minority ...
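The distance computations in S211 and S212 can be sketched as follows. This is an illustrative sketch under assumptions, not the patented procedure: the Gaussian (RBF) kernel, its gamma value, and the helper names are hypothetical, and the rule that turns these distances into a boundary sample set is not shown because the source text is truncated.

```python
import math


def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel. The kernel choice and gamma are assumptions."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq)


def kernel_distance(a, b):
    """Kernel-space distance: sqrt(K(a, a) - 2 K(a, b) + K(b, b))."""
    sq = rbf_kernel(a, a) - 2.0 * rbf_kernel(a, b) + rbf_kernel(b, b)
    return math.sqrt(max(sq, 0.0))  # clamp tiny negatives caused by rounding


def nearest_opposite_distances(source, opposite):
    """For each sample in `source`, the distance to its nearest sample in `opposite`."""
    return [min(kernel_distance(s, o) for o in opposite) for s in source]


# Toy data, invented for illustration only.
majority = [(0.0, 0.0), (3.0, 3.0)]
minority = [(0.5, 0.0), (4.0, 4.0)]

maj_to_min = nearest_opposite_distances(majority, minority)  # step S211
min_to_maj = nearest_opposite_distances(minority, majority)  # step S212
```

This brute-force version evaluates every cross-class pair, which costs time proportional to the product of the two set sizes. That is acceptable for a sketch, but a large data set would likely need a precomputed kernel matrix or a spatial index.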
Embodiment 3
[0138] Embodiment 3 of the present invention provides an unbalanced data set conversion system based on sampling and feature reduction. As shown in Figure 9, the conversion system includes:
[0139] A data acquisition module 1, used to obtain an unbalanced data set, where the unbalanced data set includes a majority class sample set and a minority class sample set;
[0140] A sampling processing module 2, used to perform sampling processing on the unbalanced data set to obtain a new unbalanced data set;
[0141] A dimensionality reduction processing module 3, used to perform dimensionality reduction on the new unbalanced data set and convert it into a new unbalanced data set with reduced features.
[0142] With continued reference to Figure 9, the sampling processing module 2 includes:
[0143] Boundary sample acquisition submodule 210: used to obtain the boundary sample set of the majority class sample set and the minority class sample set; wherein, the boundary sample acquisition submodu...
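The three-module structure of the conversion system can be sketched as a simple composition. This is only a structural illustration under assumptions: the class name, the `Dataset` type, and the placeholder module implementations are hypothetical, and the placeholders do not implement the sampling or feature reduction described in this document.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (feature rows, class labels); this representation is an assumption.
Dataset = Tuple[List[List[float]], List[int]]


@dataclass
class ConversionSystem:
    acquire: Callable[[], Dataset]         # data acquisition module 1
    sample: Callable[[Dataset], Dataset]   # sampling processing module 2
    reduce: Callable[[Dataset], Dataset]   # dimensionality reduction processing module 3

    def convert(self) -> Dataset:
        """Run the modules in order: acquire, then sample, then reduce."""
        return self.reduce(self.sample(self.acquire()))


def toy_acquire() -> Dataset:
    # Invented toy data: two majority samples (0) and one minority sample (1).
    return [[1.0, 9.0], [2.0, 8.0], [3.0, 7.0]], [0, 0, 1]


def identity_sample(data: Dataset) -> Dataset:
    # Placeholder: returns the data unchanged instead of resampling it.
    return data


def keep_first_feature(data: Dataset) -> Dataset:
    # Placeholder: keeps only the first column instead of a real feature reduction.
    rows, labels = data
    return [row[:1] for row in rows], labels


system = ConversionSystem(toy_acquire, identity_sample, keep_first_feature)
features, labels = system.convert()
```

Passing the modules in as callables keeps each stage replaceable, so an actual sampling or reduction routine could be substituted for a placeholder without changing `convert`.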