Mutual information-based data discretization and feature selection integrated method and apparatus

A data discretization and feature selection integration technology, applied in the fields of electrical digital data processing, special data processing applications and instruments. It addresses problems of the prior art such as degraded learning-algorithm performance, information loss and neglected internal connections, with the effects of reducing information loss and improving learning performance.

Status: Inactive; publication date: 2017-02-15
PEKING UNIV +1
Cites: 0; cited by: 3

AI Technical Summary

Problems solved by technology

[0004] In data preprocessing, the prior art treats the discretization of continuous data and feature selection as two independent…

Detailed Description of Embodiments

[0054] The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given serve only to explain the invention and are not intended to limit its scope. All other embodiments obtained by persons of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

[0055] The present invention provides an integrated data discretization and feature selection method comprising three processes: generation of a candidate breakpoint set, forward search for an optimal breakpoint subset, and data discretization and feature selection. The steps are as follows:

[0056] The candidate breakpoint set is generated as shown in Figure 1. The detailed process is as follows:

[0057] 1) As shown in step S101, for the distribution of feature values...
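Because the criterion used in step S101 is cut off in this excerpt, the sketch below substitutes a common boundary-point heuristic as an explicit assumption, not the patent's own rule: sort the distinct values of one continuous feature and place a candidate cut at the midpoint between two adjacent values unless both values carry only the same single class label. The function name `candidate_breakpoints` is hypothetical.

```python
from typing import Dict, List, Sequence, Set


def candidate_breakpoints(values: Sequence[float], labels: Sequence[int]) -> List[float]:
    """Boundary-point candidates for one continuous feature (assumed rule, not the patent's S101)."""
    # Collect the set of class labels observed at each distinct feature value.
    labels_at: Dict[float, Set[int]] = {}
    for v, y in zip(values, labels):
        labels_at.setdefault(v, set()).add(y)

    distinct = sorted(labels_at)
    cuts: List[float] = []
    for lo, hi in zip(distinct, distinct[1:]):
        # A cut between two values that both carry only the same single label
        # cannot separate classes, so it is skipped.
        same_pure_class = len(labels_at[lo]) == 1 and labels_at[lo] == labels_at[hi]
        if not same_pure_class:
            cuts.append((lo + hi) / 2)
    return cuts
```

For example, `candidate_breakpoints([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])` returns `[3.5]`: only the gap where the label changes yields a candidate, which keeps the candidate set small before the forward search.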

Abstract

The invention discloses a mutual information-based integrated method for data discretization and feature selection. The method comprises the steps of: 1) generating a suitable candidate breakpoint set by performing breakpoint analysis on the data; 2) searching the candidate breakpoint set for an optimal breakpoint subset by forward search, evaluating each breakpoint subset by the mutual information between the resulting data partition and the original class-label distribution of the data; 3) stopping the search when the information gain ratio falls below a preset threshold or the total number of selected breakpoints exceeds a preset maximum; and 4) discretizing the data and selecting features using the optimal breakpoint subset. The invention further discloses a mutual information-based integrated apparatus for data discretization and feature selection. By organically integrating the data discretization and feature selection processes, the method and apparatus effectively remove irrelevant and redundant information from the data and improve the performance of subsequent learning algorithms.
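Steps 2 to 4 of the abstract can be sketched as a greedy forward search. This is a minimal sketch under assumptions the source does not spell out: a sample's "data division result" is taken to be the tuple of interval indices it falls into on every feature that has at least one selected breakpoint; the "information gain ratio" is taken to be the increase in empirical mutual information divided by the label entropy H(Y); and feature selection keeps exactly those features that received a breakpoint. All names (`forward_search`, `min_gain_ratio`, `max_breakpoints`, `selected_features`) are hypothetical.

```python
import math
from bisect import bisect_right
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Breakpoint = Tuple[int, float]  # (feature index, cut value); hypothetical representation


def entropy(labels: Sequence[int]) -> float:
    """Empirical Shannon entropy H(Y) in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(cells: Sequence[tuple], labels: Sequence[int]) -> float:
    """Empirical I(cell; label) = H(label) - H(label | cell), in bits."""
    n = len(labels)
    cell_counts = Counter(cells)
    joint = Counter(zip(cells, labels))
    h_cond = -sum(c / n * math.log2(c / cell_counts[cell]) for (cell, _), c in joint.items())
    return entropy(labels) - h_cond


def cells_for(X: Sequence[Sequence[float]], selected: Sequence[Breakpoint]) -> List[tuple]:
    """Assumed 'data division result': each sample's interval index on every cut feature."""
    cuts: Dict[int, List[float]] = {}
    for f, c in selected:
        cuts.setdefault(f, []).append(c)
    feats = sorted(cuts)
    for f in feats:
        cuts[f].sort()
    return [tuple(bisect_right(cuts[f], row[f]) for f in feats) for row in X]


def forward_search(X: Sequence[Sequence[float]], y: Sequence[int],
                   candidates: Sequence[Breakpoint],
                   min_gain_ratio: float = 0.01,
                   max_breakpoints: int = 10) -> List[Breakpoint]:
    h_y = entropy(y)
    selected: List[Breakpoint] = []
    remaining = list(candidates)
    current_mi = 0.0  # no breakpoints -> a single cell -> zero mutual information
    # Second stop condition: never let the subset grow past max_breakpoints.
    while remaining and len(selected) < max_breakpoints and h_y > 0:
        # Score every remaining candidate when added to the current subset.
        scored = [(mutual_information(cells_for(X, selected + [bp]), y), bp) for bp in remaining]
        best_mi, best = max(scored, key=lambda s: s[0])
        # First stop condition (assumed form): MI gain relative to H(Y) below threshold.
        if (best_mi - current_mi) / h_y < min_gain_ratio:
            break
        selected.append(best)
        remaining.remove(best)
        current_mi = best_mi
    return selected


def selected_features(selected: Sequence[Breakpoint]) -> List[int]:
    """Assumed feature-selection rule: keep features that received at least one breakpoint."""
    return sorted({f for f, _ in selected})
```

On the toy data `X = [[1.0, 5.0], [2.0, 3.0], [3.0, 9.0], [4.0, 1.0]]`, `y = [0, 0, 1, 1]` with candidates `[(0, 2.5), (1, 2.0), (1, 7.0)]`, the search picks `(0, 2.5)`, which alone reaches I = H(Y) = 1 bit, then stops because no further cut adds information, so only feature 0 is kept. Normalizing by H(Y) is one choice among several; it keeps the threshold comparable across label sets of different entropy.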

Description

Technical field

[0001] The invention belongs to the field of data preprocessing in data mining. It adopts mutual information as the evaluation criterion and integrates data discretization with feature selection. The invention can be applied directly in the preprocessing stage of labeled continuous data, completing discretization and feature selection of the data simultaneously. The invention relates to a mutual information-based data discretization and feature selection integration method and apparatus.

Background art

[0002] Data discretization is the process of converting continuous features into nominal or ordinal data: the value range of a continuous feature is divided into multiple small intervals, and each interval represents one ordinal or categorical value. Discretized data suits learning algorithms that cannot handle continuous features. At the same time, discretization can effectively remove hidden defects in the data and i...
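As a small illustration of the background definition, the sketch below maps each continuous value to the index of the interval it falls in, given a list of breakpoints, so k breakpoints produce k + 1 ordered categories. The half-open interval convention (a value equal to a breakpoint goes to the upper interval) and the function name `discretize` are assumptions for illustration.

```python
from bisect import bisect_right
from typing import List, Sequence


def discretize(values: Sequence[float], breakpoints: Sequence[float]) -> List[int]:
    """Map each value to the index of its interval; k breakpoints give k + 1 intervals."""
    cuts = sorted(breakpoints)
    # bisect_right counts how many cuts are <= v, so the intervals are
    # (-inf, c0), [c0, c1), ..., [c_{k-1}, +inf).
    return [bisect_right(cuts, v) for v in values]
```

For example, `discretize([0.5, 1.0, 2.7, 9.0], [2.5, 1.0])` returns `[0, 1, 2, 2]`.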

Application Information

IPC(8): G06F17/30
CPC: G06F16/2474, G06F16/2465
Inventors: 刘宏志 (Liu Hongzhi), 付彬 (Fu Bin), 易晖 (Yi Hui), 吴波 (Wu Bo), 赵鹏 (Zhao Peng), 吴中海 (Wu Zhonghai)
Owner: PEKING UNIV