A data cleaning method and device

A data cleaning technology, applied in the computer field, that addresses three problems of existing approaches: inability to clean massive data, poor applicability to non-real-time historical data, and low data cleaning efficiency.

Active Publication Date: 2020-04-03
XIANGYANG BRANCH CHINA MOBILE GRP HUBEI CO LTD
5 cites, 0 cited by

AI Technical Summary

Problems solved by technology

[0003] However, existing data cleaning technologies have at least the following problems: 1) they mainly target real-time historical databases and are poorly suited to non-real-time historical data; 2) the efficiency of data cleaning is low; 3) they apply only to sample data during cleaning and cannot clean massive data.




Detailed Description of Embodiments

[0031] In the embodiment of the present invention, the data to be cleaned is obtained, and the fields to be cleaned are identified by analyzing the distribution of noise data in the data to be cleaned. The dimension-expandable fields in the data to be cleaned are then searched for, and high-order tensor dimension expansion is performed on them to obtain M tensor field sets, where M is a positive integer. Finally, the tensor fields in the tensor field sets that are related to the fields to be cleaned are used to clean those fields.
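The excerpt does not define how the dimension expansion or the cleaning rule works, so the following is only a minimal sketch under stated assumptions, not the patented method. It assumes a hypothetical flat field `hour_index` can be expanded into two dimensions (day, hour), so that each `load` value is addressed by a three-way key (cell, day, hour), and it assumes a noisy entry is repaired with the median of clean entries for the same cell and hour on other days. The field names, the `is_noisy` rule, and the median repair are all illustrative, and the sketch builds a single tensor rather than M tensor field sets.

```python
from collections import defaultdict
from statistics import median


def expand_hour_index(records):
    """Expand the flat 'hour_index' field into (day, hour), giving a sparse
    3-way tensor keyed by (cell, day, hour). Field names are hypothetical."""
    tensor = {}
    for r in records:
        day, hour = divmod(r["hour_index"], 24)
        tensor[(r["cell"], day, hour)] = r["load"]
    return tensor


def clean(tensor, is_noisy):
    """Replace each noisy entry with the median of clean entries that share
    its cell and hour (an assumed notion of 'related' tensor fields)."""
    peers_by_key = defaultdict(list)
    for (cell, day, hour), value in tensor.items():
        if not is_noisy(value):
            peers_by_key[(cell, hour)].append(value)

    cleaned = {}
    for (cell, day, hour), value in tensor.items():
        peers = peers_by_key[(cell, hour)]
        if is_noisy(value) and peers:
            cleaned[(cell, day, hour)] = median(peers)
        else:
            # Clean values, and noisy values with no peers, are left as-is.
            cleaned[(cell, day, hour)] = value
    return cleaned


# Illustrative data: one missing reading at day 1, hour 8.
records = [
    {"cell": "A", "hour_index": 8, "load": 10.0},
    {"cell": "A", "hour_index": 32, "load": None},
    {"cell": "A", "hour_index": 56, "load": 12.0},
    {"cell": "A", "hour_index": 9, "load": 20.0},
]
tensor = expand_hour_index(records)
cleaned = clean(tensor, is_noisy=lambda v: v is None or v < 0)
```

Keying the tensor by a tuple keeps the sketch sparse, so only observed entries are stored; a dense array would be a reasonable alternative when most cells are populated.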

[0032] Figure 1 is a schematic flow chart of a data cleaning method according to an embodiment of the present invention. As shown in Figure 1, the flow of the data cleaning method in this embodiment includes:

[0033] Step 101: Obtain the data to be cleaned, and obtain the fields to be cleaned in the data to be cleaned according to the analysis of the distribution of noise data in the...
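The text of Step 101 is truncated here, so the exact noise analysis is not available. As a rough illustration only, the sketch below assumes that "noise" means missing values or values far from the field's median (more than `k` median absolute deviations away), and that a field is selected for cleaning when its noise ratio exceeds a threshold. The names `noise_ratio` and `fields_to_clean`, the parameters `k` and `threshold`, and the sample records are all hypothetical.

```python
from statistics import median


def noise_ratio(values, k=3.0):
    """Fraction of entries that are missing (None) or lie more than
    k median absolute deviations from the median (assumed noise rule)."""
    present = [v for v in values if v is not None]
    if not present:
        return 1.0
    med = median(present)
    mad = median(abs(v - med) for v in present)
    noisy = 0
    for v in values:
        if v is None:
            noisy += 1
        elif mad > 0 and abs(v - med) > k * mad:
            noisy += 1
    return noisy / len(values)


def fields_to_clean(records, fields, threshold=0.1):
    """Return the fields whose noise ratio exceeds the threshold."""
    selected = []
    for field in fields:
        column = [r.get(field) for r in records]
        if noise_ratio(column) > threshold:
            selected.append(field)
    return selected


# Illustrative records: 'load' has one missing value and one outlier.
records = [
    {"load": 10.0, "users": 100},
    {"load": 11.0, "users": 101},
    {"load": None, "users": 99},
    {"load": 500.0, "users": 100},
    {"load": 10.5, "users": 102},
    {"load": 9.5, "users": 98},
]
selected = fields_to_clean(records, ["load", "users"])
```

The median-based rule is used here because it is not distorted by the outliers it is trying to detect; any other per-field noise statistic could be substituted.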



Abstract

The present invention discloses a data cleaning method. The method comprises: acquiring data to be cleaned and, by analyzing the noise data distribution in that data, obtaining the fields to be cleaned; searching the data to be cleaned for fields that can be subjected to dimension expansion, and performing high-order tensor dimension expansion on those fields to obtain M tensor field sets; and cleaning the fields to be cleaned by using the tensor fields in the tensor field sets that are associated with them. The present invention also discloses a data cleaning apparatus.

Description

Technical field

[0001] The invention relates to data processing technology in the computer field, and in particular to a data cleaning method and device.

Background

[0002] With the advancement of science and technology and the rapid development of computer technology, people can obtain more and more digital information, and must also invest more time in organizing it. Before performing statistical analysis on data, the dirty data, that is, the noise data, must be filtered out to ensure the accuracy of the statistics. Data cleaning is the process of detecting and eliminating errors and inconsistencies in a database to improve data quality. Its principle is to use related technologies to convert data into data that meets data quality requirements.

[0003] However, there are at least the following problems in the existing related technologies of data cleaning: 1) the related technologies mainly d...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/215
Inventor: 廖振松熊胜吴勤华杨晶蕾冯文仲沈力黄艳田纪军莫益军曾志华
Owner: XIANGYANG BRANCH CHINA MOBILE GRP HUBEI CO LTD