Data information consistency processing method, system and device based on big data

A data information consistency technology, applied in the field of big data, that addresses problems such as long execution time and complex processing.

Status: Inactive; Publication Date: 2017-10-03

BEIJING HONGMA MEDIA CULTURE DEV

Cites: 3; Cited by: 11


AI Technical Summary

Problems solved by technology

In data cleaning, checking the consistency of data from each data source requires judging the uniqueness of records based on a combination of multiple fields in each table. The process is complicated and its execution time is too long.



Examples


Embodiment 1

[0056] Figure 1 shows a flow chart of an embodiment of the big data-based data information consistency processing method provided by the present invention. The method includes steps S110 to S160.

[0057] In step S110, the business primary key of at least one data table to be processed is obtained.

[0058] In step S120, the business primary key is converted into a unified standard format to generate a verification code.

[0059] In step S130, the Hamming distance algorithm is used to determine the similarity of the verification code data.

[0060] In step S140, the identification codes of the verification code data are generated sequentially using the pigeonhole principle (drawer principle) algorithm.

[0061] In step S150, the first identification code is compared with each subsequent identification code; when a subsequent identification code is the same as the first identification code, the distinguishing code of the subsequent identification code is recorded as the secon...
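The flow of steps S110 to S160 can be illustrated with a minimal Python sketch. It is not the patented implementation: the normalization rules, the use of SHA-256 to derive a 64-bit verification code, the distinguishing-code values `FIRST` and `SECOND`, and all function names are assumptions made for illustration. It covers only exact matches after normalization, with the first occurrence kept and later matches marked and dropped.

```python
import hashlib
import unicodedata

# Hypothetical distinguishing-code values; the source does not specify them.
FIRST, SECOND = 1, 2


def normalize_key(fields):
    """Bring a multi-field business primary key into one standard form."""
    parts = []
    for value in fields:
        # NFKC folds full-width and compatibility characters.
        text = unicodedata.normalize("NFKC", str(value))
        # Collapse whitespace and ignore letter case.
        parts.append(" ".join(text.split()).lower())
    return "|".join(parts)


def verification_code(fields):
    """Derive a 64-bit code from the normalized key (SHA-256 is an assumption)."""
    digest = hashlib.sha256(normalize_key(fields).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def deduplicate(rows, key_columns):
    """Keep the first row per verification code; drop rows marked SECOND."""
    seen = set()
    kept = []
    for row in rows:
        code = verification_code([row[c] for c in key_columns])
        mark = SECOND if code in seen else FIRST
        seen.add(code)
        if mark == FIRST:
            kept.append(row)
    return kept
```

For example, rows whose `name` fields are `"Alice "` and `"alice"` with the same `phone` value normalize to the same key, so only the first of the two is kept.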

Embodiment 2

[0089] Figure 2 shows a structural block diagram of an embodiment of a big data-based data information consistency processing system 200 provided by the present invention. The system includes:

[0090] An acquisition module 21, configured to acquire the business primary key of at least one data table to be processed;

[0091] A conversion module 22, configured to convert the business primary key into a unified standard format to generate a verification code;

[0092] A determining module 23, configured to determine the similarity of the verification code data by using the Hamming distance algorithm;

[0093] A generating module 24, configured to sequentially generate the identification codes of the verification code data by using the pigeonhole principle (drawer principle) algorithm;

[0094] A comparison module 25, configured to compare the first identification code with each subsequent identification code, when the subsequent identification code is the same as the first identifica...
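The similarity check performed by the determining module 23 rests on Hamming distance, the number of bit positions at which two equal-length codes differ. The sketch below assumes the verification codes are held as Python integers; the function names and the threshold `max_distance=3` are hypothetical and do not come from the source.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions at which two integer codes differ."""
    # XOR leaves a 1 exactly where the two codes disagree.
    return bin(a ^ b).count("1")


def is_similar(a: int, b: int, max_distance: int = 3) -> bool:
    """Treat two codes as similar when they differ in at most max_distance bits."""
    return hamming_distance(a, b) <= max_distance
```

A smaller threshold makes the check stricter; a threshold of 0 reduces it to exact equality.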

Embodiment 3

[0113] Figure 3 shows a structural block diagram of an example of a big data-based data information consistency processing device 300 provided by the present invention. The device includes the system 200 described in any implementation of Embodiment 2.

[0114] Embodiment 3 of the present invention provides a data information consistency processing device based on big data. The invention obtains the business primary key of at least one data table to be processed; converts the business primary key into a unified standard format to generate a verification code; uses the Hamming distance algorithm to determine the similarity of the verification code data; uses the pigeonhole principle (drawer principle) algorithm to sequentially generate the identification codes of the verification code data; and compares the first identification code with each subsequent identification code. When a subsequent identification code is the same as the first identification code, record the identification code o...
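The text here does not spell out how the pigeonhole (drawer) principle is applied. One common way to combine it with Hamming distance, shown below purely as an assumption, is to split each code into `max_distance + 1` blocks: two codes that differ in at most `max_distance` bits must then agree exactly on at least one block, so only codes sharing a block need a full comparison. `CODE_BITS`, `split_blocks`, and `near_duplicate_pairs` are hypothetical names.

```python
from collections import defaultdict

CODE_BITS = 64  # assumed width of a verification code


def split_blocks(code, parts):
    """Split a code into `parts` (position, value) blocks of equal bit width."""
    width = CODE_BITS // parts
    mask = (1 << width) - 1
    return [(i, (code >> (i * width)) & mask) for i in range(parts)]


def near_duplicate_pairs(codes, max_distance=3):
    """Return index pairs of codes within max_distance bits of each other."""
    parts = max_distance + 1  # pigeonhole: at least one block must match exactly
    buckets = defaultdict(list)
    pairs = set()
    for idx, code in enumerate(codes):
        for block in split_blocks(code, parts):
            # Only codes sharing this exact block are candidates.
            for other in buckets[block]:
                if bin(code ^ codes[other]).count("1") <= max_distance:
                    pairs.add((other, idx))
            buckets[block].append(idx)
    return sorted(pairs)
```

The block lookup avoids comparing every code with every other code, which is where the time saving on very large tables would come from under this assumption.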


Abstract

The invention provides a data information consistency processing method, system and device based on big data. The method comprises: obtaining the business primary key of at least one data table to be processed; converting the business primary key into a unified standard format and generating a verification code; determining the similarity of the verification code data by means of a Hamming distance algorithm; sequentially generating identification codes of the verification code data by means of a pigeonhole principle (drawer principle) algorithm; comparing the first identification code with each subsequent identification code, and marking the distinguishing code of a subsequent identification code as a second distinguishing code when that identification code is the same as the first identification code; and deleting the data whose identification code carries the second distinguishing code. When more than one hundred million records spanning multiple rows or columns are processed, considerable processing time is saved and data processing efficiency is improved.

Description

Technical field

[0001] The present invention relates to the technical field of big data, and in particular to a method, system and device for processing data information consistency based on big data.

Background

[0002] With the development of the Internet and the mobile Internet, the continuous growth of data has become a defining feature of the big data era. Enterprises are paying more and more attention to big data. Whether in data storage, computation, or application, they have invested more manpower and material resources in experimentation and exploration.

[0003] One of the important prerequisites for the production and use of big data is data cleaning. Data cleaning refers to the final process of finding and correcting identifiable errors in data files, including checking data consistency and handling invalid and missing values. Because the data in the data warehouse is a collection of data oriented to a certain topic, these data are ...


Application Information


IPC(8): G06F17/30, G06F11/08

CPC: G06F16/2365, G06F11/08, G06F16/215

Inventor: 顾喜德

Owner: BEIJING HONGMA MEDIA CULTURE DEV
