Data cleaning algorithm based on Internet trading information

A data cleaning technique for Internet transaction information. It addresses problems such as reduced accuracy, rule-judgment criteria that cannot be guaranteed to be fully consistent, and insufficient rule mining.

Status: Inactive. Publication date: 2015-11-11
ZHEJIANG LISHI TECH
Cites: 3; Cited by: 15

AI Technical Summary

Problems solved by technology

[0005] 1. Cleaning data through the functional dependencies between keys in relational data is a relatively straightforward method, but it is not sufficient for rule mining over massive data such as Internet transaction information.
[0006] 2. Methods based on conditional functional dependencies (CFDs) take functional dependency as their basis and add semantic constraints, which lets them effectively clean data tuples that exhibit functional dependencies. However, Internet transaction information comes from different e-commerce platforms, the functional dependencies of much of this data are unclear, and for some data the functional relationships cannot be obtained before cleaning.
The advantage of this method is that human participation greatly improves accuracy, but processing is time-consuming. Moreover, different people cannot be guaranteed to apply exactly the same criteria when judging dependency rules, so the method relies too heavily on subjective judgment.
[0008] 4. A machine-learning feedback method replaces the human feedback process with machine learning: the machine learns correct cleaning operations before cleaning begins, then keeps accumulating and learning during the cleaning process. This improves the algorithm's time efficiency, but accuracy declines, the learning process adds extra overhead to the system, and the cleaning process still depends heavily on the dependencies among the data.
[0009] In summary, current data cleaning methods have certain limitations when applied to Internet transaction information.
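Items 1 and 2 above contrast plain functional dependencies (FDs) with conditional ones (CFDs). The Python sketch below illustrates that distinction only; it is not the method of this invention. The record fields `platform`, `sku` and `title` and the sample orders are hypothetical, and a CFD is reduced here to an FD that is enforced only on rows matching a constant pattern.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Optional, Set, Tuple

Row = Dict[str, Any]

def fd_violations(rows: Iterable[Row], lhs: Tuple[str, ...], rhs: str,
                  condition: Optional[Dict[str, Any]] = None) -> List[Tuple[tuple, Set[Any]]]:
    """Find groups of rows that violate the dependency lhs -> rhs.

    condition=None checks a plain functional dependency (FD) over all rows.
    A condition (attribute -> required constant) restricts the check to rows
    matching that pattern, which is the core idea behind a conditional FD (CFD).
    """
    rhs_values: Dict[tuple, Set[Any]] = defaultdict(set)
    for row in rows:
        if condition and any(row.get(attr) != value for attr, value in condition.items()):
            continue  # row lies outside the scope of the conditional dependency
        rhs_values[tuple(row[attr] for attr in lhs)].add(row[rhs])
    # An LHS value that maps to more than one RHS value is a violation.
    return [(key, values) for key, values in rhs_values.items() if len(values) > 1]

# Hypothetical orders merged from two platforms that reuse the same SKU code.
orders = [
    {"platform": "A", "sku": "1001", "title": "USB cable"},
    {"platform": "A", "sku": "1001", "title": "USB cable"},
    {"platform": "B", "sku": "1001", "title": "Phone case"},
]

print(fd_violations(orders, ("sku",), "title"))                     # global FD sku -> title fails
print(fd_violations(orders, ("sku",), "title", {"platform": "A"}))  # CFD scoped to platform A holds
```

Once the platforms are merged, the same SKU maps to two titles, so the global FD fails while the CFD restricted to platform A holds. This mirrors the point in item 2 that dependencies across platforms are often unclear before cleaning.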

Embodiment Construction

[0055] The present invention will be described in detail below in terms of specific embodiments in conjunction with the accompanying drawings. The following examples will help those skilled in the art to further understand the present invention, but do not limit the present invention in any form. It is to be noted that other embodiments may be utilized or structural and functional modifications may be made to the embodiments set forth herein without departing from the scope and spirit of the invention.

[0056] In an embodiment of the data cleaning algorithm for Internet transaction information provided by the present invention, as shown in Figure 1, the Internet transaction information to be cleaned is first tested for data quality problems, yielding clean tuples, correct tuples and problem tuples;

[0057] For clean tuples: send them directly to the clean database;

[0058] For correct tuples: generate a key sentence that needs to be retrieved from the ex...
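Paragraphs [0056] to [0058] route each tuple according to the result of quality detection. The following is a minimal Python sketch of that routing under stated assumptions: the record schema (`order_id`, `platform`, `title`, `price`, `verified`) and the detection rules in `classify` are hypothetical, because the text does not say how clean, correct and problem tuples are told apart. `key_sentence` is likewise a placeholder for the key-sentence generation in [0058], whose description is truncated here.

```python
from enum import Enum
from typing import Any, Callable, Dict, Iterable, List, Sequence

Row = Dict[str, Any]

class TupleClass(Enum):
    CLEAN = "clean"      # sent directly to the clean database
    CORRECT = "correct"  # used to build a retrieval query for the expert knowledge base
    PROBLEM = "problem"  # held back for cleaning with pattern knowledge

REQUIRED_FIELDS = ("order_id", "platform", "title", "price")  # hypothetical schema

def classify(row: Row) -> TupleClass:
    """Hypothetical quality check; the actual detection rules are not given in this text."""
    if any(row.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return TupleClass.PROBLEM  # missing value
    price = row["price"]
    if not isinstance(price, (int, float)) or price <= 0:
        return TupleClass.PROBLEM  # invalid value
    # Assumption: "correct" means correctness was confirmed upstream (a `verified` flag).
    return TupleClass.CORRECT if row.get("verified") else TupleClass.CLEAN

def route(rows: Iterable[Row],
          check: Callable[[Row], TupleClass] = classify) -> Dict[TupleClass, List[Row]]:
    """Split the input into the three tuple classes."""
    buckets: Dict[TupleClass, List[Row]] = {cls: [] for cls in TupleClass}
    for row in rows:
        buckets[check(row)].append(row)
    return buckets

def key_sentence(row: Row, fields: Sequence[str] = ("platform", "title")) -> str:
    """Hypothetical retrieval query built from a correct tuple's attribute values."""
    return " ".join(str(row[field]) for field in fields)

rows = [
    {"order_id": 1, "platform": "A", "title": "USB cable", "price": 9.9},
    {"order_id": 2, "platform": "B", "title": "USB cable", "price": 9.9, "verified": True},
    {"order_id": 3, "platform": "B", "title": "", "price": 9.9},
]
buckets = route(rows)
clean_db = list(buckets[TupleClass.CLEAN])  # stand-in for writing to the clean database
queries = [key_sentence(r) for r in buckets[TupleClass.CORRECT]]
print(queries)  # ['B USB cable']
```

Passing the classifier as a parameter keeps the routing independent of whatever detection scheme is actually used.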

Abstract

The invention provides a method for cleaning data drawn from different data sources, i.e., different Internet trading platforms. The method first classifies the tuples in a database. Tuples whose correctness has been confirmed then undergo pattern interaction with an expert knowledge base, using fuzzy matching on the knowledge base's retrieval contents as the tool for obtaining the corresponding pattern knowledge. The pattern knowledge found in this way is then used to clean applicable data with quality problems. The method also provides an efficient detection scheme for the different types of quality errors found in massive data. A BP (back-propagation) neural network is used to make the expert knowledge base self-learning, giving Internet trading information data cleaning an efficient and safe cleaning mode.
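The abstract's central step is fuzzy matching of a query against the knowledge base to retrieve pattern knowledge, which is then applied to tuples with quality problems. The minimal Python sketch below rests on several assumptions: the knowledge base is a plain dict whose contents are invented for illustration, "pattern knowledge" is reduced to a set of expected attribute values, and the fuzzy matcher is the standard library's `difflib.get_close_matches` (a `SequenceMatcher` similarity ratio), swapped in because the abstract does not specify the matching measure. The BP-neural-network self-learning of the knowledge base is not modeled.

```python
import difflib
from typing import Any, Dict, Optional

Row = Dict[str, Any]

# Hypothetical expert knowledge base: a normalized key sentence maps to pattern
# knowledge, modeled here simply as the attribute values a matching tuple should carry.
KNOWLEDGE_BASE: Dict[str, Row] = {
    "usb-c charging cable": {"title": "USB-C charging cable", "category": "Electronics"},
    "cotton t-shirt": {"title": "Cotton T-shirt", "category": "Apparel"},
}

def find_pattern(query: str, kb: Dict[str, Row], cutoff: float = 0.6) -> Optional[Row]:
    """Return the pattern whose key best fuzzy-matches the query, or None below the cutoff."""
    hits = difflib.get_close_matches(query.lower(), list(kb), n=1, cutoff=cutoff)
    return kb[hits[0]] if hits else None

def clean_with_pattern(row: Row, kb: Dict[str, Row], key_field: str = "title") -> Row:
    """Return a copy of a problem tuple with fields corrected or filled from the matched pattern."""
    cleaned = dict(row)
    pattern = find_pattern(str(row.get(key_field, "")), kb)
    if pattern is not None:
        cleaned.update(pattern)  # pattern values override the erroneous or missing ones
    return cleaned  # unchanged copy when no pattern applies

problem = {"order_id": 3, "title": "usb c charging cabel", "category": None}
print(clean_with_pattern(problem, KNOWLEDGE_BASE))
# {'order_id': 3, 'title': 'USB-C charging cable', 'category': 'Electronics'}
```

Raising `cutoff` trades recall for precision: a stricter threshold leaves more problem tuples unmatched (returned unchanged for further handling) but lowers the risk of overwriting a field with the wrong pattern.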

Description

Technical Field

[0001] The invention relates to the field of computer applications, in particular to a data cleaning algorithm for Internet transaction information.

Background

[0002] In recent years, China's Internet transactions have continued to develop rapidly, with an average growth rate of 80% over the past five years. In 2013, total e-commerce transaction volume exceeded 10 trillion yuan, and China's online retail market overtook that of the United States to become the world's largest. With the development of e-commerce, some problems that the market itself has difficulty solving have emerged, including false advertising of products, the proliferation of counterfeit goods, widespread online fraud and phishing websites, irregular logistics and delivery services, difficult returns and poor reverse logistics, and leakage of Internet users' personal information. The main rea...

Application Information

IPC(8): G06F17/30
CPC: G06F16/215
Inventors: Chen Haijiang (陈海江), Lü Hao (吕浩), Shao Qike (邵奇可), Yan Shihang (颜世航)
Owner: ZHEJIANG LISHI TECH