Text data comparison method and device

A text data comparison technology based on data dictionary tables, applied in the field of information processing. It addresses the lack of automatic, semantics-based comparison methods and relieves the work pressure of manual comparison.

Active Publication Date: 2022-07-08
NAT UNIV OF DEFENSE TECH

AI Technical Summary

Problems solved by technology

At present, the Extract-Transform-Load (ETL) technology of data warehouses is widely used in industry to extract, transform and fuse heterogeneous data. Existing research uses Python as an intermediate unit to compare table records across heterogeneous databases. This solves data access and record-level table comparison across differently stored databases, but it still cannot automatically compare the same data ontology under different expressions, and it lacks semantics-based automatic processing methods.



Embodiment Construction

[0047] In order to make the purpose, technical solutions and advantages of the present application more clearly understood, the present application will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application, but not to limit the present application.

[0048] The text data comparison method provided by this application can be used for the comparative analysis of text data items in two data dictionary tables, that is, the process of establishing a set of mappings from one set to the other and annotating the data items that have changed. Since the data items in the two data sets are not completely consistent, the association between different data values of the same data ontology is usually achieved through semantic similarity (for example, "Nanning City, Guangxi Zhuang Autonomous Region" and "Nanning City, Guangxi" are semantically equivalent, but...
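The idea of scoring two differently worded data items can be illustrated with a minimal sketch. The excerpt does not state which similarity measure the application uses, so the Jaccard overlap of word sets below is an assumption, and the regex tokenizer is only a stand-in for Chinese word segmentation. The names `tokenize` and `jaccard` are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    # Assumption: lowercase word tokens stand in for Chinese word segmentation.
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    # Assumption: Jaccard overlap of the two word sets as the similarity measure.
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 1.0  # two empty items are treated as identical
    return len(ta & tb) / len(ta | tb)


full = "Nanning City, Guangxi Zhuang Autonomous Region"
short = "Nanning City, Guangxi"
print(jaccard(full, short))  # 0.5 (3 shared words out of 6 distinct words)
```

A purely lexical score like this one only approximates semantic equivalence; it is shown here to make the notion of a similarity measure between data items concrete.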


Abstract

The invention relates to a text data comparison method and device in the technical field of information processing. The method comprises: obtaining the text data item sets in two data dictionary tables; performing word segmentation on the two text data item sets to obtain the Chinese word set of each element; calculating the similarity measure between the elements of the two text data item sets; preprocessing the similarity measures with a preset similarity ratio threshold to obtain a similarity measure matrix; and, by abstracting and modeling the dictionary table comparison problem, converting the comparison of the two text data item sets into the problem of finding an optimal matching in a bipartite graph, which is solved with the KM algorithm. The method realizes automatic, semantics-based comparison and analysis of dictionary table data, effectively relieves the work pressure of manual comparison during data reorganization, and provides a new approach to the automatic processing of data comparison.
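The thresholding and matching steps described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the similarity matrix is already computed, assumes that scores below the threshold are set to zero, and pads the matrix to a square so that a standard Kuhn-Munkres (KM) routine applies. The names `km_assignment` and `compare_items` are hypothetical.

```python
def km_assignment(weights):
    """Kuhn-Munkres on a square matrix; returns the column chosen for each row
    so that the total weight is maximal."""
    n = len(weights)
    cost = [[-w for w in row] for row in weights]  # maximise by minimising negated weights
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (n + 1)  # row and column potentials
    p, way = [0] * (n + 1), [0] * (n + 1)    # p[j]: row matched to column j (1-based)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (n + 1), [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the augmenting path back to the virtual column 0
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [0] * n
    for j in range(1, n + 1):
        assign[p[j] - 1] = j - 1
    return assign


def compare_items(sim, threshold):
    """Threshold a similarity matrix, then match rows (table A) to columns (table B)."""
    rows, cols = len(sim), len(sim[0])
    size = max(rows, cols)
    # Assumption: scores below the threshold become 0, meaning "no candidate pair".
    w = [[0.0] * size for _ in range(size)]
    for i in range(rows):
        for j in range(cols):
            if sim[i][j] >= threshold:
                w[i][j] = sim[i][j]
    assign = km_assignment(w)
    matches = [(i, assign[i]) for i in range(rows)
               if assign[i] < cols and w[i][assign[i]] > 0]
    matched_b = {j for _, j in matches}
    unmatched_a = [i for i in range(rows) if i not in {m[0] for m in matches}]
    unmatched_b = [j for j in range(cols) if j not in matched_b]
    return matches, unmatched_a, unmatched_b


# Illustrative similarity scores (made up for this sketch): 2 items in A, 3 in B.
sim = [[0.9, 0.2, 0.0],
       [0.3, 0.8, 0.1]]
matches, only_a, only_b = compare_items(sim, threshold=0.5)
print(matches, only_a, only_b)  # [(0, 0), (1, 1)] [] [2]
```

Items left unmatched on either side are the candidates for annotation as added, removed or changed entries. A global assignment is used instead of a greedy best-match per row because a greedy choice can take a pair that blocks a better overall pairing.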

Description

Technical field

[0001] The present application relates to the technical field of information processing, and in particular, to a text data comparison method and device.

Background

[0002] With the reduction of data collection and storage costs, the amount of data shows explosive growth, but at the same time, more and more requirements are put forward for data association and integration, and data association and integration face more and more challenges. As a key bridge between original data and high-value data, data integration and reorganization plays an increasingly important role in data-based statistical analysis, and has also become an increasingly basic and heavy task in data processing.

[0003] The data dictionary table is the basic data that defines the metadata such as data items in the current database system, and is the key information for the application and understanding of the entire database system. Therefore, the comparison, association and conne...


Application Information

IPC(8): G06F16/33, G06F16/36, G06F40/242, G06F40/289
CPC: G06F16/3344, G06F16/374, G06F40/242, G06F40/289
Inventor: 张万鹏, 张虎, 谷学强, 胡丽, 项凤涛, 王超, 杨景照, 张煜
Owner NAT UNIV OF DEFENSE TECH