Tree diagram-based data similarity matching method and apparatus

A data similarity matching technology, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It takes into account the effects of multiple variables on similarity.

Inactive Publication Date: 2018-07-10
GUANGDONG KINGPOINT DATA SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] Therefore, similarity algorithms have been vigorously developed. Because the two parties compared by a similarity algorithm within an ontology are placed under the same dendrogram, dendrogram-based data similarity matching methods have been widely applied. However, existing similarity matching methods have a narrow range of application and low accuracy, so a dendrogram-based data similarity matching method and device with higher accuracy is urgently needed.

Examples

Embodiment 1

[0060] Figure 2 is a flowchart of the dendrogram-based data similarity matching method of the present invention. As shown there, the method includes:

[0061] Step S1, for the data requiring similarity matching, establishing a dendrogram in which the data are some of the nodes;

[0062] If similarity matching needs to be performed on the data, there must be a connection between the data items. In other words, there exists a dendrogram in which these data items are some of the nodes and which describes the relationships between them. Therefore, such a dendrogram can be established or found.
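Step S1 can be illustrated with a minimal Python sketch. The source does not specify a data structure, so the parent/children maps, the function name `build_dendrogram`, and the small food ontology are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_dendrogram(edges):
    """Build a tree from (parent, child) name pairs.

    Returns (parent, children): parent maps each non-root node to its
    parent, children maps each branch node to its list of child nodes.
    """
    parent = {}
    children = defaultdict(list)
    for parent_name, child_name in edges:
        if child_name in parent:
            # In a tree every node has at most one parent.
            raise ValueError(f"{child_name!r} already has a parent")
        parent[child_name] = parent_name
        children[parent_name].append(child_name)
    return parent, children


# Hypothetical ontology: "apple" and "pear" are the data to be matched,
# and the remaining nodes describe how they relate to each other.
EDGES = [
    ("thing", "food"),
    ("food", "fruit"),
    ("food", "vegetable"),
    ("fruit", "apple"),
    ("fruit", "pear"),
]
parent, children = build_dendrogram(EDGES)
print(parent["apple"], children["food"])
```

The data to be matched only need to appear among the nodes; the other nodes supply the context used by the later similarity steps.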

[0063] Step S2, based on the amount of information, perform similarity calculation on the data;

[0064] Each data item has an information content, and the similarity between two data items can be calculated from their information content.
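The excerpt does not state how information content is measured or how it is turned into a similarity. The sketch below is therefore only one possible reading: it assumes an intrinsic information content (the negative log of the share of nodes lying in a node's subtree) and a Lin-style ratio over the lowest common ancestor. The `PARENT` map and all function names are hypothetical.

```python
import math

# Hypothetical dendrogram, stored as child -> parent.
PARENT = {
    "food": "thing",
    "fruit": "food",
    "vegetable": "food",
    "apple": "fruit",
    "pear": "fruit",
}


def ancestors(node):
    """Return [node, parent, ..., root]."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def information_content(node):
    """Assumed measure: -log of the fraction of nodes in node's subtree."""
    nodes = set(PARENT) | set(PARENT.values())
    subtree_size = sum(1 for n in nodes if node in ancestors(n))
    return -math.log(subtree_size / len(nodes))


def lowest_common_ancestor(a, b):
    seen = set(ancestors(a))
    return next(n for n in ancestors(b) if n in seen)


def ic_similarity(a, b):
    """Assumed Lin-style similarity: shared information over total information."""
    total = information_content(a) + information_content(b)
    if total == 0:
        return 1.0  # both nodes are the root
    shared = information_content(lowest_common_ancestor(a, b))
    return 2 * shared / total


print(ic_similarity("apple", "pear"), ic_similarity("apple", "vegetable"))
```

Under these assumptions two leaves under the same specific parent score higher than two nodes whose only shared ancestor is a general concept.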

[0065] Step S3, performing similarity calculation on the data based on attributes;

[0066]...

Embodiment 2

[0071] This embodiment builds on the dendrogram-based data similarity matching method described above. The difference is that, as shown in Figure 3, the method also includes:

[0072] Step S4, performing similarity calculation on the data based on the semantic distance;

[0073] Semantic distance refers to the number of edges on the shortest path connecting the two corresponding nodes in the ontology tree. The semantic distance between two data items in the same dendrogram is related to their similarity, so the similarity between two data items can be calculated according to the semantic distance.

[0074] In this way, semantic distance is added as a further variable affecting the similarity. This increases the number of variables considered in the calculation and improves the accuracy of the similarity.
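The semantic distance defined in [0073] can be sketched in Python as an edge count through the lowest common ancestor, which in a tree is the shortest path. The mapping from distance to similarity, `1 / (1 + d)`, is an assumption made here for illustration; the excerpt does not give the formula. The `PARENT` map and function names are hypothetical.

```python
# Hypothetical dendrogram, stored as child -> parent.
PARENT = {
    "food": "thing",
    "fruit": "food",
    "vegetable": "food",
    "apple": "fruit",
    "pear": "fruit",
}


def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def semantic_distance(a, b):
    """Number of edges on the shortest path between a and b in the tree."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a))}
    for steps_from_b, n in enumerate(path_to_root(b)):
        if n in steps_from_a:
            # n is the lowest common ancestor of a and b.
            return steps_from_a[n] + steps_from_b
    raise ValueError("nodes are not in the same dendrogram")


def distance_similarity(a, b):
    """Assumed mapping: similarity falls as the edge count grows."""
    return 1.0 / (1.0 + semantic_distance(a, b))


print(semantic_distance("apple", "pear"))       # 2
print(semantic_distance("apple", "vegetable"))  # 3
```

Any monotonically decreasing function of the distance would serve the same purpose; the reciprocal form is used only because it stays within (0, 1].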

Embodiment 3

[0076] This embodiment builds on the dendrogram-based data similarity matching method described above. The difference is that, as shown in Figure 4, the method also includes:

[0077] Step S5, performing similarity calculation on the data based on the semantic density.

[0078] Semantic density refers to the number of sibling nodes of a data item. Different branch nodes in the dendrogram have different numbers of child nodes. The greater the local density around a node in the dendrogram, the more finely the node's concept is refined, and the higher the corresponding semantic similarity. Therefore, the similarity between two data items can be calculated according to the semantic density.

[0079] In this way, semantic density is added as a further variable affecting the similarity, which increases the number of variables considered when calculating the similari...
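The sibling count defined in [0078] is straightforward to compute; how it is converted into a similarity is not given in the excerpt. The sketch below assumes one simple normalization (the mean sibling count of the two nodes divided by the largest sibling count in the tree). The `CHILDREN` map and the function names are hypothetical.

```python
# Hypothetical dendrogram, stored as branch node -> list of children.
CHILDREN = {
    "thing": ["food"],
    "food": ["fruit", "vegetable"],
    "fruit": ["apple", "pear", "plum"],
    "vegetable": ["carrot"],
}
PARENT = {child: p for p, kids in CHILDREN.items() for child in kids}


def semantic_density(node):
    """Number of sibling nodes, as defined in [0078]; the root has none."""
    if node not in PARENT:
        return 0
    return len(CHILDREN[PARENT[node]]) - 1


def density_similarity(a, b):
    """Assumed normalization: mean sibling count over the tree-wide maximum."""
    max_density = max(len(kids) - 1 for kids in CHILDREN.values())
    if max_density == 0:
        return 0.0  # no node in the tree has any sibling
    return (semantic_density(a) + semantic_density(b)) / (2 * max_density)


print(semantic_density("apple"), semantic_density("carrot"))
print(density_similarity("apple", "pear"), density_similarity("apple", "carrot"))
```

With this normalization, nodes in a finely subdivided region of the dendrogram receive a higher score than nodes in a sparse region, which matches the qualitative statement in [0078].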

Abstract

The invention discloses a tree diagram-based data similarity matching method and apparatus. The method comprises: S1, for data requiring similarity matching, establishing a tree diagram in which the data are some of the nodes; S2, performing similarity calculation on the data based on information quantity; S3, performing similarity calculation on the data based on attributes; and S6, performing a weighted calculation on the data similarities to obtain an overall similarity. The apparatus comprises a corresponding tree diagram establishment unit, information quantity-based similarity calculation unit, attribute-based similarity calculation unit and overall similarity calculation unit. By combining multiple factors that influence data similarity, a more accurate overall similarity can be obtained.
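The weighted calculation of step S6 can be sketched as a normalized weighted sum. The abstract does not state the weights or whether they are normalized, so the weight values, the score values, and the function name `overall_similarity` below are hypothetical.

```python
def overall_similarity(scores, weights):
    """Weighted average of per-factor similarities.

    scores and weights are dicts keyed by factor name. Dividing by the
    weight total (an assumption) keeps the result in the range of the inputs.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same factors")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(scores[name] * weights[name] for name in scores)
    return weighted / total_weight


# Illustrative values only; neither the scores nor the weights come from the source.
scores = {"information": 0.5, "attribute": 1.0}
weights = {"information": 3.0, "attribute": 1.0}
print(overall_similarity(scores, weights))  # 0.625
```

Further factors, such as the semantic distance and semantic density of the later embodiments, would enter the same sum as additional keys.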

Description

Technical field

[0001] The invention relates to the field of data similarity calculation, in particular to a method and device for matching data similarity based on a dendrogram.

Background

[0002] The purpose of the Semantic Web is to enable computers to automatically process and integrate data from different data sources. Ontology is the basis for information sharing and exchange at the semantic level and is a key technology for realizing the Semantic Web. Through structured descriptions, it provides a common understanding of the knowledge in a given field and helps people communicate accurately with computers at the level of grammar or semantics. It is the semantic basis of human-computer communication. Because ontology builders differ in knowledge and background, the same semantic concept may use different identifiers or exist in different forms in different ontologies.

[0003] The expansi...

Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F16/2246, G06F16/2462, G06F40/30
Inventor: 杨婉李青海黄超潘宇翔王平张晓亭
Owner: GUANGDONG KINGPOINT DATA SCI & TECH CO LTD