Rapid method for measuring formal concept similarity with both general semantics and domain semantics

A technique combining general semantics and formal concepts, applied in the field of information retrieval. It addresses two problems: the semantic information of a knowledge base cannot cover the domain information contained in the data set, and computing connotation semantic similarity is complicated. It achieves a fast measurement process with a notable effect.

Active Publication Date: 2020-09-25
HARBIN ENG UNIV +1
Cites: 10; Cited by: 0

AI Technical Summary

Problems solved by technology

[0011] The purpose of the present invention is to solve problems in existing methods for judging formal concept similarity when text information is acquired using FCA technology: the semantic information of the knowledge base cannot cover the domain information contained in the data set, and the process of calculating connotation semantic similarity is too complicated.

Examples

Specific Embodiment 1

[0024] Embodiment 1: this embodiment is described with reference to Figure 1.

[0025] Definition of Terms

[0026] Formal background (the formal context of standard FCA terminology): a formal background F is a triple F=(O,A,R), where O is the object set, whose elements o are called objects; A is the attribute set, whose elements a are called attributes; and R is a binary relation between O and A defined on their Cartesian product, namely $R \subseteq O \times A$. Given two sets $E \subseteq O$ and $I \subseteq A$, the derivation operators are $E' = \{a \in A \mid \forall o \in E,\ (o,a) \in R\}$ and $I' = \{o \in O \mid \forall a \in I,\ (o,a) \in R\}$. When a formal background takes documents as objects and the words in the documents as attributes, it is called a textual formal background;

[0027] Formal concept: a formal concept c on a formal background F=(O,A,R) is a pair c=(E,I), where $E \subseteq O$ and $I \subseteq A$ satisfy E'=I and I'=E; E is the extension of concept c, and I is the connotation (intent) of concept c;

[0028] Concept lattice: the set of all concepts c on the formal background F and the inclusion relationship between the extension ...
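The definitions above can be illustrated with a small sketch. It assumes a toy textual formal background in which documents are objects and words are attributes; the document names, words, and function names are hypothetical and are not taken from the patent. It computes E' and I' and checks the condition E'=I, I'=E.

```python
# Toy textual formal background (hypothetical data): documents are objects,
# the words they contain are attributes.
CONTEXT = {
    "d1": {"lattice", "concept"},
    "d2": {"lattice", "concept", "semantic"},
    "d3": {"semantic", "vector"},
}
ALL_ATTRIBUTES = set().union(*CONTEXT.values())


def common_attributes(objects):
    """E': the attributes shared by every object in E."""
    result = set(ALL_ATTRIBUTES)
    for obj in objects:
        result &= CONTEXT[obj]
    return result


def common_objects(attributes):
    """I': the objects that possess every attribute in I."""
    wanted = set(attributes)
    return {obj for obj, attrs in CONTEXT.items() if wanted <= attrs}


def is_formal_concept(extent, intent):
    """(E, I) is a formal concept exactly when E' = I and I' = E."""
    return (common_attributes(extent) == set(intent)
            and common_objects(intent) == set(extent))


print(is_formal_concept({"d1", "d2"}, {"lattice", "concept"}))
print(is_formal_concept({"d1"}, {"lattice", "concept"}))
```

In this toy data, the pair ({d1, d2}, {lattice, concept}) satisfies both conditions, while ({d1}, {lattice, concept}) does not, because d2 also contains both words.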

Specific Embodiment 2

[0040] In the rapid method for measuring formal concept similarity with both general semantics and domain semantics described in this embodiment, step 1 obtains, for the constructed concept lattice, the inverse concept frequency of all members and the attribute frequency of all attributes, and from them the importance of each member. The specific process includes the following steps:

[0041] Each concept in the concept lattice contains several members. If a member is contained in many concepts, the member is relatively common. Conversely, if a member is contained in only a few concepts, the member is more specific, and if two concepts share such a member, the two concepts are likely to be similar;

[0042] For the concept lattice L, calculate the inverse concept frequency ICF of each member:

[0043] The inverse concept frequency $ICF_i$ of member $e_i$ in concept lattice L is calculated by the formula:

[0044]

[0045] where N is the total n...
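The ICF formula itself is not reproduced in this text, so the following is only a sketch of the idea in [0041]. It assumes an IDF-style form, log(N / n_i), where N is taken to be the total number of concepts in the lattice and n_i the number of concepts containing member e_i. Both the formula and the function name are assumptions, not the patent's definition.

```python
import math


def inverse_concept_frequency(concepts):
    """Return an ICF value per member for a list of concepts.

    ASSUMPTION: ICF_i = log(N / n_i), by analogy with inverse document
    frequency. The source's actual formula is not available here.
    Each concept is given as the set of members it contains.
    """
    total = len(concepts)  # N: number of concepts in the lattice (assumed)
    counts = {}
    for members in concepts:
        for member in members:
            counts[member] = counts.get(member, 0) + 1  # n_i
    return {member: math.log(total / n) for member, n in counts.items()}


# Hypothetical lattice with four concepts.
lattice = [{"d1", "d2"}, {"d2"}, {"d2", "d3"}, {"d1", "d2", "d3"}]
icf = inverse_concept_frequency(lattice)
print(icf)
```

Under this assumed form, a member that appears in every concept gets a value of zero, and rarer members get larger values, which matches the intuition that sharing a specific member is stronger evidence of similarity.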

Specific Embodiment 3

[0055] In the rapid method for measuring formal concept similarity with both general semantics and domain semantics described in this embodiment, step 2 corrects the cosine similarity of the two connotation vectors using the number of times the words in the two connotations belong to the same topic class, and thereby obtains a semantic similarity of the two connotations that covers both general semantics and domain semantics. The specific process includes the following steps:

[0056] For two formal concepts $c_1=(E_1,I_1)$ and $c_2=(E_2,I_2)$, where $E_1$, $I_1$ are the extension and connotation of concept $c_1$ and $E_2$, $I_2$ are the extension and connotation of concept $c_2$, the two connotations can be expressed as $I_1=\{t_1,t_2,\ldots,t_a,t_{x1},t_{x2},\ldots,t_{xb}\}$ and $I_2=\{t_1,t_2,\ldots,t_a,t_{y1},t_{y2},\ldots,t_{yb}\}$. The elements of $I_1$ and $I_2$ are the words they contain. Evidently, $I_1$ and $I_2$ share $a$ ($a \geq 0$) identical words, and when $a=0$, the tw...
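As an illustration of the uncorrected part of this step, the sketch below takes the mean of the word vectors in each connotation as the connotation vector and computes the cosine similarity of the two vectors. The two-dimensional embeddings and the function names are hypothetical; in practice the vectors would come from a model trained on a public corpus. The topic-class correction is deliberately omitted, because its formula is not reproduced in this text.

```python
import math


def mean_vector(words, embeddings):
    """Average the word vectors of a connotation into one vector."""
    vectors = [embeddings[word] for word in words]
    dim = len(vectors[0])
    return [sum(vec[k] for vec in vectors) / len(vectors) for k in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either is a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical 2-D word vectors, for illustration only.
toy_embeddings = {
    "lattice": [1.0, 0.0],
    "concept": [0.0, 1.0],
    "semantic": [1.0, 1.0],
}
intent_1 = ["lattice", "concept"]
intent_2 = ["semantic"]
similarity = cosine_similarity(
    mean_vector(intent_1, toy_embeddings),
    mean_vector(intent_2, toy_embeddings),
)
print(similarity)
```

Averaging keeps the connotation vector the same dimension as the word vectors regardless of how many words a connotation holds, which is presumably one reason the measurement stays fast.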

Abstract

The invention discloses a rapid method for measuring formal concept similarity with both general semantics and domain semantics. It is mainly used to measure formal concept similarity quickly in text information retrieval based on the FCA method, and belongs to the technical field of information retrieval. The method addresses two problems of existing formal concept similarity methods: they cannot capture the domain information covered by the data set, and the process of calculating connotation semantic similarity is too complex. In the method, topic clustering is performed on the data set, and the number of times words fall into the same category is counted; a similarity of two concepts based on member importance is computed from the attribute frequency and inverse concept frequency of the concept lattice members; each attribute in a connotation is represented by a word vector based on a public corpus, and the mean of the word vectors is taken as the connotation vector; the similarity of the connotation vectors is corrected using the frequency with which the words share a category, giving a similarity of the two connotations that covers both general and domain semantics; and the similarity of the two formal concepts is obtained by combining the member-importance similarity with the connotation semantic similarity.

Description

Technical field

[0001] The invention belongs to the technical field of information retrieval, and in particular relates to a method for measuring formal concept similarity for text information retrieval.

Background art

[0002] With the rapid development of modern science and technology, the amount of information in society has increased sharply, and social informatization has become an irresistible trend of the times. Information has therefore become an important strategic resource, and information acquisition is particularly important. Within this mass of information, text has always been the main content and occupies a very important position.

[0003] In the process of information acquisition, formal concept analysis (FCA) has become an important information acquisition technology, and as the basic capabilities of software and hardware have improved, it has been used in more and more fields. The FCA-based information acquisition technology defines the d...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/30, G06F40/289, G06F40/216, G06F16/35, G06K9/62
CPC: G06F40/30, G06F40/289, G06F40/216, G06F16/35, G06F18/22
Inventors: 王福刚, 王红滨, 周连科, 陈田田, 张梅恒
Owner: HARBIN ENG UNIV