A text semantic similarity measurement method based on pointwise mutual information

A semantic similarity measurement technology applied in the field of text topic clustering. It addresses the high retrieval cost and low accuracy of information retrieval and achieves more thorough semantic extraction.

Active Publication Date: 2019-06-04
SHANXI UNIV
7 Cites, 5 Cited by

AI Technical Summary

Problems solved by technology

[0004] To address the technical problems of low accuracy in text clustering and information retrieval and high retrieval cost, the present invention provides a text semantic similarity processing method based on pointwise mutual information, ...

Embodiment Construction

[0025] The following clearly and completely describes the technical solutions in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, but not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0026] The text semantic similarity measurement method based on pointwise mutual information in the present embodiment comprises the following steps:

[0027] Step 1: Preprocess the collected document data: extract the keywords contained in each document, remove the non-keywords, and retain the documents that contain keywords;

[0028] Step 2: Count the frequency of each keyword and sort the keywords in descending order, using keyword frequency as the primary sort key;

[0029] ...
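
Steps 1 and 2 can be illustrated with the minimal sketch below. It is not the patent's implementation: it assumes a predefined keyword vocabulary (the excerpt does not say how keywords are identified), lowercase whitespace tokenization, and that "frequency" means the total number of occurrences across the collection, with ties broken alphabetically. The function names `extract_keywords` and `rank_keywords` and the toy documents are hypothetical.

```python
from collections import Counter


def extract_keywords(documents, vocabulary):
    """Step 1 (sketch): keep only vocabulary keywords in each document and
    drop documents that contain no keywords at all."""
    vocab = {w.lower() for w in vocabulary}
    filtered = []
    for text in documents:
        # Hypothetical preprocessing: lowercase + whitespace tokenization.
        tokens = [t for t in text.lower().split() if t in vocab]
        if tokens:
            filtered.append(tokens)
    return filtered


def rank_keywords(keyword_docs):
    """Step 2 (sketch): count keyword occurrences and sort by frequency,
    highest first (frequency is the primary key; the word breaks ties)."""
    counts = Counter(t for doc in keyword_docs for t in doc)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    docs = [
        "text clustering uses keyword similarity",
        "information retrieval ranks text by similarity",
        "the weather is nice today",  # no keywords, so it is dropped
    ]
    vocab = ["text", "clustering", "similarity", "retrieval", "keyword"]
    kept = extract_keywords(docs, vocab)
    print(kept)
    print(rank_keywords(kept))
```

The excerpt does not say whether keywords are counted by raw occurrences or by the number of documents containing them; counting `set(doc)` instead of `doc` in `rank_keywords` would switch the sketch to document frequency.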

Abstract

The invention belongs to the technical field of text topic clustering, and in particular relates to a text semantic similarity measurement method based on pointwise mutual information. Building on a co-occurrence latent semantic vector space model, the method uses pointwise mutual information to further extract the latent semantic similarity relationships among keywords, so that two keywords that originally have no co-occurrence relationship can still be related. By constructing keyword co-occurrence vectors, the method further extracts and mines the latent semantic similarity relationships between keywords, making semantic extraction more thorough. A text semantic similarity measurement method based on pointwise mutual information is thus established; applying it can effectively improve the precision of text clustering and information retrieval and reduce retrieval cost.
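
The abstract's central idea, that pointwise mutual information can relate two keywords even when they never co-occur, can be illustrated with the small sketch below. It is a standard distributional-semantics construction and not necessarily the patent's exact formula. Assumptions not stated in the source: two keywords co-occur when they appear in the same document; probabilities are estimated from document counts, so PMI(a, b) = log2(N * n(a,b) / (n(a) * n(b))), where N is the number of documents; negative values are clipped to zero (positive PMI); each keyword is represented by its vector of PMI values against all keywords; and keyword similarity is the cosine of those vectors. The names `ppmi_matrix` and `cosine` and the toy corpus are hypothetical.

```python
import math
from itertools import combinations


def ppmi_matrix(keyword_docs):
    """Build a positive-PMI co-occurrence vector for every keyword.
    Co-occurrence = both keywords appear in the same document (assumption)."""
    docs = [set(d) for d in keyword_docs]
    n = len(docs)
    vocab = sorted(set().union(*docs))
    df = {w: sum(w in d for d in docs) for w in vocab}  # n(a): documents containing a
    co = {}                                              # n(a, b): documents containing both
    for d in docs:
        for a, b in combinations(sorted(d), 2):
            co[(a, b)] = co.get((a, b), 0) + 1
    vectors = {}
    for a in vocab:
        row = []
        for b in vocab:
            if a == b:
                row.append(0.0)  # ignore self-association
                continue
            joint = co.get((min(a, b), max(a, b)), 0)
            if joint == 0:
                row.append(0.0)  # log(0) is undefined; treat as no association
                continue
            pmi = math.log2(n * joint / (df[a] * df[b]))
            row.append(max(pmi, 0.0))  # clip negative PMI to zero (PPMI)
        vectors[a] = row
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    # Hypothetical toy corpus: "car" and "automobile" never share a document.
    docs = [
        ["car", "engine"], ["automobile", "engine"],
        ["car", "wheel"], ["automobile", "wheel"],
        ["fruit", "apple"], ["fruit", "banana"],
    ]
    vocab, vectors = ppmi_matrix(docs)
    print(round(cosine(vectors["car"], vectors["automobile"]), 3))  # 1.0
    print(round(cosine(vectors["car"], vectors["fruit"]), 3))       # 0.0
```

In this toy corpus `car` and `automobile` have no direct co-occurrence, yet their PPMI vectors coincide because both co-occur with `engine` and `wheel`. This is the kind of indirect relationship the abstract describes. Text-level similarity would then be built from such keyword similarities, but the excerpt does not say how, so that step is not shown.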

Description

Technical field

[0001] The invention belongs to the technical field of text topic clustering. It further extracts the latent semantic similarity relationships between keywords by using pointwise mutual information and establishes a text semantic similarity measurement method based on pointwise mutual information. Applying this method will effectively improve the accuracy of text clustering and information retrieval and reduce retrieval costs.

Background art

[0002] With the rapid development and popularization of computer network technology, a large amount of written text information is converted into electronic text for storage and transmission. As information is generated and transmitted ever faster, an information explosion occurs, and human society enters the era of big data. In the era of big data and information explosion, literature resources have been greatly enriched, resource retrieval accuracy has decreased, and retr...
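
For reference, the standard definition of pointwise mutual information between two keywords a and b is given below, with probabilities estimated from document counts: N is the number of documents, n_a and n_b the numbers of documents containing a and b, and n_ab the number containing both. The estimator and the base-2 logarithm are assumptions (the excerpt does not state the patent's choices; the base only rescales the values), and the counts in the worked line are hypothetical: N = 100, n_a = 20, n_b = 10, n_ab = 5.

```latex
\mathrm{PMI}(a,b) = \log_2 \frac{P(a,b)}{P(a)\,P(b)},
\quad P(a) = \frac{n_a}{N},\quad P(b) = \frac{n_b}{N},\quad P(a,b) = \frac{n_{ab}}{N}

\mathrm{PMI}(a,b) = \log_2 \frac{5/100}{(20/100)\,(10/100)} = \log_2 \frac{0.05}{0.02} = \log_2 2.5 \approx 1.32
```

A positive value means the pair co-occurs more often than independence would predict, zero means the pair behaves as if independent, and a negative value means it co-occurs less often than expected.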

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/22, G06F16/33, G06F16/35
Inventors: Niu Fenggao (牛奉高), Zhao Xia (赵霞)
Owner: SHANXI UNIV