A text semantic similarity measurement method based on pointwise mutual information

A semantic similarity measurement technique, applied in the field of text topic clustering, that addresses the low accuracy and high cost of information retrieval and achieves more thorough semantic extraction.

Active Publication Date: 2019-06-04

SHANXI UNIV

Cites: 7 | Cited by: 5

AI Technical Summary

Problems solved by technology

[0004] To address the technical problems of low accuracy and high retrieval cost in text clustering and information retrieval, the present invention provides a text semantic similarity measurement method based on pointwise mutual information.

Embodiment Construction

[0025] The following clearly and completely describes the technical solutions in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, but not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0026] The text semantic similarity measurement method based on pointwise mutual information in the present embodiment comprises the following steps:

[0027] Step 1: Extract the keywords contained in the collected document data: preprocess the collected documents, remove the non-keywords, and retain the documents that contain keywords;

[0028] Step 2: Count the frequency of each keyword and sort the keywords in descending order, using keyword frequency as the primary sort key;

[0029] ...
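Steps 1 and 2 above can be sketched roughly as follows. This is only an illustration under stated assumptions: the stop-word list, the whitespace tokenizer, and the function names `extract_keywords` and `keyword_frequencies` are hypothetical, since the excerpt does not specify how preprocessing or keyword selection is actually performed. The later steps are elided in the source and are not reconstructed here.

```python
from collections import Counter

# Hypothetical stop-word list; the patent excerpt does not give its own rules.
STOP_WORDS = {"the", "a", "of", "and", "is", "in", "to"}


def extract_keywords(text):
    """Step 1 (assumed form): lowercase, split on whitespace, drop stop words."""
    tokens = (tok.strip(".,;:!?()") for tok in text.lower().split())
    return [tok for tok in tokens if tok and tok not in STOP_WORDS]


def keyword_frequencies(documents):
    """Step 2: count keyword frequency and sort by descending frequency."""
    counts = Counter()
    kept_docs = []
    for doc in documents:
        keywords = extract_keywords(doc)
        if keywords:  # retain only documents that contain keywords
            kept_docs.append(keywords)
            counts.update(keywords)
    # Ties are broken alphabetically so the ordering is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return kept_docs, ranked


docs = ["Text clustering of the text", "the a of", "Clustering and retrieval"]
kept, ranked = keyword_frequencies(docs)
```

Here the second document contains only stop words, so it is dropped, and `ranked` lists the remaining keywords from most to least frequent.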

Abstract

The invention belongs to the technical field of text topic clustering and relates in particular to a text semantic similarity measurement method based on pointwise mutual information. Building on a co-occurrence latent semantic vector space model, the method uses pointwise mutual information to further extract the potential semantic similarity relationships among keywords, so that two keywords that originally had no co-occurrence relation can still be related. By constructing keyword co-occurrence vectors, the method further extracts and mines the potential semantic similarity between keywords, making semantic extraction more thorough. A text semantic similarity measurement method based on pointwise mutual information is thereby established, and its application can effectively improve the precision of text clustering and information retrieval and reduce the retrieval cost.
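To make the idea in the abstract concrete, here is a minimal sketch of how pointwise mutual information can relate two keywords that never co-occur. It is not the patent's own formula, which the excerpt does not give. The assumptions are: co-occurrence is counted at document level, PMI is computed as log(p(a,b) / (p(a) p(b))), negative and unobserved values are clipped to zero (positive PMI), and keyword vectors are compared by cosine similarity. The function names `pmi_table`, `pmi_vector`, and `cosine` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(keyword_docs):
    """PMI for every keyword pair that co-occurs in at least one document."""
    n_docs = len(keyword_docs)
    doc_freq = Counter()
    pair_freq = Counter()
    for keywords in keyword_docs:
        unique = sorted(set(keywords))
        doc_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    pmi = {}
    for (a, b), n_ab in pair_freq.items():
        # log( p(a,b) / (p(a) * p(b)) ) with document-level probabilities
        value = math.log((n_ab * n_docs) / (doc_freq[a] * doc_freq[b]))
        pmi[(a, b)] = pmi[(b, a)] = value
    return pmi


def pmi_vector(word, vocab, pmi):
    """Positive-PMI co-occurrence vector of `word` over the vocabulary."""
    return [max(pmi.get((word, other), 0.0), 0.0) for other in vocab]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Toy corpus: "similarity" and "clustering" never share a document,
# but both co-occur with "semantic" and "retrieval".
docs = [
    ["semantic", "similarity"], ["semantic", "clustering"],
    ["similarity", "retrieval"], ["clustering", "retrieval"],
    ["topic", "model"], ["topic", "model"],
]
pmi = pmi_table(docs)
vocab = sorted({w for d in docs for w in d})
sim = cosine(pmi_vector("similarity", vocab, pmi),
             pmi_vector("clustering", vocab, pmi))
unrelated = cosine(pmi_vector("similarity", vocab, pmi),
                   pmi_vector("topic", vocab, pmi))
```

In this toy corpus the pair ("similarity", "clustering") has no direct PMI entry, yet their vectors point in the same direction because they share the same neighbours, while "similarity" and "topic" share none.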

Description

Technical field

[0001] The invention belongs to the technical field of text topic clustering. It uses pointwise mutual information to further extract the potential semantic similarity relationships between keywords and establishes a text semantic similarity measurement method based on pointwise mutual information. Applying this method effectively improves the accuracy of text clustering and information retrieval and reduces retrieval costs.

Background art

[0002] With the rapid development and popularization of computer network technology, a large amount of written text has been converted into electronic text for storage and transmission. As information is generated and transmitted ever faster, an information explosion has occurred, and human society has entered the era of big data. In this era, literature resources have been greatly enriched, resource retrieval accuracy has decreased, and retr...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F17/27, G06F17/22, G06F16/33, G06F16/35

Inventor: 牛奉高, 赵霞

Owner: SHANXI UNIV
