Dialogue short text clustering method based on form and semantic similarity

A short text clustering technique based on semantic similarity, applied in text database clustering/classification, unstructured text data retrieval, instrumentation, etc. It addresses problems such as short texts that existing methods cannot handle well, texts with a single topic, and prominent input noise.

Inactive Publication Date: 2014-08-27

EAST CHINA NORMAL UNIV

5 Cites, 26 Cited by


Problems solved by technology

[0005] (2) Single topic: a short dialogue text usually discusses only one thing.

[0007] (4) Synonyms, mixed upper- and lower-case letters, and input errors are common.

For example, Sahami et al. submit short texts to a search engine and use the most relevant returned texts as auxiliary data for the corresponding short text. This method alleviates the information sparsity of short texts to some extent, but it requires a large amount of external auxiliary data, which greatly restricts its application scenarios.

Another common method is to use a knowledge base to expand the feature representation of words. For example, Hu et al. use the WordNet or Wikipedia knowledge bases to address the sparsity of feature information. This method can supplement feature information at the semantic level, but it cannot handle colloquial, error-prone short texts with severe noise well.



Embodiment Construction

[0031] The present invention can effectively cluster short dialogue texts. The following takes the dialogue text provided by Xiaoi Robot as an example and further describes the invention with reference to Figure 2.

[0032] The implementation process has two main stages. The first stage filters and preprocesses the original text data (text length filtering, Chinese word segmentation, and unification of English strings) and then uses a keyword extraction tool to obtain keywords and their weights. The second stage clusters the short text collection using the form of strings and the semantic similarity of words; this is the FS-STC clustering method.
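The first stage described above can be illustrated with a minimal sketch. It covers only text length filtering and unification of English strings; Chinese word segmentation and keyword extraction rely on external tools and are left out. The function names, the length bounds, and the choice of lowercasing as the unification rule are assumptions made for illustration, not details taken from the patent.

```python
import re


def normalize_english(text: str) -> str:
    """Lowercase every run of ASCII letters so 'WiFi' and 'wifi' compare equal.

    Lowercasing is an assumed unification rule, chosen for illustration.
    """
    return re.sub(r"[A-Za-z]+", lambda m: m.group(0).lower(), text)


def preprocess(texts, min_len=2, max_len=50):
    """Keep texts whose stripped length is within [min_len, max_len],
    then unify the English strings they contain.

    The bounds are hypothetical defaults, not values from the patent.
    """
    cleaned = []
    for t in texts:
        t = t.strip()
        if min_len <= len(t) <= max_len:
            cleaned.append(normalize_english(t))
    return cleaned


sample = ["Hi", "  How do I reset my WiFi Password  ", "?", "x" * 80]
result = preprocess(sample)
print(result)
```

In this sketch the one-character text and the 80-character text are dropped by the length filter, and the remaining texts are lowercased.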

[0033] 1) Preprocessing stage

[0034] If the text set to be clustered consists of short Chinese texts, the short texts must first be segmented with a word segmentation tool; the Chinese Academy of Sciences 2014 word segmentation tool is used to segment the t...


Abstract

The invention discloses a method for clustering short dialogue texts based on form and semantic similarity. Form similarity is computed from string edit distance, and semantic similarity is based on the HowNet and WordNet knowledge bases; the weights of the short texts and of their words are introduced when calculating short text similarity. The method addresses, to a certain extent, the irregular and mistyped noise, the synonyms, and the semantic gaps found in short dialogue texts, and consequently achieves a relatively large improvement over clustering methods based on bag-of-words vectors.
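The combination of form and semantic similarity can be sketched as follows. This is an illustration under stated assumptions, not the patent's formula, which does not appear in this excerpt: form similarity is taken as one minus the normalized Levenshtein distance, a tiny hand-written synonym table (`TOY_SYNONYMS`) stands in for the HowNet and WordNet lookups, the two scores are merged with `max`, and text similarity is a symmetric keyword-weighted average of best matches. All names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def form_sim(a: str, b: str) -> float:
    """Form similarity in [0, 1]: 1 minus the length-normalized edit distance."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


# Assumption: a toy table standing in for a HowNet/WordNet similarity lookup.
TOY_SYNONYMS = {frozenset(("buy", "purchase")): 0.9}


def semantic_sim(a: str, b: str) -> float:
    """Semantic similarity from the toy table; identical words score 1."""
    if a == b:
        return 1.0
    return TOY_SYNONYMS.get(frozenset((a, b)), 0.0)


def word_sim(a: str, b: str) -> float:
    """Assumed merge rule: take the larger of form and semantic similarity."""
    return max(form_sim(a, b), semantic_sim(a, b))


def text_sim(kw_a: dict, kw_b: dict) -> float:
    """Symmetric weighted average of each keyword's best match in the other text.

    kw_a and kw_b map keywords to weights, as a keyword extractor might return.
    """
    def directed(src, dst):
        total = sum(src.values())
        if total == 0 or not dst:
            return 0.0
        return sum(w * max(word_sim(k, d) for d in dst)
                   for k, w in src.items()) / total
    return 0.5 * (directed(kw_a, kw_b) + directed(kw_b, kw_a))


score = text_sim({"buy": 0.6, "ticket": 0.4}, {"purchase": 0.5, "tickets": 0.5})
print(round(score, 3))
```

In this sketch a misspelling such as "ticket" versus "tickets" is matched through form similarity, while "buy" versus "purchase" is matched through the synonym table, which is the kind of noise and synonymy the abstract says a bag-of-words representation misses.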

Description

Technical field

[0001] The invention belongs to the technical field of short text clustering and relates to a method for clustering short dialogue texts based on string edit distance similarity and the semantic similarity of words.

Background

[0002] With the rapid development of mobile communication and the mobile Internet, various human-machine intelligent dialogue systems have emerged, such as Siri, Google Now, and Xiaoi Robot. Taking Xiaoi Robot as an example, it has more than 100 million users and receives 10 billion dialogue visits every year, generating a large amount of valuable dialogue text data. These data are important sources for mining user interests and improving the knowledge bases of intelligent dialogue systems. Cluster analysis of these dialogue texts can group similar dialogue texts and form several important cluster centers, which can improve the efficiency of mining user interests and extracting knowledge t...


Application Information


IPC(8): G06F17/30

CPC: G06F16/35

Inventor: 胡琴敏, 陈国梁, 杨河彬, 罗念, 钟哲凡, 裴逸钧

Owner: EAST CHINA NORMAL UNIV
