Topic- and semantics-based keyword extraction method for dialogue corpora

A keyword extraction technique for dialogue corpora in the field of natural language processing. It addresses the low accuracy and poor effectiveness of keyword extraction methods that ignore semantics and topics.

Status: Inactive; Publication date: 2018-09-28
KUNMING UNIV OF SCI & TECH
Cites: 6; Cited by: 23

AI Technical Summary

Problems solved by technology

[0004] The present invention provides a topic- and semantics-based keyword extraction method for dialogue corpora. It addresses the problem that traditional keyword extraction algorithms for dialogue corpora ignore semantics and topics, which leads to low accuracy and poor effectiveness.

Examples

Experimental program
Comparison scheme
Effect test

Embodiment 1

[0094] Embodiment 1: As shown in Figures 1-4, the topic- and semantics-based keyword extraction method for dialogue corpora comprises the following specific steps:

[0095] Step 1. Crawl a Chinese corpus and talk-show dialogue material, then preprocess both the dialogue corpus and the Chinese corpus;
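
The embodiment does not say what preprocessing involves. As a minimal sketch, assume each utterance has already been word-segmented into space-separated tokens by some Chinese word segmenter, and that preprocessing then drops stopwords and punctuation. The stopword list and the function name `preprocess` are hypothetical.

```python
import re

# Hypothetical stopword list for illustration only; a real system would
# load a full Chinese stopword lexicon.
STOPWORDS = {"的", "了", "是", "我", "你", "吗", "啊", "呢"}

# Matches tokens made up entirely of punctuation or symbols (ASCII or CJK).
PUNCT = re.compile(r"^[\W_]+$")

def preprocess(utterances):
    """Assumes each utterance is already word-segmented with spaces.

    Returns one token list per utterance, with stopwords and punctuation
    removed; utterances left empty after filtering are dropped.
    """
    cleaned = []
    for utt in utterances:
        tokens = [t for t in utt.split()
                  if t not in STOPWORDS and not PUNCT.match(t)]
        if tokens:
            cleaned.append(tokens)
    return cleaned
```

For example, `preprocess(["我 喜欢 相声 ！", "你 呢 ？"])` keeps only `["喜欢", "相声"]`, because the second utterance contains nothing but stopwords and punctuation.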

[0096] Step 2. Jointly train on the preprocessed dialogue corpus and the Chinese corpus to obtain word vectors and a topic model;
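
Later steps compare words through these word vectors. For example, the keyword graph described in the Abstract uses the semantic similarity between keywords as edge weights. The source does not name the similarity measure. The sketch below assumes cosine similarity, a common choice for word embeddings, and uses toy 3-dimensional vectors invented for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length word vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Treat a zero vector as dissimilar to everything.
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy vectors invented for illustration; real ones come from Step 2 training.
vectors = {
    "相声": [0.9, 0.1, 0.0],
    "小品": [0.8, 0.2, 0.1],
    "天气": [0.0, 0.1, 0.9],
}
```

With these toy vectors, "相声" (crosstalk) is much closer to "小品" (comedy sketch) than to "天气" (weather).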

[0097] Step 3. For each word, combine three weights (the semantic weight, the semantic clustering weight, and the part-of-speech weight) into a final word weight, then extract keywords from the dialogue corpus by ranking words on this weight. This semantics-based extraction is referred to as the KSel method;
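
The embodiment does not state how each weight is computed or how the three are combined. The sketch below assumes the semantic and clustering weights are already available as numbers for each word. It looks up a hypothetical part-of-speech weight table and combines the three weights linearly, with coefficients `alpha`, `beta`, and `gamma` invented for illustration. The names `combined_weight` and `ksel_top_k` are hypothetical.

```python
# Hypothetical part-of-speech weights; the patent does not publish its values.
POS_WEIGHTS = {"n": 1.0, "v": 0.8, "a": 0.6}
DEFAULT_POS_WEIGHT = 0.3

def combined_weight(semantic, cluster, pos_tag, alpha=0.4, beta=0.4, gamma=0.2):
    """Assumed linear combination of the three per-word weights."""
    pos = POS_WEIGHTS.get(pos_tag, DEFAULT_POS_WEIGHT)
    return alpha * semantic + beta * cluster + gamma * pos

def ksel_top_k(words, k):
    """words: iterable of (word, semantic, cluster, pos_tag) tuples.

    Returns the k words with the highest combined weight.
    """
    scored = [(w, combined_weight(s, c, p)) for w, s, c, p in words]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [w for w, _ in scored[:k]]

candidates = [
    ("相声", 0.9, 0.8, "n"),
    ("觉得", 0.5, 0.3, "v"),
    ("非常", 0.4, 0.2, "d"),
    ("演员", 0.7, 0.6, "n"),
]
```

Here `ksel_top_k(candidates, 2)` returns the two nouns, "相声" and "演员". A product of the weights would be an equally plausible reading of "combining", so the combination rule should be treated as a placeholder.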

[0098] Step 4. Extract keywords with the TF-IDF method by computing term frequency and inverse document frequency;
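
The following is a minimal TF-IDF sketch. It assumes each document is a list of tokens, computes term frequency as a count normalized by document length, and uses the unsmoothed IDF log(N / df). The patent does not fix these variants, and `tfidf_keywords` is a hypothetical name. Under this formula, a word that occurs in every document receives an IDF of zero.

```python
import math
from collections import Counter

def tfidf_keywords(docs, doc_index, k):
    """docs: list of token lists. Returns the top-k words of docs[doc_index]."""
    n_docs = len(docs)
    # Document frequency: the number of documents containing each word.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    doc = docs[doc_index]
    tf = Counter(doc)
    scores = {}
    for word, count in tf.items():
        idf = math.log(n_docs / df[word])
        scores[word] = (count / len(doc)) * idf
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [w for w, _ in ranked[:k]]

docs = [
    ["相声", "演员", "相声", "今天"],
    ["今天", "天气", "好"],
    ["今天", "演员", "电影"],
]
```

In the first document, "相声" scores highest because it is frequent there and absent elsewhere. "今天" scores zero because it appears in all three documents.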

[0099] Step 5. The keywords extracted by the TF-IDF method and the KSel method are used as nodes, and the grap...

Abstract

The invention relates to a topic- and semantics-based keyword extraction method for dialogue corpora and belongs to the technical field of natural language processing. The method comprises the following steps. First, a preprocessed dialogue corpus and a Chinese corpus are trained jointly to obtain word vectors and a topic model. Second, the semantic weight, semantic clustering weight, and part-of-speech weight of each word are combined into a final word weight, and keywords are extracted according to this weight; this yields the semantics-based keywords of the dialogue corpus (the KSel method). Third, keywords are extracted by computing term frequency and inverse document frequency (the TF-IDF method). Finally, the keywords extracted by the KSel and TF-IDF methods are taken as nodes, a graph is built with the semantic similarity between nodes as edges, and the graph is iterated to obtain the final keywords. The method effectively addresses the neglect of semantics and topics by traditional algorithms while still taking word frequency into account.
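
The graph iteration rule is not spelled out in this excerpt. A TextRank-style weighted PageRank is a common way to iterate over such a keyword graph, and the sketch below assumes it. Nodes are the union of the KSel and TF-IDF keywords, and undirected edges carry the semantic similarity between two keywords. The damping factor `d = 0.85`, the convergence settings, and the function name `rank_keywords` are assumptions, not values from the patent.

```python
def rank_keywords(nodes, similarity, d=0.85, iterations=100, tol=1e-6):
    """TextRank-style weighted PageRank over an undirected keyword graph.

    nodes: candidate keywords (union of KSel and TF-IDF output).
    similarity: dict mapping (word_a, word_b) -> non-negative edge weight.
    Returns the nodes sorted by final score, highest first.
    """
    if not nodes:
        return []
    # Symmetric weighted adjacency; self-loops and zero weights are skipped.
    neighbors = {n: {} for n in nodes}
    for (a, b), w in similarity.items():
        if a != b and w > 0 and a in neighbors and b in neighbors:
            neighbors[a][b] = w
            neighbors[b][a] = w
    # Total edge weight at each node, used to normalize the node's vote.
    out_sum = {n: sum(neighbors[n].values()) for n in nodes}
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new_score = {}
        for n in nodes:
            # Each neighbor m passes on a share of its score proportional
            # to the weight of its edge to n.
            incoming = sum(w / out_sum[m] * score[m]
                           for m, w in neighbors[n].items())
            new_score[n] = (1 - d) + d * incoming
        delta = max(abs(new_score[n] - score[n]) for n in nodes)
        score = new_score
        if delta < tol:
            break
    return sorted(nodes, key=lambda n: score[n], reverse=True)

candidates = ["相声", "演员", "天气"]
edges = {("相声", "演员"): 0.9, ("相声", "天气"): 0.1}
```

With these toy edges, `rank_keywords(candidates, edges)` ranks "相声" first because it is connected to both other keywords. Taking the first k entries of the result would give the final keywords under this assumption.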

Description

Technical field

[0001] The invention relates to a method for extracting keywords from dialogue corpora based on topics and semantics, and belongs to the technical field of natural language processing.

Background

[0002] Keywords help improve the performance of natural language processing tasks such as text classification and information retrieval, so there has been extensive research in China and abroad on automatic keyword extraction and generation. In recent years, with the rapid growth of instant messaging, online shopping consultation, automatic question-answering systems, and similar social applications, a large volume of dialogue-style documents has accumulated. Keywords extracted from dialogues can be used to summarize, organize, and retrieve dialogue content, and also to support personalized user services, advertisement recommendation, and similar applications. Compared with traditional long texts, this type of data has the characteristics of dialogue, sho...

Application Information

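Patent type & authority: Applications (China)
IPC (8th edition): G06F17/27, G06F17/30
CPC: G06F40/289, G06F40/30
Inventors: 黄青松 (Huang Qingsong), 胡迁 (Hu Qian), 李帅彬 (Li Shuaibin), 郎冬冬 (Lang Dongdong), 郭勃 (Guo Bo), 宋莉娜 (Song Lina)
Owner: KUNMING UNIV OF SCI & TECH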