Short text classification method fusing knowledge graph and topic model

A short text classification technique that fuses a knowledge graph with a topic model, applicable to text database clustering/classification, unstructured text data retrieval, semantic analysis, and related areas. It addresses the short length of the texts, the unsatisfactory results of existing methods, and noise in text classification, and it mitigates classification inaccuracy.

Pending Publication Date: 2022-05-13
COMP NETWORK INFORMATION CENT CHINESE ACADEMY OF SCI

AI Technical Summary

Problems solved by technology

Existing text classification methods have not achieved satisfactory results on short texts.
Compared with long texts, short texts are brief and have weak topic clarity. After preprocessing such as stop-word removal, usually only a few to a dozen meaningful words remain, which makes it difficult to construct high-quality text features for classification.
Directly applying long-text classification methods to short texts produces sparse features, making accurate classification difficult.
In addition, polysemy in the text often introduces considerable noise into text classification.

Embodiment Construction

[0035] In order to make the above-mentioned features and advantages of the present invention more comprehensible, the following specific embodiments are described in detail in conjunction with the accompanying drawings.

[0036] An embodiment of the present invention provides a short text classification method that fuses a knowledge graph and a topic model, including the following steps:

[0037] 1. Short text preprocessing

[0038] Labeled short text data is used as the training set, and the short text data to be classified is used as the test set. The text is preprocessed by removing special symbols, removing stop words, and performing word segmentation.

[0039] Word segmentation: the jieba word segmentation tool is used to split each short text into an initial set of words.

[0040] Stop-word removal: a custom stop-word list is used to delete meaningless words from the word set, such as "的", "了", and punctuation marks.
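The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the `STOPWORDS` set, the `_SPECIAL` character filter, and the `preprocess` function are hypothetical names, and the whitespace fallback used when jieba is not installed is an assumption added only so the sketch runs anywhere.

```python
import re

try:
    import jieba  # third-party segmenter named in the source text
    _segment = jieba.lcut  # returns a list of tokens
except ImportError:
    # Assumption: fall back to whitespace splitting when jieba is unavailable.
    _segment = str.split

# Hypothetical custom stop-word list; a real one would be much longer.
STOPWORDS = {"的", "了"}

# Assumption: treat anything that is not a CJK ideograph, ASCII letter,
# or digit as a "special symbol" and replace it with a space.
_SPECIAL = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9]+")


def preprocess(text, stopwords=STOPWORDS, segment=_segment):
    """Remove special symbols, segment into words, then drop stop words."""
    cleaned = _SPECIAL.sub(" ", text)
    tokens = (tok.strip() for tok in segment(cleaned))
    # Drop empty tokens (left over from spaces) and stop words.
    return [tok for tok in tokens if tok and tok not in stopwords]
```

Passing `segment=str.split` makes the function usable on pre-spaced text, which is convenient for checking the stop-word and symbol handling independently of the segmenter.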

[0041] Finally get the...

Abstract

The invention discloses a short text classification method fusing a knowledge graph and a topic model, which belongs to the field of natural language processing. The method uses the knowledge graph to obtain external knowledge and perform feature expansion, which effectively expands the feature content of a short text without changing the semantics of the original text. A topic model is then trained on the expanded text data to mine semantic associations between texts, and these associations are used as additional features, relieving the inaccuracy that the synonym phenomenon causes in text classification. Finally, a support vector machine performs the classification prediction of the short texts.
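As a rough illustration of the feature-expansion idea in the abstract, the sketch below appends related concepts to a token list. Everything here is an assumption made for illustration: the source does not say how the knowledge graph is queried, so `TOY_KG` is a hypothetical in-memory dictionary standing in for a real knowledge graph, and `expand_with_kg` and `max_per_token` are hypothetical names. The topic model and support vector machine stages are not shown.

```python
# Hypothetical toy knowledge graph: entity -> related concepts.
# A real system would query an external knowledge graph instead.
TOY_KG = {
    "苹果": ["水果", "公司"],
    "手机": ["电子产品"],
}


def expand_with_kg(tokens, kg=TOY_KG, max_per_token=2):
    """Append related concepts for tokens found in the knowledge graph.

    The original tokens are kept unchanged and in order, so the source
    text is preserved; new concepts are only appended, without duplicates.
    """
    expanded = list(tokens)
    seen = set(tokens)
    for tok in tokens:
        for concept in kg.get(tok, [])[:max_per_token]:
            if concept not in seen:
                expanded.append(concept)
                seen.add(concept)
    return expanded
```

Note that this sketch adds every listed concept for an ambiguous entity such as "苹果"; it does not attempt any disambiguation.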

Description

Technical field

[0001] The invention belongs to the field of natural language processing, and in particular relates to a short text classification method based on a knowledge graph and a topic model.

Background

[0002] With the rapid development of the Internet, a large amount of short text data has been generated in online news, social media, instant messaging, and similar fields. How to effectively mine valuable information from short text data is a key current research topic.

[0003] Text classification has a wide range of applications, including information recommendation, automatic question answering, search engines, and mail filtering. Over the past few decades, researchers in China and abroad have proposed and improved a number of machine learning and deep learning algorithms and applied them to text classification. These solutions include: using vector space models to represent text features and then using a classifier for text classification; or use de...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/36, G06F16/35, G06F40/289, G06F40/30, G06K9/62
CPC: G06F16/367, G06F16/35, G06F40/30, G06F40/289, G06F18/2411
Inventor: 刘峰, 许淞源
Owner: COMP NETWORK INFORMATION CENT CHINESE ACADEMY OF SCI