Short text classification method based on conditional entropy and convolution neural network

A short text classification method that combines conditional entropy with a convolutional neural network. It targets the heavy computation and low classification accuracy of existing approaches, and filters noise words accurately.

Inactive Publication Date: 2019-02-01
SICHUAN CHANGHONG ELECTRIC CO LTD
6 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

Existing classifiers all have certain shortcomings. With decision trees, if one domain (for example, the video domain) has more data, the information gain is biased toward that domain. SVM is sensitive to missing data. KNN category scores are not normalized, and the amount of computation is large. In theory, the Naive Bayes model has a smaller error rate than other classification algorithms, but in practice this is not always the case, because the model assumes that attributes are independent of each other, and this assumption often does not hold in real applications.
For example, when only Naive Bayes is used for short text classification, it considers only whether words appear in the short text, regardless of word order, which results in low classification accuracy. Without effective feature selection, the dimension of the vectorized short text becomes too high, which not only affects classification accuracy but also reduces classification speed.
[0007] The main disadvantage of existing short text classification technology is that noise words are not filtered. Words that appear in many sentences do not help classification and only cause interference, so they should be filtered out.
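
To make the noise-word idea concrete, here is a minimal sketch of scoring words by conditional entropy. It assumes the score is the entropy of the class distribution among texts that contain the word, H(C | w), so that a word spread evenly across classes scores high. The excerpt does not give the patent's exact formula, and the function names, toy data, and threshold are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def conditional_entropy_per_word(docs, labels):
    """Return {word: H(C | word present)} in bits.

    docs   : list of token lists (already segmented)
    labels : list of class labels, one per document
    Assumption: entropy is taken over the classes of documents containing the word.
    """
    # class_counts[word][label] = number of documents with that label containing the word
    class_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for word in set(tokens):  # count each word once per document
            class_counts[word][label] += 1

    entropy = {}
    for word, counts in class_counts.items():
        total = sum(counts.values())
        h = 0.0
        for n in counts.values():
            p = n / total
            h -= p * math.log2(p)
        entropy[word] = h
    return entropy

def build_stop_words(entropy, threshold):
    """Words whose entropy reaches the (hypothetical) threshold are treated as noise."""
    return {word for word, h in entropy.items() if h >= threshold}

# Toy, made-up data for illustration only.
docs = [
    ["play", "the", "movie"],
    ["watch", "the", "film"],
    ["play", "the", "song"],
    ["hear", "the", "music"],
]
labels = ["video", "video", "music", "music"]

entropy = conditional_entropy_per_word(docs, labels)
stop_words = build_stop_words(entropy, threshold=0.9)
```

In this toy data, "the" and "play" occur equally often in both classes, so they receive the maximum entropy of 1 bit and are flagged, while "movie" occurs in one class only and scores 0.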

Examples

Embodiment

[0035] As shown in Figure 1, the short text classification method based on conditional entropy and convolutional neural network includes the following steps:

[0036] a) Collect a certain number of short texts to form a training data set. Ideally, the number of short texts in each category should be nearly equal.

[0037] b) Label the training data set after manual classification, for example:

[0038]

[0039] Here, -1 means the text does not belong to the category, and 1 means it does. A short text may belong to neither category a nor category b (noise data), or it may belong to both categories.
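
The labeling convention in [0039] can be sketched as follows. This is only an illustration of the -1 / 1 encoding, not a reconstruction of the patent's example table; the category names and the helper `encode_labels` are hypothetical.

```python
CATEGORIES = ["a", "b"]  # hypothetical category names taken from the wording of [0039]

def encode_labels(assigned, categories=CATEGORIES):
    """Map the set of categories a text belongs to onto a list of 1 / -1 flags."""
    return [1 if category in assigned else -1 for category in categories]

only_a = encode_labels({"a"})       # belongs to category a only
both = encode_labels({"a", "b"})    # belongs to both categories
noise = encode_labels(set())        # belongs to neither category (noise data)
```

Each text gets one flag per category, so a text can be positive for several categories at once or for none.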

[0040] c) Perform word segmentation processing on the short text, assuming that the four lists obtained after the word segmentation of the four short texts are:

Abstract

The invention discloses a short text classification method based on conditional entropy and a convolutional neural network, which relates to the field of natural language processing. The method comprises the following steps: S1, collecting short texts as required to form a training data set; S2, labeling the training data set according to categories; S3, performing word segmentation on the training data set; S4, constructing a word vector model; S5, calculating the conditional entropy of all words; S6, constructing a stop word dictionary; S7, removing words that do not meet the conditions and have little influence on classification; S8, vectorizing all short texts; S9, establishing a convolutional neural network model; S10, inputting the vectorized training data set into the convolutional neural network model; S11, iterating and optimizing continuously to obtain the best-performing short text classifier. The invention filters noise words and improves the accuracy of the filtering.
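
Steps S7 to S9 can be illustrated with a minimal pure-Python sketch: drop stop words, map the remaining words to fixed-length id sequences, then apply one convolution filter with max-over-time pooling. The excerpt does not specify the network architecture, so the filter shape, ReLU activation, padding scheme, vocabulary, and embedding values below are assumptions chosen for illustration, and all names are hypothetical.

```python
def vectorize(tokens, stop_words, vocab, max_len, pad_id=0):
    """Drop stop words and unknown words, map to ids, then truncate or pad to max_len."""
    ids = [vocab[t] for t in tokens if t not in stop_words and t in vocab]
    ids = ids[:max_len]
    return ids + [pad_id] * (max_len - len(ids))

def conv1d_max_pool(embedded, kernel, bias=0.0):
    """Slide one filter over a sequence of word vectors, apply ReLU, max-pool over time.

    embedded : list of word vectors (lists of floats)
    kernel   : list of weight vectors, one per position in the filter window
    """
    window = len(kernel)
    if len(embedded) < window:
        raise ValueError("sequence shorter than the filter window; pad it first")
    activations = []
    for start in range(len(embedded) - window + 1):
        total = bias
        for offset in range(window):
            word_vec = embedded[start + offset]
            weights = kernel[offset]
            total += sum(v * w for v, w in zip(word_vec, weights))
        activations.append(max(0.0, total))  # ReLU
    return max(activations)  # max-over-time pooling

# Toy, made-up vocabulary and 2-dimensional embeddings (id 0 is padding).
vocab = {"movie": 1, "song": 2, "watch": 3}
embedding = {0: [0.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.5, 0.5]}
stop_words = {"the"}

ids = vectorize(["watch", "the", "movie"], stop_words, vocab, max_len=4)
embedded = [embedding[i] for i in ids]
kernel = [[1.0, 0.0], [1.0, 0.0]]  # one filter spanning two consecutive words
feature = conv1d_max_pool(embedded, kernel)
```

A trainable model would use many such filters with learned weights and feed the pooled features into a classification layer; a deep learning framework would normally handle that part.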

Description

Technical field

[0001] The invention relates to the field of natural language processing, in particular to a short text classification method based on conditional entropy and a convolutional neural network, and is suitable for short text classification.

Background art

[0002] Natural language processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing integrates linguistics, computer science, and mathematics, so research in this field involves natural language, the language people use every day. The purpose of natural language processing is to allow the computer to ‘understand’ what humans ‘say’ to it, and then to perform certain designated tasks. These tasks include spell checking, keyword search, intent recognition, machine translation, dia...

Application Information

IPC(8): G06F17/27, G06F16/93
CPC: G06F40/289
Inventor: 唐军, 刘楚雄
Owner: SICHUAN CHANGHONG ELECTRIC CO LTD