
Multi-class Chinese text classification method fusing global and local features

A text classification method using local-feature technology, applied in the field of text classification within natural language processing, which solves the problem that local models alone cannot obtain context information

Active Publication Date: 2019-12-24
BEIJING UNIV OF CHEM TECH +1
View PDF · Cites: 5 · Cited by: 42

AI Technical Summary

Problems solved by technology

Although a CNN can efficiently mine the local semantic features of text data and trains very quickly, it cannot capture contextual information.

Method used



Examples


Embodiment 1

[0084] The present invention can be applied to text classification tasks on the Internet, such as public opinion analysis on e-commerce websites and text classification on news websites. According to one embodiment of the present invention, a multi-category Chinese text classification method that fuses global and local features is provided. In brief, the method consists of: preprocessing the text data and producing a vectorized representation; using the vectorized data to train the text classification model proposed by the present invention; and using the trained model to perform text classification prediction. The concrete process is shown in Figure 1 and includes the following steps:

[0085] Step S1, acquiring text data and performing preprocessing on the data.

[0086] The corpus data used in this experiment was obtained by using crawler technology to crawl review data about sales of ** cold medicine on a large-scale d...
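The preprocessing and vectorization steps described above can be sketched as follows. This is a minimal illustration, not the patent's actual pipeline: it assumes character-level tokenization (a common fallback for Chinese when no word segmenter is available), and the vocabulary, padding scheme, and maximum length are made-up placeholders.

```python
# Hypothetical sketch of preprocessing + vectorization for Chinese text.
# Character-level tokens and the fixed max_len are illustrative assumptions.
import re

def preprocess(text):
    """Strip whitespace and split into character-level tokens."""
    text = re.sub(r"\s+", "", text)
    return list(text)

def build_vocab(corpus):
    """Map every token seen in the corpus to an integer id."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for doc in corpus:
        for tok in preprocess(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab

def vectorize(text, vocab, max_len=10):
    """Convert text to a fixed-length list of token ids, padding with <pad>."""
    ids = [vocab.get(t, vocab["<unk>"]) for t in preprocess(text)]
    return ids[:max_len] + [vocab["<pad>"]] * (max_len - len(ids))

corpus = ["这个药效果很好", "物流太慢了"]   # toy review-style examples
vocab = build_vocab(corpus)
print(vectorize(corpus[0], vocab))
```

In practice the patent trains word embeddings over the vectorized ids; here the ids alone stand in for that representation.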

Embodiment 2

[0117] The model proposed by the present invention is also applicable to long-text multi-category Chinese text classification tasks. The long-text data uses the THUCT Chinese text data set released by the Natural Language Processing Laboratory of Tsinghua University. The data set has a large number of texts and many categories: finance, lottery, real estate, stock, home furnishing, education, technology, society, fashion, current affairs, sports, horoscope, games, and entertainment, 14 categories in total. The basic information of the data set split is shown in Table 7, and Figure 8 shows the sentence length distribution of the experimental corpus. The comparison results between the five comparison classification models and the TBLC-rAttention model are shown in Table 8 and Table 9: Table 8 gives the overall comparison of each model on the long-text multi-classification task, and Table 9 gives the accuracy comparison results of e...



Abstract

The invention discloses a multi-class Chinese text classification method fusing global and local features. The method comprises the following specific steps: obtaining text data and preprocessing the text data; carrying out vectorized representation of the preprocessed text; obtaining context semantic features of the text by using a bidirectional long short-term memory network with an attention mechanism; extracting global semantic features with local semantic information on the basis of the context semantic features by utilizing a wide convolutional neural network; and inputting the final feature vector into a classification layer to realize text classification. By first capturing global semantic features and then capturing local semantic features step by step, the method extracts text features better and further improves text classification precision. It effectively addresses the problems in the prior art that, in long-text and multi-class Chinese text classification, semantic key features are difficult to extract and the classification effect is poor.
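The pipeline in the abstract can be sketched at the level of tensor shapes. The numpy sketch below stubs the BiLSTM outputs with random hidden states, and the attention scoring, "wide" (fully padded) convolution, and softmax layer are simplified stand-ins; all dimensions are illustrative assumptions, not the patent's actual hyperparameters.

```python
# Shape-level sketch: BiLSTM states -> attention reweighting -> wide
# convolution for local features -> softmax classifier. All sizes are
# illustrative; the BiLSTM itself is stubbed with random states.
import numpy as np

rng = np.random.default_rng(0)
T, H, C, F, K = 20, 64, 14, 32, 3   # seq len, hidden dim, classes, filters, kernel width

states = rng.standard_normal((T, 2 * H))        # stand-in for BiLSTM outputs (forward + backward)

# Attention: score each time step, softmax over time, reweight the states.
w = rng.standard_normal(2 * H)
scores = states @ w
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()
attended = states * alpha[:, None]              # (T, 2H) context-weighted sequence

# "Wide" convolution: zero-pad both ends so the output is LONGER than the input.
pad = np.zeros((K - 1, 2 * H))
x = np.vstack([pad, attended, pad])             # (T + 2*(K-1), 2H)
filters = rng.standard_normal((F, K, 2 * H)) * 0.01
conv = np.array([[np.sum(x[t:t + K] * f) for t in range(T + K - 1)]
                 for f in filters])             # (F, T + K - 1)

pooled = conv.max(axis=1)                       # global max pool -> (F,)

# Softmax classification layer over C classes.
Wc = rng.standard_normal((C, F)) * 0.01
logits = Wc @ pooled
probs = np.exp(logits - logits.max())
probs /= probs.sum()                            # probability over the C classes
print(probs.shape)
```

The wide (full) convolution is what lets every token, including those at the edges, appear in K windows, which is the sense in which local features are extracted "on the basis of" the attention-weighted context features.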

Description

technical field [0001] The invention relates to the technical field of text classification in natural language processing, in particular to a multi-category Chinese text classification method fusing global and local features. [0002] technical background [0003] In recent years, text data has grown rapidly on the Internet, and more and more text data have been accumulated. These massive data contain much valuable information, and how to efficiently mine and use this information has become a major problem. Text classification, a technique of natural language processing, is an effective solution. It is a process of first using text data to train a classifier model, and then using the trained model to classify new text; the core is to learn a sequence representation. Text classification technology has a wide range of applications in daily life, for example, it can be used for public opinion analysis, spam SMS and email filtering, question and topic classification, r...

Claims


Application Information

IPC(8): G06F16/35, G06F17/27, G06N3/04, G06N3/08
CPC: G06F16/355, G06N3/08, G06N3/044, G06N3/045, Y02D10/00
Inventor 靳其兵薛兴荣彭文娟蔡鋈周星陈思
Owner BEIJING UNIV OF CHEM TECH