A Multi-Label Long Text Classification Method Introducing Multiple Choice Fusion Mechanism

A classification method based on multi-way selection, applied in the field of multi-label long text classification with a multi-way selection fusion mechanism, achieving short training time, improved recall, and efficient feature extraction.

Active Publication Date: 2021-05-25
UNIV OF ELECTRONICS SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

[0005] In view of the deficiencies in the prior art, the present invention provides a multi-label long text classification method introducing a multi-way selection fusion mechanism, which addresses the problem of improving the precision, recall, and F1 score of the label sequences generated by the model.

Method used


Examples


Embodiment 1

[0030] As shown in Figures 1-4:

[0031] For the training data set of 3 million samples released by a machine learning challenge, the title data and description data are concatenated to obtain long text data. For samples without a description, a copy of the question is used as the description. The data are then divided into 200,000 samples as a validation set, 200,000 as a test set, and the remaining 2.6 million as a training set.
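The data preparation in [0031] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `build_long_text` and `split_dataset`, the single-space separator, and the seeded shuffle before splitting are all assumptions that the source does not specify.

```python
import random


def build_long_text(title, description):
    """Concatenate title and description into one long text.

    When the description is missing or blank, a copy of the title
    (the question) is used in its place, as described in [0031].
    The single-space separator is an assumption.
    """
    if not description or not description.strip():
        description = title
    return f"{title} {description}"


def split_dataset(records, n_val, n_test, seed=0):
    """Split records into (train, validation, test).

    Shuffling with a fixed seed is an assumption; the source only
    gives the sizes of the three subsets.
    """
    if n_val + n_test > len(records):
        raise ValueError("validation and test sizes exceed the data set size")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test
```

With 3,000,000 records and `n_val = n_test = 200_000`, the remaining training subset has 2,600,000 records, matching the sizes stated above.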

[0032] After low-frequency words are removed from the data, the vocabulary required by the encoder and the vocabulary of category labels required by the decoder are built. The sequence start symbol is added in front of the label sequence to obtain the decoder input, and the sequence end symbol is appended to the label sequence to obtain the decoder output. For example, for an input long text x_1, x_2, ..., x_n with labels l_1, l_2, ..., l_n', where the sequence start symbol is , the sequence end symbol is , then t...
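The vocabulary construction and decoder input/output described in [0032] might look like the sketch below. The actual start and end symbols are not legible in the source, so `<sos>`, `<eos>`, and `<unk>` are hypothetical placeholders, as are the function names and the `min_freq` threshold.

```python
from collections import Counter

# Hypothetical special symbols; the source does not show the real ones.
SOS, EOS, UNK = "<sos>", "<eos>", "<unk>"


def build_vocab(token_lists, min_freq=1, specials=()):
    """Map tokens to integer ids, dropping tokens seen fewer than min_freq times."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {s: i for i, s in enumerate(specials)}
    # Sort for a deterministic id assignment.
    for tok, count in sorted(counts.items()):
        if count >= min_freq and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def make_decoder_pair(labels):
    """Return (decoder_input, decoder_output) for one label sequence.

    The input is the label sequence prefixed with the start symbol;
    the output is the label sequence followed by the end symbol.
    """
    labels = list(labels)
    return [SOS] + labels, labels + [EOS]
```

For a label sequence `["l1", "l2"]`, `make_decoder_pair` yields the input `["<sos>", "l1", "l2"]` and the output `["l1", "l2", "<eos>"]`, so the decoder learns to predict each label from the ones before it.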


Abstract

The invention provides a multi-label long text classification method introducing a multi-way selection fusion mechanism, and relates to the technical field of multi-label long text classification based on the sequence-to-sequence architecture. The invention improves the performance of multi-label long text classification based on the sequence-to-sequence architecture. Using the data released by a machine learning challenge, the title data and description data are concatenated to obtain long text data; for data without a description, a copy of the question is used as the description. The data are then preprocessed to remove low-frequency words and obtain more effective data. A Transformer model incorporating a multi-way selection fusion mechanism then generates a label sequence for the input long text and effectively removes redundant information during decoding. On the test data, compared with the model without multi-way selection fusion, the label sequences generated by the model improve recall by 0.5%, and precision and F1 by 1 percentage point.
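For reference, precision, recall, and F1 over predicted label sets can be computed as in the sketch below. The source does not say which averaging scheme was used; micro-averaging over all samples is an assumption here, and `micro_prf` is a hypothetical helper name.

```python
def micro_prf(true_sets, pred_sets):
    """Micro-averaged precision, recall, and F1 for multi-label predictions.

    true_sets and pred_sets are parallel sequences of label collections,
    one pair per sample. Micro-averaging is an assumption.
    """
    tp = fp = fn = 0
    for true_labels, pred_labels in zip(true_sets, pred_sets):
        t, p = set(true_labels), set(pred_labels)
        tp += len(t & p)   # labels predicted and correct
        fp += len(p - t)   # labels predicted but wrong
        fn += len(t - p)   # labels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1
```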

Description

Technical field

[0001] The invention relates to the technical field of multi-label long text classification based on the sequence-to-sequence architecture, and in particular to a multi-label long text classification method introducing a multi-way selection fusion mechanism.

Background

[0002] The following concepts are involved in the study of multi-label long text classification based on the sequence-to-sequence architecture. Attention mechanism: the attention mechanism in deep learning is modeled on the human visual attention mechanism; at each step it focuses on a particular part of the input sequence as needed, instead of attending to the whole sequence at once. The attention mechanism has been widely used in the field of natural language processing. It is divided into hard attention and soft attention. The soft attention mechanism assigns an attention weight to each part of the sequence. To calculate the attention weight, first calculate the distribution of each part of the se...
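Soft attention in general can be sketched as below: every position of the sequence receives a weight, the weights sum to 1, and the output is the weighted sum of the values. The source text is cut off before it states how the scores are computed, so the dot-product scoring and the function names here are assumptions for illustration only.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(query, keys, values):
    """Return (weights, context) for one query over a sequence.

    Scores are dot products between the query and each key
    (an assumed scoring function); the context vector is the
    attention-weighted sum of the value vectors.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [
        sum(w * v[d] for w, v in zip(weights, values))
        for d in range(dim)
    ]
    return weights, context
```

Hard attention, by contrast, would select a single position rather than blending all of them with weights.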

Claims


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35
CPC: G06F16/35
Inventor: 屈鸿, 秦展展, 侯帅, 黄鹂, 张晓敏
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA