Text classification method and system

A text classification technology in the field of deep learning. It addresses the poor classification results that arise because the Bert model does not take the relationships between words into account, and it aims for better text classification and better fine-tuning results.

Inactive Publication Date: 2020-05-08
江苏艾佳家居用品有限公司

AI Technical Summary

Problems solved by technology

[0003] However, the Bert model encodes text at the level of individual characters without considering the relationships between words, so its text classification results are still poor.




Embodiment Construction

[0025] The technical solution below is described using THUCNews, a dataset commonly used for Chinese text classification, as an example. The model obtained by the present invention is not limited to Chinese text classification, and the use of the THUCNews dataset is not a feature of the present invention.

[0026] Step 1: prepare the dataset for the pre-training model. Download public Chinese datasets from the Internet and clean the data.

[0027] Step 2: apply character vector encoding, sentence (segment) vector encoding, and position vector encoding to the dataset, and add a word vector encoding.

[0028] For the dataset obtained above, segment the text with jieba or another word segmentation tool to supply the model's additional word encoding vector; the character encoding, sentence encoding, and position encoding are handled in the same way as in the original Bert. The resulting vector contai...
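The word-level signal described in Step 2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the text has already been segmented into words (for example with `jieba.lcut`), and the helper names `align_words_to_chars` and `build_word_ids`, the toy vocabulary, and the unknown-word id are all hypothetical. Each character is tagged with the id of the word that contains it, so a word-level sequence lines up with the character-level sequence that Bert consumes.

```python
from typing import Dict, List, Tuple


def align_words_to_chars(words: List[str]) -> Tuple[List[str], List[int]]:
    """Flatten segmented words into characters and record, for each
    character, the index of the word it came from."""
    chars: List[str] = []
    word_index: List[int] = []
    for i, word in enumerate(words):
        for ch in word:
            chars.append(ch)
            word_index.append(i)
    return chars, word_index


def build_word_ids(words: List[str], vocab: Dict[str, int], unk_id: int = 0) -> List[int]:
    """Give every character the vocabulary id of the word that contains it.
    Words missing from the vocabulary fall back to unk_id (an assumption)."""
    _, word_index = align_words_to_chars(words)
    return [vocab.get(words[i], unk_id) for i in word_index]


# Hypothetical pre-segmented sentence; with jieba installed, a list like this
# could be produced by jieba.lcut("我爱自然语言处理").
words = ["我", "爱", "自然语言", "处理"]
# Toy word vocabulary; "处理" is deliberately left out to exercise unk_id.
vocab = {"[UNK]": 0, "我": 1, "爱": 2, "自然语言": 3}

chars, word_index = align_words_to_chars(words)
word_ids = build_word_ids(words, vocab)
```

Repeating the word id across all characters of a word is one simple alignment choice; the excerpt does not say which alignment the invention uses.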


Abstract

The invention discloses a text classification method and system. The method addresses a limitation of Bert's character-based encoding: information about how the characters within a word relate to one another is largely lost. The invention adds a word position encoding on top of Bert, so that the resulting sentence vector is composed of a character vector, a sentence vector, a word vector, and position information, giving a sentence representation that contains word-level information. A classification model is then trained on these sentence vectors, and the trained model is finally used for Chinese sentence classification.
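The composition of the input representation can be illustrated with a small sketch. The original Bert sums its token, segment, and position embeddings element-wise; the sketch below assumes the added word embedding is combined the same way, which this excerpt does not confirm. The function names, table sizes, and ids are hypothetical, and the random tables stand in for learned weights.

```python
import random
from typing import List, Sequence

Vector = List[float]


def make_table(rows: int, dim: int, rng: random.Random) -> List[Vector]:
    """Toy embedding table with small random values (stand-in for learned weights)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


def embed_inputs(
    char_ids: Sequence[int],
    segment_ids: Sequence[int],
    word_ids: Sequence[int],
    char_table: List[Vector],
    segment_table: List[Vector],
    position_table: List[Vector],
    word_table: List[Vector],
) -> List[Vector]:
    """Sum character, segment, position, and word embeddings per position.
    Summation is an assumption carried over from the original Bert."""
    if not (len(char_ids) == len(segment_ids) == len(word_ids)):
        raise ValueError("all id sequences must have the same length")
    output: List[Vector] = []
    for pos, (c, s, w) in enumerate(zip(char_ids, segment_ids, word_ids)):
        parts = (char_table[c], segment_table[s], position_table[pos], word_table[w])
        output.append([sum(values) for values in zip(*parts)])
    return output


rng = random.Random(0)
dim = 4
char_table = make_table(10, dim, rng)      # hypothetical character vocabulary of 10
segment_table = make_table(2, dim, rng)    # sentence A / sentence B
position_table = make_table(8, dim, rng)   # maximum sequence length of 8
word_table = make_table(4, dim, rng)       # hypothetical word vocabulary of 4

# Three characters in one sentence; the last two belong to the same word.
vectors = embed_inputs(
    [5, 6, 7], [0, 0, 0], [1, 2, 2],
    char_table, segment_table, position_table, word_table,
)
```

The summed vectors would then be fed to the encoder, and a classifier would be trained on the resulting sentence representation; those stages are omitted here.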

Description

Technical field

[0001] The invention relates to a method for constructing a text classification model in the field of deep learning, and in particular to an improved Bert model and its application to text classification.

Background

[0002] Text classification is a classic problem in NLP, and many methods have accumulated for it. Early text classification relied mainly on traditional machine learning, such as classification based on TF-IDF. With the development of deep learning, many deep learning text classification models emerged, such as fastText and TextCNN. In October 2018, Google officially released Bert, a transfer learning model that achieved excellent results. It completely changed the relationship between pre-trained word vectors and specific downstream NLP tasks, and using transfer learning to solve problems in the NLP field has since become an important direction.

[0003...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35
CPC: G06F16/35
Inventors: 陈旋, 吕成云, 蔡栩
Owner: 江苏艾佳家居用品有限公司