Chinese text categorization method based on multi-hidden-layer extreme learning machine

An extreme learning machine and text classification technology, applied in semantic analysis, computer parts, instruments, etc., that achieves fast learning speed and good generalization ability and improves classification accuracy.

Inactive Publication Date: 2017-12-08
BEIJING UNIV OF TECH
4 Cites, 21 Cited by

AI Technical Summary

Problems solved by technology

The extreme learning machine autoencoder (Extreme Learning Machine Auto-Encoder, ELM-AE) in the multi-hidden-layer extreme learning machine is used to reduce the dimensionality of high-dimensional data, which addresses the classification problem for high-dimensional Chinese text data.
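The dimensionality-reduction idea above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the sigmoid activation, the QR-orthogonalised random weights, the regularization constant `C`, and the function name `elm_ae_reduce` are all hypothetical choices made for the example.

```python
import numpy as np

def elm_ae_reduce(X, n_hidden, C=1e3, seed=0):
    """Compress X (n_samples x n_features) to n_hidden dimensions with one ELM-AE.

    Assumes n_hidden <= n_features (the compressing case).
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    # Random input weights with orthonormal columns (assumed choice for this sketch).
    W, _ = np.linalg.qr(rng.standard_normal((n_features, n_hidden)))
    b = rng.standard_normal(n_hidden)
    b /= np.linalg.norm(b)
    # Hidden-layer output with a sigmoid activation (assumed).
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    # Regularized least squares: beta reconstructs the input X from H.
    beta = np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ X)
    # The transposed output weights act as the encoder to the low-dimensional space.
    return X @ beta.T, beta

# Toy data standing in for a high-dimensional text matrix.
X = np.random.default_rng(1).random((6, 10))
Z, beta = elm_ae_reduce(X, n_hidden=3)
print(Z.shape, beta.shape)
```

Only the output weights `beta` are solved for; the random input weights are never trained, which is what keeps this step fast. When several such layers are stacked, an activation is often applied to `X @ beta.T` before the next layer; that detail is left out here.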

Examples

Embodiment Construction

[0025] The specific embodiment and detailed steps of the present invention are set forth below using the Fudan University Chinese corpus and the attached Figures 1-4:

[0026] Step 1: Data Preprocessing

[0027] The Fudan University Chinese corpus data set consists of two parts: 9805 training samples and 9833 test samples, labelled with 20 different text categories. All text in the corpus must be converted to UTF-8 encoding before processing. After conversion, the training and test samples are first segmented with the full mode of the jieba word segmentation tool, which splits the sentences of each article into individual words and phrases. Regular expressions are then used to "denoise" the text data by removing punctuation marks, numeric characters, and English characters. Because there are many st...
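The regular-expression "denoising" step can be sketched as below. This is an illustrative sketch only: it assumes the tokens have already been produced by jieba's full mode (for example `jieba.cut(text, cut_all=True)`), and it assumes that "noise" means every character outside the CJK Unified Ideographs range U+4E00 to U+9FFF. The name `denoise_tokens` and the sample tokens are hypothetical.

```python
import re

# Assumption: anything that is not a CJK unified ideograph counts as noise
# (punctuation, digits, Latin letters, whitespace).
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]+")

def denoise_tokens(tokens):
    """Strip non-Chinese characters from each token and drop tokens left empty."""
    cleaned = []
    for tok in tokens:
        tok = NON_CHINESE.sub("", tok)
        if tok:
            cleaned.append(tok)
    return cleaned

# Hypothetical tokens, as they might come out of a word segmenter.
tokens = ["复旦", "大学", ",", "2017", "ELM", "文本", "分类", "。"]
print(denoise_tokens(tokens))  # ['复旦', '大学', '文本', '分类']
```

Filtering after segmentation keeps the segmenter's view of sentence boundaries intact; filtering before segmentation would also work but can merge words that were separated only by punctuation.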


Abstract

The invention discloses a Chinese text categorization method based on a multi-hidden-layer extreme learning machine. A regularized extreme learning machine model is applied to the Chinese text categorization problem, and text is categorized by a multi-hidden-layer extreme learning machine model. The Fudan University Chinese corpus is used as the training set and test set. The text data are pre-processed, including unifying the encoding, word segmentation, and removal of stop words, symbols, digits and the like. The text is represented with a vector space model, which transforms the data set into a text matrix. The text is then categorized by the multi-hidden-layer extreme learning machine in three stages: text dimensionality reduction, feature mapping, and text categorization. Dimensionality reduction transforms the high-dimensional text data into low-dimensional data that is tractable to compute on. Feature mapping passes the text features through the multiple hidden layers of the multi-hidden-layer extreme learning machine to obtain a high-level feature representation. Categorization is performed by the regularized extreme learning machine within the multi-hidden-layer extreme learning machine.
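Two of the steps named in the abstract, building the text matrix and classifying with a regularized extreme learning machine, can be sketched as below. This is a simplified single-hidden-layer illustration, not the patent's multi-hidden-layer model. Raw term counts as the vector space weighting, the `tanh` activation, the values of `n_hidden` and `C`, and all function names and sample documents are assumptions made for the example.

```python
import numpy as np

def build_text_matrix(docs):
    """Vector space model: one row per document, one column per vocabulary word."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: j for j, tok in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for tok in doc:
            X[i, index[tok]] += 1.0  # raw term count (assumed weighting)
    return X, vocab

def train_relm(X, y, n_classes, n_hidden=32, C=1e3, seed=0):
    """Regularized ELM: random hidden layer, ridge-regression output weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # fixed random input weights
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                           # hidden-layer output
    T = np.eye(n_classes)[y]                         # one-hot targets
    beta = np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ T)
    return W, b, beta

def predict_relm(X, model):
    W, b, beta = model
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)

# Hypothetical pre-segmented documents with two categories (0 and 1).
docs = [["体育", "比赛"], ["经济", "市场"], ["体育", "体育"], ["市场", "经济"]]
labels = [0, 1, 0, 1]
X, vocab = build_text_matrix(docs)
model = train_relm(X, labels, n_classes=2)
print(predict_relm(X, model))
```

The term `np.eye(n_hidden) / C` is the regularization: a smaller `C` shrinks the output weights more strongly, while a very large `C` approaches the unregularized least-squares solution.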

Description

Technical field

[0001] The invention belongs to the field of natural language processing and concerns a method for classifying Chinese text data with a multi-hidden-layer extreme learning machine model.

Background art

[0002] With the development of modern science and technology, human knowledge is growing faster every year, and the time it takes for the volume of information to double keeps shrinking. The information produced in recent decades exceeds the total produced over the preceding several thousand years. Faced with such a huge amount of information, quickly, accurately and comprehensively locating the information people need has become a new challenge. Previously, texts were classified manually: professionals were assigned to sort texts into one or several categories according to their content. This manual method of text classification is more accurate, but it consumes a lot of manpower and material resour...


Application Information

IPC(8): G06F17/30, G06F17/27, G06K9/62
CPC: G06F16/353, G06F40/216, G06F40/30, G06F40/289, G06F18/213
Inventor: 庞皓明, 冀俊忠
Owner BEIJING UNIV OF TECH