Chinese text categorization method based on multi-hidden-layer extreme learning machine

An extreme learning machine and text classification technology, applied in semantic analysis, computer parts, instruments, etc., to achieve fast learning speed, good generalization ability, and improved accuracy

Inactive Publication Date: 2017-12-08

BEIJING UNIV OF TECH

4 Cites · 21 Cited by


AI Technical Summary

Problems solved by technology

The extreme learning machine autoencoder (ELM-AE) in the multi-hidden-layer extreme learning machine is used to reduce the dimensionality of high-dimensional data, which addresses the classification problem of high-dimensional Chinese text data.
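
As a rough illustration of how an ELM-AE can reduce dimensionality, the formulation commonly given in the extreme learning machine literature is sketched below. The excerpt itself gives no formulas, so every symbol here (the data matrix X, random hidden weights A and bias b, activation g, regularization constant C, output weights β, and the sizes N, d, L) is an assumption rather than the patent's own notation.

```latex
\begin{aligned}
H &= g\bigl(XA + \mathbf{1}b^{\top}\bigr),
  & X &\in \mathbb{R}^{N\times d},\; A \in \mathbb{R}^{d\times L},\; b \in \mathbb{R}^{L},\; L < d \\
\beta &= \Bigl(\tfrac{I}{C} + H^{\top}H\Bigr)^{-1} H^{\top} X,
  & \beta &\in \mathbb{R}^{L\times d} \\
X_{\text{low}} &= X\beta^{\top},
  & X_{\text{low}} &\in \mathbb{R}^{N\times L}
\end{aligned}
```

In this sketch A and b are drawn at random and never trained; only β is computed, in closed form, by regularized least squares with the input X as its own target. Choosing L smaller than d makes the projected data lower-dimensional than the original text vectors.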



Embodiment Construction

[0025] The specific embodiment and detailed steps of the present invention are set forth below, using the Fudan University Chinese corpus data and the attached Figures 1-4:

[0026] Step 1: Data Preprocessing

[0027] The Fudan University Chinese corpus data set consists of two parts: 9805 training samples and 9833 test samples, classified into 20 different text categories. All text in the corpus must be converted to UTF-8 encoding before processing. After conversion, the full-mode word segmentation method of the jieba word segmentation tool is first applied to the training and test samples, segmenting the sentences of each article into individual phrases and words. Regular expressions are then used to "denoise" the text data by removing punctuation marks, numeric characters, and English characters. Because there are many st...
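
The regular-expression "denoising" step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `denoise_tokens` is hypothetical, the tokens are assumed to come from a prior segmentation pass (for example jieba's full mode, `jieba.cut(text, cut_all=True)`), and "Chinese characters" are approximated by the CJK Unified Ideographs range.

```python
import re

# Assumption: Chinese text is approximated by the CJK Unified Ideographs block.
# Everything outside it (punctuation, digits, Latin letters) counts as noise.
_NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]")


def denoise_tokens(tokens):
    """Strip non-Chinese characters from each token and drop empty results."""
    cleaned = (_NON_CHINESE.sub("", token) for token in tokens)
    return [token for token in cleaned if token]


# Hypothetical segmenter output for a short sentence.
tokens = ["文本", "分类", "，", "2017", "ELM", "方法1"]
print(denoise_tokens(tokens))
```

Tokens made up entirely of punctuation, digits, or English letters disappear, while mixed tokens keep only their Chinese characters.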


Abstract

The invention discloses a Chinese text categorization method based on a multi-hidden-layer extreme learning machine. A regularized extreme learning machine model is applied to the Chinese text categorization problem, and text is categorized with a multi-hidden-layer extreme learning machine model. The Fudan University Chinese corpus is used as the training set and test set. The text data is first preprocessed, which includes unifying the encoding, word segmentation, and removal of stop words, symbols, and digits. The text is then represented with a vector space model, which transforms the data set into a text matrix. Finally, the text is categorized by the multi-hidden-layer extreme learning machine in three stages: dimensionality reduction, feature mapping, and categorization. Dimensionality reduction transforms high-dimensional text data into low-dimensional data that is tractable to compute. Feature mapping passes the text features through the multiple hidden layers of the multi-hidden-layer extreme learning machine to obtain a high-level feature representation. Categorization is performed by the regularized extreme learning machine within the multi-hidden-layer extreme learning machine.
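
The step that turns the preprocessed data set into a text matrix can be sketched as below. This is a minimal illustration under stated assumptions: the abstract does not say how terms are weighted, so raw term counts are used here, and the function name `build_text_matrix` is hypothetical.

```python
from collections import Counter


def build_text_matrix(documents):
    """Map tokenized documents to a document-by-term count matrix.

    Assumption: raw term counts are used as weights; the source does not
    specify the weighting scheme.
    """
    # One column per distinct term, in a fixed (sorted) order.
    vocabulary = sorted({term for doc in documents for term in doc})
    column = {term: j for j, term in enumerate(vocabulary)}

    matrix = []
    for doc in documents:
        row = [0] * len(vocabulary)
        for term, count in Counter(doc).items():
            row[column[term]] = count
        matrix.append(row)
    return vocabulary, matrix


# Two hypothetical documents that have already been segmented and denoised.
docs = [["文本", "分类", "文本"], ["学习", "分类"]]
vocab, matrix = build_text_matrix(docs)
print(vocab)
print(matrix)
```

Each row is one document and each column one vocabulary term, so the matrix width grows with the vocabulary. On a real corpus this is what makes the data high-dimensional and motivates the dimensionality reduction stage.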

Description

Technical field

[0001] The invention belongs to the field of natural language processing and is a method for classifying Chinese text data with a multi-hidden-layer extreme learning machine model.

Background

[0002] With the development of modern science and technology, the growth of human knowledge is accelerating year by year, and the time it takes for the volume of information to multiply keeps shrinking. The information produced in recent decades has exceeded the total produced over the previous few thousand years. Faced with such a huge amount of information, quickly, accurately, and comprehensively locating what people need has become a new challenge. Previously, texts were classified manually: professionals were assigned to sort texts into one or several categories according to their content. This manual method of text classification is more accurate, but it consumes a lot of manpower and material resour...


Application Information


IPC(8): G06F17/30, G06F17/27, G06K9/62

CPC: G06F16/353, G06F40/216, G06F40/30, G06F40/289, G06F18/213

Inventors: 庞皓明, 冀俊忠

Owner BEIJING UNIV OF TECH
