Chinese text classification method based on MPI (Message Passing Interface) and adaboost.MH

A Chinese text classification technology, applied in character and pattern recognition, special data processing applications, instruments, and related fields. It addresses the long time needed to build a training set and train the model, and it improves time efficiency by shortening that time.

Status: Inactive. Publication date: 2017-08-25

CHONGQING UNIV OF POSTS & TELECOMM

Cites: 3. Cited by: 4.

Problems solved by technology

[0006] Building a training set for massive data takes a long time, and training a classification model with the Adaboost.MH algorithm is also time-consuming. To address both problems, the present invention combines MPI with Adaboost.MH and proposes a parallel text classification method based on MPI and Adaboost.MH.

Embodiment Construction

[0027] The present invention will be further described below in conjunction with the accompanying drawings.

[0028] As shown in Figure 1, the present invention comprises the following five steps.

[0029] 1. Text preprocessing: collect Chinese text files from different fields using web crawlers and web searches, and perform word segmentation on the collected files. Open-source word segmentation packages such as IK and ICTCLAS can be used to segment the collected texts, after which punctuation marks and stop words are removed. Stop words are words that appear very frequently but carry no practical meaning, such as "le", "of", and "and". The terms produced by segmentation are separated and saved into the local training data set as preliminary features.
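The cleanup part of this step can be illustrated with a minimal Python sketch. It assumes the text has already been segmented by a tool such as IK or ICTCLAS, so it only shows the removal of punctuation and stop words. The stop-word list, the punctuation set, and the function name `clean_tokens` are hypothetical choices made for illustration and do not come from the patent.

```python
import string

# Hypothetical stop-word list; a real system would load a much larger one.
STOP_WORDS = {"的", "了", "和"}

# ASCII punctuation plus a few common full-width Chinese marks (assumed set).
PUNCTUATION = set(string.punctuation) | set("，。！？；：、“”‘’（）《》")


def clean_tokens(tokens):
    """Drop stop words and tokens that consist only of punctuation."""
    kept = []
    for token in tokens:
        token = token.strip()
        if not token:
            continue  # skip empty strings left over from segmentation
        if token in STOP_WORDS:
            continue
        if all(ch in PUNCTUATION for ch in token):
            continue
        kept.append(token)
    return kept


# Tokens as a segmenter might emit them for the sentence "我喜欢足球和篮球。"
segmented = ["我", "喜欢", "足球", "和", "篮球", "。"]
features = clean_tokens(segmented)
print(features)
```

The surviving tokens play the role of the preliminary features that the step saves into the training data set.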

[0030] 2. Feature selection: features are selected from the preliminary features using the mutual information method. First, use the MPI_Init function to start p processes, ob...
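The patent's exact scoring formula is not visible in this excerpt. As a rough illustration of mutual-information feature selection, the sketch below uses one common formulation for text classification, the pointwise mutual information between a term and a class computed from document counts, and ranks terms by their maximum score over classes. The function names, the ranking rule, and the toy counts are assumptions made for this example.

```python
import math


def mutual_information(n_tc, n_t, n_c, n):
    """Pointwise mutual information between a term t and a class c.

    n_tc: number of documents of class c that contain t
    n_t:  number of documents that contain t
    n_c:  number of documents of class c
    n:    total number of documents
    """
    if n_tc == 0:
        return float("-inf")  # the term never co-occurs with the class
    return math.log((n_tc * n) / (n_t * n_c))


def select_features(term_class_counts, term_counts, class_counts, n, k):
    """Rank terms by their maximum MI over all classes and keep the top k."""
    scores = {}
    for term, per_class in term_class_counts.items():
        scores[term] = max(
            mutual_information(per_class.get(c, 0), term_counts[term], n_c, n)
            for c, n_c in class_counts.items()
        )
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy counts over four documents (two per class), invented for illustration.
class_counts = {"sports": 2, "politics": 2}
term_counts = {"football": 2, "election": 2, "today": 4}
term_class_counts = {
    "football": {"sports": 2},
    "election": {"politics": 2},
    "today": {"sports": 2, "politics": 2},
}
selected = select_features(term_class_counts, term_counts, class_counts, n=4, k=2)
print(selected)
```

In this toy data the term "today" appears equally in both classes, so its score is zero and it ranks below the two class-specific terms.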

Abstract

The invention discloses a Chinese text classification method based on MPI (Message Passing Interface) and Adaboost.MH. It addresses the long overall classification time for Chinese text that results from the long training time of Adaboost.MH on large data sets. The method comprises the following steps: Chinese text that has undergone word segmentation is stored in a training data set; the mutual information method is combined with MPI to select feature words; the MPI_Reduce function performs a reduction sum over all processes to obtain a similarity, and feature words are selected according to that similarity; each process assigns a weight to each feature word according to whether the selected feature word is present in the Chinese texts held by that process; and the per-process results are combined through MPI communication functions to obtain a text classification model, which is then used to classify the Chinese texts awaiting classification. The method greatly shortens Chinese text classification time.
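The reduction-sum step can be pictured with a small simulation in plain Python that needs no MPI installation. Each "process" counts term and class co-occurrences in its own shard of the training set, and the partial results are then added element-wise, which is the result a sum reduction such as MPI_Reduce with MPI_SUM would deliver to the root process. This excerpt does not state which quantities are actually summed, so the use of co-occurrence counts, the function names, and the toy shards are all assumptions.

```python
from collections import Counter


def local_counts(docs):
    """Count, within one process's shard, how many documents of each class
    contain each term. Each document is a (set_of_terms, label) pair."""
    counts = Counter()
    for terms, label in docs:
        for term in set(terms):
            counts[(term, label)] += 1
    return counts


def reduce_sum(partials):
    """Element-wise sum of the per-process counters.

    This stands in for a sum reduction across processes; it is a
    simulation, not a call into any MPI library."""
    total = Counter()
    for part in partials:
        total.update(part)  # Counter.update adds counts key by key
    return total


# Two simulated processes, each holding a shard of the training set.
shards = [
    [({"football", "today"}, "sports")],
    [({"election", "today"}, "politics"), ({"football"}, "sports")],
]
partials = [local_counts(shard) for shard in shards]
global_counts = reduce_sum(partials)
print(global_counts[("football", "sports")])
```

Once the global counts are available on one process, a score such as mutual information can be computed from them and the feature words ranked, without any single process having to scan the whole training set.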

Description

Technical field

[0001] The invention relates to the technical field of text mining, in particular to a Chinese text classification method based on MPI and Adaboost.MH.

Background

[0002] Text classification is the process of assigning texts to relevant categories according to their content, given a known category system. With advances in science and technology, the spread of computers, and the arrival of the Internet age, the volume of online text is growing rapidly, and the text classification task shows two new characteristics. First, a large number of new texts that need to be classified are generated every day, and this data usually exceeds the terabyte scale. Second, text categories are diverse: a single text can belong to several categories at once, for example history, politics, and technology.

[0003] Traditional single-label classification methods suc...

Application Information

Patent type and authority: Applications (China)

IPC(8): G06F17/30, G06F17/27, G06K9/62

CPC: G06F16/355, G06F40/284, G06F18/22, G06F18/2411, G06F18/2431

Inventors: 王进, 高延雨, 李颖, 李航, 余薇, 高选人, 邓欣, 陈乔松, 胡峰

Owner: CHONGQING UNIV OF POSTS & TELECOMM
