Method for optimizing word classification in machine learning text

A machine learning text technology, applied in text database clustering/classification, unstructured text data retrieval, instruments, etc., that can solve problems such as classifying user-defined keywords.

Inactive Publication Date: 2017-02-22
G CLOUD TECH
3 Cites, 20 Cited by

AI Technical Summary

Problems solved by technology

[0003] The technical problem addressed by the present invention is to provide a method for optimizing word classification in machine learning text, solving the problem of classifying user-defined keywords in current text classification.

Embodiment Construction

[0020] As shown in Figure 1, the present invention builds on the traditional machine learning text classification method. A feature selector based on regular expressions picks out user-defined, semantically related features. The user then defines classification categories in the training data that remains after feature selection, and these features and categories are used for classification training according to the naive Bayes model. Once training is complete, in the application stage, if the text to be classified contains a sentence that matches the feature selector, the trained model is used to complete the classification task.
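The select, label, train, apply flow in paragraph [0020] can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patent's implementation: the selector names, the regular expressions, the training sentences, and the `NaiveBayes` class are all hypothetical, and the feature fed to the model is simply the name of the selector that matched.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical feature selectors. Each regex has one capture group that plays
# the role of the "wildcard" holding the word to be classified.
SELECTORS = {
    "travel_to": re.compile(r"\b(?:flew|travelled|moved) to (\w+)"),
    "capital_of": re.compile(r"\bcapital of (\w+)"),
    "works_at": re.compile(r"\b(?:works|worked) at (\w+)"),
    "hired_by": re.compile(r"\bhired by (\w+)"),
}

def select_features(sentence):
    """Return (selector_name, captured_word) for every selector that matches."""
    hits = []
    for name, pattern in SELECTORS.items():
        match = pattern.search(sentence)
        if match:
            hits.append((name, match.group(1)))
    return hits

class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, samples):
        for features, label in samples:
            self.class_counts[label] += 1
            for feature in features:
                self.feature_counts[label][feature] += 1
                self.vocab.add(feature)

    def predict(self, features):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(count / total)
            for feature in features:
                score += math.log((counts[feature] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Made-up training sentences with user-defined categories.
TRAINING = [
    ("She flew to France last week", "country"),
    ("Paris is the capital of France", "country"),
    ("He moved to Brazil in 2010", "country"),
    ("She works at Acme now", "company"),
    ("He was hired by Globex", "company"),
]

def train(training):
    """Keep only sentences that a selector matches, then fit the model."""
    samples = []
    for sentence, label in training:
        names = [name for name, _ in select_features(sentence)]
        if names:
            samples.append((names, label))
    model = NaiveBayes()
    model.fit(samples)
    return model

def classify_words(model, sentence):
    """Map each captured word to the category predicted from its selector."""
    return {word: model.predict([name]) for name, word in select_features(sentence)}

model = train(TRAINING)
print(classify_words(model, "They travelled to Wakanda by train"))
```

Because the model in this sketch learns from which selector matched rather than from the captured word itself, a word that never appeared in the training sentences (here "Wakanda") can still be assigned a category.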

[0021] Feature selectors are based on regular expressions, and a wildcard in a custom regular expression represents a feature value. For example, the "." in ".*[xyz]+" can stand for a specific feature, such as "the words that match the regular expression here are all country names" or "The ...
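To make the idea of a wildcard carrying a feature value concrete, here is a minimal sketch using Python's standard `re` module. The pattern, the `meaning` label, and the `feature_values` helper are assumptions chosen for illustration; the source does not specify how selectors are stored.

```python
import re

# Hypothetical selector: the capture group (\w+) is the wildcard whose match
# becomes the feature value, and "meaning" records what the user says that
# value stands for.
SELECTOR = {
    "pattern": re.compile(r"\bcitizens? of (\w+)"),
    "meaning": "country name",
}

def feature_values(text, selector=SELECTOR):
    """Return every wildcard match in the text, paired with the selector's meaning."""
    return [(value, selector["meaning"]) for value in selector["pattern"].findall(text)]

print(feature_values("She is a citizen of Norway and he is a citizen of Peru."))
```

Each returned pair links the matched word to the meaning the user attached to the wildcard, which is the information a downstream classifier would consume.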

Abstract

The invention relates to the field of data processing and machine learning classification, in particular to a method for optimizing word classification in machine learning text. On the basis of text classification, user-defined, semantically related features are picked out by a feature selector based on regular expressions; the user defines classification categories in the training data after feature selection; and classification training is carried out with these features and categories according to a naive Bayes model. After training is completed, in the application stage, if the text requiring word classification contains statements that match the feature selector, classification is completed using the trained model. With this method, the model's capacity for word classification is not limited to the words that appear in the training samples. The method can be applied to the optimization and application of machine learning text word classification and its derived functions.

Description

Technical field

[0001] The invention relates to the fields of data processing and machine learning classification, in particular to a method for optimizing word classification in machine learning text.

Background

[0002] With the rapid development of information technology, the amount of information in modern society is growing explosively. In the era of big data, how to make good use of massive data and extract truly valuable information has become a focus of public attention. The role of machine learning in data mining is becoming increasingly prominent. In natural language processing and text classification, machine learning solves problems with statistical methods instead of traditional hand-written rules. Practice has shown that this approach works well and is more efficient. On the basis of text classification, it is necessary to further classify each word and keyword in the text and extract the required keyword inform...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06K9/62
CPC: G06F16/35, G06F18/24155
Inventors: 郭宇, 李永波, 季统凯
Owner: G CLOUD TECH