Classification method of information flow material creative text

A text classification technique for information flow material, applicable to text database clustering/classification, digital data information retrieval, unstructured text data retrieval, and related fields. It addresses the problems of increased classification time, decreased classification performance, and heavy computation, with the effect of improving accuracy and reducing time complexity.

Pending Publication Date: 2019-03-26
广东原昇信息科技有限公司

AI Technical Summary

Problems solved by technology

Among existing approaches, text classification based on classical KNN is simple and effective, and it is one of the methods with the best classification results, but it also has some obvious disadvantages. First, to determine the category of a text to be classified, it must compute the similarity between that text and every sample in the training set and then select the k samples with the highest similarity. Training sets for text classification are usually large, so the similarity is computed against tens of thousands of training samples, and classification performance drops quickly as the number of training samples increases. Second, KNN is a lazy learning method: it does heavy computation and consumes more time when classifying test samples, and classification time rises sharply and nonlinearly as the scale increases. Third, the KNN algorithm requires a value of k to be specified, and there is still no good, widely applicable method for determining the number of neighbors of the text to be classified. The choice of k plays a very important role in determining the category: a k that is too large or too small reduces the accuracy of text classification.
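The cost described above can be seen in a minimal sketch of classical KNN text classification. This is an illustration only, not the patent's method: the cosine similarity measure, the sparse dictionary representation, and the toy training data are all assumptions made for the example. Note that every query is scored against the whole training set, and that k must be supplied by the caller.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(query, training, k):
    """Classic KNN: score the query against EVERY training sample,
    keep the k most similar, and take a majority vote."""
    scored = [(cosine(query, vec), label) for vec, label in training]
    scored.sort(key=lambda s: s[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy training set: (term-weight vector, category).
training = [
    ({"discount": 1.0, "sale": 1.0}, "retail"),
    ({"sale": 1.0, "coupon": 1.0}, "retail"),
    ({"loan": 1.0, "credit": 1.0}, "finance"),
    ({"credit": 1.0, "interest": 1.0}, "finance"),
]

result = knn_classify({"sale": 1.0, "discount": 0.5}, training, k=3)
print(result)
```

Each call does one similarity computation per training sample, so the work per query grows with the size of the training set.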

Embodiment Construction

[0042] The present invention will be further explained below:

[0043] A classification method for creative text of information flow materials proceeds as follows:

[0044] 1. Extract frequent feature word sets and their associated training texts based on association analysis:

[0045] Step 1.1: Let the total number of text categories be $m$ and the categories be $c_1, c_2, \ldots, c_m$; the number of training samples in each category is denoted $N_1, N_2, \ldots, N_m$. Preprocess the texts in the training set and, using the $\chi^2$ statistical method, select a certain number of feature words, denoted $N_f$, for each category in the training set.
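Step 1.1 can be sketched as follows. The excerpt does not give the exact $\chi^2$ formula, so this sketch assumes the commonly used 2×2 contingency form, where A and B count the documents containing the word inside and outside the category, and C and D count the documents lacking it inside and outside the category. The function names, the `n_f` parameter, and the toy documents are hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a word for a category from a 2x2 table:
    a: in category, has word      b: other categories, has word
    c: in category, lacks word    d: other categories, lacks word"""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0

def select_features(docs, n_f):
    """docs: list of (set_of_words, category).
    Returns {category: [the n_f highest-scoring words]}."""
    categories = sorted({label for _, label in docs})
    vocab = sorted({w for words, _ in docs for w in words})
    selected = {}
    for cat in categories:
        scores = []
        for w in vocab:
            a = sum(1 for words, label in docs if label == cat and w in words)
            b = sum(1 for words, label in docs if label != cat and w in words)
            c = sum(1 for words, label in docs if label == cat and w not in words)
            d = sum(1 for words, label in docs if label != cat and w not in words)
            scores.append((chi_square(a, b, c, d), w))
        # Highest score first; ties broken alphabetically for determinism.
        scores.sort(key=lambda s: (-s[0], s[1]))
        selected[cat] = [w for _, w in scores[:n_f]]
    return selected

# Hypothetical preprocessed training texts (already tokenized).
docs = [
    ({"sale", "discount"}, "retail"),
    ({"sale", "coupon"}, "retail"),
    ({"loan", "credit"}, "finance"),
    ({"credit", "rate"}, "finance"),
    ({"flight", "hotel"}, "travel"),
    ({"hotel", "beach"}, "travel"),
]

features = select_features(docs, n_f=1)
print(features)
```

One caveat of this form is that it is symmetric: a word that is strongly absent from a category also scores highly, so a real implementation may want to filter on positive association.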

[0046] Step 1.2: Scan all training texts and represent each text as an $m \cdot N_f$-dimensional text vector composed of the feature words of all categories. Compute the feature weights using TF-IDF and the $\chi^2$ feature evaluation function, with each weight set to: TF-IDF × the $\chi^2$-based feature evaluation value.
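The weighting in Step 1.2 might look like the sketch below. Several details are assumptions, since the excerpt does not specify them: TF is taken as the raw count, IDF as $\log(N / df)$, and the $\chi^2$ value of a feature word is the score it received for the category it was selected for. All names and numbers here are hypothetical.

```python
import math

def build_vector(tokens, feature_words, doc_freq, n_docs, chi2):
    """Return one weight per feature word: tf * idf * chi2.
    tokens: list of words in the text being represented
    feature_words: the m * N_f selected words, in a fixed order
    doc_freq: {word: number of training texts containing it}
    chi2: {word: chi-square evaluation value of the word}"""
    vector = []
    for word in feature_words:
        tf = tokens.count(word)                        # assumed: raw count
        df = doc_freq.get(word, 0)
        idf = math.log(n_docs / df) if df else 0.0     # assumed: log(N / df)
        vector.append(tf * idf * chi2.get(word, 0.0))
    return vector

# Hypothetical inputs: m = 3 categories, N_f = 1 feature word each.
feature_words = ["sale", "credit", "hotel"]
doc_freq = {"sale": 2, "credit": 2, "hotel": 2}
chi2 = {"sale": 6.0, "credit": 6.0, "hotel": 6.0}

vec = build_vector(["sale", "sale", "coupon"], feature_words, doc_freq, 6, chi2)
print(vec)
```

The vector always has length $m \cdot N_f$, and words outside the selected feature set (such as "coupon" here) contribute nothing.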

[0047] Step1.3: E...

Abstract

The invention discloses a classification method for creative text of information flow materials. The method includes extracting frequent feature word sets and their associated training texts based on association analysis, and making use of the association analysis results. The invention improves the accuracy of classification, improves the determination of the number of nearest neighbors, and greatly reduces the time complexity of classification.

Description

Technical field

[0001] The invention relates to the field of text classification, in particular to a method for classifying creative texts of information flow materials.

Background technique

[0002] With the rapid development of network information technology, Internet information resources are growing exponentially. Text is the most basic information carrier, and its classification has become a hot topic in modern information processing. At present, the commonly used text classification algorithms include Naive Bayes, Support Vector Machine, Neural Network, Decision Tree, and K-Nearest Neighbor. Among them, the text classification method based on classic KNN is simple and effective, and it is one of the best classification methods, but it also has some obvious shortcomings: first, when determining the category of a text to be classified, its similarity to all samples in the training sample set must be calculated. After that, select the top k sample...

Application Information

IPC(8): G06F16/35, G06F16/332
Inventor 林正春, 姜允志, 贾西平
Owner 广东原昇信息科技有限公司