Short text feature extraction method based on multi-feature factor fusion

A short text feature extraction technology based on the fusion of multiple feature factors. It addresses the problem that existing methods do not consider where a word appears in the text (toward the front or toward the back) or the word's own part of speech.

Pending Publication Date: 2019-07-05
NORTHWEST UNIV
Cites: 4; Cited by: 7

AI Technical Summary

Problems solved by technology

The traditional TF-IDF algorithm does not take into account where a word appears in the text or the word's own part of speech. Words that appear toward the front of a text and words that appear toward the back differ in importance, so ignoring these features makes some error inevitable in the feature extraction process.



Embodiment Construction

[0045] The technical method in this application will be described in detail below in conjunction with the accompanying drawings.

[0046] To address the shortcomings of the TF-IDF algorithm, the present invention introduces a feature word position influence factor and a part-of-speech feature factor to improve the algorithm, and proposes a short text feature extraction method based on the fusion of multiple feature factors. The method mitigates problems such as the weight imbalance that arises when the TF-IDF algorithm calculates the weights of feature words. For ease of understanding, the specific implementation is described by way of a hypothetical example.

[0047] We first introduce the traditional TF-IDF algorithm. The TF-IDF algorithm derives the weight of a feature word from the number of times the word appears in document d and the number of documents that contain the word. The importance of words in a particular docum...
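The traditional weighting introduced in [0047] can be sketched as follows. This is a minimal illustration that assumes the common textbook form (term frequency multiplied by the logarithm of the inverse document frequency); the excerpt is truncated before it gives the patent's exact formula, so the function name `tf_idf`, the unsmoothed logarithm, and the sample documents are all assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: weight} dict per tokenized document.

    Assumed textbook variant: weight = (count / doc length) * ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return weights


# Hypothetical, already segmented short comments.
docs = [
    ["screen", "great", "battery", "poor"],
    ["battery", "great"],
    ["delivery", "slow"],
]
weights = tf_idf(docs)
```

In this variant a word that occurs in every document receives a weight of zero, and the weight depends only on counts. Position and part of speech play no role, which is the gap the invention targets.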


Abstract

The invention relates to a short text feature extraction method based on multi-feature factor fusion, which comprises the following steps: performing word segmentation and stop word removal on short text comments with a conjunctive word segmentation tool to construct a preliminary text feature word vector matrix; applying the traditional TF-IDF algorithm to calculate weights for the constructed feature word vector matrix, obtaining a weight vector matrix; introducing a feature word position influence factor and a part-of-speech feature factor, tagging the part of speech of the preliminary text feature words one by one, and calculating the sum of the two factors for each feature word; and multiplying each sum by the corresponding weight from the traditional TF-IDF algorithm to obtain the weight vector matrix of the optimized TF-IDF algorithm. The technical scheme provided by the invention can alleviate, to a certain extent, the word weight imbalance of the traditional TF-IDF algorithm, thereby improving the accuracy of text feature extraction and providing effective help for sentiment classification tasks.
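The steps in the abstract can be sketched roughly as below. This is an illustration under stated assumptions, not the patent's implementation: the excerpt does not give the factor formulas, so `position_factor`, the values in `POS_FACTOR`, the choice to use a word's first occurrence, and the textbook TF-IDF form are all hypothetical. Input is assumed to be already segmented and part-of-speech tagged.

```python
import math
from collections import Counter

# Hypothetical part-of-speech factors; the excerpt does not give actual values.
POS_FACTOR = {"adj": 0.8, "verb": 0.5, "noun": 0.4}
DEFAULT_POS_FACTOR = 0.1


def position_factor(index, length):
    """Hypothetical position factor in (0, 1]: earlier words score higher."""
    return 1.0 - index / length


def fused_weights(tagged_docs, stop_words=frozenset()):
    """Return one {word: weight} dict per document of (word, pos) pairs."""
    # Step 1: remove stop words from the segmented, tagged text.
    docs = [[(w, p) for w, p in doc if w not in stop_words]
            for doc in tagged_docs]

    # Step 2: document frequencies for the (assumed textbook) TF-IDF weight.
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update({w for w, _ in doc})

    result = []
    for doc in docs:
        counts = Counter(w for w, _ in doc)
        row = {}
        for index, (word, pos) in enumerate(doc):
            if word in row:
                continue  # assumption: use the word's first occurrence
            tfidf = (counts[word] / len(doc)) * math.log(n_docs / df[word])
            # Step 3: sum the position factor and the part-of-speech factor.
            factor_sum = (position_factor(index, len(doc))
                          + POS_FACTOR.get(pos, DEFAULT_POS_FACTOR))
            # Step 4: multiply the sum by the traditional TF-IDF weight.
            row[word] = factor_sum * tfidf
        result.append(row)
    return result


# Hypothetical tagged comments.
tagged_docs = [
    [("screen", "noun"), ("is", "verb"), ("great", "adj")],
    [("battery", "noun"), ("is", "verb"), ("poor", "adj")],
]
fused = fused_weights(tagged_docs, stop_words={"is"})
```

With these placeholder factors, two words that have the same TF-IDF weight can end up with different final weights because of their position and part of speech, which is the rebalancing effect the abstract describes.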

Description

Technical field

[0001] The invention relates to the technical field of text mining, in particular to a short text feature extraction method based on the fusion of multiple feature factors.

Background

[0002] With the arrival of the Web 3.0 era, Internet information has become increasingly integrated into people's lives. A large number of users express their opinions on events or products on the Internet, and over time these comments strongly influence people's thinking and behavior. The comments also carry people's emotional attitudes and emotional information, such as happiness, anger, sorrow, joy, and sadness, or positive, neutral, and negative polarity. From these comments, other users can learn through the network platform what a group of users thinks about an event or product, so this information has great potential mining value. In addition, during the rapid development of the Interne...


Application Information

IPC(8): G06F16/332, G06F17/27
CPC: G06F40/284
Inventor: 高岭, 周俊鹏, 马景超, 何丹, 王文涛, 高全力
Owner NORTHWEST UNIV