
Text TF-IDF feature reconstruction method combined with emotion intensity

A TF-IDF feature reconstruction technique that incorporates emotional intensity, applied to classification in natural language processing. It addresses the information confusion, inaccuracy, and loss that occur in existing methods.

Active Publication Date: 2019-08-06
TONGJI UNIV
Cites: 5; Cited by: 6

AI Technical Summary

Problems solved by technology

Internet language, as represented by Weibo, contains special components such as emoticons and user names. Existing methods do not handle them, which causes information confusion. Elements such as negative words, degree adverbs, and repeated words in Chinese text directly affect the emotional intensity and polarity of the text, but existing methods cannot retain this information, which makes the resulting features inaccurate. In the test set and in actual applications, new words that do not appear in the training set are discarded by existing methods, which loses information.


Examples


Embodiment

[0044] A text TF-IDF feature reconstruction method combined with emotional intensity first constructs a stop-word dictionary, a degree dictionary, and a negation dictionary. Stop words mainly include English characters, numbers, mathematical characters, punctuation marks, and high-frequency single Chinese characters, such as "you", "I", "because", and "and". The degree dictionary contains adverbs used to modify the intensity of adjectives and adverbs, together with their corresponding emotional intensities. The "Dictionary of Degree Level Words" released by HowNet in 2007 is selected as the degree dictionary, and its degree adverbs are divided into six levels, "extremely", "super", "very", "relatively", "slightly", and "insufficiently", with weights of 1.7, 1.5, 1.3, 1.1, 0.8, and 0.5 respectively. The negation dictionary contains common negative words such as "not", "no", "non", and "weiwei". As shown in Figure 1, the method includes the following steps:

[0045] 1....
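The three dictionaries described in [0044] could be represented roughly as follows. This is only a sketch: it uses English glosses in place of the Chinese entries, the stop-word set contains just the four examples quoted above, and the helper names `is_stop_token` and `degree_of` are hypothetical, not taken from the patent. The six weights are the ones stated in [0044].

```python
import string

# Degree levels and weights as listed in [0044] (English glosses, assumed keys).
DEGREE_LEVELS = {
    "extremely": 1.7,
    "super": 1.5,
    "very": 1.3,
    "relatively": 1.1,
    "slightly": 0.8,
    "insufficiently": 0.5,
}

# Illustrative subsets only; the real dictionaries are much larger.
NEGATION_WORDS = {"not", "no", "non"}
STOP_WORDS = {"you", "I", "because", "and"}


def is_stop_token(token):
    """Return True for listed stop words and for tokens made only of digits or punctuation."""
    if token in STOP_WORDS:
        return True
    return all(ch.isdigit() or ch in string.punctuation for ch in token)


def degree_of(token):
    """Return the intensity weight of a degree adverb, or 1.0 for any other token."""
    return DEGREE_LEVELS.get(token, 1.0)
```

Because the sketch works on English glosses, it does not apply the rule that English characters themselves are stop words; on real Chinese input that check would be added to `is_stop_token`.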

example 1

[0086] Example 1: @今天also like miuky you look so good today [happy][happy]

[0087]

[0088]

[0089] Compared with the traditional method, the user name "@今天also like miuky" and the emoticon "[happy]" are correctly segmented as whole tokens.
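A regular-expression pass of the kind Example 1 implies might look like the sketch below. The pattern is an assumption: it treats a user name as `@` followed by non-space characters and an emoticon as text in square brackets, which is how Weibo posts are commonly written but is not spelled out in the text shown here. The function name `split_special` and the sample user name are hypothetical.

```python
import re

# Assumed patterns: "@name" runs up to whitespace; emoticons look like "[happy]".
SPECIAL_RE = re.compile(r"(@[^\s@\[\]]+|\[[^\[\]\s]+\])")


def split_special(text):
    """Split text so that user names and emoticons become standalone tokens.

    The remaining plain-text pieces are returned unsegmented; a word segmenter
    would process them in a later step.
    """
    pieces = []
    for piece in SPECIAL_RE.split(text):
        piece = piece.strip()
        if piece:
            pieces.append(piece)
    return pieces


if __name__ == "__main__":
    print(split_special("@miuky_fan you look so good today [happy][happy]"))
```

Extracting these tokens before ordinary word segmentation keeps the segmenter from cutting a user name or emoticon label into unrelated words.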

example 2

[0090] Example 2: I don't like driving

[0091]
Method | Feature vector
Traditional method | Drive: 0.89, Like: 0.45
Method proposed by the present invention | Like: -0.46, Drive: -0.89

[0092] Compared with the traditional method, the reconstructed feature vector preserves the information of the negative word "no".
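The sign flip in Example 2 can be illustrated with a small sketch. The scoping rule used here is an assumption, since the detailed steps are not shown above: a negative word flips the sign of every feature word that follows it, and a degree adverb scales only the next feature word. The function `reweight` and the two small dictionaries are hypothetical, and the sketch does not reproduce the exact figures in the table (for instance -0.46 rather than -0.45).

```python
# Small illustrative dictionaries (English glosses, assumed entries).
NEGATIONS = {"not", "no", "non"}
DEGREES = {"extremely": 1.7, "super": 1.5, "very": 1.3,
           "relatively": 1.1, "slightly": 0.8, "insufficiently": 0.5}


def reweight(tokens, tfidf):
    """Adjust TF-IDF weights by negation (sign) and degree adverbs (scale).

    tokens: segmented sentence, in order.
    tfidf:  mapping from feature word to its plain TF-IDF weight.
    """
    sign = 1.0    # flipped by each negative word seen so far
    scale = 1.0   # accumulated degree weight for the next feature word
    adjusted = {}
    for tok in tokens:
        if tok in NEGATIONS:
            sign = -sign
        elif tok in DEGREES:
            scale *= DEGREES[tok]
        elif tok in tfidf:
            adjusted[tok] = sign * scale * tfidf[tok]
            scale = 1.0  # a degree adverb only modifies the next feature word
    return adjusted


if __name__ == "__main__":
    print(reweight(["I", "not", "like", "drive"], {"like": 0.45, "drive": 0.89}))
```

Under this rule a double negation restores the original sign, which is one reason the position of the negative word relative to the feature word matters.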


Abstract

The invention relates to a text TF-IDF feature reconstruction method combined with emotional intensity. Emoticons and user names are extracted and segmented through regular-expression matching; word intensity is corrected according to an intensity dictionary and the positional relationship of negative words, degree adverbs, and repeated words; and new words are replaced through a synonym replacement method based on Word2Vec, so that the TF-IDF feature vectors of the text are reconstructed. Compared with the prior art, the TF-IDF features of words are corrected to account for negative words, degree adverbs, repeated words, and similar cases, so that information such as the intensity and position of words is retained. New words in the test set are replaced with known words that appear in the training set, which improves generalization. The method takes an original sentence directly as input, so manual word segmentation is not needed.
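The synonym-replacement step can be sketched as a nearest-neighbour lookup over word vectors. In practice the vectors would come from a trained Word2Vec model; here they are a hand-written toy dictionary so the example runs on its own, and the function names `cosine` and `replace_new_words` are hypothetical. Choosing the replacement by highest cosine similarity is an assumption about how "synonym" is decided.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def replace_new_words(tokens, train_vocab, vectors):
    """Replace tokens unseen in training with the most similar training word.

    Tokens that are already in train_vocab, or that have no vector, are kept.
    """
    candidates = [w for w in train_vocab if w in vectors]
    result = []
    for tok in tokens:
        if tok in train_vocab or tok not in vectors or not candidates:
            result.append(tok)
            continue
        best = max(candidates, key=lambda w: cosine(vectors[tok], vectors[w]))
        result.append(best)
    return result


if __name__ == "__main__":
    toy_vectors = {"like": [1.0, 0.1], "drive": [0.0, 1.0], "adore": [0.9, 0.2]}
    print(replace_new_words(["adore", "drive"], {"like", "drive"}, toy_vectors))
```

With the toy vectors, the unseen word "adore" maps to "like", so it contributes to an existing TF-IDF dimension instead of being discarded.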

Description

Technical field

[0001] The invention belongs to the classification field in natural language processing and relates to a text classification preprocessing method, in particular to a text TF-IDF feature reconstruction method combined with emotional intensity.

Background

[0002] In current natural language processing and machine learning, Term Frequency–Inverse Document Frequency (TF-IDF) is commonly used to construct the feature vector of a text. Internet language, as represented by Weibo, contains special components such as emoticons and user names. Existing methods do not handle them, which causes information confusion. Elements such as negative words, degree adverbs, and repeated words in Chinese text directly affect the emotional intensity and polarity of the text, but existing methods cannot retain this information, which makes the resulting features inaccurate. In the test set and in actual applications, some new words that are not in the tr...


Application Information

IPC(8): G06F16/36, G06F16/33, G06F17/27
CPC: G06F16/374, G06F16/36, G06F16/3344, G06F40/284, G06F40/247, Y02D10/00
Inventors: 邓修齐, 康琦, 张量
Owner: TONGJI UNIV