
Microblog big data hot topic multi-dimensional intelligent extraction system

A hot topic extraction system technology, applied in electronic digital data processing, natural language data processing, unstructured text data retrieval, and related fields. It improves extraction accuracy and text modeling quality and has great practical value.

Pending Publication Date: 2022-08-09
张艳

AI Technical Summary

Problems solved by technology

[0006] (1) The prior art contains many studies on text representation models and topic extraction methods, but traditional methods based on the vector space model (VSM) usually target long texts: news reports, commentary articles, and forum or blog posts. Weibo differs from these sources, so traditional topic extraction methods run into major problems when applied to it. Weibo posts are short and their text is sparse, which makes semantic processing difficult: VSM-based similarity measurement works best when the texts are long and share many repeated words, and microblog data offers neither. Microblogs also contain a large amount of noisy data, such as user account information, URL links, and emoticons, which further complicates text similarity calculation and degrades topic extraction. As a result, prior-art hotspot extraction methods show large deviations when used on Weibo.
[0007] (2) The volume of microblog information is very large and the data is highly real-time, so hot topic extraction must be both fast and accurate. Topics have to be extracted from a large amount of messy microblog information in which different users may express the same topic in very different ways, and forwarding and commenting between users add a huge data processing load. This makes it very difficult for existing technology to keep topic extraction timely and its algorithms fast. Because microblog text is short and the data is sparse, the feature factors used for ordinary text cannot be applied directly. Existing technology also cannot solve the high vector space dimension and the loss of semantic information in text modeling based on the vector space model: although words are strongly correlated, the implicit semantic structure between them cannot be captured, words and texts are represented inaccurately, and the accuracy of microblog hot topic extraction is very low.
[0008] (3) Weibo is characterized by grassroots participation, original content, repetition, and explosive radial propagation. Faced with the massive microblog data that a large number of users publish in real time, existing technology cannot capture the relevant important information in a timely and efficient manner to follow hot spots in Internet public opinion. An automatic extraction system for microblog hot topics is lacking, as is a text modeling method based on short text expansion and hidden semantic calculation. Existing text modeling methods for microblog topic extraction suffer from insufficient data noise reduction, insufficient dimension reduction, and semantic loss. There is no short text expansion method that takes microblog features into account and no collaborative approximate set method for microblog topic extraction, so microblog hot topics cannot be extracted, scored, and displayed automatically. The accuracy and timeliness of microblog topic extraction are poor and topic information is lost, so real-world functional needs cannot be met.
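
The two obstacles named in (1) and (2), noisy tokens and sparse short texts, can be illustrated with a minimal Python sketch. It is not the patented method: the regular expressions, the function names, and the sample posts are assumptions made for illustration, and whitespace tokenisation stands in for the word segmentation that Chinese text would need.

```python
import math
import re
from collections import Counter

# Hypothetical noise patterns; real Weibo markup may differ.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@[\w-]+")
EMOTICON_RE = re.compile(r"\[[^\[\]\s]{1,8}\]")  # bracketed emoticon codes such as [doge]


def clean_microblog(text):
    """Strip URLs, @mentions and bracketed emoticons, then collapse whitespace."""
    for pattern in (URL_RE, MENTION_RE, EMOTICON_RE):
        text = pattern.sub(" ", text)
    return " ".join(text.split())


def bag_of_words(text):
    """Whitespace tokenisation; Chinese text would need a word segmenter instead."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Two made-up posts about the same event, worded differently.
raw = "@alice strong quake shakes town [doge] http://t.cn/abc123"
post_a = clean_microblog(raw)  # "strong quake shakes town"
post_b = "earthquake hits city"
similarity = cosine(bag_of_words(post_a), bag_of_words(post_b))  # 0.0: no shared words
```

Even after cleaning, the two posts share no terms, so a plain VSM comparison rates them as unrelated. This is the sparsity problem that motivates expanding short texts and modeling hidden semantics.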



Embodiment Construction

[0109] To make the purpose, features, advantages, and innovations of the present application clearer and easier to understand and implement, specific embodiments are described in detail below with reference to the accompanying drawings. Those skilled in the art can make similar extensions without departing from the substance of the present application, so the present application is not limited by the specific embodiments disclosed below.

[0110] With the development of social networking and the continuous updating of mobile Internet technology, Weibo has gradually become an important platform for displaying and sharing information. Its fast dissemination and wide reach make it an important source and carrier of public opinion. Through the calculation and analysis of the reprinted and disseminated content of Weibo, the current public opinion situation in the society can be effectively gra...


Abstract

The invention provides a text modeling method based on short text expansion and hidden semantic calculation, which addresses the insufficient data noise reduction, insufficient dimension reduction, and semantic loss that arise when prior-art text modeling methods are used to extract microblog topics. A short text expansion method is proposed that takes microblog characteristics into account: the discussion atlas tree is recombined to expand the microblog text, the expanded text is modeled with a hidden semantic calculation method, and the dimension of the text vector is reduced without losing semantics. The invention also provides a collaborative approximate set method for extracting microblog topics and scoring topic popularity: time data characteristics are incorporated into text similarity calculation, influence factors and a specific method for calculating the microblog topic popularity value are proposed, a microblog hot topic scoring list is generated, and the microblog hot topics are determined from the scoring list. The accuracy and timeliness of microblog topic extraction are improved, the loss of topic information is reduced, and multi-dimensional intelligent and accurate extraction of microblog big data hot topics is realized.
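
As a rough illustration of two ideas in the abstract, the Python sketch below expands short posts by merging each discussion thread into its root post and damps text similarity by the time gap between posts. The abstract does not give the actual formulas, so the `Post` fields, the exponential decay, and the `tau_hours` parameter are all hypothetical choices, not the patent's method.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    post_id: str
    parent_id: Optional[str]  # None for an original post
    text: str
    hours: float              # posting time, in hours from an arbitrary origin


def expand_threads(posts):
    """Merge every reply/repost into the text of its root post (assumes no cycles)."""
    by_id = {p.post_id: p for p in posts}

    def root_of(post):
        while post.parent_id is not None and post.parent_id in by_id:
            post = by_id[post.parent_id]
        return post.post_id

    merged = defaultdict(list)
    for post in posts:
        merged[root_of(post)].append(post.text)
    return {root: " ".join(texts) for root, texts in merged.items()}


def cosine(text_a, text_b):
    """Cosine similarity between whitespace-tokenised term-count vectors."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def time_aware_similarity(text_a, hours_a, text_b, hours_b, tau_hours=24.0):
    """Cosine similarity damped by an exponential decay in the time gap (hypothetical form)."""
    return cosine(text_a, text_b) * math.exp(-abs(hours_a - hours_b) / tau_hours)


# Made-up thread: post "a" with a reply "b" and a reply-to-reply "c", plus an unrelated post "d".
posts = [
    Post("a", None, "quake hits city", 0.0),
    Post("b", "a", "stay safe everyone", 1.0),
    Post("c", "b", "donations needed", 2.0),
    Post("d", None, "new phone launch", 5.0),
]
expanded = expand_threads(posts)
```

Here `expanded["a"]` holds the concatenated text of the whole thread, which gives a similarity measure more words to work with than any single post. The decay term makes posts that are far apart in time count as less similar even when their wording matches.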

Description

Technical field

[0001] The present application relates to a microblog big data hot topic extraction system, in particular to a microblog big data hot topic multi-dimensional intelligent extraction system, and belongs to the technical field of social network hot topic extraction.

Background technique

[0002] With the rapid development of IT technology and the wide application of the mobile Internet, online social networking has gradually entered people's lives on a large scale, deeply affecting how people acquire information, interact socially, and live. Thanks to its large user base, real-time publishing, openness, and convenient interaction, Weibo has gradually become a tool that people use to obtain and publish information, share their status, and interact socially. Weibo has become one of the main sources of online public opinion.

[0003] With the rapid increase in the number of Weibo users, it has become an important source of in...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/30, G06F40/216, G06F40/289, G06F40/194, G06F16/35, G06F16/36
CPC: G06F40/30, G06F40/216, G06F40/289, G06F40/194, G06F16/353, G06F16/367
Inventor: 张艳, 李扬
Owner: 张艳