Online theme modeling method on basis of theme heredity

A Topic Modeling, Topic Technique

Active Publication Date: 2014-05-14
SICHUAN UNIV
3 Cites, 6 Cited by

AI Technical Summary

Problems solved by technology

However, the weight vector ω^δ of OLDA is a fixed value and cannot be adjusted as topics change dynamically.
Moreover, all topics in the same time slice share the same weight, and this value is difficult to set.
If the weight is set too small, topics in consecutive time slices cannot be aligned; if it is set too large, historical data has too much influence, so topics that do not describe the same event are forced into alignment because they share co-occurring words.
In particular, when a new topic appears in time slice t, it is easily mixed with an old topic and aligned with the related topic in time slice t-1, which makes the new topic difficult to detect.
In addition, OLDA maintains an incrementally updated vocabulary: new words from each time slice are added to it, so the vocabulary eventually grows large enough to cause memory overflow, and the growing dimensionality increases the running time.
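
The fixed-weight issue can be illustrated with a minimal Python sketch. It assumes the commonly cited OLDA form, in which a topic's prior for the current time slice is a weighted sum of that topic's word distributions from the previous δ slices. The function name `olda_prior` and the toy numbers are hypothetical and are not taken from the patent.

```python
def olda_prior(history, weights):
    """Weighted sum of one topic's past topic-word vectors.

    history: list of delta topic-word vectors (oldest first), each of length V.
    weights: list of delta fixed weights, shared by every topic (assumed OLDA form).
    """
    if len(history) != len(weights):
        raise ValueError("need exactly one weight per historical time slice")
    vocab_size = len(history[0])
    return [
        sum(w * phi[v] for w, phi in zip(weights, history))
        for v in range(vocab_size)
    ]


# The same fixed weights are applied to a long-running topic and to a topic
# that only just appeared; neither can be tuned independently.
fixed_weights = [0.5, 1.0]
stable_topic = [[0.5, 0.5], [0.9, 0.1]]
emerging_topic = [[0.5, 0.5], [0.1, 0.9]]
prior_stable = olda_prior(stable_topic, fixed_weights)
prior_emerging = olda_prior(emerging_topic, fixed_weights)
```

Because `fixed_weights` is shared, raising it to keep the stable topic aligned also pulls the emerging topic toward its history, which is the mixing problem described above.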



Examples


Embodiment Construction

[0039] 1) Let t_n be the current time slice, where n = 1, 2, 3, .... Capture the text data in time slice t_n; the distinct words in these texts form the vocabulary of the time slice. The time-slice size can be set according to actual needs, for example 24 hours. The text data can be forum posts, blog posts, news articles, and so on.
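
A minimal sketch of building the per-slice vocabulary is given below. The whitespace tokenization, lowercasing, and the name `build_vocabulary` are assumptions made for illustration; the patent does not specify how texts are segmented into words.

```python
def build_vocabulary(texts):
    """Map each distinct word in one time slice to an integer id.

    Ids are assigned in order of first appearance. Tokenization here is a
    plain lowercase whitespace split (an assumption, not the patent's method).
    """
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def encode(texts, vocab):
    """Convert each text to a list of word ids, skipping unknown words."""
    return [[vocab[w] for w in text.lower().split() if w in vocab] for text in texts]


slice_texts = ["Flood warning issued", "Flood relief funds issued"]
vocab = build_vocabulary(slice_texts)
docs = encode(slice_texts, vocab)
```

Because the vocabulary is rebuilt for each time slice, its size depends only on the words seen in that slice rather than on the whole history.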

[0040] 2) Perform topic modeling with the LDA model to obtain the text-topic distribution vector of each text and the topic-word distribution vector of each topic, where m is the text index and k is the topic index, k = 1, 2, ..., K, and K is the total number of topics. K is specified by the user and remains unchanged across time slices. Each text-topic distribution vector is a K-dimensional vector, and each topic-word distribution vector has as many dimensions as the vocabulary has words.
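
The modeling step can be sketched with a small collapsed Gibbs sampler for LDA, written in plain Python. This is a textbook-style illustration under assumed symmetric priors `alpha` and `beta`; the function name `lda_gibbs`, the default hyperparameters, and the iteration count are hypothetical and not taken from the patent.

```python
import random


def lda_gibbs(docs, vocab_size, num_topics, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of texts, each a list of word ids in [0, vocab_size).
    Returns (theta, phi): theta[m] is the K-dim text-topic vector of text m,
    phi[k] is the V-dim topic-word vector of topic k.
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    doc_topic = [[0] * K for _ in docs]        # count of topic k in text m
    topic_word = [[0] * V for _ in range(K)]   # count of word v in topic k
    topic_total = [0] * K                      # total words assigned to topic k
    assign = []
    for m, doc in enumerate(docs):             # random initial assignment
        zs = [rng.randrange(K) for _ in doc]
        for w, z in zip(doc, zs):
            doc_topic[m][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for m, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[m][i]               # remove the current assignment
                doc_topic[m][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[m][k] + alpha)
                    * (topic_word[k][w] + beta) / (topic_total[k] + V * beta)
                    for k in range(K)
                ]
                z = rng.choices(range(K), weights=weights)[0]
                assign[m][i] = z               # record the resampled topic
                doc_topic[m][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    theta = [
        [(doc_topic[m][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for m, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    return theta, phi


theta, phi = lda_gibbs([[0, 1, 0], [2, 3, 3]], vocab_size=4, num_topics=2)
```

Each row of `theta` and each row of `phi` sums to 1, matching the dimensions described in step 2.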

[0041] 3) Calculate the topic strength. The calculation steps are as follows:

[0042] a) Calculated text The text-t...


Abstract

The invention discloses an online topic modeling method based on topic inheritance. The method includes the steps of: capturing the text data of the current time slice, performing topic modeling according to the LDA (latent Dirichlet allocation) model, computing topic strength, ranking the topics, computing the gene of each topic, capturing the text data of the next time slice, converting the topic-word distribution vectors, computing the prior parameters of the Dirichlet distribution for the next time slice, applying Gibbs sampling, and so on. The method has the following advantages: 1, the online topic model is suited to processing time-ordered text streams and is well suited to public opinion monitoring systems; 2, the topic alignment property of the OLDA (online latent Dirichlet allocation) model is retained, and different genes are set for the topics according to their strength, which overcomes the defects that topics become mixed and that new topics are not detected in time; 3, the topic strength computation effectively lowers the scores of overly broad topics.
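
The abstract gives no formulas for the conversion and prior steps, so the sketch below is only one possible reading, stated as an assumption: each topic-word vector is remapped onto the next slice's vocabulary, then scaled by a per-topic gene to form the next Dirichlet prior. The names `convert_topic_word` and `next_prior`, the `floor` and `base_beta` values, and the way the gene enters the prior are all hypothetical.

```python
def convert_topic_word(phi_k, old_vocab, new_vocab, floor=1e-6):
    """Re-express one topic-word vector over a new vocabulary.

    Words shared by both vocabularies keep their old probability; words new
    to this slice get a small floor value (an assumed choice). The result is
    renormalized to sum to 1.
    """
    converted = [floor] * len(new_vocab)
    for word, new_id in new_vocab.items():
        old_id = old_vocab.get(word)
        if old_id is not None:
            converted[new_id] = max(phi_k[old_id], floor)
    total = sum(converted)
    return [p / total for p in converted]


def next_prior(phi, genes, old_vocab, new_vocab, base_beta=0.01):
    """Per-topic Dirichlet prior for the next slice (hypothetical form).

    genes[k] is a per-topic inheritance weight: a larger gene makes topic k
    inherit more strongly from its previous word distribution.
    """
    return [
        [base_beta + gene * p for p in convert_topic_word(phi_k, old_vocab, new_vocab)]
        for phi_k, gene in zip(phi, genes)
    ]


old_vocab = {"flood": 0, "warning": 1}
new_vocab = {"warning": 0, "relief": 1}
prior = next_prior([[0.3, 0.7]], genes=[2.0], old_vocab=old_vocab, new_vocab=new_vocab)
```

Unlike a single weight shared by all topics, a per-topic gene lets a weak or newly emerged topic inherit little from history while a strong topic stays aligned with its past.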

Description

Technical field

[0001] The invention relates to a method for discovering network hot topics and analyzing their evolution, and in particular to an online topic modeling method based on topic inheritance.

Background art

[0002] Topic discovery and evolution analysis for emerging media such as forums and microblogs is a current research hotspot. It supports the discovery and prediction of public opinion, which helps public opinion to be handled promptly and social stability to be maintained. In recent years, topic models represented by LDA (Latent Dirichlet Allocation) have been studied intensively. Because of its topic modeling ability, LDA has inherent advantages in the field of topic evolution. Several extended models based on LDA were subsequently proposed, such as the TOT (Topics over Time) model, which reflects changes in topic intensity over time, and the DTM (Dynamic Topic Model), which uses a state space to record topic content and intensity ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/355, G06F16/36
Inventor: 陈兴蜀, 吴小松, 王文贤, 杜敏
Owner: SICHUAN UNIV