General topic-embedding-model joint-training method

A topic-model and training-method technology, applied in character and pattern recognition, special data processing applications, instruments, etc. It addresses problems such as over-dependence on specific models, lack of generality, and the difficulty of improving both models at the same time.

Active Publication Date: 2018-09-18
NANJING UNIV


Problems solved by technology

[0005] Purpose of the invention: In view of the problems and deficiencies in the prior art described above, the purpose of the invention is to provide a joint training method for general topic-embedding models. It solves problems of existing model-combination methods, such as relying too heavily on specific models, lacking generality, and making it difficult to improve both models at the same time.

Method used



Embodiment Construction

[0024] The present invention is further illustrated below in conjunction with the accompanying drawings and specific embodiments. It should be understood that these embodiments serve only to illustrate the present invention and are not intended to limit its scope. After reading the present invention, those skilled in the art will understand that modifications in various equivalent forms all fall within the scope defined by the appended claims of this application.

[0025] The present invention proposes a general topic-embedding-model joint training method, in which the topic model and the embedding model are trained jointly by means of additive regularization. In principle, each component model can be replaced with another model of the same type, regardless of the model's specific form. This avoids the drawback of other combination methods, which must be customized for a specific model, and improves the generality of the training metho...
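The additive-regularization idea described above can be written as one joint objective. The following is a hedged sketch in notation of our own choosing; the symbols (L_T, L_E, R, lambda) are illustrative and are not taken from the patent text:

```latex
% Sketch of an additively regularized joint objective (illustrative notation):
%   L_T(\Theta): topic-model log-likelihood (e.g., a pLSA/LDA-style model), parameters \Theta
%   L_E(W):      embedding-model objective (e.g., skip-gram with negative sampling), parameters W
%   R(\Theta,W): regularizer coupling the two parameter sets; \lambda weights it
\max_{\Theta,\, W} \; \mathcal{L}(\Theta, W)
  \;=\; \mathcal{L}_T(\Theta) \;+\; \mathcal{L}_E(W) \;+\; \lambda\, R(\Theta, W)
```

Because the component objectives enter the sum unchanged, either component can in principle be swapped for another model of the same type; only the coupling term R needs to relate their parameters.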



Abstract

The invention discloses a general topic-embedding-model joint-training method. The method comprises the following steps: step 1, preprocessing an input original document corpus to obtain target text; step 2, constructing a vocabulary for the target text; step 3, initializing the network structures and parameter matrices of the models, and constructing a negative-sampling table; and step 4, jointly modeling the topic and embedding models and training them over multiple iterations, where each iteration consists of three phases: (a) training the topic-model part with an expectation-maximization (EM) algorithm; (b) training the embedding-model part with stochastic gradient descent; and (c) training the regularization-term part with full (batch) gradient descent. The invention provides a general way to jointly train a topic model and an embedding model, and solves problems such as existing model-combination methods being too dependent on specific models, lacking generality, and being unable to improve both models at the same time.
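Steps 1 through 3 of the abstract can be sketched in code. This is a minimal illustrative sketch, not the patent's implementation: all function names are our own, the `count**0.75` negative-sampling exponent is a common convention the patent does not specify, and step 4's three training phases are only stubbed out in a comment.

```python
# Hedged sketch of steps 1-3: preprocessing, vocabulary construction,
# parameter initialization, and a negative-sampling table.
import re
import numpy as np

def preprocess(raw_docs):
    """Step 1: lowercase and tokenize each raw document."""
    return [re.findall(r"[a-z]+", d.lower()) for d in raw_docs]

def build_vocab(docs):
    """Step 2: map each distinct word to an integer id."""
    vocab = {}
    for doc in docs:
        for w in doc:
            vocab.setdefault(w, len(vocab))
    return vocab

def init_params(vocab_size, n_topics=4, dim=8, seed=0):
    """Step 3a: initialize parameter matrices for both component models."""
    rng = np.random.default_rng(seed)
    phi = rng.dirichlet(np.ones(vocab_size), size=n_topics)  # topic-word distributions
    emb = rng.normal(scale=0.1, size=(vocab_size, dim))      # word embeddings
    return phi, emb

def build_negative_table(docs, vocab, table_size=10_000, power=0.75):
    """Step 3b: sampling table with P(w) proportional to count(w)**power."""
    counts = np.zeros(len(vocab))
    for doc in docs:
        for w in doc:
            counts[vocab[w]] += 1
    probs = counts ** power
    probs /= probs.sum()
    return np.random.default_rng(0).choice(len(vocab), size=table_size, p=probs)

docs = preprocess(["the cat sat on the mat", "the dog sat on the log"])
vocab = build_vocab(docs)
phi, emb = init_params(len(vocab))
neg_table = build_negative_table(docs, vocab)

# Step 4 (stub): each iteration would (a) run an EM update on the topic part,
# (b) an SGD update on the embedding part using neg_table, and (c) a full
# gradient-descent update on the regularization term coupling phi and emb.
```

Frequent words dominate the table (here "the"), which is what makes it usable as a negative-sampling distribution in phase (b).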

Description

technical field

[0001] The invention relates to the fields of artificial intelligence, neural networks, and natural language processing, and in particular to a joint training method for a general topic-embedding model.

Background technique

[0002] One of the core tasks of natural language processing is understanding the semantics of text. Many downstream tasks, such as text classification, part-of-speech tagging, and machine translation, use it as an initialization step to improve model performance. "You shall know a word by the company it keeps" is a famous saying of the British linguist J.R. Firth. In other words, two words that appear in similar contexts have similar meanings. This is the well-known distributional hypothesis, which has driven the development of many text models.

[0003] Topic models and embedding models are the two main types of such models, and they have different origins. The topic model derives from Bayesian statistics; the probabilistic latent semantic analys...

Claims


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06K9/62
CPC: G06F18/214
Inventor: 顾荣, 黄宜华, 赵博, 肖倩, 袁春风
Owner: NANJING UNIV