Kalman filter word vector learning method based on Dirichlet process

A Kalman filter word vector learning technique, applied in complex mathematical operations, character and pattern recognition, instruments, etc., that addresses problems such as the curse of dimensionality and the inability to describe the similarity between words well.

Active Publication Date: 2021-07-20
HRG INT INST FOR RES & INNOVATION

AI Technical Summary

Problems solved by technology

However, this kind of word vector representation has two disadvantages: (1) it easily suffers from the curse of dimensionality, especially when it is used in some deep learning algorithms; (2) it cannot describe the similarity between words well.
However, the existing technology has not applied the Dirichlet process to word vector representation in natural language processing.


Embodiment Construction

[0068] Embodiments of the present invention provide a Kalman filter word vector learning method based on the Dirichlet process. The invention assumes that the process noise and measurement noise of the system follow a Dirichlet distribution, from which the Dirichlet posterior distribution can be calculated. An MCMC (Markov chain Monte Carlo) sampling algorithm is then used to draw candidate clusters, and the candidate cluster with the highest selection probability is put into the LDS model to train the model parameters. Finally, the preprocessed corpus is input into the trained language model, and the Kalman filter one-step update formula is used to compute an estimate of the underlying vector representation.
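As a rough illustration of what a Kalman filter one-step update looks like, the following minimal sketch handles the scalar case only. It assumes that LDS denotes a linear dynamical system with a state transition coefficient, an observation coefficient, and Gaussian noise variances; the names `kalman_step`, `A`, `C`, `Q` and `R` are hypothetical and do not come from the patent text. It also uses the textbook Gaussian filter, not the Dirichlet-based noise model described above.

```python
def kalman_step(x, P, y, A=1.0, C=1.0, Q=0.0, R=1.0):
    """One predict-and-update step of a scalar Kalman filter.

    x, P : previous state estimate and its variance
    y    : new observation
    A, C : assumed state transition and observation coefficients
    Q, R : assumed process and measurement noise variances
    """
    # Predict the state and its variance one step ahead.
    x_pred = A * x
    P_pred = A * P * A + Q

    # Kalman gain: how much weight the new observation receives.
    K = P_pred * C / (C * P_pred * C + R)

    # Correct the prediction with the observation residual.
    x_new = x_pred + K * (y - C * x_pred)
    P_new = (1.0 - K * C) * P_pred
    return x_new, P_new


# Illustrative run: three identical observations pull the estimate toward 1.0
# while the variance of the estimate shrinks.
x, P = 0.0, 1.0
for y in [1.0, 1.0, 1.0]:
    x, P = kalman_step(x, P, y)
    print(x, P)
```

In a word vector setting the state would be a vector and the coefficients would be matrices; the scalar form is used here only to keep the arithmetic visible.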

[0069] The technical solution of the present invention is described in detail below with reference to Figure 1.

[0070] First, the training corpus is preprocessed, including word segmentation and dictionary generation. This is a well-known processing for word vector learni...

Abstract

A Kalman filter word vector learning method based on the Dirichlet process. The method includes: preprocessing the training corpus; generating an LDS language model system and initializing the system parameters; assuming that the process noise satisfies a normal distribution; defining the cluster θ_t = (μ_t, Σ_t), where μ_t is the frequency of word t in the corpus; calculating the Dirichlet prior distribution of θ_t; calculating the posterior distribution through Kalman filter derivation and Gibbs sampling estimation; extracting candidate clusters with the MCMC sampling algorithm; calculating the selection probability of each candidate cluster and selecting the candidate with the highest probability as θ_t; calculating the minimum mean square error estimate of the cluster; substituting the result into the LDS language model; training the model with the EM algorithm until the model parameters are stable; and inputting the preprocessed corpus into the trained LDS language model and computing the implicit vector representation through the Kalman filter one-step update formula.
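The step of scoring candidate clusters and keeping the most probable one can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard Dirichlet process mixture weighting, in which an existing cluster is weighted by its member count and a new cluster by a concentration parameter, combined with a one-dimensional Gaussian likelihood. The patent's own selection formula is not given in this text, and the names `selection_probabilities`, `alpha`, `base_mean` and `base_var` are hypothetical. No MCMC sampling is performed here; the candidates are simply listed by hand.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def selection_probabilities(y, clusters, alpha, base_mean, base_var):
    """Normalized probability of assigning observation y to each candidate.

    clusters : list of (count, mean, var) for existing candidate clusters
    alpha    : assumed concentration parameter of the Dirichlet process
    The last entry of the result is the probability of opening a new cluster,
    scored against an assumed Gaussian base distribution.
    """
    n = sum(count for count, _, _ in clusters)
    weights = [
        count / (n + alpha) * gaussian_pdf(y, mean, var)
        for count, mean, var in clusters
    ]
    weights.append(alpha / (n + alpha) * gaussian_pdf(y, base_mean, base_var))
    total = sum(weights)
    return [w / total for w in weights]


# Two hand-picked candidate clusters and one observation close to the second.
clusters = [(5, 0.0, 1.0), (3, 4.0, 1.0)]
probs = selection_probabilities(3.8, clusters, alpha=1.0, base_mean=0.0, base_var=10.0)

# Keep the candidate with the highest selection probability.
best = max(range(len(probs)), key=probs.__getitem__)
print(probs, best)
```

Here the second cluster wins even though it has fewer members, because the observation lies much closer to its mean than to the first cluster's mean.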

Description

Technical field

[0001] The invention relates to the field of natural language processing, and in particular to a Kalman filter word vector learning method based on the Dirichlet process.

Background

[0002] In natural language processing (NLP) tasks, natural language must usually be converted into a mathematical form before it can be handed to machine learning algorithms, because machines only recognize mathematical symbols. Vectors are abstractions that people draw from the natural world and hand to machines for processing. Word vectors are a way to represent the words of a language mathematically.

[0003] One of the simplest word vector representations is the One-hot Representation, which uses a very long vector to represent a word. The length of the vector is the size of the dictionary. Exactly one component of the vector is 1 and all the others are 0; the position of the 1 corresponds to the position of the word in the dictionary. However, this kind of word v...
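The one-hot representation described in [0003] can be sketched in a few lines. The toy corpus and the helper names `build_dictionary`, `one_hot` and `dot` are illustrative assumptions, not part of the patent. The sketch shows both properties mentioned in the text: the vector length equals the dictionary size, and any two different words have a dot product of zero, so the representation carries no notion of similarity.

```python
def build_dictionary(tokens):
    """Map each distinct word to an index, in order of first appearance."""
    index = {}
    for tok in tokens:
        if tok not in index:
            index[tok] = len(index)
    return index


def one_hot(word, index):
    """Vector of dictionary length with a single 1 at the word's position."""
    vec = [0] * len(index)
    vec[index[word]] = 1
    return vec


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


corpus = "the cat sat on the mat".split()
index = build_dictionary(corpus)

cat = one_hot("cat", index)
mat = one_hot("mat", index)

print(len(cat))       # equals the number of distinct words in the corpus
print(dot(cat, mat))  # different words are always orthogonal
print(dot(cat, cat))  # a word only matches itself
```

With a realistic dictionary of tens of thousands of words, each vector would be that long while still holding a single nonzero entry, which is the dimensionality problem the background section refers to.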

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/247, G06K9/62, G06F17/16
CPC: G06F17/16, G06F40/247, G06F18/2321
Inventor: 王磊, 翟荣安, 刘晶晶, 王毓, 王飞, 于振中, 李文兴
Owner: HRG INT INST FOR RES & INNOVATION