The present invention discloses a word vector generation method based on the Gaussian distribution. The method comprises: first, preprocessing the corpus; second, dividing the corpus into text segments using punctuation; then, combining local and global information to infer word meanings and determining the mapping relationship between each word and its meanings; and finally, obtaining the word vectors by optimizing an objective function. The innovations and beneficial effects of the technical scheme of the present invention are as follows: 1. words are represented by Gaussian distributions, which avoids the point-estimate nature of traditional word vectors and allows the word vectors to carry richer information such as probability mass, meaning connotation, and inclusion relationships; 2. each word is represented by multiple Gaussian distributions, so that the linguistic characteristics of words in natural language, in particular the fact that a word can have several meanings, can be captured; and 3. the similarity between Gaussian distributions is defined based on the Hellinger distance, and by combining parameter updating with word meaning discrimination, the number of meanings of each word can be inferred adaptively, which solves the problem in the prior art that the model assumes a fixed number of word meanings.
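The abstract's punctuation-based text division step could look roughly like the minimal sketch below. It assumes that preprocessing has already produced whitespace-separated tokens (for Chinese text this would mean word segmentation has been done) and that a fixed set of ASCII and full-width punctuation marks serves as segment boundaries; the abstract names neither the delimiter set nor the function `split_by_punctuation`, which is hypothetical.

```python
import re

# Assumed delimiter set (hypothetical): common ASCII marks plus full-width
# CJK marks (。，；：！？、). The abstract does not list which marks are used.
_PUNCT = re.compile(r"[.,;:!?\u3002\uff0c\uff1b\uff1a\uff01\uff1f\u3001]+")


def split_by_punctuation(text):
    """Split preprocessed text into punctuation-free segments of tokens.

    Assumes tokens are already separated by whitespace.
    """
    segments = []
    for piece in _PUNCT.split(text):
        tokens = piece.split()
        if tokens:  # skip empty pieces produced by trailing or adjacent marks
            segments.append(tokens)
    return segments


print(split_by_punctuation("the bank raised rates, then the river bank flooded."))
```

Each segment then acts as a bounded context for the later sense-inference step, so a word's context never crosses a punctuation boundary.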
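To make point 3 concrete, the sketch below computes the Hellinger distance between two Gaussians with diagonal covariance in closed form, then uses a similarity of 1 minus that distance to decide whether a context belongs to an existing meaning of a word or starts a new one. Several assumptions are loud here: diagonal covariances, the similarity definition `1 - H`, the fixed `threshold`, and the idea that the context is summarized as one Gaussian (`ctx_mu`, `ctx_var`). The abstract does not say how local and global information are combined into that context, and it does not give the objective function. The names `hellinger` and `assign_sense` are hypothetical, and the parameter updates are deliberately left out.

```python
import math


def hellinger(mu1, var1, mu2, var2):
    """Hellinger distance between N(mu1, diag(var1)) and N(mu2, diag(var2)).

    For diagonal covariances the Bhattacharyya coefficient factorizes per dimension:
    BC_d = sqrt(2*s1*s2 / (v1 + v2)) * exp(-(m1 - m2)^2 / (4 * (v1 + v2))),
    with s = sqrt(v); the distance is then H = sqrt(1 - prod_d BC_d).
    """
    bc = 1.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        s = v1 + v2
        bc *= math.sqrt(2.0 * math.sqrt(v1 * v2) / s) * math.exp(-((m1 - m2) ** 2) / (4.0 * s))
    return math.sqrt(max(0.0, 1.0 - bc))  # clamp tiny negative rounding error


def assign_sense(senses, ctx_mu, ctx_var, threshold=0.5):
    """Pick the most similar existing meaning of a word, or add a new one.

    senses: list of (mu, var) pairs for one word, modified in place.
    Returns the index of the chosen (or newly created) meaning.
    Similarity is assumed to be 1 - Hellinger distance, in [0, 1].
    """
    if senses:
        sims = [1.0 - hellinger(mu, var, ctx_mu, ctx_var) for mu, var in senses]
        best = max(range(len(sims)), key=sims.__getitem__)
        if sims[best] >= threshold:
            return best
    # No existing meaning is similar enough: the number of meanings grows adaptively.
    senses.append((list(ctx_mu), list(ctx_var)))
    return len(senses) - 1


senses = []
print(assign_sense(senses, [0.0, 0.0], [1.0, 1.0]))  # first context creates meaning 0
print(assign_sense(senses, [0.1, 0.0], [1.0, 1.0]))  # close context reuses meaning 0
print(assign_sense(senses, [5.0, 5.0], [1.0, 1.0]))  # distant context creates meaning 1
```

Because the number of meanings is never fixed in advance, a word seen only in similar contexts keeps one Gaussian, while a word seen in clearly different contexts accumulates several. In the method the abstract describes, the chosen meaning's parameters would then be updated by optimizing the objective function, which this sketch does not attempt.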