The invention provides a short text topic identification method and system, and relates to the technical field of data processing. The method comprises the following steps: S1, obtaining a first corpus set and a second corpus set, wherein the first corpus set is a short text data set to be processed and the second corpus set is an auxiliary corpus set; S2, obtaining latent feature vectors of words based on the second corpus set, and constructing a Dirichlet process mixture model based on the first corpus set; S3, constructing a nonparametric topic model based on the latent feature vectors and the Dirichlet process mixture model; S4, performing parameter inference on the topic posterior distribution of the nonparametric topic model; S5, inferring the number of topics in the first corpus set based on the inferred parameters, and simultaneously obtaining the document-topic distribution and the topic-word distribution of the first corpus set. By constructing the Dirichlet process mixture model and introducing latent feature vector representations of words, the method effectively alleviates the sparsity problem of short texts and improves the accuracy of short text topic identification.
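
By way of illustration only, the following minimal sketch shows how a Dirichlet process mixture model can infer the number of topics from the data instead of fixing it in advance. It is not the claimed method: it assumes one topic per short text, uses collapsed Gibbs sampling with a Chinese restaurant process prior, and omits the latent feature vector component of steps S2 and S3. The function names `dpmm_gibbs`, `log_doc_likelihood` and `topic_word_dist`, the hyperparameters `alpha` and `beta`, and the toy corpus are all hypothetical.

```python
import math
import random
from collections import Counter


def log_doc_likelihood(doc_counts, doc_len, topic_words, topic_len, vocab_size, beta):
    """Log predictive probability of a document under one topic
    (Dirichlet-multinomial with symmetric prior beta)."""
    logp = 0.0
    for word, count in doc_counts.items():
        for j in range(count):
            logp += math.log(topic_words.get(word, 0) + beta + j)
    for i in range(doc_len):
        logp -= math.log(topic_len + vocab_size * beta + i)
    return logp


def dpmm_gibbs(docs, alpha=1.0, beta=0.1, iters=50, seed=0):
    """Collapsed Gibbs sampling for a Dirichlet process mixture of multinomials.
    Assumption: each short text belongs to exactly one topic."""
    rng = random.Random(seed)
    vocab_size = len({w for d in docs for w in d})
    doc_counts = [Counter(d) for d in docs]
    assign = [None] * len(docs)
    topics = {}  # topic id -> {"n_docs": int, "words": Counter, "len": int}
    next_id = 0
    for _ in range(iters):
        for i, doc in enumerate(docs):
            # Remove the document from its current topic, dropping empty topics.
            old = assign[i]
            if old is not None:
                t = topics[old]
                t["n_docs"] -= 1
                t["words"].subtract(doc_counts[i])
                t["len"] -= len(doc)
                if t["n_docs"] == 0:
                    del topics[old]
            # Existing topics are weighted by their size, a new topic by alpha.
            ids = list(topics)
            logw = [
                math.log(topics[k]["n_docs"])
                + log_doc_likelihood(doc_counts[i], len(doc), topics[k]["words"],
                                     topics[k]["len"], vocab_size, beta)
                for k in ids
            ]
            logw.append(math.log(alpha)
                        + log_doc_likelihood(doc_counts[i], len(doc), {}, 0,
                                             vocab_size, beta))
            top = max(logw)
            weights = [math.exp(x - top) for x in logw]
            choice = rng.choices(range(len(weights)), weights=weights)[0]
            if choice == len(ids):  # open a new topic
                new = next_id
                next_id += 1
                topics[new] = {"n_docs": 0, "words": Counter(), "len": 0}
            else:
                new = ids[choice]
            topics[new]["n_docs"] += 1
            topics[new]["words"].update(doc_counts[i])
            topics[new]["len"] += len(doc)
            assign[i] = new
    return assign, topics


def topic_word_dist(topic, vocab, beta=0.1):
    """Smoothed topic-word distribution estimated from the final counts."""
    denom = topic["len"] + len(vocab) * beta
    return {w: (topic["words"].get(w, 0) + beta) / denom for w in vocab}


if __name__ == "__main__":
    toy_docs = [
        ["apple", "banana", "fruit"], ["banana", "fruit", "juice"],
        ["apple", "juice"], ["goal", "match", "team"],
        ["team", "coach", "match"], ["goal", "coach"],
    ]
    assignments, found = dpmm_gibbs(toy_docs, iters=30, seed=1)
    print("inferred number of topics:", len(found))
    print("document-topic assignments:", assignments)
```

In this sketch the number of topics is simply the number of non-empty clusters after sampling, the document-topic assignment is hard (one topic per text), and the topic-word distribution is read off the smoothed word counts of each cluster. A model of the kind described above would additionally combine these counts with word vectors learned on the auxiliary corpus.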