Regular auto-encoding text embedded expression method for local topic probability generation

An embedded representation and self-encoding technology applied in the fields of natural language processing and machine learning. It addresses the difficulty of effectively estimating the semantic features of out-of-sample text and the inability to effectively maintain smoothness, and achieves the effect of maintaining smoothness.

Status: Inactive; Publication Date: 2018-08-31
BEIJING INSTITUTE OF TECHNOLOGY
2 Cites · 5 Cited by

AI Technical Summary

Problems solved by technology

[0003] The basic problem that existing manifold-learning-based text embedding representation methods need to solve is how to overcome their high sensitivity to neighborhood judgment and effectively maintain the smoothness of the local neighborhood text topic probability generation structure.
[0040] In summary, existing manifold-learning-based text embedding representation methods mainly establish an affine mapping by maintaining the smoothness of the geometric structure of local neighboring texts. This approach is very sensitive to neighborhood judgment and cannot effectively maintain the smoothness of the local neighboring text topic probability generation structure, which makes it difficult to effectively estimate the semantic features of out-of-sample texts.

Examples

Experimental program
Comparison scheme
Effect test

Embodiment Construction

[0063] To better illustrate the purpose and advantages of the present invention, the implementation of the method is described in further detail below with reference to examples.

[0064] The 20newsgroups, Amazon reviews and RCV1 public data sets are selected. 20newsgroups contains 20 news discussion groups on different topics. Amazon reviews consists of more than 1.4 million reviews of products on the Amazon website, from which reviews of 10 product categories are selected. RCV1 contains over 800,000 manually categorized newswire stories, from which texts on 3 subtopics are selected.

[0065] To verify that the parametric affine mapping established by the method of the present invention improves the smoothness of the out-of-sample text embedding representation vectors and improves text clustering and classification, the K-means algorithm is used for text clustering experiments and the 1-NN algorithm is used for Te...
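The evaluation named in [0065] (K-means clustering and 1-NN classification of the embedding vectors) can be sketched in NumPy as follows. This is an illustrative assumption, not the authors' protocol: the rest of the paragraph is truncated, so the Euclidean distance, the random-distinct-rows K-means initialization, and the reported score (1-NN accuracy) are choices made here, and the function names are hypothetical. Clustering quality metrics are not shown.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means on embedding vectors; returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # k distinct rows as initial centers
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each vector to its nearest center (squared Euclidean distance).
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster becomes empty.
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def one_nn_accuracy(X_train, y_train, X_test, y_test):
    """Accuracy of a 1-nearest-neighbor classifier with Euclidean distance."""
    dist = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    pred = y_train[dist.argmin(axis=1)]
    return float(np.mean(pred == y_test))
```

In the patent's setting, `X` would be the out-of-sample embedding vectors produced by the encoder; the same two functions can be run on a baseline embedding for comparison.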

Abstract

The invention relates to a regularized auto-encoding text embedding representation method for local topic probability generation, and belongs to the fields of natural language processing and machine learning. The method first constructs a neighbor graph of the text set: it calculates the similarity weight of word pairs between texts, searches for the maximum weighted matching distance of each text pair, calculates the similarity of the averaged maximum weighted matching distance (NMD), selects the k nearest neighbors of each text according to the NMD result, and builds the neighbor graph with the NMD result as the edge weight. Next, it determines a subspace through a transductive multi-agent random walk over the neighbor graph. Finally, it generates pseudo-texts with an LDA (Latent Dirichlet Allocation) model of the subspace, uses the pseudo-texts as a regularization term, and takes the pseudo-texts and the real text as reconstruction targets of an auto-encoding network, guiding the encoder network to resist changes in the local neighbor text topic probability generation structure and thereby construct a smooth affine mapping. The method effectively maintains the smoothness of the local neighbor text topic probability generation structure, thereby constructing a smooth affine mapping function, enhancing the intra-class compactness and inter-class separation of out-of-sample text embedding representation vectors, and improving applications such as text classification and clustering.
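The first two stages in the abstract (word-pair similarity weights, maximum weighted matching between two texts, an averaged matching score, k-nearest-neighbor graph construction, and the random-walk subspace) can be illustrated with the minimal sketch below. It is not the patented procedure, and every name is hypothetical. Assumptions: word-pair weights are cosine similarities of given word vectors; the matching is found by brute force, which only works for very short texts (a Hungarian-algorithm solver such as SciPy's `linear_sum_assignment` would be used in practice); the matching weight is averaged over the matched pairs; the graph is symmetrized; and the "multi-agent random walk" is approximated by several independent weighted walkers whose most-visited nodes form the subspace.

```python
import itertools
import math
import random
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two word vectors (assumed word-pair weight)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def averaged_matching_similarity(text_a, text_b, vectors):
    """Hypothetical NMD-style score: mean weight of a maximum-weight word matching."""
    a, b = (text_a, text_b) if len(text_a) <= len(text_b) else (text_b, text_a)
    if not a:
        return 0.0
    w = [[cosine(vectors[x], vectors[y]) for y in b] for x in a]
    # Brute force over injective assignments of the shorter text's words (exponential cost).
    best = max(
        sum(w[i][j] for i, j in enumerate(perm))
        for perm in itertools.permutations(range(len(b)), len(a))
    )
    return best / len(a)


def build_knn_graph(texts, k, vectors):
    """k-nearest-neighbor graph with the matching score as edge weight (symmetrized)."""
    graph = {i: {} for i in range(len(texts))}
    for i, t in enumerate(texts):
        scores = sorted(
            ((averaged_matching_similarity(t, u, vectors), j)
             for j, u in enumerate(texts) if j != i),
            reverse=True,
        )
        for s, j in scores[:k]:
            graph[i][j] = graph[j][i] = s
    return graph


def random_walk_subspace(graph, seed, n_agents=5, walk_length=4, size=3, rng_seed=0):
    """Simplified stand-in for the multi-agent random walk: keep the most-visited nodes."""
    rng = random.Random(rng_seed)
    visits = Counter()
    for _ in range(n_agents):
        node = seed
        for _ in range(walk_length):
            nbrs = list(graph[node])
            if not nbrs:
                break
            # Step to a neighbor with probability proportional to its (non-negative) edge weight.
            weights = [max(graph[node][v], 0.0) for v in nbrs]
            node = rng.choices(nbrs, weights=weights if sum(weights) > 0 else None)[0]
            visits[node] += 1
    visits.pop(seed, None)
    return [seed] + [v for v, _ in visits.most_common(size - 1)]
```

On a toy vocabulary where "cat"/"dog" and "car"/"bus" have similar vectors, two texts containing the same words score 1.0, and a walk started from an animal text stays among the animal texts.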

Description

Technical field
[0001] The invention relates to a regularized self-encoding text embedding representation method for local topic probability generation, and belongs to the fields of natural language processing and machine learning.
Background art
[0002] To make text semantic features easier to estimate and use, text embedding representation methods construct text embedding representation vectors through a specific affine mapping; such methods are widely used in information processing systems involving text clustering and information retrieval. To maintain the smoothness of the subspace text topic probability generation structure, construct a smooth affine mapping function, enhance the local smoothness of text embedding representation vectors, and improve text clustering and classification, the present invention provides an autoencoder network text embedding representation method fusing local neighbor text topic probabilistic gene...
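The final stage described in the abstract (sampling pseudo-texts from an LDA model of the subspace and using them, alongside the real text, as reconstruction targets that regularize an autoencoder) might look like the sketch below. Several details are assumptions because the excerpt does not give them: the topic-word matrix `phi` and prior `alpha` are taken as already fitted on the subspace texts; texts are bag-of-words vectors normalized to term proportions; the encoder is a single tanh layer over the affine map `W v + b`; and the regularizer asks the reconstruction of the real text `x` to also match each pseudo-text, weighted by a hypothetical `lam`. The code only evaluates the objective; training `W`, `b`, `U`, `c` would require gradient descent, for example with an autodiff library.

```python
import numpy as np


def sample_pseudo_texts(phi, alpha, n_texts, n_words, rng):
    """Sample bag-of-words pseudo-texts from LDA's generative process.

    phi:   (K, V) topic-word distributions, assumed already fitted on the subspace texts.
    alpha: (K,) Dirichlet prior over per-document topic proportions.
    """
    n_topics, vocab = phi.shape
    docs = np.zeros((n_texts, vocab))
    for d in range(n_texts):
        theta = rng.dirichlet(alpha)                    # document-topic proportions
        for z in rng.choice(n_topics, size=n_words, p=theta):
            docs[d, rng.choice(vocab, p=phi[z])] += 1   # draw one word from topic z
    return docs


def _proportions(v):
    """Normalize a count vector to term proportions (assumed input representation)."""
    total = v.sum()
    return v / total if total > 0 else v


def regularized_loss(x, pseudo, W, b, U, c, lam):
    """Hypothetical objective: reconstruct the real text and its subspace pseudo-texts.

    Encoder f(v) = tanh(W v + b); decoder g(h) = U h + c.
    """
    x = _proportions(x)
    out = U @ np.tanh(W @ x + b) + c
    recon = np.sum((x - out) ** 2)                      # real-text reconstruction error
    # Pseudo-text term: the same reconstruction should also explain the local topic samples.
    reg = np.mean([np.sum((_proportions(p) - out) ** 2) for p in pseudo])
    return recon + lam * reg
```

With `lam = 0` the objective reduces to a plain autoencoder reconstruction; a larger `lam` pushes the code `f(x)` to also account for the topic-level variation of the text's local neighborhood.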

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27; G06F17/30
CPC: G06F16/35; G06F40/289; G06F40/30
Inventors: 潘丽敏 (Pan Limin), 董思佳 (Dong Sijia), 罗森林 (Luo Senlin), 魏超 (Wei Chao)
Owner: BEIJING INSTITUTE OF TECHNOLOGY