
A Text Representation Method Using Local Embedded Topic Modeling

A text representation and topic modeling technique, applied in the field of computer science and information retrieval. It addresses the problems that existing models are not mappings, provide no explicit mapping function, and cannot transfer knowledge from known data to unknown data, and it achieves wide practicability with stable, coherent performance.

Inactive Publication Date: 2020-12-08
BEIJING INSTITUTE OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

[0029] However, none of these three models provides an explicit mapping function that effectively transfers knowledge from known data to unknown data; they offer neither a probability density nor a mapping for out-of-sample points.



Examples


Embodiment Construction

[0074] In order to better illustrate the purpose and advantages of the present invention, the implementation of the method of the present invention will be described in further detail below in conjunction with the accompanying drawings and examples.

[0075] In the experiment, two widely used English text classification corpora (20newsgroup and RCV1) were used to test the invention. 20newsgroup consists of 20 related newsgroups and contains a collection of 20,000 texts. RCV1 is a large-scale multi-class dataset, an archive of more than 800,000 manually categorized newswire stories made available by Reuters. From RCV1 we extracted four categories of texts: M11 (equity investment market), M12 (bond market), M131 (international banking market) and M132 (foreign exchange market). Table 1 shows some statistics about these datasets.

[0076] Table 1: Statistics of the two corpora. D is the total number of texts, W is the vocabulary size, is the ...


Abstract

The invention relates to a text representation method based on local embedding topic modeling and belongs to the technical field of computer science and information retrieval. The method selects a neighbor set for each text according to Euclidean distance in the word space, constructs a locally weighted regularization term, and adds this term to a conventional autoencoder network for training to obtain a model. An explicit mapping function is then constructed, and finally the encoder network is used to extract the vector representation of an out-of-sample document. Introducing the locally weighted regularization term effectively preserves the intrinsic geometric structure of the space in which the texts lie. In addition, the method generates an explicit embedding mapping between the observation space and the low-dimensional manifold, giving a simple, easily implemented way to extract embedding vector representations of out-of-sample documents.
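The steps in the abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the patented formulation: the heat-kernel weight, the single sigmoid encoder layer, and the names `neighbor_sets`, `heat_kernel_weight`, `encode`, and `local_regularizer` are all hypothetical choices, since the excerpt does not give the weighting formula or the network architecture. Training itself is omitted; the sketch only shows how the penalty that would be added to the autoencoder loss could be computed, and how the encoder serves as an explicit mapping for a new document.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def neighbor_sets(docs, k):
    """For each document vector, return the indices of its k nearest
    neighbors by Euclidean distance in the word space."""
    neighbors = []
    for i, d in enumerate(docs):
        others = [j for j in range(len(docs)) if j != i]
        others.sort(key=lambda j: euclidean(d, docs[j]))
        neighbors.append(others[:k])
    return neighbors

def heat_kernel_weight(a, b, sigma=1.0):
    """Assumed local weight: closer documents get weights nearer to 1."""
    return math.exp(-euclidean(a, b) ** 2 / (2 * sigma ** 2))

def encode(x, W, b):
    """Assumed encoder: one sigmoid layer. Because it is an explicit
    function, it can be applied to documents not seen in training."""
    return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + bi)))
            for row, bi in zip(W, b)]

def local_regularizer(docs, codes, neighbors, sigma=1.0):
    """Assumed penalty: weighted squared distance between the codes of
    each document and its word-space neighbors."""
    total = 0.0
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            w = heat_kernel_weight(docs[i], docs[j], sigma)
            total += w * euclidean(codes[i], codes[j]) ** 2
    return total

# Toy word-space vectors: documents 0 and 1 are close, as are 2 and 3.
docs = [[1.0, 0.0, 0.0], [1.0, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 1.0]]
neighbors = neighbor_sets(docs, k=1)

# Illustrative (untrained) encoder parameters: 3 words -> 2 dimensions.
W = [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.2]]
b = [0.0, 0.1]
codes = [encode(d, W, b) for d in docs]

# Penalty that would be added to the reconstruction loss during training.
penalty = local_regularizer(docs, codes, neighbors)

# Out-of-sample document: its representation comes directly from the encoder.
new_code = encode([0.9, 0.0, 0.1], W, b)
```

The penalty is small when documents that are neighbors in the word space also receive nearby codes, which is one common way to encourage an embedding to preserve local geometry.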

Description

Technical field

[0001] The invention relates to a text representation method using local embedding topic modeling, which belongs to the technical field of computer science and information retrieval.

Background

[0002] In recent years, the rapid development of the Internet has led to a rapid increase in the proportion of text information, which places higher demands on information retrieval technology. As a key technology of information retrieval, text representation is of great significance for improving the effective acquisition of information. Text representation transforms an unstructured document collection containing a large number of characters into a semi-structured or structured data structure, so that computers can conveniently apply clustering and classification techniques for information retrieval. The classic text representation method is the vector space model (VSM), which takes all the words that make up the document collection as fe...
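As a rough illustration of the vector space model mentioned in the background, the sketch below maps each document to a vector with one entry per vocabulary word. It assumes raw term counts as the entry values and whitespace tokenization, neither of which the truncated text confirms; the names `build_vocabulary` and `to_vector` are hypothetical.

```python
from collections import Counter

def build_vocabulary(documents):
    """Collect every distinct word in the collection, in sorted order."""
    return sorted({word for doc in documents for word in doc.lower().split()})

def to_vector(document, vocabulary):
    """Represent a document as term counts over the shared vocabulary."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]

corpus = ["stocks rise as markets open", "bond markets fall"]
vocab = build_vocabulary(corpus)
vectors = [to_vector(doc, vocab) for doc in corpus]
```

Each vector has as many dimensions as the vocabulary has words, so such representations become very high-dimensional and sparse on real corpora.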


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/33, G06F16/35
CPC: G06F16/3334, G06F16/35
Inventor: 罗森林, 刘望桐, 潘丽敏, 毛炎颖, 魏超
Owner: BEIJING INSTITUTE OF TECHNOLOGY