Topic analysis method and system based on kernel principal component analysis and LDA
A topic analysis method based on kernel principal component analysis and LDA, applied in the field of text mining. It addresses the lack of algorithms that analyze the evolution of topic trends from a global perspective, with the aim of making the analysis more comprehensive and accurate, reducing space complexity, and improving quality.
Examples
Embodiment 1
[0070] As shown in Figure 1, this embodiment provides a topic analysis method based on kernel principal component analysis and LDA, comprising the following steps:
[0071] 1) Obtain the document corpus D and preprocess each article in it, including deleting punctuation marks, deleting English characters, performing word segmentation, and removing stop words.
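The preprocessing in step 1) might be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the tiny stop-word set, and the whitespace-based segmenter are all assumptions made for the example. A real pipeline for Chinese text would substitute a proper word segmenter and a full stop-word list.

```python
import re
import unicodedata

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"的", "了", "是"}


def whitespace_segment(text):
    """Placeholder segmenter (assumption): split on whitespace."""
    return text.split()


def preprocess(article, segment=whitespace_segment, stop_words=STOP_WORDS):
    """Apply the four preprocessing operations to one article."""
    # Delete English characters (runs of ASCII letters).
    text = re.sub(r"[A-Za-z]+", " ", article)
    # Delete punctuation: any character whose Unicode category starts with "P".
    text = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch for ch in text
    )
    # Word segmentation, delegated to the supplied segmenter.
    tokens = segment(text)
    # Remove stop words.
    return [t for t in tokens if t not in stop_words]
```

The segmenter is passed in as a parameter so that the rest of the steps stay unchanged when a language-specific segmenter is plugged in.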
[0072] 2) According to the preprocessed document corpus D, establish a KPCA-LDA topic model, specifically:
[0073] 2.1) Extract the vocabulary of each article in the preprocessed document corpus D:
[0074] By scanning the document corpus D, the distinct words in the articles are added to the vocabulary in turn, giving the vocabulary of the article collection $w_L = (w_1, \ldots, w_j, \ldots, w_W)$, where $W$ is the vocabulary length and $w_j$ is the $j$-th word in the vocabulary $w_L$.
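The vocabulary scan described above can be sketched in a few lines. The function name `build_vocabulary` is hypothetical, and the corpus is assumed to be already preprocessed into one list of tokens per article.

```python
def build_vocabulary(corpus):
    """Scan the corpus and collect each distinct word once, in first-seen order.

    corpus: list of articles, each a list of word tokens (assumed format).
    Returns the vocabulary as a list; its length corresponds to W.
    """
    vocabulary = []
    seen = set()  # fast membership test for words already added
    for article in corpus:
        for word in article:
            if word not in seen:
                seen.add(word)
                vocabulary.append(word)
    return vocabulary
```

For example, `build_vocabulary([["topic", "model", "topic"], ["model", "kernel"]])` returns `["topic", "model", "kernel"]`, so W is 3.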
[0075] 2.2) Generate the document-term matrix of the document corpus D:
[0076] 2.2.1) Suppose there are M ...
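Because the source text for step 2.2 is cut off, the following is only one plausible reading of a document-term matrix: a raw count matrix with one row per article and one column per vocabulary word. The function name and the choice of raw counts (rather than, say, TF-IDF weights) are assumptions for illustration.

```python
def document_term_matrix(corpus, vocabulary):
    """Build a count matrix: entry [i][j] is how often word j occurs in article i.

    corpus: list of articles, each a list of word tokens (assumed format).
    vocabulary: list of distinct words defining the column order.
    """
    index = {word: j for j, word in enumerate(vocabulary)}
    matrix = [[0] * len(vocabulary) for _ in corpus]
    for i, article in enumerate(corpus):
        for word in article:
            j = index.get(word)
            if j is not None:  # ignore words outside the vocabulary
                matrix[i][j] += 1
    return matrix
```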
Embodiment 2
[0139] This embodiment provides a topic analysis system based on kernel principal component analysis and LDA, including:
[0140] The data acquisition module is used to acquire the document corpus and preprocess each article in the document corpus.
[0141] The model construction module is used to establish a KPCA-LDA topic model according to the preprocessed document corpus.
[0142] The text representation determining module applies the established KPCA-LDA topic model to perform topic analysis on the articles in the document corpus and determines the text representation of those articles.
[0143] The topic generation module is used to train and estimate the parameters of the KPCA-LDA topic model by using the Gibbs sampling algorithm, solve the parameters of the KPCA-LDA topic model, and generate several topics represented by words.
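To illustrate the kind of Gibbs sampling the topic generation module refers to, here is a minimal collapsed Gibbs sampler for standard LDA. It is a sketch of plain LDA only and does not include the KPCA component of the KPCA-LDA model, whose details are not given in this excerpt. The function names, the hyperparameter defaults (`alpha`, `beta`), and the input format (word ids per document) are assumptions.

```python
import random


def gibbs_lda(docs, vocab_size, num_topics, alpha=0.1, beta=0.01,
              iterations=200, seed=0):
    """Collapsed Gibbs sampling for standard LDA.

    docs: list of documents, each a list of integer word ids.
    Returns (n_kw, n_dk): topic-word and document-topic count tables.
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]               # doc-topic counts
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                                # tokens per topic
    z = []                                                # topic of each token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (n_dk[d][t] + alpha)
                    * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Sample a new topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_kw, n_dk


def top_words(n_kw, vocabulary, n=3):
    """Represent each topic by its n most frequently assigned words."""
    return [
        [vocabulary[j] for j in sorted(range(len(row)), key=lambda j: -row[j])[:n]]
        for row in n_kw
    ]
```

The count tables returned by `gibbs_lda` can be normalized (with the same `alpha` and `beta` smoothing) to estimate the document-topic and topic-word distributions, and `top_words` gives the "topics represented by words" view.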
[0144] In a preferred embodiment, the model construction module includes:
[0145] A vocabulary extraction unit is u...
Embodiment 3
[0149] This embodiment provides a processing device corresponding to the topic analysis method based on kernel principal component analysis and LDA provided in Embodiment 1. The processing device may be a client-side device, such as a mobile phone, a notebook computer, a tablet computer, or a desktop computer, that carries out the method of Embodiment 1.
[0150] The processing device includes a processor, a memory, a communication interface and a bus; the processor, the memory and the communication interface are connected through the bus so that they can communicate with one another. The memory stores a computer program that can run on the processor, and when the processor runs this program it executes the topic analysis method based on kernel principal component analysis and LDA provided in Embodiment 1.
[0151] In some implementations, the memory may be a high-speed random access memory (RAM: Random Access Memory), and may also include a non-volatile memory (non...