A biomedicine technology subject mining method based on LDA

This is a topic-mining technology for biomedicine, applied in the field of information retrieval. It addresses problems such as low efficiency and inflexibility, divisions of technical topics that fail to meet analysis requirements, and topic analysis that cannot go deep into the content of patent texts, and it achieves the effect of reducing sparsity.

Status: Inactive
Publication Date: 2019-03-08
KUNMING UNIV OF SCI & TECH
Cites: 3; Cited by: 9

AI Technical Summary

Problems solved by technology

Analyzing technical subjects through IPC classification codes cannot go deep into the content of patent texts, and dividing technical subjects by IPC classification codes often fails to meet analysis requirements.
Although manual indexing of patent documents analyzes technical topics with high accuracy, it requires analysts with a strong background in the technical field and is inefficient and inflexible [59]; faced with massive volumes of patent documents, it struggles to keep up.

Embodiment Construction

[0036] The present invention will be further described below in combination with the accompanying drawings and specific embodiments.

[0037] A method for mining biomedical technology topics based on LDA comprises the following specific steps:

[0038] Step1. Construct LDA topic model;

[0039] Step1.1. Construct the document layer, topic layer, and vocabulary layer, where each document is a multinomial probability distribution over topics and each topic is a multinomial probability distribution over the vocabulary;
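To make the three layers concrete, here is a minimal Python sketch of the standard LDA generative process, in which each topic's word distribution and each document's topic distribution are drawn from symmetric Dirichlet priors. The source does not state its priors; the function names, toy sizes, and default values `alpha=0.5` and `beta=0.1` are illustrative assumptions, not the patent's settings.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalising Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]


def generate_corpus(num_docs, doc_len, num_topics, vocab_size,
                    alpha=0.5, beta=0.1, seed=0):
    """Sample a toy corpus from the standard LDA generative process.

    Hypothetical helper; alpha and beta are assumed symmetric priors for the
    document-topic and topic-word distributions respectively.
    """
    rng = random.Random(seed)
    # Topic layer: every topic k is a distribution phi[k] over the vocabulary.
    phi = [sample_dirichlet([beta] * vocab_size, rng) for _ in range(num_topics)]
    docs, thetas = [], []
    for _ in range(num_docs):
        # Document layer: every document is a distribution theta over topics.
        theta = sample_dirichlet([alpha] * num_topics, rng)
        words = []
        for _ in range(doc_len):
            # Vocabulary layer: pick a topic for the token, then a word from that topic.
            z = rng.choices(range(num_topics), weights=theta)[0]
            words.append(rng.choices(range(vocab_size), weights=phi[z])[0])
        docs.append(words)
        thetas.append(theta)
    return docs, thetas, phi


docs, thetas, phi = generate_corpus(num_docs=4, doc_len=10, num_topics=3, vocab_size=8)
print(docs[0])  # word ids of the first synthetic document
```

Sampling a synthetic corpus this way is mainly useful for checking that an estimation procedure can recover known θ and φ.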

[0040] Step2. LDA parameter estimation;

[0041] Step2.1. Use Gibbs sampling to solve for the intermediate parameters of the LDA topic model;
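The source names Gibbs sampling but gives no update rule. The sketch below assumes the widely used collapsed Gibbs sampler, in which θ and φ are integrated out and only the topic assignment of each token is resampled from its full conditional, proportional to (n_dk + α)(n_kw + β) / (n_k + Vβ). The function `gibbs_lda`, its defaults, and the toy corpus are hypothetical; the count tables it returns play the role of the intermediate parameters.

```python
import random


def gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
              iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA (hypothetical helper).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns token topic assignments z and the count tables
    n_dk (document-topic), n_kw (topic-word) and n_k (tokens per topic).
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]
    n_kw = [[0] * vocab_size for _ in range(num_topics)]
    n_k = [0] * num_topics
    z = []
    # Initialise every token with a random topic and record it in the counts.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return z, n_dk, n_kw, n_k


toy_docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 3]]
z, n_dk, n_kw, n_k = gibbs_lda(toy_docs, num_topics=2, vocab_size=6, iterations=100)
print(n_dk)  # tokens assigned to each topic, per document
```

Working on counts rather than on θ and φ directly is what makes the sampler "collapsed": each update only needs three integer tables.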

[0042] Step2.2. Parameter estimation of the distributions;
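After Gibbs sampling, the doc-Topic distribution θ and the word-Topic distribution φ are commonly estimated from the final counts as θ_dk = (n_dk + α) / (n_d + Kα) and φ_kw = (n_kw + β) / (n_k + Vβ), where K is the number of topics and V the vocabulary size. The source does not spell out its estimator, so this sketch assumes these standard formulas with symmetric priors; `estimate_theta` and `estimate_phi` are hypothetical names.

```python
def estimate_theta(n_dk, alpha):
    """Doc-Topic matrix: theta[d][k] = (n_dk + alpha) / (n_d + K * alpha)."""
    num_topics = len(n_dk[0])
    theta = []
    for row in n_dk:
        denom = sum(row) + num_topics * alpha  # n_d + K * alpha
        theta.append([(c + alpha) / denom for c in row])
    return theta


def estimate_phi(n_kw, beta):
    """Word-Topic matrix: phi[k][w] = (n_kw + beta) / (n_k + V * beta)."""
    vocab_size = len(n_kw[0])
    phi = []
    for row in n_kw:
        denom = sum(row) + vocab_size * beta  # n_k + V * beta
        phi.append([(c + beta) / denom for c in row])
    return phi


# Toy counts: 2 documents x 2 topics, and 2 topics x 3 words.
print(estimate_theta([[3, 1], [0, 4]], alpha=0.5))
print(estimate_phi([[2, 0, 2], [1, 1, 0]], beta=0.5))
```

The priors act as pseudo-counts, so a word or topic never observed still receives a small non-zero probability.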

[0043] Step3. Determine the number of semantic topics K with the evaluation function Perplexity;
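Perplexity is conventionally defined as exp(-Σ_d Σ_{w in d} log p(w|d) / N), where N is the total number of tokens and p(w|d) = Σ_k θ_dk φ_kw; lower values indicate a better fit. The sketch below assumes that definition and picks K as the candidate with the lowest perplexity. The source does not state its selection rule, and in practice perplexity is usually measured on held-out documents; `perplexity` and `choose_k` are hypothetical names.

```python
import math


def perplexity(docs, theta, phi):
    """exp(-sum of log p(w | d) over all tokens / number of tokens),
    with p(w | d) = sum_k theta[d][k] * phi[k][w]."""
    log_lik = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            p_w = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += math.log(p_w)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


def choose_k(perplexity_by_k):
    """Return the candidate K whose model has the lowest perplexity."""
    return min(perplexity_by_k, key=perplexity_by_k.get)


# A model that spreads probability uniformly over 4 words has perplexity 4.
uniform_theta = [[0.5, 0.5], [0.5, 0.5]]
uniform_phi = [[0.25] * 4, [0.25] * 4]
print(perplexity([[0, 1, 2], [3, 0]], uniform_theta, uniform_phi))

# Hypothetical perplexities for three candidate K values, for illustration only.
print(choose_k({5: 120.0, 10: 95.5, 15: 101.2}))
```

Perplexity can keep falling as K grows, so some practitioners look for an elbow in the curve rather than the strict minimum.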

[0044] Step4. Calculate the probability value p of each document d_i on all Topics.
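Step4 gives, for each document d_i, a probability vector over all Topics (one row of the doc-Topic matrix). A minimal sketch, assuming that matrix is available as a list of rows, reads off each document's dominant Topic and lists the most representative documents per Topic; `dominant_topics` and `documents_by_topic` are hypothetical helpers, not part of the source.

```python
def dominant_topics(theta):
    """For each document d_i, return (index of the Topic with the largest p, that p)."""
    result = []
    for row in theta:
        k = max(range(len(row)), key=row.__getitem__)
        result.append((k, row[k]))
    return result


def documents_by_topic(theta, top_n=3):
    """For each Topic k, the indices of the top_n documents ranked by p(k | d_i)."""
    ranking = {}
    for k in range(len(theta[0])):
        order = sorted(range(len(theta)), key=lambda d: theta[d][k], reverse=True)
        ranking[k] = order[:top_n]
    return ranking


toy_theta = [[0.7, 0.3], [0.1, 0.9], [0.6, 0.4]]  # 3 documents x 2 Topics
print(dominant_topics(toy_theta))  # [(0, 0.7), (1, 0.9), (0, 0.6)]
print(documents_by_topic(toy_theta, top_n=2))  # {0: [0, 2], 1: [1, 2]}
```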

[0045] For the document layer and topic layer constructed in Step1.1, the specific ...

Abstract

The invention relates to a biomedicine technology subject mining method based on LDA, belonging to the technical field of information retrieval. The method first uses LDA to treat each document as a combination of the vectors of the words it contains; the number of semantic topics K is then determined with the evaluation function Perplexity; finally, the probability p of each document d_i on all Topics is calculated, yielding two matrices, a doc-Topic matrix and a word-Topic matrix. LDA thus projects documents and words onto a set of Topics, trying to uncover the latent relationships between documents and words, between documents, and between words. LDA is an unsupervised algorithm, so no Topic needs to be specified in advance. After clustering, the probability distribution of the words on each Topic is calculated, and the words with high probability on a Topic describe its meaning well.
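The last sentence of the abstract describes labeling each Topic by its high-probability words. A minimal sketch, assuming the word-Topic matrix is stored with one row per Topic and one column per vocabulary word; `top_words`, the toy biomedical vocabulary, and the probabilities are hypothetical.

```python
def top_words(phi, vocab, n=5):
    """Return the n highest-probability words for each Topic (row) of phi."""
    labels = []
    for row in phi:
        order = sorted(range(len(row)), key=lambda w: row[w], reverse=True)
        labels.append([vocab[w] for w in order[:n]])
    return labels


toy_vocab = ["antibody", "tumor", "vaccine", "sequencing", "gene", "protein"]
toy_phi = [
    [0.40, 0.30, 0.05, 0.05, 0.12, 0.08],  # Topic 0
    [0.02, 0.03, 0.05, 0.30, 0.40, 0.20],  # Topic 1
]
print(top_words(toy_phi, toy_vocab, n=3))
# [['antibody', 'tumor', 'gene'], ['gene', 'sequencing', 'protein']]
```

An analyst then reads each word list and assigns the Topic a human-readable name.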

Description

Technical field

[0001] The invention relates to an LDA-based biomedical technology topic mining method, which belongs to the technical field of information retrieval.

Background art

[0002] Before data mining technology was extended to patent document mining, technical subject analysis mainly reflected technical subjects through IPC classification codes and classified subjects by manually reading patent documents. Analyzing technical subjects through IPC classification symbols cannot go deep into the content of patent texts, and dividing technical topics by IPC classification symbols often fails to meet analysis requirements. Although manual indexing of patent documents analyzes technical topics with high accuracy, it requires analysts with a strong background in the technical field and is inefficient and inflexible [59]; it struggles to cope with massive volumes of patent documents. N...

Application Information

IPC(8): G06F16/35; G06F17/27
CPC: G06F40/30
Inventors: 姜迪 (Jiang Di), 叶波 (Ye Bo), 马军 (Ma Jun)
Owner: KUNMING UNIV OF SCI & TECH