Domain text theme extraction method

A topic extraction technique, applied in the field of text topic extraction, that addresses the low efficiency and wasted human resources of manual analysis and reduces excessive overlap between extracted topics.

Active Publication Date: 2021-05-25
HARBIN ENG UNIV

AI Technical Summary

Benefits of technology

The method adds an extra layer, the auditing method layer, on top of the three-layer Bayesian network of the baseline LDA topic model, giving a four-layer Bayesian network. Because the extracted topics carry auditing-method information, the overlap between topics is reduced, and the results can support an auditing tool set for a knowledge graph in the field of four insurances and one housing fund.

Problems solved by technology

The technical problem addressed by this patent is the manual organization and analysis of data collected over time, for example for compliance checks, which is inefficient and does not reliably capture relevant information that may be needed later.



Examples


Embodiment 1

[0083] This embodiment provides a method for extracting text topics from policies and regulations in the field of four insurances and one housing fund. Building on the LDA topic model, it adds clustering technology and a semantic web of words to extract the subject words of these policies and regulations, and then uses the same two techniques to summarize the texts into subject headings with audit significance.

[0084] The LDA topic model yields its topics as probability distributions, but it does not consider the latent semantics of keywords, so this embodiment adds a semantic web model to the LDA topic model to address this. Clustering is first added to the model to separate texts whose subject content differs, and an index for evaluating the importance of keywords is added to improve the rep...
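The clustering step described above can be illustrated with a minimal sketch. The excerpt does not name the clustering algorithm, so this example assumes a small k-means-style procedure over term-frequency vectors with cosine similarity. The function names (`tf_vector`, `cosine`, `cluster_texts`), the choice of the first `k` documents as seeds, and the sample tokens are all hypothetical and are not taken from the patent.

```python
import math
from collections import Counter


def tf_vector(tokens, vocab):
    """Relative term frequencies of one tokenized text over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in vocab]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_texts(docs, k, n_iter=10):
    """Assign each tokenized text to one of k clusters (k-means style).

    Assumption: the first k texts serve as initial centroids. This keeps the
    sketch deterministic; it is not a recommendation for real data.
    """
    vocab = sorted({w for doc in docs for w in doc})
    vecs = [tf_vector(doc, vocab) for doc in docs]
    centroids = [vecs[i][:] for i in range(k)]
    labels = [0] * len(docs)
    for _ in range(n_iter):
        # Assignment step: pick the most similar centroid for each text.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vecs]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = [
    ["pension", "insurance", "payment"],
    ["housing", "fund", "loan"],
    ["pension", "payment", "retirement"],
    ["housing", "fund", "withdrawal"],
]
labels = cluster_texts(docs, k=2)
```

In this toy input the pension-related texts and the housing-fund texts share no vocabulary, so they fall into separate clusters. A topic model could then be fitted per cluster, which is one way to read "distinguish texts with differences in subject content".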


Abstract

The invention belongs to the technical field of text topic extraction and relates in particular to a domain text topic extraction method. It applies the LDA topic model, a statistical learning method, and adds an auditing-method layer to the model's three-layer Bayesian network, forming a four-layer Bayesian network. The model treats a text as a multinomial distribution over auditing methods, and each auditing method as a multinomial distribution over topics. The method first generates the multinomial distributions of auditing methods, text topics and words, then uses Dirichlet distributions as the priors on the parameters of these multinomial distributions, and applies Gibbs sampling to obtain the actual topic distribution parameters that incorporate the auditing method. Compared with the LDA topic model, the extracted topics carry auditing-method information, the problem of excessive overlap between topics is solved, and the result can support an auditing tool set for a knowledge graph in the field of four insurances and one housing fund.
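The four-layer structure in the abstract (text, auditing method, topic, word) can be illustrated with a collapsed Gibbs sampler. This is a minimal sketch, not the patent's own procedure: the update equations are not shown in this excerpt, so the code assumes symmetric Dirichlet priors and jointly resamples an (auditing method, topic) pair for every word. The function name `gibbs_four_layer`, the hyperparameter names `alpha`, `gamma`, `beta`, and the toy word ids are hypothetical.

```python
import random


def gibbs_four_layer(docs, n_methods, n_topics, vocab_size,
                     alpha=0.5, gamma=0.5, beta=0.1, n_iter=50, seed=0):
    """Collapsed Gibbs sampling for a text -> method -> topic -> word model.

    docs is a list of texts, each a list of integer word ids.
    Returns (theta, psi, phi): text-method, method-topic and topic-word
    distributions estimated from the final count tables.
    """
    rng = random.Random(seed)
    n_da = [[0] * n_methods for _ in docs]            # text-method counts
    n_az = [[0] * n_topics for _ in range(n_methods)]  # method-topic counts
    n_a = [0] * n_methods                             # words per method
    n_zw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    n_z = [0] * n_topics                              # words per topic

    def update(d, a, z, w, delta):
        """Add (delta=+1) or remove (delta=-1) one word's assignment."""
        n_da[d][a] += delta
        n_az[a][z] += delta
        n_a[a] += delta
        n_zw[z][w] += delta
        n_z[z] += delta

    # Random initial assignment of a (method, topic) pair to every word.
    assign = []
    for d, doc in enumerate(docs):
        row = []
        for w in doc:
            a, z = rng.randrange(n_methods), rng.randrange(n_topics)
            update(d, a, z, w, +1)
            row.append((a, z))
        assign.append(row)

    pairs = [(a, z) for a in range(n_methods) for z in range(n_topics)]
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                a, z = assign[d][i]
                update(d, a, z, w, -1)  # exclude the current word
                # Unnormalized conditional probability of each pair.
                weights = [
                    (n_da[d][a2] + alpha)
                    * (n_az[a2][z2] + gamma) / (n_a[a2] + n_topics * gamma)
                    * (n_zw[z2][w] + beta) / (n_z[z2] + vocab_size * beta)
                    for a2, z2 in pairs
                ]
                a, z = rng.choices(pairs, weights=weights, k=1)[0]
                update(d, a, z, w, +1)
                assign[d][i] = (a, z)

    theta = [[(n_da[d][a] + alpha) / (len(doc) + n_methods * alpha)
              for a in range(n_methods)] for d, doc in enumerate(docs)]
    psi = [[(n_az[a][z] + gamma) / (n_a[a] + n_topics * gamma)
            for z in range(n_topics)] for a in range(n_methods)]
    phi = [[(n_zw[z][w] + beta) / (n_z[z] + vocab_size * beta)
            for w in range(vocab_size)] for z in range(n_topics)]
    return theta, psi, phi


# Toy corpus of three texts over a six-word vocabulary (ids 0..5).
toy_docs = [[0, 1, 0, 2], [3, 4, 3, 5], [0, 2, 1, 1]]
theta, psi, phi = gibbs_four_layer(toy_docs, n_methods=2, n_topics=3,
                                   vocab_size=6, n_iter=20)
```

Each row of `theta`, `psi` and `phi` is a probability distribution, corresponding to the three multinomial ("multi-term") distributions named in the abstract. With the auditing-method layer removed (`n_methods=1`), the update reduces to the usual collapsed Gibbs step for plain LDA, which is why the extra layer can be read as a refinement of that baseline.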


Application Information

Owner HARBIN ENG UNIV