Automatic Chinese text topic exploration method and system

An automatic Chinese text topic exploration method and system. It addresses two problems of K-Means text clustering: computation and time consumption keep increasing as the data set grows, and the output is inconvenient for inducing and extracting text topics. The stated effect is easier extraction of text topics.

Pending Publication Date: 2021-03-26
珠海横琴博易数据技术有限公司

AI Technical Summary

Problems solved by technology

[0003] 1. K-Means aims to minimize the sum of squared distances from each cluster member to the centroid of the cluster that contains it. As the analyzed data set grows, the distance from every data point to the centroids must be recomputed on each iteration, so the amount of computation and the time consumed keep increasing;
[0004] 2. K-Means can only divide the texts into a given number of clusters or classes and provides no further classification information, which makes it inconvenient to induce and extract text topics manually and quickly.
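
The two points above can be made concrete with a minimal sketch of plain K-Means. This is an illustration of the baseline being criticized, not the method of the invention. The function names (`assign`, `update`, `kmeans`) and the toy data are assumptions introduced here. The sketch counts distance evaluations to show that each pass costs (number of points) x (number of centroids), and it returns only cluster labels and the objective value, with no further description of each cluster.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: Sequence[Point],
           centroids: Sequence[Point]) -> Tuple[List[int], float, int]:
    """One assignment pass: nearest centroid for every point.

    Returns the labels, the sum of squared distances to the assigned
    centroids (the K-Means objective), and the number of distance
    evaluations, which is len(points) * len(centroids).
    """
    labels: List[int] = []
    sse = 0.0
    evaluations = 0
    for p in points:
        dists = [squared_distance(p, c) for c in centroids]
        evaluations += len(centroids)
        best = min(range(len(centroids)), key=dists.__getitem__)
        labels.append(best)
        sse += dists[best]
    return labels, sse, evaluations


def update(points: Sequence[Point], labels: Sequence[int],
           centroids: Sequence[Point]) -> List[List[float]]:
    """Move each centroid to the mean of its members (empty clusters stay put)."""
    new_centroids: List[List[float]] = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            new_centroids.append(list(old))
            continue
        new_centroids.append(
            [sum(m[d] for m in members) / len(members) for d in range(len(old))]
        )
    return new_centroids


def kmeans(points: Sequence[Point], centroids: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[int], float, int]:
    """Run K-Means until the centroids stop moving or max_iter is reached.

    Returns the final labels, the final objective value, and the total
    number of distance evaluations over all passes.
    """
    total_evaluations = 0
    labels: List[int] = []
    sse = 0.0
    for _ in range(max_iter):
        labels, sse, evaluations = assign(points, centroids)
        total_evaluations += evaluations
        new_centroids = update(points, labels, centroids)
        if new_centroids == [list(c) for c in centroids]:
            break
        centroids = new_centroids
    return labels, sse, total_evaluations


if __name__ == "__main__":
    # Toy 2-D vectors standing in for text vectors (hypothetical data).
    data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    start = [[0.0, 0.0], [10.0, 10.0]]
    print(kmeans(data, start))
```

Every pass touches every point, so the evaluation count grows with the data set, which is the cost described in [0003]. The result is a list of integer labels plus one number, which is the limited output described in [0004].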

Embodiment Construction

[0022] This section describes specific embodiments of the present invention in detail; preferred embodiments are shown in the accompanying drawings. The drawings illustrate each technical feature and the overall technical solution of the invention, but they should not be understood as limiting the scope of protection of the present invention.

[0023] In the description of the present invention, "multiple" means two or more; "greater than", "less than", "exceeding" and similar terms are understood as excluding the stated number, while "above", "below", "within" and similar terms are understood as including it. Where "first" and "second" are used, they serve only to distinguish technical features and cannot be understood as indicating or implying relative importance, the number of the indicated technical features, or their order.

[0024] In the description of the present...

Abstract

The invention discloses an automatic Chinese text topic exploration method and system. The system comprises a word vector construction module, a text clustering module and a visualization module, and the automatic Chinese text topic exploration method is used in the system. The method addresses the long computation time of the K-Means clustering method and provides more classification feature information, so that text topics can be extracted manually and quickly.

Description

Technical field

[0001] The invention relates to the field of text topic exploration, in particular to a method and system for automatic Chinese text topic exploration.

Background

[0002] There are many methods of topic exploration, such as topic extraction based on LDA and K-Means text clustering based on unsupervised learning. The LDA topic model infers topics from the perspective of probability and statistics using Bayesian reasoning. The K-Means clustering model clusters scattered points according to the distances between vectors in space and ultimately divides the texts into different clusters or classes. On this basis, text topics are finally extracted through further manual information extraction and induction. Against this background, K-Means has the following disadvantages:

[0003] 1. K-Means aims to minimize the sum of squared distances from each cluster member to the centroid of the cluster that contains it. ...

Application Information

IPC(8): G06F40/284, G06K9/62, G06F40/258, G06F40/216, G06F40/49
CPC: G06F40/284, G06F40/258, G06F40/216, G06F40/49, G06F18/23213, Y02D10/00
Inventor: 张荣显
Owner: 珠海横琴博易数据技术有限公司