Corpus keyword automatic extraction algorithm based on data mining

A data mining and automatic extraction technology, applied in data processing applications, digital data processing, digital data information retrieval, etc., with the effect of being easy to use.

Inactive Publication Date: 2019-10-25
厦门美域中央信息科技有限公司
4 Cites, 15 Cited by

AI Technical Summary

Problems solved by technology

[0002] Existing corpus keyword extraction algorithms are relatively complex, which makes it difficult to quickly extract the required data, and they cannot extract keywords from a corpus automatically. The extraction process takes a long time, extraction efficiency is low, and the accuracy of keyword extraction needs to be improved.



Examples


Embodiment Construction

[0037] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in combination with specific embodiments and with reference to the accompanying drawings. It should be understood that these descriptions are exemplary only, and are not intended to limit the scope of the present invention. Also, in the following description, descriptions of well-known structures and techniques are omitted to avoid unnecessarily obscuring the concept of the present invention.

[0038] As shown in Figure 1, the data-mining-based algorithm for automatic corpus keyword extraction proposed by the present invention comprises the following steps:

[0039] S1. Obtain the text to be processed;

[0040] S2. Perform word segmentation processing on the acquired text to obtain a word segmentation set;

[0041] S3. Perform part-of-speech tagging and word meaning tagging on the words in the wor...
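The text available here does not say which segmentation method step S2 uses. Purely as an illustration, the sketch below uses forward maximum matching, a common dictionary-based approach for text without spaces between words. The function name `forward_max_match`, the toy dictionary `VOCAB`, and the `max_len` parameter are assumptions made for this example and do not come from the patent.

```python
def forward_max_match(text, vocabulary, max_len=4):
    """Greedy segmentation: at each position take the longest dictionary word.

    Characters that start no dictionary word are emitted as single tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink it until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy dictionary: data, mining, data mining, keyword, extraction, algorithm.
VOCAB = {"数据", "挖掘", "数据挖掘", "关键词", "提取", "算法"}

# S1: obtain the text; S2: segment it into a word segmentation set.
segmented = forward_max_match("数据挖掘关键词提取算法", VOCAB)
print(segmented)
```

Because the longest match wins, "数据挖掘" is kept as one token rather than split into "数据" and "挖掘". A production system would more likely rely on an established segmenter and a full dictionary.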



Abstract

The invention discloses a corpus keyword automatic extraction algorithm based on data mining. The algorithm comprises the following steps: obtaining a text to be processed; performing word segmentation on the obtained text; performing part-of-speech tagging and word meaning tagging on the words in the word segmentation set; performing word removal processing on the word segmentation set; counting word frequencies and word-pair co-occurrence information; setting a word frequency threshold and judging whether the frequency of each word in the vocabulary set is greater than that threshold; generating a candidate keyword set; obtaining the position information of each candidate keyword in the text; calculating the weight value of each candidate keyword in the text; setting a weight value threshold and judging whether the calculated weight of each candidate keyword is greater than that threshold; and generating a set of keywords. The method optimizes the corpus keyword extraction algorithm, is simple and convenient to operate, extracts keywords from the corpus automatically, saves time and labor, and remarkably improves keyword extraction accuracy.
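The steps in the abstract, from word removal through the two thresholds, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method: the abstract gives neither the weight formula nor the way co-occurrence information is used. The example assumes that word removal means stop-word removal, that co-occurrence is counted per sentence, and that the weight is an equal mix of relative frequency and how early the word first appears. The names `extract_keywords` and `STOP_WORDS`, and the threshold values, are hypothetical.

```python
from collections import Counter
from itertools import combinations

STOP_WORDS = {"the", "a", "of", "and", "is", "to", "in"}  # hypothetical stop list


def extract_keywords(sentences, freq_threshold=1, weight_threshold=0.5):
    """sentences: list of already segmented sentences (lists of words)."""
    # Word removal (assumed here to mean dropping stop words).
    cleaned = [[w for w in s if w not in STOP_WORDS] for s in sentences]
    words = [w for s in cleaned for w in s]

    # Word frequency, plus co-occurrence of word pairs sharing a sentence.
    freq = Counter(words)
    cooc = Counter()
    for s in cleaned:
        for pair in combinations(sorted(set(s)), 2):
            cooc[pair] += 1

    # Candidate keywords: frequency strictly greater than the threshold.
    candidates = {w for w, n in freq.items() if n > freq_threshold}

    # Position information: index of each word's first occurrence.
    first_pos = {}
    for i, w in enumerate(words):
        first_pos.setdefault(w, i)

    # Assumed weight: half relative frequency, half earliness of first use.
    max_freq = max(freq.values(), default=1)
    weights = {}
    for w in candidates:
        relative_freq = freq[w] / max_freq
        earliness = 1.0 - first_pos[w] / len(words)
        weights[w] = 0.5 * relative_freq + 0.5 * earliness

    # Final keywords: weight strictly greater than the weight threshold.
    keywords = sorted(
        (w for w in weights if weights[w] > weight_threshold),
        key=lambda w: -weights[w],
    )
    # cooc is returned as a statistic only; the abstract does not say how
    # co-occurrence enters the weight, so it is not used in the score here.
    return keywords, weights, cooc


sample = [
    ["keyword", "extraction", "of", "the", "corpus"],
    ["corpus", "keyword", "weight"],
    ["the", "weight", "of", "a", "keyword"],
]
keywords, weights, cooc = extract_keywords(sample, weight_threshold=0.6)
print(keywords)
```

In this toy input, "extraction" is dropped by the frequency threshold and "weight" by the weight threshold, leaving "keyword" and "corpus". Both thresholds are free parameters that would need tuning on a real corpus.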

Description

Technical field

[0001] The invention relates to the technical field of corpus keyword extraction, in particular to an algorithm for automatically extracting corpus keywords based on data mining.

Background art

[0002] Existing corpus keyword extraction algorithms are relatively complex, which makes it difficult to quickly extract the required data, and they cannot extract keywords from a corpus automatically. The extraction process takes a long time, extraction efficiency is low, and the accuracy of keyword extraction needs to be improved.

Contents of the invention

[0003] (1) Purpose of the invention

[0004] In order to solve the technical problems existing in the background art, the present invention proposes a corpus keyword automatic extraction algorithm based on data mining, which optimizes the corpus keyword extraction algorithm, is easy to operate, and can automatically extract keywords from the corpus, saving time and effort, Significantly i...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/335, G06F17/27, G06F17/24, G06Q50/26
CPC: G06F16/335, G06Q50/26, G06F40/169, G06F40/216, G06F40/211, G06F40/289
Inventor: 刘家祥
Owner: 厦门美域中央信息科技有限公司