A weighted association rule mining method for Chinese inter-word matrix based on item frequency and weight

Keywords: matrix weighting, Chinese words. The technique applies to data mining, text database querying, text database indexing, and related areas. It addresses two shortcomings of earlier methods: ignoring the importance (weight) of an association pattern, and being unable to handle item weights that change across transaction records. The stated aim is to improve information retrieval performance.

Inactive Publication Date: 2021-10-29
GUANGXI UNIVERSITY OF FINANCE AND ECONOMICS

AI Technical Summary

Problems solved by technology

The first type of method considers only the frequency of an association pattern and ignores the importance of the pattern in the transaction database (i.e., the weight of the association pattern).
The second type calculates association pattern support with fixed item weights. It uses the product of the sum of the itemset's item weights and the unweighted association pattern support as the weighted itemset support (C. H. Cai, A. da, W. C. Fu, et al. Mining Association Rules with Weighted Items [C] // Proceedings of IEEE International Database Engineering and Application Symposiums, 1998: 68-77.). This overcomes the defect of the first type by taking item weights into account, but the item weights stay fixed during mining, so it cannot handle cases where an item's weight changes from one transaction record to another.
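The two support calculations described above can be sketched as follows. This is a minimal illustration of the prose description only, not code from the cited paper or the patent; the names `support`, `fixed_weight_support`, `transactions`, and `item_weights`, and the sample data, are assumptions made for the example.

```python
def support(itemset, transactions):
    """Unweighted support: fraction of transactions containing every item."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def fixed_weight_support(itemset, transactions, item_weights):
    """Fixed-weight support: (sum of item weights) * (unweighted support).

    item_weights holds one weight per item for the whole database, which is
    exactly the limitation noted above: the weight cannot vary per record.
    """
    weight_sum = sum(item_weights[i] for i in itemset)
    return weight_sum * support(itemset, transactions)


# Hypothetical toy data.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b"},
]
item_weights = {"a": 0.5, "b": 0.25, "c": 0.25}

print(support({"a", "b"}, transactions))
print(fixed_weight_support({"a", "b"}, transactions, item_weights))
```

Because `item_weights` is a single dictionary shared by every transaction, the sketch cannot represent a word that matters more in one document than in another.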



Embodiment Construction

[0039] To better illustrate the technical solution of the present invention, specific embodiments are described in detail below with reference to the accompanying drawings. This description does not limit the scope of protection of the claims.

[0040] As shown in Figure 1, the method for mining weighted association rules from a Chinese inter-word matrix based on item frequency and weight includes the following steps:

[0041] 1. Preprocess the Chinese documents to be mined, that is, remove Chinese stop words, extract feature words and calculate their weights, and build a Chinese feature lexicon and a Chinese document index library.
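The preprocessing step can be sketched roughly as below. The sketch assumes the documents have already been segmented into words (the excerpt does not name a segmentation tool), and the function name `preprocess`, the stop-word set, and the sample documents are all hypothetical.

```python
from collections import Counter


def preprocess(segmented_docs, stop_words):
    """Remove stop words, then build a feature lexicon and a document index.

    segmented_docs maps a document id to its list of segmented words.
    Returns (lexicon, index), where index maps each document id to a
    Counter of feature-word frequencies in that document.
    """
    index = {}
    lexicon = set()
    for doc_id, words in segmented_docs.items():
        kept = [w for w in words if w not in stop_words]
        index[doc_id] = Counter(kept)
        lexicon.update(kept)
    return sorted(lexicon), index


# Hypothetical toy corpus, already segmented.
segmented_docs = {
    "d1": ["数据", "的", "挖掘", "数据"],
    "d2": ["文本", "的", "挖掘"],
}
stop_words = {"的"}

lexicon, index = preprocess(segmented_docs, stop_words)
print(lexicon)
print(index["d1"])
```

Keeping per-document frequencies in the index is a design choice for this sketch, since later steps need both how often a feature word occurs and how heavily it is weighted.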

[0042] The feature word weight indicates the importance of the Chinese feature word to the Chinese document where it is located. The classic and popular tf-idf feature word weight calculation method is used. The calculation formula is:

[0043]

[0044] In f...
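The patent's own formula is not reproduced in this excerpt, so the sketch below uses the common textbook form of tf-idf, weight = tf × log(N / df), purely as an illustration. It may differ from the variant the patent uses (for example in normalization), and the function name `tf_idf` and the sample index are assumptions.

```python
import math


def tf_idf(index):
    """Textbook tf-idf: term frequency times log(N / document frequency).

    index maps a document id to a dict of term -> raw frequency.
    Returns a dict of document id -> dict of term -> weight.
    """
    n_docs = len(index)
    df = {}
    for counts in index.values():
        for term in counts:
            df[term] = df.get(term, 0) + 1
    weights = {}
    for doc_id, counts in index.items():
        weights[doc_id] = {
            term: tf * math.log(n_docs / df[term]) for term, tf in counts.items()
        }
    return weights


# Hypothetical document index (term frequencies per document).
index = {
    "d1": {"数据": 2, "挖掘": 1},
    "d2": {"文本": 1, "挖掘": 1},
}
weights = tf_idf(index)
print(weights["d1"])
```

In this toy corpus the word 挖掘 appears in every document, so its weight is zero, while 数据 gets a positive weight only in d1. This shows how the same feature word can carry different weights in different documents.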


Abstract

The invention discloses a method for mining weighted association rules from a Chinese inter-word matrix based on item frequency and weight. First, the Chinese documents to be mined are preprocessed by removing Chinese stop words, extracting feature words, and calculating feature word weights, and a Chinese feature lexicon and a Chinese document index library are constructed. Next, a matrix-weighted support calculation based on item frequency and weight is used to mine the matrix-weighted frequent itemsets of Chinese feature words. Finally, a confidence-interest evaluation framework is used to mine matrix-weighted association rule patterns of Chinese feature words from those frequent itemsets. Because the method takes into account both the occurrence frequency and the weight of feature words in the documents, it can mine inter-word matrix-weighted association rule patterns that are more practical and more reasonable and that better reflect the association relationships among feature words. Applied to query expansion in information retrieval, these patterns can improve retrieval performance.
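A confidence-interest evaluation of a candidate rule X → Y might look like the sketch below. The excerpt does not give the patent's exact definitions, so this assumes the widely used forms: confidence = sup(X ∪ Y) / sup(X), and interest (often called lift) = sup(X ∪ Y) / (sup(X) · sup(Y)). The function names, thresholds, and support values are hypothetical.

```python
def confidence(sup_xy, sup_x):
    """Confidence of X -> Y: support of X and Y together over support of X."""
    return sup_xy / sup_x


def interest(sup_xy, sup_x, sup_y):
    """Interest (lift) of X -> Y; values above 1 suggest positive association."""
    return sup_xy / (sup_x * sup_y)


def keep_rule(sup_xy, sup_x, sup_y, min_conf, min_interest):
    """Keep a rule only if it passes both the confidence and interest checks."""
    return (
        confidence(sup_xy, sup_x) >= min_conf
        and interest(sup_xy, sup_x, sup_y) > min_interest
    )


# Hypothetical support values for feature-word itemsets X, Y, and X ∪ Y.
sup_x, sup_y, sup_xy = 0.5, 0.5, 0.4

print(confidence(sup_xy, sup_x))
print(interest(sup_xy, sup_x, sup_y))
print(keep_rule(sup_xy, sup_x, sup_y, min_conf=0.6, min_interest=1.0))
```

The support values passed in here could come from any support calculation, including a weighted one; the interest check filters out rules whose items co-occur no more often than independence would predict.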

Description

Technical field

[0001] The invention belongs to the field of Chinese text mining, and in particular relates to a method for mining weighted association rules from a Chinese inter-word matrix based on item frequency and weight.

Background

[0002] In association pattern mining research, the core problem is calculating the support of association patterns. Current research mainly uses three types of support calculation. The first type is the unweighted support calculation (see R. Agrawal, T. Imielinski, A. Swami. Mining association rules between sets of items in large database [C]. In Proceeding of 1993 ACM SIGMOD International Conference on Management of Data, Washington D.C., 1993, (5): 207-216.). This early, classic method takes the probability that an association pattern occurs in the transaction database as the support of that pattern. This met...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06N5/02, G06F16/33, G06F16/31
CPC: G06F2216/03, G06N5/025, G06F16/316, G06F16/3334, G06F16/3335
Inventor: 黄名选
Owner: GUANGXI UNIVERSITY OF FINANCE AND ECONOMICS