SFp-Link-based semi-structured data frequent pattern mining method

A technology for mining frequent patterns in semi-structured data, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It addresses the problem that frequent patterns, associations and correlations cannot be effectively extracted from semi-structured data, where the internal associations among rare information cannot be ignored, and it achieves short mining time, high mining efficiency, and small storage space.

Active Publication Date: 2018-01-09
SUN YAT SEN UNIV

AI Technical Summary

Problems solved by technology

However, because some rare diseases require more research, the inherent correlations among such rare information cannot be ignored, and an effective technique is needed to extract the relevant information.
Existing algorithms such as Apriori and FP-tree mostly target structured data and cannot effectively extract frequent patterns, associations and correlations from unstructured or semi-structured data.
In addition, algorithms of this type need to read the sample database multiple times, which often results in high space and time complexity.


Examples

Embodiment Construction

[0025] As shown in Figures 1 to 3, the present invention discloses an SFp-Link-based frequent pattern mining method for semi-structured data, comprising:

[0026] Step 1. Perform data preprocessing on the sample database to be mined, namely:

[0027] Extract the sample item set of each piece of semi-structured data in the sample database to be mined. The sample item set is the collection of valid data in the corresponding semi-structured data that is relevant to the mining purpose, and each valid datum it contains is an item of that sample item set. For example, suppose the content of a piece of semi-structured data is "the patient was admitted to this ward because of 'dry mouth, polydipsia, polyuria, weight loss for 4 years, left foot redness, swelling, and ulceration for 3 weeks'". Here the application-oriented keywords "dry mouth, polydipsia, polyuria, weight loss for 4 years, left foot swelling, ulceration for 3 weeks" are all valid data related to the...
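The preprocessing in Step 1 can be illustrated with a minimal Python sketch. It assumes that the valid data are found by matching each record against a fixed keyword vocabulary; the function `extract_item_set`, the `vocabulary` parameter, and the substring matching are all hypothetical choices made for illustration, since the text above does not say how the keywords are identified.

```python
from typing import FrozenSet, Iterable


def extract_item_set(record: str, vocabulary: Iterable[str]) -> FrozenSet[str]:
    """Return the sample item set of one semi-structured record.

    Assumption: an item is any vocabulary term that occurs in the record
    text (case-insensitive substring match). The patent text does not
    specify the matching rule; this is only a stand-in.
    """
    text = record.lower()
    return frozenset(term for term in vocabulary if term.lower() in text)


# Hypothetical vocabulary of keywords relevant to the mining purpose.
VOCABULARY = ["dry mouth", "polydipsia", "polyuria", "weight loss", "fever"]

record = (
    "The patient was admitted to this ward because of 'dry mouth, "
    "polydipsia, polyuria, weight loss for 4 years'"
)
items = extract_item_set(record, VOCABULARY)
```

Returning a `frozenset` makes the item set independent of keyword order and usable as a dictionary key in later steps.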

Abstract

The invention discloses an SFp-Link-based frequent pattern mining method for semi-structured data. The method comprises the following steps: establishing a semi-structured data frequent pattern linked list, SFp-Link, for the semi-structured data; and carrying out frequent pattern mining on the basis of SFp-Link, so as to effectively extract the frequent item sets in the semi-structured data according to the mining purpose. When SFp-Link is established, the sample data to be mined is scanned only once: a sample item set is stored only when its item combination is scanned for the first time, and when an item combination is scanned again, only the corresponding sample frequency is incremented. The method therefore consumes little storage space, requires a short mining time, and has high mining efficiency.
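The single-scan idea in the abstract can be sketched in Python as follows. This is not the SFp-Link structure itself, whose layout is not given in the text available here; a plain dictionary keyed by item combination stands in for it. The names `build_frequency_table` and `support` are hypothetical, and the support computation (summing the frequencies of stored combinations that contain a candidate) is a standard definition rather than something taken from the patent.

```python
from typing import Dict, FrozenSet, Iterable

ItemSet = FrozenSet[str]


def build_frequency_table(
    sample_item_sets: Iterable[Iterable[str]],
) -> Dict[ItemSet, int]:
    """Scan the sample item sets once and count each distinct combination."""
    table: Dict[ItemSet, int] = {}
    for items in sample_item_sets:
        combination = frozenset(items)
        if not combination:
            continue  # skip records with no valid data
        if combination not in table:
            table[combination] = 1   # first occurrence: store the item set
        else:
            table[combination] += 1  # repeat: only bump the frequency
    return table


def support(table: Dict[ItemSet, int], candidate: Iterable[str]) -> int:
    """Count the samples whose item set contains every item in candidate."""
    wanted = frozenset(candidate)
    return sum(freq for combo, freq in table.items() if wanted <= combo)


samples = [
    ["dry mouth", "polydipsia"],
    ["polydipsia", "dry mouth"],  # same combination, different order
    ["polyuria"],
]
table = build_frequency_table(samples)
```

Because repeated combinations only increment a counter, the table holds one entry per distinct item combination, and later support queries read the table rather than rescanning the sample database.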

Description

Technical field

[0001] The invention relates to an SFp-Link-based method for mining frequent patterns in semi-structured data, and belongs to the technical field of data mining.

Background technique

[0002] In the medical field, a large amount of diagnostic data is unstructured or semi-structured. Such data has many types of attributes, often hundreds or even thousands of items (such as genes), but the sample size is very small. If such semi-structured data is converted into structured data, the result is inevitably a very sparse matrix. However, because some rare diseases need more research, the internal correlations among such rare information cannot be ignored, and an effective technique is needed to extract the relevant information. Existing algorithms such as Apriori and FP-tree mostly target structured data, and cannot effectively solve the problem of extracting frequent patterns, associations and correlations of unstructured...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 蔡庆玲, 邓少风, 吕律, 李海良
Owner SUN YAT SEN UNIV