Biclustering mining method based on butterfly network under synchronous programming model Hama BSP

A butterfly network and programming model technology, applied in the field of Hama BSP programming, that improves node utilization, reduces the amount of data, and reduces the amount of communication

Inactive Publication Date: 2019-01-18
HENAN UNIV OF ECONOMICS & LAW
3 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0006] To improve the efficiency of biclustering mining, take full advantage of distributed parallel platforms, and avoid the low computational efficiency caused by redundant data transmission, the present invention proposes a biclustering mining method based on a butterfly network under the synchronous programming model Hama BSP.

Examples

Embodiment 1

[0031] Example 1 (implementation of the biclustering mining method on BNHB). Table 1(a) shows an example gene expression data set, and Table 1(b) shows the source data input to the algorithm, in which the data are replaced by the generated column labels. The data fragments read by the nodes are shown in the first row of Figure 3. The final mining results are shown in Table 1(c); the threshold on the columns (attributes) of a resulting bicluster is 0.6.

[0032] Table 1

[0037] The detailed process of Example 1 is as follows. First, each node reads a fragment of the data and then begins processing the supersteps. In the first superstep (step=1), each node first enters the local computation stage: it compares the data shown in the first row of Figure 3 locally and generates intermediate results, shown in the second row of Figure 3. Next comes the global communication stage. First, the 4 node...
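The grouping used in the global communication stage is not fully visible in this excerpt. As a minimal sketch, assuming the textbook butterfly pattern in which node i exchanges with node i XOR 2^(step-1) and the node count is a power of two, the pairing per superstep could be listed as follows. The function names `butterfly_partner` and `butterfly_schedule` are hypothetical and do not come from the patent or from Hama.

```python
def butterfly_partner(node: int, step: int) -> int:
    """Partner of `node` in superstep `step` (1-based), assuming the
    textbook butterfly pattern: flip bit (step - 1) of the node id."""
    return node ^ (1 << (step - 1))


def butterfly_schedule(num_nodes: int) -> list[list[tuple[int, int]]]:
    """List the communicating pairs for every exchange round.

    Assumption: num_nodes is a power of two, so each round pairs
    every node with exactly one partner.
    """
    if num_nodes < 1 or num_nodes & (num_nodes - 1):
        raise ValueError("this sketch assumes a power-of-two node count")
    rounds = num_nodes.bit_length() - 1  # log2(num_nodes)
    schedule = []
    for step in range(1, rounds + 1):
        pairs = set()
        for node in range(num_nodes):
            partner = butterfly_partner(node, step)
            # Store each pair once, smaller id first.
            pairs.add((min(node, partner), max(node, partner)))
        schedule.append(sorted(pairs))
    return schedule


if __name__ == "__main__":
    for step, pairs in enumerate(butterfly_schedule(4), start=1):
        print(f"step {step}: {pairs}")
```

With 4 nodes, this pattern pairs neighbouring nodes in the first round and nodes two apart in the second, so every node is reached after two exchanges.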

Embodiment 2

[0073] [specific performance analysis]

[0074] We analyze the performance of the method of the present invention. The most important criteria for evaluating the biclustering mining method based on a butterfly network under the synchronous programming model Hama BSP are processing efficiency and scalability. Processing efficiency is usually measured by task processing time, that is, the time between the user issuing the biclustering mining request and the user receiving the mining result. Scalability is usually measured by steadily increasing the amount of data or the number of processing nodes, and its metric is generally also task processing time. Our performance analysis therefore uses task processing time as the metric.

[0075] We used 6 real gene expression data sets from the Broad Institute website. In each data set, rows are genes and columns are experimental conditions, and each cell stores gene expres...

Abstract

The invention provides a biclustering mining method based on a butterfly network under the synchronous programming model Hama BSP. The concrete steps are as follows. First, a Hama platform with an underlying HDFS and 2n nodes is deployed. In the local computation stage, each node on the Hama platform performs biclustering mining: in the first superstep it uses only the local source data; otherwise it matches only the newly received data against the local data. In the global communication stage, the nodes are grouped and communicate according to the butterfly network method, and in the barrier synchronization stage the communication is synchronized. After multiple iterations, all biclusters are mined. The method requires little communication traffic, effectively reduces redundancy in both the communicated data and the mining results, and improves node utilization.
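The three stages of a superstep can be illustrated with a small simulation. This is only a sketch under stated assumptions: it is plain Python rather than the Hama API, it tracks fragment ids instead of real mining results, it assumes a power-of-two node count with the textbook butterfly pairing (node XOR 2^(step-1)), and it assumes each node forwards everything it has seen so far. The function name `run_supersteps` is hypothetical.

```python
def run_supersteps(num_nodes: int) -> list[set[int]]:
    """Simulate which source fragments each node has matched locally.

    Fragment i is the local source data of node i. Assumptions (not
    taken from the patent): power-of-two node count, textbook butterfly
    pairing, and every node forwards all fragments it has seen.
    """
    if num_nodes < 1 or num_nodes & (num_nodes - 1):
        raise ValueError("this sketch assumes a power-of-two node count")
    rounds = num_nodes.bit_length() - 1  # exchange rounds in the butterfly
    seen = [set() for _ in range(num_nodes)]
    # In the first superstep a node works on its local source data only.
    inbox = [{node} for node in range(num_nodes)]

    for step in range(1, rounds + 2):
        # Local computation: only newly received data is matched.
        for node in range(num_nodes):
            new_data = inbox[node] - seen[node]
            seen[node] |= new_data
        if step > rounds:
            break  # nothing left to exchange
        # Global communication: send to the butterfly partner.
        outbox = [set() for _ in range(num_nodes)]
        for node in range(num_nodes):
            partner = node ^ (1 << (step - 1))
            outbox[partner] = set(seen[node])
        # Barrier synchronization: messages become visible only in
        # the next superstep.
        inbox = outbox
    return seen


if __name__ == "__main__":
    print(run_supersteps(4))
```

Under these assumptions every node ends up having matched every fragment, which is the property the iteration needs in order to mine all biclusters; how the actual method limits what is forwarded is not shown in this excerpt.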

Description

Technical field

[0001] The invention belongs to the field of Hama BSP programming, and in particular relates to a biclustering mining method based on a butterfly network under the synchronous programming model Hama BSP.

Background

[0002] The rapid development of high-throughput technologies (such as gene microarrays) has made it possible to simultaneously measure the expression levels of all genes in an organ. As a result, a large amount of gene expression data has accumulated. These data can be regarded as an n×m matrix, where n is the number of genes (number of rows), m is the number of experimental conditions (number of columns), and each entry in the matrix represents the expression level of a given gene under a given experimental condition. Biclustering has become an important tool for gene expression data analysis because of its important role in inferring and building gene regulatory networks. The purpose of designing the biclustering algorithm is to find a ...
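To make the matrix view concrete, the following sketch represents a tiny expression matrix as nested lists and extracts a bicluster in the general sense of the term: a chosen subset of rows (genes) together with a chosen subset of columns (conditions). The values and the helper name `submatrix` are hypothetical and are not taken from the patent's Table 1.

```python
def submatrix(matrix, rows, cols):
    """Return the entries of `matrix` restricted to the given row and
    column indices, i.e. the cells covered by a candidate bicluster."""
    return [[matrix[r][c] for c in cols] for r in rows]


# Hypothetical 3 genes x 3 conditions expression matrix.
expression = [
    [1.0, 2.0, 9.0],
    [1.1, 2.1, 4.0],
    [7.0, 0.5, 3.0],
]

# Genes 0 and 1 behave similarly under conditions 0 and 1.
bicluster = submatrix(expression, rows=[0, 1], cols=[0, 1])

if __name__ == "__main__":
    print(bicluster)
```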

Application Information

IPC(8): G16B25/10, G16B40/10
Inventor 姜涛, 李钧涛
Owner HENAN UNIV OF ECONOMICS & LAW