
Distributed computation system and method for large-scale data set cross comparison

A distributed computing technology for large-scale data, applied in the field of distributed computing. It addresses problems such as the difficulty of developing parallel programs, and makes the system easier to implement and easier to use.

Inactive | Publication Date: 2014-07-23
张一凡
5 Cites, 4 Cited by

AI Technical Summary

Problems solved by technology

[0013] In addition, current solutions make parallel program development difficult: users must master the internal implementation of the computing platform, and each system is aimed at solving only one specific computing problem.



Examples


Embodiment 1

[0062] As shown in Figures 3-5.

[0063] A distributed computing system for cross-comparison of large-scale data sets, comprising a cross-comparison programming model, a master node, a programming interface and a back-end distributed processing framework based on heterogeneous distributed clusters. The system aims to use a distributed computing environment to efficiently process computing problems that fit the data set cross-comparison pattern. By providing an intuitive cross-comparison programming model, the invention helps users abstract and simplify the computation to be processed, and gives unified support to a variety of cross-comparison computing problems. It provides a concise programming interface that helps users develop serial cross-comparison programs, so users do not need to master parallel programming knowledge. The system hides the implementation details of parallel computing, and users do not need to ...
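To make the division of labor concrete, here is a minimal sketch in which the user supplies only a serial comparison function and a "master" routine splits the comparison space into sub-tasks and runs them in parallel. The tile-based partitioning, the function names (`split_into_tiles`, `run_tile`, `master_run`) and the use of a local thread pool in place of a heterogeneous cluster are all assumptions made for illustration; the source does not specify how the master node schedules work.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(n_a, n_b, tile):
    """Cut the n_a x n_b comparison space into rectangular sub-tasks (assumed scheme)."""
    return [
        (range(i, min(i + tile, n_a)), range(j, min(j + tile, n_b)))
        for i in range(0, n_a, tile)
        for j in range(0, n_b, tile)
    ]


def run_tile(set_a, set_b, rows, cols, compare):
    """Worker side: run the user's serial compare on every pair in one tile."""
    return {(i, j): compare(set_a[i], set_b[j]) for i in rows for j in cols}


def master_run(set_a, set_b, compare, tile=2, workers=4):
    """Master side: partition, dispatch sub-tasks, and merge the partial results."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(run_tile, set_a, set_b, rows, cols, compare)
            for rows, cols in split_into_tiles(len(set_a), len(set_b), tile)
        ]
        for future in futures:
            results.update(future.result())
    return results


# The user writes only this serial function; no parallel code is needed.
def abs_diff(a, b):
    return abs(a - b)


a = [1, 5, 9]
b = [2, 4, 6, 8, 10]
parallel = master_run(a, b, abs_diff)
serial = {(i, j): abs_diff(x, y) for i, x in enumerate(a) for j, y in enumerate(b)}
```

In this sketch the parallel result is identical to the plain nested-loop result, which is the property the hidden parallelism is meant to preserve.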

Embodiment 2

[0077] A method for processing data utilizing a distributed computing system as described in Embodiment 1, comprising steps as follows:

[0078] (1) the user analyzes the specific computing problem;

[0079] (2) the user uses the programming interface provided by the distributed computing system of the present invention to implement four independent computing modules: the data reading module, the data preprocessing module, the data comparison module and the data output module. Their specific processing methods comprise steps (a)-(d):

[0080] (a) Data read-in stage: the sub-data sets required for sub-task execution are read in from the distributed file system, and each input file in the data set is stored in the distributed computing system of the present invention in the form of index A and initial content;

[0081] (b) Data preprocessing stage: the data read in in step (a) is preprocessed according to the user-defined processing method,...
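The four user-defined modules can be pictured as four plain functions plugged into a job object. The sketch below is hypothetical: the class name `CrossComparisonJob`, its method `run_serial`, the in-memory dictionary standing in for the distributed file system, and the choice to compare one data set against itself are all assumptions, not the interface described in the patent.

```python
class CrossComparisonJob:
    """Hypothetical container for the four user-defined modules."""

    def __init__(self, read, preprocess, compare, output):
        self.read = read              # data reading module
        self.preprocess = preprocess  # data preprocessing module
        self.compare = compare        # data comparison module
        self.output = output          # data output module

    def run_serial(self, source):
        # (a) read-in: each file is kept as index -> initial content
        raw = self.read(source)
        # (b) preprocessing: user-defined transformation of each item
        prepared = {index: self.preprocess(content) for index, content in raw.items()}
        # comparison: every item against every item (self-pairs included here)
        results = {
            (i, j): self.compare(a, b)
            for i, a in prepared.items()
            for j, b in prepared.items()
        }
        # output: user-defined formatting of the results
        return self.output(results)


def read_files(source):
    return dict(source)  # stand-in for reading from a distributed file system


def tokenize(text):
    return set(text.lower().split())


def jaccard(a, b):
    return len(a & b) / len(a | b)


def to_rows(results):
    return sorted((i, j, round(score, 2)) for (i, j), score in results.items())


files = {"f1": "Red green blue", "f2": "green blue", "f3": "yellow"}
job = CrossComparisonJob(read_files, tokenize, jaccard, to_rows)
rows = job.run_serial(files)
```

Each function is ordinary serial code; a real system would decide where and how often each one runs.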



Abstract

The invention relates to a distributed computing system for cross-comparison of large-scale data sets. The system comprises a cross-comparison programming model, a master node, a programming interface and a back-end distributed processing framework based on a heterogeneous distributed cluster. It aims to use a distributed computing environment to efficiently process computations that fit the data set cross-comparison pattern. An intuitive cross-comparison programming model is provided to help the user abstract and simplify the computation to be processed, so that a variety of cross-comparison computing problems can be supported uniformly. A simple programming interface is provided to help the user develop a serial cross-comparison program, so the user does not need to master parallel programming knowledge. The system hides the implementation details of parallel computing, so the user does not need to understand the internal structure of the system, which makes the system easier to use. In addition, the cross-comparison programming model and interface are hardware-independent, so the system can be implemented conveniently in different distributed cluster environments.

Description

Technical field

[0001] The invention relates to a distributed computing system and method for cross-comparison of large-scale data sets, and belongs to the technical field of distributed computing.

Technical background

[0002] The cross-comparison problem is also known as the cross join of data sets or the Cartesian product problem: the computation space comprises all possible combinations of the elements of the two data sets. The problem of cross-comparing all elements in a data set arises widely in fields such as biological computing, data mining and pattern recognition. For example, in biometrics, face recognition requires comparing the massive face samples in a database one by one to obtain their similarity; in bioinformatics, in-depth analysis of the evolutionary characteristics of species requires comparing massive DNA, RNA and other gene sequences one by one to obtain the gene sequ...
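As a small illustration of the Cartesian product computation space, the sketch below compares every element of one set with every element of another. The helper names `cross_compare` and `hamming`, the sample sequences, and the use of Hamming distance as the comparison are illustrative assumptions only.

```python
from itertools import product


def cross_compare(set_a, set_b, compare):
    """Apply compare to every pair (a, b) in the Cartesian product of the two sets."""
    return {
        (i, j): compare(a, b)
        for (i, a), (j, b) in product(enumerate(set_a), enumerate(set_b))
    }


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


seqs_a = ["ACGT", "AGGT"]
seqs_b = ["ACGA", "TCGT", "AGGT"]
scores = cross_compare(seqs_a, seqs_b, hamming)
```

The number of comparisons is `len(seqs_a) * len(seqs_b)`, so the work grows with the product of the two set sizes, which is why large data sets call for distributed execution.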


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F9/44
CPC: G06F9/448, G06F9/5083, G06F16/182
Inventor: 张一凡
Owner: 张一凡