Distributed computation system and method for large-scale data set cross comparison

The technology applies distributed computing to large-scale data sets. It addresses problems such as the difficulty of developing parallel programs, and it makes the system easier to implement and easier to use.

Status: Inactive. Publication Date: 2014-07-23

张一凡

Cites: 5. Cited by: 4.


Problems solved by technology

[0013] In addition, developing parallel programs with the current solutions is difficult. Users need to understand the internal implementation of the computing platform, and each system is designed to solve only one specific computing problem.


Examples


Embodiment 1

[0062] As shown in Figures 3-5.

[0063] A distributed computing system for cross-comparison of large-scale data sets comprises a cross-comparison programming model, a master node, a programming interface, and a back-end distributed processing framework based on heterogeneous distributed clusters. The system aims to use a distributed computing environment to efficiently process computations that follow the data set cross-comparison pattern. By providing an intuitive cross-comparison programming model, the invention helps users abstract and simplify the computation to be processed, and gives unified support for a variety of cross-comparison problems. It provides a concise programming interface that helps users develop serial cross-comparison programs, so users do not need to master parallel programming. The system hides the implementation details of parallel computing, and users do not need to ...
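As a rough illustration of the idea in [0063], the sketch below lets a user write only a serial pairwise comparison function, while a small runner decides how the pairs are executed. All names (`cross_compare`, `hamming`, `workers`) are hypothetical and not taken from the patent, and a local thread pool merely stands in for the heterogeneous cluster back end.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def hamming(x: str, y: str) -> int:
    """User-written serial comparison: count positions where x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def cross_compare(left, right, compare, workers=1):
    """Apply `compare` to every (left, right) pair.

    Hypothetical runner (an assumption, not the patent's framework): with
    workers == 1 it runs serially; otherwise it fans the same user function
    out over a local thread pool. The user's code is the same either way.
    """
    pairs = list(product(range(len(left)), range(len(right))))

    def run(pair):
        i, j = pair
        return (i, j, compare(left[i], right[j]))

    if workers == 1:
        return [run(p) for p in pairs]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, pairs))  # map preserves input order


set_a = ["ACGT", "AAAA"]
set_b = ["ACGA", "TTTT"]
serial = cross_compare(set_a, set_b, hamming)
parallel = cross_compare(set_a, set_b, hamming, workers=4)
```

Both calls return the same list of `(i, j, score)` triples, which is the property the programming model relies on: the comparison logic is written once, serially, and the execution strategy is chosen elsewhere.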

Embodiment 2

[0077] A method for processing data using the distributed computing system described in Embodiment 1, comprising the following steps:

[0078] (1) The user analyzes the specific computing problem;

[0079] (2) The user uses the programming interface provided by the distributed computing system of the present invention to implement four independent computing modules: the data reading module, the data preprocessing module, the data comparison module, and the data output module. Their specific processing methods comprise steps (a)-(d):

[0080] (a) Data read-in stage: the sub-data sets required for sub-task execution are read in from the distributed file system, and each input file in the data set is stored in the distributed computing system of the present invention in the form of an index A and its initial content;

[0081] (b) Data preprocessing stage: the data read in during step (a) is preprocessed according to the user-defined processing method,...
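The four modules named in [0079] can be pictured as four small user-supplied functions chained by the system. The following is only a minimal sketch under stated assumptions: the function names, the in-memory dictionaries standing in for the distributed file system, and the word-set and Jaccard logic are all hypothetical, and the behavior of the comparison and output stages is assumed because the source text is cut off after step (b).

```python
from itertools import product
from typing import Dict, Iterable, List, Set, Tuple


def read_data(files: Dict[str, str]) -> List[Tuple[str, str]]:
    """(a) Read-in: keep each input as (index, initial content)."""
    return sorted(files.items())


def preprocess(record: Tuple[str, str]) -> Tuple[str, Set[str]]:
    """(b) Preprocessing: user-defined; here, a lowercase word set."""
    index, content = record
    return index, set(content.lower().split())


def compare(a: Tuple[str, Set[str]], b: Tuple[str, Set[str]]) -> Tuple[str, str, float]:
    """Comparison module (assumed behavior): Jaccard similarity of word sets."""
    (index_a, words_a), (index_b, words_b) = a, b
    union = words_a | words_b
    score = len(words_a & words_b) / len(union) if union else 1.0
    return index_a, index_b, score


def write_output(results: Iterable[Tuple[str, str, float]]) -> List[str]:
    """Output module (assumed behavior): one tab-separated line per pair."""
    return [f"{ia}\t{ib}\t{score:.2f}" for ia, ib, score in results]


def run_job(files_a: Dict[str, str], files_b: Dict[str, str]) -> List[str]:
    """Chain the four modules over every pair drawn from the two data sets."""
    left = [preprocess(r) for r in read_data(files_a)]
    right = [preprocess(r) for r in read_data(files_b)]
    return write_output(compare(a, b) for a, b in product(left, right))


lines = run_job(
    {"a1": "red green blue"},
    {"b1": "green blue", "b2": "yellow"},
)
```

Keeping the modules independent means a different problem (for example, sequence alignment instead of word overlap) would only replace `preprocess` and `compare`, while `run_job` stays unchanged.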


Abstract

The invention relates to a distributed computing system for cross-comparison of large-scale data sets. The system comprises a cross-comparison programming model, a master node, a programming interface, and a back-end distributed processing framework based on a heterogeneous distributed cluster. It aims to use a distributed computing environment to efficiently process computations that follow the data set cross-comparison pattern. An intuitive cross-comparison programming model is provided to help the user abstract and simplify the computation to be processed, so that a variety of cross-comparison problems can be supported in a uniform way. A simple programming interface is provided to help the user develop a serial cross-comparison program, so the user does not need to master parallel programming. The system hides the implementation details of parallel computing, so the user does not need to understand the internal structure of the system, which makes the system easier to use. In addition, the cross-comparison programming model and interface are hardware-independent, so the system can be implemented conveniently in different distributed cluster environments.

Description

Technical field

[0001] The invention relates to a distributed computing system and method for cross-comparison of large-scale data sets, and belongs to the technical field of distributed computing.

Technical background

[0002] The cross-comparison problem is also known as the cross join of data sets or the Cartesian product problem. Its computation space includes every possible combination of elements from the two data sets. The problem of cross-comparing all elements of a data set arises widely in fields such as biological computing, data mining, and pattern recognition. For example, in biometrics, face recognition requires comparing the massive number of face samples in a database one by one to obtain their similarity; in bioinformatics, analyzing the evolutionary characteristics of species in depth requires comparing massive numbers of DNA, RNA, and other gene sequences one by one. To obtain the gene sequ...
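To make the size of this computation space concrete: two data sets with n_a and n_b elements produce n_a × n_b comparisons. One common way to spread such a grid over workers is to cut it into rectangular blocks, each of which can be handed out as a sub-task. The sketch below illustrates that general idea only; `tile_pairs` and the block size are hypothetical, and the source does not say that the patented system partitions work this way.

```python
def tile_pairs(n_a: int, n_b: int, block: int):
    """Split an n_a x n_b comparison grid into blocks of at most block x block.

    Each tile is ((a_start, a_stop), (b_start, b_stop)) with half-open ranges.
    This is an assumed partitioning scheme, shown for illustration only.
    """
    tiles = []
    for i in range(0, n_a, block):
        for j in range(0, n_b, block):
            tiles.append(((i, min(i + block, n_a)), (j, min(j + block, n_b))))
    return tiles


tiles = tile_pairs(n_a=5, n_b=4, block=2)
# Total number of pairwise comparisons covered by all tiles.
covered = sum((a1 - a0) * (b1 - b0) for (a0, a1), (b0, b1) in tiles)
```

Because the tiles cover every pair exactly once, `covered` equals the full Cartesian product size, and each tile needs only its own two slices of the input data.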


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F17/30, G06F9/44

CPC: G06F9/448, G06F9/5083, G06F16/182

Inventor: 张一凡

Owner: 张一凡
