Top-k arrangement query method based on metric space in distributed environment

Applying metric-space techniques to querying in a distributed environment addresses problems such as data redundancy, inapplicability to large data sets, and performance bottlenecks, with the effect of reducing comparison operations and speeding up queries.

Active Publication Date: 2016-10-26
SOUTHEAST UNIV

AI Technical Summary

Problems solved by technology

Tiakas E. et al. first proposed this concept, but studied it only in the traditional stand-alone (single-machine) setting. As data sets grow rapidly, stand-alone algorithms run into performance bottlenecks, and the M-tree index storage structure used by Tiakas et al. is unsuitable for large data sets because it leads to a large amount of data redundancy. Studying a parallel top-k dominance algorithm based on metric space is therefore urgent.


Examples

Embodiment 1

[0066] This embodiment was carried out on a 7-node Spark distributed cluster. Spark is built on Hadoop and uses Hadoop's YARN resource manager and the HDFS file storage system. Of the 7 nodes, the master node serves as both the driver node and a worker node, and the remaining 6 nodes are worker nodes only. All algorithms are written in Scala, and the basic configuration is shown in Table 2:

[0067] Table 2 Experimental environment configuration

[0068]

[0069] As shown in Figures 2 to 5, the experiments evaluate the three algorithms DSDA, DKDA, and DAKDA in the following respects: the effect of the number of partitions num on query time (used to select a reasonable number of partitions), the effect of the number of returned results k on the query, the effect of the size of the query input set Q on query time, a comparison of the candidate sets of each algorithm, and the scalability of the algorithms. The default settings of the parameters in the experiment are sho...

Abstract

The invention discloses a top-k dominance query method based on a metric space in a distributed environment. The method comprises the following steps in order: (1) given a query input set Q and a distance function d() in the metric space, d() is used to measure the distance between each data object and the query objects in Q; and (2) building on step (1), a parallel algorithm based on a set ANN and the k-skyband is provided. The method makes full use of parallel computation across the nodes of the distributed environment. Through pruning and sorting, it greatly improves the performance of metric-space top-k dominance queries on large data sets, speeds up queries, and supports user decision-making.
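The k-skyband pruning mentioned in the abstract can be illustrated with a minimal single-machine sketch. It is not the patented parallel algorithm. It assumes the usual metric-space notion of dominance (an object dominates another if it is no farther from every query object and strictly closer to at least one) and uses Euclidean distance as a stand-in for d(). The function names `dist_vector`, `dominates`, and `k_skyband` are hypothetical.

```python
from math import dist  # Euclidean distance; an assumed stand-in for the metric d()


def dist_vector(obj, queries, d=dist):
    """Map an object to its vector of distances to every query object in Q."""
    return tuple(d(obj, q) for q in queries)


def dominates(va, vb):
    """Assumed dominance test on distance vectors:
    va is no larger in every component and differs in at least one."""
    return all(x <= y for x, y in zip(va, vb)) and va != vb


def k_skyband(objects, queries, k, d=dist):
    """Keep the objects that are dominated by fewer than k other objects."""
    vecs = [dist_vector(o, queries, d) for o in objects]
    band = []
    for i, vi in enumerate(vecs):
        dominated_by = sum(
            dominates(vj, vi) for j, vj in enumerate(vecs) if j != i
        )
        if dominated_by < k:
            band.append(objects[i])
    return band


objects = [(0, 0), (1, 1), (2, 2), (5, 5)]
queries = [(0, 0), (0, 1)]
candidates = k_skyband(objects, queries, k=2)
```

Under this definition, any object dominated by k or more others has at least k objects with strictly higher dominance scores, so the top-k answer only needs to be searched within the k-skyband. This is why the band works as a candidate filter.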

Description

Technical field

[0001] The invention relates to a query method, in particular to a parallel top-k dominance query method based on metric space for massive data sets in a distributed environment.

Background technique

[0002] As an important complex query, the top-k dominance query based on metric space is receiving more and more attention. It returns the part of a massive multi-dimensional data set that meets the user's needs. This type of query supports user decision-making and has a wide range of applications in web search, multimedia retrieval, e-commerce, and other fields. The query does not require the user to supply an evaluation function, and the size of the result set is controllable. It calculates the dominance score of each object and returns the k objects with the highest dominance scores.

[0003] A top-k dominance query based on a metric space is defined as follows: let O = {o_1, o_2, ..., o_n} represent the collection of all data obj...
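The definition above is truncated in this excerpt, so the following is only a brute-force sketch of how a dominance score and a top-k result could be computed. It assumes the commonly used definition: o dominates o' with respect to Q if d(o, q) <= d(o', q) for every q in Q, with strict inequality for at least one q, and the dominance score of o is the number of objects it dominates. Euclidean distance stands in for d(), and the names `dominates` and `top_k_dominating` are hypothetical.

```python
from math import dist  # Euclidean distance; an assumed stand-in for the metric d()


def dominates(a, b, queries, d=dist):
    """Assumed definition: a is no farther than b from every query object
    and strictly closer to at least one of them."""
    da = [d(a, q) for q in queries]
    db = [d(b, q) for q in queries]
    no_worse = all(x <= y for x, y in zip(da, db))
    strictly_better = any(x < y for x, y in zip(da, db))
    return no_worse and strictly_better


def top_k_dominating(objects, queries, k, d=dist):
    """Score every object by how many other objects it dominates (O(n^2)),
    then return the k highest-scoring (score, object) pairs."""
    scored = []
    for o in objects:
        score = sum(dominates(o, p, queries, d) for p in objects if p is not o)
        scored.append((score, o))
    # Stable sort: objects with equal scores keep their input order.
    scored.sort(key=lambda pair: -pair[0])
    return scored[:k]


objects = [(0, 0), (1, 1), (2, 2), (5, 5)]
queries = [(0, 0), (0, 1)]
result = top_k_dominating(objects, queries, k=2)
```

The quadratic number of comparisons in this naive version is the cost that pruning and partitioned parallel execution aim to reduce.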


Application Information

IPC(8): G06F17/30
CPC: G06F16/24532
Inventors: 何洁月, 罗浩
Owner: SOUTHEAST UNIV