Data space multi-dimension indexing method based on load balance and query log

A load-balancing data space indexing technology, applied in electrical digital data processing, special data processing applications, instruments, etc. It addresses the high cost of hard disk I/O, the inability to efficiently support large-scale data query processing, and the inability to load large index maps into memory, with the effect of minimizing communication overhead.

Inactive Publication Date: 2016-11-09

HARBIN ENG UNIV

Citations: 6; cited by: 12

AI Technical Summary

Problems solved by technology

However, none of these existing methods can efficiently support large-scale data query processing.

This is because, in large-scale data query processing, the hard disk I/O overhead far exceeds the cost saved by searching, or the memory cannot hold a huge index map.

Examples

Specific Embodiment 1

[0068] Specific embodiment 1: As shown in Figure 1, this embodiment describes in detail how the load-balancing and query-log-based data space multidimensional indexing method is implemented, as follows:

[0069] 1. To extend the inverted index to the data space, each attribute label and its attribute value are combined and encoded as a token word:

[0070] Definition (Token). For an attribute-value pair $(a, v)$, the corresponding token is defined as $t = v//a$.
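A minimal sketch of the token definition, assuming attribute labels and values are plain strings and that the separator is literally the two characters `//` as written in $t = v//a$. The function names `make_token` and `split_token` are hypothetical, chosen for illustration only.

```python
def make_token(attribute: str, value: str) -> str:
    """Encode an attribute-value pair (a, v) as the token t = v//a."""
    return f"{value}//{attribute}"


def split_token(token: str) -> tuple[str, str]:
    """Recover (a, v) from a token.

    Splits at the LAST '//' so that values containing '//' (e.g. URLs)
    still decode correctly, assuming attribute labels never contain '//'.
    """
    value, _, attribute = token.rpartition("//")
    return attribute, value


if __name__ == "__main__":
    t = make_token("venue", "VLDB")
    print(t)               # VLDB//venue
    print(split_token(t))  # ('venue', 'VLDB')
```

Splitting from the right is a design choice of this sketch: values are open-ended text, while attribute labels come from a small schema-like vocabulary and are easier to constrain.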

[0071] In essence, an entity usually consists of a set of attribute-value pairs (the content itself can also be treated as an attribute-value pair). In other words, an entity is a vector of tokens $(t_1, t_2, \ldots, t_{|D|})$, where $D$ denotes the set of all distinct token identifiers in the data space.

[0072] Definition (Entity vector). An entity vector is defined as $o = (w_1, w_2, \ldots, w_{|D|})$, where $w_i$ denotes the weight of token $t_i$.
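The entity-vector definition can be sketched as a sparse mapping from token to weight. The source does not say how $w_i$ is computed, so this sketch assumes, hypothetically, that $w_i$ is the number of times token $t_i$ occurs in the entity (0 when absent). It also builds one inverted list per token to show how the vectors feed an inverted index. The names `entity_vector`, `dense`, and `build_inverted_index` are illustrative, not taken from the patent.

```python
from collections import Counter, defaultdict


def make_token(attribute: str, value: str) -> str:
    # Token t = v//a, as in the token definition.
    return f"{value}//{attribute}"


def entity_vector(pairs: list[tuple[str, str]]) -> dict[str, int]:
    """Return the non-zero entries of o = (w_1, ..., w_|D|).

    Assumption: w_i is the occurrence count of token t_i in the entity.
    Tokens absent from the entity have weight 0 and are not stored.
    """
    return dict(Counter(make_token(a, v) for a, v in pairs))


def dense(vec: dict[str, int], dictionary: list[str]) -> list[int]:
    # Expand a sparse vector over the ordered token dictionary D.
    return [vec.get(t, 0) for t in dictionary]


def build_inverted_index(entities: dict[str, dict[str, int]]) -> dict[str, list[str]]:
    # Map each token to the sorted list of entity ids whose vector contains it.
    index: dict[str, list[str]] = defaultdict(list)
    for eid in sorted(entities):
        for token in entities[eid]:
            index[token].append(eid)
    return dict(index)


if __name__ == "__main__":
    # Content words are treated as ("content", word) pairs, as the text allows.
    e1 = entity_vector([("title", "XML"), ("content", "index"), ("content", "index")])
    e2 = entity_vector([("title", "DBLP"), ("content", "index")])
    D = sorted(set(e1) | set(e2))
    print(D)             # ['DBLP//title', 'XML//title', 'index//content']
    print(dense(e1, D))  # [0, 1, 2]
    print(build_inverted_index({"o1": e1, "o2": e2}))
```

Storing only non-zero weights matters in a data space, which the description characterizes as sparse: $|D|$ is large while each entity uses few tokens.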

[0073] The partition-based data...

Abstract

The invention provides a data space multidimensional indexing method based on load balancing and query logs, and relates to the technical field of data space indexing. The method distributes inverted indexes across different index nodes so that the nodes stay load-balanced, minimizes the communication overhead involved in query processing, and reduces the search space. In vertical partitioning, the token words to be indexed are grouped using the query log and the words that occur frequently in entities, and the access pattern between user queries and inverted lists is represented as a hypergraph. In horizontal partitioning, the access pattern between user queries and entities is likewise modeled as a hypergraph, which reduces horizontal partitioning to a hypergraph partitioning problem; this keeps the load balanced across index nodes and reduces the communication overhead of queries. Combining the vertical and horizontal partitioning strategies yields a two-dimensional hybrid index, which is then extended to a three-dimensional index. Experiments on the public DBLP data set show that the method outperforms existing methods in throughput, query response time, and scalability.
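To make the hypergraph formulation concrete, here is a toy sketch of the vertical-partitioning side under stated assumptions: each logged query becomes a hyperedge over the tokens (inverted lists) it accesses; each vertex is weighted by a hypothetical inverted-list length as a proxy for load; and communication cost is measured with the common connectivity metric, the sum over queries of (number of index nodes touched minus 1). The greedy placement is a simple stand-in heuristic, not the patent's partitioning algorithm; real systems would typically use a multilevel hypergraph partitioner such as hMETIS or PaToH. The same model would apply to horizontal partitioning by taking entities instead of tokens as vertices. All names and data are hypothetical.

```python
from collections import defaultdict

Hyperedge = frozenset[str]


def build_hypergraph(query_log: list[list[str]]) -> list[Hyperedge]:
    # One hyperedge per logged query (the set of tokens it accesses);
    # repeated queries add repeated edges, weighting frequent patterns.
    return [frozenset(q) for q in query_log if q]


def connectivity_cost(edges: list[Hyperedge], part: dict[str, int]) -> int:
    # Sum over hyperedges of (lambda - 1), lambda = distinct nodes spanned.
    return sum(len({part[v] for v in e}) - 1 for e in edges)


def node_loads(weights: dict[str, int], part: dict[str, int], k: int) -> list[int]:
    # Total vertex weight placed on each of the k index nodes.
    load = [0] * k
    for v, p in part.items():
        load[p] += weights[v]
    return load


def greedy_partition(weights: dict[str, int], edges: list[Hyperedge],
                     k: int, epsilon: float = 0.1) -> dict[str, int]:
    """Place vertices heaviest first on the node holding the most of their
    already-placed hyperedge neighbours, among nodes whose load stays within
    (1 + epsilon) * average load; ties go to the lighter node."""
    capacity = (1 + epsilon) * sum(weights.values()) / k
    incident: dict[str, list[Hyperedge]] = defaultdict(list)
    for e in edges:
        for v in e:
            incident[v].append(e)
    load = [0] * k
    part: dict[str, int] = {}
    for v in sorted(weights, key=lambda x: (-weights[x], x)):
        affinity = [0] * k
        for e in incident[v]:
            for u in e:
                if u in part:
                    affinity[part[u]] += 1
        feasible = [p for p in range(k) if load[p] + weights[v] <= capacity]
        if not feasible:  # no node has room: fall back to the least-loaded one
            feasible = [min(range(k), key=lambda p: load[p])]
        best = max(feasible, key=lambda p: (affinity[p], -load[p]))
        part[v] = best
        load[best] += weights[v]
    return part


def round_robin(weights: dict[str, int], k: int) -> dict[str, int]:
    # Naive baseline that ignores the query log entirely.
    return {v: i % k for i, v in enumerate(sorted(weights))}


if __name__ == "__main__":
    weights = {"XML//title": 4, "index//content": 6, "VLDB//venue": 3,
               "Wang//author": 2, "DBLP//title": 5}
    log = [["XML//title", "index//content"], ["XML//title", "index//content"],
           ["VLDB//venue", "Wang//author"], ["DBLP//title", "Wang//author"]]
    edges = build_hypergraph(log)
    for name, part in [("greedy", greedy_partition(weights, edges, 2)),
                       ("round-robin", round_robin(weights, 2))]:
        print(name, connectivity_cost(edges, part), node_loads(weights, part, 2))
    # greedy 0 [10, 10]
    # round-robin 3 [13, 7]
```

On this toy log, the query-aware placement keeps every query on a single node with equal loads, while the log-oblivious baseline splits three query accesses across nodes and is unbalanced. This only illustrates the objective; it says nothing about the patent's reported DBLP results.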

Description

Technical field

[0001] The invention relates to a data space multidimensional indexing method and belongs to the technical field of data space indexing.

Background art

[0002] With the rapid development of big data and Internet technology, data space scenarios have become increasingly common, especially on the Web and in personal information management systems such as Wikipedia, Google Base, and Linked Data. Unlike traditional relational databases, which mainly focus on specific domains with a fixed number of attributes, data spaces are heterogeneous, sparse, large-scale, and interrelated. Providing users with efficient data space query services is therefore of great significance. Indexing is one of the most important means of improving query processing efficiency, so research into efficient data space indexing techniques is likewise important.

[0003] At present, the research on data space index technolog...

Application Information

IPC (8th edition): G06F17/30

CPC: G06F16/2264

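Inventors: 王红滨 (Wang Hongbin), 王念滨 (Wang Nianbin), 周连科 (Zhou Lianke), 祝官文 (Zhu Guanwen), 王瑛琦 (Wang Yingqi), 何鸣 (He Ming), 宋奎勇 (Song Kuiyong)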

Owner: HARBIN ENG UNIV
