Perform hash joins using parallel processing

A technique for hashing and joining data, applied in the fields of electrical digital data processing, special-purpose data processing applications, and digital data processing components, that addresses the performance problems, unfavorable memory access patterns, and slow execution of hash joins.

Active Publication Date: 2022-03-25
ALTERYX INC

AI Technical Summary

Problems solved by technology

However, hash join algorithms can have unfavorable memory access patterns (e.g., random disk access) and can also be slow to execute.
As a result, existing data processing systems suffer from performance problems when processing join algorithms.

Embodiment Construction

[0015] There is interest in obtaining data relating to business functions (e.g., customer engagement, process performance, and strategic decision making). Businesses can then use advanced data analytics techniques such as text analytics, machine learning, predictive analytics, data mining, and statistics to further analyze the collected data. Likewise, with the development of electronic commerce (e-commerce) and the integration of personal computing devices and communication networks (e.g., the Internet) into the exchange of goods, services, and information between businesses and customers, large amounts of business-related data are being transmitted and stored electronically. Large amounts of information that may be important to a business (e.g., financial transactions, customer profiles, etc.) can be accessed and obtained from multiple data sources using web-based communications. Due to differences in data sources and large volumes of electronic data that may include i...

Abstract

Data records are joined using a computer. Data records in a first plurality of data records and a second plurality of data records are hashed. Based on the hashes, the data records in the first and second pluralities are assigned to first groupings and second groupings, respectively. Associated pairs of groupings, one from the first groupings and one from the second groupings, are provided to threads executing on a computer processor, with different pairs provided to different threads. The threads operate on the pairs of groupings in parallel to determine whether to join the records in the groupings. A thread joins the two data records under consideration if the hashes associated with those records match. The joined data records are output.
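
The flow described in the abstract can be sketched in Python roughly as follows. This is an illustrative sketch only, not the implementation disclosed in the patent: the function names (`partition`, `join_group_pair`, `parallel_hash_join`), the use of Python's built-in `hash`, the modulo-based grouping, and the default group and thread counts are all assumptions. The sketch also compares the key values after a hash match to rule out collisions, a step the abstract does not mention.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, key, num_groups):
    """Hash each record's join key and assign the record to a group.

    Assumption: the group index is hash modulo num_groups.
    """
    groups = [[] for _ in range(num_groups)]
    for rec in records:
        value = rec[key]
        h = hash(value)
        groups[h % num_groups].append((h, value, rec))
    return groups


def join_group_pair(pair):
    """Join one pair of associated groups (same group index on both sides)."""
    left_group, right_group = pair
    # Index the left group by hash so lookups are constant time on average.
    table = {}
    for h, value, rec in left_group:
        table.setdefault(h, []).append((value, rec))
    joined = []
    for h, value, rec in right_group:
        for left_value, left_rec in table.get(h, []):
            # Hashes match; the equality check (an added assumption)
            # guards against two different keys sharing a hash.
            if left_value == value:
                joined.append({**left_rec, **rec})
    return joined


def parallel_hash_join(left, right, key, num_groups=4, num_threads=4):
    """Partition both inputs, then hand each group pair to a worker thread."""
    left_groups = partition(left, key, num_groups)
    right_groups = partition(right, key, num_groups)
    pairs = zip(left_groups, right_groups)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = pool.map(join_group_pair, pairs)
        return [row for group_result in results for row in group_result]


if __name__ == "__main__":
    # Hypothetical sample data for illustration.
    customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
    orders = [{"id": 1, "total": 10}, {"id": 3, "total": 7}]
    print(parallel_hash_join(customers, orders, "id"))
```

Because records with equal keys always hash to the same group index, each pair of groups can be joined without consulting any other pair, which is what lets the pairs be processed independently. In CPython the global interpreter lock limits CPU parallelism between threads, so this sketch shows the structure of the approach rather than a speedup.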

Description

Technical field

[0001] This specification relates generally to data processing techniques, and more specifically, to performing hash joins in a manner optimized for parallel processing computer systems (e.g., multi-core processors).

Background

[0002] The development of data analytics platforms (e.g., Big Data Analytics) has expanded data processing into tools for processing large amounts of data to extract information of commercial value. To this end, efficient data processing techniques are required to access, process, and analyze large data sets from different data sources. For example, a small business may utilize a third-party data analysis environment that employs dedicated computing and human resources to collect, process, and analyze data from various sources (e.g., external data providers, internal data sources (e.g., files), large data storage units, and cloud-based data (e.g., social media information)). Processing large data sets, such as those used in dat...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F7/00, G06F12/02, G06F16/22
CPC: G06F16/2453, G06F16/2456, G06F16/2255, G06F16/24532, G06F16/137, G06F16/285, G06F9/5066
Inventor: E. P. Harding, A. D. Riley, C. H. Kingsley, S. Wiesner
Owner: ALTERYX INC