TY - JOUR
T1 - Heterogeneous CPU-GPU Epsilon Grid Joins
T2 - Static and Dynamic Work Partitioning Strategies
AU - Gallet, Benoit
AU - Gowanlock, Michael
N1 - Funding Information:
This material is based upon work supported by the National Science Foundation under Grant No. 1849559. We thank Frédéric Loulergue for letting us use his platform to conduct our experiments.
Publisher Copyright:
© 2020, The Author(s).
PY - 2021/3
Y1 - 2021/3
N2 - Given two datasets (or tables) A and B and a search distance ϵ, the distance similarity join, denoted as A⋉ ϵB, finds the pairs of points (pa, pb), where pa∈ A and pb∈ B, and such that the distance between pa and pb is ≤ ϵ. If A= B, then the similarity join is equivalent to a similarity self-join, denoted as A⋈ ϵA. We propose in this paper Heterogeneous Epsilon Grid Joins (HEGJoin), a heterogeneous CPU-GPU distance similarity join algorithm. Efficiently partitioning the work between the CPU and the GPU is a challenge. Indeed, the work partitioning strategy needs to consider the different characteristics and computational throughput of the processors (CPU and GPU), as well as the data-dependent nature of the similarity join that accounts in the overall execution time (e.g., the number of queries, their distribution, the dimensionality, etc.). In addition to HEGJoin, we design in this paper a dynamic and two static work partitioning strategies. We also propose a performance model for each static partitioning strategy to perform the distribution of the work between the processors. We evaluate the performance of all three partitioning methods by considering the execution time and the load imbalance between the CPU and GPU as performance metrics. HEGJoin achieves a speedup of up to 5.46 × (3.97 ×) over the GPU-only (CPU-only) algorithms on our first test platform and up to 1.97 × (12.07 ×) on our second test platform over the GPU-only (CPU-only) algorithms.
AB - Given two datasets (or tables) A and B and a search distance ϵ, the distance similarity join, denoted as A⋉ ϵB, finds the pairs of points (pa, pb), where pa∈ A and pb∈ B, and such that the distance between pa and pb is ≤ ϵ. If A= B, then the similarity join is equivalent to a similarity self-join, denoted as A⋈ ϵA. We propose in this paper Heterogeneous Epsilon Grid Joins (HEGJoin), a heterogeneous CPU-GPU distance similarity join algorithm. Efficiently partitioning the work between the CPU and the GPU is a challenge. Indeed, the work partitioning strategy needs to consider the different characteristics and computational throughput of the processors (CPU and GPU), as well as the data-dependent nature of the similarity join that accounts in the overall execution time (e.g., the number of queries, their distribution, the dimensionality, etc.). In addition to HEGJoin, we design in this paper a dynamic and two static work partitioning strategies. We also propose a performance model for each static partitioning strategy to perform the distribution of the work between the processors. We evaluate the performance of all three partitioning methods by considering the execution time and the load imbalance between the CPU and GPU as performance metrics. HEGJoin achieves a speedup of up to 5.46 × (3.97 ×) over the GPU-only (CPU-only) algorithms on our first test platform and up to 1.97 × (12.07 ×) on our second test platform over the GPU-only (CPU-only) algorithms.
KW - HEGJoin
KW - Heterogeneous CPU-GPU computing
KW - Range query
KW - Similarity join
KW - Super-EGO
KW - Work partitioning
UR - http://www.scopus.com/inward/record.url?scp=85093100626&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85093100626&partnerID=8YFLogxK
U2 - 10.1007/s41019-020-00145-x
DO - 10.1007/s41019-020-00145-x
M3 - Article
AN - SCOPUS:85093100626
SN - 2364-1185
VL - 6
SP - 39
EP - 62
JO - Data Science and Engineering
JF - Data Science and Engineering
IS - 1
ER -