TY - GEN
T1 - Load imbalance mitigation optimizations for GPU-accelerated similarity joins
AU - Gallet, Benoit
AU - Gowanlock, Michael
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/5
Y1 - 2019/5
N2 - The distance similarity self-join is widely used in database applications and is defined as joining a table on itself using a distance predicate. The similarity self-join is often used in spatial applications and is a building block of other algorithms, such as those used for data analysis. In this paper, we propose several new optimizations mitigating load imbalance of a GPU-accelerated self-join algorithm. The data-dependent nature of the self-join makes the algorithm potentially unsuitable for the GPU's architecture, due to variance in workloads assigned to threads. Consequently, we propose a method that reduces load imbalance and subsequent thread divergence between threads executing in a warp by considering the total workload assigned to each thread, and forcing the GPU's hardware scheduler to group threads with similar workloads within the same warp. Also, by leveraging a grid-based index, we propose a new balanced computational pattern for both reducing the number of distance calculations and the workload variance between threads. Moreover, we exploit additional parallelism by increasing the workload granularity to further improve computational throughput and workload balance within warps. Our solution achieves a speedup of up to 9.7x and 1.6x on average against another GPU algorithm, and up to 10.7x with an average of 2.5x against a CPU state-of-the-art parallel algorithm.
AB - The distance similarity self-join is widely used in database applications and is defined as joining a table on itself using a distance predicate. The similarity self-join is often used in spatial applications and is a building block of other algorithms, such as those used for data analysis. In this paper, we propose several new optimizations mitigating load imbalance of a GPU-accelerated self-join algorithm. The data-dependent nature of the self-join makes the algorithm potentially unsuitable for the GPU's architecture, due to variance in workloads assigned to threads. Consequently, we propose a method that reduces load imbalance and subsequent thread divergence between threads executing in a warp by considering the total workload assigned to each thread, and forcing the GPU's hardware scheduler to group threads with similar workloads within the same warp. Also, by leveraging a grid-based index, we propose a new balanced computational pattern for both reducing the number of distance calculations and the workload variance between threads. Moreover, we exploit additional parallelism by increasing the workload granularity to further improve computational throughput and workload balance within warps. Our solution achieves a speedup of up to 9.7x and 1.6x on average against another GPU algorithm, and up to 10.7x with an average of 2.5x against a CPU state-of-the-art parallel algorithm.
KW - GPGPU
KW - Load balancing
KW - Query Optimization
KW - Range Query
KW - Self-join
UR - http://www.scopus.com/inward/record.url?scp=85070416414&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85070416414&partnerID=8YFLogxK
U2 - 10.1109/IPDPSW.2019.00078
DO - 10.1109/IPDPSW.2019.00078
M3 - Conference contribution
AN - SCOPUS:85070416414
T3 - Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2019
SP - 396
EP - 405
BT - Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 33rd IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2019
Y2 - 20 May 2019 through 24 May 2019
ER -