TY - GEN
T1 - A study of work distribution and contention in database primitives on heterogeneous CPU/GPU architectures
AU - Gowanlock, Michael
AU - Fink, Zane
AU - Karsin, Ben
AU - Wright, Jordan
N1 - Publisher Copyright:
© 2021 ACM.
PY - 2021/3/22
Y1 - 2021/3/22
N2 - Graphics Processing Units (GPUs) provide very high on-card memory bandwidth, which can be exploited to address data-intensive workloads. To maximize algorithm throughput, it is important to concurrently utilize both the CPU and GPU to carry out database queries. We select data-intensive algorithms that are common in databases and data analytics applications, including: (i) scan; (ii) batched predecessor searches; (iii) multiway merging; and (iv) partitioning. For each algorithm, we examine the performance of parallel CPU-only, GPU-only, and hybrid CPU/GPU approaches. There are several challenges to combining the CPU and GPU for query processing, including distributing work between the architectures. We demonstrate that, despite being able to accurately split the work between the CPU and GPU, contention for memory bandwidth is a major limiting factor for hybrid CPU/GPU data-intensive algorithms. We employ performance models that allow us to explore several research questions. We find that while hybrid data-intensive algorithms may be limited by contention, they are more robust to workload characteristics and are therefore preferable to CPU-only and GPU-only approaches. We also find that hybrid algorithms achieve good performance when there is low memory contention between the CPU and GPU, such that the GPU can perform its operations without significantly reducing CPU throughput.
KW - GPGPU
KW - heterogeneous systems
KW - hybrid algorithms
KW - in-memory database
KW - memory-bound algorithms
KW - multiway merge
KW - partitioning
KW - predecessor search
KW - scan
UR - http://www.scopus.com/inward/record.url?scp=85104959032&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85104959032&partnerID=8YFLogxK
U2 - 10.1145/3412841.3441913
DO - 10.1145/3412841.3441913
M3 - Conference contribution
AN - SCOPUS:85104959032
T3 - Proceedings of the ACM Symposium on Applied Computing
SP - 311
EP - 320
BT - Proceedings of the 36th Annual ACM Symposium on Applied Computing, SAC 2021
PB - Association for Computing Machinery
T2 - 36th Annual ACM Symposium on Applied Computing, SAC 2021
Y2 - 22 March 2021 through 26 March 2021
ER -