TY - GEN
T1 - Accelerating the unacceleratable
T2 - 15th International Workshop on Data Management on New Hardware, DaMoN 2019, Held with ACM SIGMOD/PODS 2019
AU - Gowanlock, Michael
AU - Karsin, Ben
AU - Fink, Zane
AU - Wright, Jordan
N1 - Publisher Copyright:
© 2019 ACM.
PY - 2019/7/1
Y1 - 2019/7/1
N2 - Many database operations have a low compute to memory access ratio. In heterogeneous systems, where a graphics processing unit (GPU) is interconnected via PCIe, the data transfer bottleneck is perceived as insurmountable to achieving performance gains on these memory-bound database primitives. On the other hand, several compute-bound database operations have been shown to achieve significant performance gains using the GPU. This leads to CPU-only memory-bound applications having an increasingly non-negligible impact on database query throughput. In this paper we examine several of these overlooked algorithms, including (i) batched predecessor searches; (ii) multiway merging; and, (iii) partitioning. We examine the performance of parallel CPU-only, GPU-only, and hybrid CPU/GPU approaches, and show that hybrid algorithms achieve respectable performance gains. We develop a model that considers main memory accesses and PCIe data transfers, which are two major bottlenecks for hybrid CPU/GPU algorithms. The model lets us analytically determine how to distribute work between the CPU and GPU to maximize resource utilization while minimizing load imbalance. We show that our model can accurately predict the fraction of work to be sent to each architecture, and consequently, confirms that these overlooked database primitives can be accelerated despite their memory-bound nature.
AB - Many database operations have a low compute to memory access ratio. In heterogeneous systems, where a graphics processing unit (GPU) is interconnected via PCIe, the data transfer bottleneck is perceived as insurmountable to achieving performance gains on these memory-bound database primitives. On the other hand, several compute-bound database operations have been shown to achieve significant performance gains using the GPU. This leads to CPU-only memory-bound applications having an increasingly non-negligible impact on database query throughput. In this paper we examine several of these overlooked algorithms, including (i) batched predecessor searches; (ii) multiway merging; and, (iii) partitioning. We examine the performance of parallel CPU-only, GPU-only, and hybrid CPU/GPU approaches, and show that hybrid algorithms achieve respectable performance gains. We develop a model that considers main memory accesses and PCIe data transfers, which are two major bottlenecks for hybrid CPU/GPU algorithms. The model lets us analytically determine how to distribute work between the CPU and GPU to maximize resource utilization while minimizing load imbalance. We show that our model can accurately predict the fraction of work to be sent to each architecture, and consequently, confirms that these overlooked database primitives can be accelerated despite their memory-bound nature.
KW - GPGPU
KW - Heterogeneous systems
KW - Hybrid algorithms
KW - In-memory database
KW - Memory-bound algorithms
UR - http://www.scopus.com/inward/record.url?scp=85074454872&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85074454872&partnerID=8YFLogxK
U2 - 10.1145/3329785.3329926
DO - 10.1145/3329785.3329926
M3 - Conference contribution
AN - SCOPUS:85074454872
T3 - Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems
BT - 15th International Workshop on Data Management on New Hardware, DaMoN 2019
PB - Association for Computing Machinery
Y2 - 1 July 2019
ER -