TY - GEN
T1 - Sorting large datasets with heterogeneous CPU/GPU architectures
AU - Gowanlock, Michael
AU - Karsin, Ben
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/8/3
Y1 - 2018/8/3
AB - We examine heterogeneous sorting for input data that exceeds GPU global memory capacity. Applications that require significant communication between the host and GPU often need to obviate communication overheads to achieve performance gains over parallel CPU-only algorithms. We advance several optimizations to reduce the host-GPU communication bottleneck, and find that host-side bottlenecks also need to be mitigated to fully exploit heterogeneous architectures. We demonstrate this by comparing our work to end-to-end response time calculations from the literature. Our approaches mitigate several heterogeneous sorting bottlenecks, as demonstrated on single- and dual-GPU platforms. We achieve speedups up to 3.47x over the parallel reference implementation on the CPU. The current path to exascale requires heterogeneous architectures. As such, our work encourages future research in this direction for heterogeneous sorting in the multi-GPU NVLink era.
KW - GPGPU
KW - Heterogeneous architecture
KW - Sorting
UR - http://www.scopus.com/inward/record.url?scp=85052209789&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85052209789&partnerID=8YFLogxK
U2 - 10.1109/IPDPSW.2018.00095
DO - 10.1109/IPDPSW.2018.00095
M3 - Conference contribution
AN - SCOPUS:85052209789
SN - 9781538655559
T3 - Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018
SP - 560
EP - 569
BT - Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 32nd IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018
Y2 - 21 May 2018 through 25 May 2018
ER -