TY - GEN
T1 - Hybrid CPU/GPU clustering in shared memory on the billion point scale
AU - Gowanlock, Michael
N1 - Publisher Copyright: © 2019 ACM.
PY - 2019/6/26
Y1 - 2019/6/26
N2 - Many applications require clustering data using an unsupervised approach. One such clustering algorithm is Dbscan, which is inherently sequential, thus limiting parallelization opportunities. Consequently, several recent works have proposed novel shared- and distributed-memory approaches for scaling Dbscan. We propose BPS-HDbscan, a shared-memory CPU/GPU approach that clusters on the billion-point scale. The major pillars of BPS-HDbscan are as follows: (i) distance calculation avoidance in dense data regions; (ii) efficient merging of subclusters; (iii) obviating limited GPU memory capacity by both batching the result set and partitioning the input dataset; and (iv) computing data partitions in parallel, which effectively exploits both CPU and GPU resources. BPS-HDbscan is highly efficient and, to our knowledge, is the first shared-memory Dbscan algorithm to cluster on the billion-point scale.
KW - DBSCAN
KW - GPU
KW - Heterogeneous computing
KW - Parallel clustering
UR - http://www.scopus.com/inward/record.url?scp=85074530577&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85074530577&partnerID=8YFLogxK
U2 - 10.1145/3330345.3330349
DO - 10.1145/3330345.3330349
M3 - Conference contribution
AN - SCOPUS:85074530577
T3 - Proceedings of the International Conference on Supercomputing
SP - 35
EP - 45
BT - ICS 2019 - International Conference on Supercomputing
PB - Association for Computing Machinery
T2 - 33rd ACM International Conference on Supercomputing, ICS 2019, held in conjunction with the Federated Computing Research Conference, FCRC 2019
Y2 - 26 June 2019
ER -