A coordinate-oblivious index for high-dimensional distance similarity searches on the GPU

Brian Donnelly, Michael Gowanlock

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Scopus citations

Abstract

We present COSS, an exact method for high-dimensional distance similarity self-joins using the GPU, which finds all points within a search distance e from each point in a dataset. The similarity self-join can take advantage of the massive parallelism afforded by GPUs, as each point can be searched in parallel. Despite high GPU throughput, distance similarity self-joins exhibit irregular memory access patterns which yield branch divergence and other performance limiting factors. Consequently, we propose several GPU optimizations to improve self-join query throughput, including an index designed for GPU architecture. As data dimensionality increases, the search space increases exponentially. Therefore, to find a reasonable number of neighbors for each point in the dataset, e may need to be large. The majority of indexing strategies that are used to prune the ∈-search focus on a spatial partition of data points based on each point's coordinates. As dimensionality increases, this data partitioning and pruning strategy yields exhaustive searches that eventually degrade to a brute force (quadratic) search, which is the well-known curse of dimensionality problem. To enable pruning the search using an indexing scheme in high-dimensional spaces, we depart from previous indexing approaches, and propose an indexing strategy that does not index based on each point's coordinate values. Instead, we index based on the distances to reference points, which are arbitrary points in the coordinate space. We show that our indexing scheme is able to prune the search for nearby points in high-dimensional spaces where other approaches yield high performance degradation. COSS achieves a speedup over CPU and GPU reference implementations up to 17.7X and 11.8X, respectively.

Original languageEnglish (US)
Title of host publicationProceedings of the 34th ACM International Conference on Supercomputing, ICS 2020
PublisherAssociation for Computing Machinery
ISBN (Electronic)9781450379830
DOIs
StatePublished - Jun 29 2020
Externally publishedYes
Event34th ACM International Conference on Supercomputing, ICS 2020 - Barcelona, Spain
Duration: Jun 29 2020Jul 2 2020

Publication series

NameProceedings of the International Conference on Supercomputing

Conference

Conference34th ACM International Conference on Supercomputing, ICS 2020
Country/TerritorySpain
CityBarcelona
Period6/29/207/2/20

Keywords

  • GPU
  • high dimensional
  • in-memory database
  • multidimensional index
  • similarity search

ASJC Scopus subject areas

  • General Computer Science

Fingerprint

Dive into the research topics of 'A coordinate-oblivious index for high-dimensional distance similarity searches on the GPU'. Together they form a unique fingerprint.

Cite this