GPU-accelerated similarity self-join for multi-dimensional data

Michael Gowanlock, Ben Karsin

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2 Scopus citations

Abstract

The similarity self-join finds all objects in a dataset that are within a search distance, ∈, of each other. As such, the self-join is a building block of many algorithms. In high dimensions, indexing structures become increasingly ineffective at pruning the search, making the self-join challenging to compute efficiently. We advance a GPU-accelerated self-join algorithm targeted towards high dimensional data. The massive parallelism afforded by the GPU and high aggregate memory bandwidth makes the architecture well-suited for data-intensive workloads. We leverage a grid-based GPU-tailored index to perform range queries, and propose the following optimizations: (i) a trade-off between candidate set filtering and index search overhead by exploiting properties of the index; (ii) reordering the data based on variance in each dimension to improve the filtering power of the index; and (iii) a pruning method for reducing the number of expensive distance calculations. Our algorithm generally outperforms a parallel CPU state-of-the-art approach.

Original languageEnglish (US)
Title of host publication15th International Workshop on Data Management on New Hardware, DaMoN 2019
PublisherAssociation for Computing Machinery
ISBN (Electronic)9781450368018
DOIs
StatePublished - Jul 1 2019
Externally publishedYes
Event15th International Workshop on Data Management on New Hardware, DaMoN 2019, Held with ACM SIGMOD/PODS 2019 - Amsterdam, Netherlands
Duration: Jul 1 2019 → …

Publication series

NameProceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems

Conference

Conference15th International Workshop on Data Management on New Hardware, DaMoN 2019, Held with ACM SIGMOD/PODS 2019
Country/TerritoryNetherlands
CityAmsterdam
Period7/1/19 → …

Keywords

  • GPGPU
  • High-dimensional data
  • In-memory data-base
  • Index structure
  • Query optimization
  • Self-join

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture

Fingerprint

Dive into the research topics of 'GPU-accelerated similarity self-join for multi-dimensional data'. Together they form a unique fingerprint.

Cite this