GPUQP: Relational Databases on Graphics Processors (2007-2012)

Publications

Bingsheng He, Ke Yang, Rui Fang, Mian Lu, Naga K. Govindaraju, Qiong Luo, Pedro V. Sander. Relational Joins on Graphics Processors. ACM SIGMOD International Conference on Management of Data, pages 511-524, 2008.

Bingsheng He, Mian Lu, Ke Yang, Rui Fang, Naga K. Govindaraju, Qiong Luo, and Pedro V. Sander. Relational query coprocessing on graphics processors. ACM Transactions on Database Systems (TODS) 34.4 (2009): 1-39.

Rui Fang, Bingsheng He, Mian Lu, Ke Yang, Naga K. Govindaraju, Qiong Luo, Pedro V. Sander. GPUQP: Query Co-Processing using Graphics Processors. ACM SIGMOD 2007 (system demonstration).

Wenbin Fang, Bingsheng He, Qiong Luo. Database Compression on Graphics Processors. In Proceedings of International Conference on Very Large Data Bases (VLDB) 2010.

Abstract

Graphics processors (GPUs) have recently emerged as powerful coprocessors for general purpose computation. Compared with commodity CPUs, GPUs have an order of magnitude higher computation power as well as memory bandwidth. Moreover, new-generation GPUs allow writes to random memory locations, provide efficient interprocessor communication through on-chip local memory, and support a general purpose parallel programming model. Nevertheless, many of the GPU features are specialized for graphics processing, including the massively multithreaded architecture, the Single-Instruction-Multiple-Data processing style, and the execution model of a single application at a time. Additionally, GPUs rely on a bus of limited bandwidth to transfer data to and from the CPU, do not allow dynamic memory allocation from GPU kernels, and have little hardware support for write conflicts. Therefore, a careful design and implementation is required to utilize the GPU for coprocessing database queries.

We present our design, implementation, and evaluation of an in-memory relational query coprocessing system, GPUQP, on the GPU. Taking advantage of the GPU hardware features, we design a set of highly optimized data-parallel primitives such as split and sort, and use these primitives to implement common relational query processing algorithms. Our algorithms utilize the high parallelism as well as the high memory bandwidth of the GPU, and use parallel computation and memory optimizations to effectively reduce memory stalls. Furthermore, we propose coprocessing techniques that take into account both the computation resources and the GPU-CPU data transfer cost so that each operator in a query can utilize suitable processors—the CPU, the GPU, or both—for an optimized overall performance. We have evaluated our GPUQP system on a machine with an Intel quad-core CPU and an NVIDIA GeForce 8800 GTX GPU. Our workloads include microbenchmark queries on memory-resident data as well as TPC-H queries that involve complex data types and multiple query operators on data sets larger than the GPU memory. Our results show that our GPU-based algorithms are 2–27x faster than their optimized CPU-based counterparts on in-memory data. Moreover, the performance of our coprocessing scheme is similar to, or better than, both the GPU-only and the CPU-only schemes.
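To make the idea of a data-parallel "split" primitive concrete, the following is a minimal CPU-side sketch in Python, not the GPUQP implementation. It assumes one common way to partition without write conflicts: each simulated thread builds a private histogram, a prefix sum turns the counts into disjoint write offsets, and each thread then scatters its tuples into its own output range. The function name `split`, its parameters, and the default modulo partitioning function are hypothetical choices made for illustration.

```python
from itertools import accumulate


def split(keys, num_partitions, num_threads=4, part_fn=None):
    """Partition keys into num_partitions buckets without write conflicts.

    Illustrative sketch only: threads are simulated with plain loops.
    Returns (partitioned output, start index of each partition).
    """
    if part_fn is None:
        # Assumed default: partition by key modulo the partition count.
        part_fn = lambda k: k % num_partitions

    n = len(keys)
    # Each simulated thread owns one contiguous chunk of the input.
    chunk = (n + num_threads - 1) // num_threads
    chunks = [keys[t * chunk:(t + 1) * chunk] for t in range(num_threads)]

    # Step 1: every thread counts its own tuples per partition.
    hist = [[0] * num_partitions for _ in range(num_threads)]
    for t, c in enumerate(chunks):
        for k in c:
            hist[t][part_fn(k)] += 1

    # Step 2: exclusive prefix sum over the counts, laid out in
    # (partition, thread) order, gives each thread a private write
    # range inside every partition.
    flat = [hist[t][p] for p in range(num_partitions) for t in range(num_threads)]
    offsets = [0] + list(accumulate(flat))[:-1]

    # Step 3: each thread scatters its tuples into its own ranges, so
    # no two threads ever write to the same output slot.
    out = [None] * n
    for t, c in enumerate(chunks):
        cursor = [offsets[p * num_threads + t] for p in range(num_partitions)]
        for k in c:
            p = part_fn(k)
            out[cursor[p]] = k
            cursor[p] += 1

    starts = [offsets[p * num_threads] for p in range(num_partitions)]
    return out, starts


if __name__ == "__main__":
    result, starts = split([5, 2, 8, 1, 4, 7, 3, 6], num_partitions=2, num_threads=3)
    print(result, starts)
```

Because the write ranges are disjoint by construction, this style of split needs no atomic operations or locks, which matters on hardware with little support for write conflicts. On a real GPU the three steps would typically run as separate kernels over device memory.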

Highlights

Our system GPUQP pioneered GPU-accelerated database systems and inspired the use of GPUs in relational databases in both academia and industry.

Our research on GPU-accelerated databases has led to the emerging adoption of GPU acceleration in industry, including Brytlyt, BlazingDB, OmniSci (formerly MapD), and SQream. (screenshot in SQream's presentation)
The relational join paper was among the "Best papers" of ACM SIGMOD 2008 (invited to be featured in a special issue of ACM TODS).

In the following, we present more details on the "impact factors" of this project (see definition of "impact factors").

Citations

According to Google Scholar, our publications on GPUQP have received over 1,000 citations since 2007.

Example quotes on citations.

Relevance to Industry and Open-Source Community

This system has inspired other open-source systems and industry systems.

[arXiv] Albutiu, Martina-Cezara, Alfons Kemper, and Thomas Neumann.
Massively parallel sort-merge joins in main memory multi-core database systems, arXiv 2012.

[VLDB] Heimel, Max, Michael Saecker, Holger Pirk, Stefan Manegold, and Volker Markl.
Hardware-oblivious parallelism for in-memory column-stores, VLDB 2013.

Saecker, Michael, and Volker Markl.
Graph processing on GPUs: A survey.

[TPDS] Yabuta, Makoto, Anh Nguyen, Shinpei Kato, Masato Edahiro, and Hideyuki Kawashima.
Relational joins on GPUs: A closer look, TPDS 2017.

[LNCS] Breß, Sebastian, Max Heimel, Norbert Siegmund, Ladjel Bellatreche, and Gunter Saake.
GPU-accelerated database systems: Survey and open challenges, LNCS 2014.

[SIGMOD] Breß, Sebastian, Henning Funke, and Jens Teubner.
Robust query processing in co-processor-accelerated databases, SIGMOD 2016.

[LNBIP] Saecker, Michael, and Volker Markl.
Big data analytics on modern hardware architectures: A technology survey, LNBIP 2012.

[Thesis] Karpathiotakis, Manolis.
Just-in-time Analytics Over Heterogeneous Data and Hardware, EPFL 2017.

[CIDR] Appuswamy, Raja, Manos Karpathiotakis, Danica Porobic, and Anastasia Ailamaki.
The case for heterogeneous HTAP, CIDR 2017.

[TKDE] F. Zhang and Jidong Zhai and Bo Wu and Bingsheng He and W. Chen and X. Du.
Automatic Irregularity-Aware Fine-Grained Workload Partitioning on Integrated Architectures, TKDE 2021.

[IEEE Access] Xue-Xuan Hu and Jianqing Xi and De-You Tang.
Optimization for Multi-Join Queries on the GPU, IEEE Access 2020.

[SIGMOD] C. Lutz and S. Bress and Steffen Zeuch and T. Rabl and V. Markl.
Pump Up the Volume: Processing Large Data on GPUs with Fast Interconnects, SIGMOD 2020.

[VLDB] Johns Paul and Bingsheng He and Shengliang Lu and C. Lau.
Improving execution efficiency of just-in-time compilation based query processing on GPUs, VLDB 2020.

[SIGMOD] Linwei Li and K. Zhang and Jiading Guo and W. He and Zhenying He and Yinan Jing and Weili Han and X. Wang.
IBinDex: A Two-Layered Index for Fast and Robust Scans, SIGMOD 2020.

System Repeatability and Academic Impacts

The system is used in the evaluation of the following papers:

[VLDB] Kim, Changkyu, Tim Kaldewey, Victor W. Lee, Eric Sedlar, Anthony D. Nguyen, Nadathur Satish, Jatin Chhugani, Andrea Di Blas, and Pradeep Dubey.
Sort vs. hash revisited: Fast join implementation on modern multi-core CPUs, VLDB 2009.

[Big Data] Rui, Ran, Hao Li, and Yi-Cheng Tu.
Join algorithms on GPUs: A revisit after seven years, Big Data 2015.

Diamos, Gregory Frederick, Haicheng Wu, Ashwin Lele, and Jin Wang.
Efficient relational algebra algorithms and data structures for GPU.

Educational Adoptions

[Book] Wang, John. IGI Global.
Encyclopedia of Business Analytics and Optimization, 28 Feb 2014.

[Book] Kacprzyk, Janusz, Springer.
Advances in Intelligent Systems and Computing. Series Ed., 12 Sep 2015.

[Course] Max-Planck-Institut für Informatik.
Techniques for Non-Traditional Data Management

[Course] Virginia Tech.
CSX984, Accelerator-Based Parallel Computing

[Course] University of Maine.
Computer Science Capstone

[Course] Indian Institutes of Technology.
Advanced DBMS

[Course] Duke University.
Database and Programming Languages: Crossing the Chasm

[This course uses my slides in the lecture]

[Course] University of Maine.
COS 497: Computer Science Capstone 2, by Sudarshan S. Chawathe

[Course] Aalborg University.
Ph.D. course: Database Management on Modern Hardware, May 11-12, 2009, by Prof. Anastasia Ailamaki

[Course] University of Waterloo.
CS 848 (Spring 2015) Advanced Topics in Databases: Database Systems on Modern Hardware

[Course] Otto von Guericke University Magdeburg.
Seminar on Modern Software Engineering and Database Concepts

[Course] TU Dortmund University.
Advanced lecture, Data Processing on Modern Hardware, Summer 2014, Prof. Dr. Jens Teubner

Media Coverage

[News] SQream DB - Bigger Data On GPUs: Approaches, Challenges, Successes. Arnon Shimoni, Product manager of SQream DB, 17 Oct 2018. (screenshot)