Kernelet (2014-2016)
Publications
- Jianlong Zhong, Bingsheng He. Kernelet: High-Throughput GPU Kernel Execution with Dynamic Slicing and Scheduling. IEEE Transactions on Parallel and Distributed Systems, vol.25, no.6, pp.1522-1532, June 2014.
- Mochi Xue, Kun Tian, Yaozu Dong, Jiajun Wang, Zhengwei Qi, Bingsheng He, Haibing Guan. gScale: Scaling up GPU Virtualization with Dynamic Sharing of Graphics Memory Space. USENIX Annual Technical Conference (ATC) 2016.
Abstract
Graphics processors, or GPUs, have recently been widely used as accelerators in shared environments such as
clusters and clouds. In such shared environments, many kernels are submitted to GPUs from different users, and throughput is
an important metric for performance and total ownership cost. Despite recently improved runtime support for concurrent GPU
kernel executions, the GPU can be severely underutilized, resulting in suboptimal throughput. In this paper, we propose Kernelet,
a runtime system to improve the throughput of concurrent kernel executions on the GPU. Kernelet embraces transparent
memory management and PCI-e data transfer techniques, and dynamic slicing and scheduling techniques for kernel executions.
With slicing, Kernelet divides a GPU kernel into multiple sub-kernels (namely slices). Each slice has tunable occupancy to allow
co-scheduling with other slices for high GPU utilization. We develop a novel Markov chain-based performance model to guide
the scheduling decision. Our experimental results demonstrate up to 31 percent and 23 percent performance improvement on
NVIDIA Tesla C2050 and GTX680 GPUs, respectively.
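To make the slicing idea in the abstract concrete, the following is a minimal sketch, not Kernelet's actual implementation. It assumes a kernel's grid can be described by its number of thread blocks, cuts that grid into fixed-size slices, and alternates slices from two kernels. The function names `slice_kernel` and `interleave` are hypothetical, and the fixed slice size with round-robin ordering is a simple stand-in for the tunable occupancy and the Markov chain-based scheduling decision, which are not reproduced here.

```python
from itertools import zip_longest


def slice_kernel(total_blocks, slice_size):
    """Split a grid of `total_blocks` thread blocks into (offset, count) slices.

    Hypothetical helper: each slice covers a contiguous range of block indices.
    """
    if total_blocks < 0 or slice_size <= 0:
        raise ValueError("total_blocks must be >= 0 and slice_size must be > 0")
    return [
        (offset, min(slice_size, total_blocks - offset))
        for offset in range(0, total_blocks, slice_size)
    ]


def interleave(slices_a, slices_b):
    """Alternate slices of two kernels (round-robin, an assumed policy).

    Returns a launch order as a list of (kernel_label, (offset, count)).
    """
    order = []
    for a, b in zip_longest(slices_a, slices_b):
        if a is not None:
            order.append(("A", a))
        if b is not None:
            order.append(("B", b))
    return order


if __name__ == "__main__":
    # Kernel A has 10 thread blocks, kernel B has 6; both use slices of 4 blocks.
    for label, (offset, count) in interleave(slice_kernel(10, 4), slice_kernel(6, 4)):
        print(f"launch kernel {label}: blocks {offset}..{offset + count - 1}")
```

In a GPU setting, each `(offset, count)` pair would presumably correspond to launching a smaller grid and adding the offset to the block index inside the kernel, so that the slices together cover exactly the original grid.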
Highlights
Our research has had significant practical impact on GPU virtualization (now an important infrastructure component of cloud computing with GPUs).
In the following, we present more details on the "impact factors" of this project (see definition of "impact factors").
Citations
Relevance to Industry and Open-Source Community
This system has inspired other open-source and industry systems.
- gScale has been integrated into Intel's Open GPU virtualization platform. Screenshots: Intel, Linux Kernel.
- [TPDS] Zhang, Haitao, Xin Geng, and Huadong Ma
Learning-driven Interference-aware Workload Parallelization for Streaming Applications in Heterogeneous Cluster, TPDS 2020.
- [TACO] Wu, Hao, Weizhi Liu, Huanxin Lin, and Cho-Li Wang
A Model-Based Software Solution for Simultaneous Multiple Kernels on GPUs, TACO 2020.
- [TC] Li, Zhifang, Beicheng Peng, and Chuliang Weng
XeFlow: Streamlining Inter-Processor Pipeline Execution for the Discrete CPU-GPU Platform, TC 2020.
- [TC] Houssam-Eddine, Zahaf, Nicola Capodieci, Roberto Cavicchioli, Giuseppe Lipari, and Marko Bertogna
The HPC-DAG Task Model for Heterogeneous Real-Time Systems, TC 2020.
- [DAC] Kim, Jiho, John Kim, and Yongjun Park
Navigator: dynamic multi-kernel scheduling to improve GPU performance, DAC 2020.
- [The Journal of Supercomputing] Mohamad Beheshti Roui, S. Kazem Shekofteh, Hamid Noori
Efficient scheduling of streams on GPGPUs, The Journal of Supercomputing 2020.
- [TC] Peng, Bo, Jianguo Yao, Yaozu Dong, and Haibing Guan
MDev-NVMe: Mediated Pass-Through NVMe Virtualization Solution with Adaptive Polling, TC 2020.
- [TPDS] Shekofteh, Seyed Kazem, Hamid Noori, Mahmoud Naghibzadeh, Holger Froening, and Hadi Sadoghi Yazdi.
cCUDA: Effective Co-Scheduling of Concurrent Kernels on GPUs, TPDS 2020.
- [CGO] Jiao, Qing, Mian Lu, Huynh Phung Huynh, and Tulika Mitra.
Improving GPGPU energy-efficiency through concurrent kernel execution and DVFS, CGO 2015.
- [RTAS] Otterness, Nathan, Ming Yang, Sarah Rust, Eunbyung Park, James H. Anderson, F. Donelson Smith, Alex Berg, and Shige Wang.
An Evaluation of the NVIDIA TX1 for Supporting Real-Time Computer-Vision Workloads, RTAS 2017.
- [ICESS] Li, Junke, Bing Guo, Yan Shen, Deguang Li, and Yanhui Huang.
Low-Energy Kernel Scheduling Approach for Energy Saving, ICESS 2016.
- [ISCA] Xu, Qiumin, Hyeran Jeon, Keunsoo Kim, Won Woo Ro, and Murali Annavaram.
Warped-Slicer: Efficient Intra-SM Slicing through Dynamic Resource Partitioning for GPU Multiprogramming, ISCA 2016.
- [Scientific Programming] Park, Younghun, Minwoo Gu, and Sungyong Park.
Ballooning Graphics Memory Space in Full GPU Virtualization Environments, Scientific Programming 2019.
System Repeatability and Academic Impacts
The system is used in the evaluation of the following papers:
Educational Adoptions
Media Coverage