Mars: Accelerating MapReduce on Graphics Processors (2007-2010)
Publications
- Bingsheng He, Wenbin Fang, Qiong Luo, Naga K. Govindaraju, and Tuyong Wang. Mars: A MapReduce Framework on Graphics Processors. PACT 2008: International Conference on Parallel Architectures and Compilation Techniques, 2008.
- Wenbin Fang, Bingsheng He, Qiong Luo, and Naga K. Govindaraju. Mars: Accelerating MapReduce with Graphics Processors. IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 22, Number 4, April 2011, pp. 608-620.
Abstract
We design and implement Mars, a MapReduce framework on graphics processors (GPUs). MapReduce is a distributed programming framework originally proposed by Google to ease the development of web search applications on large numbers of commodity CPUs. Compared with CPUs, GPUs offer an order of magnitude higher computational power and memory bandwidth, but they are harder to program because they are designed as special-purpose co-processors and their programming interfaces typically target graphics applications. As the first attempt to harness the GPU's power for MapReduce, we developed Mars on an NVIDIA G80 GPU, which contains over one hundred processors, and evaluated it against Phoenix, the state-of-the-art MapReduce framework on multi-core CPUs. Mars hides the programming complexity of the GPU behind the simple and familiar MapReduce interface. For six common web applications, it is up to 16 times faster than its CPU-based counterpart running on a quad-core machine.
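To make the map/reduce split on a GPU concrete, below is a minimal CUDA sketch. It is not Mars's actual API; the kernel names (mapKernel, reduceKernel) and the single-key reduction are illustrative assumptions. Each GPU thread applies a user-defined map to one input record and emits one intermediate value (here, its square); the reduce phase then folds the intermediates into a single sum with atomics. A full framework such as Mars additionally groups intermediate key/value pairs by key before reduction and manages output sizing on the GPU.

```cuda
// Minimal, hypothetical MapReduce-style sketch on a GPU (not the Mars API).
#include <cstdio>
#include <cuda_runtime.h>

// Map: one GPU thread per input record; emit one intermediate value each.
__global__ void mapKernel(const float* input, float* intermediate, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        intermediate[i] = input[i] * input[i];   // user-defined map logic
    }
}

// Reduce: fold all intermediate values into a single output value.
__global__ void reduceKernel(const float* intermediate, float* result, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomicAdd(result, intermediate[i]);      // user-defined reduce logic
    }
}

int main() {
    const int n = 1 << 20;
    float *dIn, *dMid, *dOut;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dMid, n * sizeof(float));
    cudaMalloc(&dOut, sizeof(float));
    cudaMemset(dOut, 0, sizeof(float));

    // Fill the input with ones on the host and copy it to the GPU.
    float* hIn = new float[n];
    for (int i = 0; i < n; ++i) hIn[i] = 1.0f;
    cudaMemcpy(dIn, hIn, n * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    mapKernel<<<blocks, threads>>>(dIn, dMid, n);
    reduceKernel<<<blocks, threads>>>(dMid, dOut, n);

    float sum = 0.0f;
    cudaMemcpy(&sum, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum of squares = %f (expected %d)\n", sum, n);

    delete[] hIn;
    cudaFree(dIn); cudaFree(dMid); cudaFree(dOut);
    return 0;
}
```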
Highlights
Our system Mars has inspired the use of heterogeneous architectures in big data systems such as Hadoop and Spark (see the Spark-GPU project).
Below, we present more details on the "impact factors" of this project (see the definition of "impact factors").
Citations
- According to Google Scholar, the two publications have received over 1,200 citations in total since 2008.
- According to Google Scholar, the PACT paper alone has received over 1,000 citations since 2008.
- Our Mars paper is the second most cited among all papers published at ACM PACT (since 1993), according to the ACM Digital Library. (screenshot)
- Example quotes from citing papers.
Relevance to Industry and Open-Source Community
This system has inspired other open-source and industry systems.
- [CLUSTER] Teodoro, George, Rafael Sachetto, Olcay Sertel, Metin N. Gurcan, Wagner Meira, Umit Catalyurek, and Renato Ferreira.
Coordinating the Use of GPU and CPU for Improving Performance of Compute Intensive Applications, CLUSTER 2009.
- [LNCS] Tran, Ha-Nguyen, Jung-jae Kim, and Bingsheng He.
Fast Subgraph Matching on Large Graphs using Graphics Processors, LNCS 2015.
- [IISWC] Che, Shuai, Michael Boyer, Jiayuan Meng, David Tarjan, Jeremy W. Sheaffer, Sang-Ha Lee, and Kevin Skadron.
Rodinia: A Benchmark Suite for Heterogeneous Computing, IISWC 2009.
- [Journal of Parallel and Distributed Computing] Rafique, M. Mustafa, Ali R. Butt, and Dimitrios S. Nikolopoulos.
A capabilities-aware framework for using computational accelerators in data-intensive computing, Journal of Parallel and Distributed Computing 2011.
- [CCGrid] Shirahata, Koichi, Hitoshi Sato, Toyotaro Suzumura, and Satoshi Matsuoka.
A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for Large-scale Heterogeneous Supercomputers, CCGrid 2013.
- [SIGPLAN Notices] Jog, Adwait, Onur Kayiran, Nachiappan Chidambaram Nachiappan, Asit K. Mishra, Mahmut T. Kandemir, Onur Mutlu, Ravishankar Iyer, and Chita R. Das.
OWL: Cooperative Thread Array Aware Scheduling Techniques for Improving GPGPU Performance, SIGPLAN Notices 2013.
- [PACT] Kayiran, Onur, Adwait Jog, Mahmut T. Kandemir, and Chita R. Das.
Neither More Nor Less: Optimizing Thread-level Parallelism for GPGPUs, PACT 2013.
- [ISCA] Jog, Adwait, Onur Kayiran, Asit K. Mishra, Mahmut T. Kandemir, Onur Mutlu, Ravishankar Iyer, and Chita R. Das.
Orchestrated Scheduling and Prefetching for GPGPUs, ISCA 2013.
- [MICRO] Chen, Xuhao, Li-Wen Chang, Christopher I. Rodrigues, Jie Lv, Zhiying Wang, and Wen-Mei Hwu.
Adaptive Cache Management for Energy-efficient GPU Computing, MICRO 2014.
- [MICRO] Kayiran, Onur, Nachiappan Chidambaram Nachiappan, Adwait Jog, Rachata Ausavarungnirun, Mahmut T. Kandemir, Gabriel H. Loh, Onur Mutlu, and Chita R. Das.
Managing GPU Concurrency in Heterogeneous Architectures, MICRO 2014.
- [MICRO] Rhu, Minsoo, Michael Sullivan, Jingwen Leng, and Mattan Erez.
A Locality-Aware Memory Hierarchy for Energy-Efficient GPU Architectures, MICRO 2013.
- [HPCA] Xie, Xiaolong, Yun Liang, Yu Wang, Guangyu Sun, and Tao Wang.
Coordinated Static and Dynamic Cache Bypassing for GPUs, HPCA 2015.
- [SIGARCH] Vijaykumar, Nandita, Gennady Pekhimenko, Adwait Jog, Abhishek Bhowmick, Rachata Ausavarungnirun, Chita Das, Mahmut Kandemir, Todd C. Mowry, and Onur Mutlu.
A Case for Core-Assisted Bottleneck Acceleration in GPUs: Enabling Flexible Data Compression with Assist Warps, SIGARCH 2015.
- [PACT] Pattnaik, Ashutosh, Xulong Tang, Adwait Jog, Onur Kayiran, Asit K. Mishra, Mahmut T. Kandemir, Onur Mutlu, and Chita R. Das.
Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities, PACT 2016.
- [HPCA] Li, Dong, Minsoo Rhu, Daniel R. Johnson, Mike O'Connor, Mattan Erez, Doug Burger, Donald S. Fussell, and Stephen W. Keckler.
Priority-Based Cache Allocation in Throughput Processors, HPCA 2015.
- [SC] Chatterjee, Niladrish, Mike O'Connor, Gabriel H. Loh, Nuwan Jayasena, and Rajeev Balasubramonian.
Managing DRAM Latency Divergence in Irregular GPGPU Applications, SC 2014.
- [IISWC] Xu, Qiumin, Hyeran Jeon, and Murali Annavaram.
Graph Processing on GPUs: Where are the Bottlenecks?, IISWC 2014.
- [PACT] Ausavarungnirun, Rachata, Saugata Ghose, Onur Kayiran, Gabriel H. Loh, Chita R. Das, Mahmut T. Kandemir, and Onur Mutlu.
Exploiting Inter-Warp Heterogeneity to Improve GPGPU Performance, PACT 2015.
- [SC] Li, Ang, Gert-Jan van den Braak, Akash Kumar, and Henk Corporaal.
Adaptive and Transparent Cache Bypassing for GPUs, SC 2015.
- [HPCA] Pekhimenko, Gennady, Evgeny Bolotin, Nandita Vijaykumar, Onur Mutlu, Todd C. Mowry, and Stephen W. Keckler.
A Case for Toggle-Aware Compression for GPU Systems, HPCA 2016.
- [TACO] Yazdanbakhsh, Amir, Gennady Pekhimenko, Bradley Thwaites, Hadi Esmaeilzadeh, Onur Mutlu, and Todd C. Mowry.
RFVP: Rollback-Free Value Prediction with Safe-to-Approximate Loads, TACO 2016.
- [DAC] Jang, Hyunjun, Jinchun Kim, Paul Gratz, Ki Hwan Yum, and Eun Jung Kim.
Bandwidth-Efficient On-Chip Interconnect Designs for GPGPUs, DAC 2015.
- [PACT] Kayiran, Onur, Adwait Jog, Ashutosh Pattnaik, Rachata Ausavarungnirun, Xulong Tang, Mahmut T. Kandemir, Gabriel H. Loh, Onur Mutlu, and Chita R. Das.
µC-States: Fine-grained GPU Datapath Power Management, PACT 2016.
- [ICPS] Khairy, Mahmoud, Mohamed Zahran, and Amr G. Wassal.
Efficient Utilization of GPGPU Cache Hierarchy, ICPS 2015.
- [ISCA] Koo, Gunjae, Yunho Oh, Won Woo Ro, and Murali Annavaram.
Access Pattern-Aware Cache Management for Improving Data Utilization in GPU, ISCA 2017.
- [Computer Architecture Letters] Zheng, Zhong, Zhiying Wang, and Mikko Lipasti.
Adaptive Cache and Concurrency Allocation on GPGPUs, Computer Architecture Letters 2014.
- [ICS] Wang, Bin, Weikuan Yu, Xian-He Sun, and Xinning Wang.
DaCache: Memory Divergence-Aware GPU Cache Management, ICS 2015.
- [CLUSTER] Elteir, Marwa, Heshan Lin, and Wu-chun Feng.
Performance Characterization and Optimization of Atomic Operations on AMD GPUs, CLUSTER 2014.
- [MES] Chen, Xuhao, Shengzhao Wu, Li-Wen Chang, Wei-Sheng Huang, Carl Pearson, Zhiying Wang, and Wen-Mei W. Hwu.
Adaptive Cache Bypass and Insertion for Many-core Accelerators, MES 2014.
- [HPCA] Zhang, Jie, Myoungsoo Jung, and Mahmut Kandemir.
FUSE: Fusing STT-MRAM into GPUs to Alleviate Off-Chip Memory Access Overheads, HPCA 2019.
- [TCC] Wang, Jihe, Meikang Qiu, Bing Guo, and Ziliang Zong.
Phase-Reconfigurable Shuffle Optimization for Hadoop MapReduce, TCC 2015.
- [PACT] Wang, Bin, Yue Zhu, and Weikuan Yu.
OAWS: Memory Occlusion Aware Warp Scheduling, PACT 2016.
- [ICCD] Lee, Shin-Ying, and Carole-Jean Wu.
Ctrl-C: Instruction-Aware Control Loop Based Adaptive Cache Bypassing for GPUs, ICCD 2016.
- [ICCD] Zhao, Xia, Sheng Ma, Chen Li, Lieven Eeckhout, and Zhiying Wang.
A Heterogeneous Low-Cost and Low-Latency Ring-Chain Network for GPGPUs, ICCD 2016.
- [ICDE] Zeng, Li, Lei Zou, M. Tamer Özsu, L. Hu, and Fan Zhang.
GSI: GPU-friendly Subgraph Isomorphism, ICDE 2020.
System Repeatability and Academic Impacts
The system is used in the evaluation of the following papers:
- [IPDPS] Stuart, Jeff A., and John D. Owens.
Multi-GPU MapReduce on GPU Clusters, IPDPS 2011.
- [PACT] Hong, Chuntao, Dehao Chen, Wenguang Chen, Weimin Zheng, and Haibo Lin.
MapCG: Writing Parallel Program Portable between CPU and GPU, PACT 2010.
- [IPDPS] Ji, Feng, and Xiaosong Ma.
Using Shared Memory to Accelerate MapReduce on Graphics Processing Units, IPDPS 2011.
- [ICPADS] Elteir, Marwa, Heshan Lin, Wu-chun Feng, and Tom Scogland.
StreamMR: An Optimized MapReduce Framework for AMD GPUs, ICPADS 2011.
- [Journal of Parallel and Distributed Computing] Basaran, Can, and Kyoung-Don Kang.
Grex: An Efficient MapReduce Framework for Graphics Processing Units, Journal of Parallel and Distributed Computing 2013.
- [SC] El-Helw, Ismail, Rutger Hofman, and Henri E. Bal.
Scaling MapReduce Vertically and Horizontally, SC 2014.
- [PACT] Jung, Wookeun, Jongsoo Park, and Jaejin Lee.
Versatile and Scalable Parallel Histogram Construction, PACT 2014.
- [HPCA] Y.-W. Cui, Shakthi Prabhakar, Hui Zhao, Saraju P. Mohanty, and Juan Fang.
A Low-Cost Conflict-Free NoC Architecture for Heterogeneous Multicore Systems, HPCA 2020.
- [ICS] Xianwei Cheng, H. Zhao, M. Kandemir, Beilei Jiang, and Gayatri Mehta.
AMOEBA: A Coarse-Grained Reconfigurable Architecture for Dynamic GPU Scaling, ICS 2020.
- [MICRO] Lu Wang, Magnus Jahre, Almutaz Adileh, and L. Eeckhout.
MDM: The GPU Memory Divergence Model, MICRO 2020.
- [CCGRID] Bin Nie, A. Jog, and E. Smirni.
Characterizing Accuracy-Aware Resilience of GPGPU Applications, CCGRID 2020.
- [ASPLOS] X. Zhao, Magnus Jahre, and L. Eeckhout.
HSM: A Hybrid Slowdown Model for Multitasking GPUs, ASPLOS 2020.
Educational Adoptions
- [Book] Kai Hwang, Jack Dongarra, Geoffrey C. Fox. Morgan Kaufmann.
Distributed and Cloud Computing: From Parallel Processing to the Internet of Things,
18 Dec 2013.
- [Book] Shui Yu, Song Guo. Springer.
Big Data Concepts, Theories, and Applications,
3 Mar 2016.
- [Book] Xiaolin Li, Judy Qiu. Springer.
Cloud Computing for Data-Intensive Applications,
2 Dec 2014.
- [Book] Khosrow-Pour, Mehdi. IGI Global.
Encyclopedia of Information Science and Technology, Third Edition,
31 Jul 2014.
- [Book] Jimmy Lin, Chris Dyer. Morgan & Claypool Publishers.
Data-Intensive Text Processing with MapReduce,
10 Oct 2010.
- [Book] Wang, John. IGI Global.
Encyclopedia of Business Analytics and Optimization,
28 Feb 2014.
- [Book] Loo, Alfred Waising. IGI Global.
Distributed Computing Innovations for Business, Engineering, and Science,
30 Nov 2012.
- [Book] Kuan-Ching Li, Qing Li, Timothy K. Shih. CRC Press.
Cloud Computing and Digital Media: Fundamentals, Techniques, and Applications,
9 Jan 2015.
- [Book] Rajkumar Buyya, James Broberg, Andrzej M. Goscinski. John Wiley & Sons.
Cloud Computing: Principles and Paradigms,
17 Dec 2010.
- [Book] Keesook J. Han, Baek-Young Choi, Sejun Song. Springer Science & Business Media.
High Performance Cloud Auditing and Applications,
24 Oct 2013.
- [Book] Aris Gkoulalas-Divanis, Abderrahim Labbi. Springer Science & Business Media.
Large-Scale Data Analytics,
8 Jan 2014.
- [Book] Marcello Trovati, Richard Hill, Ashiq Anjum, Shao Ying Zhu, Lu Liu. Springer.
Big-Data Analytics and Cloud Computing: Theory, Algorithms and Applications,
12 Jan 2016.
- [Course] University of Wisconsin at Madison.
CS 757
- [Course] Virginia Tech.
CS 4984, CS 5984, CS 5204
- [Course] University of Victoria.
SENG 474
- [Course] Cairo University.
Application Acceleration Using the Massive Parallel Processing Power of GPUs
- [Course] Duke University.
CPS516 Data-intensive Computing Systems Project 2
- [Course] North Carolina State University.
CSC/ECE 506 Spring 2012
Media Coverage