ThunderML: Machine Learning Systems on Heterogeneous Architectures (2016-present)
Publications
- Zeyi Wen, Jiashuai Shi, Qinbin Li, Bingsheng He, and Jian Chen. ThunderSVM: A Fast SVM Library on GPUs and CPUs. Journal of Machine Learning Research (JMLR) 19 (2018) 1-5.
- Zeyi Wen, Bingsheng He, Ramamohanarao Kotagiri, Shengliang Lu, Jiashuai Shi. Efficient Gradient Boosted Decision Tree Training on GPUs. IEEE International Parallel & Distributed Processing Symposium (IPDPS) 2018.
- Zeyi Wen, Hanfeng Liu, Jiashuai Shi, Qinbin Li, Bingsheng He, and Jian Chen. ThunderGBM: Fast GBDTs and Random Forests on GPUs. Journal of Machine Learning Research (JMLR) 21 (2020) 1-5.
Abstract
ThunderSVM
Support Vector Machines (SVMs) are classic supervised learning models for classification, regression and distribution estimation. A survey conducted by Kaggle in 2017 shows that 26% of data mining and machine learning practitioners are users of SVMs. However, SVM training and prediction are computationally very expensive for large and complex problems. This paper presents an efficient and open-source SVM software toolkit called ThunderSVM, which exploits the high performance of Graphics Processing Units (GPUs) and multi-core CPUs. ThunderSVM supports all the functionalities of LibSVM, including classification (SVC), regression (SVR) and one-class SVMs, and uses identical command line options, so that existing LibSVM users can easily apply our toolkit. ThunderSVM can be used through multiple language interfaces including C/C++, Python, R and MATLAB. Our experimental results show that ThunderSVM is generally an order of magnitude faster than LibSVM while producing identical SVMs. In addition to the high efficiency, we design our convex optimization solver in a general way such that SVC, SVR, and one-class SVMs share the same solver for ease of maintenance. Documentation, examples, and more about ThunderSVM are available at https://github.com/zeyiwen/thundersvm.
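The shared-solver design described in this abstract can be made concrete with the general dual quadratic program used in the LibSVM literature. The block below is a sketch under the assumption that ThunderSVM's solver targets this same form. The symbols Q, p, y, Δ, C and n follow LibSVM-style notation and do not appear in the abstract itself.

```latex
\begin{aligned}
\min_{\alpha}\quad & \tfrac{1}{2}\,\alpha^{\top} Q\,\alpha + p^{\top}\alpha \\
\text{subject to}\quad & y^{\top}\alpha = \Delta, \qquad 0 \le \alpha_i \le C, \quad i = 1,\dots,n
\end{aligned}
% Assumed instantiations (LibSVM conventions; e is the all-ones vector):
%   SVC:        Q_{ij} = y_i y_j K(x_i, x_j),  p = -e,  y = \text{class labels } (\pm 1),  \Delta = 0
%   One-class:  Q_{ij} = K(x_i, x_j),          p = 0,   y = e,  \Delta = \nu n,  C = 1
```

Under this assumption, only Q, p, y, Δ and C change between problem types, so a single solver can serve all of them. ε-SVR fits the same template by stacking its two sets of dual variables into one vector of length 2n.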
ThunderGBM
Gradient Boosting Decision Trees (GBDTs) and Random Forests (RFs) have been used in many real-world applications. They are often a standard recipe for building state-of-the-art solutions to machine learning and data mining problems. However, training and prediction are computationally very expensive for large and high-dimensional problems. This article presents an efficient and open-source software toolkit called ThunderGBM, which exploits high-performance Graphics Processing Units (GPUs) for GBDTs and RFs. ThunderGBM supports classification, regression and ranking, and can run on a single GPU or multiple GPUs of a machine. Our experimental results show that ThunderGBM outperforms the existing libraries while producing similar models, and can handle high-dimensional problems where existing GPU-based libraries fail. Documentation, examples, and more details about ThunderGBM are available at https://github.com/xtra-computing/thundergbm.
Highlights
The open-source systems ThunderSVM and ThunderGBM have attracted wide interest from the open-source community and have been adopted in real applications.
- GitHub: ThunderSVM and ThunderGBM have earned over 1,200 and 500 stars, respectively, in less than three years since their release. They receive about 800 and 300 unique visitors per month, respectively (see the screenshots below).
- Media coverage: ThunderSVM has made the headlines of popular open-source websites including Hacker News and Packt DataHub. ThunderGBM has been highlighted on Reddit and on jiqizhixin, with over 5,000 reposts.
In the following, we present more details on the "impact factors" of this project (see the definition of "impact factors").
Citations
- According to Google Scholar, the papers received 100 citations in less than three years. Specifically, ThunderSVM received 81 citations, and ThunderGBM received 19 citations.
Relevance to Industry and Open-Source Community
Our systems have attracted interest from NVIDIA, and we are in the process of collaborating with NVIDIA RAPIDS.
System Repeatability and Academic Impacts
The systems are used in the evaluation of the following papers:
- [NeurIPS] Chen, Nicholas FY and Du, Zhiyuan and Ng, Khin Hua.
Scene Graphs for Interpretable Video Anomaly Classification, NeurIPS 2018.
- [arXiv] Prabhu, Ameya and Dognin, Charles and Singh, Maneesh.
Sampling Bias in Deep Active Classification: An Empirical Study, arXiv 2019.
- [Sensors] Choi, Eunjeong and Chae, Somi and Kim, Jeongtae.
Machine Learning-Based Fast Banknote Serial Number Recognition Using Knowledge Distillation and Bayesian Optimization, Sensors 2019.
- [arXiv] Foleis, Juliano Henrique and Tavares, Tiago Fernandes.
Texture Selection for Automatic Music Genre Classification, arXiv 2019.
- [Diss] Zhao, Lingjun.
Classification for Device-free Localization based on Deep Neural Networks, The University of Aizu 2019.
- [JIOT] Zhao, Lingjun and Huang, Huakun and Li, Xiang and Ding, Shuxue and Zhao, Haoli and Han, Zhaoyang.
An accurate and robust approach of device-free localization with convolutional autoencoder, JIOT 2019.
- [ICRA] Thakar, Shantanu and Rajendran, Pradeep and Annem, Vivek and Kabir, Ariyan and Gupta, Satyandra.
Accounting for part pose estimation uncertainties during trajectory generation for part pick-up using mobile manipulators, ICRA 2019.
- [SIGSPATIAL] Xiu, Haoyi and Vinayaraj, Poliyapram and Kim, Kyoung-Sook and Nakamura, Ryosuke and Yan, Wanglin.
3D semantic segmentation for high-resolution aerial survey derived point clouds using deep learning, SIGSPATIAL 2018.
- [ICSEC] Khuphiran, Panida and Leelaprute, Pattara and Uthayopas, Putchong and Ichikawa, Kohei and Watanakeesuntorn, Wassapon.
Performance Comparison of Machine Learning Models for DDoS Attacks Detection, ICSEC 2018.
- [IEEE Communications Magazine] Chen, Shuangwu and Chen, Xiang and Yao, Zhen and Yang, Jian and Li, Yangyang and Wu, Feng.
Evolving Switch Architecture toward Accommodating In-Network Intelligence, IEEE Communications Magazine 2020.
- [arXiv] Ma, Siyuan and Belkin, Mikhail.
Kernel machines that adapt to GPUs for effective large batch training, arXiv 2018.
- [Genetic Programming and Evolvable Machines] Langdon, William B and Lam, Brian Yee Hong and Modat, Marc and Petke, Justyna and Harman, Mark.
Genetic improvement of GPU software, Genetic Programming and Evolvable Machines 2017.
The source code of ThunderSVM is used as a benchmark.
- [MICRO] Hu, Yu-Ching and Lokhandwala, Murtuza Taher and I, Te and Tseng, Hung-Wei.
Dynamic Multi-Resolution Data Storage, MICRO 2019.
The source code of ThunderSVM is used as a benchmark.
- [BigData] Chen, Huaming and Wang, Lei and Jin, Yaochu and Chi, Chi-Hung and Li, Fucun and Chu, Huaiyuan and Shen, Jun.
Hyperparameter Estimation in SVM with GPU Acceleration for Prediction of Protein-Protein Interactions, IEEE International Conference on Big Data 2018.
- [IEEE Internet of Things Journal] Zhao, Lingjun and Huang, Huakun and Su, Chunhua and Ding, Shuxue and Huang, Huawei and Tan, Zhiyuan and Li, Zhenni.
Block-Sparse Coding Based Machine Learning Approach for Dependable Device-Free Localization in IoT Environment, IEEE Internet of Things Journal 2020.
- [IEEE Transactions on Industrial Informatics] Hassan, Mohammad and Huda, Shamsul and Sharmeen, Shaila and Abawajy, Jemal and Fortino, Giancarlo.
An adaptive trust boundary protection for IIoT networks using deep-learning feature extraction based semi-supervised model, IEEE Transactions on Industrial Informatics 2020.
- [ISPASS] Moolchandani, Diksha and Gupta, Sudhanshu and Kumar, Anshul and Sarangi, Smruti R.
Performance Prediction for Multi-Application Concurrency on GPUs, ISPASS 2020.
- [ICS] Zhang, Shaoshuai and Shah, Ruchi and Wu, Panruo.
TensorSVM: accelerating kernel machines with tensor engine, ICS 2020.
- [TACO] Liou, Jhe-Yu and Wang, Xiaodong and Forrest, Stephanie and Wu, Carole-Jean.
GEVO: GPU Code Optimization Using Evolutionary Computation, TACO 2020.
- [BMC bioinformatics] Liou, Jhe-Yu and Wang, Xiaodong and Forrest, Stephanie and Wu, Carole-Jean.
CRISPRpred (SEQ): a sequence-based method for sgRNA on target activity prediction using traditional machine learning, BMC bioinformatics 2020.
- [AICAS] Wang, Shu and Hu, Yuhuang and Burgués, Javier and Marco, Santiago and Liu, Shih-Chii.
Prediction of gas concentration using gated recurrent neural networks, AICAS 2020.
- [Briefings in Bioinformatics] Zhang, Zhao-Yue and Yang, Yu-He and Ding, Hui and Wang, Dong and Chen, Wei and Lin, Hao.
Design powerful predictor for mRNA subcellular location prediction in Homo sapiens, Briefings in Bioinformatics 2020.
- [Pattern Recognition] Cao, Yun-Hao and Wu, Jianxin and Wang, Hanchen and Lasenby, Joan.
Neural Random Subspace, Pattern Recognition 2020.
Educational Adoptions
- [Book] Tayyaba Azim, Sarah Ahmed, Springer.
Composing Fisher Kernels from Deep Neural Models: A Practitioner's Approach,
23 Aug 2018.
- [Book] Editor IJSMI, International Journal of Statistics and Medical Informatics.
Python programming for Data Scientists,
15 Nov 2019.
- [Book] Academic Press.
Energy Efficiency in Data Centers and Clouds,
28 Jan 2016.
- [Course] National University of Singapore.
CS6284 Topics in Computer Science: Big Data Meets New Hardware
- [Course] Eidgenössische Technische Hochschule Zürich.
Cache-Conscious and Cache-Oblivious Database Algorithms, Spring 2017, Instructors: Michael Böhlen and Przemyslaw Uznanski
- [Tutorial] González, Sergio and García, Salvador and Del Ser, Javier and Rokach, Lior and Herrera, Francisco.
A practical tutorial on bagging and boosting based ensembles for machine learning: Algorithms, software tools, performance study, practical perspectives and opportunities, Information Fusion 2020
Media Coverage
- [News] ThunderGBM: Fast GBDTs and Random Forests on GPUs, 04 Mar 2019.
(r/MachineLearning with 1.7m members on Reddit)
Screenshot on Reddit
- [News] ThunderSVM: A Fast SVM Library on GPUs and CPUs, 29 Dec 2017.
(r/MachineLearning with 1.7m members on Reddit)
Screenshot on Reddit
- [News] Hacker News, ThunderSVM: A Fast SVM Library on GPUs and CPUs, 29 Dec 2017.
Screenshot on Hacker News @newsycombinator (203K Followers on Twitter)
- [News] Headlines in Packthub, ThunderSVM: A Fast SVM Library on GPUs and CPUs, 29 Dec 2017.
Screenshot on Packthub @PacktPub (28.6K Followers on Twitter)
- [News] Archlinux, 05 Oct 2018.
Screenshot on Arch Linux
- [News] Linear SVM kernel, 21 Mar 2019.
Screenshot on Kaggle
- [News] ThunderGBM: A Gradient Boosted Decision Tree as Fast as Lightning (ThunderGBM:快成一道闪电的梯度提升决策树), 03 Jun 2019. This post has attracted thousands of reposts and views.
Screenshot on jiqizhixin (机器之心), Screenshot on Baidu, Google
- [News] 机器学习人工学2017/12/31, 31 Dec 2017.
Screenshot on Tencent
- [Blog] GPU (dual 1080 Ti) SVM: Using ThunderSVM, 01 Jan 2019.
Screenshot on CSDN
- [News] ThunderSVM and ThunderGBM on PyPI, 13 Mar 2020 and 2 May 2020.
Screenshot (ThunderSVM) Screenshot (ThunderGBM)
- [Blog] Introduction to ThunderSVM: A Fast SVM Library on GPUs and CPUs, 24 Aug 2020.
Screenshot on Medium
- [Blog] How to Install and Run ThunderSVM in Google Colab, Aug 2020.
Screenshot on Reddit
- [News] Speeding Up SVM by 120X and more!, Sep 2020.
(r/MachineLearning with 1.7m members on Reddit)
Screenshot on Reddit
- [News] Most Useful C/C++ ML Libraries Every Data Scientist Should Know, 23 Sep 2020.
Screenshot on Predict the Future
- [Blog] A non-exhaustive list of SVM solvers, 02 Oct 2020.
Screenshot
- [News] Associate Professor He Bingsheng wins IEEE TPDS 2019 Best Paper award, NUS News, 09 Dec 2020.
Screenshot on NUS News