Transistor counts continue to increase for silicon devices, following Moore’s Law. But the
failure of Dennard scaling has brought the computing community to a crossroads
where power has become the major limiting factor. Thus future chips can have
many cores, but only a fraction of them can be switched on at any point in
time. This dark silicon era, where a significant fraction of the chip real estate
remains dark, has necessitated a fundamental rethinking of architectural
designs. In this context, heterogeneous multi-core architectures that combine
processing cores differing in functionality and performance (CPU, GPU,
special-purpose accelerators, and reconfigurable computing) offer a promising
option. Heterogeneous multi-cores can potentially provide energy-efficient
computation as only the cores most suitable for the current computation need to
be switched on.
However, a complex heterogeneous multi-core presents daunting challenges from a
programming point of view. At present, each computing element (CPU, GPU,
reconfigurable fabric) follows its own programming model, and
these divergent programming models are the biggest obstacle to the wider
long-term acceptance of heterogeneous multi-core systems. The emergence of open
parallel programming standards for heterogeneous computing systems such as
OpenCL is a welcome step in the right direction. OpenCL programs are
portable across CPUs, GPUs, and FPGAs. There has been some preliminary work on
compilation and runtime support for OpenCL on different kinds of computing
cores. But unified software support for heterogeneous multi-cores remains
largely unexplored.
The goal of this project is transparent partitioning, mapping, and execution of a
complete application on a heterogeneous multi-core, utilizing all its resources and
starting from a single high-level specification such as OpenCL. This project
takes a two-pronged approach towards improving the software support for
heterogeneous multi-core architectures. First, we will automate the application
partitioning and mapping process. Second, we will orchestrate the execution of
the different cores so as to provide the best performance under a tight power budget.
We believe that automated partitioning and mapping of applications on heterogeneous
multi-cores, along with runtime support for power management, will significantly
improve programmability and lead to greater acceptance of such platforms.
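As a rough illustration of the mapping problem described above, and not the project's actual method, the following minimal sketch assigns each kernel of an application to the fastest core type whose power draw fits a given budget. It assumes the kernels run one after another with only the active core switched on, so the budget applies to one core at a time. The kernel names, the `PROFILES` table, and all timing and power numbers are hypothetical.

```python
# Hypothetical profiles: kernel -> core type -> (time in ms, power in W).
# All numbers are invented for illustration only.
PROFILES = {
    "filter": {"cpu": (40.0, 2.0), "gpu": (10.0, 4.0), "fpga": (15.0, 1.5)},
    "reduce": {"cpu": (20.0, 2.0), "gpu": (12.0, 4.0), "fpga": (30.0, 1.5)},
    "encode": {"cpu": (25.0, 2.0), "gpu": (30.0, 4.0), "fpga": (8.0, 1.5)},
}


def map_kernels(profiles, power_budget_w):
    """Map each kernel to the fastest core whose power fits the budget.

    Assumption: kernels execute sequentially and only the active core is
    powered, so each kernel's core is checked against the budget on its own.
    """
    mapping = {}
    for kernel, options in profiles.items():
        # Keep only the core types that respect the power budget.
        feasible = {
            core: time_power
            for core, time_power in options.items()
            if time_power[1] <= power_budget_w
        }
        if not feasible:
            raise ValueError(f"no core fits the power budget for {kernel!r}")
        # Among feasible cores, pick the one with the shortest execution time.
        mapping[kernel] = min(feasible, key=lambda core: feasible[core][0])
    return mapping


def total_time_ms(profiles, mapping):
    """Total execution time of the sequential schedule under a mapping."""
    return sum(profiles[kernel][core][0] for kernel, core in mapping.items())


if __name__ == "__main__":
    for budget in (5.0, 2.5):
        mapping = map_kernels(PROFILES, budget)
        print(budget, mapping, total_time_ms(PROFILES, mapping))
```

With these made-up numbers, tightening the budget moves kernels off the GPU and onto the lower-power CPU and FPGA at the cost of a longer total run time. A realistic runtime would also have to handle concurrent execution, data-transfer costs, and profiles measured on real hardware, none of which this sketch models.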
Relevant Preliminary Publications:
[DAC] Lin-Analyzer: A High-level Performance Analysis Tool for FPGA-based Accelerators
Guanwen Zhong, Alok Prakash, Yun Liang, Tulika Mitra, Smail Niar
53rd ACM/IEEE Design Automation Conference, June 2016
[ICCD] Energy-Efficient Execution of Data-Parallel Applications on Heterogeneous Mobile Platforms
Alok Prakash, Siqi Wang, Alexandru Eugen Irimiea, Tulika Mitra
33rd IEEE International Conference on Computer Design, October 2015