Over the last few years, there has been renewed interest in (big-data) systems on emerging hardware. Emerging big-data processing systems raise opportunities and challenges at different scales, from a single machine to thousands of machines. The need to utilize computing resources effectively has driven new technologies and research directions, from conventional ones (e.g., cluster computing, in-memory computing) to more recent ones (e.g., GPGPU, many-core processors, and NVRAM). This module will introduce students to the architecture, performance optimization, and design of big data systems on various emerging high-performance computing hardware, including many-core processors and accelerators.
This module will introduce graduate students to this consequential topic: it aims to (i) enable students to identify fundamental research issues in system design and architecture development and (ii) equip them with core algorithmic/computational methodology to embrace performance optimizations and hardware acceleration in big data systems. More generally, we hope the module will encourage students to reason more broadly about their own research ideas/topics from the perspective of hardware-software co-design.
This is a research-oriented course on big data systems, which will cover both hardware and software aspects. Students will read and present research papers, participate in class discussions, and complete a semester-long research project. Class time will consist of lectures, student presentations, and group project presentations. Familiarity with database systems and computer architecture, and programming in C/C++, will be assumed.
Grading
The workload consists of weekly paper reviews, paper presentations, class participation, and a research project. The grading breakdown is as follows.
Grading Breakdown |
Class Participation | 5% |
Paper Reviews | 25% |
Paper Presentations | 15% |
Research Project | 55% |
Student Submissions
Please submit your assignments to CS6284 in LumiNUS.
Lectures
In each lecture, we will study two research papers listed under "Required Reading".
Paper Presentation Guidelines: Each paper presentation should consist of the following components.
1) a presentation of the paper (~30 minutes). Beyond the paper itself, consider including relevant background and related work so that others can fully understand your presentation.
2) a summary of the paper reviews, including yours and those of other students (5 minutes).
3) several questions for discussion with other students (5-10 minutes).
The speaker will be expected to lead the discussion on the paper, and keep the time within 35 minutes per paper at most.
Paper presentation task allocation will be performed when the class enrollment is fixed (around Week 2).
Paper Reviews
Prior to each lecture, you are expected to read the papers under "Required Reading" in the schedule for that lecture.
The length of the review should be 3-4 paragraphs. The review should follow this format:
1) Paper summary: the problem the paper is trying to solve, why it is important, the main ideas proposed, and the results obtained.
2) Pros: the strengths of the paper (listed as S1, S2, S3, ...)
3) Cons: the weaknesses of the paper (listed as W1, W2, W3, ...)
4) Discussion: any ideas you have for improving the paper, any new problem or direction that it inspires, any point that you learnt and find useful for your own research, how the paper relates to the state of the art, any questions that you plan to ask, etc.
You are required to submit paper reviews for any FIVE papers that will be presented in the lectures (except the ones that you present). You can submit up to 8 reviews, and we will mark all of them. Your marks will be calculated from the top 5.
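As a rough illustration of the "top 5 of up to 8" rule, here is a minimal sketch. The function name `review_mark`, the 0-10 score scale, the use of a plain average, and the treatment of missing reviews as zero are all assumptions made for the example; the actual marking scheme is not specified here.

```python
def review_mark(scores, counted=5, max_submissions=8):
    """Average of the best `counted` review scores (hypothetical scheme).

    Assumes each review is scored on a 0-10 scale and that fewer than
    `counted` submissions leave the missing slots at zero.
    """
    if len(scores) > max_submissions:
        raise ValueError(f"at most {max_submissions} reviews may be submitted")
    # Keep only the highest-scoring reviews, up to `counted` of them.
    best = sorted(scores, reverse=True)[:counted]
    return sum(best) / counted


if __name__ == "__main__":
    # Seven reviews submitted; only the best five (10, 9, 9, 8, 7) count.
    print(review_mark([7, 9, 6, 8, 10, 5, 9]))
```

Under these assumptions, submitting extra reviews can only raise the result, since lower scores are dropped.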
Paper review task allocation will be performed when the class enrollment is fixed (around Week 2).
Paper Review Submission Guidelines: Please submit one PDF file for each paper review, with the file name in this format: [student matric no.]-[week]-[paper title].pdf (for example, A000111222-Week 1-Main-Memory Hash Joins on Multi-Core CPUs: Tuning to the Underlying Hardware.pdf).
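If you prefer to generate the file name rather than type it, the format above can be sketched as follows. The helper name `review_filename` and its parameters are hypothetical; it simply joins the three parts with hyphens, as in the example file name.

```python
def review_filename(matric_no, week, paper_title):
    """Build a review file name as [matric no.]-[week]-[paper title].pdf.

    Hypothetical helper: `week` is the week number, rendered as "Week N"
    to match the example in the guidelines.
    """
    return f"{matric_no}-Week {week}-{paper_title}.pdf"


if __name__ == "__main__":
    title = ("Main-Memory Hash Joins on Multi-Core CPUs: "
             "Tuning to the Underlying Hardware")
    print(review_filename("A000111222", 1, title))
```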
The paper reviews will be submitted to the student submission folder in LumiNUS. The paper reviews will be made visible after each submission deadline, and you are encouraged to read other reviews to improve your understanding and to prepare for the class discussion.
Submission deadline: 11:59pm on the Sunday before the paper is presented.
Research Project
A large part of the work in this course is proposing and completing an open-ended project with research challenges.
The project will be done in groups of 2-3 students.
More details can be found in the project guideline in LumiNUS.
Q&A Forums
We will be using the forums in LumiNUS for idea exchange and Q&A.
Computing Resources
For GPU/FPGA, one option is to use cloud resources such as Amazon EC2 and AliCloud; the other option is to use resources from SoC.
Talk to me if you need help.
Useful Resources
Computer Architecture: A Quantitative Approach by John Hennessy, David Patterson
Programming Massively Parallel Processors: A Hands-on Approach by David B. Kirk and Wen-mei W. Hwu
A list of papers related to big data systems on new hardware
Acknowledgement
I adapted the design of the web page and module from Dr. Julian Shun's "6.886 Algorithm Engineering Spring 2019".
The course materials (such as the reading list) are presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by the authors or by other copyright holders (e.g., ACM and IEEE).