School of Computing
Department of Computer Science
CS3223: Database Systems
Implementation
Semester 2, 2022/2023
[Announcements] [Instructor] [Course Objectives] [Lecture Schedule] [Reference Texts and Materials] [Assignments (Tutorials & Gradiance] [Project] [Assessment]
This project will focus primarily on query processing (and related topics, e.g., parser, indexing, etc) for a single user. [We will not consider disk/memory management, transaction management (concurrency control and logging) and failures (recovery management), and we will also not consider multi-user setting.] You will gain a feel for how query processing works in a real system. You will also hopefully see how different query execution trees have different performance results, which will provide some motivation for query optimization. (Note that the differences in your simple query processing system will probably be minor, because you will be using toy data sets and an execution system that ignores many of the complex aspects of a real system.)
This project will be a three-person group project, so you should form your team ASAP. Each group will do the followings structured in the form of weekly labs/assignments. Note that in the following assignments, detail guidelines have been provided for Lab 0 to Lab 4. From Lab 5 onwards, it is as stated (no guidelines given).
1. Week 1 Lab 0: Set up SimpleDB & create a student database
2. Week 2 Lab 1: Support for non-equality predicates (Due: 28 Jan 2022)
3. Week 3 Lab 2: Support for hash index and B+-tree index (Due: 7 Feb 2022)
4. Week 4 Lab 3: Support for order by clause and sorting (14 Feb 2022)
5. Week 5 Lab 4: Support for nested-loops join, sort-merge join and index-based join (21 Feb 2022)
6. Week 6 Lab 5: Support for partition-based (or hash) join, and aggregates (SUM, COUNT, AGV, MIN, MAX) with/without group by clause (using sort-based implementation of group by operator)
7. Recess week
8. Week 7 Lab 6: Support for DISTINCT and displaying the query plan
9. Week 8 Lab 7: Integrate all features into SimpleDB+
10. Week 9 Submit report/code and arrange for demo (Due: 18 Mar 2022)
11. Week 10 Demo week
Extensions (Bonus)
For those who want something more challenging, you can try the following extensions to SimpleDB (for those who do not do the following, you will not be penalized, but bonus marks will be given to those who do it, i.e., you can get above 100% of the project marks! though eventually the maximum mark for the project is 100% of the project marks). The list is non-exhaustive; if you have other thoughts, please check with Prof Tan first.
1. Implement another data type besides integer and string, e.g. float, double, time
2. Implement another query optimizer
3. Implement non-unique attribute names, i.e., support for (relation, attribute)-pair in queries (e.g., t.a and t.a1=s.a1)
4. Implement multi-way sort (and solve the k>2 bugs in the sort code for join)!
5. Implement set operations (e.g., union, intersection)
6. Implement error-checking for queries, i.e., validate that queries are correct, tables exist, attributes defined, etc.
Note that these operations may require you to make changes to the parser. Demonstrate with sample queries that your extensions work.
For your final report for SimpleDB+, present results (running times) of the following experiments on your student database.
Experiment
1
Run a 2-table join query. Record the time to perform the join under each join algorithm.
Experiment
2
Run a 4-table join query.
Your team should try out three different execution plans for this query (by
restricting the join methods supported by the optimizer) and execute them with
your different join algorithms.
Now do a combined write-up of the results your group obtained.
In your experimental write-up, include the timings, the plans used, and a discussion of the differences (if any) between timings on different algorithms and different plans. Keep your report brief (no more than 4 pages).
Create a folder containing 3 files:
1. The source code of your integrated SimpleDB+
2. The experimental write-up (in PDF or Word)
3. The list of changes that you have made to the original code base. Although you have submitted some of these in earlier labs, you should include everything here. No need to integrate them into a nice table. Just consolidate them under Lab 1, Lab 2, etc.
Zip the folder and submit to the project folder in LumiNUS.
Demo
You are required to demonstrate your project. Please book a timeslot with Professor Tan for your demo. You will be given 20 minutes to demo your project.
Project Grading
Marks will be awarded for the final SimpleDB+ based on the following criteria:
For
each lab, 1 mark (out of the CA marks) will be deducted for late submission
(within 48 hours), and 2 marks for no submission (beyond 48 hours). For the
final submission, 3x marks will be deducted for late submission (within 24x
hours).
NO CHEATING/PLAGIARISM! Following NUS policy, any one or team found cheating will receive an F grade for the entire module, and will be reported to the faculty/university for disciplinary action.