SDSC3001 - Big Data: The Arts and Science of Scaling

Offering Academic Unit
School of Data Science
Credit Units
Course Duration
One Semester
Course Offering Term*:
Semester A 2021/22
Semester A 2022/23 (Tentative)

* The offering term is subject to change without prior notice
Course Aims

This course teaches students how to tame the massive data sets that are used intensively in high-impact industrial applications. Students will learn two mainstream categories of technical solutions for big data: algorithmic approaches and systems approaches. For the algorithmic approaches, the course introduces popular streaming algorithms, such as heavy hitters and sketching algorithms, which are used when memory is limited. To deal with huge amounts of data, the instructor will also teach sampling-based algorithms, such as approximate counting, which tame big data by sampling a small, representative collection of the data. For the systems approaches, the instructor will introduce Spark, one of the most popular big data computing frameworks today. Spark topics include the MapReduce model, Spark RDDs, DataFrames, Datasets, Spark SQL and Spark ML.
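
As an illustration of the limited-memory streaming setting mentioned above, the following is a minimal sketch of the Misra-Gries algorithm, one classic approach to the heavy hitters problem. The course description does not state which heavy hitters algorithm is taught, so this choice is an assumption; the function name `misra_gries`, the parameter `k` and the sample stream are hypothetical and chosen only for illustration.

```python
def misra_gries(stream, k):
    """Return candidate heavy hitters using at most k - 1 counters.

    Any item occurring more than n / k times in a stream of length n
    is guaranteed to appear in the result. Each stored count
    underestimates the true count by at most n / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all, dropping those that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical sample stream: "a" occurs 5 times out of 9 items.
stream = ["a", "b", "a", "c", "a", "d", "a", "b", "a"]
candidates = misra_gries(stream, k=3)
print(candidates)
```

The memory used depends only on `k`, not on the length of the stream. The stored counts are lower bounds, so a second pass over the data would be needed to obtain exact frequencies for the candidates.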

Assessment (Indicative only, please check the detailed course information)

Continuous Assessment: 70%
Examination: 30%
Examination Duration: 2 hours
Detailed Course Information

