SDSC6009 - Machine Learning at Scale

Offering Academic Unit
School of Data Science
Credit Units
Course Duration
One Semester
Course Offering Term*
Semester B 2021/22

* The offering term is subject to change without prior notice
Course Aims

This course teaches the underlying principles required to develop scalable machine learning pipelines for structured and unstructured data at the petabyte scale. It covers the principles of scaling machine learning processes to big data using the MapReduce parallel computing model. It also discusses the hands-on design and development of machine learning algorithms in parallel computing environments (Spark). Students will apply MapReduce parallel computing frameworks to machine learning in industrial applications and deployments across various fields, including advertising, finance, healthcare, and search engines.

Assessment (indicative only; please check the detailed course information)

Continuous Assessment: 65%
Examination: 35%
Examination Duration: 2 hours
Detailed Course Information

