DATA301-25S1 (C) Semester One 2025

Big Data Computing and Systems

15 points

Details:
Start Date: Monday, 17 February 2025
End Date: Sunday, 22 June 2025
Withdrawal Dates
Last Day to withdraw from this course:
  • Without financial penalty (full fee refund): Sunday, 2 March 2025
  • Without academic penalty (including no fee refund): Sunday, 11 May 2025

Description

The course introduces distributed computational techniques, distributed algorithms and systems/programming support for large-scale processing of data.

Covid-19 Update: Please refer to the course page on AKO | Learn for all information about your course, including lectures, labs, tutorials and assessments.

This course teaches parallel and distributed programming, algorithms, and systems principles that are relevant for large-scale processing of big data sets on high performance computing clusters and cloud computing resources.

Learning Outcomes

1. Discuss the fundamentals of cloud computing systems (SaaS, PaaS, IaaS, storage and networking architectures, virtual machines and their management, job scheduling) [WA1, WA6, WA10]
2. Discuss different programming models for parallel and distributed computing (shared memory, shared-nothing / message-passing architectures) and common design patterns for distributed computations on big data sets (e.g. leader/follower, Map/Reduce, Gossiping) [WA2, WA6, WA10]
3. Discuss the drawbacks and advantages of different cloud solutions and distributed programming models and select appropriate solutions for a given situation [WA2, WA6, WA10]
4. Apply fundamental distributed algorithms (e.g. leader election, consensus) and their properties as well as selected specialized algorithms for distributed processing of big data (e.g. matrix algorithms in parallel / distributed environments, distributed optimisation) [WA1, WA3]
5. Design, implement and evaluate distributed processing programs for large data sets using appropriate software frameworks such as Dask, MPI, CUDA, Hadoop or Apache Spark [WA1, WA3, WA4, WA5]

University Graduate Attributes

This course will provide students with an opportunity to develop the Graduate Attributes specified below:

Employable, innovative and enterprising

Students will develop key skills and attributes sought by employers that can be used in a range of applications.

Prerequisites

Timetable 2025

Students must attend one activity from each section.

Lecture A
Activity  Day        Time           Location                          Weeks
01        Monday     14:00 - 15:00  Meremere 108 Lecture Theatre      17 Feb - 6 Apr; 28 Apr - 1 Jun

Lecture B
Activity  Day        Time           Location                          Weeks
01        Tuesday    08:00 - 09:00  Meremere 108 Lecture Theatre      17 Feb - 6 Apr; 28 Apr - 1 Jun

Computer Lab A
Activity  Day        Time           Location                          Weeks
01        Tuesday    11:00 - 13:00  Jack Erskine 136 Lab 4            17 Feb - 6 Apr; 28 Apr - 1 Jun
02        Wednesday  08:00 - 10:00  Jack Erskine 136 Lab 4            17 Feb - 6 Apr; 28 Apr - 1 Jun
03        Wednesday  14:00 - 16:00  Jack Erskine 136 Lab 4            17 Feb - 6 Apr; 28 Apr - 1 Jun
04        Thursday   10:00 - 12:00  Jack Erskine 136 Lab 4            17 Feb - 6 Apr; 28 Apr - 1 Jun

Course Coordinator

James Atlas

Textbooks / Resources

Recommended Reading

Blaise Barney; Introduction to Parallel Computing (and other tutorials); https://hpc.llnl.gov/training/tutorials.

CUDA; CUDA Toolkit Documentation; v10.0.130; CUDA Programming Guide: https://docs.nvidia.com/cuda.

Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman; Mining of Massive Datasets; 3rd ed.; Cambridge University Press, 2020; http://www.mmds.org.

Additional Course Outline Information

Academic integrity

You are encouraged to discuss the general aspects of a problem with others. However, anything you submit for credit must be entirely your own work and not copied, with or without modification, from any other person. If you share details of your work with anybody else then you are likely to be in breach of the University's General Course and Examination Regulations and/or Computer Regulations (both of which are set out in the University Calendar) and/or the Computer Science Department's policy (see section 9). The Department treats cases of dishonesty very seriously and, where appropriate, will not hesitate to notify the University Proctor.

If you need help with specific details relating to your work, or are not sure what you are allowed to do, then contact your tutors or lecturer for advice.

Assessment and grading system

Lab assessment - 30%

In the labs, students will practise the design and implementation of distributed algorithms and gain practical experience with contemporary Big Data and Cloud Computing frameworks such as Dask, Apache Spark, MPI, CUDA and Google Cloud / Amazon Web Services. This assessment item addresses LO2, LO4 and LO5.


Project - 40%

Students will complete a short, application-focused project, delivered as a series of artifacts. They will analyze a big data set, which requires them to design a data processing flow, write progress reports, implement and test an appropriate distributed algorithm in an appropriate software framework, critique their design, and communicate the design and analysis results professionally in a written report. This assessment item addresses LO3, LO5, LO6.


Final exam - 30%

The final exam provides a summative assessment of learning outcomes from the full semester. It can include theoretical aspects, algorithms, programming, and techniques covered in lectures and assignments. This assessment item addresses LO1, LO2, LO3 and LO4.

Grade moderation

The Computer Science department's grading policy states that in order to pass a course you must meet two requirements:
1. You must achieve an average grade of at least 50% over all assessment items.
2. You must achieve an average mark of at least 45% on invigilated assessment items.

If you satisfy both of these criteria, your grade will be determined by the following University-wide scale for converting marks to grades: an average mark of 50% is sufficient for a C- grade, an average mark of 55% earns a C grade, 60% earns a C+ grade, and so forth. However, if you do not satisfy both passing criteria, you will be given either a D or an E grade, depending on your marks. Marks are sometimes scaled to achieve consistency between courses from year to year.
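
The two passing requirements can be illustrated with a short Python sketch. This is an informal illustration only, not the Department's official calculation. It assumes that the "average" is weighted by the assessment percentages listed above (lab 30%, project 40%, final exam 30%) and that the final exam is the only invigilated item; neither assumption is stated in this outline. The names WEIGHTS, INVIGILATED, weighted_average and meets_pass_criteria are hypothetical.

```python
# Assessment weights in percent, taken from the assessment list in this outline.
WEIGHTS = {"lab": 30, "project": 40, "exam": 30}

# Assumption: only the final exam is invigilated (not stated in the outline).
INVIGILATED = ("exam",)


def weighted_average(marks, items):
    """Weighted average mark (0-100) over the given assessment items."""
    total_weight = sum(WEIGHTS[item] for item in items)
    return sum(WEIGHTS[item] * marks[item] for item in items) / total_weight


def meets_pass_criteria(marks):
    """Check both requirements: at least 50% overall, at least 45% invigilated."""
    overall = weighted_average(marks, WEIGHTS.keys())
    invigilated = weighted_average(marks, INVIGILATED)
    return overall >= 50 and invigilated >= 45


if __name__ == "__main__":
    # Strong coursework but a weak exam: the second requirement is not met.
    print(meets_pass_criteria({"lab": 80, "project": 70, "exam": 40}))
    # Both requirements met.
    print(meets_pass_criteria({"lab": 60, "project": 50, "exam": 45}))
```

Under these assumptions, a student with marks of 80, 70 and 40 averages 64% overall but does not pass, because the invigilated mark is below 45%.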

Students may apply for special consideration if their performance in an assessment is affected by extenuating circumstances beyond their control.

Applications for special consideration should be submitted via the Examinations Office website within five days of the assessment.

Where an extension may be granted for an assessment, this will be decided by direct application to the Department and an application to the Examinations Office may not be required.

Special consideration is not available for items worth less than 10% of the course.

Students who are prevented by extenuating circumstances from completing the course after the final date for withdrawing may apply for special consideration for late discontinuation of the course. Applications must be submitted to the Examinations Office within five days of the end of the main examination period for the semester.

Course Outline

The topics covered in lectures will generally follow this progression:

  • Introduction: Big Data
  • 5 Vs (Variety, Velocity, Volume, Veracity, Value)
  • Storage and networking architectures
  • Divide and Conquer, Map, Reduce, Map/Reduce functional programming and DataFrames in Dask
  • Algorithms in Dask: Group By, Union, Intersection, Difference, Matrix-Vector and Matrix-Matrix Multiplication
  • Systems: SaaS, PaaS, IaaS, Google Cloud / Amazon Web Services, storage and networking architectures, virtual machines and their management, job scheduling, cloud resources
  • Algorithms in Dask on cloud: Hashing, PageRank
  • Data Processing: Distributed Data Structures, Graphs, Leader Election, Consensus
  • Memory Hierarchy, Shared memory, Shared-nothing, distributed file systems, replication, communication cost, complexity theory
  • Programming: Message-Passing (MPI)
  • Programming: Threads, Locks and Atomics (CUDA)
  • Programming: Work Queues, Schedulers, Streaming
  • Heterogeneous Processing: Systems and Programming

Preparation

The course assumes that you are proficient in Python, as taught in COSC121, and in algorithm design and analysis, as taught in COSC262. If you are enrolling in DATA301 but haven't already passed COSC121 and COSC262 or the equivalents, you should consult the course supervisor before enrolling.

Indicative Fees

Domestic fee $894.00

International fee $4,100.00

* All fees are inclusive of NZ GST or any equivalent overseas tax, and do not include any programme level discount or additional course-related expenses.

For further information see Computer Science and Software Engineering.
