COMP9318

Name: (Family name) ____________________, (Given name) ____________________
Student ID: ____________________
THE UNIVERSITY OF NEW SOUTH WALES
Final Exam
COMP9318
Data Warehousing and Data Mining
SESSION 1, 2008
Time allowed: 10 minutes reading time + 3 hours
Total number of questions: 7 + 1
Total number of marks: 100 + 5 (Bonus)
Only UNSW exam calculators are allowed in this exam.
Answer all questions.
You can answer the questions in any order.
Start each question on a new page.
Answers must be written in ink.
Answer these questions in the script book provided.
Do not write your answer in this exam paper.
If you use more than one script book, fill in your details on the front of each book.
You may not take this question paper out of the exam.
SECTION A: Potpourri
Question 1 (20 marks)
Briefly answer the following questions in your script book:
(a) List at least three differences between OLAP and OLTP.
(b) List at least two algorithms discussed in the course that follow the divide-and-conquer paradigm.
(c) What is the confidence of the rule ∅ → A? (∅ stands for the empty set.)
(d) What is the confidence of the rule A → ∅? (∅ stands for the empty set.)
(e) What are the main differences between clustering and classification?
(f) Give an example of a distance function that satisfies the triangle inequality.
COMP9318 Page 1 of 9
SECTION B: Data Warehousing
Question 2 (10 marks)
Consider the following base cuboid Sales with four tuples and the aggregate function
SUM:
Location    Time  Item       Quantity
Sydney      2005  PS2            1400
Sydney      2006  PS2            1500
Sydney      2006  Wii             500
Melbourne   2005  XBox 360       1700
Location, Time, and Item are dimensions and Quantity is the measure. Suppose the
system has built-in support for the value ALL.
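The cube semantics that the parts below rely on can be sketched in Python. This is a minimal illustration under stated assumptions, not a model answer: the complete cube is treated as the union of the 2^3 group-bys (one per subset of Location, Time, Item), the string "ALL" stands in for the built-in ALL value, and the helper names `full_cube` and `iceberg` are hypothetical. Each cell keeps its tuple count next to SUM(Quantity) so that a HAVING COUNT(*) > 1 condition, as in the iceberg query of part (c), can be applied afterwards.

```python
from collections import defaultdict
from itertools import combinations

# Assumption: the built-in ALL value is modelled as the string "ALL".
ALL = "ALL"
DIMS = ("Location", "Time", "Item")

# Base cuboid Sales: (Location, Time, Item, Quantity)
SALES = [
    ("Sydney", 2005, "PS2", 1400),
    ("Sydney", 2006, "PS2", 1500),
    ("Sydney", 2006, "Wii", 500),
    ("Melbourne", 2005, "XBox 360", 1700),
]


def full_cube(rows):
    """Hypothetical helper: map every cell of every group-by to (SUM, COUNT)."""
    n = len(DIMS)
    cube = {}
    # One group-by (cuboid) per subset of the dimensions, from () up to all three.
    for k in range(n + 1):
        for kept in combinations(range(n), k):
            groups = defaultdict(lambda: [0, 0])  # cell -> [SUM(Quantity), COUNT(*)]
            for row in rows:
                # Dimensions not grouped on are rolled up to ALL.
                cell = tuple(row[i] if i in kept else ALL for i in range(n))
                groups[cell][0] += row[n]
                groups[cell][1] += 1
            for cell, (total, count) in groups.items():
                cube[cell] = (total, count)
    return cube


def iceberg(cube, min_count):
    """Hypothetical helper: keep cells satisfying HAVING COUNT(*) > min_count."""
    return {cell: total for cell, (total, count) in cube.items() if count > min_count}


cube = full_cube(SALES)
result = iceberg(cube, 1)
for cell, total in sorted(result.items(), key=str):
    print(*cell, total)
```

Enumerating dimension subsets with `itertools.combinations` follows the same union-of-GROUP-BYs decomposition that a CUBE BY can be rewritten into; the separate pass over the base tuples for each cuboid is chosen for clarity rather than efficiency.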
(a) How many tuples are there in the complete data cube of Sales?
(b) Write down an equivalent SQL statement that computes the same result (i.e., the
cube). You can only use standard SQL constructs, i.e., no CUBE BY clause.
(c) Consider the following iceberg cube query:
SELECT Location, Time, Item, SUM(Quantity)
FROM Sales
CUBE BY Location, Time, Item
HAVING COUNT(*) > 1
Draw the result of the query in a tabular form.
(d) Assume that we adopt a MOLAP architecture to store the full data cube of Sales, with
the following mapping functions:
fLocation(x) =