
Description

HD and Dean's List extensive, complete course notes covering every topic and lecture for the whole term. Topics covered:
Week 1 (Introduction to Data Analytics): What is analytics, Banerjee, Bandyopadhyay and Acharya (2013) core reading (Vikalpa 38(4)), four types of analytics from the Gartner Report 2012 (descriptive, diagnostic, predictive, prescriptive) with complexity and frequency comparison table, three organisational use cases (dashboard, investigative, prescriptive scenario-builder), the analytics industry and India as a KPO hub, four required competencies for analytics practitioners (technical, business acumen, communication, domain knowledge), counter-views and limitations of analytics (greenfield limitation, black swan events, snob value adoption, creativity risk, employee threat perceptions, TomTom GPRS case), big data as a new frontier (volume, variety, velocity, unstructured data challenges).
Week 2 (Data Visualisation in R): Overview of R and RStudio, fundamental R functions for data loading and exploration (, nrow, ncol, subset, colSums, head, tail), weekly ice cream sales workshop dataset (52 rows, 26 weeks, Staff vs Student, 5 flavour groups), subsetting and aggregating data in R with worked code, bar chart types (vertical, horizontal, grouped, stacked) with key R arguments for each, Staff vs Student sales comparison across flavour groups (full worked numbers for 26 weeks), grouped bar chart R code with matrix, beside, and legend parameters, pie charts (definition, limitations, R code), boxplots (median, Q1, Q3, IQR, whiskers, outliers) with component definition and interpretation table, boxplot R code with par and las parameters, seasonal patterns in weekly sales data (student peak in April, staff peak in June, Week 13 outlier).
Week 3 (Descriptive Analytics): What is descriptive analytics and its role in the analytics pyramid, measures of central tendency table (mean, median, mode with definitions, strengths and weaknesses), measures of spread table (range, variance, standard deviation, IQR), data types and scales of measurement (nominal, ordinal, interval, ratio with examples and appropriate visualisations), exploratory data analysis (EDA) and John Tukey's philosophy (5-step EDA process), key distributions table (normal, right-skewed, left-skewed, bimodal with mean vs median relationship and examples), mosaic plots (two-variable structure, width and height encoding, Titanic survival example).
Week 4 (Predictive Analytics I, Linear Regression): What is predictive analytics and the two categories (regression vs classification), simple linear regression model formula (Y = β₀ + β₁X + ε), multiple linear regression model, interpretation of coefficients (continuous and dummy variable predictors), Ordinary Least Squares estimation and the Gauss-Markov assumptions (5 conditions), R-squared and adjusted R-squared, statistical significance (standard errors, t-statistic, p-values, 95 percent confidence intervals), causal vs predictive regression distinction and the limits of observational data (omitted variable bias, endogeneity, selection bias, reverse causality), difference-in-means as a special case of regression with a binary dummy predictor.
Week 5 (Predictive Analytics II, Classification): Classification vs regression comparison table (Y type, output, examples, methods), binary classification problem setup, four common classification examples (medical treatment, voting behaviour, customer churn, Titanic survival), classification trees (root node, internal nodes, branches, leaf nodes, key properties), apple vs pear toy example with decision rules, tree building in R using the rpart package (rpart, , fancyRpartPlot), Iris dataset classification tree with species rules, confusion matrix for binary classification (TP, FP, FN, TN with Type I and II errors), accuracy rate formula and class-specific accuracy formula, fruit classification worked confusion matrix ( percent accuracy), Titanic classification tree full worked analysis (training accuracy percent, test accuracy 80 percent, gender as dominant predictor), logistic regression model (log-odds formulation), probabilities vs odds vs log-odds table, interpreting logistic regression coefficients (positive and negative), Titanic logistic regression results table (10 predictors with estimates and p-values), classification tree vs logistic regression accuracy comparison for Titanic, dummy variable coding for categorical predictors.
Week 6 (Prescriptive Analytics and Decision Support): Prescriptive analytics as the highest level of the hierarchy (what should we do?), four tools and techniques (optimisation, Monte Carlo simulation, decision analysis, scenario builders), decision trees in a prescriptive context (decision nodes, chance nodes, terminal nodes, expected value criterion), connecting predictive outputs to prescriptive frameworks (customer churn and retention incentive example, loan default and credit allocation example).
Week 7 (Data Ethics): Two definitions of data ethics (personally identifiable information definition; Floridi and Cowls moral problems definition), deontological vs utilitarian ethics comparison table (origins, core focus, key idea, patient vs society orientation, keywords, weaknesses), ethics across the four analytics lifecycle stages table (collect, store, analyse, communicate; covering privacy, security, bias and transparency dimensions), data privacy definition and four dimensions (anonymity, pseudonymity, unobservability, unlinkability), five types of data bias (confirmation, outlier, selection, survivorship, historical) and the bias perpetuation cycle, algorithmic transparency and explainability, the accuracy vs interpretability trade-off, ethics vs law comparison table (meaning, objective, governed by, violation, binding nature), Charlie automated healthcare app case study (Type 2 diabetes AI app, privacy breach, security risk, training data bias, ACMA transparency issue, Benevolence Team initiatives: representative data, information inundation, Multi-Armed Bandit approach, utilitarian vs deontological analysis), Facebook and Amazon case study (social graph monetisation, deontological and utilitarian analysis, Target baby coupon case), John and Jane first home buyer case study (algorithmic credit rejection using non-financial data, transparency, privacy, fairness, discrimination concerns), Beauchamp and Childress's four principles of biomedical ethics (autonomy, beneficence, non-maleficence, justice), Outbreak film analysis by Tseng and Wang (2021) (McClintock utilitarian position, Daniels deontological position, COVID-19 parallels).
Week 8 (Causal Inference and Experimental Design): The fundamental problem of causal inference and the Average Treatment Effect (ATE), randomised controlled experiments as the gold standard (comparison table: randomised vs observational), randomisation requirements and what makes assignment truly random, balance checks and the Lewis and Reiley (2014) large-sample example, bank credit card field experiment case study (factorial design critique), Engineers Society membership renewal experiment case study (seniority vs even-odd year assignment as random), Lewis and Reiley (2014) Yahoo online advertising study (endogenous selection critique of Abraham 2008, intent-to-treat vs ATT), Chattopadhyay and Duflo (2004) women politicians and policy case study (India GP reservation, ATE estimation for irrigation vs water facilities), natural experiments (Kingsford Smith Airport runway closure and property prices), threats to validity in natural experiments (simultaneous changes, anticipation effects), difference-in-differences design.
Week 9 (Big Data, Data Visualisation Typology, Observational Research): Fiebig (2016) core reading (big data and patient-centred care), the data deluge and characteristics of big data (volume, variety, velocity, veracity), Fiebig's two structural arguments against big data as a paradigm shift (more data does not mean better data, risk of low-hanging fruit), Griliches's First Law of Data Analysis (all data misbehave), data munging and wrangling, why research design remains central even with big data, smart data concept (suitability over quantity), Fiebig's proposed path forward (combine sources, patient-reported outcomes, novel research designs, data custodian relationships), privacy access and confidentiality tensions, Berinato (2016) Good Charts core reading (Harvard Business Review Press), two foundational questions framework (conceptual vs data-driven; declarative vs exploratory), four types of visualisation table (idea illustration, idea generation, visual discovery, everyday dataviz with nature, purpose and characteristics), idea illustration in detail (pyramid example, metaphor serving the idea principle), everyday dataviz in detail (healthcare cost growth example, self-evident message principle), visual discovery (confirmatory vs exploratory comparison), observational studies and selection problems (Jenny Craig weight loss, class size and student performance, underquoting real estate Agent X).


UNSW

Term 2, 2025


40 pages

11,988 words

$34.00
