Syllabus

MA22004/MA52008 Syllabus (AY2025-2026)

Module Information

  • Duration: 11 weeks (10 lecture weeks)
  • Contact Hours: 2 lecture hours and 2 tutorial hours per week, plus 5 hours of self-paced labs
  • Software: R (including RMarkdown/Quarto for reproducible report writing)

Aims

This module aims to provide students with the skills and knowledge required to analyse data and make statistical inferences using sample data. Students will develop a solid foundation in:

  • Exploratory data analysis
  • Sampling distributions
  • Hypothesis testing
  • Analysis of variance
  • Regression modeling

Throughout the module, students will use R for data visualization, statistical modeling, hypothesis testing, and creating reproducible reports.

Indicative Content

Data Analysis and Exploration

  • Exploratory data analysis (EDA) and visualization using R
  • Data characteristics, summary statistics, and descriptive measures
  • Correlation measures: Pearson, Spearman, Kendall
  • Principal Component Analysis (PCA) for dimensionality reduction
  • Reproducible research: RMarkdown/Quarto
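The three correlation measures listed above respond to different things: Pearson's r measures linear association, while Spearman's rho and Kendall's tau depend only on ranks and so capture any monotone relationship. The module works in R, where `cor(x, y, method = "spearman")` or `method = "kendall"` gives these directly. The minimal Python sketch below (standard library only) spells out the definitions instead. The function names are illustrative, and Kendall's coefficient is computed as tau-a, i.e. without a correction for ties.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson's r: sum of cross-deviations scaled by sqrt(Sxx * Syy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(v):
    """Ranks 1..n; tied values share the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with v[order[i]].
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            r[order[k]] = avg_rank
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / number of pairs."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tied
    return s / (n * (n - 1) / 2)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [v ** 2 for v in x]  # monotone but not linear
    print(round(pearson(x, y), 4), spearman(x, y), kendall_tau_a(x, y))
```

For x = 1..5 and y = x², every pair is concordant, so Spearman's rho and Kendall's tau are both 1, while Pearson's r is about 0.98 because the relationship is monotone but not linear.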

Sampling Distributions and Estimation

  • Sample statistics and their distributions
  • Properties of sampling distributions (mean, variance, shape)
  • Applications of the Central Limit Theorem (CLT)
  • Random sampling methods and bias considerations
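A quick simulation makes the sampling-distribution properties above concrete. The sketch below is written in Python (standard library only) rather than R; the helper names and the choice of a skewed Exponential(1) population are assumptions for illustration. It draws many samples of size n and checks that the sample means are centred at the population mean 1, have variance close to σ²/n = 1/n, and fall within 1.96 standard errors of 1 roughly 95% of the time, as the CLT's normal approximation predicts.

```python
import random
from statistics import fmean, pvariance


def sample_means(n, reps, rate=1.0, seed=1):
    """Means of `reps` independent samples of size n from Exponential(rate)."""
    rng = random.Random(seed)
    return [fmean([rng.expovariate(rate) for _ in range(n)]) for _ in range(reps)]


def coverage(means, mu, se, z=1.96):
    """Share of sample means lying within z standard errors of mu."""
    return sum(abs(m - mu) <= z * se for m in means) / len(means)


if __name__ == "__main__":
    n, reps = 30, 5000
    means = sample_means(n, reps)
    # Exponential(1) population: mean 1, variance 1, strongly right-skewed.
    # Theory: E[Xbar] = 1, Var(Xbar) = 1/n, and Xbar is approximately normal.
    print("mean of sample means:    ", round(fmean(means), 3))
    print("variance of sample means:", round(pvariance(means), 4), "theory:", round(1 / n, 4))
    print("share within 1.96 SE:    ", coverage(means, 1.0, (1 / n) ** 0.5))
```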

Statistical Inference

  • Point estimators and their properties
  • Confidence intervals for:
    • Means (Normal and t)
    • Proportions and rates (Binomial and Poisson)
    • Variances (Chi-square and F)
  • One and two-sample inference (including paired data and bootstrapping)
  • Hypothesis testing framework:
    • Null/alternative hypotheses, significance levels
    • Type I & II errors, sensitivity, specificity, likelihood ratio, power of a test
  • Goodness-of-fit tests: one-way and two-way (contingency table) chi-square tests
  • One-way analysis of variance (ANOVA)
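The hypothesis-testing terms above (significance level, Type I and II errors, power) fit together in a single formula. Assume, for simplicity, a normal population with known σ and a one-sided z-test of H0: μ = μ0 against H1: μ > μ0. H0 is rejected when the standardised sample mean exceeds z_{1-α}, and the power at a true mean μ1 is 1 - Φ(z_{1-α} - (μ1 - μ0)√n/σ). The Python sketch below evaluates this with the standard library's `statistics.NormalDist`; the function name is hypothetical. The t-test version needs the noncentral t distribution, which R's `power.t.test` handles.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the one-sided z-test of H0: mu = mu0 against H1: mu > mu0.

    Assumes a normal population with known sigma. H0 is rejected when
    (xbar - mu0) / (sigma / sqrt(n)) > z_crit, where z_crit = z_{1-alpha}.
    If the true mean is mu1, that statistic is normal with mean
    (mu1 - mu0) * sqrt(n) / sigma and variance 1.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    shift = (mu1 - mu0) * n ** 0.5 / sigma
    return 1 - STD_NORMAL.cdf(z_crit - shift)


if __name__ == "__main__":
    alpha = 0.05
    print("Type I error (mu1 = mu0):", round(z_test_power(0, 0, 1, 25, alpha), 4))
    power = z_test_power(0, 0.5, 1, 25, alpha)
    print("power:", round(power, 4), "Type II error beta:", round(1 - power, 4))
```

With μ0 = 0, μ1 = 0.5, σ = 1 and n = 25, the power is about 0.80, so β is about 0.20. Setting μ1 = μ0 returns α, the Type I error rate.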

Linear Regression Models

  • Response and explanatory variables
  • Simple and multiple linear regression models
  • Least squares estimation and interpretation of coefficients
  • Assessing model adequacy: Residual analysis and diagnostic tests
  • Use of R to fit a regression model and interpret output
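For the simple linear model y = β0 + β1x + ε, least squares gives closed-form estimates b1 = Sxy/Sxx and b0 = ȳ - b1x̄. The residuals (observed minus fitted values) are the raw material for the diagnostic checks listed above. The module fits these models with R's `lm()`. The Python sketch below (standard library only, illustrative function names) computes the same quantities by hand for the one-predictor case.

```python
from statistics import fmean


def least_squares(x, y):
    """Closed-form OLS fit of y = b0 + b1 * x: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar."""
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


def residuals(x, y, b0, b1):
    """Observed minus fitted values, the basis of residual diagnostics."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def r_squared(y, res):
    """Share of the variation in y about its mean explained by the fit."""
    ybar = fmean(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    sse = sum(r ** 2 for r in res)
    return 1 - sse / sst


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up illustrative data
    b0, b1 = least_squares(x, y)
    res = residuals(x, y, b0, b1)
    print("b0 =", round(b0, 3), "b1 =", round(b1, 3))
    print("residuals:", [round(r, 3) for r in res])
    print("R^2 =", round(r_squared(y, res), 4))
```

With an intercept in the model, the residuals always sum to zero, so a residual plot is judged by its pattern (curvature, changing spread) rather than its average. In R, `summary(lm(y ~ x))` reports the same coefficients with standard errors and R², and calling `plot()` on the fitted model draws the standard diagnostic plots.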

Computational Tools and Reproducibility

  • Use of R for:
    • Bootstrapping and permutation tests
    • t-tests, ANOVA, goodness-of-fit tests
    • Linear regression analysis
  • Reproducible reporting using RMarkdown/Quarto
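Bootstrapping and permutation tests replace distributional formulas with resampling. The bootstrap resamples the observed data with replacement to approximate the sampling distribution of a statistic; a permutation test shuffles group labels to build the null distribution of a difference. The Python sketch below (standard library only) implements a percentile bootstrap interval for a mean and a two-sided permutation test for a difference in means. The function names, the percentile method and the replication counts are assumptions for illustration, not the module's R code.

```python
import random
from statistics import fmean


def bootstrap_ci(data, reps=5000, level=0.95, seed=1):
    """Percentile bootstrap interval for the mean.

    Resample the data with replacement, recompute the mean each time, and
    read off the empirical (1 - level)/2 and (1 + level)/2 quantiles.
    """
    rng = random.Random(seed)
    n = len(data)
    boot = sorted(fmean(rng.choices(data, k=n)) for _ in range(reps))
    lower = boot[int((1 - level) / 2 * reps)]
    upper = boot[int((1 + level) / 2 * reps) - 1]
    return lower, upper


def permutation_test(a, b, reps=5000, seed=1):
    """Two-sided Monte Carlo permutation p-value for a difference in means.

    Under H0 the group labels are exchangeable, so shuffle the pooled data,
    split it into groups of the original sizes, and count how often the
    shuffled |difference| is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:len(a)]) - fmean(pooled[len(a):]))
        hits += diff >= observed
    return (hits + 1) / (reps + 1)


if __name__ == "__main__":
    sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 3.0]  # made-up data
    print("95% bootstrap CI for the mean:", bootstrap_ci(sample))
    print("permutation p-value:", permutation_test([1, 2, 3, 4, 5], [11, 12, 13, 14, 15]))
```

The `+ 1` in the permutation p-value counts the observed labelling as one of the permutations, which keeps the Monte Carlo estimate from ever being exactly zero.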

Intended Learning Outcomes

By the end of the module, students will be able to:

  • Explain how properties of populations relate to sample data and describe appropriate sampling techniques.
  • Use R to perform exploratory data analysis, generate summary statistics, and visualize data effectively.
  • Interpret correlation coefficients and apply Principal Component Analysis (PCA) for high-dimensional data analysis.
  • Construct and interpret confidence intervals for means, variances, proportions, and rates in one and two-sample situations.
  • Perform hypothesis tests, including goodness-of-fit tests, ANOVA, and regression model assessment.
  • Develop and assess simple and multiple linear regression models, including diagnostic testing.
  • Evaluate the reliability of statistical models and interpret the power of a test in an inference setting.
  • Engage in mathematical and statistical dialogue.
  • Use R to conduct analyses, test statistical models, and create reproducible research reports.

Lecture Plan

Week | Topic | Demo
-----|-------|-----
1 | Orientation, EDA, Correlation, PCA, Reproducibility | Icebreaker, R PCA
2 | Probability Distributions, Sampling Distributions, CLT | Candy sampling
3 | Estimation, CI, Hypothesis Testing, Errors & Power | Why 0.05?
4 | One-Sample Inference (CI & Tests for Means, Proportions, Variances) | How much water?
5 | Two-Sample Inference (Unpaired Cases) | UN & Africa
6 | Two-Sample Inference (Paired vs. Unpaired, Variance Testing) | Tennis ball challenge
7 | One-Way ANOVA, Goodness-of-Fit Tests | R aov
8 | Simple & Multiple Regression, LINE Assumptions | R lm
9 | Regression Inference, Model Diagnostics, Chi-square | R lm, Chi-square
10 | Quality Control, Bias Correction, Final Review | R 3-sigma control charts
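Week 10's 3-sigma control charts flag observations that fall more than three standard deviations from the process centre. The sketch below assumes an individuals (I) chart, one of several chart types, in which σ is estimated from the average moving range divided by d2 = 1.128 (the constant for ranges of two consecutive observations). The lecture demo, which uses R, may use a different chart or σ estimate; this Python version and its function names are illustrative only.

```python
from statistics import fmean


def individuals_limits(baseline):
    """Lower limit, centre line and upper limit for an individuals (I) chart.

    Sigma is estimated as the average moving range of consecutive
    observations divided by d2 = 1.128; limits are centre +/- 3 * sigma.
    """
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma_hat = fmean(moving_ranges) / 1.128
    centre = fmean(baseline)
    return centre - 3 * sigma_hat, centre, centre + 3 * sigma_hat


def out_of_control(values, lcl, ucl):
    """Indices of points falling outside the control limits."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


if __name__ == "__main__":
    baseline = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.3, 9.9]  # made-up in-control data
    lcl, centre, ucl = individuals_limits(baseline)
    print("LCL, centre, UCL:", round(lcl, 3), round(centre, 3), round(ucl, 3))
    print("flagged indices:", out_of_control([10.1, 9.7, 11.2, 10.0], lcl, ucl))
```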

Assessment & Coursework

See MyDundee material.

Software and Resources

Appendix: Mapping to CS1 Actuarial Statistics Syllabus

This appendix maps each CS1 syllabus topic to the week(s) of the module in which it is covered.

CS1 Section | Topics Covered | Week(s)
------------|----------------|--------
1.1.1 | Aims of data analysis | 1
1.1.2 | Stages of data analysis and suitable tools | 1
1.1.3 | Sources of data and their characteristics | 1
1.1.4 | Reproducible research methods | 1, 10
1.2.1 | Summary statistics and exploratory visualizations | 1
1.2.2 | Correlation measures, including Pearson's, Spearman's, and Kendall's coefficients | 1
1.2.3 | Principal Component Analysis (PCA) for dimensionality reduction | 1
2.6.1 | Sample statistics and their distributions | 2
2.6.2 | Properties of sampling distributions (mean, variance, shape) | 2
2.6.3 | Applications of the Central Limit Theorem | 2
2.6.4 | Random sampling methods and bias considerations | 2
3.2.1 | Point estimators and their properties | 3
3.2.2 | Confidence intervals for means (Normal and t), proportions and rates (Binomial and Poisson), and variances (Chi-square and F) | 3, 4, 5, 6
3.2.3 | Confidence intervals in one and two-sample situations, including paired data and bootstrapping | 3, 4, 5, 6
3.3.1 | Hypothesis testing framework, including null/alternative hypotheses, significance levels, errors, and power of a test | 3
3.3.2 | Hypothesis tests for one and two-sample situations | 4, 5, 6
3.3.3 | Goodness-of-fit tests, including one-way and two-way chi-square tests | 7, 9
4.1.1 | Simple and multiple linear regression models | 8
4.1.2 | Assessing model adequacy, including residual analysis and diagnostic tests | 9
4.1.3 | Use of R to fit a linear regression model and interpret output | 8, 9