Syllabus

MA22004/MA52008 Syllabus (AY2025-2026)

Module Information

Duration: 11 weeks (10 lecture weeks)
Contact Hours: 2 lecture hours + 2 tutorial hours per week + 5 hours of self-paced labs
Software: R (including RMarkdown/Quarto for reproducible report writing)

Aims

This module aims to provide students with the skills and knowledge required to analyse data and make statistical inferences using sample data. Students will develop a solid foundation in:

Exploratory data analysis
Sampling distributions
Hypothesis testing
Analysis of variance
Regression modeling

Throughout the module, students will use R for data visualization, statistical modeling, hypothesis testing, and creating reproducible reports.

Indicative Content

Data Analysis and Exploration

Exploratory data analysis (EDA) and visualization using R
Data characteristics, summary statistics, and descriptive measures
Correlation measures: Pearson, Spearman, Kendall
Principal Component Analysis (PCA) for dimensionality reduction
Reproducible research: RMarkdown/Quarto

Sampling Distributions and Estimation

Sample statistics and their distributions
Properties of sampling distributions (mean, variance, shape)
Applications of the Central Limit Theorem (CLT)
Random sampling methods and bias considerations

Statistical Inference

Point estimators and their properties
Confidence intervals for:
- Means and variances (Normal and t)
- Proportions and rates (Binomial and Poisson)
- Variances (Chi-square and F)
One- and two-sample inference (including paired data and bootstrapping)
Hypothesis testing framework:
- Null/alternative hypotheses, significance levels
- Type I & II errors, sensitivity, specificity, likelihood ratio, power of a test
Goodness-of-fit tests: One-way and two-way chi-square tests
One-way analysis of variance (ANOVA)

Linear Regression Models

Response and explanatory variables
Simple and multiple linear regression models
Least squares estimation and interpretation of coefficients
Assessing model adequacy: Residual analysis and diagnostic tests
Use of R to fit a regression model and interpret output

Computational Tools and Reproducibility

Use of R for:
- Bootstrapping and permutation tests
- t-tests, ANOVA, goodness-of-fit tests
- Linear regression analysis
Reproducible reporting using RMarkdown/Quarto

Intended Learning Outcomes

By the end of the module, students will be able to:

Explain how properties of populations relate to sample data and describe appropriate sampling techniques.
Use R to perform exploratory data analysis, generate summary statistics, and visualize data effectively.
Interpret correlation coefficients and apply Principal Component Analysis (PCA) for high-dimensional data analysis.
Construct and interpret confidence intervals for means, variances, proportions, and rates in one- and two-sample situations.
Perform hypothesis tests, including goodness-of-fit tests, ANOVA, and regression model assessment.
Develop and assess simple and multiple linear regression models, including diagnostic testing.
Evaluate the reliability of statistical models and interpret the power of a test in an inference setting.
Engage in mathematical and statistical dialogue.
Use R to conduct analyses, test statistical models, and create reproducible research reports.

Lecture Plan

Week	Topic	Demo
1	Orientation, EDA, Correlation, PCA, Reproducibility	Icebreaker, R PCA
2	Probability Distributions, Sampling Distributions, CLT	Candy sampling
3	Estimation, CI, Hypothesis Testing, Errors & Power	Why 0.05?
4	One-Sample Inference (CI & Tests for Means, Proportions, Variances)	How much water?
5	Two-Sample Inference (Unpaired Cases)	UN & Africa
6	Two-Sample Inference (Paired vs. Unpaired, Variance Testing)	Tennis ball challenge
7	One-Way ANOVA, Goodness-of-Fit Tests	R `aov`
8	Simple & Multiple Regression, LINE Assumptions	R `lm`
9	Regression Inference, Model Diagnostics, Chi-square	R `lm`, Chi-square
10	Quality Control, Bias Correction, Final Review	R 3-sigma control charts

Assessment & Coursework

See the module materials on MyDundee.

Software and Resources

R and RStudio (including RMarkdown/Quarto)
Lecture notes: Statistics and Data Analysis Lecture Notes (https://dundeemath.github.io/MA22004/)
Lab materials: Statistics and Data Analysis Labs (https://dundeemath.github.io/MA22004labs/)
Recommended Texts: see MyDundee Library resources.

Appendix: Mapping to CS1 Actuarial Statistics Syllabus

This appendix maps each CS1 syllabus topic to the week(s) of the module in which it is covered.

CS1 Section	Topics Covered	Week(s)
1.1.1	Aims of data analysis	1
1.1.2	Stages of data analysis and suitable tools	1
1.1.3	Sources of data and their characteristics	1
1.1.4	Reproducible research methods	1, 10
1.2.1	Summary statistics and exploratory visualizations	1
1.2.2	Correlation measures, including Pearson’s, Spearman’s, and Kendall’s coefficients	1
1.2.3	Principal Component Analysis (PCA) for dimensionality reduction	1
2.6.1	Sample statistics and their distributions	2
2.6.2	Properties of sampling distributions (mean, variance, shape)	2
2.6.3	Applications of the Central Limit Theorem	2
2.6.4	Random sampling methods and bias considerations	2
3.2.1	Point estimators and their properties	3
3.2.2	Confidence intervals for means and variances (Normal and t), proportions and rates (Binomial and Poisson), variances (Chi-square and F)	3, 4, 5, 6
3.2.3	Confidence intervals in one- and two-sample situations, including paired data and bootstrapping	3, 4, 5, 6
3.3.1	Hypothesis testing framework, including null/alternative hypotheses, significance levels, errors, and power of a test	3
3.3.2	Hypothesis tests for one- and two-sample situations	4, 5, 6
3.3.3	Goodness-of-fit tests, including one-way and two-way chi-square tests	7, 9
4.1.1	Simple and multiple linear regression models	8
4.1.2	Assessing model adequacy, including residual analysis and diagnostic tests	9
4.1.3	Use of R to fit a linear regression model and interpret output	8, 9
