8 Quality control

Quality control is an area of applied statistics concerned with interventions that maintain or improve the output of industrial processes. Random variation in a process can degrade the quality of its product. We want to identify the sources of variation that have assignable causes, as opposed to ordinary random variation. Control charts help us recognise when an industrial process is no longer in control, so that we can then look for an assignable cause.

8.1 Control charts

Control charting has two essential elements: specifying a control region, and then analysing time-series data against it. We specify a baseline value together with upper and lower control limits, and assume that the process is in control unless a test statistic suggests otherwise. To construct a control chart, one collects data about the process at fixed points in time and computes a quality statistic at each point. If the quality statistic falls above the upper control limit or below the lower control limit, the process is deemed out of control, and product quality is assumed to be negatively affected.
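The decision rule just described can be sketched in R as a function that flags every time point whose quality statistic lies outside the control interval. The helper name `flag_out_of_control`, the toy values, and the limits are hypothetical. A statistic lying exactly on a limit is treated as in control, which matches the wording "exceeds" above.

```r
# Flag each time point whose quality statistic falls outside the control
# interval (lcl, ucl); TRUE means "out of control".
# `flag_out_of_control` is a hypothetical helper name used for illustration.
flag_out_of_control <- function(stat, lcl, ucl) {
  stopifnot(is.numeric(stat), lcl < ucl)
  stat < lcl | stat > ucl
}

# Toy values and limits (made up, not taken from the Beer Production Data)
x <- c(9.8, 10.1, 10.4, 11.2, 9.9)
flag_out_of_control(x, lcl = 9.5, ucl = 10.5)  # only 11.2 is flagged
```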

Default position

The default position adopted for quality control will be reminiscent of hypothesis testing: “assume that a process is under control unless a test statistic suggests otherwise.”

The process of creating a control chart is best illustrated through an extended example, such as Example 8.1 below.

Example 8.1 Here we consider the typical 3\sigma control chart for a process mean, based on estimated parameters. That is, we assume the generating process X is normally distributed with unknown parameters \mu and \sigma^2, and we monitor the process through the sample mean \overline{X} of each subgroup of m measurements. The control region is set at three standard deviations: the process is in control as long as each subgroup mean lies within three of its standard deviations, 3\sigma/\sqrt{m}, of a baseline value.
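Why three standard deviations? Suppose the process is in control, the subgroup means are independent and normal, and \mu and \sigma are known. Then a subgroup mean falls outside the 3\sigma limits with probability 2\Phi(-3), whatever the values of \mu, \sigma and m. With estimated parameters, as in this example, the figure is only approximate. The short sketch below evaluates this false-alarm rate and the corresponding average run length 1/p, which is the expected number of subgroups between false alarms. The name `false_alarm_prob` is hypothetical.

```r
# Probability that an in-control subgroup mean, normal with known mean and
# standard deviation, falls outside mu +/- L * sigma / sqrt(m).
# Standardising shows this equals 2 * Phi(-L), independent of mu, sigma and m.
# `false_alarm_prob` is a hypothetical helper name.
false_alarm_prob <- function(L = 3) 2 * pnorm(-L)

p <- false_alarm_prob(3)
p        # about 0.0027
1 / p    # in-control average run length: about 370 subgroups between false alarms
```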

Note 8.1: Beer Production Data

The Beer Production Data contains measurements of the features OG, ABV, pH, and IBU for 50 batches of each of three types of product (Premium Lager, IPA, and Light Lager).


beer |> glimpse()

Rows: 150
Columns: 6
$ Batch_Id <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, …
$ OG       <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.8, 4.8, 4.3, 5…
$ ABV      <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.4, 3.0, 3.0, 4…
$ pH       <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.6, 1.4, 1.1, 1…
$ IBU      <dbl> 9.0, 10.0, 7.0, 9.0, 8.0, 7.7, 7.4, 7.1, 6.8, 6.5, 6.2, 5.9, 5.6, 5.3, …
$ Beer     <chr> "Premium Lager", "Premium Lager", "Premium Lager", "Premium Lager", "Pr…

Let’s consider the Beer Production Data in Note 8.1. We are interested in the pH value of the IPA, which influences saccharification. We assume that three batches of IPA are produced per day. We prepare the data by assigning consecutive batches to days and keeping the first 16 complete days, which is 48 of the 50 IPA batches.


library(dplyr)

ipa <- beer |> 
 select(Batch_Id, pH, Beer) |> 
 filter(Beer == "IPA") |> 
 rename(Day = Batch_Id)

m <- 3    # three batches per day
k <- 16   # number of days
ipa$Day[1:(m*k)] <- rep(1:k, each = m)   # batches 1-3 -> day 1, 4-6 -> day 2, ...
ipa <- ipa[1:(m*k),]                     # keep the 48 batches of the 16 complete days
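The same grouping idea can be sketched on a small hypothetical input. In this example, seven consecutive batches are grouped into days of three. Their pH values are the first seven values listed in Table 8.1, and the batch identifiers are made up. The number of complete days is computed by integer division, and the incomplete final day is dropped. This is an illustrative variant using `slice_head()` and `mutate()`, not the code that produced Table 8.1. New variable names are used so that `m` and `k` are left untouched.

```r
library(dplyr)

# Hypothetical stand-in for the IPA rows: seven consecutive batches
batches <- tibble(Batch_Id = 1:7, pH = c(4.7, 4.5, 4.9, 4.0, 4.6, 4.5, 4.7))

per_day <- 3                              # batches per day
n_days  <- nrow(batches) %/% per_day      # complete days only (here 2)

days <- batches |>
  slice_head(n = per_day * n_days) |>     # drop the incomplete final day
  mutate(Day = rep(seq_len(n_days), each = per_day))

days$Day  # 1 1 1 2 2 2
```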

The prepared data, ipa, is summarized in Table 8.1.


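library(kableExtra)  # provides kbl() and kable_styling()

ipa_stat <- ipa |> 
 group_by(Day) |> 
 # per-day observations, sample mean, sample standard deviation, and range
 summarise(obs = list(pH), mean = signif(mean(pH), digits = 4), 
           sd = signif(sd(pH), digits = 4), range = max(pH) - min(pH)) 
ipa_stat |> 
 kbl(align = "rcccc", booktabs = TRUE, escape = FALSE) |>
 kable_styling(latex_options = c("striped"))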

Table 8.1: Observations and summary statistics for the Beer Production Data.

Day	obs	mean	sd	range
1	4.7, 4.5, 4.9	4.700	0.20000	0.4
2	4.0, 4.6, 4.5	4.367	0.32150	0.6
3	4.7, 3.3, 4.6	4.200	0.78100	1.4
4	3.9, 3.5, 4.2	3.867	0.35120	0.7
5	4.0, 4.7, 3.6	4.100	0.55680	1.1
6	4.4, 4.5, 4.1	4.333	0.20820	0.4
7	4.5, 3.9, 4.8	4.400	0.45830	0.9
8	4.0, 4.9, 4.7	4.533	0.47260	0.9
9	4.3, 4.4, 4.8	4.500	0.26460	0.5
10	5.0, 4.5, 3.5	4.333	0.76380	1.5
11	3.8, 3.7, 3.9	3.800	0.10000	0.2
12	5.1, 4.5, 4.5	4.700	0.34640	0.6
13	4.7, 4.4, 4.1	4.400	0.30000	0.6
14	4.0, 4.4, 4.6	4.333	0.30550	0.6
15	4.0, 3.3, 4.2	3.833	0.47260	0.9
16	4.2, 4.2, 4.3	4.233	0.05774	0.1
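To make the summary columns concrete, the base-R sketch below recomputes the mean, the sample standard deviation (with divisor m - 1) and the range for the first three days. It uses the observations listed in Table 8.1. The results agree with the first three rows of the table up to rounding.

```r
# pH observations for days 1-3, copied from Table 8.1
obs <- c(4.7, 4.5, 4.9,  4.0, 4.6, 4.5,  4.7, 3.3, 4.6)
day <- rep(1:3, each = 3)

# Per-day sample mean, sample standard deviation (n - 1 divisor), and range
day_mean  <- tapply(obs, day, mean)
day_sd    <- tapply(obs, day, sd)
day_range <- tapply(obs, day, function(x) max(x) - min(x))

round(cbind(mean = day_mean, sd = day_sd, range = day_range), 4)
```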

The normal quantile-quantile plot in Figure 8.1 suggests that the pH measurements are at least approximately normally distributed. This supports the normality assumption behind the 3\sigma limits.


ipa |> ggplot(aes(sample = pH)) + stat_qq() + stat_qq_line()

Figure 8.1: Normal quantile-quantile plot of observed pH measurements of the IPA batches.

Table 8.1 presents the pH readings from three batches of IPA on each of sixteen days (k = 16). For each day, the table lists the sample mean, \overline{x}, the sample standard deviation, s, and the range, \max_i x_i - \min_i x_i. Each of these is based on m = 3 batches.

We estimate the process mean by averaging the daily means over the k days,
$$
\widehat{\mu} = \frac{1}{k} \sum_{i=1}^k \overline{x}_i \,,
$$
and, similarly, compute the average sample standard deviation over the k days,
$$
\overline{s} = \frac{1}{k} \sum_{i=1}^k s_i \,.
$$
For normal data, the sample standard deviation S is a biased estimator of \sigma, with \mathbb{E}[S] = a_m \sigma. It can be shown that
$$
\widehat{\sigma} = \frac{\overline{S}}{a_m} \,, \qquad a_m = \frac{\sqrt{2}\, \Gamma(m/2)}{\sqrt{m-1}\, \Gamma\left((m-1)/2\right)} \,,
$$
is an unbiased estimator of \sigma. Each daily mean is based on m observations, so its standard deviation is \sigma/\sqrt{m}. Thus, the 3\sigma upper and lower control limits are, respectively,
$$
\mathsf{UCL} = \widehat{\mu} + 3 \frac{\overline{s}}{a_m \sqrt{m}}
\quad \text{and} \quad
\mathsf{LCL} = \widehat{\mu} - 3 \frac{\overline{s}}{a_m \sqrt{m}} \,.
$$
The computations in R follow, along with the resulting control chart in Figure 8.2.
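The constant a_m can be checked numerically. For m = 3 it reduces to \Gamma(3/2) = \sqrt{\pi}/2 \approx 0.886. The raw average of the daily standard deviations therefore underestimates \sigma by roughly 11%. The simulation below is a sketch with assumed settings, none of which come from the Beer data: \sigma = 0.4, a mean of 4.3, and 100,000 simulated subgroups of size 3. It shows the average of S falling short of \sigma, while the average of S / a_m lands close to \sigma.

```r
# a_m from the text: E[S] = a_m * sigma for a normal sample of size m
a <- function(m) sqrt(2) * gamma(m / 2) / (sqrt(m - 1) * gamma((m - 1) / 2))

a(3)  # equals gamma(1.5) = sqrt(pi) / 2, about 0.886

# Monte Carlo check with assumed settings (not estimated from the Beer data)
set.seed(1)
sigma <- 0.4
m <- 3
s <- replicate(1e5, sd(rnorm(m, mean = 4.3, sd = sigma)))

# mean of S is about 0.35 (biased low); mean of S / a_m is close to sigma = 0.4
c(mean_s = mean(s), mean_s_over_a = mean(s) / a(m))
```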


# a_m constant: E[S] = a_m * sigma for a normal sample of size m
a <- function(m){ sqrt(2) * gamma(m/2) / (sqrt(m-1) * gamma((m-1)/2)) }
muhat <- mean(ipa_stat$mean)   # average of the k daily means
sbar <- mean(ipa_stat$sd)      # average of the k daily standard deviations
lcl <- muhat - 3*sbar / (a(m) * sqrt(m))
ucl <- muhat + 3*sbar / (a(m) * sqrt(m))

# line width used below; 0.5 is an assumed fallback if lsz is not defined elsewhere
if (!exists("lsz")) lsz <- 0.5

ggplot(ipa_stat, aes(x = Day)) + geom_point(aes(y = mean)) + 
 geom_hline(aes(yintercept = muhat, color = "Mean"), linewidth = lsz) + 
 geom_hline(aes(yintercept = lcl, color = "LCL"), linewidth = lsz*1.5) + 
 geom_hline(aes(yintercept = ucl, color = "UCL"), linewidth = lsz*1.5) + ylab("pH") + 
   theme(legend.justification = c(1,1), legend.position = c(0.9,0.9),
         legend.title = element_blank(), 
         legend.box.margin = margin(c(4, 4, 4, 4), unit = "pt"))

Figure 8.2: The 3\sigma control chart shows that, with respect to pH, the brewing process is in control over the selected timeframe, as all daily mean pH values fall within the control interval (\mathsf{LCL}, \mathsf{UCL}).

Figure 8.2 shows that the process is in control on every day, as all daily mean pH values fall within the control limits (\mathsf{LCL}, \mathsf{UCL}). Had a daily mean fallen outside these limits, that would have been evidence against our default assumption that the process is in control. We would then seek an assignable cause for the variation. If such a cause were identified, the control limits would be recomputed after removing the out-of-control observations.
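The revision step described above can be sketched as a loop. First compute the limits, then drop any day whose mean lies outside them, and repeat until no day is flagged. The helpers `xbar_limits` and `revise_limits` are hypothetical names. The automatic removal is a simplification: in practice, a day is removed only once an assignable cause has been identified for it. Applied to the daily summaries from Table 8.1, the sketch repeats the computation behind Figure 8.2 and removes no days, which is consistent with the chart.

```r
# Hypothetical helper: 3-sigma limits for subgroup means, using
# muhat = mean of daily means and sigma-hat = s-bar / a_m as in the text.
xbar_limits <- function(means, sds, m, L = 3) {
  a_m <- sqrt(2) * gamma(m / 2) / (sqrt(m - 1) * gamma((m - 1) / 2))
  muhat <- mean(means)
  half_width <- L * mean(sds) / (a_m * sqrt(m))
  c(lcl = muhat - half_width, center = muhat, ucl = muhat + half_width)
}

# Hypothetical helper: drop days whose mean is outside the limits and
# recompute until no day is flagged. Assumes every flagged day has an
# identified assignable cause, which in practice must be checked first.
revise_limits <- function(means, sds, m, L = 3) {
  repeat {
    stopifnot(length(means) > 0)
    lim <- xbar_limits(means, sds, m, L)
    out <- means < lim[["lcl"]] | means > lim[["ucl"]]
    if (!any(out)) return(list(limits = lim, kept = length(means)))
    means <- means[!out]
    sds <- sds[!out]
  }
}

# Daily means and standard deviations copied from Table 8.1
day_means <- c(4.700, 4.367, 4.200, 3.867, 4.100, 4.333, 4.400, 4.533,
               4.500, 4.333, 3.800, 4.700, 4.400, 4.333, 3.833, 4.233)
day_sds <- c(0.2000, 0.3215, 0.7810, 0.3512, 0.5568, 0.2082, 0.4583, 0.4726,
             0.2646, 0.7638, 0.1000, 0.3464, 0.3000, 0.3055, 0.4726, 0.05774)

res <- revise_limits(day_means, day_sds, m = 3)
res$limits  # roughly LCL 3.56, center 4.29, UCL 5.02
res$kept    # 16: no day is removed

# A hypothetical out-of-range day 1 (mean 6.0) is dropped and the limits recomputed
revise_limits(replace(day_means, 1, 6.0), day_sds, m = 3)$kept  # 15
```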
