How ANOVA analyze the variance

Often we Analysis of Variance (ANOVA) to analyze the variances to find if different cases results in similar outcome and if the difference is significant. Following are some simple examples,

  • The effect of different diets on growth of fishes
  • Comparing the height of three different species of a plant
  • Type of flour used for baking a bread

These are some common examples where in some cases data are collected by setting up an experiment and in other cases they are collected through sampling. This article tries to explain how the ANOVA analyze the variance and in what situation are they significant throught both simulated and real data.

Consider the following model with \(i=3\) groups and \(j=n\) observations,

\[y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \; i = 1, 2, 3 \texttt{ and } j = 1, 2, \ldots n\]

here, \(\tau_i\) is the effect corresponding to group \(i\) and \(\varepsilon_{ij} \sim \mathrm{N}(0, \sigma^2)\), the usual assumption of linear model. In order to understand how ANOVA finds the differences between groups and how the group mean and their standard deviation influence the results from ANOVA, let us consider following four cases:

Case 1
Similar group means with high variation within the groups
Case 2
Similar group means with low variation within the groups
Case 3
Distant group means with high variation within the groups
Case 4
Distant group means with low variation within the groups

Simulating data resembling these cases

Fitting ANOVA model for each cases

Prepare data for plotting

Discussion