Scheffé's Test for Multiple Comparisons

This lesson is all about Scheffé's test: what it is, why it is needed, when to use it, and how to implement it.

Prerequisites: This lesson assumes familiarity with comparisons. You should know how to represent a statistical hypothesis mathematically by a comparison. You should be able to compute the sum of squares associated with a comparison. And you should understand how the probability of committing a Type I error is affected by the number of comparisons tested. If you don't know these things, review the following lessons:

Comparison of Treatment Means. This lesson defines an ordinary comparison. It explains how to represent a statistical hypothesis mathematically by a comparison. And it explains how to compute the sum of squares for a comparison.
Multiple Comparisons. This lesson describes how the probability of committing a Type I error is affected by the number of comparisons tested.

What is Scheffé's test?

Scheffé's test is a method for testing all pairwise and all non-pairwise comparisons of treatment means. Here's how it works:

Step 1. Set a significance level (α) for the error rate familywise. (The significance level for Scheffé's test should equal the significance level used for the omnibus ANOVA in Step 3.)
Step 2. Find the value for each comparison (L_i) that you want to test.
Step 3. Generate an ANOVA table from a standard, omnibus analysis of variance.

Step 4. Use the following formula to compute a critical value for Scheffé's test of comparison L_i:

$$CV_i = \sqrt{(k - 1)\, F(v_1, v_2)\, \text{MSE} \left[ \sum_j \frac{c_j^2}{n_j} \right]}$$

where CV_i is the critical value for comparison L_i, (k - 1) is the between groups degrees of freedom, F(v₁, v₂) is the F value with v₁, v₂ degrees of freedom and a significance level of α, v₁ is degrees of freedom for the between groups factor, v₂ is degrees of freedom for the mean square error, MSE is the mean square error, c_j is a coefficient (weight) for treatment j in comparison L_i, and n_j is sample size in Group j.

Note: To find values for the degrees of freedom and the mean squared error, refer to the ANOVA table from Step 3. To find F(v₁, v₂), use Stat Trek's F Distribution Calculator with the significance level from Step 1.

Step 5. Compare the value from Step 2 (L_i) with the value from Step 4 (CV_i). If the absolute value of L_i is bigger than CV_i, the comparison is statistically significant.
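Steps 2, 4, and 5 reduce to a few lines of arithmetic. The Python sketch below is one way to express them, not a reference implementation. The function names `scheffe_critical_value` and `scheffe_test` are hypothetical, and the F value is passed in as a number (for example, one read from an F distribution calculator in Step 4) rather than computed.

```python
import math
from typing import Sequence


def scheffe_critical_value(coeffs: Sequence[float], ns: Sequence[int],
                           mse: float, f_crit: float) -> float:
    """Critical value CV_i for one comparison (Step 4).

    (k - 1), the between-groups df, is taken as len(ns) - 1.
    f_crit is F(v1, v2) at significance level alpha, looked up separately.
    """
    k = len(ns)
    weight = sum(c * c / n for c, n in zip(coeffs, ns))  # sum of c_j^2 / n_j
    return math.sqrt((k - 1) * f_crit * mse * weight)


def scheffe_test(means: Sequence[float], coeffs: Sequence[float],
                 ns: Sequence[int], mse: float, f_crit: float):
    """Return (L_i, CV_i, significant) for one comparison (Steps 2, 4, 5)."""
    value = sum(c * m for c, m in zip(coeffs, means))  # L_i = sum of c_j * mean_j
    cv = scheffe_critical_value(coeffs, ns, mse, f_crit)
    # Step 5 compares |L_i| with CV_i, so the sign of the comparison is irrelevant
    return value, cv, abs(value) > cv
```

For instance, with the group means (90, 80, 70), five subjects per group, MSE = 125, and F = 3.89 from the worked example later in this lesson, `scheffe_test([90, 80, 70], [1, 0, -1], [5, 5, 5], 125, 3.89)` gives L = 20 and CV ≈ 19.7, so that comparison is significant. Reversing the signs of the coefficients gives L = -20 and the same conclusion.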

Why Do We Need Scheffé's test?

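The Scheffé test is used mainly for post hoc comparisons in analysis of variance (ANOVA) experiments. The test is used to determine whether the mean score in one treatment group differs from the mean score in a second treatment group, or whether the mean score for one set of treatment groups differs from the mean score for a second set of treatment groups. Because every additional comparison raises the chance of a Type I error, testing many comparisons without adjustment can push the error rate familywise well above α. Scheffé's test guards against this by holding the error rate familywise at or below α across every possible comparison.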

When to Use Scheffé's test

In some situations, Scheffé's test is a good technique for testing the statistical significance of multiple comparisons. In other situations, it is not so good.

Advantages

There are several things to like about the Scheffé test, including the following:

The Scheffé test can be used to make all possible comparisons among treatment means - pairwise comparisons (comparisons involving only two means) and non-pairwise comparisons (comparisons involving more than two means).
The Scheffé test holds the error rate familywise at or below the significance level (α) specified by the experimenter, no matter how many comparisons are tested.
The Scheffé test can be used with unequal sample sizes between groups.
The Scheffé test provides a more sensitive test of non-pairwise comparisons than some other post hoc testing procedures (e.g., Tukey's HSD test).
When an experiment calls for many planned comparisons, the risk of Type I errors can be unacceptably high. In this situation, the Scheffé test, which controls error rate familywise, may be a good alternative to tests that are normally used for planned comparisons.

For an experimenter who wants to test a lot of comparisons post hoc (particularly non-pairwise comparisons) and still control error rate familywise, the Scheffé test is a good choice.

Disadvantages

There are several things to dislike about the Scheffé test, including the following:

The Scheffé test has lower statistical power than tests that are designed for planned comparisons.
For testing pairwise comparisons, the Scheffé test is less sensitive than some other post hoc procedures (e.g., Tukey's HSD test).

Note: A good way to increase the power of the Scheffé test is to use large sample sizes.
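To see why larger samples help, the Python sketch below computes the Scheffé critical value for a pairwise comparison as the group size grows. It rests on hypothetical assumptions: k = 3 equal-size groups, α = 0.05, and an MSE that stays fixed at 125 (in a real study the MSE would be re-estimated from the new data). The F value uses a closed form for the F distribution that is valid only when the between-groups degrees of freedom equal 2; the function names are illustrative.

```python
import math


def f_crit_two_groups_df(alpha: float, v2: int) -> float:
    """Upper-alpha point of F(2, v2).

    Valid ONLY when the numerator df is 2 (k = 3 groups), where
    P(F > f) = (v2 / (v2 + 2f)) ** (v2 / 2) can be solved for f directly.
    """
    return (v2 / 2) * (alpha ** (-2 / v2) - 1)


def cv_pairwise(n_per_group: int, mse: float = 125.0, alpha: float = 0.05) -> float:
    """Scheffé CV for a pairwise comparison among k = 3 equal-size groups."""
    k = 3
    v2 = k * (n_per_group - 1)            # error df for a one-way design
    f = f_crit_two_groups_df(alpha, v2)   # F(2, v2) also shrinks as v2 grows
    weight = 2 / n_per_group              # (1^2 + (-1)^2) / n
    return math.sqrt((k - 1) * f * mse * weight)


for n in (5, 10, 20, 40):
    print(n, round(cv_pairwise(n), 1))
```

Under these assumptions the critical value falls from about 19.7 with 5 subjects per group to about 8.9 with 20, so a 10-point difference that is not significant with 5 subjects per group would be significant with 20.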

What Do Statisticians Say?

If you ask a statistician about when to use Scheffé's test, here are some comments you might hear:

For post hoc testing, it only makes sense to use Scheffé's test after a significant omnibus analysis of variance. If the analysis of variance does not provide evidence of significant differences among means, there is no need to conduct follow-up tests looking for those differences.
For post hoc testing of many comparisons, it makes sense to use Scheffé's test. For post hoc testing of only a few comparisons, Bonferroni's correction might be the better choice.
For a priori testing, Scheffé's test can be an acceptable choice when the experiment calls for tests of many comparisons. When there are many comparisons to be tested, Scheffé's test might be considered a "safe" technique because, compared to other methods, it provides a reasonable balance between control of Type I errors and risk of Type II errors.

A Step-By-Step Example

In this section, we'll work through a simple example to illustrate the planning and analysis required for post hoc testing with Scheffé's test.

Experimental Design

To test the long-term effect of aerobic exercise on resting pulse rate, an investigator conducts a controlled experiment. The experiment uses a completely randomized design, consisting of three treatment groups:

Control. Subjects do not participate in an exercise program.
Low-effort. Subjects jog 1 mile on Monday, Wednesday, and Friday.
High-effort. Subjects jog 2 miles every day, except Sunday.

Five subjects are randomly assigned to each group. After 28 days of treatment, each subject's resting pulse rate is measured on day 29.

A Priori Analysis

To test planned comparisons, the investigator poses the research questions to be answered, states statistical hypotheses implied by each research question, and identifies the analytical technique(s) used to test each statistical hypothesis - all before any data is collected. Then, following data collection, data is analyzed according to plan.

Research Question

For this experiment, the researcher is initially interested in one research question. That question, and the associated statistical hypotheses, appear below:

Overall research question. Will mean pulse rate in one treatment group differ from mean pulse rate in any other treatment group?
H₀: μ₁ = μ₂ = μ₃

H₁: μ_i ≠ μ_j for at least one pair of groups i and j

Analytical Techniques

The overall research question asks whether the mean pulse rate in one treatment group differs from the mean pulse rate in any other group. The null hypothesis implied by this research question can be tested by an omnibus analysis of variance.

For this example, assume that the investigator specifies a significance level of 0.05 to test the statistical significance of the main research question.

Experimental Data

Pulse rate measurements for each subject in each treatment group appear below:

Table 1. Pulse Rate for Each Subject in Each Group

Group 1 (control)	Group 2 (low effort)	Group 3 (high effort)
80	70	50
85	75	60
90	80	70
95	85	80
100	90	90

ANOVA Results

The overall research question for a priori analysis is: Will mean pulse rate in one treatment group differ from mean pulse rate in any other treatment group? The statistical hypotheses implied by that question are:

H₀: μ₁ = μ₂ = μ₃

H₁: μ_i ≠ μ_j for at least one pair of groups i and j

We can test this null hypothesis with a standard, omnibus analysis of variance. Here is the ANOVA table from that analysis.

Table 2. ANOVA Summary Table

Source	SS	df	MS	F	P
BG	1000	2	500	4.0	0.046
Error	1500	12	125
Total	2500	14

The P value for the between-groups (BG) effect is 0.046, which is less than the significance level of 0.05. Therefore, we reject the null hypothesis of no difference in pulse rates between treatment groups.

Note: We explained how to conduct a one-way analysis of variance in previous lessons. If you're wondering how to produce the ANOVA table shown above, see One-Way Analysis of Variance: Example or One-Way Analysis of Variance With Excel.
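As a quick cross-check, the plain-Python sketch below recomputes the entries of Table 2 from the raw scores in Table 1. The function name `one_way_anova` is illustrative. The P value uses a closed form of the F distribution that holds only because the between-groups degrees of freedom equal 2; for other designs, a library routine such as `scipy.stats.f.sf(F, df_BG, df_E)` would be the usual choice.

```python
# Raw pulse rates from Table 1
groups = {
    "control": [80, 85, 90, 95, 100],
    "low effort": [70, 75, 80, 85, 90],
    "high effort": [50, 60, 70, 80, 90],
}


def one_way_anova(samples):
    """Sums of squares, df, mean squares, and F for a one-way ANOVA."""
    scores = [x for s in samples for x in s]
    grand_mean = sum(scores) / len(scores)
    group_means = [sum(s) / len(s) for s in samples]
    # Between groups: n_j * (group mean - grand mean)^2, summed over groups
    ss_bg = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, group_means))
    # Error (within groups): each score's squared deviation from its group mean
    ss_error = sum((x - m) ** 2 for s, m in zip(samples, group_means) for x in s)
    df_bg = len(samples) - 1
    df_error = len(scores) - len(samples)
    ms_bg, mse = ss_bg / df_bg, ss_error / df_error
    return {"SS_BG": ss_bg, "SS_E": ss_error, "df_BG": df_bg,
            "df_E": df_error, "MS_BG": ms_bg, "MSE": mse, "F": ms_bg / mse}


res = one_way_anova(list(groups.values()))
# Upper-tail P value; this closed form is valid ONLY when df_BG == 2
p_value = (res["df_E"] / (res["df_E"] + 2 * res["F"])) ** (res["df_E"] / 2)
print(res, round(p_value, 4))
```

This should reproduce SS values of 1000 and 1500, an MSE of 125, F = 4.0, and a P value of about 0.0467.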

Post Hoc Analysis

Having ascertained through the a priori analysis that a significant difference exists among the mean scores, suppose the experimenter wants to investigate how the means differ.

Post Hoc Research Questions

For this post hoc analysis, the researcher decides to ask four follow-up questions. For each question, there is an implied statistical hypothesis which can be tested by a unique comparison. The questions, hypotheses, and comparisons appear below:

Follow-up question 1. Will mean pulse rate of subjects in the control group (Group 1) differ from the mean pulse rate of subjects in the low-effort group (Group 2)?
H₀: μ₁ = μ₂

H₁: μ₁ ≠ μ₂
This statistical hypothesis can be represented mathematically by the comparison L₁:
L₁ = X₁ - X₂
Follow-up question 2. Will mean pulse rate of subjects in the control group (Group 1) differ from the mean pulse rate of subjects in the high-effort group (Group 3)?
H₀: μ₁ = μ₃

H₁: μ₁ ≠ μ₃
This statistical hypothesis can be represented mathematically by the comparison L₂:
L₂ = X₁ - X₃
Follow-up question 3. Will mean pulse rate of subjects in the low-effort group (Group 2) differ from the mean pulse rate of subjects in the high-effort group (Group 3)?
H₀: μ₂ = μ₃

H₁: μ₂ ≠ μ₃
This statistical hypothesis can be represented mathematically by the comparison L₃:
L₃ = X₂ - X₃
Follow-up question 4. Will mean pulse rate of subjects in the control group (Group 1) differ from the average of the mean pulse rates in the two exercise groups (Group 2 and Group 3)?
H₀: μ₁ = (μ₂ + μ₃) / 2

H₁: μ₁ ≠ (μ₂ + μ₃) / 2
This statistical hypothesis can be represented mathematically by the comparison L₄:
L₄ = X₁ - 0.5X₂ - 0.5X₃

In the equations above, X₁, X₂, and X₃ are mean scores for Groups 1, 2, and 3, respectively.

Post Hoc Analysis With Scheffé's Test

Each null hypothesis associated with a follow-up question can be represented mathematically by a unique comparison. To determine whether to reject the null hypothesis for a follow-up question, we can test its associated comparison for statistical significance, using Scheffé's test. To illustrate the process, we'll work through Scheffé's test step-by-step.

Step 1. Specify a Significance Level

For post hoc analyses with Scheffé's test, the significance level should equal the significance level used a priori for the omnibus analysis of variance. We used a significance level of 0.05 for the a priori analysis, so we will use a significance level of 0.05 for Scheffé's test.

Step 2. Find Comparison Values

Each comparison is a function of mean scores from treatment groups. Mean pulse rate within each group (computed from raw scores in Table 1) appears below:

Table 3. Mean Pulse Rate in Each Treatment Group

Group 1 (control)	Group 2 (low effort)	Group 3 (high effort)
90	80	70

Given the treatment means, it is a simple matter to compute values for each comparison, as shown below:

Table 4. Comparison Values

Comparison	Value
L₁ = X₁ - X₂	10
L₂ = X₁ - X₃	20
L₃ = X₂ - X₃	10
L₄ = X₁ - 0.5X₂ - 0.5X₃	15
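The values in Table 4 can be reproduced by writing each comparison as a vector of coefficients (weights) and taking its weighted sum of the group means. The short Python sketch below does this; the dictionary keys are just labels, and the check that the coefficients sum to zero reflects the definition of a comparison.

```python
means = [90, 80, 70]  # Table 3: control, low effort, high effort

# Coefficient (weight) vector c_j for each comparison
comparisons = {
    "L1": [1, -1, 0],
    "L2": [1, 0, -1],
    "L3": [0, 1, -1],
    "L4": [1, -0.5, -0.5],
}

values = {}
for name, coeffs in comparisons.items():
    # The coefficients of a comparison must sum to zero
    assert sum(coeffs) == 0, name
    values[name] = sum(c * m for c, m in zip(coeffs, means))

print(values)  # {'L1': 10, 'L2': 20, 'L3': 10, 'L4': 15.0}
```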

Step 3. Generate ANOVA Table

The summary table from an omnibus analysis of variance includes three outputs that we need to test the statistical significance of a comparison: (1) the value of the mean squared error, (2) the degrees of freedom for the mean squared error, and (3) the degrees of freedom for the between-groups effect.

We generated the ANOVA summary table earlier, as part of the a priori analysis. For convenience, here it is again.

Table 2. ANOVA Summary Table

Source	SS	df	MS	F	P
BG	1000	2	500	4.0	0.046
Error	1500	12	125
Total	2500	14

Step 4. Find the Critical Values

The critical value for Scheffé's test of comparison L_i can be computed from the following formula:

$$CV_i = \sqrt{(k - 1)\, F(v_1, v_2)\, \text{MSE} \left[ \sum_j \frac{c_j^2}{n_j} \right]}$$

where CV_i is the critical value for comparison L_i, (k - 1) is the between groups degrees of freedom, F(v₁, v₂) is the F value with v₁, v₂ degrees of freedom and a significance level of α, v₁ is degrees of freedom for the between groups factor, v₂ is degrees of freedom for the mean square error, MSE is the mean square error, c_j is a coefficient (weight) for treatment j in comparison L_i, and n_j is sample size in Group j.

To find values for the degrees of freedom and the mean squared error, refer to the ANOVA table. From the table, we see that v₁ equals 2, v₂ equals 12, and the mean squared error equals 125.

To find F(v₁, v₂), use Stat Trek's F Distribution Calculator. In the field for the numerator degrees of freedom, enter 2. In the field for the denominator degrees of freedom, enter 12. In the field for P(F≤f), enter 1 - α, which is 1 - 0.05 or 0.95. Then click the Calculate button.

From the calculator, we see that F(2,12) equals about 3.89 when the significance level (α) is 0.05. At last, we have all the values we need to compute a critical value for each comparison:

$$CV_i = \sqrt{(k - 1)\, F(v_1, v_2)\, \text{MSE} \left[ \sum_j \frac{c_j^2}{n_j} \right]}$$

$$CV_1 = \sqrt{2 \times 3.89 \times 125 \times (0.2 + 0.2)} \approx 19.7$$

$$CV_2 = \sqrt{2 \times 3.89 \times 125 \times (0.2 + 0.2)} \approx 19.7$$

$$CV_3 = \sqrt{2 \times 3.89 \times 125 \times (0.2 + 0.2)} \approx 19.7$$

$$CV_4 = \sqrt{2 \times 3.89 \times 125 \times (0.2 + 0.05 + 0.05)} \approx 17.1$$

Here each term c_j² / n_j is 1² / 5 = 0.2 for a coefficient of ±1, (-0.5)² / 5 = 0.05 for a coefficient of -0.5, and 0 for a coefficient of 0.
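The critical-value arithmetic can be checked with the Python sketch below. Instead of the rounded calculator value of 3.89, it computes F(2, 12) from a closed form of the F distribution that is valid only because v₁ = 2; for designs with a different number of groups, `scipy.stats.f.ppf(1 - alpha, v1, v2)` is the usual library call. Variable names are illustrative.

```python
import math

alpha = 0.05     # significance level from Step 1
v1, v2 = 2, 12   # between-groups df and error df (Table 2)
mse = 125.0      # mean square error (Table 2)
ns = [5, 5, 5]   # subjects per group

# Upper-alpha point of F(2, v2); this closed form is valid ONLY because v1 == 2
f_crit = (v2 / 2) * (alpha ** (-2 / v2) - 1)

comparisons = {
    "L1": [1, -1, 0],
    "L2": [1, 0, -1],
    "L3": [0, 1, -1],
    "L4": [1, -0.5, -0.5],
}

cvs = {}
for name, coeffs in comparisons.items():
    weight = sum(c * c / n for c, n in zip(coeffs, ns))  # sum of c_j^2 / n_j
    cvs[name] = math.sqrt(v1 * f_crit * mse * weight)     # (k - 1) equals v1

print(round(f_crit, 3), {name: round(cv, 1) for name, cv in cvs.items()})
# 3.885 {'L1': 19.7, 'L2': 19.7, 'L3': 19.7, 'L4': 17.1}
```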

Step 5. Test Hypotheses

To test the statistical significance of each comparison, we compare the value of the comparison (L_i from Step 2) with the critical value for the comparison (CV_i from Step 4). If the absolute value of L_i is bigger than CV_i, the comparison is statistically significant.

Table 5 shows Scheffé test results for each comparison.

Table 5. Scheffé Test Results

Comparison	L_i value	CV_i value	Conclusion
X₁ - X₂	10	19.7	Not significant
X₁ - X₃	20	19.7	Significant
X₂ - X₃	10	19.7	Not significant
X₁ - 0.5X₂ - 0.5X₃	15	17.1	Not significant

The second comparison is statistically significant, since L₂ is bigger than CV₂. The second comparison measures the difference between resting pulse rate in the control group (Group 1) and resting pulse rate in the high-effort group (Group 3). From this post hoc analysis, we conclude that the high-effort treatment has a significant effect on resting pulse rate.

None of the other comparisons are statistically significant.

Note: In a previous lesson, we tested the fourth comparison as part of a planned analysis and found it to be statistically significant. This illustrates the value of deciding in advance which comparisons to test. When the number of hypotheses tested is small, a priori tests (like the F ratio) tend to be more sensitive than post hoc tests (like the Scheffé test).
