Stat Trek

Teach yourself statistics


Scheffé's Test for Multiple Comparisons

This lesson covers Scheffé's test: what it is, why it is needed, when to use it, and how to implement it.

Prerequisites: This lesson assumes familiarity with comparisons. You should know how to represent a statistical hypothesis mathematically by a comparison. You should be able to compute the sum of squares associated with a comparison. And you should understand how the probability of committing a Type I error is affected by the number of comparisons tested. If you don't know these things, review the following lessons:

  • Comparison of Treatment Means. This lesson defines an ordinary comparison. It explains how to represent a statistical hypothesis mathematically by a comparison. And it explains how to compute the sum of squares for a comparison.
  • Multiple Comparisons. This lesson describes how the probability of committing a Type I error is affected by the number of comparisons tested.

What is Scheffé's test?

Scheffé's test is a method for testing all pairwise and all non-pairwise comparisons of treatment means. Here's how it works:

  • Step 1. Set a significance level (α) for the error rate familywise. (The significance level for Scheffé's test should equal the significance level used for the omnibus ANOVA in Step 3.)
  • Step 2. Find the value for each comparison (Li) that you want to test.
  • Step 3. Generate an ANOVA table from a standard, omnibus analysis of variance.
  • Step 4. Use the following formula to compute a critical value for Scheffé's test of comparison Li:
    $$ CV_i = \sqrt{(k - 1)\, F(v_1, v_2)\, \mathrm{MSE} \sum_j \left( c_j^2 / n_j \right)} $$
    where CVi is the critical value for comparison Li, (k - 1) is the between groups degrees of freedom, F(v1, v2) is the F value with v1, v2 degrees of freedom and a significance level of α, v1 is degrees of freedom for the between groups factor, v2 is degrees of freedom for the mean square error, MSE is the mean square error, cj is a coefficient (weight) for treatment j in comparison Li, and nj is sample size in Group j.

    Note: To find values for the degrees of freedom and the mean squared error, refer to the ANOVA table from Step 3. To find F(v1, v2), use Stat Trek's F Distribution Calculator with the significance level from Step 1.

  • Step 5. Compare the value from Step 2 (Li) with the value from Step 4 (CVi). If the absolute value of Li is bigger than CVi, the comparison is statistically significant.
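The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `scheffe_critical_value` and `is_significant` are made up for this sketch, the F value is assumed to have been looked up separately (Step 4), and the inputs at the bottom are illustrative.

```python
import math

def scheffe_critical_value(k, f_crit, mse, coeffs, sizes):
    """Critical value for Scheffé's test of one comparison.

    k       -- number of treatment groups
    f_crit  -- F(v1, v2) at significance level alpha (looked up separately)
    mse     -- mean square error from the omnibus ANOVA table
    coeffs  -- comparison coefficients c_j, one per group
    sizes   -- sample sizes n_j, one per group
    """
    weight = sum(c * c / n for c, n in zip(coeffs, sizes))
    return math.sqrt((k - 1) * f_crit * mse * weight)

def is_significant(comparison_value, critical_value):
    # Compare magnitudes, so the order in which groups are subtracted does not matter.
    return abs(comparison_value) > critical_value

# Illustrative inputs: 3 groups of 5 subjects, MSE = 125, F(2, 12) = 3.89 at alpha = 0.05.
cv = scheffe_critical_value(3, 3.89, 125.0, [1, -1, 0], [5, 5, 5])
print(round(cv, 1))            # 19.7
print(is_significant(20, cv))  # True
```

Passing the F value in as an argument mirrors the lesson, which reads it from a calculator rather than computing it.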

Why Do We Need Scheffé's test?

The Scheffé test is used mainly with post hoc comparisons in analysis of variance (ANOVA) experiments. The test is used to determine whether the mean score in one treatment group differs from the mean score in a second treatment group, or whether the mean score for one set of treatment groups differs from the mean score for a second set of treatment groups.

When to Use Scheffé's test

In some situations, Scheffé's test is a good technique for testing the statistical significance of multiple comparisons. In other situations, it is not so good.

Advantages

There are several things to like about the Scheffé test, including the following:

  • The Scheffé test can be used to make all possible comparisons among treatment means - pairwise comparisons (comparisons involving only two means) and non-pairwise comparisons (comparisons involving more than two means).
  • The Scheffé test sets the error rate familywise equal to a significance level (α) specified by the experimenter.
  • The Scheffé test can be used with unequal sample sizes between groups.
  • The Scheffé test provides a more sensitive test of non-pairwise comparisons than some other post hoc testing procedures (e.g., Tukey's HSD test).
  • When an experiment calls for many planned comparisons, the risk of Type I errors can be unacceptably high. In this situation, the Scheffé test, which controls error rate familywise, may be a good alternative to tests that are normally used for planned comparisons.

For an experimenter who wants to test a lot of comparisons post hoc (particularly non-pairwise comparisons) and still control error rate familywise, the Scheffé test is a good choice.

Disadvantages

There are several things to dislike about the Scheffé test, including the following:

  • The Scheffé test has lower statistical power than tests that are designed for planned comparisons.
  • For testing pairwise comparisons, the Scheffé test is less sensitive than some other post hoc procedures (e.g., Tukey's HSD test).

Note: A good way to increase the power of the Scheffé test is to use large sample sizes.
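To see why larger samples help, the sketch below computes the Scheffé critical value for a pairwise comparison at several group sizes. It rests on assumptions that are not part of the lesson: three equal-sized groups, a mean square error that stays fixed at 125 as the sample grows, and a closed-form F quantile that holds only when the numerator degrees of freedom equal 2. The function names are hypothetical.

```python
import math

def f_crit_numerator_df_2(v2, alpha):
    # Upper-tail critical value of F(2, v2).
    # This closed form is valid only when the numerator df is 2 (three groups).
    return (v2 / 2.0) * (alpha ** (-2.0 / v2) - 1.0)

def pairwise_cv(n_per_group, mse, k=3, alpha=0.05):
    # Error degrees of freedom for k equal-sized groups.
    v2 = k * (n_per_group - 1)
    f_crit = f_crit_numerator_df_2(v2, alpha)
    # For a pairwise comparison the coefficients are 1 and -1, so the sum of c^2/n is 2/n.
    return math.sqrt((k - 1) * f_crit * mse * (2.0 / n_per_group))

for n in (5, 10, 20, 40):
    print(n, round(pairwise_cv(n, mse=125.0), 2))
```

Under these assumptions the critical value shrinks as the group size grows, so a smaller difference between means is enough to reach significance.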

What Do Statisticians Say?

If you ask a statistician about when to use Scheffé's test, here are some comments you might hear:

  • For post hoc testing, it only makes sense to use Scheffé's test after a significant omnibus analysis of variance. If the analysis of variance does not provide evidence of significant differences among means, there is no need to conduct follow-up tests looking for those differences.
  • For post hoc testing of many comparisons, it makes sense to use Scheffé's test. For post hoc testing of only a few comparisons, Bonferroni's correction might be the better choice.
  • For a priori testing, Scheffé's test can be an acceptable choice when the experiment calls for tests of many comparisons. When there are many comparisons to be tested, Scheffé's test might be considered a "safe" technique; because compared to other methods, it provides a reasonable balance between control of Type I errors and risk of Type II errors.

A Step-By-Step Example

In this section, we'll work through a simple example to illustrate the planning and analysis required for post hoc testing with Scheffé's test.

Experimental Design

To test the long-term effect of aerobic exercise on resting pulse rate, an investigator conducts a controlled experiment. The experiment uses a completely randomized design, consisting of three treatment groups:

  • Control. Subjects do not participate in an exercise program.
  • Low-effort. Subjects jog 1 mile on Monday, Wednesday, and Friday.
  • High-effort. Subjects jog 2 miles every day, except Sunday.

Five subjects are randomly assigned to each group; and, after 28 days of treatment, their resting pulse rate is measured on day 29.

A Priori Analysis

To test planned comparisons, the investigator poses the research questions to be answered, states statistical hypotheses implied by each research question, and identifies the analytical technique(s) used to test each statistical hypothesis - all before any data is collected. Then, following data collection, data is analyzed according to plan.

Research Question

For this experiment, the researcher is initially interested in one research question. That question, and the associated statistical hypotheses, appears below:

  • Overall research question. Will mean pulse rate in one treatment group differ from mean pulse rate in any other treatment group?

    H0: μi = μj

    H1: μi ≠ μj

Analytical Techniques

The overall research question asks whether the mean pulse rate in one treatment group differs from the mean pulse rate in any other group. The null hypothesis implied by this research question can be tested by an omnibus analysis of variance.

For this example, assume that the investigator specifies a significance level of 0.05 to test the statistical significance of the main research question.

Experimental Data

Pulse rate measurements for each subject in each treatment group appear below:

Table 1. Pulse Rate for Each Subject in Each Group

Group 1 (control) | Group 2 (low effort) | Group 3 (high effort)
80                | 70                   | 50
85                | 75                   | 60
90                | 80                   | 70
95                | 85                   | 80
100               | 90                   | 90

ANOVA Results

The overall research question for a priori analysis is: Will mean pulse rate in one treatment group differ from mean pulse rate in any other treatment group? The statistical hypotheses implied by that question are:

H0: μi = μj

H1: μi ≠ μj

We can test this null hypothesis with a standard, omnibus analysis of variance. Here is the ANOVA table from that analysis.

Table 2. ANOVA Summary Table

Source | SS   | df | MS  | F   | P
BG     | 1000 | 2  | 500 | 4.0 | 0.046
Error  | 1500 | 12 | 125 |     |
Total  | 2500 | 14 |     |     |

The P value for the between-groups (BG) effect is 0.046, which is less than the significance level of 0.05. Therefore, we reject the null hypothesis of no difference in pulse rates between treatment groups.

Note: We explained how to conduct a one-way analysis of variance in previous lessons. If you're wondering how to produce the ANOVA table shown above, see One-Way Analysis of Variance: Example or One-Way Analysis of Variance With Excel.
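For readers who want to check the ANOVA table by hand, here is a minimal sketch that recomputes its entries from the raw pulse rates in Table 1. The function names are hypothetical, and the P value uses a closed form that holds only when the between-groups degrees of freedom equal 2, as they do here.

```python
# Pulse rates from Table 1, one list per treatment group.
samples = [
    [80, 85, 90, 95, 100],  # Group 1 (control)
    [70, 75, 80, 85, 90],   # Group 2 (low effort)
    [50, 60, 70, 80, 90],   # Group 3 (high effort)
]

def one_way_anova(samples):
    """Return the quantities shown in a one-way ANOVA summary table."""
    scores = [x for group in samples for x in group]
    grand_mean = sum(scores) / len(scores)
    means = [sum(group) / len(group) for group in samples]
    # Between-groups SS: squared distance of each group mean from the grand mean.
    ss_bg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(samples, means))
    # Error SS: squared distance of each score from its own group mean.
    ss_error = sum((x - m) ** 2 for g, m in zip(samples, means) for x in g)
    df_bg = len(samples) - 1
    df_error = len(scores) - len(samples)
    ms_bg = ss_bg / df_bg
    mse = ss_error / df_error
    return {"ss_bg": ss_bg, "ss_error": ss_error, "df_bg": df_bg,
            "df_error": df_error, "ms_bg": ms_bg, "mse": mse, "f": ms_bg / mse}

def p_value_numerator_df_2(f, df_error):
    # Upper-tail probability of F(2, df_error); valid only when df_bg == 2.
    return (1 + 2 * f / df_error) ** (-df_error / 2)

table = one_way_anova(samples)
p = p_value_numerator_df_2(table["f"], table["df_error"])
print(table)
print(f"{p:.4f}")  # 0.0467
```

The sums of squares, degrees of freedom, mean squares, and F ratio match Table 2. The computed P value of about 0.0467 agrees with the table's 0.046 up to rounding.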

Post Hoc Analysis

Having ascertained through the a priori analysis that a significant difference exists among the mean scores, suppose the experimenter wants to investigate how the means differ.

Post Hoc Research Questions

For this post hoc analysis, the researcher decides to ask four follow-up questions. For each question, there is an implied statistical hypothesis which can be tested by a unique comparison. The questions, hypotheses, and comparisons appear below:

  • Follow-up question 1. Will mean pulse rate of subjects in the control group (Group 1) differ from the mean pulse rate of subjects in the low-effort group (Group 2)?

    H0: μ1 = μ2

    H1: μ1 ≠ μ2

    This statistical hypothesis can be represented mathematically by the comparison L1:

    L1 = X1 - X2

  • Follow-up question 2. Will mean pulse rate of subjects in the control group (Group 1) differ from the mean pulse rate of subjects in the high-effort group (Group 3)?

    H0: μ1 = μ3

    H1: μ1 ≠ μ3

    This statistical hypothesis can be represented mathematically by the comparison L2:

    L2 = X1 - X3

  • Follow-up question 3. Will mean pulse rate of subjects in the low-effort group (Group 2) differ from the mean pulse rate of subjects in the high-effort group (Group 3)?

    H0: μ2 = μ3

    H1: μ2 ≠ μ3

    This statistical hypothesis can be represented mathematically by the comparison L3:

    L3 = X2 - X3

  • Follow-up question 4. Will mean pulse rate of subjects in the control group (Group 1) differ from the mean pulse rate of subjects in the two exercise groups (Group 2 and Group 3 combined)?

    H0: μ1 = (μ2 + μ3) / 2

    H1: μ1 ≠ (μ2 + μ3) / 2

    This statistical hypothesis can be represented mathematically by the comparison L4:

    L4 = X1 - 0.5X2 - 0.5X3

In the equations above, X1, X2, and X3 are mean scores for Groups 1, 2, and 3, respectively.
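Each comparison above amounts to a list of coefficients, one per group, applied to the group means. The sketch below stores the four comparisons that way and evaluates them using means computed from the Table 1 data. The names `comparisons` and `comparison_value` are chosen for this illustration only.

```python
# Pulse rates from Table 1, one list per treatment group.
scores = [
    [80, 85, 90, 95, 100],  # Group 1 (control)
    [70, 75, 80, 85, 90],   # Group 2 (low effort)
    [50, 60, 70, 80, 90],   # Group 3 (high effort)
]
means = [sum(group) / len(group) for group in scores]

# Coefficients for Groups 1, 2, and 3 in each comparison.
comparisons = {
    "L1": [1, -1, 0],        # control vs. low effort
    "L2": [1, 0, -1],        # control vs. high effort
    "L3": [0, 1, -1],        # low effort vs. high effort
    "L4": [1, -0.5, -0.5],   # control vs. average of both exercise groups
}

def comparison_value(coeffs, means):
    # Weighted sum of group means.
    return sum(c * m for c, m in zip(coeffs, means))

values = {}
for name, coeffs in comparisons.items():
    # The coefficients of a comparison sum to zero.
    assert abs(sum(coeffs)) < 1e-12
    values[name] = comparison_value(coeffs, means)
    print(name, values[name])
# L1 10.0
# L2 20.0
# L3 10.0
# L4 15.0
```

Writing comparisons as coefficient lists makes it easy to add non-pairwise comparisons without changing the computation.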

Post Hoc Analysis With Scheffé's Test

Each null hypothesis associated with a follow-up question can be represented mathematically by a unique comparison. To determine whether to reject the null hypothesis for a follow-up question, we can test its associated comparison for statistical significance, using Scheffé's test. To illustrate the process, we'll work through Scheffé's test step-by-step.

Step 1. Specify a Significance Level

For post hoc analyses with Scheffé's test, the significance level should equal the significance level used a priori for the omnibus analysis of variance. We used a significance level of 0.05 for the a priori analysis, so we will use a significance level of 0.05 for Scheffé's test.

Step 2. Find Comparison Values

Each comparison is a function of mean scores from treatment groups. Mean pulse rate within each group (computed from raw scores in Table 1) appears below:

Table 3. Mean Pulse Rate in Each Treatment Group

Group 1 (control) | Group 2 (low effort) | Group 3 (high effort)
90                | 80                   | 70

Given the treatment means, it is a simple matter to compute values for each comparison, as shown below:

Table 4. Comparison Values

Comparison              | Value
L1 = X1 - X2            | 10
L2 = X1 - X3            | 20
L3 = X2 - X3            | 10
L4 = X1 - 0.5X2 - 0.5X3 | 15

Step 3. Generate ANOVA Table

The summary table from an omnibus analysis of variance includes two outputs that we can use to test the statistical significance of a comparison. Those outputs are (1) the value of the mean squared error and (2) the degrees of freedom for the mean squared error.

We generated the ANOVA summary table earlier, as part of the a priori analysis. For convenience, here it is again.

Table 2. ANOVA Summary Table

Source | SS   | df | MS  | F   | P
BG     | 1000 | 2  | 500 | 4.0 | 0.046
Error  | 1500 | 12 | 125 |     |
Total  | 2500 | 14 |     |     |

Step 4. Find the Critical Values

The critical value for Scheffé's test of comparison Li can be computed from the following formula:

$$ CV_i = \sqrt{(k - 1)\, F(v_1, v_2)\, \mathrm{MSE} \sum_j \left( c_j^2 / n_j \right)} $$

where CVi is the critical value for comparison Li, (k - 1) is the between groups degrees of freedom, F(v1, v2) is the F value with v1, v2 degrees of freedom, v1 is degrees of freedom for the between groups factor, v2 is degrees of freedom for the mean square error, MSE is the mean square error, cj is a coefficient (weight) for treatment j in comparison Li, and nj is sample size in Group j.

To find values for the degrees of freedom and the mean squared error, refer to the ANOVA table. From the table, we see that v1 equals 2, v2 equals 12, and the mean squared error equals 125.

To find F(v1, v2), use Stat Trek's F Distribution Calculator. In the field for the numerator degrees of freedom, enter 2. In the field for the denominator degrees of freedom, enter 12. In the field for P(F≤f), enter 1 - α, which is 1 - 0.05 or 0.95. Then click the Calculate button.
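The calculator lookup can be cross-checked in code. The sketch below is not how the calculator works internally. It relies on a closed-form expression for the F quantile that applies only when the numerator degrees of freedom equal 2, which happens to be the case in this example. The function name is hypothetical.

```python
def f_quantile_numerator_df_2(p, v2):
    """Value f such that P(F <= f) = p for an F distribution with (2, v2) df.

    Derived from the CDF of F(2, v2), which is 1 - (1 + 2f/v2) ** (-v2/2).
    Do not use this when the numerator degrees of freedom differ from 2.
    """
    return (v2 / 2.0) * ((1.0 - p) ** (-2.0 / v2) - 1.0)

alpha = 0.05
f_crit = f_quantile_numerator_df_2(1 - alpha, 12)
print(round(f_crit, 2))  # 3.89
```

For other numerator degrees of freedom there is no such simple formula, and a calculator or a statistics library is the practical choice.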

From the calculator, we see that F(2,12) equals about 3.89 when the significance level (α) is 0.05. At last, we have all the values we need to compute a critical value for each comparison:

$$ CV_i = \sqrt{(k - 1)\, F(v_1, v_2)\, \mathrm{MSE} \sum_j \left( c_j^2 / n_j \right)} $$

$$ CV_1 = \sqrt{2 \times 3.89 \times 125 \times (0.2 + 0.2)} \approx 19.7 $$

$$ CV_2 = \sqrt{2 \times 3.89 \times 125 \times (0.2 + 0.2)} \approx 19.7 $$

$$ CV_3 = \sqrt{2 \times 3.89 \times 125 \times (0.2 + 0.2)} \approx 19.7 $$

$$ CV_4 = \sqrt{2 \times 3.89 \times 125 \times (0.2 + 0.05 + 0.05)} \approx 17 $$

Step 5. Test Hypotheses

To test the statistical significance of each comparison, we compare the value of the comparison (Li from Step 2) with the critical value for the comparison (CVi from Step 4). If the absolute value of Li is bigger than CVi, the comparison is statistically significant.

Table 5 shows Scheffé test results for each comparison.

Table 5. Scheffé Test Results

Comparison         | Li value | CVi value | Conclusion
X1 - X2            | 10       | 19.7      | Not significant
X1 - X3            | 20       | 19.7      | Significant
X2 - X3            | 10       | 19.7      | Not significant
X1 - 0.5X2 - 0.5X3 | 15       | 17.1      | Not significant

The second comparison is statistically significant, since L2 is bigger than CV2. The second comparison measures the difference between resting pulse rate in the control group (Group 1) and resting pulse rate in the high-effort group (Group 3). From this post hoc analysis, we conclude that the high-effort treatment has a significant effect on resting pulse rate.

None of the other comparisons are statistically significant.
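The whole post hoc analysis can be reproduced with a short script. This is a sketch under the example's own numbers (three groups of five, MSE = 125, F(2, 12) = 3.89), with variable and function names invented for the illustration.

```python
import math

K = 3                       # number of treatment groups
F_CRIT = 3.89               # F(2, 12) at alpha = 0.05, from the calculator
MSE = 125.0                 # mean square error from the ANOVA table
SIZES = [5, 5, 5]           # subjects per group
MEANS = [90.0, 80.0, 70.0]  # group means from Table 3

comparisons = {
    "L1": [1, -1, 0],
    "L2": [1, 0, -1],
    "L3": [0, 1, -1],
    "L4": [1, -0.5, -0.5],
}

def scheffe_test(coeffs):
    # Step 2: value of the comparison.
    value = sum(c * m for c, m in zip(coeffs, MEANS))
    # Step 4: critical value for this comparison.
    weight = sum(c * c / n for c, n in zip(coeffs, SIZES))
    cv = math.sqrt((K - 1) * F_CRIT * MSE * weight)
    # Step 5: significant when the magnitude of the comparison exceeds the critical value.
    return value, cv, abs(value) > cv

results = {name: scheffe_test(coeffs) for name, coeffs in comparisons.items()}
for name, (value, cv, significant) in results.items():
    verdict = "significant" if significant else "not significant"
    print(f"{name}: L = {value:g}, CV = {cv:.1f}, {verdict}")
```

Run as written, the script flags only L2 as significant, in agreement with the conclusions above.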

Note: In a previous lesson, we tested the fourth comparison as part of a planned analysis and found it to be statistically significant. This illustrates the value of deciding in advance which comparisons to test. When the number of hypotheses tested is small, a priori tests (like the F ratio) tend to be more sensitive than post hoc tests (like the Scheffé test).
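The contrast between the two approaches can be made concrete for the fourth comparison. The sketch below is an illustration, not the previous lesson's exact procedure. It assumes the standard sum of squares for a comparison, L² / Σ(cj² / nj), and a tabled critical value of about 4.75 for F(1, 12) at α = 0.05. It also uses the fact that Scheffé's rule is equivalent to comparing the same F ratio against (k - 1) times F(v1, v2).

```python
K = 3
MSE = 125.0
MEANS = [90.0, 80.0, 70.0]
SIZES = [5, 5, 5]
COEFFS = [1, -0.5, -0.5]  # comparison L4

F_CRIT_1_12 = 4.75  # assumed table value: F(1, 12) at alpha = 0.05
F_CRIT_2_12 = 3.89  # F(2, 12) at alpha = 0.05, as used in Step 4

value = sum(c * m for c, m in zip(COEFFS, MEANS))
weight = sum(c * c / n for c, n in zip(COEFFS, SIZES))

# Sum of squares for the comparison (1 degree of freedom) and its F ratio.
ss_comparison = value ** 2 / weight
f_ratio = ss_comparison / MSE

# Planned (a priori) test: compare against F(1, 12).
planned_significant = f_ratio > F_CRIT_1_12
# Scheffé's test, restated on the F scale: compare against (k - 1) * F(2, 12).
scheffe_significant = f_ratio > (K - 1) * F_CRIT_2_12

print(round(f_ratio, 2))    # 6.0
print(planned_significant)  # True
print(scheffe_significant)  # False
```

The same F ratio of 6.0 clears the planned-comparison threshold of 4.75 but not the Scheffé threshold of 7.78, which is the price paid for protecting against all possible comparisons.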