Stat Trek

Teach yourself statistics

Stat Trek

Teach yourself statistics


Regression Analysis of Experimental Data with Excel

The analysis of variance module in Excel's Analysis Toolpak can handle analysis of variance with one or two factors (see previous lesson). But if your experiment has three or more factors, that module is no help. Luckily, the regression module in Excel's Analysis Toolpak can handle any number of factors.

In this lesson, we explain how to analyze data from a full-factorial experiment, using the regression module in Excel's Analysis Toolpak. Specifically, we explain how to:

  • Assess the statistical significance of main effects and interactions.
  • Generate a regression equation to predict dependent variable scores from selected independent variables.
  • Assess the relative importance of individual main effects and interactions.

The Analysis ToolPak

To access the regression module in Excel, you need a free Microsoft add-in called the Analysis ToolPak, which may or may not be already installed on your copy of Excel.

To determine whether you have the Analysis ToolPak, click the Data tab in the main Excel menu. If you see Data Analysis in the Analysis section, you're good. You have the ToolPak.

Excel Analysis ToolPak
Excel Analysis ToolPak

If you don't have the ToolPak, you need to get it. Go to: How to Install the Data Analysis ToolPak in Excel.

The Regression Equation

A key output from regression analysis is an equation to predict the value of a dependent variable. This regression equation is expressed as a linear model with k independent variables:

ŷ i = b0 + b1x1 + b2x2 + ... + bk-1xk-1 + bkxk

where ŷ i is the predicted dependent variable score for observation i, b0 through bk are regression coefficients, and x1 through xk are independent variable scores.

Note that independent variables can appear in the equation as main effects and/or as interaction effects. For example, suppose a researcher conducts a three-factor, completely randomized, full-factorial experiment. With regression analysis, the researcher could produce a predictive equation made up of all the main effects and all the interaction effects, as shown below:

ŷ i = b0 + b1x1 + b2x2 + b3x3 + b1,2x1x2 + b1,2x1x2 + b1,3x1x3 + b2,3x2x3 + b1,2,3x1x2x3

where xixj is a two-way interaction equal to the product of xi and xj; and xixjxk is a three-way interaction equal to the product of xi, xj, and xk.

What makes the regression equation special? The regression coefficients satisfy the principle of least squares. This means the regression coefficients are defined to minimize Σ(y - ŷ)2, where y is the actual value of a dependent variable and ŷ is the predicted value. As a result, the regression equation works better than any other linear, additive equation to minimize errors of prediction.

Note: For a discussion of the mathematics used to find coefficients that satisfy the least squares principle, visit Stat Trek's lesson on regression coefficients.

How to Use the Regression Module in Excel

To illustrate how to analyze experimental data with Excel's regression module, let's work through an example.

Problem Statement

Suppose we conduct a three-factor, completely randomized, full-factorial experiment. In this experiment, each factor has two levels; so there are eight treatment groups. Dependent variable (Y) scores were collected from two subjects in each group. Each row in the table below shows the dependent variable score and factor values for a single subject. Note that interaction values are simply products of factor values.

Grp Y Factor Interaction
A B C AB BC AC ABC
1 -3.38 -1 -1 -1 1 1 1 -1
1 3.64 -1 -1 -1 1 1 1 -1
2 0.90 -1 -1 1 1 -1 -1 1
2 4.95 -1 -1 1 1 -1 -1 1
3 -4.46 -1 1 -1 -1 -1 1 1
3 1.57 -1 1 -1 -1 -1 1 1
4 3.96 -1 1 1 -1 1 -1 -1
4 7.92 -1 1 1 -1 1 -1 -1
5 2.02 1 -1 -1 -1 1 -1 1
5 0.22 1 -1 -1 -1 1 -1 1
6 2.70 1 -1 1 -1 -1 1 -1
6 3.24 1 -1 1 -1 -1 1 -1
7 5.81 1 1 -1 1 -1 -1 -1
7 14.0 1 1 -1 1 -1 -1 -1
8 16.30 1 1 1 1 1 1 1
8 12.97 1 1 1 1 1 1 1

Use Excel's regression module to do the following:

  • Assess the statistical significance of all main effects and all interaction effects.
  • Build a predictive equation, using all main effects and all interaction effects.

Analysis of Experimental Data

Here are the steps required to conduct a regression analysis of experimental data with Excel.

  • Step 1. Enter data on an Excel spreadsheet. List data for the dependent variable (Y) in one column. List data for each main effect (A, B, and C) and for each interaction effect (AB, AC, BC, and ABC) in adjacent columns, as shown below:
  • Step 2. Access the regression module in Excel. Begin by clicking the Data tab and the Data Analysis button.
Excel Analysis ToolPak
Excel Analysis ToolPak
  • Step 3 This will open the Data Analysis dialog box. From the drop-down list, select "Regression" and click OK.
  • Step 4. Excel will display the Regression dialog box. This is where you identify data fields for the independent and dependent variables. In the Input Y Range, enter coordinates for the dependent variable. In the Input X Range, enter coordinates for the independent variable(s). If you include column labels in these input ranges, check the Labels box. In the example below, we have included labels, so the Labels box is checked. By default, Excel will produce a standard set of outputs. For this sample problem, that's all we need; so click OK to generate standard regression outputs.
  • Step 5. The main output is a regression table (shown below). The regression table includes coefficients for each term in the regression equation, as well as a P value for each effect.
  • P-values in the regression table (in the blue box) tell us that Effects A, B, C, and AB are statistically significant (P < 0.05).

    Coefficients in the regression table (in the green box) define the reqression equation that (a) satisfies the principle of least squares and (b) minimizes errors of prediction better than any other linear, additive equation:

    ŷ = 4.52 + 2.64A + 2.74B + 2.10C + 2.38AB + 0.93AC - 0.45AC - 0.21ABC

  • The coefficient of determination is 0.804. This indicates that about 80 percent of the variation in the dependent variable is accounted for by the regression equation.

Magnitude of Effects

The magnitude of an effect refers to the strength of its influence on a dependent variable. Suppose the regression output generated by Excel indicates that a main effect or an interation effect is not statistically significant. Can we conclude that its influence on the dependent variable is small? Not necessarily. An effect can be influential, yet not statistically significant.

To assess the strength of a single main effect or interaction effect from a balanced, full-factorial design, an experimenter can find the coefficient of determination when that effect is the only predictor in a regression equation. This coefficient of determination can be interpreted as the proportion of variance accounted for by that effect.

To illustrate how to find the magnitude of an effect with Excel's regression module, let's work through an example. For this exercise, we'll use the same data that we used previously to illustrate regression analysis of experimental data.

Problem Statement

Suppose we conduct a three-factor, completely randomized, full-factorial experiment. In this experiment, each factor has two levels; so there are eight treatment groups. Dependent variable (Y) scores were collected from two subjects in each group. Each row in the table below shows the dependent variable score and factor values for a single subject. Note that interaction values are simply products of factor values.

Grp Y Factor Interaction
A B C AB BC AC ABC
1 -3.38 -1 -1 -1 1 1 1 -1
1 3.64 -1 -1 -1 1 1 1 -1
2 0.90 -1 -1 1 1 -1 -1 1
2 4.95 -1 -1 1 1 -1 -1 1
3 -4.46 -1 1 -1 -1 -1 1 1
3 1.57 -1 1 -1 -1 -1 1 1
4 3.96 -1 1 1 -1 1 -1 -1
4 7.92 -1 1 1 -1 1 -1 -1
5 2.02 1 -1 -1 -1 1 -1 1
5 0.22 1 -1 -1 -1 1 -1 1
6 2.70 1 -1 1 -1 -1 1 -1
6 3.24 1 -1 1 -1 -1 1 -1
7 5.81 1 1 -1 1 -1 -1 -1
7 14.0 1 1 -1 1 -1 -1 -1
8 16.30 1 1 1 1 1 1 1
8 12.97 1 1 1 1 1 1 1

Use Excel's regression module to do find the magnitude of effect for the AB interaction.

How to Assess Magnitude of Effects

Here are the steps required to find the magnitude of an effect from a balanced, full-factorial design with Excel. In this example, we will find the magnitude of effect for the AB interaction.

  • Step 1. Enter data on an Excel spreadsheet. List data for the dependent variable (Y) in one column. List data for the predictor effect in another column. Here, we see data for the dependent variable in Column B; and data for the AB interaction in Column C.
  • Step 2. Access the regression module in Excel. Begin by clicking the Data tab and the Data Analysis button.
Excel Analysis ToolPak
Excel Analysis ToolPak
  • Step 3 This will open the Data Analysis dialog box. From the drop-down list, select "Regression" and click OK.
  • Step 4. Excel will display the Regression dialog box. This is where you identify data fields for the independent and dependent variables. In the Input Y Range, enter coordinates for the dependent variable. In the Input X Range, enter coordinates for the independent variable(s). If you include column labels in these input ranges, check the Labels box. In the example below, we have included labels, so the Labels box is checked.
  • Step 5. For this analysis, we are looking for the coefficient of determination, which appears as R Square in the Regression Statistics table.
  • The coefficient of determination is 0.177. This indicates that about 18 percent of the variation in the dependent variable is accounted for by the effect we tested, the AB interaction.

Conclusion

A researcher could regress the dependent variable against each of the other effects individually to assess the proportion of variance accounted for by each individual effect. If we did that, here is what we would find:

Effect R-square
A 0.2182
B 0.2353
C 0.1379
AB 0.1775
AC 0.0064
BC 0.0274
ABC 0.0014
Total 0.804

Individually, Effect A accounts for about 22 percent of the variance; Effect B, for about 24 percent of the variance; Effect C, for about 14 percent of the variance; and so on. In combination, all of the effects account for about 80 percent of the variance. This finding is consistent with the earlier analysis, where we defined a regression equation that used all of the effects as independent variables. That equation also accounted for about 80 percent of the variance.

Note: In a balanced, full-factorial design, no effect is confounded with any other effect. This makes it easy to assess the relative importance of individual effects. The bigger the R-square for an individual effect, the greater the magnitude of that effect.