
Introduction to Multiple Regression

Simple linear regression is a technique for predicting the value of a dependent variable, based on the value of a single independent variable. Sometimes, you only need one relevant independent variable to make an accurate prediction.

Often, however, the prediction is better when you use two or more independent variables. Multiple regression is a technique for predicting the value of a dependent variable, based on the values of two or more independent variables.

The Regression Equation

This is a tutorial about linear regression, so our focus is on linear relationships between variables. The regression equation that expresses the linear relationship between a single dependent variable and one or more independent variables is:

ŷ = b0 + b1x1 + b2x2 + … + bkxk

In this equation, ŷ is the predicted value of the dependent variable. Values of the k independent variables are denoted by x1, x2, x3, … , xk.

And finally, we have the b's - b0, b1, b2, … , bk. The b's are constants, called regression coefficients. Values are assigned to the b's based on the principle of least squares.
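
To make this concrete, here is a minimal Python sketch that evaluates the regression equation for one prediction. The coefficient and predictor values are hypothetical, chosen only for illustration; in practice, the b's come from fitting the model to data.

# Evaluate y-hat = b0 + b1*x1 + ... + bk*xk.
# All numeric values below are made up for illustration.

def predict(b, x):
    """Return y-hat given coefficients b = [b0, b1, ..., bk]
    and predictor values x = [x1, ..., xk]."""
    y_hat = b[0]                     # start with the intercept, b0
    for bj, xj in zip(b[1:], x):     # add each bj * xj term
        y_hat += bj * xj
    return y_hat

b = [2.0, 0.5, -1.25]   # hypothetical b0, b1, b2
x = [4.0, 3.0]          # hypothetical x1, x2
print(predict(b, x))    # 2.0 + 0.5*4.0 + (-1.25)*3.0 = 0.25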

What is the Principle of Least Squares?

In multiple regression, the deviation of the actual value for a dependent variable from its predicted value is called the residual. The residual (e) for a single observation i is:

ei = yi - ŷi = yi - ( b0 + b1x1i + b2x2i + … + bkxki )

Assume that the set of data consists of n observations. The principle of least squares requires that the sum of squared residuals for all n observations be minimized. That is, we want the following value to be as small as possible:

Σ [ yi - ( b0 + b1x1i + b2x2i + … + bkxki ) ]²

Regression analysis requires that the values of b0, b1, … , bk be defined to minimize the sum of the squared residuals. When we assign values to regression coefficients in this way, we are following the principle of least squares.
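
To see the principle in action, here is a minimal Python sketch (with made-up data, and a single independent variable to keep it short) that computes the sum of squared residuals for two candidate sets of coefficients. The coefficients that track the data more closely produce the smaller sum; least squares chooses the coefficients that make this sum as small as possible.

# Sum of squared residuals: Σ (yi - ŷi)².
# The data and candidate coefficients are made up for illustration.

def sum_squared_residuals(b0, b1, xs, ys):
    """Return Σ (yi - (b0 + b1*xi))² over all observations."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4]
ys = [2.1, 3.9, 6.2, 7.8]   # roughly y = 2x

print(sum_squared_residuals(0.0, 2.0, xs, ys))  # near the trend: about 0.10
print(sum_squared_residuals(1.0, 0.5, xs, ys))  # poor fit: about 40.70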

Normal Equations for Simple Regression

Finding the right values for regression coefficients (i.e., values that satisfy a least squares criterion) involves solving a set of linear equations. These equations can be derived using calculus, and they are called normal equations.

To illustrate the use of normal equations, let's look at simple linear regression - regression with one dependent variable (y) and one independent variable (x). With simple linear regression, the regression equation is:

ŷ = b0 + b1x

The normal equations for simple linear regression are:

Σ yi = nb0 + b1( Σxi )

Σ xiyi = b0( Σxi ) + b1( Σxi² )

Here, we have two equations with two unknowns. The unknowns are the regression coefficients b0 and b1. Using ordinary algebra, we can solve for b0 and b1. The result is:

b1 = Σ [ (xi - x̄)(yi - ȳ) ] / Σ [ (xi - x̄)² ]

b0 = ȳ - b1x̄

where x̄ is the mean of the x scores, and ȳ is the mean of the y scores. Note that these are the same equations that we presented in a previous lesson, when we introduced the topic of simple linear regression.
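
As a quick check, here is a Python sketch (again with made-up data) that computes b1 and b0 from the formulas above and then verifies that the result satisfies both normal equations.

# Least squares coefficients for simple linear regression,
# computed from made-up data using the formulas above.

xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]   # roughly y = 2x

n = len(xs)
x_bar = sum(xs) / n               # mean of the x scores
y_bar = sum(ys) / n               # mean of the y scores

# b1 = Σ (xi - x̄)(yi - ȳ) / Σ (xi - x̄)²
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

print(b0, b1)   # about 0.09 and 1.97 for this data

# The solution satisfies both normal equations (to rounding error):
#   Σ yi   = n*b0      + b1*(Σ xi)
#   Σ xiyi = b0*(Σ xi) + b1*(Σ xi²)
assert abs(sum(ys) - (n * b0 + b1 * sum(xs))) < 1e-9
assert abs(sum(x * y for x, y in zip(xs, ys))
           - (b0 * sum(xs) + b1 * sum(x * x for x in xs))) < 1e-9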

The use of normal equations to assign values to regression coefficients becomes more complicated when there are two or more independent variables. We'll tackle that challenge in the next lesson.

Test Your Understanding

Problem 1

Which of the following statements are true?

I. A regression equation with k independent variables has k regression coefficients.
II. Regression coefficients (b0, b1, b2, etc.) are variables in the regression equation.
III. The principle of least squares calls for minimizing the sum of the squared residuals.

(A) I only.
(B) II only.
(C) III only.
(D) All of the above.
(E) None of the above.

Solution

The correct answer is (C). The principle of least squares defines regression coefficients that minimize the sum of the squared residuals. A regression equation with k independent variables has k + 1 regression coefficients. For example, if there were two independent variables, there would be three regression coefficients - b0, b1, and b2. And finally, regression coefficients are constants - not variables.