Sampling Distribution of the Mean
Suppose that we draw all possible samples of size n from a given population. Suppose further that we compute a mean score for each sample. The probability distribution of this statistic is the sampling distribution of the mean.
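To make the idea concrete, here is a minimal Python sketch that enumerates every possible sample from a tiny population and tabulates the resulting means. The five-value population and the sample size of 2 are made-up values chosen for illustration, and samples are assumed to be drawn without replacement.

```python
from collections import Counter
from itertools import combinations

# Hypothetical population (illustrative values only).
population = [2, 4, 6, 8, 10]
n = 2  # sample size

# Every possible sample of size n, drawn without replacement.
samples = list(combinations(population, n))

# The statistic of interest: the mean of each sample.
sample_means = [sum(sample) / n for sample in samples]

# The sampling distribution assigns each possible mean its probability.
counts = Counter(sample_means)
sampling_distribution = {
    m: count / len(sample_means) for m, count in sorted(counts.items())
}

for m, p in sampling_distribution.items():
    print(f"mean = {m:4.1f}  probability = {p:.1f}")
```

With a realistic population the number of possible samples is far too large to enumerate, which is why the rest of this lesson relies on formulas instead.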
Shape of Sampling Distribution
It is safe to assume that the shape of the sampling distribution for a mean will be close to a t distribution when the following conditions are true:
- The sampling method is simple random sampling.
- Population values are approximately normally distributed. If you are not sure whether population values are normally distributed, you can assume that they are when:
  - Sample size is smaller than 15, and the plot of sample data is symmetric, unimodal, and without outliers.
  - Sample size is between 15 and 40, and the plot of sample data is unimodal, without outliers, and only moderately skewed.
  - Sample size is greater than 40, and the plot of sample data is without outliers.
The central limit theorem predicts that the sampling distribution will be approximately normally distributed when the sample size is sufficiently large. And when the sample size is large, the t distribution is almost identical to the normal distribution.
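The central limit theorem can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the exponential population, the sample size of 50, the 2000 repetitions, and the seed are all arbitrary choices, and random draws stand in for "all possible samples". If the sample means are roughly normal, about 68% of them should fall within one standard deviation of their center, even though the population itself is strongly skewed.

```python
import random
from statistics import mean, stdev

random.seed(42)  # fixed seed so the run is repeatable

n = 50              # sample size (assumed "large")
num_samples = 2000  # number of simulated samples

# Each sample is n draws from an exponential population with mean 1
# and standard deviation 1, which is strongly right-skewed.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

center = mean(sample_means)   # should be close to the population mean, 1
spread = stdev(sample_means)  # should be close to 1 / sqrt(50)

# Share of sample means within one standard deviation of the center.
within_one_sd = sum(
    abs(m - center) <= spread for m in sample_means
) / num_samples

print(f"center = {center:.3f}")
print(f"spread = {spread:.3f}")
print(f"share within one sd = {within_one_sd:.3f}")
```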
Recommendation: When sample size is large and population values are normally distributed, you could use the t distribution or normal distribution for analysis. If the population standard deviation is unknown, use the t distribution. If the population standard deviation is known, use the normal distribution.
Standard Deviation of the Sampling Distribution
Suppose we draw all possible simple random samples of size n from a population of size N. Suppose further that we compute a mean score x for each sample. In this way, we create a sampling distribution of the mean.
We know the following about the sampling distribution of the mean. The mean of the sampling distribution (μx) is equal to the mean of the population (μ). And the standard deviation of the sampling distribution (σx) is determined by the standard deviation of the population (σ), the population size (N), and the sample size (n), as shown in the equation below:
σx = [ σ / sqrt(n) ] * sqrt[ (N - n ) / (N - 1) ]
In the standard deviation formula, the factor sqrt[ (N - n ) / (N - 1) ] is called the finite population correction or fpc. When the population size is very large relative to the sample size, the fpc is approximately equal to one; and the standard deviation formula can be approximated by:
σx = σ / sqrt(n).
You often see this "approximate" formula in introductory statistics texts. As a general rule, it is safe to use the approximate formula when the sample size is no bigger than 1/20 of the population size.
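The formula and the 1/20 rule can be checked numerically. The sketch below assumes a made-up five-value population and a hypothetical helper named fpc; it compares the formula against the standard deviation obtained by enumerating every possible sample, then evaluates the fpc at the 1/20 boundary.

```python
from itertools import combinations
from math import sqrt
from statistics import pstdev


def fpc(N, n):
    """Finite population correction: sqrt((N - n) / (N - 1))."""
    return sqrt((N - n) / (N - 1))


# Hypothetical population (illustrative values only).
population = [2, 4, 6, 8, 10]
N = len(population)
n = 2

# Standard deviation of the sampling distribution from the formula.
sigma = pstdev(population)  # population standard deviation
sigma_xbar_formula = sigma / sqrt(n) * fpc(N, n)

# The same quantity by brute force: the standard deviation of the
# means of all possible samples drawn without replacement.
sample_means = [sum(s) / n for s in combinations(population, n)]
sigma_xbar_enumerated = pstdev(sample_means)

print(f"formula:    {sigma_xbar_formula:.4f}")
print(f"enumerated: {sigma_xbar_enumerated:.4f}")

# At the 1/20 boundary (for example n = 50, N = 1000) the fpc is
# already close to one, so dropping it changes the result only slightly.
fpc_at_boundary = fpc(1000, 50)
print(f"fpc for n = 50, N = 1000: {fpc_at_boundary:.4f}")
```

In this sketch the two values agree, and the fpc at the 1/20 boundary shrinks the standard deviation by only a few percent.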
Standard Error of the Sampling Distribution
Often, we don't know the value for population standard deviation σ. And, if we don't know the population standard deviation, we cannot compute the standard deviation of the sampling distribution of the mean (σx).
However, we can use the sample standard deviation s to estimate the unknown population standard deviation. Substituting s into the equation for σx, we get:
s = sqrt[ Σ ( xi - x )^2 / ( n - 1 ) ]
SEm = [ s / sqrt(n) ] * sqrt[ (N - n ) / (N - 1) ]
where s is the sample standard deviation, x is the sample mean, xi is the ith element from the sample, n is the number of elements in the sample, and SEm is a sample estimate of σx, the standard deviation of the sampling distribution. SEm is the standard error of the sampling distribution of the mean.
And when the population size is very large relative to the sample size, the standard error formula can be approximated by:
SEm = s / sqrt(n)
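As a rough illustration, the standard error can be computed from sample data in a few lines of Python. The function name standard_error, its population_size parameter, and the five sample weights are assumptions made for this sketch; statistics.stdev uses the n - 1 denominator, matching the formula for s above.

```python
from math import sqrt
from statistics import stdev


def standard_error(sample, population_size=None):
    """Estimate the standard error of the mean from sample data.

    If population_size is given, the finite population correction
    is applied; otherwise the large-population approximation is used.
    """
    n = len(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 denominator)
    se = s / sqrt(n)
    if population_size is not None:
        se *= sqrt((population_size - n) / (population_size - 1))
    return se


# Hypothetical sample of five weights, in pounds.
sample = [78, 82, 75, 90, 85]

se_large = standard_error(sample)                       # population assumed large
se_finite = standard_error(sample, population_size=40)  # small population of 40

print(f"SE (large population): {se_large:.4f}")
print(f"SE (N = 40):           {se_finite:.4f}")
```

The corrected value is smaller because a sample that covers a sizable share of a small population leaves less room for sampling variability.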
In future lessons, you will see that being able to compute the standard error from sample data is essential for inferential statistics. It will allow us to compute confidence intervals for mean scores and to test hypotheses about mean scores.
Summary of Key Points
The key takeaways from this lesson are summarized below.
- A sampling distribution of the mean can be approximated by a t distribution when:
  - The sampling method is simple random sampling.
  - Population values are normally distributed.
- When sample size is large, a sampling distribution of the mean can be approximated by a normal distribution.
- The standard error of the sampling distribution of the mean can be computed from the following formula:
SEm = [ s / sqrt(n) ] * sqrt[ (N - n ) / (N - 1) ]
- If population size is large relative to sample size, the standard error of the sampling distribution of the mean can be computed from the following formula:
SEm = s / sqrt(n)
A population is considered "large" if it is at least 20 times the size of the sample.
Test Your Understanding
Here is an example to illustrate how sampling distributions are used to solve common statistical problems. In this example, we use Stat Trek's Normal Distribution Calculator to compute probabilities.
Normal Distribution Calculator
The normal calculator solves common statistical problems, based on the normal distribution. The calculator computes cumulative probabilities, based on three simple inputs. Simple instructions guide you to an accurate solution, quickly and easily. If anything is unclear, frequently-asked questions and sample problems provide straightforward explanations. The calculator is free. It can be found in the Stat Trek main menu under the Stat Tools tab.
Example 1
Assume that a school district has 10,000 6th graders. In this district, the average weight of a 6th grader is 80 pounds, with a standard deviation of 20 pounds. Suppose you draw a random sample of 50 students. What is the probability that the sample mean will be less than 75 pounds?
Solution: To solve this problem, we need to define the sampling distribution of the mean. Because our sample size is relatively large (greater than 40), the central limit theorem tells us that the sampling distribution will approximate a normal distribution.
To define our normal distribution, we need to know both the mean of the sampling distribution and the standard deviation. Finding the mean of the sampling distribution is easy, since it is equal to the mean of the population. Thus, the mean of the sampling distribution is equal to 80.
The standard deviation of the sampling distribution can be computed using the following formula.
σx = [ σ / sqrt(n) ] * sqrt[ (N - n ) / (N - 1) ]
σx = [ 20 / sqrt(50) ] * sqrt[ (10,000 - 50 ) / (10,000 - 1) ]
σx = (20/7.071) * (0.9975) = 2.82
Let's review what we know and what we want to know. We know that the sampling distribution of the mean is approximately normal with a mean of 80 and a standard deviation of 2.82. We want to know the probability that a sample mean is less than 75 pounds.
Because we know the population standard deviation and the sample size is large, we'll use the normal distribution to find the probability. To solve the problem, we plug these inputs into the Normal Distribution Calculator: mean = 80, standard deviation = 2.82, and normal random variable = 75.
The Calculator tells us that the probability that the mean weight of the sampled students is less than 75 pounds is approximately 0.038.
Note: In this problem, the population standard deviation was known and the sample size was large, so we used the normal distribution. If the standard deviation had been unknown, we would have used the t distribution with degrees of freedom equal to n minus one. In a previous lesson, we presented guidelines for choosing between the normal distribution and the t distribution.
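As a cross-check, Example 1 can be reproduced with Python's standard library instead of an online calculator. This is a sketch under the example's own assumptions (known population standard deviation, approximately normal sampling distribution); the variable names are illustrative. Intermediate values are not rounded here, so the output can differ in the last digits from figures worked by hand.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from Example 1.
N = 10_000    # population size (6th graders in the district)
mu = 80.0     # population mean weight, in pounds
sigma = 20.0  # population standard deviation, in pounds
n = 50        # sample size

# Standard deviation of the sampling distribution, including the
# finite population correction.
fpc = sqrt((N - n) / (N - 1))
sigma_xbar = sigma / sqrt(n) * fpc

# Probability that the sample mean is less than 75 pounds, treating
# the sampling distribution as normal.
p_below_75 = NormalDist(mu=mu, sigma=sigma_xbar).cdf(75)

print(f"fpc        = {fpc:.4f}")
print(f"sigma_xbar = {sigma_xbar:.4f}")
print(f"P(mean < 75) = {p_below_75:.4f}")
```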