Statistics Formulas Used on Stat Trek
This web page lists statistics formulas used in the Stat Trek tutorials. Each formula links to a web page that explains how to use the formula.
Parameters
 Population mean = μ = ( Σ X_{i} ) / N
 Population standard deviation = σ = sqrt [ Σ ( X_{i} - μ )^{2} / N ]
 Population variance = σ^{2} = Σ ( X_{i} - μ )^{2} / N
 Variance of population proportion = σ_{P}^{2} = PQ / n
 Standardized score = Z = (X - μ) / σ
 Population correlation coefficient = ρ = [ 1 / N ] * Σ { [ (X_{i} - μ_{X}) / σ_{x} ] * [ (Y_{i} - μ_{Y}) / σ_{y} ] }
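As a quick check on the parameter formulas above, here is a minimal Python sketch. The function names and the eight-value population are made up for illustration; they do not come from the tutorials.

```python
import math

def population_mean(values):
    # mu = (sum of X_i) / N
    return sum(values) / len(values)

def population_variance(values):
    # sigma^2 = sum of (X_i - mu)^2 / N
    mu = population_mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def population_std(values):
    # sigma = sqrt(sigma^2)
    return math.sqrt(population_variance(values))

def z_score(x, mu, sigma):
    # Z = (X - mu) / sigma
    return (x - mu) / sigma

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical population
mu = population_mean(scores)
sigma = population_std(scores)
print(mu, sigma, z_score(9, mu, sigma))
```

For this population the mean is 5, the variance is 4, and the value 9 lies two standard deviations above the mean.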
Statistics
Unless otherwise noted, these formulas assume simple random sampling.
 Sample mean = x̄ = ( Σ x_{i} ) / n
 Sample standard deviation = s = sqrt [ Σ ( x_{i} - x̄ )^{2} / ( n - 1 ) ]
 Sample variance = s^{2} = Σ ( x_{i} - x̄ )^{2} / ( n - 1 )
 Variance of sample proportion = s_{p}^{2} = pq / (n - 1)
 Pooled sample proportion = p = (p_{1} * n_{1} + p_{2} * n_{2}) / (n_{1} + n_{2})
 Pooled sample standard deviation = s_{p} = sqrt { [ (n_{1} - 1) * s_{1}^{2} + (n_{2} - 1) * s_{2}^{2} ] / (n_{1} + n_{2} - 2) }
 Sample correlation coefficient = r = [ 1 / (n - 1) ] * Σ { [ (x_{i} - x̄) / s_{x} ] * [ (y_{i} - ȳ) / s_{y} ] }
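The sample variance and pooled standard deviation formulas above can be sketched in Python as follows. The helper names and the data are illustrative assumptions; the standard library's statistics.variance is used only as a cross-check on the n - 1 divisor.

```python
import math
import statistics

def sample_variance(values):
    # s^2 = sum of (x_i - x_bar)^2 / (n - 1)
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def pooled_std(s1, n1, s2, n2):
    # s_p = sqrt{ [ (n1 - 1) * s1^2 + (n2 - 1) * s2^2 ] / (n1 + n2 - 2) }
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
s_squared = sample_variance(sample)

# statistics.variance also divides by n - 1, so the two should agree.
assert math.isclose(s_squared, statistics.variance(sample))

print(s_squared, math.sqrt(s_squared))
print(pooled_std(3.0, 10, 3.0, 20))
```

When both samples have the same standard deviation, the pooled value equals that common standard deviation, whatever the sample sizes.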
Correlation
 Pearson product-moment correlation = r = Σ (xy) / sqrt [ ( Σ x^{2} ) * ( Σ y^{2} ) ], where x and y are deviation scores (x = x_{i} - x̄ and y = y_{i} - ȳ)
 Linear correlation (sample data) = r = [ 1 / (n - 1) ] * Σ { [ (x_{i} - x̄) / s_{x} ] * [ (y_{i} - ȳ) / s_{y} ] }
 Linear correlation (population data) = ρ = [ 1 / N ] * Σ { [ (X_{i} - μ_{X}) / σ_{x} ] * [ (Y_{i} - μ_{Y}) / σ_{y} ] }
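The deviation-score form and the standardized-score form of the sample correlation give the same value. The sketch below checks this on a small made-up data set; the function names are assumptions chosen for illustration.

```python
import math

def pearson_r(xs, ys):
    # r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), with x and y as deviation scores
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def pearson_r_standardized(xs, ys):
    # r = [1 / (n - 1)] * sum of products of standardized scores
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

xs = [1, 2, 3, 4, 5]  # hypothetical paired data
ys = [2, 4, 5, 4, 5]
print(pearson_r(xs, ys), pearson_r_standardized(xs, ys))
```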
Simple Linear Regression
 Simple linear regression line: ŷ = b_{0} + b_{1}x
 Regression coefficient = b_{1} = Σ [ (x_{i} - x̄) (y_{i} - ȳ) ] / Σ [ (x_{i} - x̄)^{2} ]
 Regression intercept = b_{0} = ȳ - b_{1} * x̄
 Regression coefficient = b_{1} = r * (s_{y} / s_{x})
 Standard error of regression slope = s_{b1} = sqrt [ Σ(y_{i} - ŷ_{i})^{2} / (n - 2) ] / sqrt [ Σ(x_{i} - x̄)^{2} ]
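A minimal Python sketch of the regression formulas above, under the assumption of a single predictor and made-up data. The function name fit_line is hypothetical.

```python
import math

def fit_line(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # intercept
    # Sum of squared residuals, sum of (y_i - y_hat_i)^2
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return b0, b1, se_b1

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
b0, b1, se_b1 = fit_line(xs, ys)
print(b0, b1, se_b1)
```

For these data the fitted line is approximately ŷ = 2.2 + 0.6x.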
Counting
 n factorial: n! = n * (n - 1) * (n - 2) * . . . * 3 * 2 * 1. By convention, 0! = 1.
 Permutations of n things, taken r at a time: _{n}P_{r} = n! / (n - r)!
 Combinations of n things, taken r at a time: _{n}C_{r} = n! / [ r! (n - r)! ] = _{n}P_{r} / r!
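Python's standard math module (version 3.8 or later) provides factorial, perm, and comb, so the counting formulas above can be checked directly. The values n = 5 and r = 2 are arbitrary.

```python
import math

n, r = 5, 2

permutations = math.perm(n, r)   # n! / (n - r)!
combinations = math.comb(n, r)   # n! / [ r! (n - r)! ]

# Both results should match the factorial definitions.
assert permutations == math.factorial(n) // math.factorial(n - r)
assert combinations == permutations // math.factorial(r)
assert math.factorial(0) == 1    # the 0! = 1 convention

print(permutations, combinations)  # 20 10
```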
Probability
 Rule of addition: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
 Rule of multiplication: P(A ∩ B) = P(A) P(B|A)
 Rule of subtraction: P(A') = 1 - P(A)
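The three rules above can be verified on a small sample space. The sketch below assumes one roll of a fair six-sided die, with A = "even" and B = "greater than 3"; the events and the helper prob are chosen only for illustration.

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # one roll of a fair die

def prob(event):
    # Equally likely outcomes: P(event) = |event| / |sample space|
    return Fraction(len(event), len(outcomes))

A = {2, 4, 6}  # even
B = {4, 5, 6}  # greater than 3

# Conditional probability P(B|A): restrict the sample space to A.
p_b_given_a = Fraction(len(A & B), len(A))

assert prob(A | B) == prob(A) + prob(B) - prob(A & B)  # rule of addition
assert prob(A & B) == prob(A) * p_b_given_a            # rule of multiplication
assert prob(outcomes - A) == 1 - prob(A)               # rule of subtraction

print(prob(A | B), prob(A & B), prob(outcomes - A))  # 2/3 1/3 1/2
```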
Random Variables
In the following formulas, X and Y are random variables, and a and b are constants.
 Expected value of X = E(X) = μ_{x} = Σ [ x_{i} * P(x_{i}) ]
 Variance of X = Var(X) = σ^{2} = Σ [ x_{i} - E(X) ]^{2} * P(x_{i}) = Σ [ x_{i} - μ_{x} ]^{2} * P(x_{i})
 Normal random variable = z-score = z = (X - μ)/σ
 Chi-square statistic = Χ^{2} = [ ( n - 1 ) * s^{2} ] / σ^{2}
 f statistic = f = [ s_{1}^{2}/σ_{1}^{2} ] / [ s_{2}^{2}/σ_{2}^{2} ]
 Expected value of sum of random variables = E(X + Y) = E(X) + E(Y)
 Expected value of difference between random variables = E(X - Y) = E(X) - E(Y)
 Variance of the sum of independent random variables = Var(X + Y) = Var(X) + Var(Y)
 Variance of the difference between independent random variables = Var(X - Y) = Var(X) + Var(Y)
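The expected value and variance rules above can be checked exactly for two independent fair dice. This is a minimal sketch: the dictionary representation of a distribution and the helper names are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

def expected_value(dist):
    # E(X) = sum of x_i * P(x_i)
    return sum(x * p for x, p in dist.items())

def variance(dist):
    # Var(X) = sum of [x_i - E(X)]^2 * P(x_i)
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def combine(dist_x, dist_y, op):
    # Distribution of op(X, Y), assuming X and Y are independent.
    out = {}
    for (x, px), (y, py) in product(dist_x.items(), dist_y.items()):
        value = op(x, y)
        out[value] = out.get(value, 0) + px * py
    return out

die = {face: Fraction(1, 6) for face in range(1, 7)}
total = combine(die, die, lambda x, y: x + y)
diff = combine(die, die, lambda x, y: x - y)

print(expected_value(die), variance(die))     # 7/2 35/12
print(expected_value(total), variance(total)) # 7 35/6
print(expected_value(diff), variance(diff))   # 0 35/6
```

Note that the variance of the difference is the sum of the variances, not their difference.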
Sampling Distributions
 Mean of sampling distribution of the mean = μ_{x̄} = μ
 Mean of sampling distribution of the proportion = μ_{p} = P
 Standard deviation of proportion = σ_{p} = sqrt[ P * (1 - P)/n ] = sqrt( PQ / n )
 Standard deviation of the mean = σ_{x̄} = σ/sqrt(n)
 Standard deviation of difference of sample means = σ_{d} = sqrt[ (σ_{1}^{2} / n_{1}) + (σ_{2}^{2} / n_{2}) ]
 Standard deviation of difference of sample proportions = σ_{d} = sqrt{ [P_{1}(1 - P_{1}) / n_{1}] + [P_{2}(1 - P_{2}) / n_{2}] }
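To illustrate the mean and standard deviation of the sampling distribution of the mean, the sketch below enumerates every possible sample of size 2 from a tiny made-up population. It assumes sampling with replacement, the setting in which the variance of the sample mean is exactly σ^{2} / n.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]  # hypothetical population
n = 2                      # sample size
N = len(population)

mu = Fraction(sum(population), N)
sigma_sq = sum((x - mu) ** 2 for x in population) / N

# Every ordered sample of size n, drawn with replacement.
sample_means = [Fraction(sum(s), n) for s in product(population, repeat=n)]

mean_of_means = sum(sample_means) / len(sample_means)
var_of_means = sum((m - mean_of_means) ** 2 for m in sample_means) / len(sample_means)

assert mean_of_means == mu             # mean of sample means equals mu
assert var_of_means == sigma_sq / n    # so the std. deviation is sigma / sqrt(n)

print(mean_of_means, var_of_means)  # 5/2 5/8
```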
Standard Error
 Standard error of proportion = SE_{p} = s_{p} = sqrt[ p * (1 - p)/n ] = sqrt( pq / n )
 Standard error of difference for proportions = SE_{p} = s_{p} = sqrt{ p * ( 1 - p ) * [ (1/n_{1}) + (1/n_{2}) ] }, where p is the pooled sample proportion
 Standard error of the mean = SE_{x̄} = s_{x̄} = s/sqrt(n)
 Standard error of difference of sample means = SE_{d} = s_{d} = sqrt[ (s_{1}^{2} / n_{1}) + (s_{2}^{2} / n_{2}) ]
 Standard error of difference of paired sample means = SE_{d} = s_{d} = { sqrt [ Σ(d_{i} - d̄)^{2} / (n - 1) ] } / sqrt(n)
 Pooled sample standard error = s_{pooled} = sqrt { [ (n_{1} - 1) * s_{1}^{2} + (n_{2} - 1) * s_{2}^{2} ] / (n_{1} + n_{2} - 2) }
 Standard error of difference of sample proportions = s_{d} = sqrt{ [p_{1}(1 - p_{1}) / n_{1}] + [p_{2}(1 - p_{2}) / n_{2}] }
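Several of the standard error formulas above translate directly into small Python functions. The function names and the inputs below are illustrative assumptions.

```python
import math

def se_proportion(p, n):
    # sqrt[ p * (1 - p) / n ]
    return math.sqrt(p * (1 - p) / n)

def se_mean(s, n):
    # s / sqrt(n)
    return s / math.sqrt(n)

def se_diff_means(s1, n1, s2, n2):
    # sqrt[ (s1^2 / n1) + (s2^2 / n2) ]
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def se_paired(diffs):
    # Standard deviation of the paired differences, divided by sqrt(n)
    n = len(diffs)
    d_bar = sum(diffs) / n
    s_d = math.sqrt(sum((d - d_bar) ** 2 for d in diffs) / (n - 1))
    return s_d / math.sqrt(n)

print(se_proportion(0.4, 100))
print(se_mean(10, 25))
print(se_diff_means(3, 9, 4, 16))
print(se_paired([1, 2, 3, 4]))
```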
Discrete Probability Distributions
 Binomial formula: P(X = x) = b(x; n, P) = _{n}C_{x} * P^{x} * (1 - P)^{n - x} = _{n}C_{x} * P^{x} * Q^{n - x}
 Mean of binomial distribution = μ_{x} = n * P
 Variance of binomial distribution = σ_{x}^{2} = n * P * ( 1 - P )
 Negative Binomial formula: P(X = x) = b*(x; r, P) = _{x-1}C_{r-1} * P^{r} * (1 - P)^{x - r}
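 Mean of negative binomial distribution = μ_{x} = r / P, when x counts the trials needed to produce r successes (as in the formula above); the mean number of failures before the rth success is rQ / P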
 Variance of negative binomial distribution = σ_{x}^{2} = r * Q / P^{2}
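 Geometric formula: P(X = x) = g(x; P) = P * Q^{x - 1}
 Mean of geometric distribution = μ_{x} = 1 / P, when x counts the trials needed to produce the first success (as in the formula above); the mean number of failures before the first success is Q / P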
 Variance of geometric distribution = σ_{x}^{2} = Q / P^{2}
 Hypergeometric formula: P(X = x) = h(x; N, n, k) = [ _{k}C_{x} ] [ _{N-k}C_{n-x} ] / [ _{N}C_{n} ]
 Mean of hypergeometric distribution = μ_{x} = n * k / N
 Variance of hypergeometric distribution = σ_{x}^{2} = n * k * ( N - k ) * ( N - n ) / [ N^{2} * ( N - 1 ) ]
 Poisson formula: P(x; μ) = (e^{-μ}) (μ^{x}) / x!
 Mean of Poisson distribution = μ_{x} = μ
 Variance of Poisson distribution = σ_{x}^{2} = μ
 Multinomial formula: P = [ n! / ( n_{1}! * n_{2}! * . . . * n_{k}! ) ] * ( p_{1}^{n_{1}} * p_{2}^{n_{2}} * . . . * p_{k}^{n_{k}} )
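The probability formulas above can be sketched with the standard math module. The function names are hypothetical. In the negative binomial and geometric functions, x is taken to be the number of trials, matching the formulas as written.

```python
import math

def binomial_pmf(x, n, p):
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def negative_binomial_pmf(x, r, p):
    # Probability that the r-th success occurs on trial x
    return math.comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)

def geometric_pmf(x, p):
    # Probability that the first success occurs on trial x
    return p * (1 - p) ** (x - 1)

def hypergeometric_pmf(x, N, n, k):
    # x successes in a sample of n, from a population of N with k successes
    return math.comb(k, x) * math.comb(N - k, n - x) / math.comb(N, n)

def poisson_pmf(x, mu):
    return math.exp(-mu) * mu**x / math.factorial(x)

print(binomial_pmf(2, 5, 0.5))            # 0.3125
print(negative_binomial_pmf(3, 2, 0.5))   # 0.25
print(geometric_pmf(3, 0.5))              # 0.125
print(hypergeometric_pmf(1, 10, 3, 4))    # 0.5
print(poisson_pmf(0, 2.0))

# The binomial probabilities sum to 1 and have mean n * P.
n, p = 5, 0.5
assert math.isclose(sum(binomial_pmf(x, n, p) for x in range(n + 1)), 1.0)
assert math.isclose(sum(x * binomial_pmf(x, n, p) for x in range(n + 1)), n * p)
```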
Linear Transformations
For the following formulas, assume that Y is a linear transformation of the random variable X, defined by the equation: Y = aX + b.
 Mean of a linear transformation = E(Y) = Ȳ = aX̄ + b.
 Variance of a linear transformation = Var(Y) = a^{2} * Var(X).
 Standardized score = z = (x - μ_{x}) / σ_{x}.
 t statistic = t = (x̄ - μ_{x}) / [ s/sqrt(n) ].
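A short numerical check of the mean and variance rules for Y = aX + b. The constants and data values are arbitrary, and the data are treated as a complete population, so statistics.pvariance is used.

```python
import math
import statistics

a, b = 3, 2                   # arbitrary constants
xs = [1, 2, 3, 4]             # hypothetical values of X
ys = [a * x + b for x in xs]  # Y = aX + b

mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
var_x, var_y = statistics.pvariance(xs), statistics.pvariance(ys)

assert math.isclose(mean_y, a * mean_x + b)  # E(Y) = a * E(X) + b
assert math.isclose(var_y, a**2 * var_x)     # Var(Y) = a^2 * Var(X)

print(mean_x, mean_y, var_x, var_y)
```

Adding the constant b shifts the mean but leaves the variance unchanged.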
Estimation
 Confidence interval: Sample statistic ± Critical value * Standard error of statistic
 Margin of error = (Critical value) * (Standard deviation of statistic)
 Margin of error = (Critical value) * (Standard error of statistic)
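As an illustration of the interval and margin of error formulas above, the sketch below builds a confidence interval around a sample mean. It assumes a critical value taken from the standard normal distribution (via statistics.NormalDist, Python 3.8 or later); with a small sample, a t critical value would normally be used instead. The function name and inputs are made up.

```python
import math
from statistics import NormalDist

def confidence_interval(estimate, standard_error, confidence=0.95):
    # Two-sided critical value from the standard normal distribution
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = critical * standard_error  # margin of error
    return estimate - margin, estimate + margin, margin

# Hypothetical sample: mean 100, standard deviation 15, n = 36
se = 15 / math.sqrt(36)
low, high, margin = confidence_interval(100, se)
print(low, high, margin)
```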
Hypothesis Testing
 Standardized test statistic = (Statistic - Parameter) / (Standard deviation of statistic)
 One-sample z-test for proportions: z-score = z = (p - P_{0}) / sqrt( p * q / n )
 Two-sample z-test for proportions: z-score = z = [ (p_{1} - p_{2}) - d ] / SE
 One-sample t-test for means: t statistic = t = (x̄ - μ) / SE
 Two-sample t-test for means: t statistic = t = [ (x̄_{1} - x̄_{2}) - d ] / SE
 Matched-sample t-test for means: t statistic = t = [ (x̄_{1} - x̄_{2}) - D ] / SE = (d̄ - D) / SE
 Chi-square test statistic = Χ^{2} = Σ[ (Observed - Expected)^{2} / Expected ]
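Two of the test statistics above, sketched in Python with made-up inputs. The function names are hypothetical, and the code computes only the statistic, not a P-value.

```python
import math

def one_sample_t(x_bar, mu, s, n):
    # t = (x_bar - mu) / SE, with SE = s / sqrt(n)
    return (x_bar - mu) / (s / math.sqrt(n))

def chi_square_statistic(observed, expected):
    # Sum of (Observed - Expected)^2 / Expected over all categories
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(one_sample_t(52, 50, 4, 16))                          # 2.0
print(chi_square_statistic([50, 30, 20], [40, 40, 20]))     # 5.0
```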
Degrees of Freedom
The correct formula for degrees of freedom (DF) depends on the situation (the nature of the test statistic, the number of samples, underlying assumptions, etc.).
 One-sample t-test: DF = n - 1
 Two-sample t-test: DF = (s_{1}^{2}/n_{1} + s_{2}^{2}/n_{2})^{2} / { [ (s_{1}^{2} / n_{1})^{2} / (n_{1} - 1) ] + [ (s_{2}^{2} / n_{2})^{2} / (n_{2} - 1) ] }
 Two-sample t-test, pooled standard error: DF = n_{1} + n_{2} - 2
 Simple linear regression, test slope: DF = n - 2
 Chi-square goodness of fit test: DF = k - 1
 Chi-square test for homogeneity: DF = (r - 1) * (c - 1)
 Chi-square test for independence: DF = (r - 1) * (c - 1)
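The two-sample degrees of freedom formula above is the hardest in this list to compute by hand. A minimal sketch follows; the function name and the inputs are illustrative assumptions.

```python
import math

def two_sample_df(s1, n1, s2, n2):
    v1 = s1**2 / n1  # s1^2 / n1
    v2 = s2**2 / n2  # s2^2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(two_sample_df(3, 9, 4, 16))

# With equal standard deviations and equal sample sizes, the result
# matches the pooled formula, n1 + n2 - 2.
assert math.isclose(two_sample_df(5, 10, 5, 10), 10 + 10 - 2)
```

The result is generally not a whole number.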
Sample Size
The first two formulas below give the smallest sample size required to achieve a fixed margin of error under simple random sampling. The third formula allocates the sample to strata under a proportionate design. The fourth formula, Neyman allocation, uses stratified sampling to minimize variance for a fixed sample size. The last formula, optimum allocation, uses stratified sampling to minimize variance for a fixed budget.
 Mean (simple random sampling): n = { z^{2} * σ^{2} * [ N / (N - 1) ] } / { ME^{2} + [ z^{2} * σ^{2} / (N - 1) ] }
 Proportion (simple random sampling): n = [ ( z^{2} * p * q ) + ME^{2} ] / [ ME^{2} + z^{2} * p * q / N ]
 Proportionate stratified sampling: n_{h} = ( N_{h} / N ) * n
 Neyman allocation (stratified sampling): n_{h} = n * ( N_{h} * σ_{h} ) / [ Σ ( N_{i} * σ_{i} ) ]
 Optimum allocation (stratified sampling): n_{h} = n * [ ( N_{h} * σ_{h} ) / sqrt( c_{h} ) ] / Σ [ ( N_{i} * σ_{i} ) / sqrt( c_{i} ) ]
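The three stratified allocation formulas can be compared on one made-up design. In the sketch below, the stratum sizes, standard deviations, per-unit costs, and function names are all hypothetical, and the results are left unrounded.

```python
import math

def proportionate(n, sizes):
    # n_h = (N_h / N) * n
    total = sum(sizes)
    return [n * N_h / total for N_h in sizes]

def neyman(n, sizes, sigmas):
    # n_h = n * (N_h * sigma_h) / sum of (N_i * sigma_i)
    weights = [N_h * s_h for N_h, s_h in zip(sizes, sigmas)]
    return [n * w / sum(weights) for w in weights]

def optimum(n, sizes, sigmas, costs):
    # n_h = n * [N_h * sigma_h / sqrt(c_h)] / sum of [N_i * sigma_i / sqrt(c_i)]
    weights = [N_h * s_h / math.sqrt(c_h)
               for N_h, s_h, c_h in zip(sizes, sigmas, costs)]
    return [n * w / sum(weights) for w in weights]

sizes = [500, 300, 200]  # N_h, hypothetical stratum sizes
sigmas = [10, 20, 30]    # sigma_h, hypothetical stratum standard deviations
costs = [1, 4, 9]        # c_h, hypothetical cost per sampled unit

print(proportionate(100, sizes))
print(neyman(100, sizes, sigmas))
print(optimum(100, sizes, sigmas, costs))
```

Neyman allocation shifts the sample toward the more variable strata, while optimum allocation pulls it back from strata that are expensive to sample.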