Sampling Distributions, Central Limit Theorem, Confidence Intervals
Sampling Distributions and Confidence Intervals
-
Testing Hypotheses
-
When do we trust our data?
-
Are the differences real?
-
Is the data reliable?
-
Plus or minus what?
Observed (estimated) vs Actual
-
mean: x bar vs. μ (pronounced mu)
-
standard deviation: s vs. σ (pronounced sigma)
We want to know the underlying distribution
-
Computer failures
-
Date rejections
-
Grading
-
Reaction Time
-
How many green balls are in the urn
How do we do this?
-
Get a sample
-
Is one observation sufficient?
-
How about three?
-
How about a thousand?
-
More is better
-
Bigger sample gives us a better idea of what is going on
-
one "test" could be a fluke (that's why I drop lowest)
-
maybe I should drop the highest too (ha ha)
Will some distributions demand more "tests" to zero in
on the mean?
-
yep
-
which ones?
-
ones with greater variance
What we are really talking about is understanding the
sampling distribution.
-
The distribution of the means of samples
-
If you ran this experiment again, would you find the same thing?
-
This might take a bit to get your head around
Sampling Distributions
-
The normal distribution can help.
-
If you survey N people, your survey will get some mean response
X1.
-
If you took another survey of N people from the same population,
this survey would have a mean X2.
-
If you took a bunch of surveys and plotted the means on a
histogram, you would find something that looked like a normal distribution
-
Even if the data you are sampling is not normally distributed.
-
Sampling Distribution of the Mean
Demo on skew and normal distribution (alter initial distribution)
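The "bunch of surveys" idea above can be tried out in a few lines. This is only a sketch under assumptions that are not in the slides: the population is taken to be exponential with mean 1 (strongly right-skewed), each survey has N = 50 respondents, and 5000 surveys are simulated. The names sample_mean, means and counts are made up for the illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable


def sample_mean(n):
    """Mean of n draws from a right-skewed population (exponential, mean 1)."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))


# Take many "surveys" of N = 50 people and record the mean of each one.
N = 50
means = [sample_mean(N) for _ in range(5000)]

# Crude text histogram of the survey means between 0.5 and 1.5.
lo, hi, bins = 0.5, 1.5, 10
width = (hi - lo) / bins
counts = [0] * bins
for m in means:
    if lo <= m < hi:
        counts[min(bins - 1, int((m - lo) / width))] += 1

for i, c in enumerate(counts):
    print(f"{lo + i * width:4.2f} {'#' * (c // 25)}")

print("mean of the survey means:", round(statistics.fmean(means), 3))
```

The individual observations are heavily skewed, yet the histogram of the means should come out roughly bell-shaped and centered near the population mean of 1.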
This is the Central Limit Theorem!
-
As the size of the samples increases, the distribution of
the means approaches a normal distribution
-
but what is the standard deviation of these sample
means?
Again, two things are going to affect the standard
deviation of the mean
-
The size of the sample
-
The standard deviation of the population measure
Lots of Means
-
This distribution of survey results would follow a normal
distribution
-
Mean = mu
-
standard deviation = sd/√N
Figure: underlying distribution (mean = 100, sd = 10)
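A quick check of the sd/√N rule, using the mean = 100 and sd = 10 from the slide. The shape of the underlying distribution is not stated, so a normal population is assumed here purely for convenience; the helper sd_of_sample_means and the choice of 4000 simulated surveys are likewise illustrative.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

MU, SD = 100, 10  # population mean and sd from the slide


def sd_of_sample_means(n, surveys=4000):
    """Simulate many surveys of size n and return the sd of their means."""
    means = [
        statistics.fmean(random.gauss(MU, SD) for _ in range(n))
        for _ in range(surveys)
    ]
    return statistics.stdev(means)


for n in (4, 25, 100):
    observed = sd_of_sample_means(n)
    predicted = SD / math.sqrt(n)  # sd / sqrt(N)
    print(f"N = {n:>3}: observed {observed:.2f}, predicted {predicted:.2f}")
```

The observed and predicted columns should agree closely, and both shrink as N grows.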
Increasing Sample Size
-
As N (sample size) increases, the variability in this distribution
decreases substantially.
-
By N = 1000, the true mean is quite likely to be very close
to the mean obtained in the survey
How big a sample?
-
The sampling distribution of the mean
-
Mean = mu
-
standard deviation = sd/√N
-
As N gets larger, the standard deviation of the sampling
distribution gets smaller.
-
Diminishing returns for additional observations
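To see the diminishing returns, tabulate sd/√N for a range of sample sizes. The sd of 10 is borrowed from the earlier example; standard_error is a hypothetical helper name.

```python
import math

SD = 10  # population sd, borrowed from the earlier example


def standard_error(sd, n):
    """Standard deviation of the sampling distribution of the mean."""
    return sd / math.sqrt(n)


for n in (1, 10, 100, 1000, 10000):
    print(f"N = {n:>5}: sd of the mean = {standard_error(SD, n):.3f}")
```

Because N sits under a square root, cutting the standard deviation of the mean in half takes four times as many observations. Each extra observation buys less than the one before.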
Where is this train taking me?
-
The normal distribution is a convenient mathematical construct,
but it may not be a good model of your data.
-
Poorly behaved distributions deviate from the normal
-
The normal distribution will come back when we talk about
inferential statistics.
Many of the statistical tests we will talk about assume
that the data being tested follow a normal distribution.
Confidence Intervals
-
Sampling distributions
-
The central limit theorem
-
Creating confidence intervals
-
Using confidence intervals
Sampling Distributions
-
Return to this topic
-
How is the sample mean related to the population mean?
-
Requires thinking about sampling distributions
-
Sampling distribution of the mean
-
Distribution of means of samples of a particular size
-
N(μ, sd/√n)
Deconstruct this
-
If we take a lot of samples of size N
-
Look at the mean of these samples
-
What will the mean of these means be?
-
Approximately the population mean
-
How closely this approximates the population mean will depend
on the sample size
-
The larger the sample, the more precise the estimate
of the mean.
Central Limit Theorem
-
Tells us something neat about means
-
For any population
-
For large sample sizes, the sampling distribution of the
mean is approximately normally distributed.
-
Even if the population values are not normally distributed.
Using Sampling Distributions
-
The sampling distribution is very useful.
-
Generally, we want to know the population mean.
-
How can we make a guess about what that mean actually is?
-
Collect a sample.
-
The mean of that sample is an estimate of the population
mean
The population mean
-
Where is the population mean likely to be?
-
You can construct an interval that is likely to contain the
population mean.
-
This interval is called a confidence interval
-
The more confident you want to be that the interval contains
the mean, the wider the interval has to be.
An Example
-
Suppose you want to know the average number of raisins in
a box of Raisin Bran
-
Kellogg's has said that there is some variability in the
number of raisins in a box
-
They list the standard deviation as 26 raisins
-
This is the population standard deviation
-
Take a sample of 100 boxes and count the raisins in each
-
How confident do you want to be that the actual mean falls
in the interval?
Demo on loose vs. stringent confidence intervals
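A rough stand-in for the loose vs. stringent comparison, using the raisin numbers (population sd of 26, sample of 100 boxes). The confidence levels chosen and the helper names z_for and margin are assumptions for the sketch; the z-scores come from Python's statistics.NormalDist.

```python
import math
from statistics import NormalDist

SIGMA, N = 26, 100          # population sd and sample size from the example
se = SIGMA / math.sqrt(N)   # standard deviation of the mean


def z_for(confidence):
    """z-score that leaves (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def margin(confidence):
    """Half-width of the interval: z times the sd of the mean."""
    return z_for(confidence) * se


for c in (0.80, 0.90, 0.95, 0.99):
    print(f"{c:.0%} confidence: z = {z_for(c):.2f}, mean ± {margin(c):.1f} raisins")
```

A loose (80%) interval is narrow, a stringent (99%) interval is wide. More confidence costs precision.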
Building Confidence Intervals
-
A procedure to use when σ (the population standard deviation) is known
-
For the confidence interval
-
Use the mean of the sample (263)
-
Use the standard deviation of the mean (26/√100 = 2.6)
-
Find the z-score associated with that level of confidence
-
z(95%)=1.96 (leaving 2.5% in each tail)
-
95% confidence interval: mean ± z * 2.6 = 263 ± 5.1
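The raisin example worked through step by step. All the numbers are from the slides (sample mean 263, population sd 26, 100 boxes, z = 1.96); only the variable names are invented.

```python
import math

sample_mean = 263   # mean raisins per box in the sample
sigma = 26          # population standard deviation
n = 100             # boxes in the sample
z = 1.96            # z-score for 95% confidence (2.5% in each tail)

se = sigma / math.sqrt(n)   # standard deviation of the mean: 2.6
margin = z * se             # half-width of the interval

lower = sample_mean - margin
upper = sample_mean + margin
print(f"95% CI: {lower:.1f} to {upper:.1f}")  # prints "95% CI: 257.9 to 268.1"
```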
Summary
-
To construct a confidence interval when the population standard
deviation is known
-
Collect a sample
-
Find the z-score for the desired confidence
-
Leave p/2 of the probability in each tail, where p = 1 - confidence (2.5% per tail for 95%)
-
The confidence interval
-
Mean ± z * standard deviation of mean
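The summary procedure can be wrapped in one small function and then checked by simulation. This is a sketch, not a library routine: z_confidence_interval is a hypothetical name, and the population used for the check (normal, mean 100, sd 10, samples of 30) is an assumption chosen for convenience.

```python
import math
import random
from statistics import NormalDist, fmean


def z_confidence_interval(sample, sigma, confidence=0.95):
    """Confidence interval for the mean when the population sd is known."""
    p = 1 - confidence
    z = NormalDist().inv_cdf(1 - p / 2)    # leave p/2 in each tail
    se = sigma / math.sqrt(len(sample))    # standard deviation of the mean
    m = fmean(sample)
    return m - z * se, m + z * se


# Coverage check: how often does the interval contain the true mean?
random.seed(2)        # fixed seed so the sketch is repeatable
MU, SIGMA = 100, 10   # assumed population, for the check only
trials, hits = 2000, 0
for _ in range(trials):
    sample = [random.gauss(MU, SIGMA) for _ in range(30)]
    lo, hi = z_confidence_interval(sample, SIGMA)
    hits += lo <= MU <= hi

print(f"intervals containing the true mean: {hits / trials:.1%}")
```

If the procedure works as advertised, close to 95% of the simulated intervals should contain the true mean. That long-run hit rate is what the confidence level describes.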