Distributions and Variability
Distributions
-
What do you do once you actually collect data?
-
Variables in statistics
-
What is a distribution?
-
Visualizing distributions
-
A lot of this should be familiar
What do we do with data?
-
Imagine you collect a lot of data on how long it takes to
press a button after a flash of light.
Now what?
What have we got here?
-
What are the entries in this table?
-
Variables
-
Something that can be expressed as a number.
-
Value
-
Numerical value taken on by that variable
-
What variable are we dealing with here?
-
What are the units of measurement?
Types of Variables
-
Quantitative variables
-
A quantitative variable is one for which mathematical operations
make sense
-
Response time is a good quantitative variable
-
Quantitative variables will often be measured in experiments
-
Categorical variables
-
Define groups or classes in the data
-
Gender or Year in college are good categorical variables
-
Categorical variables often correspond to things manipulated
in experiments.
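A minimal Python sketch of how the two kinds of variable work together: the categorical variable defines the groups, and arithmetic is done on the quantitative variable within each group. The records and the millisecond unit are hypothetical, made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (year in college, response time in ms).
records = [
    ("first-year", 172),
    ("senior", 168),
    ("first-year", 180),
    ("senior", 175),
]

# The categorical variable (year) defines the groups.
by_year = defaultdict(list)
for year, response_time in records:
    by_year[year].append(response_time)

# Arithmetic makes sense for the quantitative variable (response time).
group_means = {year: mean(times) for year, times in by_year.items()}
for year, avg in group_means.items():
    print(f"{year}: mean response time = {avg}")
```

Averaging the year labels themselves would be meaningless, which is the practical difference between the two types.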
Variability
-
The difficult thing about analyzing data is that the data
points are not all the same.
What do we do if all of the data points are not identical?
How do we understand the variability in the data?
Sorting the data can help
-
The data may look different when sorted.
Still, there seem to be a lot of numbers here.
Graphics in data analysis
-
It is often helpful to graph the data in some way.
-
Humans are visual creatures.
-
Patterns become evident in graphs.
-
One simple type of graph is the stemplot
Constructing a Stemplot
16 | 12344688899
17 | 1122233333333344455677788899
18 | 01234455
19 |
20 | 1
-
The pattern of how the numbers are spread out is a distribution.
-
Think in distributions always
-
It makes you smarter and wiser
-
Think about what is possible
-
Think about how likely the possibilities are.
-
That is what a distribution is -- the overall pattern.
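One way to construct a stemplot like the one on this slide is sketched below in Python. It assumes non-negative whole-number values, takes the last digit as the leaf and the remaining digits as the stem, and uses a small hypothetical sample rather than the data set from the slides. The function name `stemplot` is made up for this sketch.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot rows: stem = all digits but the last, leaf = last digit."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    rows = []
    # Keep empty stems so that gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem} | {row}".rstrip())
    return rows

# Hypothetical sample values, not the data set from the slides.
sample = [161, 168, 172, 173, 173, 175, 180, 184, 201]
print("\n".join(stemplot(sample)))
```

The empty row for stem 19 is kept on purpose: it is what makes the isolated value at 201 stand out.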
Qualities of Distributions to keep in mind.
-
What is the center?
-
What is the shape?
-
What is the spread?
-
Are there outliers?
Aspects of the stemplot
-
The stemplot is good for looking at a small set of numbers.
-
We can see whether the distribution is symmetric or
skewed.
-
Is the distribution unimodal (a single peak)?
-
We can find any potential outliers.
16 | 12344688899
17 | 1122233333333344455677788899
18 | 01234455
19 |
20 | 1
Splitting the Stems
Don't get hung up on the rules.
16 | 12344
16 | 688899
17 | 11222333333333444
17 | 55677788899
18 | 012344
18 | 55
19 |
19 |
20 | 1
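Splitting the stems can be sketched in Python as follows, assuming non-negative whole-number values: each stem gets one row for leaves 0 to 4 and a second row for leaves 5 to 9. The function name `split_stemplot` and the sample values are hypothetical.

```python
def split_stemplot(values):
    """Stemplot with each stem split in two: leaves 0-4, then leaves 5-9."""
    values = sorted(values)
    first_stem, last_stem = values[0] // 10, values[-1] // 10
    rows = []
    for stem in range(first_stem, last_stem + 1):
        for low, high in ((0, 4), (5, 9)):
            row = "".join(
                str(v % 10)
                for v in values
                if v // 10 == stem and low <= v % 10 <= high
            )
            rows.append(f"{stem} | {row}".rstrip())
    # Drop empty rows at either end; empty rows in the middle stay visible.
    while rows and rows[-1].endswith("|"):
        rows.pop()
    while rows and rows[0].endswith("|"):
        rows.pop(0)
    return rows

# Hypothetical sample values, not the data set from the slides.
sample = [161, 168, 172, 173, 173, 175, 180, 184, 201]
print("\n".join(split_stemplot(sample)))
```

Splitting spreads a crowded stem over two rows, which can make the shape easier to read when most of the values share a few stems.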
What if you have too much data?
-
Stemplots quickly start to look crowded.
-
If there is too much data, use a histogram instead.
-
A histogram is a bar graph with the values grouped into bins
along the x-axis and frequency along the y-axis
A Histogram
Here's an interactive demo that shows how bin size affects histograms.
Here's a more recent example.
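The effect of bin size can also be seen in a few lines of Python. This is a rough text-based sketch, assuming whole-number values and a whole-number bin width. The function names and the sample values are hypothetical.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Count how many values fall in each bin [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    first, last = min(counts), max(counts)
    # Include empty bins so gaps in the data are not hidden.
    return {
        start: counts.get(start, 0)
        for start in range(first, last + bin_width, bin_width)
    }

def print_histogram(values, bin_width):
    """Print one row per bin, with one '#' per observation in that bin."""
    for start, n in histogram_counts(values, bin_width).items():
        print(f"{start:>4}-{start + bin_width - 1:<4} {'#' * n}")

# Hypothetical sample values, not the data set from the slides.
sample = [161, 168, 172, 173, 173, 175, 180, 184, 201]
print_histogram(sample, 10)   # narrower bins: more detail, an empty bin appears
print()
print_histogram(sample, 20)   # wider bins: smoother, less detail
```

With the narrower bins the gap before the largest value shows up as an empty bin. With the wider bins that gap disappears into a neighboring bin.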
So, what does this mean?
-
Interpreting data is not a mechanical process.
-
Interpretation involves thinking both about the data and
about how they were obtained.
-
How were the data collected?
-
What are the units of measurement?
-
How much information is contained in those units?
-
Are there any distinctive patterns in the data?
-
Is the distribution symmetric or skewed?
-
Are there any outliers?
Are the data well-behaved?
-
It is important to look at the shape of the data.
-
Many statistics that we will see will assume that the data
are symmetric with a peak in the middle.
-
Why is this important?
-
Statistics like the mean (average) provide information about
the central tendency of the distribution
-
If the distribution is not well-behaved (for example, strongly
skewed or containing outliers), statistics like the mean will
provide little information about its center.
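A small Python illustration of why shape matters for the mean. The numbers are hypothetical, and the median is brought in here only as a comparison point; it is not discussed in the slides above.

```python
from statistics import mean, median

# Hypothetical, roughly symmetric values with a peak in the middle.
symmetric = [168, 170, 172, 173, 173, 174, 176, 178]
# The same values plus one extreme observation.
with_outlier = symmetric + [400]

for name, data in [("symmetric", symmetric), ("with outlier", with_outlier)]:
    print(f"{name:>12}: mean = {mean(data):.1f}, median = {median(data):.1f}")
```

In the symmetric case the mean sits in the middle of the data. After a single outlier is added, the mean is larger than every one of the original values, so it no longer describes a typical observation.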
Other Ways of Looking at Data
Other Plots
-
You can look at data in any way you find useful.
-
Time plots
-
Plot observations by the time they were taken
-
Can reveal patterns related to timing of events
-
Common for showing practice effects.
Time Plot (Practice Effect)
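A time plot can be approximated in plain text, as in the Python sketch below: one row per observation, in the order the observations were taken. The response times, the `baseline` and `scale` parameters, and the function name are all hypothetical choices for illustration.

```python
# Hypothetical response times, listed in the order the trials were run.
times_in_order = [210, 204, 198, 190, 187, 181, 179, 176, 175, 174]

def text_time_plot(values, baseline=150, scale=2):
    """One row per observation in collection order; bar length tracks the value."""
    return [
        f"trial {i:>2} | {'*' * ((v - baseline) // scale)} {v}"
        for i, v in enumerate(values, start=1)
    ]

plot_rows = text_time_plot(times_in_order)
print("\n".join(plot_rows))
```

In this made-up example the bars shorten steadily and then level off, which is the kind of pattern a practice effect would produce. Sorting the same numbers would hide it, because the time order would be lost.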
Summary
-
Data (and life in general) come in distributions
-
An important first step in analyzing data is to plot and
look at the data
-
Patterns may become evident
-
Outliers may become apparent
-
There are no rules for plotting data
-
Find ways to look at data that are revealing
-
Good analysis takes practice
-
Don't be frustrated if it takes time to develop the skill.