Correlation
-
Problems with regression
-
Correlation
-
Using correlation
Here's a Scenario
-
Market research firm
-
Studying preference for a new type of computer
-
What is the relationship between people's degree of computer
expertise and their preference for the new brand?
-
Get a measure of their preference
-
Self-rating of expertise
Expertise vs Preference
Correlation
-
Correlation is a measure of mathematical association.
-
There are many possible correlations we might measure.
-
Typically, when we talk about correlation, we are referring
to Pearson's correlation coefficient (called r)
-
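Correlation is defined as:
$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$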
WOW, we can make sense of the equation! It's like calculating
a t.
What is going on?
-
Each observation has its mean removed
-
Each observation is divided by its standard deviation
-
The resulting value has no units.
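The steps listed here can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes sample standard deviations (dividing by n - 1), and the function name `pearson_r` and the five ratings are made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r as the averaged product of standardized scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (n - 1 in the denominator).
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Remove the mean, then divide by the standard deviation.
    # The resulting z-scores carry no units.
    zx = [(x - mean_x) / sd_x for x in xs]
    zy = [(y - mean_y) / sd_y for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)

# Hypothetical ratings for five respondents (illustrative only).
expertise = [1, 2, 3, 4, 5]
preference = [2, 1, 4, 3, 5]
r = pearson_r(expertise, preference)
print(round(r, 2))  # 0.8
```

The sketch does not guard against a variable with zero spread; in that case the standard deviation is 0 and r is undefined.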
-
What are the properties of the correlation?
-
Range
-
-1: Perfect negative relationship
-
1: Perfect positive relationship
-
0: No linear relationship at all.
A Positive Relationship (r=.54)
A Negative Relationship (r=-.62)
No Relationship (r=-.08)
Important Things to Know
-
Correlation measures the linear association between the variables.
-
No good for nonlinear associations
-
The correlation coefficient is not affected by changes in
the units of measurement of the variables
-
A correlation of 1 or -1 indicates that the observed points
all fall on a straight line.
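A short check of these properties, as a sketch under stated assumptions: `pearson_r` is a hypothetical helper written for this example, and the height and weight values are invented. It shows that points on a straight line give r of +1 or -1, that a perfect but curved relationship can give r of 0, and that changing units leaves r unchanged.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from sums of squares and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]

# Points on a straight line: r is +1 or -1, matching the sign of the slope.
r_up = pearson_r(xs, [3 * x + 1 for x in xs])
r_down = pearson_r(xs, [-3 * x + 1 for x in xs])

# A perfect but nonlinear (quadratic) relationship: r is 0 here.
r_curve = pearson_r(xs, [x ** 2 for x in xs])

# Changing units (inches to centimetres) leaves r unchanged.
heights_in = [60, 62, 65, 70, 71]      # invented values
weights = [115, 120, 140, 155, 170]    # invented values
r_in = pearson_r(heights_in, weights)
r_cm = pearson_r([h * 2.54 for h in heights_in], weights)

print(r_up, r_down, r_curve)
print(abs(r_in - r_cm) < 1e-9)
```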
What does this value signify?
-
The square of the correlation is the proportion of variance
in one variable that can be accounted for by the other.
-
This value is abbreviated r^2
-
What does this mean?
-
How much of the variability in a variable can be predicted
from the regression line?
-
How much comes from other factors?
-
How far do the points lie from the regression line?
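One way to see the "proportion of variance" reading is to fit a least-squares line and compare the leftover (residual) variation with the total variation. The sketch below assumes simple least-squares regression of y on x; the data and the helper name `fit_line` are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

xs = [1, 2, 3, 4, 5]   # invented values
ys = [2, 1, 4, 3, 5]   # invented values

slope, intercept = fit_line(xs, ys)
predicted = [intercept + slope * x for x in xs]

mean_y = sum(ys) / len(ys)
# Total variability of y around its mean.
ss_total = sum((y - mean_y) ** 2 for y in ys)
# Variability left over around the regression line ("other factors").
ss_resid = sum((y - p) ** 2 for y, p in zip(ys, predicted))
explained = 1 - ss_resid / ss_total

# The same quantity obtained by squaring the correlation.
mean_x = sum(xs) / len(xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
r = sxy / math.sqrt(sxx * ss_total)

print(round(explained, 2), round(r ** 2, 2))  # 0.64 0.64
```

For these points, 64% of the variability in y is accounted for by the line and the remaining 36% is scatter around it.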
Proportion of Variance (r=.54)
Resistance
-
Regression and correlation coefficients are not resistant
-
They are strongly influenced by outliers
-
We must graph our data to ensure that the effects we see are
not due to outliers.
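The lack of resistance is easy to reproduce with made-up numbers. In this sketch (invented data, hypothetical helper `pearson_r`), five points with no linear relationship acquire a very strong correlation once a single extreme point is added.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from sums of squares and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Five invented points with no linear relationship.
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 2, 1, 3]
r_clean = pearson_r(xs, ys)

# The same points plus one extreme observation at (20, 20).
r_with_outlier = pearson_r(xs + [20], ys + [20])

print(round(r_clean, 2), round(r_with_outlier, 2))
```

A scatterplot of the six points would show the problem immediately, which is the reason for graphing the data before trusting the coefficient.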
Resistance (r=1.0 vs r=.54)
Demo on outliers in correlation/regression
Extrapolation
-
A strong relationship suggests that you can predict the value
of one variable from the value of another.
-
Prediction is only valid within the range for which measurements
were taken.
-
Consider height and basketball ability
-
True for correlation and regression
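A toy illustration of why extrapolation is risky, under invented assumptions: the "true" relationship below (`true_response`, a hypothetical function) rises steadily and then levels off at x = 10, but measurements were only taken for x from 1 to 10. A line fitted within that range fits perfectly there and is badly wrong outside it.

```python
def true_response(x):
    """Hypothetical relationship: rises with x, then levels off at 10."""
    return min(x, 10)

def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Measurements only cover x = 1..10.
xs = list(range(1, 11))
ys = [true_response(x) for x in xs]
slope, intercept = fit_line(xs, ys)

# Inside the measured range the line predicts well.
predicted_inside = intercept + slope * 5
# Outside the range the line keeps rising, but the true value does not.
predicted_outside = intercept + slope * 20
true_outside = true_response(20)

print(predicted_inside, predicted_outside, true_outside)
```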
Interpreting Correlations
-
A correlation coefficient is just a description of our data
-
What it means depends on how data were collected.
-
All the correlation implies is a numerical relationship
-
Imagine we found a positive correlation between atmospheric
pollution levels and murder rates for 100 counties in the United States.
-
Why would this be?
-
The third variable problem
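The third variable problem can be simulated. In this sketch, everything is invented for illustration: a hidden variable `lurking` drives both x and y, and x and y never influence each other. They still end up clearly correlated. The last step uses the standard first-order partial correlation formula, which goes slightly beyond these slides, to show that the association largely disappears once the third variable is held constant.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson's r from sums of squares and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Hidden third variable, plus two measures that each depend on it
# (and on independent noise), but not on each other.
lurking = [rng.gauss(0, 1) for _ in range(n)]
x = [z + rng.gauss(0, 1) for z in lurking]
y = [z + rng.gauss(0, 1) for z in lurking]

r_xy = pearson_r(x, y)
r_xz = pearson_r(x, lurking)
r_yz = pearson_r(y, lurking)

# Partial correlation of x and y, holding the third variable constant.
partial_xy = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

print(round(r_xy, 2), round(partial_xy, 2))
```

In the pollution and murder-rate example, the simulated `lurking` variable plays the role of whatever unmeasured county characteristic might raise both numbers.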
Summary
-
Correlation measures linear association
-
Units are stripped off the measurements
-
Correlation ranges from -1 to +1
-
Correlations must be interpreted in light of the way the
data were collected.