Regression
-
Modeling data in a scatterplot
-
Linear regression
-
Measures of goodness
The Deal
-
Staring at a scatterplot gives us a sense of the relationship
between variables.
-
How can we give a more precise description of the relationship
between variables?
-
Linear regression
-
Draws a straight line on the data
-
The line is called a regression line
-
The question is, which line is the best line?
Lines (r=.99)
Deviations-Residuals
-
How far from the regression line?
-
Smaller deviations imply a higher correlation
-
More variance accounted for
Which line do we draw?
-
Draw the line that minimizes the sum of the squared deviations
from the points to the line
-
Deviation = observed y - predicted y
-
Least-squares regression
-
How do we do that?
-
The equation for a line
-
y = a + bx
-
y is the variable to be predicted
-
x is the predictor variable
-
a is the intercept of the line
-
b is the slope of the line
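As a rough illustration of the least-squares criterion, the sketch below scores two candidate lines by their sum of squared deviations. The data values and the function name `sum_squared_deviations` are made up for this example and do not come from the slides.

```python
# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def sum_squared_deviations(a, b, xs, ys):
    """Sum of (observed y - predicted y)^2 for the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Two candidate lines with the same intercept but different slopes.
sse_rough = sum_squared_deviations(0.0, 1.5, xs, ys)
sse_close = sum_squared_deviations(0.0, 2.0, xs, ys)

# The line with the smaller sum is the better fit by this criterion.
print("slope 1.5:", sse_rough)
print("slope 2.0:", sse_close)
```

Least-squares regression picks, out of all possible values of a and b, the pair that makes this sum as small as possible.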
Actually Fitting a Line
-
You don't have to guess.
-
predicted y = a + bx
-
b = r * (sy / sx), where r is the correlation and sy, sx are
the standard deviations of y and x
-
a = y bar - b * x bar
-
line goes through (x bar, y bar)
-
It makes sense: when r = 0 the slope is 0, and the line predicts
y bar everywhere
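The two formulas on this slide can be sketched in a few lines of Python. This is a minimal example on made-up data; the helper names `pearson_r` and `fit_line` are hypothetical, and only the standard-library `statistics` module is assumed.

```python
from statistics import mean, stdev

# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def pearson_r(xs, ys):
    """Correlation coefficient r between xs and ys."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def fit_line(xs, ys):
    """Return (a, b) using b = r * (sy / sx) and a = y bar - b * x bar."""
    r = pearson_r(xs, ys)
    b = r * (stdev(ys) / stdev(xs))
    a = mean(ys) - b * mean(xs)
    return a, b


a, b = fit_line(xs, ys)
print("intercept a:", a)
print("slope b:", b)

# The fitted line passes through (x bar, y bar).
print("line at x bar:", a + b * mean(xs), "y bar:", mean(ys))
```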
A Regression Line
predicted y=64.9+.63*x
Some Irregular Data (r=.05)
What do we use this line for?
-
The regression line provides a model of the data.
-
We can do three things with this line.
-
How good a model is the line?
-
Use the line as a quantitative description of the data.
-
Predicting values not given.
Accounting for Variance
-
If the model (the regression line) is a good one, then the
line will account for most of the variance.
-
Most of the points will fall around the line.
-
The residuals will generally be small.
-
If the model is a poor one, then the line will account for
very little of the variance.
-
Most of the points will fall far from the line
-
The residuals will generally be large.
-
There is a measure called r^2, which is the proportion
of variance that a model accounts for.
What r^2 means
-
r^2 is the correlation r, squared
-
Calculate the variance of the predicted y's, then divide
it by the variance of the observed y's
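Both descriptions of r^2 can be checked against each other numerically. The sketch below uses made-up data and computes the slope as sxy / sxx, which is algebraically the same as r * (sy / sx); all variable names are illustrative.

```python
from statistics import mean, variance

# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

mx, my = mean(xs), mean(ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)

# Least-squares line; sxy / sxx equals r * (sy / sx).
b = sxy / sxx
a = my - b * mx
predicted = [a + b * x for x in xs]

# Route 1: square the correlation coefficient.
r = sxy / (sxx * syy) ** 0.5
r2_from_correlation = r ** 2

# Route 2: variance of predicted y's divided by variance of observed y's.
r2_from_variances = variance(predicted) / variance(ys)

print("r^2 from correlation:", r2_from_correlation)
print("r^2 from variances:  ", r2_from_variances)
```

The two numbers agree up to floating-point rounding, which is why either description can be used as the definition.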
Plot the residuals
-
We also want to know whether the line fits equally well everywhere.
-
We can plot the residuals.
-
Residual = observed y - predicted y
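A small sketch of computing the residuals, assuming made-up data and a hypothetical `fit_line` helper. It prints the residuals next to x rather than drawing a real plot, so that it needs nothing beyond the standard library.

```python
from statistics import mean

# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


a, b = fit_line(xs, ys)

# Residual = observed y - predicted y, one per data point.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

for x, res in zip(xs, residuals):
    print(f"x = {x:4.1f}   residual = {res:+.3f}")
```

For a least-squares line the residuals sum to zero, so what matters in the plot is their size and whether they show a pattern across x.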
Know the Warning Signs
-
Any systematic pattern of residuals suggests systematic variance
that is not being explained.
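To make the warning sign concrete, the sketch below fits a line to data that actually follow a curve (y = x squared, chosen only for illustration) and looks at the signs of the residuals. The helper name `fit_line` is hypothetical.

```python
from statistics import mean

# Hypothetical curved data: y = x squared.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x ** 2 for x in xs]


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Positive at both ends and negative in the middle: a systematic pattern.
signs = ["+" if res > 0 else "-" for res in residuals]
print(residuals)
print(signs)
```

Residuals that are positive at the ends and negative in the middle are not random scatter; they indicate curvature that the straight line does not capture.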

Nonlinear Relationships
-
Linear regression tries to fit a line to the data
-
A line is not always the best relationship between points.
Creating Linear Relationships
-
Linear regression is a fast, easy way to model data.
-
If the data are non-linear, there may be a way to transform
one of the variables.
-
If the data have an exponential relationship, then taking
the logarithm of one variable will yield a linear relationship.
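The logarithm transformation can be sketched as follows. The data are generated from an assumed exponential curve, y = 2 * e^(0.5 x), chosen purely for illustration; after taking log(y), a straight line recovers the growth rate as its slope and log(2) as its intercept.

```python
import math
from statistics import mean

# Hypothetical exponential data: y = 2 * exp(0.5 * x).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

# Transform y; log(y) = log(2) + 0.5 * x is linear in x.
log_ys = [math.log(y) for y in ys]


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


a, b = fit_line(xs, log_ys)
print("slope (growth rate):", b)
print("intercept:", a, "log(2):", math.log(2.0))
```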
The Regression Line as Description
-
Looking at a scatterplot gives a general idea of the relationship
between variables
-
The regression line allows a more precise statement of the
relationship.
-
This aspect of regression is particularly important in psychology.
-
If a line provides a good model of the data, that will affect
how we think about that process.
The Regression Line as Predictor
-
Regression lines permit interpolation between the observed data points.
-
Data are collected at particular points.
-
Using the regression line, we can predict data at intermediate
points.
-
Our confidence in that prediction is based on the goodness
of the line as a model for the data.
-
Predictions should be made from good-fitting lines, but not
from poorly fitting lines.
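As a sketch of prediction, the code below uses the example line from the earlier slide, predicted y = 64.9 + .63*x. The observed range of x (here 0 to 20) is an assumption made up for this example, as are the function names; the slides do not state the range.

```python
# Coefficients from the example regression line in the slides.
A = 64.9
B = 0.63

# Hypothetical range of x over which data were collected (assumption).
X_MIN, X_MAX = 0.0, 20.0


def predict(x):
    """Predicted y for a given x, using predicted y = A + B * x."""
    return A + B * x


def is_within_observed_range(x):
    """True if x lies between the smallest and largest observed x."""
    return X_MIN <= x <= X_MAX


x_new = 10.0  # an intermediate point not in the original data
print("predicted y:", predict(x_new))
print("within observed range:", is_within_observed_range(x_new))
```

Checking the range is a simple guard: a prediction between observed points rests on the fit of the line, while one outside the range also assumes the relationship stays linear there.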
Summary
-
Scatterplots display the relationship between variables
-
This relationship can be described using least-squares regression
-
Minimizes the sum of squared deviations
-
Lines can be assessed for their goodness of fit
-
Look at r^2
-
Look at residuals
-
Good lines can be used as descriptions of data
-
Good lines can be used to predict new values.