Maximizing Precision/Reliability
-
An apparatus for measurement should be as accurate as possible
relative to what we want to measure.
-
Measuring response times: Don't use a sweep second
hand on a clock
-
There is a point of diminishing returns
-
Our operational definitions should be specific
-
Our measures should be repeatable
-
There is often a tradeoff between precision and ease of measurement
-
Do the results replicate under similar circumstances?
-
e.g., a weight scale accurate to ± 2 pounds
-
e.g., Bush over Gore 51 to 46, ± 2 points
Accuracy
-
Easier to assess when precision is high
-
Bias: systematically high or low
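The distinction between precision (repeatability) and accuracy (absence of bias) can be illustrated with a small sketch. The readings below are made-up numbers for a hypothetical 150-pound reference weight, and the function name `bias_and_precision` is an illustrative assumption, not part of the lecture.

```python
from statistics import mean, stdev

def bias_and_precision(readings, true_value):
    """Return (bias, spread) for repeated readings of a known quantity."""
    errors = [r - true_value for r in readings]
    bias = mean(errors)        # systematic error: how far off on average
    spread = stdev(readings)   # random scatter: how repeatable the readings are
    return bias, spread

# Hypothetical data: one 150 lb reference weight, five readings per scale.
scale_a = [152.0, 152.1, 151.9, 152.0, 152.0]  # precise, but biased high
scale_b = [148.0, 152.0, 150.0, 151.0, 149.0]  # unbiased, but imprecise

bias_a, spread_a = bias_and_precision(scale_a, 150.0)
bias_b, spread_b = bias_and_precision(scale_b, 150.0)
```

With the tight scatter of scale A, its bias of about 2 pounds is easy to see; the same bias would be much harder to detect in readings as scattered as those of scale B.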
Operational Definitions
-
In order to make precise measurements, we must know what
we are measuring
-
Constructs
-
Abstract (hypothetical) factor
-
Cannot be observed directly, but is inferred from observations
-
Constructs generally cohere
-
Generally assumed to have an underlying cause
-
Examples
-
Depression
-
Learned Helplessness
-
Constructs are made concrete via operational definitions.
Validity
-
Operational definitions
-
Must be clear and precise
-
Milgram: Defined "disobedience"
-
The point of rupture is the act of disobedience. A
quantitative value is assigned to the subject's performance based on the
maximum intensity shock he is willing to administer before he refuses to
participate further.
-
Our definitions must match what we think we are testing
-
What is a good measure of communication?
-
Why wouldn't we count sneezes?
-
Olympic skiing (again)
-
Downhill skiing: Time
-
Ski jumping: Distance
-
Freestyle skiing: Aesthetic value
Types of Validity (External Validity)
-
Face Validity
-
e.g., asking math questions to assess math ability
-
Content Validity
-
Using representative material
-
e.g., typing test for a typist
-
Criterion-related validity
-
Can one infer a value of some other measure?
-
e.g., two intelligence tests correlate
-
SAT and grades
-
Construct validity
-
Does the measure really capture the underlying construct?
-
e.g., low validity: measuring intelligence by feeling for
lumps on the head.
Practice with Operationalization (indulge me)
-
People who live in clean houses get sick less often than
people who live in dirty ones.
-
What is a clean house? A dirty house?
-
People with high mathematical ability can play chess better
than people with low mathematical ability.
-
Physically attractive individuals make better salespeople
than physically unattractive people.
MORE: Memory, Pornography, Death, Punishment
Problems with Dependent Measures
-
Range effects
-
Scores crammed to the top or bottom of the scale
-
Floor and ceiling effects
-
Sensitivity
-
Measuring fear
-
Screaming vs. blood pressure
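One rough way to spot a floor or ceiling effect is to check how many scores sit at the end of the scale. The sketch below is only an illustration: the function name `range_effect`, the 50% cutoff, and the quiz scores are all assumptions made for the example.

```python
def range_effect(scores, scale_min, scale_max, threshold=0.5):
    """Label a set of scores as showing a 'floor', 'ceiling', or no range effect.

    threshold is an arbitrary cutoff (an assumption): the proportion of
    scores at one end of the scale that we treat as a problem.
    """
    n = len(scores)
    at_floor = sum(s == scale_min for s in scores) / n
    at_ceiling = sum(s == scale_max for s in scores) / n
    if at_ceiling >= threshold:
        return "ceiling"
    if at_floor >= threshold:
        return "floor"
    return "none"

# Hypothetical quiz scores on a 0-10 scale.
easy_quiz = [10, 10, 9, 10, 10, 10]   # most subjects hit the maximum
hard_quiz = [0, 0, 1, 0, 0, 2]        # most subjects hit the minimum
fair_quiz = [3, 5, 7, 4, 6, 8]        # scores spread across the scale

labels = [range_effect(q, 0, 10) for q in (easy_quiz, hard_quiz, fair_quiz)]
```

When scores pile up at one end like this, the measure cannot distinguish between subjects (or conditions) that may in fact differ.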
Maximizing Objectivity
-
There is always a danger of observer bias
-
Expectancy
-
The experimenter's actions may influence the outcome of an
experiment
-
Clever Hans (the "clever horse")
-
Bias can affect observations
-
Observer may be more sensitive to behaviors that support
the hypothesis than to those that disconfirm it.
-
Observers involved in a situation may become emotionally
involved
Objectivity (cont.)
-
Ensuring objectivity
-
Naive raters
-
Don't know the point of the study
-
Blind raters
-
Don't know the experimental condition of the data they are
scoring
-
Subjective measures require a check on inter-rater reliability
-
Double blind: neither the subject nor the experimenter knows the condition
Issues
-
Observation requires losing information
-
When will that information be lost?
-
Some measurements preserve a lot of information
-
Some measurements lose a lot of information
-
Response and latency measurements
-
How can we deal with these kinds of observations?
Classes of Measures
-
Narrative records
-
Videotape, audiotape, written descriptions
-
Requires massive data reduction
-
Coding of behavior
-
What events should be recorded?
-
Sampling of record
-
Frequency method
-
How often does a particular behavior occur?
-
Must define the event...
-
Intervals Method
-
Only look at certain set times
-
Must define the event...
-
Duration Method
-
How long does a particular behavior last?
-
Must define the event...
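The frequency, intervals, and duration methods can all be applied to the same coded record. The sketch below assumes a behavior has already been coded as a list of (start, end) times in seconds; the event times, the 90-second session, and the function names are invented for illustration.

```python
def frequency(events):
    """Frequency method: how often did the behavior occur?"""
    return len(events)

def total_duration(events):
    """Duration method: how long did the behavior last in total?"""
    return sum(end - start for start, end in events)

def interval_samples(events, session_length, step):
    """Intervals method: look only at set times 0, step, 2*step, ...
    and note whether the behavior is under way at each one."""
    checks = range(0, session_length, step)
    return [any(start <= t < end for start, end in events) for t in checks]

# Hypothetical record: three episodes of a behavior in a 90-second session.
events = [(5, 12), (30, 34), (58, 75)]

count = frequency(events)                      # 3 episodes
seconds = total_duration(events)               # 7 + 4 + 17 = 28 seconds
samples = interval_samples(events, 90, 10)     # checks at 0, 10, ..., 80
hits = sum(samples)                            # behavior seen at 4 of 9 checks
```

All three summaries depend on the same prior decision: what counts as the start and end of an event. They also lose different information; the intervals method, for example, would miss a brief episode that falls entirely between two checks.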
Coding Data
-
Techniques for coding data
-
Limits and checks for particular behaviors
-
Binary checklists (happened/didn't happen)
-
Pretty objective as long as behaviors are well defined
-
How strong/present was behavior
-
These measures are more subjective
-
All of these measures require checks of inter-rater reliability
Inter-rater Reliability
-
Most open-ended measures require checks of inter-rater reliability
-
Requires multiple raters
-
Important even if naive or blind raters are used
-
Proportion of agreement
-
(# of agreements)/(# of possible agreements)
-
Low reliability (less than 85%) means the operational definitions
are not specific enough
-
There are other measures as well
-
Cohen's Kappa
-
Corrects for the agreement expected by chance
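Both agreement measures are easy to compute by hand or in code. The sketch below uses made-up binary codes (1 = behavior present, 0 = absent) from two hypothetical raters over ten observation intervals; the function names are illustrative assumptions. Cohen's kappa is computed as (observed agreement - chance agreement) / (1 - chance agreement).

```python
from collections import Counter

def proportion_agreement(rater_a, rater_b):
    """(# of agreements) / (# of possible agreements)."""
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    return agreements / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for chance; undefined if chance agreement is 1."""
    n = len(rater_a)
    p_observed = proportion_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same category
    # if each used the categories at their own observed rates.
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical codes from two raters for the same ten intervals.
rater_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
rater_b = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]

agreement = proportion_agreement(rater_a, rater_b)  # 9 of 10 intervals match
kappa = cohens_kappa(rater_a, rater_b)              # (0.9 - 0.5) / (1 - 0.5)
```

Here the raters agree on 90% of intervals, but half of that agreement would be expected by chance alone, so kappa is lower (0.8) than the raw proportion.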
An Imaginary Study
-
Ratings of boredom
-
Subjects are shown a videotape of this class
-
Observe amount of time subjects are bored
-
What counts as boredom?
-
Perhaps observe amount of time subjects:
-
Put head on desk
-
Yawn more than three times in one minute
-
Check their watch more than three times
-
Look away from screen
-
Are these measures reliable?
-
Are they really measures of boredom?
Measurement Scales
-
Nominal Scales
-
Name
-
e.g., Male/Female
-
No ordering of values
-
Ordinal Scales
-
Can infer ordering of values
-
e.g., low, medium, high self-esteem
-
Interval Scale
-
Can infer ordering of values
-
The variable values are evenly spaced
-
e.g., Celsius scale, Intelligence tests
-
Ratio Scale
-
Same as Interval scale, but zero is special
-
Can make ratios
-
e.g., weight, Kelvin scale (absolute zero)
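The difference between interval and ratio scales shows up as soon as you try to form a ratio. The short sketch below compares two temperatures, 10 and 20 degrees Celsius, on both scales; the temperatures are arbitrary example values.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return celsius + 273.15

cool_c, warm_c = 10.0, 20.0
cool_k, warm_k = celsius_to_kelvin(cool_c), celsius_to_kelvin(warm_c)

# Differences are meaningful on both scales: the gap is 10 degrees either way.
diff_c = warm_c - cool_c
diff_k = warm_k - cool_k

# Ratios are only meaningful when zero is a true zero.
ratio_c = warm_c / cool_c   # 2.0, but 20 C is not "twice as hot" as 10 C
ratio_k = warm_k / cool_k   # about 1.035, the physically meaningful ratio
```

The Celsius ratio of 2 is an artifact of where the zero point happens to sit; only the Kelvin ratio survives a change of units.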
Miscellaneous Topics
Pilot Study
-
Pre-study with a few subjects
-
Exploratory
Manipulation Check
-
Are your results consistent with previous results?
-
Sanity Check