We have seen that a statistic gives us numerical information
about a class, such as the total number of its members or their
average values on a variable.
Statistics can also tell us about correlations among these
numerical properties.
A correlation can take many forms, depending on the type of
statistic involved.
Example:
Average income correlates with the amount of education people
have.
The frequency of lung cancer is higher among smokers than
among nonsmokers.
Total government revenues from the capital
gains tax have increased as the tax rate has gone down.
What these examples have in common, what makes them examples of
correlation, is a systematic, nonrandom relationship between two
variables: income and education, smoking and lung cancer,
revenues and tax rates.
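One way to put a number on such a relationship, when both variables are numerical, is a correlation coefficient. The sketch below computes the Pearson coefficient, which is only one of the many forms a correlation can take and is not a measure this tutorial itself names. The function name pearson_r and the education and income figures are made up for illustration; they are not real data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the two means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical, made-up numbers for eight people (not real data).
years_of_education = [10, 12, 12, 14, 16, 16, 18, 20]
income_thousands = [28, 35, 31, 42, 55, 50, 63, 70]

r = pearson_r(years_of_education, income_thousands)
print(f"r = {r:.2f}")
```

A value near +1 or -1 indicates a strong systematic relationship, and a value near 0 indicates little or no linear relationship.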
Correlations are important because they can give us evidence of
causality. In a complex system, a given effect is often the
result of a great many factors--none of which by itself is either
necessary or sufficient. However, a given factor can be a
partial or contributing factor, something that increases the
likelihood of the effect, something that weighs in the balance--
and can tip the balance if the right combination of other factors
is also present.
A contributing factor usually can't be identified by
looking at individual cases, but it reveals itself in the
existence of a correlation among variables in the relevant class.
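The idea of a contributing factor can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of any real system: the function effect_rate, the four background factors, and the threshold of three are all hypothetical choices made for the example. The factor under study is neither necessary nor sufficient for the effect, yet it shows up as a difference in frequency across the class.

```python
import random

def effect_rate(has_factor, n=100_000, seed=0):
    """Fraction of simulated individuals who show the effect."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        # Hypothetical background factors: each is present with probability 0.5.
        background = sum(rng.random() < 0.5 for _ in range(4))
        # The factor under study adds one more weight to the balance.
        weight = background + (1 if has_factor else 0)
        # Assumption: the effect occurs only when at least three factors combine.
        if weight >= 3:
            hits += 1
    return hits / n

with_factor = effect_rate(True)
without_factor = effect_rate(False)
# Under these assumptions the rates should be near 11/16 and 5/16.
print(f"with factor:    {with_factor:.3f}")
print(f"without factor: {without_factor:.3f}")
```

In any single simulated case the effect may occur without the factor or fail to occur with it, so no individual case settles the question. Only the difference between the two rates reveals the factor's contribution.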
The existence of a correlation, however, does not prove
causality--not by itself. A correlation may occur by chance, or
it may reflect a causal relationship quite different from the one
it suggests.
The rules for evaluating statistical evidence of causality rest
on the same basic principle as Mill's methods, just as drawing a
statistical generalization from a sample is governed by the same
basic principle as drawing a universal generalization.
Now, though, we are comparing groups rather than the individual
cases we compared when we studied Mill's methods earlier. We take two
groups that are identical except that one (the experimental
group) has the property we're testing, while the other (the
control group) does not. The property that we're testing is
called the independent variable, and the effect is the dependent
variable.
We have to use groups when a factor is only a contributing
factor, for the reason explained above. We also have to
make an adjustment in the way we measure the dependent variable,
the effect. The question is not whether the effect occurs, or to
what degree, in a particular case; we are now comparing groups.
The question is whether a factor makes a statistical
difference in the effect.
Finding two groups whose members are all identical is out of the
question. Fortunately, that is not necessary.
Example:
Suppose you want to know whether a certain cram course can raise
people's SAT scores.
Because we are dealing with groups, what matters is that they have
the same distribution on the other variables that affect scores--the
same distribution by verbal ability, memory, and so forth. In that case, the experimental
and control groups are statistically identical except for the
variable we are testing, and a statistical difference in the
effect can then be attributed to that variable.
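The cram-course comparison can be sketched as a simulation. Everything here is hypothetical: the population of ability values, the 30-point course effect, the noise level, and the function sat_score are assumptions built into the example, not findings. The sketch also uses random assignment to form the groups, which is one common way of getting the same distribution in both and is an assumption of this example rather than a step described above.

```python
import random
from statistics import mean

rng = random.Random(42)

# Hypothetical population: one "verbal ability" value per person.
people = [rng.gauss(500, 100) for _ in range(2000)]

# Random assignment: shuffle, then split into two groups of equal size,
# so both groups have roughly the same distribution of ability.
rng.shuffle(people)
experimental, control = people[:1000], people[1000:]

ASSUMED_COURSE_EFFECT = 30  # points; an assumption of the simulation

def sat_score(ability, took_course):
    """Simulated score: ability, plus random noise, plus the assumed effect."""
    noise = rng.gauss(0, 40)
    return ability + noise + (ASSUMED_COURSE_EFFECT if took_course else 0)

exp_scores = [sat_score(a, True) for a in experimental]
ctl_scores = [sat_score(a, False) for a in control]

# The groups should match closely on ability but differ on scores.
ability_gap = mean(experimental) - mean(control)
score_gap = mean(exp_scores) - mean(ctl_scores)
print(f"difference in average ability: {ability_gap:.1f}")
print(f"difference in average score:   {score_gap:.1f}")
```

No two people in the simulation are identical, but because the groups are alike in their distribution of ability, the difference in average scores can be attributed to the course.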