Threats to Validity
The purpose of any study is to tell us what is “really” happening in the world: Does streptokinase reduce cardiac mortality? What causes sudden infant death syndrome? Did the swine flu vaccination program do more good than harm? We hope that the results from our sample can be generalized to the population at large so that our findings also hold true for similar people. Consequently it is disconcerting, at the least, to find different studies coming to opposite conclusions.
The major reason for these differences is that all studies have flaws involving (1) the definition of the disorder or phenomenon of interest, (2) the selection of the subjects, or (3) the design or execution of the study itself. Cook and Campbell call these flaws threats to validity. In this discussion we examine some of the more common ones and see how they can affect the interpretation of the results. In Chapter 4 we discuss those forms of bias that affect eliciting and recording information.
Subject Selection Biases
Subject selection biases involve a host of factors that may result in the subjects in the sample being unrepresentative of the population. We’ve already discussed one class of selection bias: nonrandom sampling. However, even with the best of sampling strategies, nature (human and otherwise) conspires against us in many ways. Sackett compiled a list of various biases, 57 at last count, and even this is probably incomplete. To keep life simple, we can think of two major types of subject selection biases: who gets invited to participate in a study and who accepts. We cannot even attempt to provide a complete catalog of these two classes of factors; rather, the following three examples of invitational bias (healthy worker, incidence-prevalence, and Berkson’s) and one of acceptance bias (volunteer) are illustrative only. We hope these examples help enlighten and warn the reader of where things can go wrong.
Healthy Workers Bias
Random sampling does not help us if the group from which the sample is drawn is unrepresentative of the population to which we want to generalize. For example, comparing the outcome of pregnancies of women who work with video display terminals (VDTs) with those of a group of women chosen at random may open up the researcher to the healthy worker bias; that is, people who work are, as a group, healthier than the population as a whole. The entire adult population consists of those people who are working, those who are able to work but do not for one reason or another, and those who cannot work because of health problems. Any group of workers, by definition, does not include this last category of people, which tends to lower the overall health status of the population. This selection bias operates even more strongly when the job applicants have to pass a physical examination, as in the Armed Forces or certain labor-intensive occupations. Seltzer and Jablon, for example, found lower morbidity rates among people discharged from the Army than among people of similar ages in the general population. This effect was seen even 23 years after the men had been discharged.
The effects of this bias are to (1) make any sample drawn from a group of workers appear healthier than the general population; (2) make the standardized mortality rate (see Chapter 4) less than 1:1 when workers are compared with the general population; and (3) make the proportional mortality rate (see Chapter 4) for occupational hazards greater than 1.0 because of “borrowing” (i.e., if they are dying less from heart disease, they must be dying more from something else).
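The “borrowing” in point (3) is easier to see with numbers. The sketch below uses made-up death rates per 100,000 (every figure is hypothetical and chosen only for illustration) and computes both indices as simple ratios, with no age standardization.

```python
# Hypothetical deaths per 100,000 person-years; none of these figures
# come from a real study.
general_population = {"heart disease": 300, "occupational hazard": 50, "other": 650}
workers = {"heart disease": 150, "occupational hazard": 50, "other": 500}


def standardized_mortality_ratio(observed, expected):
    """All-cause deaths among the workers divided by the deaths expected
    at general-population rates (no age adjustment in this sketch)."""
    return sum(observed.values()) / sum(expected.values())


def proportional_mortality_ratio(observed, expected, cause):
    """Share of all deaths due to `cause` among the workers, divided by
    the same share in the general population."""
    share_observed = observed[cause] / sum(observed.values())
    share_expected = expected[cause] / sum(expected.values())
    return share_observed / share_expected


smr = standardized_mortality_ratio(workers, general_population)
pmr = proportional_mortality_ratio(workers, general_population, "occupational hazard")
print(f"Standardized mortality ratio: {smr:.2f}")
print(f"Proportional mortality ratio (occupational hazard): {pmr:.2f}")
```

In these invented data the workers die of the occupational hazard at exactly the same rate as everyone else (50 per 100,000), yet the proportional index exceeds 1.0 simply because they die less often of everything else.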
Incidence-Prevalence (Neyman) Bias
If a group is investigated a significant amount of time after the people have been exposed to a putative cause or after the disorder has developed, those who have died and those who have recovered will be missed. This is known as incidence-prevalence bias or the Neyman bias. For example, a cross-sectional look at depressed patients in hospital misses those in whom the depression culminated in suicide or resolved itself. Similarly, a study of cardiac patients in a tertiary care hospital does not include (1) those who died before reaching hospital and (2) those whose myocardial infarction was not sufficiently severe to warrant transfer to a specialized facility.
As another example, even the latest version of the Diagnostic and Statistical Manual of Mental Disorders (1994) is somewhat pessimistic regarding the long-term prognosis in schizophrenia. However, this pessimism may be unwarranted and may be based on the fact that most “natural history” studies use patients who are in hospital at a given time. Follow-up studies with patients who have been admitted for the first time, which are much less susceptible to the Neyman bias than cross-sectional ones, give a different picture; according to these follow-up studies, the majority of patients (anywhere between 60 and 80 percent) go on to lead productive lives outside the hospital.
The effects of the Neyman bias can be in two different directions. Missing those who died before they could be included in the study makes the disorder look less severe because the outcome is generally more positive than it would have been had all patients been included. Conversely, missing those who have already gotten better makes the outcome look grimmer. The net effect is often unknowable and depends on the relative proportions of patients in the three groups (i.e., studied, died, and improved).
Berkson’s Bias
Berkson’s bias is the spurious association found between some characteristic and a disease, and it results from admission rates to hospital (or any other setting where the study is carried out) being different for those persons (1) with the disease, (2) without the disease, and (3) with the characteristic. For example (Fig.1266), assume that in the general population there is no relationship at all between vaginal bleeding (the characteristic) and endometrial cancer (the disease).
Let us further assume that 10 percent of patients with endometrial cancer have vaginal bleeding and 10 percent of patients with other cancers have bleeding. If the probability of being admitted to hospital because of vaginal bleeding is 70 percent, if it’s 10 percent because of endometrial cancer and if it’s 50 percent because of other forms of cancer, then we can assume the following:
- Of the 100 patients with vaginal bleeding and endometrial cancer (cell A), 10 will be admitted because of endometrial cancer (i.e., 10 percent). Of the remaining 90 patients in cell A, 63 (70 percent) will be admitted because of vaginal bleeding, so that a total of 73 women will be admitted with endometrial cancer and bleeding.
- Of the 100 patients with vaginal bleeding and other forms of cancer (cell B), 50 will be admitted because of the other cancers. Of the remaining 50, 35 (again, 70 percent) will be admitted because of vaginal bleeding, so that in total 85 will be admitted with bleeding and other cancers.
- Of the 900 patients with endometrial cancer and no bleeding (cell C), 90 (again 10 percent) will be admitted because of endometrial cancer.
- Of the 900 patients with other forms of cancer and no bleeding (cell D), 450 will be admitted because of the other cancers.
Figure 1266 – (Table 3-1) Association Between Endometrial Cancer and Vaginal Bleeding

Streiner DL, Norman GR. PDQ Epidemiology-Second Edition, 1996, BC Decker Inc., Hamilton, Ontario.
Fig.1289 shows the graphic results of these different admission rates. Now it appears that 44.8 percent of patients with endometrial cancer have vaginal bleeding, whereas only 15.9 percent of patients with other forms of cancer have vaginal bleeding. This apparent (and false) association is the result of different hospitalization rates for endometrial and other cancers and for vaginal bleeding. Thus Berkson’s bias comes into play whenever we sample from a setting in which there are different rates of admission for different disorders.
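The arithmetic behind these two percentages can be retraced in a few lines. The following is only a sketch of the worked example: the cell sizes and admission probabilities are the ones given in the text, while the function and variable names are ours.

```python
# Admission probabilities stated in the text.
P_ADMIT_BLEEDING = 0.70
P_ADMIT_ENDOMETRIAL = 0.10
P_ADMIT_OTHER_CANCER = 0.50


def admitted(n_patients, p_disease, has_bleeding):
    """Expected admissions from one cell: first those admitted because of
    the cancer, then, among the remainder, those admitted for bleeding."""
    for_disease = n_patients * p_disease
    remainder = n_patients - for_disease
    for_bleeding = remainder * P_ADMIT_BLEEDING if has_bleeding else 0.0
    return for_disease + for_bleeding


cell_a = admitted(100, P_ADMIT_ENDOMETRIAL, True)    # endometrial cancer, bleeding
cell_b = admitted(100, P_ADMIT_OTHER_CANCER, True)   # other cancers, bleeding
cell_c = admitted(900, P_ADMIT_ENDOMETRIAL, False)   # endometrial cancer, no bleeding
cell_d = admitted(900, P_ADMIT_OTHER_CANCER, False)  # other cancers, no bleeding

# Percentage of admitted patients with bleeding, by type of cancer.
pct_bleeding_endometrial = 100 * cell_a / (cell_a + cell_c)
pct_bleeding_other = 100 * cell_b / (cell_b + cell_d)

print(f"Endometrial cancer: {pct_bleeding_endometrial:.1f} percent with bleeding")
print(f"Other cancers:      {pct_bleeding_other:.1f} percent with bleeding")
```

Before admission, bleeding was present in 100 of 1,000 patients (10 percent) in both cancer groups; the difference appears only after the three admission rates have filtered who reaches the hospital.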
Volunteer Bias
To be ethical, most studies allow patients to refuse to participate. Thus the results are predicated to some degree on the assumption that those who do not volunteer are similar to those who do. However, there is now ample evidence to show that this is not the case and that volunteers differ systematically from nonvolunteers.
For example, the National Diet-Heart Study found that, compared with nonvolunteers, volunteers more frequently (1) were nonsmokers, (2) were concerned about health matters, (3) had a higher level of education, (4) were employed in professional and skilled jobs, (5) were Protestant or Jewish, (6) were living in households with children, and (7) were active in community affairs.
Not surprisingly, a similar problem exists when we’re trying to track people down, either to find out what happened to them or to ask them some questions as part of a survey; those who are harder to find are different from people who can be contacted more easily. What may be surprising is that this is an issue even for those who have gone to meet the great epidemiologist in the sky. The mortality rate among neurotic patients who were easy to trace was 2.7%; for those who were more difficult to locate, it shot up to 27.3%.
Figure 1289 – (Table 3-2) Results Caused by Different Hospitalization Rates for Characteristic (Bleeding) and Disease (Cancer)

Streiner DL, Norman GR. PDQ Epidemiology-Second Edition, 1996, BC Decker Inc., Hamilton, Ontario.
An analogous sort of effect, which we could probably name compliance bias, exists among those who participate in trials. In one arm of the Coronary Drug Project, the 5-year mortality rate for compliers (those who took 80 percent or more of their medication) was 15.1 percent. It was almost twice as high among noncompliers (28.2 percent), even though the “medication” they were complying with was a placebo. Similarly the mortality rate in the β-Blocker Heart Attack Trial was twice as high for noncompliers, whether they were adhering to taking their propranolol or their placebo. Although all subjects were volunteers in both of these trials, those who complied with the treatment regimen were apparently a different breed from those who did not comply.
Even for those who participate in a trial, a type of volunteer bias may operate. The incidence of inactive tuberculosis was lower among volunteers who appeared early during a mass screening than among those who appeared later, whereas the opposite trend was noted for pneumoconiosis.
Hawthorne Effect
According to legend, worker productivity improved at the Hawthorne plant of the Western Electric Company not only when the illumination was increased but also later when it was decreased. The reason for this was supposed to be the attention paid to the workers by the researchers and not the lighting itself. Although later studies showed that the increase in productivity likely resulted from other factors, the term Hawthorne effect has remained to explain the phenomenon that occurs when a subject’s performance changes simply because he or she is being studied (some have referred to this as the psychologic equivalent of the Heisenberg Uncertainty Principle).
For example, Frank reported that the introduction of a research project onto a hospital ward was “followed by considerable behavioral improvement in the patients,” even though no medication or special treatments were involved. He felt that the most likely explanation was that “participation in the project raised the general level of interest of the treatment staff, and the patients responded favorably to this.”
To counteract the Hawthorne effect it is often necessary to use an attention control group, which is treated exactly the same as the experimental group except for the active treatment. For example, studies of psychotherapy often use a control group that meets with the therapist as frequently and for the same duration as does the treatment group, but the content of the session is not supposed to be therapeutic. In drug trials the control group receives a placebo, which usually involves taking the same number of pills at the same time of day as the experimental subjects.
Blinding
One effect of the attention control group we just discussed is to blind the subject and perhaps the experimenter. A person is considered blind if he or she is unaware of the group to which a subject belongs. If only the subject is unaware but the experimenter knows, the study is called single blind. If neither the subject nor the researcher knows, the study is labeled double blind.
The purpose of blinding is to prevent various biases from affecting the results. Subjects may show a placebo effect if they know they are receiving an active agent or may not show it if they think they are not receiving the new drug. With single blinding, both groups should show an equivalent reaction. The magnitude of the placebo effect should not be underestimated. The results of one typical study, shown in Fig.1234, indicate that more than 50 percent of patients experienced relief of headache pain from placebos.
Figure 1234 – (Figure 3-11) Results of this study show the placebo effect. In this case more than 50 percent of subjects on placebo experienced relief of headache pain

From Beecher HK: The powerful placebo, JAMA 159:1602-1606, 1955.
If the clinicians (or evaluators) were aware of group membership, they could be more alert or attentive to signs of improvement. Likewise, clinicians who know that a disease should be present may be more diligent when looking for it (diagnostic suspicion bias). Rosenthal conducted a series of studies that showed that what a researcher expects to find in an experiment affects what does occur, irrespective of whether the subjects are humans or rats.
Proxy Measures
Proxy measures are variables, both dependent and independent, that stand in for other variables. They’re used for two reasons. The first, and more legitimate one, is that what we really want to look at may be too difficult to measure directly because it is too invasive to do so (e.g., density of neural plaques in Alzheimer’s disease) or it may take too long to manifest itself (such as death), and so on. The second reason, which is less defensible, is that we’re not aware that we’re dealing with proxy measures. Let’s start off by looking at the dependent variable.
Surrogate End Points
Imagine that you’ve discovered a new drug that promises to reduce cardiac mortality by raising the levels of “good” cholesterol among otherwise healthy women. However, you quickly find out that in order to see if the treatment, which you’ve called “LiDLe Women,” works, you’ll have to enroll 20,000 subjects and follow them for 30 years. Part of the problem, about which you can do nothing, is that the rate of cardiac deaths among young women is low. The other factor contributing to the large sample size requirement is that you’re looking at a dichotomous outcome: alive or dead. As we’ll see in the next chapter, you need far fewer subjects if you measure the outcome on a continuum, so you look around for some end point that can be measured this way and come up with an index of coronary artery stenosis. Congratulations! You have just played the surrogate end point game.
Stenosis, diastolic blood pressure, or CD4+ cell counts among acquired immunodeficiency syndrome (AIDS) patients are surrogate measures because, when we come right down to it, changing the values of these measures isn’t what the therapies are really about. We are interested in them only to the degree that they are correlated with the true outcome, which in these cases is death. If there is a strong association between the surrogate and the actual outcome of interest, then using a proxy can result in shorter trials with fewer patients and for far less money. However, if the relationship is weak, despite what our theory tells us, then we can come to wrong conclusions. For example, the Cardiac Arrhythmia Suppression Trial tried to reduce premature ventricular contractions (PVCs) because it was believed that suppressing them would result in fewer deaths. The good news is that the drugs did suppress PVCs; the bad news is that these patients died at a rate 2½ times that of the control group. And paralleling our fictitious example, a large study in Finland succeeded in reducing the risk factors for cardiovascular disease by 46 percent in the treatment group. Unfortunately, the men in that group died of heart disease at more than twice the rate of those in the control group.
The moral of this tale is that surrogate end points can lead to more efficient trials, but they must have been proven to be closely associated with the true outcome. Reliance on theory or clinical supposition alone is never sufficient.
Surrogate Explanatory Variables
Often we see among the list of explanatory variables ones like sex, education, marital status, income, or ethnicity. These tend to be proxy measures of the second type: variables that we don’t realize are actually stand-ins for other variables. But if we think about it a bit, we’ll see that we are rarely interested in these variables in their own right. For example, many studies have documented the inverse relationship between income (or socioeconomic status) and health. But, as seen in countries with universal health insurance (e.g., Canada, Great Britain), money does not buy health. Rather, income is a measure of large differences in “lifestyle” factors between richer and poorer people that affect health, running the gamut from jobs (low-paying jobs tend to be more dangerous than office work), to nutrition, to smoking status. Similarly, when studies report differences in compliance rates between men and women, it is highly doubtful that they are postulating a biologic explanation of why women take medication more regularly than men. Here, gender is a proxy for other factors, such as socialization, relationships with authority figures, or concern about health.
The problems with using surrogate explanatory variables are at least two-fold. First, we may be fooling ourselves about what the important factors are and how modifiable they are. We cannot change gender, for example, but we can alter attitudes toward health. Second, any time we measure a variable, we introduce measurement error. With proxy variables, we are introducing error upon error—the error of the measure and the degree to which the proxy is not a perfect indicator of the underlying variable that interests us.
Confounding
Confounding is the illusory association between two variables when in fact no such association exists. It is caused by a third variable (the “confounder”), which is correlated with the first two. For example, Table 3-3 (Fig.1242) shows bifocal use (needed or not) and nocturnal enuresis (present or absent) in a group of 200 patients.
The odds ratio is 1.93, which indicates that the odds of enuresis are about twice as high among persons who need bifocals as among those who don’t.
However, a closer look at these data shows that there are actually two age groups involved (Table 3-4, Fig.1292). For each age group, there is no association between bifocal use and enuresis. In those less than 60 years of age, 5 percent of bifocal users are enuretic (1 of 20 subjects), as are 5 percent of nonusers (4 of 80 subjects). For those more than age 60, 20 percent are enuretic, irrespective of bifocal use. The confounder here is age; bifocal users are more apt to be more than age 60, which is also the group that has the higher rate of enuresis (Fig.1238).
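To see how an association can appear in the pooled table and vanish within each stratum, here is a minimal sketch. The under-60 counts are those quoted above; the split of the over-60 subjects between bifocal users and nonusers is an assumption of ours (the tables themselves are not reproduced here), so the crude odds ratio it yields is not the 1.93 of Table 3-3.

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Cross-product odds ratio for a 2 x 2 table."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)


# Each stratum lists, in order:
# (bifocals & enuretic, bifocals & not, no bifocals & enuretic, no bifocals & not)
strata = {
    "under 60": (1, 19, 4, 76),  # 1 of 20 users and 4 of 80 nonusers, as in the text
    "over 60": (14, 56, 6, 24),  # ASSUMED split; 20 percent enuretic in both rows
}

# Ignoring age amounts to adding the strata together cell by cell.
crude_table = tuple(sum(cells) for cells in zip(*strata.values()))
crude_or = odds_ratio(*crude_table)
stratum_ors = {name: odds_ratio(*cells) for name, cells in strata.items()}

print(f"Crude odds ratio: {crude_or:.2f}")
for name, value in stratum_ors.items():
    print(f"Odds ratio, {name}: {value:.2f}")
```

With these illustrative counts the crude odds ratio is 2.0, whereas it is 1.0 in each age group: the whole association comes from bifocal users being concentrated in the older, more enuretic stratum.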
Figure 1242 – (Table 3-3) Relationship Between the Need for Bifocals and Nocturnal Enuresis

Streiner DL, Norman GR. PDQ Epidemiology-Second Edition, 1996, BC Decker Inc., Hamilton, Ontario.
Figure 1292 – (Table 3-4) No Association Between Bifocal Need and Nocturnal Enuresis When Subjects are Divided by Age

Streiner DL, Norman GR. PDQ Epidemiology-Second Edition, 1996, BC Decker Inc., Hamilton, Ontario.
To be a confounder, a variable must meet two criteria: (1) it must be a risk factor for the outcome of interest, and (2) it must be associated with the independent variable or distributed differently between the groups. Age meets these conditions, in that it is a risk factor for enuresis and is also related to the need for bifocals. We can control for confounders in a number of ways. If we were doing a study that allowed us to assign patients to groups, we could hope that randomization would balance the groups in terms of the confounder. If we cannot randomize or if we don’t want to rely on it alone to ensure balanced groups, we can match on the variable; in this case, for each person who needs bifocals, we would choose a person to be in the nonbifocal group who is the same age, ensuring that the groups do not differ on this variable. Third, we can stratify on that variable by dividing the pool of subjects in each group into a number of age strata and sampling equal numbers per group from each stratum. If we had a smaller pool of potential subjects from which to draw, we could frequency match. Here, we would ensure that the proportion of people of different ages were the same in the two groups, without being concerned that a specific person in one group was paired with someone in the other group. We would be satisfied if the overall proportions or frequencies were the same. However, matching raises some problems in the analysis stage and can be costly. Last, we could try to account for the differences between the groups statistically, by using the confounder as a covariate (a term that we explain in some detail in PDQ Statistics and Biostatistics: The Bare Essentials). Statisticians still argue vehemently among themselves whether any posthoc statistical manipulation can adequately control for preexisting differences between groups on some confounding variable. 
However, most of us blithely continue to do it, most likely because no potential subjects are lost because a similar person for the other group could not be found for them, as can easily happen with matching.
Figure 1238 – (Figure 3-12) A, When unaware of the confounder, it appears that there is a direct association between enuresis and bifocals. B, There is a direct association between age (confounder) and bifocals and between age and enuresis

Streiner DL, Norman GR. PDQ Epidemiology-Second Edition, 1996, BC Decker Inc., Hamilton, Ontario.
Interactions
Often people use the term confounding when they really mean interaction. At one level, it’s easy to see why: both are effect modifiers, that is, they modify the strength of the association between two variables. However, they’re different animals. As we just saw, a confounder is a third variable that can produce an illusory association between two other variables or result in an apparent lack of association. An interaction, as the name implies, means that the effect of Variable A depends on the value of Variable B. For example, does oral contraceptive use increase the risk of heart attacks among women?
Fig.1159, modified from Shapiro et al. (1979), shows the risk of myocardial infarctions (MIs) for women who use oral contraceptives and those who don’t and for women who smoke to varying degrees, as compared with nonsmokers who are not taking oral contraceptives (i.e., their risk is 1.0). Does the risk increase? It all depends. If the women smoke fewer than 25 cigarettes a day (it’s assumed they don’t smoke cigars or pipes), then there is no appreciable increase in risk. However, if they smoke 25 cigarettes or more a day, their risk jumps from 7 times that of the nonsmokers (that’s the effect of smoking) to 39 times. Therefore there is an interaction between oral contraceptive use and smoking in terms of the risk of MI: no increase for nonsmokers, and more than a five-fold increase for smokers. This means that one variable cannot be looked at in isolation; the overall risk is too high for those who smoke less than a pack a day and too low for those who smoke more.
Figure 1159 – (Figure 3-13) Interaction between oral contraceptive use and smoking in risk for heart attack

Modified from Shapiro S, Slone D, Rosenberg L: Oral contraceptive use in relation to myocardial infarction, Lancet i:743-764, 1979.
Contamination
In studies in which one group receives the experimental treatment and another group gets either conventional treatment or a placebo, the validity of the results is predicated on the purity of the groups. If some subjects in the control group receive the new treatment, both groups will improve to some degree (assuming that the treatment works). Thus differences between the groups are diminished or even eliminated. This condition is referred to as contamination.
Contamination is a particular problem when a medication used in a study is also available over the counter or as an ingredient in other compounds (e.g., aspirin) or when it can be prescribed by family physicians who are unaware (or have forgotten) that certain drugs should not be given to some of their patients. However, contamination is not limited to drug trials; it can occur with any form of intervention, such as respite care for those taking care of demented elderly, psychotherapy, and similar maneuvers in which subjects in the control group receive some form of the treatment.
In cohort and case-control studies contamination is caused by misclassification, that is, assigning exposed subjects to the nonexposed group or vice versa. This is often caused by errors in recall by the subjects.
The effect of contamination is to reduce differences between the treated and untreated groups. This may lead us to draw the erroneous conclusion that the intervention is of limited or no use.
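How quickly a real difference can evaporate is shown in the sketch below. The success rates (60 percent with treatment, 30 percent without) are hypothetical, and we assume that a contaminated control subject responds exactly as a treated subject would.

```python
def observed_difference(p_treated, p_untreated, contamination):
    """Difference in success rates between the arms when a fraction
    `contamination` of the control group actually gets the new treatment."""
    p_control = (1 - contamination) * p_untreated + contamination * p_treated
    return p_treated - p_control


# Hypothetical success rates: 60 percent with treatment, 30 percent without.
for fraction in (0.0, 0.25, 0.5, 1.0):
    diff = observed_difference(0.60, 0.30, fraction)
    print(f"{fraction:.0%} of controls treated: observed difference = {diff:.3f}")
```

Under these assumptions the observed difference is simply the true difference multiplied by the proportion of controls who remained uncontaminated.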
Cointervention
Cointervention refers to subjects in a study receiving therapies other than those given as part of the experiment that affect the outcome of interest. For example, some subjects in a study that compares the effectiveness of various nonsteroidal antiinflammatory drugs for arthritis could be given other drugs by another physician, be enrolled in a program using transcutaneous stimulation, or might be taking over-the-counter aspirin.
Cointervention differs from contamination in two ways: (1) the intervention and (2) the groups that are affected. First, contamination refers to the control group receiving the experimental intervention, whereas cointervention refers to some treatment other than the one under investigation. Second, all groups in a study can be witting or unwitting recipients of a certain cointervention, but only the control group can be contaminated.
Although all groups can be subject to cointervention, it is a particular danger when the control subjects do not improve or even deteriorate on placebo. If any other clinician is involved in the case and unaware of the study, he or she may prescribe other treatments to help the person, thereby minimizing differences between the groups. If subjects in all groups receive other therapies, then it becomes almost impossible to determine if the results are caused by the treatment under study, by the cointervention, or by both.
Regression Toward the Mean
Regression toward the mean refers to the phenomenon whereby groups of subjects that are chosen because of their extreme score on any variable will have scores that are less extreme and closer to the mean value when they are retested. The reason is that any test result we observe (some serum value, a decision based on a radiograph, or a score on a paper and pencil test) is composed of two parts: the true score and the error score. Written out in the form of an equation, we say the following:
Observed Score = True Score ± Error Component
There are many sources of error (see Chapter 4), including variations in the machine, biologic variation within the subject, motivation, fatigue, and recording error. The assumption is that this error component is random, sometimes adding to the true score and sometimes diminishing it. We can never see the true score, only the observed score.
When we select a group because of its extreme scores (either very high or very low), we are including two types of persons: (1) those whose true scores are extreme and (2) those whose true scores do not fall in the extreme range, but the error component added to the true score has placed them in the extreme region. Similarly, we have excluded persons whose true scores are extreme but whose observed scores are below the cut-off level. For example, let’s assume that we’re using a test with a mean of 50, and a score of 70 or more identifies the most extreme 2 percent of the sample, which is the group we want to include in our study. We’ve shown the true score plus or minus the error component for the 10 subjects whose observed scores are 70 and for a few of the other subjects ( Fig.1176). Thus we have biased our sample to include an overrepresentation of people who have error scores in the direction away from the mean. Because the error component is random, when these people are retested only half of them will have error scores away from the mean (keeping them in the extreme range), and half will have error scores that move the observed score closer to the mean. On the whole, the group average on the second testing will be closer to the mean than on the first testing.
Figure 1176 – (Figure 3-14) True score ± error component for 10 subjects with observed scores greater than 70 and 4 subjects with observed scores less than 70

Streiner DL, Norman GR. PDQ Epidemiology-Second Edition, 1996, BC Decker Inc., Hamilton, Ontario.
In practical terms this means that if we select a group of subjects because they appear abnormal on some test (i.e., their score differs from the mean) and do nothing to them, they will seem to improve (move closer to the mean) when they are retested. So if we had intervened, it would be impossible to know if the improvement was caused by us or simply by regression effects.
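A small simulation makes the point. This is only a sketch: the mean of 50 and the cut-off of 70 come from the example above, but the two standard deviations, the sample size, and the normal distributions are our assumptions.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so that the run is repeatable

TRUE_MEAN = 50   # test mean, from the example in the text
CUTOFF = 70      # selection cut-off, from the example in the text
TRUE_SD = 8      # ASSUMED spread of true scores
ERROR_SD = 6     # ASSUMED spread of the random error component


def observe(true_score):
    """Observed score = true score plus a fresh random error component."""
    return true_score + rng.gauss(0, ERROR_SD)


true_scores = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(100_000)]
first_test = [observe(t) for t in true_scores]

# Select subjects on the basis of an extreme FIRST observed score.
selected = [(t, x) for t, x in zip(true_scores, first_test) if x >= CUTOFF]

# Retest the same subjects; nothing has been done to them in between.
retest = [observe(t) for t, _ in selected]

first_mean = mean(x for _, x in selected)
retest_mean = mean(retest)
print(f"Subjects selected:          {len(selected)}")
print(f"Mean score on first test:   {first_mean:.1f}")
print(f"Mean score on retest:       {retest_mean:.1f}")
```

The retest mean falls back toward 50 although no intervention took place, because on the second occasion the error components of the selected subjects no longer all point away from the mean.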
The magnitude of this effect is inversely related to the reliability of the test; the less reliable the test is, the greater the regression effect. The reason is that reliability expresses the relative contributions of the true score and the error scores so that an unreliable test has a large error component (see the discussion on reliability in Chapter 4).
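Under the usual assumptions of classical test theory (random error and a linear relationship between test and retest), the expected retest score keeps only the fraction of the original deviation from the mean that equals the reliability. The sketch below applies that rule to an observed score of 70 on a test with a mean of 50; the three reliability values are illustrative.

```python
def expected_retest(observed, population_mean, reliability):
    """Expected retest score under classical test theory: the deviation
    from the mean shrinks in proportion to the reliability of the test."""
    return population_mean + reliability * (observed - population_mean)


# Illustrative reliabilities; the observed score and mean follow the text's example.
for reliability in (0.9, 0.7, 0.5):
    score = expected_retest(70, 50, reliability)
    print(f"Reliability {reliability:.1f}: expected retest score = {score:.1f}")
```

The lower the reliability, the further the expected retest score slides back toward the mean.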
Regression toward the mean can be minimized in two ways: (1) by increasing the reliability of the test and (2) by testing each subject at least twice and requiring all the tests to be extreme before he or she is included in the study. This is often done in hypertension trials in which the person has to have three consecutive abnormal readings before being called hypertensive.
Cohort Effects
As we noted earlier, a cohort refers to a group of people chosen because they share some common characteristic (e.g., employment in a specific job or exposure to a given agent). Previously, however, cohort was used in a narrower sense and meant a group of similar age (i.e., members only have year of birth in common). Cohorts of this type have been useful in elucidating many epidemiologic findings, such as increases in longevity and height over time. A danger arises when one attempts to attribute a causal factor to differences among age cohorts because one cohort differs from another on many variables other than age.
For example, studies done in the 1940s and 1950s tended to show a decline in intellectual ability that began fairly early in life. These studies were done by measuring the intelligence quotient (IQ) in a group of people in their teens, another group of people in their 20s, and so on. A different picture emerges when we follow one group of people over time, as we see in Fig.1188. The cross-sectional data show the decline with time, but the longitudinal data show that inductive reasoning actually increases until we’re ready for retirement, and then the decline is relatively slow and modest. The problem with the earlier studies is that they confounded age with cohort; not only were the older subjects more advanced in years than the younger ones, but they were also exposed to a different educational and cultural environment, which accounted for most of the differences among the cohorts and hence for most of the apparent decline.
Figure 1188 – (Figure 3-15) Changes in inductive reasoning with age based on longitudinal and cross-sectional data

Modified from Schaie KW: The course of adult intellectual development, Am Psychol 49:304-313, 1994.
Ecologic Fallacy
Ecologic studies attempt to demonstrate a relationship between two variables, such as suicide rate and religion, by using aggregate data. These are data about groups of people rather than individuals. For example, we can look at the rates of lung cancer per 100,000 individuals in a number of cities and see if these are correlated with pollution levels.
Although this technique is inexpensive and has at times led to useful findings, there is one major problem: there is no guarantee that those people who developed lung cancer were the same ones who were exposed to the pollution. That is, it is possible (although unlikely) that pollution is unrelated to cancer of the lung but that pollution is caused by large factories. We know that cigarette smoking is related to social class and that factory workers smoke more heavily than the general population. So it may be that pollution is simply a marker for heavy smoking, and it is the smoking that is producing cancer.
The ecologic fallacy was nicely demonstrated by Robinson, who showed that there was a strong relationship (r=0.62) between literacy rates and the proportion of nonnative born people; that is, regions with the largest number of immigrants had the lowest rates of illiteracy. Because most immigrants had relatively little education, especially in the 1930s when the data were collected, this seems to fly in the face of common sense. However, the individual correlation between literacy and foreign birth was
The explanation is that immigrants usually settle in large cities, which have high rates of literacy, rather than in rural areas where literacy rates are lower. Thus areas with low rates of illiteracy have a high proportion of immigrants, but illiteracy and immigrant status are correlated (albeit weakly) within the individual.
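The reversal is easy to reproduce with invented numbers. In the sketch below the three regions and all of the counts are hypothetical (they are not Robinson’s data): within every region immigrants are less literate than natives, but immigrants are concentrated in the regions where everyone is more literate.

```python
from math import sqrt

# Hypothetical counts per region, in the order:
# (natives literate, natives illiterate, immigrants literate, immigrants illiterate)
regions = {
    "rural": (760, 190, 35, 15),
    "town": (720, 80, 160, 40),
    "city": (588, 12, 352, 48),
}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Ecologic (aggregate) correlation: one data point per region.
pct_immigrant = [(il + ii) / (nl + ni + il + ii) for nl, ni, il, ii in regions.values()]
pct_literate = [(nl + il) / (nl + ni + il + ii) for nl, ni, il, ii in regions.values()]
ecologic_r = pearson(pct_immigrant, pct_literate)

# Individual correlation: phi coefficient of the pooled 2 x 2 table
# (immigrant yes/no by literate yes/no).
nl, ni, il, ii = (sum(column) for column in zip(*regions.values()))
individual_r = (il * ni - ii * nl) / sqrt((il + ii) * (nl + ni) * (il + nl) * (ii + ni))

print(f"Ecologic correlation (regions):   {ecologic_r:+.2f}")
print(f"Individual correlation (persons): {individual_r:+.2f}")
```

With these counts the regional correlation between immigrant share and literacy is strongly positive, whereas the person-level correlation between being an immigrant and being literate is weakly negative.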
Content on this page was last changed on March 19, 2009.
© 2002 BC Decker Inc.
Streiner DL, Norman GR. PDQ Epidemiology. 2nd ed. Hamilton, Ontario: BC Decker Inc.; 1996.