HESC 401 Epidemiology
Objectives
• Describe the design of a case-control study
• Explain the differences between case-control and cohort studies
• Explain how cases and controls are selected and the different sources used for selection; in particular, know why control selection is so important
• Define and describe matching and its different types, along with the problems associated with matching
• Explain the advantages of embedding a case-control study in a defined cohort, also known as a nested case-control study
• Define recall bias and explain how it affects a study
• Explain and describe the design of a cross-sectional study
Study designs so far…
• So far, we have examined randomized clinical trials (RCTs) and cohort studies.
• In summary, RCTs use an experimental design: subjects are randomly assigned either to an intervention group that receives some sort of treatment/intervention or to a comparison group that receives no treatment, and both groups are then followed to see whether disease occurs.
• In a cohort study, we begin with the exposure (for example smoking, diet, or physical activity) and then follow those who are exposed (smokers) and unexposed (non-smokers) to see whether disease occurs.
• There is no intervention or treatment in a cohort study.
• However, with both study designs we can test whether the exposure is associated with the disease.
Study designs so far…
• Sometimes, however, it is not feasible to begin with the exposure and then follow participants for 5 to 20 years to see whether a disease develops.
• Other times, a clinician/researcher may have a group of diseased individuals and want to find out what the patients were exposed to in the past that might have caused the disease. They would compare the diseased group’s past exposure with the past exposure of a non-diseased group (control group).
• This is a case-control study.
Design of a Case-Control Study
• In a case-control study, we want to examine the possible relation of an exposure to a certain disease. (Fig. 10-1)
• There are two groups in this study that we want to compare:
  • Cases: people with the disease
  • Controls: people without the disease
• We want to determine:
  • What proportion of cases were exposed and what proportion were not
  • What proportion of controls were exposed and what proportion were not
In the diagram above, in a case/control study, we begin with the diseased
participants (cases) and those without disease (controls) and collect their
past exposure to see if those with the disease have higher exposure than
those without the disease.
Example of a case-control study
• We want to test whether second-hand smoke is associated with asthma.
• We would identify a group of individuals with asthma.
• We would also need to identify a group of people without asthma.
• Then we would collect information on the exposure (second-hand smoke) and see whether those with and without asthma were exposed to it.
• If there is an association between second-hand smoke and asthma, the prevalence of a history of second-hand smoke exposure should be higher in those with asthma than in those without asthma.
Measuring past exposure
• How can you measure past exposure?
• There are several ways. Generally you can use a questionnaire and ask the participants about their past exposure, or review past medical records (e.g., if you want to see whether BMI is associated with disease, you can use the height and weight recorded in medical records).
• One can also conduct a biologic assay to test for chemicals in the blood.
                              Cases (With Disease)   Controls (Without Disease)
Then Measure Past Exposure
  Were exposed                         a                        b
  Were not exposed                     c                        d
  Total                              a + c                    b + d
• The 2 x 2 table above (Table 10-1) shows how a case/control study is conducted and, importantly, how to measure whether the exposure is associated with increased odds of having the disease.
• First we select the cases and controls. Then we interview them about their past exposure. a + c represents all the cases. To find the proportion of cases who were exposed, divide a by (a + c). b + d represents all those without disease, the controls. To find the proportion of controls who were exposed, divide b by (b + d).
• If exposure is associated with disease, we would expect the proportion of cases who were exposed, a/(a + c), to be greater than the proportion of controls who were exposed, b/(b + d).
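The comparison described in these bullets can be sketched in a few lines of Python. This is only an illustration: the function name and the counts are made up for the example and do not come from Gordis.

```python
def exposure_proportions(a, b, c, d):
    """Return (proportion of cases exposed, proportion of controls exposed).

    Cell layout follows the 2 x 2 table: a = exposed cases, b = exposed
    controls, c = unexposed cases, d = unexposed controls.
    """
    cases_total = a + c
    controls_total = b + d
    if cases_total == 0 or controls_total == 0:
        raise ValueError("each group needs at least one subject")
    return a / cases_total, b / controls_total


# Hypothetical counts, for illustration only.
p_cases, p_controls = exposure_proportions(a=30, b=20, c=70, d=80)
print(f"cases exposed: {p_cases:.0%}, controls exposed: {p_controls:.0%}")
# prints: cases exposed: 30%, controls exposed: 20%
```

With these made-up numbers the proportion exposed is higher among cases than among controls, which is the pattern we would expect if the exposure is associated with the disease.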
                           CHD Cases   Controls
Smoke cigarettes               112        176
Do not smoke cigarettes         88        224
Total                          200        400
% Smoking cigarettes          56.0       44.0
• Gordis provides an excellent example of a case/control study above (Table 10-2). In this study, we want to assess whether cigarette smoking is associated with coronary heart disease (CHD).
• First we find a group with CHD (cases) and a group without CHD (controls).
• Then we interview them about their past smoking history.
• If we hypothesize that there is a relationship between smoking and CHD, then we would expect the proportion of smokers to be higher in those who have CHD than in those who do not.
• From the table above, the proportion of smokers was higher in the CHD group (112/200 = 56%) than in the controls (176/400 = 44%).
• This is only the first step; a more concrete way of assessing higher odds of having the disease will be discussed in chapters 11 and 12.
• Based on the table above, if I provided you with the numbers for boxes a and b and gave you the totals, you should be able to figure out what belongs in the other boxes/cells.
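As a check on the last point, here is a small sketch that fills in the missing cells from boxes a and b plus the column totals, using the numbers from Table 10-2. The function name is just for illustration.

```python
def complete_table(a, b, cases_total, controls_total):
    """Fill in cells c and d from the exposed counts and the column totals."""
    c = cases_total - a        # unexposed cases
    d = controls_total - b     # unexposed controls
    if c < 0 or d < 0:
        raise ValueError("an exposed count exceeds its column total")
    return {"a": a, "b": b, "c": c, "d": d}


# Numbers from the CHD and smoking table (Table 10-2).
table = complete_table(a=112, b=176, cases_total=200, controls_total=400)
pct_cases = 100 * table["a"] / 200
pct_controls = 100 * table["b"] / 400
print(table)                    # c is 88 and d is 224
print(pct_cases, pct_controls)  # 56.0 44.0
```

The recovered cells (88 and 224) and the percentages (56.0 and 44.0) agree with the table.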
Differences between Case-Control Study and Cohort Study
• According to Gordis, “the hallmark of the case-control study is that it begins with people with the disease (cases) and compares them to people without the disease (controls).”
• This differs from a cohort study, which starts with a group of exposed people and compares them to a group of non-exposed people.
Differences between Case-Control Study and Cohort Study (cont’d)
• People often think that the main difference between the two designs is that cohort studies go forward in time and case-control studies go backward in time.
• This incorrectly implies that calendar time is what distinguishes the two designs from each other.
• The previous chapter explained that a retrospective cohort study also uses data from the past, which means that calendar time is not the characteristic that distinguishes cohort from case-control studies.
Selection of Cases
• Cases can be selected from many different sources: hospital patients, patients in physicians’ practices, or clinic patients.
• Patient registries from communities can serve as sources of cases as well.
• Problems can arise when selecting cases.
• If the cases come from one source, any risk factors that are identified may be unique to that source. Ideally, therefore, cases should come from several different sources, including hospitals in different areas, clinics, and registries.
How many controls can you have?
• Many case-control studies have a ratio of 1:1, meaning one control for each case.
• However, if a disease is rare there may not be enough cases, so the sample size will be small and we may not be able to detect a relationship between our exposure and the disease.
• To detect an association if one is present, investigators enlarge their sample by increasing the number of controls.
• The highest ratio generally used in a case-control study is 1:4, meaning 4 controls for every case.
Famous example of case/control study
• One of the most famous examples of a case/control study was conducted by Sir Richard Doll and Bradford Hill.
• Doll and Hill were among the first to hypothesize that smoking and lung cancer are associated. They conducted a case-control study to show this relationship.
• For this work, Richard Doll was knighted by the Queen of England. He also stopped smoking after reviewing his own study’s conclusions.
Sir Richard Doll
Selection of Cases
• Ideally, an investigator identifies & enrolls all incident cases in a defined population in a specified time period
• One can select cases from registries, hospitals, or clinics
• When all incident cases in a population are included, the study is generalizable to other populations; otherwise there is potential for bias.
• The most common bias is referral bias. For example, suppose we are examining lung cancer and the cases are selected from a hospital that mainly admits, or receives referrals for, patients with gastrointestinal (GI) symptoms. When we look at exposures, caffeine use and antacid use may be more commonly reported among the cases, because patients with GI symptoms often have these exposures. It may therefore appear that caffeine use is related to lung cancer, but this is probably because of the referral pattern.
• There is a similar example of pancreatic cancer and caffeine use in the text.
Incident or Prevalent Cases
• When we are conducting a case/control study, how do we know whether to use incident cases (newly diagnosed) or prevalent cases (existing)?
• For example, should we use women who have just been diagnosed with breast cancer (incident cases) or women who were diagnosed within the past 5 years (prevalent cases)?
Incident or Prevalent Cases
• Using incident cases may be a problem because we must wait for new cases to be diagnosed.
• Prevalent cases are people who have already been diagnosed and have been living with the disease for a while, which gives us a larger number of cases to study.
Incident or Prevalent Cases
• Between the two, it is preferable to use incident cases in case-control studies of disease etiology. The main reason is that if we use prevalent cases, those who die soon after diagnosis are excluded from the study. If the exposures of those who died differ from the exposures of the prevalent cases, our findings on the relationship between exposure and disease are skewed.
• This is a type of “survivorship” bias, because those who survived and are prevalent cases could differ from those who died soon after diagnosis.
Selection of controls
• One of the most important factors in whether one finds an accurate relationship between the exposure and the outcome in a case/control study is how the controls are selected.
• It is critical that the exposure in the controls is representative of the exposure in the general population from which the cases were selected.
• See example on next page.
Selection of controls
• For example, let’s say we are interested in alcohol use and liver problems in CSUF students. We select our cases from the health center and then decide that it would be convenient to select our controls from a fraternity. What is the problem with this control selection?
• When we ask about the exposure (alcohol use), it is probably higher in the fraternity group and may not be as high among the cases from the health center. It may then appear that alcohol use is protective against liver disease, because we erroneously selected controls (disease-free individuals) who may not represent the “general” CSUF population.
• The best control group is a random sample of individuals from the same source population as the cases who have not developed the disease.
• Population-based controls are the best way to ensure that the distribution of exposure among the controls is representative of the general population.
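A minimal sketch of drawing such a control group at random is shown below. The population, the field names, and the 1:4 ratio are all hypothetical and chosen only for illustration.

```python
import random


def sample_controls(source_population, n_cases, controls_per_case=1, seed=None):
    """Randomly pick disease-free people from the source population."""
    eligible = [p for p in source_population if not p["diseased"]]
    n_needed = n_cases * controls_per_case
    if n_needed > len(eligible):
        raise ValueError("not enough disease-free people to sample from")
    # random.Random(seed) keeps the draw reproducible for a given seed.
    return random.Random(seed).sample(eligible, n_needed)


# Hypothetical source population: 1,000 people, of whom 10 have the disease.
population = [{"id": i, "diseased": i < 10} for i in range(1000)]
controls = sample_controls(population, n_cases=10, controls_per_case=4, seed=1)
print(len(controls))  # 40 controls for 10 cases (a 1:4 ratio)
```

Because every disease-free member of the source population has the same chance of being chosen, the exposure pattern among the controls should, on average, reflect that of the source population.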
Nonhospitalized Persons as Controls
• These controls may be selected from several sources in the community.
• One option is to select a resident of a defined area, such as the neighborhood in which the case lives. This is a neighborhood control. The most common way to obtain neighborhood controls is by going door-to-door for interviews.
• However, random-digit dialing is more widely used in the U.S. This is a system in which a computer randomly selects and dials telephone numbers.
Nonhospitalized Persons (cont’d)
• Another approach to selecting controls is to use a best friend control.
• A case is asked for the name of a best friend, who may be more likely to participate in the study knowing that his/her friend is also participating.
Hospitalized Persons as Controls
• According to Gordis, “hospital patients are often selected as controls because of the extent to which they are a ‘captive population’ and are clearly
• When using hospital controls, the question is whether to use a sample of all other patients admitted to the hospital or whether to select a specific “other
• If we were to choose specific diagnostic groups, on what basis do we select and exclude groups?
Problems in Control Selection
• As discussed earlier, the way we select our controls can completely skew the findings or even reverse them, so control selection is key in case/control studies.
• Please review the example of coffee drinking and pancreatic cancer in chapter 10, which illustrates this issue.
Matching
• A major concern in case-control studies is that cases and controls may differ in characteristics/exposures other than the one that has been targeted for the study.
• One method for dealing with this problem is to match the cases and controls for factors about which we may be concerned.
• Matching: the process of selecting the controls so that they are similar to the cases in certain characteristics.
• There are two types: group matching and individual matching.
Group Matching
• Also known as frequency matching.
• It consists of selecting the controls in such a way that the proportion of controls with a certain characteristic is identical to the proportion of cases with the same characteristic.
• This requires that all of the cases be selected first, followed by selecting the controls.
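Frequency matching can be sketched as follows. This is an illustration under simple assumptions (one control per case, a single matching characteristic); the function name and the data are hypothetical.

```python
import random
from collections import Counter


def frequency_match(cases, control_pool, key, seed=None):
    """Pick one control per case so that the distribution of `key` among
    the controls equals its distribution among the cases."""
    rng = random.Random(seed)
    # All cases are tallied first, e.g. 6 women and 4 men.
    needed = Counter(case[key] for case in cases)
    chosen = []
    for level, count in needed.items():
        candidates = [p for p in control_pool if p[key] == level]
        if len(candidates) < count:
            raise ValueError(f"too few controls with {key}={level!r}")
        chosen.extend(rng.sample(candidates, count))
    return chosen


# Hypothetical data: 10 cases (6 F, 4 M) and a pool of 100 possible controls.
cases = [{"id": f"case{i}", "sex": "F" if i < 6 else "M"} for i in range(10)]
pool = [{"id": f"ctl{i}", "sex": "F" if i % 2 == 0 else "M"} for i in range(100)]
controls = frequency_match(cases, pool, key="sex", seed=42)
print(Counter(c["sex"] for c in controls))  # 6 F and 4 M, same as the cases
```

The cases have to be tallied before any control is drawn, which is why all of the cases must be selected first.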
Individual Matching
• Also called matched pairs.
• For each case selected, a control is selected who is similar to the case in terms of the specific variable(s) of concern. Therefore, each case is individually matched to a control.
• It is most often used in case-control studies that use hospital controls.
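Individual matching can be sketched in a similar way. The example below assumes, purely for illustration, that we match on sex and on age within 2 years; the function name, the matching variables, and the data are hypothetical, and the simple first-fit rule is just one possible way to pick a match.

```python
def match_pairs(cases, control_pool, max_age_gap=2):
    """Pair each case with one unused control of the same sex whose age is
    within max_age_gap years. A case with no suitable control gets None."""
    available = list(control_pool)
    pairs = []
    for case in cases:
        match = next(
            (c for c in available
             if c["sex"] == case["sex"]
             and abs(c["age"] - case["age"]) <= max_age_gap),
            None,
        )
        if match is not None:
            available.remove(match)  # each control is used at most once
        pairs.append((case, match))
    return pairs


# Hypothetical cases and candidate controls.
cases = [
    {"id": "case1", "sex": "F", "age": 45},
    {"id": "case2", "sex": "M", "age": 60},
]
pool = [
    {"id": "ctl1", "sex": "M", "age": 61},
    {"id": "ctl2", "sex": "F", "age": 52},
    {"id": "ctl3", "sex": "F", "age": 44},
]
for case, control in match_pairs(cases, pool):
    print(case["id"], "->", control["id"] if control else None)
# prints: case1 -> ctl3
#         case2 -> ctl1
```

Every extra matching variable adds another condition that a candidate control has to satisfy, so fewer candidates qualify and some cases may end up with no match at all.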
Problems with Matching:
• Practical Problems: If an attempt is made to match according to too many characteristics, it may prove difficult to identify an appropriate control.
• Example from text: Trying to match each case for race, sex, age, marital status, number of children, zip code, area of residence and occupation.
• The more variables chosen to match, the more difficult it will be to find a suitable control.
Problems with Matching:
• Conceptual Problems: Once controls have been matched to cases according to a given characteristic, we cannot study that characteristic. Why is this?
• When you match according to a certain characteristic, you have artificially established an identical proportion in cases and controls.
• By using matching to impose comparability for a certain factor, we ensure the same prevalence of that factor in the cases and controls.
• Consequently, we do not want to match on any variable that we may wish to explore in our study.
• In conclusion, when carrying out a case-control study, we want to match only on variables that we are convinced are risk factors for the disease.
• Matching on variables other than these is called overmatching.
Limitations in Recall
• In case-control studies, data are often collected by interviewing subjects.
• All humans are limited to some extent in their ability to recall information, which makes recall an issue in case-control studies.
• Recall bias is one of the key biases in case/control studies.
Limitations in Recall
• Recall bias is not the same as “memory bias,” that is, simply being unable to remember the data. It refers to how or why a patient may remember certain data.
• Memory bias is simply not being able to remember an exposure. For example, remembering what you ate over the past 3 years would be difficult.
Recall Bias
• Recall bias, however, arises when cases selectively remember differently than controls. If you ask a mother who gave birth to a baby with a neural tube defect about her lifestyle, she may try very hard to recall it because she really wants to know what caused the illness in her baby.
• She may therefore report exposures that a woman who had a “normal” baby would not “probe into,” because her baby was fine.
• Because recall is then greater in the cases, we may artifactually report a relationship between the exposure and the disease when none really exists.
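The arithmetic behind this artifact can be sketched with made-up numbers. Suppose, purely as an assumption for illustration, that the true exposure prevalence is the same in cases and controls (30%), but cases recall 90% of their true exposures while controls recall only 60%.

```python
def reported_exposure(true_prevalence, recall_rate):
    """Proportion of a group that reports the exposure, assuming a fraction
    recall_rate of the truly exposed remember and report it."""
    return true_prevalence * recall_rate


# Hypothetical values: identical true exposure in both groups.
TRUE_PREVALENCE = 0.30
cases_reported = reported_exposure(TRUE_PREVALENCE, recall_rate=0.90)
controls_reported = reported_exposure(TRUE_PREVALENCE, recall_rate=0.60)
print(f"cases: {cases_reported:.0%}, controls: {controls_reported:.0%}")
# prints: cases: 27%, controls: 18%
```

Even though the true exposure is identical in the two groups, the reported exposure is higher among the cases, so the study would appear to show an association that is not really there.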
When is a Case-Control Study Warranted?
• According to Gordis, “a case-control study is useful as a first step when searching for a cause of an adverse health outcome.”
• Using the case-control design, we compare people with the disease (cases) and people without the disease (controls).
• We can then explore the possible roles of a variety of exposures or characteristics in causing the disease.
When is a Case-Control Study Warranted? (cont’d)
• If the exposure is associated with the disease, we would expect the proportion of cases who have been exposed to be greater than the proportion of controls who have been exposed.
• Case-control studies are less expensive than cohort studies and can be carried out more quickly. This is why they are often the first step in determining whether an exposure is linked to an increased risk of disease.
When is a Case-Control Study Warranted? (cont’d)
• Case-control studies are valuable when the disease being investigated is rare.
• It is often possible to identify cases for study from disease registries, hospital records, or other sources.
• Cohort studies, on the other hand, are not well suited to rare diseases and may involve many years of follow-up while waiting for the rare disease to occur.
Case-Control Studies Based in a Defined Cohort
• This combined study is in effect a hybrid design in which a case-control study is initiated within a cohort study.
• In this type of study, a population is identified and followed over time. At the time the population is identified, baseline data are obtained from records or
• The population is then followed for a period of years.
• For most of the diseases that are studied, a small percentage of study participants manifest the disease, whereas most do not.
Two Types of Cohort-Based Case-Control Studies
• Nested case-control studies: In these studies, the controls are a sample of individuals who are at risk for the disease at the time each case of the disease develops.
• Basically, a nested case-control study is “embedded” in a cohort study. In a cohort study, we follow participants to see who develops the disease and who does not.
• In a nested case/control study, those who develop the disease become our cases and those who have not become the controls. This is a good way to test a new hypothesis.
• We can then ask the cases and controls about their exposures, or use the biological samples that have already been collected in the cohort study, and test for differences between our newly created cases and controls.
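The idea that controls are drawn from those still at risk when each case develops can be sketched as below. The cohort records, field names, and follow-up times are all hypothetical; this is a simplified illustration, not the procedure of any particular study.

```python
import random


def nested_case_control(cohort, controls_per_case=1, seed=None):
    """For each case, draw controls from cohort members who are still under
    follow-up and disease-free at the time that case develops the disease."""
    rng = random.Random(seed)
    cases = sorted(
        (m for m in cohort if m["disease_time"] is not None),
        key=lambda m: m["disease_time"],
    )
    matched_sets = []
    for case in cases:
        t = case["disease_time"]
        at_risk = [
            m for m in cohort
            if m["id"] != case["id"]
            and m["followup_end"] >= t
            and (m["disease_time"] is None or m["disease_time"] > t)
        ]
        k = min(controls_per_case, len(at_risk))
        matched_sets.append((case, rng.sample(at_risk, k)))
    return matched_sets


# Hypothetical cohort; times are years since baseline.
cohort = [
    {"id": 1, "disease_time": 2.0, "followup_end": 2.0},
    {"id": 2, "disease_time": 5.0, "followup_end": 5.0},
    {"id": 3, "disease_time": None, "followup_end": 10.0},
    {"id": 4, "disease_time": None, "followup_end": 3.0},
    {"id": 5, "disease_time": None, "followup_end": 10.0},
]
for case, controls in nested_case_control(cohort, controls_per_case=2, seed=0):
    print(case["id"], sorted(c["id"] for c in controls))
```

In this sketch, member 2 is still disease-free at year 2 and so is eligible as a control for case 1, even though member 2 becomes a case at year 5. Member 4 left follow-up at year 3 and is therefore not eligible as a control for case 2.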
Advantages of Embedding a Case-Control Study in a Defined Cohort
• 1) Recall bias is eliminated because interviews or blood/urine specimens are obtained at the beginning of the study.
• 2) Collection of biological samples was completed before the disease occurred, so we avoid the problem of the disease having “altered” or changed chemicals in the blood.
• 3) It is more economical to conduct. Specimens obtained initially are frozen or otherwise stored, which reduces
• 4) Cases and controls are derived from the same original cohort, so there is likely to be greater comparability between the cases and the controls.
Cross-Sectional Studies
• A cross-sectional study is also used to investigate the etiology of disease.
• In this study, both exposure and disease outcome are determined simultaneously for each subject, as if we were viewing a snapshot of the population at a certain point in time.
• Often a cross-sectional study is referred to as a snapshot in time.
• The cases of disease identified in this study are prevalent cases, because we know that they existed at the time of the study but do not know their duration. For this reason, this design is also known as a prevalence study.
Design of a Cross-Sectional Study
• Figure 10-14, on the next slide, displays the design of a cross-sectional study.
• We define a population and determine the presence or absence of exposure and the presence or absence of disease for each individual.
• Each subject can then be categorized into one of four possible subgroups.
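The four-way categorization can be sketched as a simple tally. The survey records below are hypothetical and serve only to show the bookkeeping.

```python
from collections import Counter


def cross_tabulate(subjects):
    """Count subjects in each of the four exposure-by-disease subgroups."""
    counts = Counter((s["exposed"], s["diseased"]) for s in subjects)
    return {
        "exposed, diseased": counts[(True, True)],
        "exposed, not diseased": counts[(True, False)],
        "not exposed, diseased": counts[(False, True)],
        "not exposed, not diseased": counts[(False, False)],
    }


# Hypothetical survey responses, for illustration only.
survey = [
    {"exposed": True, "diseased": True},
    {"exposed": True, "diseased": False},
    {"exposed": True, "diseased": False},
    {"exposed": False, "diseased": True},
    {"exposed": False, "diseased": False},
    {"exposed": False, "diseased": False},
]
print(cross_tabulate(survey))
```

Exposure and disease are both read from the same record, which reflects the fact that a cross-sectional study measures them at the same point in time.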
Advantages of cross-sectional studies
• Cross-sectional studies are important for public health.
• By examining exposures and diseases in a population, we can estimate, for example, how many hospital beds we will need or whether we should implement a nutrition and physical activity campaign.
• Two very popular cross-sectional studies are NHANES (National Health and Nutrition Examination Survey) and the California Health Interview Survey (CHIS).