Case-Control Studies and Other Study Designs

HESC 401 Epidemiology

Lecture 10

Objectives

• Describe the design of a case-control study

• Explain the differences between case-control and cohort studies

• Explain how cases and controls are selected and the different sources used for selection; in particular, know the importance of control selection

• Define and describe matching and its different types, along with the problems associated with matching

• Explain the advantages of embedding a case-control study in a defined cohort, also known as a nested case-control study

• Define recall bias and explain how it affects a study

• Explain and describe the design of a cross-sectional study

Study designs so far…

• So far, we have examined randomized clinical trials (RCTs) and cohort studies.

• In summary, RCTs use an experimental design: subjects are randomly assigned to an intervention group that receives some sort of treatment/intervention or to a comparison group (that receives no treatment), and both groups are then followed to see whether disease occurs.

• In a cohort study, we begin with the exposure (for example, smoking, diet, or physical activity) and then follow those who are exposed (smokers) and unexposed (non-smokers) to see whether disease occurs.

• There is no intervention or treatment in a cohort study.

• However, in both study designs we can test whether the exposure is associated with the disease.

Study designs so far…

• Sometimes, however, it is not feasible to begin with the exposure and then follow participants for 5 to 20 years to see if a disease develops.

• Other times, a clinician/researcher may have a group of diseased individuals and want to know what the patients were exposed to, over their lifetime or the past few years, that might have caused the disease. They would compare the diseased group’s past exposure with the past exposure of a non-diseased group (the control group).

• This is a case-control study.

Design of a Case-Control Study

• In a case-control study, we want to examine the possible relation of an exposure to a certain disease (Fig. 10-1).

• There are two groups in this study that we want to compare:

• Cases: people with the disease

• Controls: people without the disease

• We want to determine:

• What proportion of cases were exposed and what proportion were not

• What proportion of controls were exposed and what proportion were not

In the diagram above, in a case/control study, we begin with the diseased participants (cases) and those without disease (controls) and collect information on their past exposure to see if those with the disease have higher exposure than those without the disease.

Example of a case-control study

• We want to test whether second-hand smoke is associated with asthma.

• We would identify a group of individuals with asthma.

• We would also need to identify a group of people without asthma.

• Then we would collect information on the exposure (second-hand smoke) and see whether those with asthma and those without asthma were exposed to second-hand smoke.

• If there is an association between second-hand smoke and asthma, a history of second-hand smoke exposure should be more common in those with asthma than in those without asthma.

Measuring past exposure

• How can you measure past exposure?

• There are several ways. Generally you can use a questionnaire and ask the participants about their past exposure, or review past medical records (e.g., if you were interested in whether BMI is associated with disease, you could use the height and weight recorded in medical records).

• One can also conduct a biologic assay to test for chemicals in the blood.

First Select:

                        Cases             Controls
                        (With Disease)    (Without Disease)

Then Measure Past Exposure:

Were exposed            a                 b
Were not exposed        c                 d
Total                   a + c             b + d
Proportions exposed     a/(a + c)         b/(b + d)

• The 2 x 2 table above (10-1) represents how a case/control study is conducted and, importantly, how to measure whether the exposure is associated with increased odds of having the disease.

• First we select the cases and controls. Then we interview them about their past exposure. a + c represents all the cases. To measure the proportion of cases who were exposed, divide a by the total number of cases: a/(a + c). b + d represents all those without disease (the controls). To measure the proportion of controls who were exposed, divide b by the total number of controls: b/(b + d).

• If exposure is associated with disease, we would expect the proportion of the cases who were exposed, a/(a + c), to be greater than the proportion of the controls who were exposed, b/(b + d).

                            CHD Cases    Controls
Smoke cigarettes               112          176
Do not smoke cigarettes         88          224
Total                          200          400
% Smoking cigarettes          56.0         44.0

• Gordis provides an excellent example of a case/control study above (table 10-2). In this study, we want to assess whether cigarette smoking is associated with coronary heart disease (CHD).

• First we find a group with CHD (cases) and a group without CHD (controls).

• Then we interview them about their past smoking history.

• If we hypothesize that there is a relationship between smoking and CHD, then we would expect the proportion of smokers to be higher in those who have CHD than in those who do not.

• From the table above, the proportion of smokers was higher in the CHD group (112/200 = 56%) than in the controls (176/400 = 44%).

• This is only the first step; a more concrete way of assessing higher odds of having the disease will be discussed in chapters 11 and 12.

• Based on the table above, if I provided you with the numbers for boxes a and b and gave you the totals, you should be able to figure out what belongs in the other boxes/cells.
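The arithmetic in the last bullet can be checked with a minimal Python sketch. The function name `complete_two_by_two` is made up for this illustration; the numbers are those of the CHD table above, and the cell labels a, b, c, d follow the 2 x 2 table.

```python
def complete_two_by_two(a, b, total_cases, total_controls):
    """Fill in a case-control 2 x 2 table from cells a and b plus the column totals.

    a = exposed cases, b = exposed controls (same labels as the 2 x 2 table).
    """
    c = total_cases - a       # cases who were not exposed
    d = total_controls - b    # controls who were not exposed
    return {
        "a": a, "b": b, "c": c, "d": d,
        "prop_cases_exposed": a / (a + c),
        "prop_controls_exposed": b / (b + d),
    }


# CHD and smoking: 112 of 200 cases and 176 of 400 controls smoke.
table = complete_two_by_two(a=112, b=176, total_cases=200, total_controls=400)
print(table["c"], table["d"])                      # 88 224
print(f"{table['prop_cases_exposed']:.1%}")        # 56.0%
print(f"{table['prop_controls_exposed']:.1%}")     # 44.0%
```

A higher proportion exposed among cases than among controls is only a first indication of an association, as noted above.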

Differences between Case-Control Study and Cohort Study

• According to Gordis, “the hallmark of the case-control study is that it begins with people with the disease (cases) and compares them to people without the disease (controls).”

• This differs from a cohort study, which starts with a group of exposed people and compares them to a group of non-exposed people.

Differences between Case-Control Study and Cohort Study (cont’d)

• People think that the main difference between the two studies is that cohort studies go forward in time and case-control studies go backward in time.

• This incorrectly implies that calendar time is what distinguishes these two studies from each other.

• The previous chapter explained that a retrospective cohort study also uses data from the past, which means that calendar time is not the characteristic that distinguishes cohort from case-control studies.

Selection of Cases

• There are many different sources from which cases can be selected: hospital patients, patients in physicians’ practices, or clinic patients.

• Patient registries from communities can serve as sources for cases as well.

• Problems can arise when selecting cases.

• If the cases come from one source, any risk factors that are identified may be unique to that source only. Ideally, therefore, cases should come from several different sources, including hospitals in different areas, clinics, and registries.
How many controls can you have?

• Many case-control studies have a ratio of 1:1, meaning one case to one control.

• However, if a disease is rare there may not be enough cases, so the sample size will be small and we may not be able to detect a relationship between our exposure and disease.

• In order to detect an association if it is present, investigators increase their sample by increasing the number of controls.

• The highest ratio generally used for a case-control study is 1:4, meaning 4 controls for every case; beyond that, adding more controls yields little additional statistical power.

Famous example of case/control study

• One of the most famous examples of a case/control study was conducted by Sir Richard Doll and Bradford Hill.

• Doll and Hill were among the first to hypothesize that smoking and lung cancer are associated. They conducted a case-control study to show this relationship.

• For his work, Richard Doll was knighted by the Queen of England. He also stopped smoking after he reviewed his own study’s conclusions.

Sir Richard Doll

Selection of Cases

• Ideally, an investigator identifies and enrolls all incident cases in a defined population in a specified time period.

• One can select cases from registries, hospitals, or clinics.

• When all incident cases in a population are included, the cases are representative of that population and the findings are more generalizable; otherwise there is potential for bias.

• The most common bias is referral bias. For example, suppose we are studying lung cancer and the cases are selected from a hospital to which patients are admitted or referred for gastrointestinal (GI) symptoms. When we look at exposures, caffeine use and antacid use may be more commonly reported among the cases, because patients with GI symptoms often have these exposures. It may therefore appear that caffeine use is related to lung cancer, but this is probably a result of the referral pattern.

• There is a similar example of pancreatic cancer and caffeine use in the text.

Incident or Prevalent Cases

• When we are conducting a case/control study, how do we know whether to use incident (newly diagnosed) cases or prevalent (existing) cases?

• For example, should we use women who have just been diagnosed with breast cancer (incident cases) or women who were diagnosed within the past 5 years (prevalent cases)?

Incident or Prevalent Cases

• Using incident cases may be a problem because we must wait for new cases to be diagnosed.

• Prevalent cases are people who are already diagnosed and have been “living” with the disease for a while, which offers us a larger number of cases for study.

Incident or Prevalent Cases

• Between the two, it is preferable to use incident cases in case-control studies of disease etiology. If we use prevalent cases, those who die soon after diagnosis are excluded from the study, and if the exposures of those who died differ from the exposures of the prevalent cases, our findings on the exposure and disease are skewed.

• This is a type of “survivorship” bias, because those who survived and are prevalent cases could be different from those who died soon after diagnosis.

Selection of controls

• One of the most important factors in whether one finds the most “accurate” relationship between the exposure and outcome in a case/control study is how the controls are selected.

• It is critical that the exposure in the controls is representative of the exposure in the general population from which the cases were selected.

• See example on next page.

Selection of controls

• For example, let’s say we are interested in looking at alcohol use and liver problems in CSUF students. We select our cases from the health center and then decide that it would be convenient to select our controls from a fraternity. What is the problem with this control selection?

• When we ask about the exposure (alcohol use), it is probably higher in the fraternity group than in the general student body, and it may not be as high among the cases attending the health center. It may therefore appear that alcohol use is protective against liver disease, because we erroneously selected controls (disease-free individuals) who may not represent the “general” CSUF population.

Population-Based Controls

• The best control group is a random sample of individuals from the same source population as the cases who have not developed the disease.

• Population-based controls are the best way to ensure that the distribution of exposure among the controls is representative of the general population.

Nonhospitalized Persons as Controls

• These controls may be selected from several sources in the community.

• One option is to select a resident of a defined area, such as the neighborhood in which the case lives. This is a neighborhood control. The most common way to obtain neighborhood controls is by going door-to-door for interviews.

• However, random-digit dialing is more widely used in the U.S. This is a system where a computer will randomly select and dial telephone numbers.

Nonhospitalized Persons (cont’d)

• Another approach to select controls is to use a best friend control.

• A case is asked for the name of a best friend who is more likely to participate in the study knowing that his/her friend is also participating.

Hospitalized Persons as Controls

• According to Gordis, “hospital patients are often selected as controls because of the extent to which they are a ‘captive population’ and are clearly

• When using hospital controls, it is questioned whether to use a sample of all other patients admitted to the hospital or whether to select a specific “other

• If we were to choose specific diagnostic groups, on what basis do we select and exclude groups?

Problems in Control Selection

• As discussed earlier, the way we select our controls can completely skew the findings or even produce the opposite result, so control selection is key for case/control studies.

• Please review the example of coffee drinking and pancreatic cancer in chapter 10, which exemplifies this issue.

Matching

• A major concern in case-control studies is that cases and controls may differ in characteristics/exposures other than the one that has been targeted for the study.

• One method for dealing with this problem is to match the cases and controls for factors about which we may be concerned.

• Matching: the process of selecting the controls so that they are similar to the cases in certain characteristics.

• There are two types: group matching and individual matching.

Group Matching

• Also known as frequency matching.

• It consists of selecting the controls in such a way that the proportion of controls with a certain characteristic is identical to the proportion of cases with the same characteristic.

• This requires that all of the cases be selected first, followed by selecting the controls.
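Group matching can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `frequency_match`, the `married` characteristic, and the data are all hypothetical, and a real study would match on the characteristics it is actually concerned about.

```python
import random
from collections import Counter


def frequency_match(cases, candidates, key, controls_per_case=1, seed=0):
    """Draw controls so that each level of `key` has the same share as in the cases."""
    rng = random.Random(seed)
    # All cases must already be selected so their distribution is known.
    case_counts = Counter(person[key] for person in cases)
    controls = []
    for level, n_cases in case_counts.items():
        pool = [p for p in candidates if p[key] == level]
        needed = n_cases * controls_per_case
        if len(pool) < needed:
            raise ValueError(f"not enough candidate controls with {key}={level!r}")
        controls.extend(rng.sample(pool, needed))
    return controls


# Hypothetical data: 5 of the 20 cases (25%) are married.
cases = [{"id": i, "married": i < 5} for i in range(20)]
candidates = [{"id": 100 + i, "married": i % 2 == 0} for i in range(200)]

controls = frequency_match(cases, candidates, key="married")
share_married = sum(c["married"] for c in controls) / len(controls)
print(share_married)  # 0.25
```

Half of the candidate pool is married, but the selected controls reproduce the 25% seen among the cases.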

Individual Matching

• Also called matched pairs.

• For each case selected, a control is selected who is similar to the case in terms of the specific variable(s) of concern. Therefore, each case is individually matched to a control.

• It is most often used in case-control studies that use hospital controls.
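Individual matching can be sketched in the same way. This is an illustrative sketch only: the function `match_pairs`, the 5-year age window, and the records are assumptions, and the simple first-come pairing used here is not necessarily how a real study would assign controls.

```python
def match_pairs(cases, candidates, same_on=("sex",), age_window=5):
    """Pair each case with one unused control of the same sex and similar age."""
    used = set()
    pairs = []
    for case in cases:
        for control in candidates:
            if control["id"] in used:
                continue
            same = all(control[k] == case[k] for k in same_on)
            close = abs(control["age"] - case["age"]) <= age_window
            if same and close:
                used.add(control["id"])
                pairs.append((case["id"], control["id"]))
                break
        else:
            # No candidate satisfied every matching criterion.
            pairs.append((case["id"], None))
    return pairs


# Hypothetical cases and candidate controls.
cases = [
    {"id": "case1", "sex": "F", "age": 45},
    {"id": "case2", "sex": "M", "age": 60},
    {"id": "case3", "sex": "F", "age": 82},
]
candidates = [
    {"id": "ctl1", "sex": "M", "age": 58},
    {"id": "ctl2", "sex": "F", "age": 47},
    {"id": "ctl3", "sex": "F", "age": 30},
]
print(match_pairs(cases, candidates))
# [('case1', 'ctl2'), ('case2', 'ctl1'), ('case3', None)]
```

The unmatched third case shows how a suitable control can be hard to find, and adding more matching variables makes this more likely.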

Problems with Matching: Practical Problems

• Practical Problems: If an attempt is made to match according to too many characteristics, it may prove difficult to identify an appropriate control.

• Example from text: Trying to match each case for race, sex, age, marital status, number of children, zip code, area of residence and occupation.

• The more variables chosen to match, the more difficult it will be to find a suitable control.

Problems with Matching: Conceptual Problems

• Once controls have been matched to cases according to a given characteristic, we cannot study that characteristic. Why is this?

• When you match according to a certain characteristic, you have artificially established an identical proportion in cases and controls.

• By using matching to impose comparability for a certain factor, we ensure the same prevalence of that factor in the cases and controls.

• Consequently, we do not want to match on any variable that we may wish to explore in our study.

Matching (cont’d)

• To conclude our discussion of matching: in carrying out a case-control study, we want to match only on variables that we are convinced are risk factors for the disease.

• Matching on variables other than these is called overmatching.

Limitations in Recall

• In case-control studies, data are often collected by interviewing subjects.

• All humans are limited to some extent in their ability to recall information, which makes this an issue in case-control studies.

• Recall bias is one of the key biases in case/control studies.

Limitations in Recall

• Recall bias is not simply “memory bias” (difficulty remembering the data); it refers to how or why a patient may remember certain data.

• Memory bias is simply not being able to remember an exposure. For example, remembering what you ate over the past 3 years would be difficult.

Recall Bias

• Recall bias, however, occurs when cases selectively remember differently than controls. If you ask a mother who gave birth to a baby with a neural tube defect about her lifestyle, she may try very hard to recall what her lifestyle was, because she really wants to know what caused the illness in her baby.

• She may therefore report exposures that a woman who had a “normal” baby would not “probe into” herself, because her baby was fine.

• Because recall is then more complete in the cases, we may artifactually report a relationship between the exposure and disease when it really isn’t there.
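A small numeric sketch in Python shows how differential recall alone can create an apparent association. All numbers here are hypothetical assumptions chosen for illustration: the true exposure is set to 20% in both groups, cases are assumed to report 90% of their true exposures, and controls only 60%.

```python
def reported_proportion(true_proportion, recall_probability):
    """Proportion reported as exposed when only some true exposures are recalled."""
    return true_proportion * recall_probability


true_exposure = 0.20  # assumed identical in cases and controls (no real association)
cases_reported = reported_proportion(true_exposure, recall_probability=0.90)
controls_reported = reported_proportion(true_exposure, recall_probability=0.60)

print(round(cases_reported, 2), round(controls_reported, 2))  # 0.18 0.12
```

Even though the true exposure is the same, the reported exposure is higher among the cases, which would look like an association between the exposure and the disease.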

When is a Case-Control Study Warranted?

• According to Gordis, “a case-control study is useful as a first step when searching for a cause of an adverse health outcome.”

• Using the case-control design, we compare people with the disease (cases) and people without the disease (controls).

• We can then explore the possible roles of a variety of exposures or characteristics in causing the disease.

When is a Case-Control Study Warranted? (cont’d)

• If the exposure is associated with the disease, we would expect the proportion of cases who have been exposed to be greater than the proportion of controls who have been exposed.

• Case-control studies are less expensive than cohort studies and can be carried out more quickly. This is why they are often the first step in determining whether an exposure is linked to an increased risk of disease.

When is a Case-Control Study Warranted? (cont’d)

• Case-control studies are valuable when the disease being investigated is rare.

• It is often possible to identify cases for study from disease registries, hospital records, or other sources.

• Cohort studies, on the other hand, are not well suited to rare diseases and may involve many years of follow-up while waiting for the rare disease to occur.

Case-Control Studies Based in a Defined Cohort

• This combined study is in effect a hybrid design in which a case-control study is initiated within a cohort study.

• In this type of study, a population is identified and followed over time. At the time the population is identified, baseline data are obtained from records or

• The population is then followed for a period of years.

• For most of the diseases that are studied, a small percentage of study participants manifest the disease, whereas most do not.

Two Types of Cohort-Based Case-Control Studies

• Nested case-control studies: In these studies, the controls are a sample of individuals who are at risk for the disease at the time each case of the disease develops.

• Basically, a nested case-control study is “embedded” in a cohort study. In a cohort study, we follow participants to see who develops the disease and who does not.

• In a nested case/control study, those who develop the disease become our cases and those who have not become the controls. This is a good way to test a new hypothesis.

• We can then ask the cases and controls about their exposures, or use the biological samples that have already been collected in the cohort study, and test for differences between our newly created cases and controls.
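The idea that controls are sampled from those still at risk when each case develops can be sketched in Python. This is a minimal sketch, not a definitive implementation: the function `sample_risk_sets`, the field names `diagnosed_at` and `followed_until` (in years since baseline), and the five-person cohort are all hypothetical.

```python
import random


def sample_risk_sets(cohort, controls_per_case=1, seed=0):
    """For each case, draw controls from cohort members who are still at risk
    (disease-free and under follow-up) at the time the case is diagnosed."""
    rng = random.Random(seed)
    cases = sorted(
        (p for p in cohort if p["diagnosed_at"] is not None),
        key=lambda p: p["diagnosed_at"],
    )
    risk_sets = []
    for case in cases:
        t = case["diagnosed_at"]
        at_risk = [
            p for p in cohort
            if p["id"] != case["id"]
            and p["followed_until"] >= t
            and (p["diagnosed_at"] is None or p["diagnosed_at"] > t)
        ]
        k = min(controls_per_case, len(at_risk))
        chosen = rng.sample(at_risk, k)
        risk_sets.append((case["id"], [p["id"] for p in chosen]))
    return risk_sets


# Hypothetical cohort; times are years since baseline.
cohort = [
    {"id": 1, "diagnosed_at": 3, "followed_until": 3},
    {"id": 2, "diagnosed_at": None, "followed_until": 10},
    {"id": 3, "diagnosed_at": 7, "followed_until": 7},
    {"id": 4, "diagnosed_at": None, "followed_until": 5},
    {"id": 5, "diagnosed_at": None, "followed_until": 10},
]
for case_id, control_ids in sample_risk_sets(cohort, controls_per_case=2):
    print(case_id, control_ids)
```

In this sketch, participant 3 is still disease-free at year 3 and so can be drawn as a control for case 1, even though participant 3 later becomes a case at year 7.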

Advantages of Embedding a Case-Control Study in a Defined Cohort

• 1) Recall bias is eliminated because interviews or blood/urine specimens are obtained at the beginning of the study.

• 2) Collection of biological samples was completed before the disease occurred, so we avoid the question of whether the disease may have “altered” or changed chemicals in the blood.

• 3) It is more economical to conduct. Specimens obtained initially are frozen or otherwise stored, which reduces

• 4) Cases and controls are derived from the same original cohort, so there is likely to be greater comparability between the cases and the controls.

Cross-Sectional Studies

• This study design is also used to investigate the etiology of disease.

• In this study, both exposure and disease outcome are determined simultaneously for each subject, as if we were viewing a snapshot of the population at a certain point in time.

• Often a cross-sectional study is referred to as a snapshot in time.

• The cases of disease that are identified in this study are prevalent cases, because we know that they existed at the time of the study but do not know their duration. For this reason, this design is also known as a prevalence study.

Design of a Cross-Sectional Study

• Figure 10-14 displays the design of a cross-sectional study on the next slide.

• We define a population and determine the presence or absence of exposure and the presence or absence of disease for each individual.

• Each subject can then be categorized into one of four possible subgroups.
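The four subgroups can be tallied with a short Python sketch. The function name `cross_tabulate`, the field names, and the survey counts are hypothetical; comparing the prevalence of disease in the exposed and unexposed is one common way to summarize such a table.

```python
from collections import Counter


def cross_tabulate(subjects):
    """Count subjects in the four exposure-by-disease subgroups and compare
    the prevalence of disease in exposed and unexposed subjects."""
    counts = Counter((s["exposed"], s["diseased"]) for s in subjects)
    a = counts[(True, True)]      # exposed, with disease
    b = counts[(True, False)]     # exposed, without disease
    c = counts[(False, True)]     # not exposed, with disease
    d = counts[(False, False)]    # not exposed, without disease
    return {
        "a": a, "b": b, "c": c, "d": d,
        "prevalence_exposed": a / (a + b),
        "prevalence_unexposed": c / (c + d),
    }


# Hypothetical survey of 300 people, each measured once for exposure and disease.
subjects = (
    [{"exposed": True, "diseased": True}] * 30
    + [{"exposed": True, "diseased": False}] * 70
    + [{"exposed": False, "diseased": True}] * 20
    + [{"exposed": False, "diseased": False}] * 180
)
result = cross_tabulate(subjects)
print(result["prevalence_exposed"], result["prevalence_unexposed"])  # 0.3 0.1
```

Because exposure and disease are measured at the same moment, a difference in prevalence like this one does not show which came first.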

Advantages of cross-sectional studies

• Cross-sectional studies are important for public health planning.

• By examining exposures and diseases in a population, we can estimate, for example, how many hospital beds we will need or whether we should implement a nutrition and physical activity campaign.

• Some very popular cross-sectional studies are NHANES (National Health and Nutrition Examination Survey) and the California Health Interview Survey (CHIS).