Considerations of Study Design
Department of Medicine, Olive View–UCLA Medical Center, Sylmar, California
Correspondence: Ronald L. Koretz, MD, Department of Medicine, Olive View–UCLA Medical Center, 14445 Olive View Drive, Sylmar, CA 91342. Electronic mail may be sent to rkoretz{at}ladhs.org.
Research projects attempt to answer specific questions. The particular study design that is selected will depend in large measure on the nature of the question and the time and resources available. There are 5 common categories of clinical questions; they relate to etiology, prognosis, utility of diagnostic tests, efficacy of proposed interventions, and cost of treatment in specific disease states. A number of study designs can be used. Case reports serve to memorialize unusual or novel aspects of diseases. Retrospective case series are useful for defining natural history. Case-control studies are used by epidemiologists to elucidate potential etiologies of diseases. Prospective cohort studies can be used to assess natural history or to assess potential disease etiologies. Controlled trials are designed to assess the efficacy of therapeutic interventions. Studies that define the sensitivity and specificity of diagnostic tests can be used to assess the utility of those tests. Economic analyses estimate the costs that particular diseases or therapies will require. Each of these study designs has limitations; with the exception of high-quality randomized trials, none of these study designs can establish a causative relationship between putative etiologic (or therapeutic) factors and disease (outcomes).
Research: critical and exhaustive investigation or experimentation having for its aim the discovery of new facts and their correct interpretation, the revision of accepted conclusions, theories, or laws in the light of newly discovered facts, or the practical applications of such new or revised conclusions, theories, or laws. (Webster's Third International Dictionary)
Any particular research project is undertaken with the aim of addressing a specific question. Within the area of clinical medicine, there are a multitude of questions that could be posed. The type of study (ie, the study design) that is used depends on the nature of the question. If one were to ask what kind of research is the most important in clinical medicine, it is likely that the most popular answer would be "the randomized, controlled trial." However, it is important to understand that many questions do not require the performance of randomized trials. The objective of the material that follows is to describe the different types of commonly used research designs and to put each design into context. For our purposes, we will restrict our discussion to clinical research, that is, projects that assess humans and do not require sophisticated laboratory methodology. Five categories of questions are commonly asked and are enumerated in Table 1.
It is important to realize that practical constraints will limit the performance of research. Although it might be desirable always to enroll large numbers of patients and, in the case of assessing interventions, only to undertake randomized trials, the cost and time that would be required are frequently beyond the capability of the investigator. For example, if the investigator is a trainee in a cardiology program who is assigned a 6-month period to undertake a research project, asking a question that takes decades to answer, such as the effect of statins on survival, would be inappropriate. On the other hand, a retrospective chart review, perhaps comparing the outcomes in those who did or did not receive the drug, could be accomplished in the time allotted (even if the data could, at best, only demonstrate an association). Even if a junior faculty member of that department (who had 30 years to do research) wanted to ask the former question, such a trial would require the enrollment of hundreds if not thousands of patients. The financial investment that would be required would very likely be beyond his or her resources. Furthermore, even if that faculty person obtained the funding, undertaking a study that would result in 1 publication some 20 years later would not allow academic promotion. In fact, the "publish or perish" mentality has had the unintended consequences of both a growth of the medical literature (with a subsequent requirement for more and more papers from investigators) and an increased pressure for the investigators to perform research and write papers in shorter time periods.
Thus, when reading the medical literature, one encounters a variety of different types of study designs. The varieties we will consider are enumerated in Table 2. (Although some evidence-based medicine experts have categorized these trials as observational or experimental, depending on the absence or presence of an intervention, we will not use this classification.)
Each of these techniques has distinct advantages and disadvantages that we will now consider.
This is the simplest study design we will consider. The investigators, who are often practicing clinicians, describe unusual or thought-provoking aspects about one (or a few) case(s) that they saw. The issue could be a rare disease, an unusual presentation of a more common disease, a proposed relationship of a disease to an exposure, a previously unrecognized complication (of a disease or a therapy), or even a new therapeutic approach. Ideally, the authors of the paper review the past medical literature to see what has been previously reported, then put their observation into perspective. However, because only 1 patient (or a few patients) is described, no conclusion can be made about the absolute frequency of the event because there is no denominator. The value of this type of report is the memorialization of the event so that, should another clinician see a similar thing, he or she could take comfort that this has been seen before and perhaps avoid a lengthy evaluation assessing other issues. With regard to reports of new therapies, the case report can serve as the basis of a therapeutic hypothesis. At times, such reports are even of historic importance; for example, the first publication about acquired immunodeficiency syndrome was a description of 4 afflicted patients.1
These reports typically come from academic institutions or other centers where records of patients are systematically stored. This methodology is typically used to describe the natural history of a disease. In such studies, all of the patients with the disease of interest who were seen at that center are identified. Their records are then reviewed and various outcomes are ascertained. By definition, all of the events have already happened; we refer to such looks into the past as "retrospective" studies. The advantage to this approach is that a denominator is available. Thus, the investigator can calculate an incidence rate for the outcome (whether it is favorable or unfavorable). These types of studies are relatively easy to do; the investigator simply reviews a number (often a large number) of records. There are limitations to these reports. First, special centers may do more tests and thus make diagnoses, particularly of asymptomatic conditions, more frequently (ascertainment bias). More important, there is commonly an element of referral bias. It is usually unknown why each particular patient came to the institution; if the center is recognized as being staffed by experts in a particular disease, patients with that disease who are seen may represent those with more complicated disease. Thus, the incidences of observed complications may be exaggerated (especially if some of the patients specifically came to the center because they had the complication). An example of this latter phenomenon exists in the hepatitis C literature. The rate of development of decompensated liver disease in patients with hepatitis C seen in tertiary referral centers is 5–20 times higher2 than the rate observed in cohorts identified at the time of infection.3 This methodology has also been used to support the use of therapeutic interventions. However, this is a more problematic application of the technique. Various biases can create problems in the interpretation of such reports. 
The reason (or reasons) for treating these patients is rarely provided. In centers of excellence, the outcomes from treatment would be expected to be better than what could be achieved by healthcare workers with less expertise ("guru" bias). Because the observations were made by patients or healthcare personnel who were fully aware of all of the clinical details, their own biases may have had an influence. Publication bias (the preferential reporting of favorable or dramatic effects) may also result in an intervention appearing to be more effective than it really is. This is the problem of validation; even if these results are true at 1 center, it is unknown (because of the reasons just cited) if the data can be extrapolated to other centers. Finally, these reports do not contain control groups; thus, the background rates of the outcomes in question in untreated patients are unknown. (We will consider reports that included untreated "controls" in the section on controlled trials.)
The case-control methodology was originally used by epidemiologists to identify etiologies of disease. The process involves identifying a group of patients with a particular disease and then matching them (usually at least by gender and age) to controls who do not have the disease. Records are then reviewed, looking for factors that are suspected as being important in causing the disease. Factors that do appear more often in the disease group are potential causes; factors that appear less often can be classified as potential protectors. This type of study is also retrospective in nature; all of the important events have already occurred. Note that a case-control study can establish only an association between the factor and the disease. Association is not the same thing as causation. Two factors, A and B, may be associated because A caused B (in which case the association is causative). However, they would also be associated if B caused A or (perhaps more commonly) if some other factor, such as C, caused both A and B. This has been a common source of confusion in the nutrition support literature. Numerous (not limited to case-control) studies have observed that malnourished patients with a certain disease have poorer clinical outcomes than do nourished patients with the same diagnosis. However, the randomized trials of nutrition support vs no nutrition support have generally found that improving the nutrition parameters did not improve the clinical outcomes,4 suggesting that the association is not causative. When one is considering etiology of disease, the demonstration of association is sufficient to stimulate further research into the issue. A classic example of this situation is the putative relation between smoking and lung cancer. The conclusion that smoking causes cancer is based on observations of association; it would be unethical to randomize patients to smoking or no smoking to prove that smoking does cause cancer.
Another helpful use of case-control methodology is in detecting rare events, particularly related to therapeutic interventions. In order to demonstrate such rare events prospectively, thousands of subjects would have to be followed. Case-control studies are less suited to draw conclusions about the efficacy of therapeutic interventions. One example of this was a report that associated dying of distal colorectal cancer with the failure to have a screening sigmoidoscopy in the preceding 10 years.5 This paper highlights one of the important flaws of the case-control study design. All of the relevant factors have to be considered (even if we rarely know what all of those factors are). In the sigmoidoscopy study, all of those who died of colorectal carcinoma, but none of the controls, had colon cancer. The major relationship that this study demonstrated was that dying of cancer was strongly associated with having cancer. Another example of the inability of case-control methodology to predict therapeutic efficacy was the reported association between hormone replacement therapy and favorable cardiac outcomes in women; it took a large randomized trial, the Women's Health Initiative, to demonstrate that such treatment actually caused harm.6
Investigators undertaking cohort studies identify a group of patients at a point in time and then follow them forward (into the future, or "prospectively"). Such studies can be used to define the natural history of a disease process. In these instances, the patients all have a particular condition and no control group is required. More commonly, cohort studies are used to establish associations between a particular preexisting condition (risk factor) and a subsequent event (ie, looking for potential etiologies of disease). In this methodology, the group to be followed is divided into those who did or did not have the condition (or the population is divided into percentiles, depending on the severity of the condition). Over time, the incidences of the event in question are assessed in the different groups. The major advantage of the prospective design is that decisions can be made at the start of the study regarding what data will be subsequently collected. However, because the initial condition was a state that was out of the control of the investigator, any differences between the groups can only be considered to be associative; this type of study is incapable of establishing causation. Cohort studies can be very large. The Framingham Heart Study began in the late 1940s and included over 5000 inhabitants of Framingham, MA. It initially focused on lifestyles and the subsequent risk of cardiovascular disease. Two Nurses' Health Studies (the first beginning in 1976 and the second one in 1989) prospectively followed thousands of nurses and generated data regarding risk factors for major chronic diseases. A "cross-sectional" study resembles both the cohort study design and the case-control study design. However, neither a retrospective record search nor a prospective follow-up is undertaken. Rather, the population is divided into groups depending on the presence or absence of a disease, and the prevalence of putative risk factors at that time is determined in each group.
This design is typically used to demonstrate the efficacy of a particular intervention. "Untreated" controls are compared with individuals who received the treatment of interest; such control subjects can be identified in several ways. However, the important distinction is whether or not the controls were randomized. Nonrandomized controls may be retrospectively identified ("historical" controls) or simply may be patients who, for some reason, were not treated and were followed prospectively. In principle, the purpose of the control is to identify a subject who differed from the treated patient solely on the basis of not having received the treatment. If that variable only were to be changed, any subsequent difference in outcome could be attributed to this single change (and thus a causative relationship between the intervention and the outcome can be established). In a randomized trial, a homogeneous group of patients eligible for the experimental therapy is identified and then each one of them is, by chance, either given or not given that treatment. Unless the randomization scheme breaks down, only 1 variable is changed. The different demographic features are equally likely to be randomized into one arm as into the other. To prove that such a balanced distribution of the demographic features did occur, reports of randomized trials typically present the characteristics of the 2 groups in an introductory table. A nonrandomized control patient (regardless of whether he or she is retrospectively or prospectively identified) must be different from the treated subject in at least 2 aspects: nonreceipt of the therapy and reason why the treatment was not provided. Regardless of what that reason or reasons was, there are now at least 2 variables that are different between the 2 groups. Any observed difference in outcome between the groups cannot automatically be ascribed to the provision or nonprovision of the treatment. 
As a consequence, if a difference in outcome between the 2 groups is found, that difference can only be considered as being associated with the treatment. A causative relationship cannot be inferred. In other words, such data can be used to create the hypothesis that the treatment may have an effect, but they cannot prove it. We refer to the baseline differences between the treated and control groups as demographic confounders. Randomization eliminates this confounding bias. However, randomization alone does not eliminate all bias. These other biases are summarized in Table 3. Randomized trials can be graded by the degree to which the investigators have used methodologic rigor to avoid bias; trials that are more rigorously designed and executed are referred to as being of "high quality." Empiric observations have suggested that, when the same question is addressed, high-quality trials usually demonstrate a smaller treatment effect.7,8 The enhanced effect in the lower-quality trials is attributed to the presence of bias, not to a true effect of the intervention.
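The argument about nonrandomized controls can be illustrated with a small simulation. The sketch below is not drawn from any real trial: the function name `simulate_trial`, the mortality rates (40% for severe disease, 10% otherwise), and the treatment probabilities are all hypothetical values chosen for illustration. The treatment is given no true effect at all; the only question is whether the comparison of treated and untreated patients shows one anyway.

```python
import random


def simulate_trial(n_patients, randomized, seed=0):
    """Return (mortality in treated, mortality in controls).

    The hypothetical treatment has NO true effect: death depends only on
    disease severity. All probabilities here are illustrative assumptions.
    """
    rng = random.Random(seed)
    deaths = {True: 0, False: 0}  # keyed by "was treated"
    counts = {True: 0, False: 0}
    for _ in range(n_patients):
        severe = rng.random() < 0.5
        if randomized:
            # Chance alone decides who is treated.
            p_treat = 0.5
        else:
            # Assumed clinician behavior: sicker patients are treated less often,
            # so "reason for not treating" becomes a second variable.
            p_treat = 0.2 if severe else 0.8
        treated = rng.random() < p_treat
        p_death = 0.4 if severe else 0.1
        died = rng.random() < p_death
        counts[treated] += 1
        deaths[treated] += died
    return deaths[True] / counts[True], deaths[False] / counts[False]


if __name__ == "__main__":
    for randomized in (False, True):
        treated, control = simulate_trial(100_000, randomized, seed=1)
        label = "randomized" if randomized else "nonrandomized"
        print(f"{label}: treated {treated:.3f}, control {control:.3f}")
```

Under these assumptions, the nonrandomized comparison should show markedly lower mortality in the treated group even though the treatment does nothing, because severity differs between the groups; with random allocation the two mortality rates should be nearly equal.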
Because randomized trials eliminate more bias than nonrandomized trials, they have been labeled as the gold standard for assessing efficacy.9,10 In the past decade, an even higher standard has been proposed, namely, a systematic review of randomized trials.11 Systematic reviews are organized efforts to consider all of the available evidence. Systematic reviewers design a strategy to find all of the relevant randomized trials, a technique to abstract specific data from those trials, and a mechanism to use those data to formulate the conclusion. The methodology used to perform each of these steps is defined before the searching, data abstraction, and analysis are even begun. The data drive the conclusions. Before leaving the subject of controlled trials, a comment should be made regarding when controls are not necessary. There are a few medical scenarios for which the outcome in any given patient is absolutely predictable and unfavorable. In such instances, if an intervention favorably alters that prediction, efficacy is established. Examples of such scenarios are the use of cardiopulmonary resuscitation in cardiac arrest or the use of hemodialysis in end-stage renal disease.
Some papers deal with the value of a diagnostic test. Typically, patients with and without the disease are assessed. Sensitivity is defined as the percentage of patients with the disease who have a positive test. Specificity is defined as the percentage of patients without the disease who have a negative test. Obviously, in order to estimate these test characteristics, there must be an independent gold standard that will reliably define the presence or absence of the disease. When deciding whether or not to use the test, however, knowledge of the sensitivity and specificity alone is of limited value because one does not know up front whether or not the patient has the disease in question. (If one did know, the test would not be needed.) Knowing the sensitivity and specificity is useful when the likelihood of the disease is known for the patient. We refer to this as the pretest probability, and it is equivalent to the prevalence of the disease in a population of individuals similar to the patient in question. If the pretest probability is low, the limiting factor is the specificity. This can be illustrated with an example in clinical medicine. It has become common for emergency room physicians to do some diagnostic tests to "rule out" a myocardial infarction in any patient who complains of chest pain, regardless of how atypical the pain really is for cardiac ischemia. Consider a middle-aged woman who is seen for pain in her shoulder (interpreted by the physician as left upper chest area) that only occurs when she moves her arm; furthermore, she has point tenderness over that shoulder that precisely duplicates the pain about which she is complaining. The likely diagnosis is some type of shoulder injury but, because the pain was localized in the "left chest," the emergency room physician may order some blood test, such as a serum troponin level.
Let us assume that the serum troponin level is 95% sensitive and 95% specific (which is actually too high an estimate for either). It should be clear that the pretest probability of the patient having a myocardial infarction is close to 0%; for purposes of this discussion, we will make that probability 0.1%. Thus, if 100,000 patients with this scenario were seen, 100 of them would have had an infarct and 95 of those 100 would have a positive test (95% sensitivity). However, because the test is also positive in 5% of those without the disease (95% specificity), 4995 of the 99,900 patients without an infarct would also have a positive test. Thus, of the total of 5090 positive tests, only 95 of them (1.9%) will be true positives (ie, reflecting the occurrence of a myocardial infarction); the remaining 98.1% of the patients with positive tests would be subjected to further unnecessary diagnostic assessments. On the other hand, when there is a high pretest probability, the limiting factor is the occurrence of false-negative tests. False negativity is a problem related to sensitivity.
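The arithmetic in this example can be reproduced with a few lines of code. The following is a minimal sketch; the function name `predictive_values` and the cohort size parameter are hypothetical conveniences, and the only inputs taken from the text are the 95% sensitivity, 95% specificity, and 0.1% pretest probability.

```python
def predictive_values(sensitivity, specificity, pretest_probability, n=100_000):
    """Return (positive predictive value, negative predictive value).

    Builds the 2x2 table for a notional cohort of n patients, mirroring
    the 100,000-patient calculation in the text.
    """
    diseased = n * pretest_probability
    healthy = n - diseased
    true_pos = diseased * sensitivity    # diseased patients with a positive test
    false_neg = diseased - true_pos      # diseased patients missed by the test
    true_neg = healthy * specificity     # healthy patients with a negative test
    false_pos = healthy - true_neg       # healthy patients with a positive test
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Values from the troponin example: 95 true positives among 5090 positive tests.
    ppv, npv = predictive_values(0.95, 0.95, 0.001)
    print(f"low pretest probability: PPV {ppv:.1%}, NPV {npv:.3%}")

    # Hypothetical contrast (not from the text): a 90% pretest probability.
    ppv, npv = predictive_values(0.95, 0.95, 0.90)
    print(f"high pretest probability: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With the numbers used in the text, the positive predictive value is 95/5090, or about 1.9%. In the hypothetical high-probability case, roughly a third of the negative tests are false negatives (9500 true negatives against 4500 false negatives), which is the sensitivity-limited situation described above.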
These papers are usually more difficult to read because they do not simply present data derived from clinical observations. Rather, the investigators survey the published literature to assemble the best evidence regarding what is known about the natural history, response to treatment, and individual costs, and then create a model of a hypothetical patient with the disease. This model is then used to generate cost estimates. These analyses can be used to compare 2 competing forms of therapy. They can also derive the absolute cost for achieving a particular outcome with a single therapy or the relative cost of using or not using the therapy. The information is most useful when considering policy decisions regarding how to expend scarce resources. The major problem with these analyses relates to the amount of information that is really known. When important information is not known, assumptions have to be made. The reader has to pay particular attention to these assumptions, which are often not realistic. The costs derived from such analyses should be particularly suspect when the authors have vested interests in the therapy.12,13
There is a vast medical literature from which to select papers to read. Methodology varies enormously. For any paper, it is important to understand what question is being asked and the utility of the study design used to answer such a question. The best way to develop this skill is to read the medical literature. A helpful aid in this process for most healthcare workers is to gather and discuss particular papers in depth, an exercise usually referred to as a journal club. The important thing to remember about such exercises is that the reading is not being done to keep up with "what is new"; rather, it is to develop critical reading skills. Even if local expertise in critical reading is missing, a good-faith effort to identify the strengths and weaknesses of the methodology will be beneficial. Such journal clubs have to focus on the methodology, not on what the authors wrote in the abstract.
Nutrition in Clinical Practice, Vol. 22, No. 6, 593-598 (2007)


