
Tips on critical appraisal of evidence: Diagnosis

Clinical scenario: Elderly woman with possible iron deficiency anaemia

Are the results of this study valid?


Returning to our clinical scenario from the question formulation tutorial: You admit a 75 year old woman with community-acquired pneumonia. She responds nicely to appropriate antibiotics but her hemoglobin remains at 100 g/l with an MCV of 80. Her peripheral blood smear shows hypochromia, she is otherwise well and is on no incriminating medications. You contact her family physician and find out that her Hgb was 105 g/l 6 months ago. She has never been investigated for anaemia. A ferritin has been ordered and comes back at 40 mmol/l. You admit to yourself that you're unsure how to interpret a ferritin result and don't know how precise and accurate it is.

In the tutorial on clinical questions we formulated the following question: In an elderly woman with hypochromic, microcytic anaemia, can a low ferritin diagnose iron deficiency anaemia? Our search of the literature to answer this question retrieved an article from the Am J of Medicine (1990;88:205-9). How do we critically appraise this diagnosis paper? We'll start off by considering validity first and the following list outlines the questions that we need to consider when deciding if a diagnosis paper is valid.

1. Was there an independent, blind comparison with a reference ("gold") standard of diagnosis?
2. Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?
3. Was the reference standard applied regardless of the diagnostic test result?
4. Was the test (or cluster of tests) validated in a second, independent group of patients?

Was there an independent, blind comparison with a reference ('gold') standard of diagnosis?
In considering this question, we need to determine whether all patients in the study underwent both the diagnostic test under evaluation (in our scenario, the serum ferritin) and the reference standard (in our scenario, bone marrow biopsy) to show that they definitely do or do not have the target disorder. We should also ensure that those investigators who are applying and interpreting the reference standard do not know the results from the diagnostic test. We also need to consider if the reference standard is appropriate. Sometimes a reference standard may not be clear cut, (such as in the diagnosis of delirium) and in this case, we'd need to review the rationale for the choice of reference standard as outlined by the study authors. All patients in the study we found underwent serum ferritin testing and bone marrow biopsy.

Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?

The study should include both patients with common presentations of the target disorder and those with conditions that are commonly confused with the target disorder of interest. If the study only includes patients with severe symptoms of the target disorder (and who would be very obvious to diagnose) it is not likely to be useful to us. We need to find out if patients with varying severity of the disease were included in the study and also whether it includes patients with target disorders that are often confused with this one. For example, anaemic patients can be symptomatic or asymptomatic and the anaemia can result from a number of causes - we would want to ensure that the study we retrieved included patients with a variety of presentations and symptoms. Reviewing the ferritin study, it included consecutive patients over the age of 65 who were admitted with anaemia to a university-affiliated hospital in Canada. It excluded patients from institutions and patients who were too ill or who had severe dementia. No details are provided on the definitions used for 'too ill' or 'severe dementia'.

Was the reference standard applied regardless of the diagnostic test result?
We need to check to see that even if a patient's serum ferritin was normal, the study investigators performed the reference standard. Sometimes if the reference standard is invasive, it may be considered unethical to perform it on patients with a negative test result. For example, if a patient with chest pain is suspected to be at low risk of a pulmonary embolism and has a negative V/Q scan, an investigator (who is performing a study looking at the accuracy of the V/Q scan in diagnosing pulmonary embolism) may not want to subject the patient to pulmonary angiography which is not without morbidity and mortality. Indeed, this was what the investigators did in the PIOPED study - if patients were considered to be at a low risk of a pulmonary embolism and had a negative V/Q scan, rather than undergoing a pulmonary angiogram, they were followed up clinically for several months, without receiving antithrombotic therapy to see if an event occurred. In the ferritin study, all patients received both the diagnostic test and the reference standard.

Was the test (or cluster of tests) validated in a second, independent group of patients?
The tests should be assessed in an independent 'test' set of patients. This question is important in studies looking at multiple diagnostic elements. If the study fails any of the above criteria, we need to consider if the flaw is significant and threatens the validity of the study. If this is the case, we'll need to look for another study. Returning to our clinical scenario, the paper we found satisfies all of the above criteria and we will proceed to assessing it for importance.

Are the results of this study important?


Let's begin by drawing a 2x2 table, using the results from the study that we identified:

                                      Target Disorder (iron deficiency anaemia)
Diagnostic test result                Present        Absent          Totals
(serum ferritin)
Test positive (≤45 mmol/l)            70   (a)       15   (b)        85    (a+b)
Test negative (>45 mmol/l)            15   (c)       135  (d)        150   (c+d)
Totals                                85   (a+c)     150  (b+d)      235   (a+b+c+d)

Our patient's serum ferritin comes back at 40 mmol/l and, looking at the Table, we can see that she fits somewhere in the top row (either cell 'a' or cell 'b'). From the Table we can also see that 82% (70/85) of people who have iron deficiency anaemia have a serum ferritin in the same range as our patient - this is called the sensitivity of a test. And, 10% (15/150) of people without this diagnosis have a serum ferritin in the same range as our patient - this is the complement of the specificity (1 - specificity). The specificity is the proportion of people without iron deficiency anaemia who have a negative or normal test result.

We're interested in how likely a serum ferritin of 40 mmol/l is in a patient with iron deficiency anaemia as compared to someone without this target disorder. Our patient's serum ferritin is 8 (82%/10%) times as likely to occur in a patient with iron deficiency than in someone without iron deficiency anaemia - this is called the likelihood ratio for a positive test.

We can now use this likelihood ratio to calculate our patient's posttest probability of having iron deficiency anaemia. Our patient's posttest probability of having iron deficiency anaemia is obtained by calculating: posttest probability = posttest odds/(posttest odds + 1), where posttest odds = pretest odds x likelihood ratio. The pretest odds are calculated as pretest probability/(1 - pretest probability). We judge our patient's pretest probability of having iron deficiency anaemia as being similar to that of the patients in this study ((a+c)/(a+b+c+d) = 85/235 = 36%) and therefore: pretest odds = 0.36/(1 - 0.36) = 0.56. Using this we can calculate

posttest odds = 0.56 x 8 = 4.5. And, finally, posttest probability = 4.5/5.5 = 82%. With this information, we can conclude that based on our patient's serum ferritin, it is very likely that she has iron deficiency anaemia (posttest probability > 80%) and that our posttest probability is sufficiently high that we would want to work our patient up for causes of this target disorder. Instead of doing all of the above calculations, we could simply use the likelihood ratio nomogram. Considering that our patient's pretest probability of iron deficiency anaemia was 36%, and that the likelihood ratio for a serum ferritin of 40 mmol/l was 8, we can see that her posttest probability of iron deficiency anaemia is just over 80%.
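To make these steps concrete, here is a minimal sketch in Python (our own illustration, not part of the original tutorial; the function name is ours) that reproduces the sensitivity, specificity, likelihood ratio and post-test probability calculations from the 2x2 table above.

def post_test_probability(a, b, c, d, pretest_prob):
    """Return sensitivity, specificity, LR+ and the post-test probability for a positive test."""
    sensitivity = a / (a + c)                       # 70/85 = 0.82
    specificity = d / (b + d)                       # 135/150 = 0.90
    lr_positive = sensitivity / (1 - specificity)   # 0.82 / 0.10 = 8.2
    pretest_odds = pretest_prob / (1 - pretest_prob)
    posttest_odds = pretest_odds * lr_positive
    posttest_prob = posttest_odds / (1 + posttest_odds)
    return sensitivity, specificity, lr_positive, posttest_prob

sens, spec, lr, prob = post_test_probability(a=70, b=15, c=15, d=135, pretest_prob=85 / 235)
print(f"sensitivity={sens:.2f}  specificity={spec:.2f}  LR+={lr:.1f}  post-test probability={prob:.0%}")
# prints approximately: sensitivity=0.82  specificity=0.90  LR+=8.2  post-test probability=82%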

Multilevel tests
In the paper we found, the serum ferritin results are divided into 3 levels: ≤45 mmol/l, 46-100 mmol/l and >100 mmol/l. We can see that more information about the diagnostic test is available when results are presented in multiple levels:
Serum ferritin result     Present (iron deficiency anaemia)    Absent       Likelihood ratio
≤45 mmol/l                70/85                                15/150       8
46-100 mmol/l             7/85                                 27/150       0.4
>100 mmol/l               8/85                                 108/150      0.1

If our patient's serum ferritin was 110 mmol/l (and using her pretest probability of 36% and the likelihood ratio of 0.1), her posttest probability of iron deficiency anaemia would be about 5%, making this diagnosis very unlikely. However, if her serum ferritin came back at 65 mmol/l, her posttest probability would be about 18% and we'd have to decide if this was sufficiently low to stop testing or if we needed to do further investigations.
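As a further illustration, the short Python sketch below (again our own; only the likelihood ratios are taken from the table above) converts the 36% pre-test probability into a post-test probability for each ferritin level.

def apply_lr(pretest_prob, lr):
    """Convert a pre-test probability and a likelihood ratio into a post-test probability."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)

# likelihood ratios for each ferritin level, as reported in the study
likelihood_ratios = {"<=45 mmol/l": 8, "46-100 mmol/l": 0.4, ">100 mmol/l": 0.1}

for level, lr in likelihood_ratios.items():
    print(f"{level}: post-test probability = {apply_lr(0.36, lr):.0%}")
# with a 36% pre-test probability this gives roughly 82%, 18% and 5% respectively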

Can you apply this valid, important evidence about a diagnostic test in caring for your patient?
Is the diagnostic test available, affordable, accurate, and precise in your setting?

Can you generate a clinically sensible estimate of your patient's pre-test probability (from personal experience, prevalence statistics, practice databases, or primary studies)? Are the study patients similar to your own? Is it unlikely that the disease possibilities or probabilities have changed since the evidence was gathered?

Will the resulting post-test probabilities affect your management and help your patient? Could it move you across a test-treatment threshold? Would your patient be a willing partner in carrying it out?

Would the consequences of the test help your patient?

Tips on critical appraisal of evidence: Therapy - Single Trials


Clinical scenario: man with stroke, moderate carotid stenosis

Are the results of this study valid?


Returning to our clinical scenario from the question formulation tutorial: You admit a 65 year old man with a stroke. On examination you find that he has mild weakness of the right arm and right leg and bilateral carotid bruits. You send the patient for carotid doppler ultrasonography and subsequently receive the report that he has moderate stenosis (50-69% by NASCET criteria) of the ipsilateral carotid artery. You've noticed in the pile of journals that is accumulating in your office that there has been some recent literature addressing surgical versus medical therapy for patients with symptomatic carotid stenosis but you are unsure of what the results of these studies indicate.

In the tutorial on clinical questions, we formulated the following question: In a 65 year old man with stroke and moderate carotid stenosis, can carotid endarterectomy decrease the risk of stroke compared with medical therapy? Our search of the literature found an article from Best Evidence (1999;130:33). How do we critically appraise this therapy paper? We'll start off by considering validity first and the following list outlines the questions that we need to consider when deciding if a therapy paper is valid.

1. Was the assignment of patients to treatment randomized? And, was the randomization list concealed?
2. Was follow-up of patients sufficiently long and complete?
3. Were all patients analyzed in the groups to which they were randomized?

And some less important points:

4. Were patients and clinicians kept blind to treatment?
5. Were groups treated equally, apart from the experimental therapy?
6. Were the groups similar at the start of the trial?

Was the assignment of patients to treatment randomized? And, was the randomization list concealed?
Randomisation helps ensure that patients in the treatment groups are similar at the study onset in their risk of the event we are hoping to prevent. It balances the groups for prognostic factors (good or bad) that, if they were unequally distributed between the groups, could increase, decrease or nullify the apparent effect of the therapy. We also need to check whether the randomisation list was concealed from the clinicians who entered patients into the trial. This is done so that clinicians can't be aware of which treatment the next patient will receive.

The study that we found was randomised (which is one of the inclusion criteria for a therapy article in Best Evidence). From the original article we can see that the randomisation list was concealed and details on the randomisation process were also provided.

Was follow-up of patients sufficiently long and complete?


We'd want to see that the duration of follow-up was sufficiently long to observe the outcomes of interest. It is also important that the investigators provide details on the number of patients followed up and, if possible, on the outcomes of patients who dropped out of the study. If we are unsure of what effect the dropouts may have on the study result, we can perform a 'sensitivity analysis' for a 'worst case scenario': for the group that did better, assume that all the people who were lost to follow-up did poorly; for the group that did worse, assume all the people who were lost to follow-up fared well. If the result still supports the original conclusion, then the follow-up was sufficiently complete. It would be unusual for a study to be able to withstand more than a 20% loss to follow-up and therefore most journals of secondary publication (including ACP Journal Club and EBM) use this as an exclusion criterion for article selection. From the abstract we identified in Best Evidence, 99.7% of patients were followed up for 5 years.

Were all patients analyzed in the groups to which they were randomized?
Anything that happens after randomisation can affect the chance that a study patient has an outcome event. Therefore, we need to see if the investigators analysed the patients in the groups to which they were randomised, even if they crossed over to the other treatment group. This 'intention to treat' analysis preserves the value of randomisation. An intention to treat analysis was done in the study that we identified. (This information was provided in the abstract available on Best Evidence.)

Were patients and clinicians kept blind to treatment? And, were groups treated equally, apart from the experimental therapy?
Blinding of clinicians and patients helps to prevent cointervention - the provision of additional treatment (received on top of the experimental treatment) to just one of the groups. If either the patients or the clinicians weren't blinded, the reporting of symptoms or the interpretation of those symptoms could be affected by suspicions about the effectiveness of the treatment under investigation. In the NASCET study, all patients received antiplatelet therapy (this was usually ASA and the dose was left to the discretion of the neurologist at each study centre), and when indicated they received antihypertensive and/or antilipidemic medications. Blinding is not always possible (such as in surgery trials) and in these situations we should check to see if outcome events were assessed by blinded investigators. For example, in NASCET, outcome events were assessed by 4 groups: the participating neurologist and surgeon; the neurologist at the study centre; 'blinded' members of the steering committee; and 'blinded' external adjudicators.

Were the groups similar at the start of the trial?


This is usually reported in the 'Table 1' of the article. If the groups aren't similar, we need to see if there was an adjustment made for the potentially important prognostic factors.

The medical and surgical groups were similar in NASCET. For example, the percentages of patients who were prescribed antihypertensive or antilipidemic medications were similar. If the study fails any of the above criteria, we need to decide if the flaw is significant and threatens the validity of the study. If this is the case, we'll need to look for another study. Returning to our clinical scenario, the paper we found satisfies all the above criteria and we will proceed to assessing it for importance.

Are the results of this study important?


What is the magnitude of the treatment effect?
There are several ways that information about treatment effects can be presented. This discussion will be illustrated using the results of NASCET (for any stroke at 5 years) as shown in the first row of numbers in the table below.
                           Control        Experimental    Relative Risk    Absolute Risk    Number Needed
                           Event Rate     Event Rate      Reduction        Reduction        to Treat
NASCET (any stroke, 5 y)   0.264          0.198           25%              0.066            15
Hypothetical example       0.000000264    0.000000198     25%              0.000000066      15,000,000

The control event rate (CER) is the proportion of patients in the control group (in this study, the group that received medical care) that had the outcome event of interest (in our scenario, this would be any stroke). The experimental event rate (EER) is the proportion of patients in the experimental group (patients in the carotid endarterectomy group) that had the outcome of interest.

The relative risk reduction (RRR) is one way of describing the treatment effects and is calculated as: RRR = |EER-CER|/CER = |0.198-0.264|/0.264 = 25%. Applying this, we can say that if we treat people who have moderate carotid stenosis with carotid endarterectomy we can decrease their risk of future stroke by 25% compared to those people who receive medical therapy only. If the experimental treatment increases the risk of a good event, we can use this same equation to calculate the relative benefit increase (RBI). Similarly, if the experimental treatment increases the risk of an adverse event we can use the equation to calculate the relative risk increase (RRI).

The RRR has limitations. Consider the second row of numbers in the table above - when the CER was incredibly small (0.000000264) the RRR remains at 25%. The RRR is unable to discriminate between small treatment effects and large ones and doesn't reflect the baseline risk of the event. One measure that overcomes this is the absolute difference between the CER and EER, or the absolute risk reduction (ARR). It is calculated as:

ARR = |EER-CER| = |0.198-0.264| = 0.066

If the experimental treatment increased the risk of a good event, we can use this same equation to calculate the absolute benefit increase (ABI). Or, if the experimental treatment increases the risk of an adverse event, we can use the equation to calculate the absolute risk increase (ARI). Returning to the data in the table, we can see that the ARR reflects the baseline risk of the event and that it discriminates between small and large treatment effects. However, because it is not a whole number, it is often difficult to remember and to translate to patients.

To overcome these difficulties, we can take the inverse of the ARR, which tells us the number of patients that we'd need to treat with the experimental therapy in order to prevent one additional bad event. This is called the number needed to treat (NNT) and in our example, the NNT is 15. We can see from the table that the NNT (like the ARR) is able to differentiate between small and large treatment effects - in the second row of the table, when the CER and EER are very small, the NNT is over 15 million! When the treatment increases the risk of adverse events, we can calculate the number of patients that we'd need to treat with this therapy to cause one additional bad event - this is called the number needed to harm (NNH). The NNH is calculated as 1/ARI.

How big should an NNT be for us to be impressed? Consider some examples. We'd need to treat 40 people who have suspected MI with aspirin to prevent 1 additional death. And, we'd only need to treat 20 people who have suspected MI with aspirin and thrombolysis to prevent 1 additional death.
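For readers who prefer a worked computation, here is a small Python sketch (our own helper, not from the paper) of the RRR, ARR and NNT calculations above, using the NASCET 5-year stroke rates.

def treatment_effects(cer, eer):
    """Return the relative risk reduction, absolute risk reduction and number needed to treat."""
    arr = abs(eer - cer)
    rrr = arr / cer
    nnt = 1 / arr
    return rrr, arr, nnt

rrr, arr, nnt = treatment_effects(cer=0.264, eer=0.198)
print(f"RRR={rrr:.0%}  ARR={arr:.3f}  NNT={nnt:.0f}")
# prints approximately: RRR=25%  ARR=0.066  NNT=15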

What is the precision of the treatment effect?


The confidence interval around the NNT can be calculated as the inverse of the confidence interval for the ARR. The smaller the number of patients who have the event of interest, the wider the confidence interval.
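As a rough illustration of this inversion, the Python sketch below uses the standard formula for the standard error of a difference between two proportions; the group sizes shown are purely hypothetical and the function name is ours.

import math

def nnt_confidence_interval(cer, n_control, eer, n_experimental, z=1.96):
    """95% confidence interval for the NNT, obtained by inverting the interval for the ARR."""
    arr = cer - eer
    se = math.sqrt(cer * (1 - cer) / n_control + eer * (1 - eer) / n_experimental)
    arr_low, arr_high = arr - z * se, arr + z * se
    # inverting the ARR limits gives the NNT limits (valid when the ARR interval excludes 0)
    return 1 / arr_high, 1 / arr_low

# purely hypothetical group sizes of 500 per arm, for illustration only
low, high = nnt_confidence_interval(cer=0.264, n_control=500, eer=0.198, n_experimental=500)
print(f"NNT 95% CI: {low:.0f} to {high:.0f}")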

Can you apply this valid, important evidence about therapy in caring for your patient?
Do these results apply to your patient? Is your patient so different from those in the study that its results cannot apply? Is the treatment feasible in your setting? What are your patient's potential benefits and harms from the therapy?

Method I: f

Risk of the outcome in your patient, relative to patients in the trial, expressed as a decimal: ______
NNT/f = ______ / ______ = ______ (NNT for patients like yours)

Method II: 1/(PEER x RRR)

Your patient's expected event rate if they received the control treatment (PEER) = ______
1/(PEER x RRR) = 1/________ = ______ (NNT for patients like yours)
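A brief Python sketch of the two methods above (the function names and the worked numbers are our own illustrative assumptions):

def nnt_method_f(study_nnt, f):
    """Method I: divide the study NNT by f, your patient's risk relative to the trial's control patients."""
    return study_nnt / f

def nnt_method_peer(peer, rrr):
    """Method II: NNT for your patient = 1 / (PEER x RRR)."""
    return 1 / (peer * rrr)

# hypothetical example: a patient at twice the risk of the trial's control group (f = 2),
# or, for Method II, an expected event rate of 0.40 combined with the trial's RRR of 25%
print(nnt_method_f(study_nnt=15, f=2))        # 7.5, so treat about 8 such patients
print(nnt_method_peer(peer=0.40, rrr=0.25))   # 10.0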

Are your patient's values and preferences satisfied by the regimen and its consequences? Do you and your patient have a clear assessment of their values and preferences? Are they met by this regimen and its consequences?


Tips on critical appraisal of evidence: Harm


Clinical scenario: Man with extrasystoles on sotalol

Are the results of this study valid?


Evidence about harm can come from a number of different study types. Ideally we'd like to see a high quality systematic review of randomised trials, but these aren't easy to find because RCTs aren't always feasible for issues of harm. As a result, we usually find evidence about harm in cohort studies (groups of patients who are and aren't exposed to the treatment are followed up for the outcome of interest) and case-control studies (patients with the outcome of interest are matched with patients without the outcome and investigators look retrospectively to determine exposure). Case-control studies are useful when the outcome of interest is rare or when the required follow-up is long. The strength of inference that can be drawn from a case-control study is limited because they are more susceptible to bias.

Returning to our clinical scenario from the question formulation tutorial: You see a 50 year old man who asks for a repeat prescription of sotalol which he has been taking for extrasystoles for several years. He has a remote history of an MI. You haven't seen him previously and are concerned about the proarrhythmic properties of sotalol given what is known about other antiarrhythmics. During the tutorial on clinical questions we formulated the question: In a man with extrasystoles and a remote history of MI, does treatment with sotalol increase his risk of death? Searching the literature we found an RCT from the Lancet (1996;348:7-12). How do we critically appraise this harm paper? We'll start off by considering validity first and the following list outlines the questions that we need to consider when deciding if a harm paper is valid.

1. Were there clearly defined groups of patients, similar in all important ways other than exposure to the treatment or other cause?
2. Were treatments/exposures and clinical outcomes measured in the same ways in both groups? (Was the assessment of outcomes either objective or blinded to exposure?)
3. Was the follow-up of the study patients sufficiently long (for the outcome to occur) and complete?
4. Do the results of the harm study fulfil some of the diagnostic tests for causation?
   o Is it clear that the exposure preceded the onset of the outcome?
   o Is there a dose-response gradient?
   o Is there any positive evidence from a 'dechallenge-rechallenge' study?
   o Is the association consistent from study to study?
   o Does the association make biological sense?

Were there clearly defined groups of patients, similar in all important ways other than exposure to the treatment or other cause?

Consider the following table:


                                            Adverse Event
                                            Present (Case)    Absent (Control)    Totals
Exposed to treatment (RCT or cohort)        a                 b                   a+b
Not exposed to treatment (RCT or cohort)    c                 d                   c+d
Totals                                      a+c               b+d                 a+b+c+d

1. This first question is easy to answer if we've been able to find an RCT during our search. Randomisation should make the 2 groups of patients similar for all causes of the outcome that we are interested in. In an RCT, patients in the experimental treatment group would be in cells a or b in the table above and patients in the control group would be in cells c or d.

2. Returning to our clinical scenario, we have been fortunate in our search and have managed to find an RCT and are satisfied that patients are similar in all important ways other than exposure to sotalol.

3. However, there's not always an RCT available to answer our questions and indeed more frequently we find cohort or case-control studies to answer our questions about harm and etiology. In a cohort study, 2 groups of patients are followed - one group with the exposure to the treatment (a+b in the table) and one group without the exposure (c+d) - for the development of the outcome of interest (either a or c). Because the decision about who receives treatment is not randomised, exposed patients may differ from nonexposed patients for important determinants of the outcome (these determinants are called confounders). Investigators should document characteristics of patients and either show that they are similar or adjust for the confounders that they identify. This is limited by the fact that investigators can only adjust for confounders that are known and that have been measured.

4. In case-control studies, people with the outcome of interest (cases = a+c) are identified along with those without it (controls = b+d). The proportion of each group who were exposed to the putative agent is assessed. Case-control studies are susceptible to more bias than cohort studies because confounders that are transient or that lead to early death won't get measured. We also need to ensure, when reading a case-control study, that people in the control group had the same opportunity for exposure as people in the case group. For example, if we found a case-control study looking at the association between sotalol and sudden cardiac death and its investigators assembled people with sudden cardiac death as the cases but excluded patients with atrial fibrillation from the control group, we'd be concerned that the association found between sotalol and sudden cardiac death could be spurious.

Were treatments/exposures and clinical outcomes measured in the same ways in both groups? (Was the assessment of outcomes either objective or blinded to exposure?)
The application of explicit criteria for the outcomes of interest, a discussion of how they were applied, and evidence that they were applied without knowledge of which group the patient was in are important. Blinding is crucial if any judgment is required to assess the outcome (in RCTs and cohort studies) or the exposure (in case-control studies). For example, an unblinded investigator may search more aggressively for outcomes in people with exposure to the putative agent. Similarly, people with the adverse outcome may be more likely to have brooded about their situation and may have greater incentive to recall possible exposure. Therefore we would want patients and interviewers to be blind to the study hypothesis.


In the RCT that we retrieved, the outcome was death and was the same for both groups.

Was the follow-up of the study patients sufficiently long (for the outcome to occur and complete)?
If follow-up is short, it may be that too few study patients will have the outcome of interest, thus providing little information of use to a patient. For example, if investigators were looking at the association between cancer and a particular agent and the follow-up time was 1 month, this would be too short for the investigators to see a clinically important effect. The more people who are unavailable for follow-up, the less accurate the estimate of the risk of the outcome is. Losses may occur because patients are too ill (or too well) to be followed or may have died, and the failure to document these losses threatens the validity of the study. The RCT that we found was stopped early because an increased risk of death was noted.

Do the results of the harm study fulfil some of the diagnostic tests for causation?
Is it clear that the exposure preceded the onset of the outcome?
We'd want to make sure that the exposure occurred before the outcome and that it wasn't just a marker that the outcome was already underway. With an RCT, the exposure clearly precedes the outcome as with the trial that we found. If it's a case control study, this question becomes more difficult to answer, and more important to ascertain.

Is there a dose-response gradient?


With larger doses of the agent, was there an increased risk of the outcome event? In the study we retrieved, this wasn't tested since the investigators looked at one dose of sotalol.

Is there positive evidence from a dechallenge-rechallenge study?


This occurs when the outcome event disappears (or decreases in intensity) when the putative agent is withdrawn and reappears when it is reinstituted. This couldn't be done in the RCT we found because the outcome was death.

Is the association consistent from study to study?


Or, is this the only study where the association has been identified? We would be happy to see that several studies have looked at this question and have come to the same conclusion (or even better, if there was a systematic review of the topic). Only 1 RCT has had sufficient power to look at the use of sotalol and the risk of death.

Does the association make biological sense?


If the association between outcome and exposure makes biological sense, a causal relationship is more plausible. The results of the sotalol RCT are consistent with findings from studies that have looked at other antiarrhythmics (e.g. CAST).


If the study fails any of the above criteria, we need to decide if the flaw is significant and threatens the validity of the study. If this is the case, we'll need to look for another study. Returning to our clinical scenario, the paper we found satisfies all of the above criteria and we will proceed to assessing it for importance.

Are the results of this study important?


What is the magnitude and precision of the association between the exposure and the outcome?
Let's begin by drawing a 2x2 table using the data from the RCT that we found.
                                  Adverse Event (death)
                                  Present          Absent           Totals
Experimental group (d-sotalol)    78    (a)        1471  (b)        1549   (a+b)
Control group (placebo)           48    (c)        1524  (d)        1572   (c+d)
Totals                            126   (a+c)      2995  (b+d)      3121   (a+b+c+d)

For RCTs and cohort studies, we look at the risk of the event in the treated patients relative to the risk of the event in the untreated patients. This 'relative risk' is calculated as: RR = [a/(a+b)] / [c/(c+d)]. Using the values in the table, the relative risk for death in patients receiving d-sotalol is: RR = [78/1549] / [48/1572] = 1.65.

Case-control studies sample on outcomes, not exposures, and therefore we can't calculate the relative risk. Instead, the strength of association is estimated indirectly using the odds ratio = ad/bc. How big should the relative risk (RR) or odds ratio (OR) be for us to be impressed by it? An OR or RR > 1 indicates that there is an increased risk of the adverse outcome with the exposure. Because cohort studies and case-control studies are susceptible to many biases, we need to ensure that the OR/RR is greater than that which could occur from bias alone. We also need to look at the confidence interval around the OR and RR to see how precise the estimate is.

A more clinically useful measure than the OR and RR is the number of patients that we'd need to treat with the putative agent in order to cause 1 additional harmful event (number needed to harm or NNH). Using the OR, the NNH can be calculated as:

NNH = [PEER x (OR - 1) + 1] / [PEER x (OR - 1) x (1 - PEER)]

where PEER = the patient's expected event rate.

Alternatively, we can refer to the tables below for this information. We can see from these tables that, for different PEERs, the same OR can generate very different NNHs. When OR < 1:
Adapted from John Geddes, 1999

For Odds Ratios LESS than 1

PEER      OR 0.9    0.8     0.7     0.6     0.5     0.4     0.3
0.05      209       104     69      52      41      34      29
0.10      110       54      36      27      21      18      15
0.20      61        30      20      14      11      10      8
0.30      46        22      14      10      8       7       5
0.40      40        19      12      9       7       6       4
0.50      38        18      11      8       6       5       4
0.70      44        20      13      9       6       5       4
0.90      101       46      27      18      12      9       4

When OR > 1:
Adapted from John Geddes, 1999

For Odds Ratios GREATER than 1

PEER      OR 1.1    1.25    1.5     1.75    2       2.25    2.5
0.05      212       86      44      30      23      18      16
0.10      113       46      24      16      13      10      9
0.20      64        27      14      10      8       7       6
0.30      50        21      11      8       7       6       5
0.40      44        19      10      8       6       5       5
0.50      42        18      10      8       6       5       4
0.70      51        23      13      10      9       8       7
0.90      121       55      33      25      22      19      18

Adapted from John Geddes, 1999
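To illustrate the formula above, here is a minimal Python sketch (the function name is ours, not from the original tutorial) that converts an odds ratio and a PEER into an NNH; the example values are chosen to match one cell of the table.

def nnh_from_odds_ratio(peer, odds_ratio):
    """NNH = [PEER x (OR - 1) + 1] / [PEER x (OR - 1) x (1 - PEER)]."""
    x = peer * (odds_ratio - 1)
    return (x + 1) / (x * (1 - peer))

# with PEER = 0.30 and OR = 1.5 this gives about 11, matching the corresponding cell above
print(round(nnh_from_odds_ratio(peer=0.30, odds_ratio=1.5)))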

We can also convert the RR to an NNT or NNH using the following equations:

For RR < 1: NNT = 1 / [(1 - RR) x PEER]
For RR > 1: NNT (or NNH) = 1 / [(RR - 1) x PEER]

Using the PEER (3.1%) from the study we found and the RR (1.65) that we calculated, the NNH for death from d-sotalol in the study is: NNH = 1 / [(1.65 - 1) x 0.031] = 50. Therefore we would need to treat 50 people with d-sotalol to cause 1 additional death. We can also calculate the confidence interval around this estimate using the inverse of the confidence interval for the absolute risk increase.
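And a corresponding sketch (again our own) for the relative-risk version of the calculation, using the figures from the d-sotalol trial as above:

def nnh_from_relative_risk(peer, rr):
    """For RR > 1: NNH = 1 / [(RR - 1) x PEER]."""
    return 1 / ((rr - 1) * peer)

rr = (78 / 1549) / (48 / 1572)                               # relative risk of death with d-sotalol, about 1.65
print(round(nnh_from_relative_risk(peer=0.031, rr=1.65)))    # about 50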

Should these valid, potentially important results change the treatment of your patient?
Is your patient so different from those in the study that its results don't apply? What are your patient's risks of the adverse event? To calculate the NNH (the number of patients you need to treat to harm one of them) for any odds ratio (OR) and your patient's expected event rate for this adverse event if they were not exposed to this treatment (PEER):

NNH = [PEER x (OR - 1) + 1] / [PEER x (OR - 1) x (1 - PEER)]


What are your patient's preferences, concerns and expectations from this treatment?

What alternative treatments are available?


Tips on critical appraisal of evidence: Prognosis


Clinical scenario: Man with a history of a stroke who is concerned about his risk of seizure

Are the results of this study valid?


Information about prognosis can come from a variety of study types. Cohort studies (investigators follow 1 or more groups of individuals over time and monitor for the occurrence of the outcome of interest) are the best source of evidence about prognosis. Randomised controlled trials can also provide information about prognosis, although trial participants may not be representative of the population with the disorder. Case-control studies (investigators retrospectively determine prognostic factors by defining the exposure of cases who have already experienced the outcome of interest and of controls who haven't) are useful when the outcome of interest is rare or when the required follow-up is long. The strength of inference that can be drawn from a case-control study is limited because they are more susceptible to bias.

Returning to our clinical scenario from the question formulation tutorial: You see a 70 year old man in your outpatient clinic 3 months after he was discharged from your service with an ischemic stroke. He is in sinus rhythm, has mild residual left-sided weakness but is otherwise well. His only medication is ASA and he has no allergies. He recently saw an article on the BMJ website describing the risk of seizure after a stroke and is concerned that this will happen to him. In the tutorial on clinical questions, we formulated the following question: In a 70 year old man does a history of stroke increase his risk for seizure? Our search of the literature to answer this question retrieved an article from the BMJ (1997;315:1582-7). How do we critically appraise this prognosis paper? We'll start by considering validity first and the following list outlines the questions that we need to consider when deciding if a prognosis paper is valid.

1. Was a defined, representative sample of patients assembled at a common (usually early) point in the course of their disease?
2. Was patient follow-up sufficiently long and complete?
3. Were objective outcome criteria applied in a "blind" fashion?
4. If subgroups with different prognoses are identified:
   o Was there adjustment for important prognostic factors?
   o Was there validation in an independent group of "test-set" patients?

Was a defined, representative sample of patients assembled at a common (usually early) point in the course of their disease?
We hope to find that the individuals included in the study are representative of the underlying population (and reflect the spectrum of illness). But from what point in the target disorder should patients be followed? Above, we state 'usually early', implying an inception cohort (a group of people who are assembled at an early point in their disease), but clinicians may want information about prognosis in later stages of a target disorder. Thus, a study that assembled patients at a later point in the disease may provide useful information. However, if observations are made at different points in the course of disease for various people in the cohort, the relative timing of outcome events would be difficult to interpret. Thus, the ideal cohort is one in which participants are all at a similar stage in the course of the same disease.

Returning to the paper we found, the study included patients who were entered after their first stroke. Further details on entry procedures aren't included in the study.

Was patient follow-up sufficiently long and complete?


Ideally, we'd like to see a follow-up period that lasts until every patient recovers or has one of the other outcomes of interest, or until the elapsed time of observation is of clinical interest to clinicians or patients. If follow-up is short, it may be that too few study patients will have the outcome of interest, thus providing little information of use to a patient. The more patients who are unavailable for follow-up, the less accurate the estimate of the risk of the outcome. Losses may occur because patients are too ill (or too well) to be followed or may have died, and the failure to document these losses threatens the validity of the study. Sometimes, however, losses to follow-up are unavoidable and unrelated to prognosis. Although an analysis showing that the baseline demographics of these patients are similar to those followed up provides some reassurance that certain types of participants were not selectively lost, such an analysis is limited to those characteristics that were measured at baseline. Investigators cannot control for unmeasured traits that may be important prognostically, and that may have been more or less prevalent in the lost participants than in the followed-up participants. Most evidence-based journals of secondary publication (like ACP Journal Club and Evidence-Based Medicine) require at least 80% follow-up for a prognosis study to be considered valid. In the study we retrieved, follow-up was sufficiently complete and patients were followed from 2 to 6.5 years.

Were objective outcome criteria applied in a "blind" fashion?


We need to assess whether and how explicit criteria for each outcome of interest were applied and if there is evidence that they were applied without knowledge of the prognostic factors under consideration. Blinding is crucial if any judgement is required to assess the outcome because unblinded investigators may search more aggressively for outcomes in people with the characteristic(s) felt to be of prognostic importance than in other individuals. Blinding may be unnecessary if the assessments are preplanned for all patients and/or are unequivocal, such as total mortality. However, judging the underlying cause of death is difficult and requires blinding to the presence of the risk factor to ensure that it is unbiased. In the study we identified, patients were asked at follow-up if they had a seizure and if they said "yes", a study neurologist subsequently assessed them. It is unclear if the study neurologist was "blind".

If subgroups with different prognoses are identified, was there adjustment for important prognostic factors and was there validation in an independent, "test set" of patients?
We often want to know if patients with certain characteristics will have a different prognosis. For example, are patients with an intracranial hemorrhage at increased risk of seizure? Demographic, disease-specific or comorbid variables that are associated with the outcome of interest are called prognostic factors. They need not be causal but must be strongly enough associated with the development of an outcome to predict its occurrence.

The identification of a prognostic factor for the first time could be the result of a chance difference in its distribution between patients with different prognoses. Therefore, the initial patient group in which the variable was identified as a prognostic factor may be considered to be a training set or a hypothesis generation set. Indeed, if investigators were to search for multiple potential prognostic factors in the same data set, a few would likely emerge on the basis of chance alone. Ideally, therefore, data from a second independent patient group, or a "test set", would be required to confirm the importance of a prognostic factor. Although this degree of evidence has often not been collected in the past, an increasing number of reports are describing a second, independent study validating the predictive power of prognostic factors. If a second, independent study validates these prognostic factors, it can be called a clinical prediction guide.

In the study we found, the investigators looked at patients with different stroke types and identified that patients in these groups had different risks of seizures. This was not tested in an independent group of patients to see if it holds true. If the study fails any of the above criteria, we need to consider if the flaw is significant and threatens the validity of the study. If this is the case, we'll need to look for another study. Returning to our clinical scenario, the paper we found satisfies all of the above criteria and we will proceed to assessing it for importance.

Are the results of this study important?


How likely are the outcomes over time?
Typically, results of prognosis studies are reported in one of three ways: as a percentage of the outcome of interest at a particular point in time (e.g. 1 year survival rates), as median time to the outcome (e.g. the length of follow-up by which 50% of patients have died) or as event curves (e.g. survival curves) that illustrate, at each point in time, the proportion of the original study sample who have not yet had a specified outcome. From the study we found, the risk of seizure after any type of stroke is 5.7% at 1 year.

How precise is this prognostic estimate?


The precision of the estimate is best reflected by its 95% confidence interval; the range of values within which we can be 95% sure that the population value lies. The narrower the confidence interval, the more precise is the estimate. If survival over time is the outcome of interest, earlier follow-up periods usually include results from more patients than later periods, so that survival curves are more precise (i.e. have narrower confidence intervals) earlier in follow-up. To calculate the 95% confidence interval for the study we identified, we can use the following equation: 95% Confidence Interval = p +/- 1.96 x SE where:

Standard Error (SE) = √[ p x (1 - p) / n ]

And 'p' is a proportion of people with the outcome of interest and 'n' is the sample size. From the study, n = 675 and p = 0.057

SE = √[ 0.057 x (1 - 0.057) / 675 ] = 0.009. Therefore the 95% CI is: 0.057 +/- 1.96 x 0.009 = 3.9% to 7.5%
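A small Python sketch of this confidence-interval calculation (the helper name is ours, not from the study):

import math

def proportion_confidence_interval(p, n, z=1.96):
    """95% confidence interval for a proportion p observed in a sample of n patients."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

low, high = proportion_confidence_interval(p=0.057, n=675)
print(f"{low:.1%} to {high:.1%}")
# prints about 4.0% to 7.4%; the 3.9% to 7.5% above comes from rounding the SE to 0.009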

Can you apply this valid, important evidence about prognosis in caring for your patient?
Were the study patients similar to your own? Will this evidence make a clinically important impact on your conclusions about what to offer or tell your patient?

Source : http://ktclearinghouse.ca/cebm/practise/ca

