Strong (Grade 1) Recommendations
- Authors and editors should make a strong recommendation when very certain that benefits outweigh risks and burdens (such as difficulties of therapy and costs), or vice versa.
- For a strong recommendation, write, "We recommend."
Weak (Grade 2) Recommendations
- Authors and editors should make weak recommendations when risks and burdens appear to be finely balanced, or when there is appreciable uncertainty about the magnitude of benefits and risks.
- For a weak recommendation, write, "We suggest."
Balancing risks and benefits
The first way to interpret strong and weak recommendations is the one already mentioned:
- A strong recommendation means that benefits clearly outweigh risks and burdens, or vice versa
- A weak recommendation means that benefits, risks, and burdens are closely balanced or uncertain
Patient values
A second way to interpret strong and weak recommendations is in terms of the importance of individual patient values and preferences:
- A strong recommendation implies that virtually all informed patients will make the same choice
- A weak recommendation implies that values and preferences play a crucial role in individual decisions
Example
A recommendation might say:
We recommend/suggest that patients suffering an
acute myocardial infarction be treated with aspirin
Should this recommendation be:
(click the best answer)
Example 2
Another recommendation might say:
We recommend/suggest that young patients with idiopathic
DVT discontinue anticoagulation after one year of therapy.
Should this recommendation be:
(click the best answer)
One more interpretation
A third way to interpret strong and weak recommendations is that for typical patients a strong recommendation means, "just do it."
In contrast, a weak recommendation means, "you may want to think about this."
Confidence in estimates
When there is less confidence in estimates of benefits and harms of an intervention, we are more likely to make weaker recommendations.
Recommendations are an important part of UpToDate
- Clear recommendations allow clinicians to follow what we consider best practice and thus improve care
- Recommendations should always be based on evidence
Importance of outcomes
In general, prevention of outcomes with high importance to patients will lead to stronger recommendations.
Which situation is more likely to warrant a strong recommendation?
(click the best answer)
In low risk patients after MI, one may need to treat 100 patients with aspirin to extend one life.
Which of the following are evidence?
A) Randomized clinical trials
B) Nonrandomized trials
C) Observational studies
D) Published case series
E) Clinical experience
(click the best answer)
A | A and B | A,B,C | A,B,C,D | A,B,C,D,E
Many early systems for grading methodologic quality relied
primarily on the basic study design—whether the evidence
came from a randomized trial, a cohort study, a case-control
study, or a case series, for example.
The study design maintains a critical role in determining our confidence in estimates of benefits, risks, and burdens of the interventions we are recommending.
Because of the risks of bias and confounding, evidence from observational studies is usually much weaker than that from RCTs.
However, certain factors may increase our confidence in observational evidence or weaken our confidence in evidence from an RCT.
Levels of evidence
UpToDate has chosen a system of grading with three
levels of evidence quality:
- High (Grade A)
- Moderate (Grade B)
- Low (Grade C)
High quality evidence
In general, high quality (Grade A) evidence comes from well-designed and well-executed RCTs yielding consistent, directly applicable results, or from systematic reviews summarizing the evidence from such RCTs.
However, overwhelming evidence of some other sort (such as observational studies with very large effects) may also be Grade A evidence.
Moderate quality evidence
The moderate quality (Grade B) evidence label is mostly applied to RCTs with important limitations. Often this occurs when there have been inconsistent results among RCTs, or the evidence from the RCTs is not directly applicable to the relevant patient population.
Very strong evidence from other types of studies and observations (such as observational studies with large measures of effect) can also be Grade B.
Low quality evidence
Low quality (Grade C) evidence mainly comes from observational studies. RCTs with very serious limitations can occasionally be Grade C as well.
Note that most observational evidence will be Grade C whether it comes from case-control studies, cohort studies, or other forms of observation.
Randomized trials
In general, evidence from randomized trials is high quality evidence (Grade A).
What factors lower the quality of evidence from randomized trials?
What is grading?
- Grading is a structured format for describing a recommendation or a study
- Most systems grade two characteristics:
- The strength of the recommendation (how important it is to carry out the recommendation)
- The strength of the supporting evidence
Methodologic flaws
Serious flaws in the conduct of an RCT may lower the quality of evidence.
Such flaws include:
- Large loss to follow-up
- Unblinded studies with subjective outcomes that are highly susceptible to bias
Heterogeneity
Widely differing estimates of treatment effects across studies (heterogeneity or variability in results) typically lead investigators to look for explanations for that heterogeneity, such as differing effects in sicker or healthier patient populations.
If no plausible explanation for heterogeneity can be identified, the grade for the quality of evidence must be reduced even if the underlying RCTs all appear to have been well performed.
Indirectness
Evidence grades may be reduced because the RCTs provide only indirect evidence for the specific recommendation being made. Evidence may be indirect in a number of ways.
Indirect because of different populations
The population of interest may be different from the one studied.
For example:
Compression stockings have been studied for prevention of DVT in a number of populations, but not in trauma patients. The evidence grade on a recommendation about use of stockings in trauma patients might be reduced for indirectness.
Indirect because of different interventions
The intervention being discussed may be different from the one studied. For instance, the dose or preparation of a drug may be different.
For example:
A number of ACE inhibitors have been shown both to decrease blood pressure and to reduce mortality in HF. If a new inexpensive ACE inhibitor is released but has only been studied for hypertension, the evidence for its benefits in HF would be somewhat indirect. The benefits of an antihypertensive agent in another class would be far more indirect.
Indirect because of different outcomes
Studies often look at surrogate endpoints rather than the clinical
endpoints of interest.
For example, they look at a reduction in blood pressure rather than a
reduction in cardiovascular events, or at a decrease in HIV viral load rather than at a reduction in progression to AIDS. When clinical outcomes have not been studied, the evidence is indirect and this may lower the graded quality of evidence.
Few events
Small RCTs may be all that are available for unusual diseases, and these may include very few clinical events. For example, an RCT of a low molecular weight heparin for cerebral venous sinus thrombosis found that 3 of 30 treated patients and 6 of 29 control patients had a poor outcome. The 38% relative risk reduction was not statistically significant.
The grade for evidence supporting a recommendation for this low molecular weight heparin for cerebral venous sinus thrombosis would need to take into account the small numbers of events.
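To see why so few events leave the estimate imprecise, the following is a minimal sketch that computes a risk ratio and an approximate 95% confidence interval from the counts quoted above. The function name `risk_ratio_ci` and the choice of the log-scale normal approximation are assumptions made for illustration; they are not taken from the trial report. Note that the raw counts as quoted give a relative risk reduction of roughly 52%, not the 38% stated above, and the sketch makes no attempt to reconcile the two figures.

```python
import math


def risk_ratio_ci(events_treated, n_treated, events_control, n_control, z=1.96):
    """Return (risk ratio, lower bound, upper bound).

    Uses the standard normal approximation on the log scale; this is an
    illustrative choice, not necessarily the method used by the trialists.
    """
    rr = (events_treated / n_treated) / (events_control / n_control)
    # Standard error of log(RR) for two independent binomial samples
    se_log_rr = math.sqrt(
        1 / events_treated - 1 / n_treated + 1 / events_control - 1 / n_control
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Counts as quoted in the text: 3 of 30 treated, 6 of 29 control
rr, lower, upper = risk_ratio_ci(3, 30, 6, 29)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

With these counts the interval runs from well below 1 to well above 1 (roughly 0.13 to 1.75), so the data are compatible with a large benefit, no effect, or harm. That width is what "few events" means in practice.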
To review, the general categories that lower the quality
of evidence from RCTs are:
- Methodologic problems likely to cause bias
- Inconsistent results
- Indirectness of evidence
- Few observed events
Observational studies
Evidence from observations and observational studies is generally low quality (Grade C).
What factors can increase the quality of evidence from observational studies?
Magnitude of Effect
The magnitude of the treatment effect is generally the most important factor in assessing the quality of evidence from observational studies.
The results of even well-performed observational studies are
susceptible to bias and confounding when treatment effects are small.
Consider the enormous and well-performed Nurses' Health Study (NHS), which erroneously concluded that hormone replacement therapy decreased the risk of CHD. The estimate of benefit in the NHS was only about 30%.
A 2002 AHRQ review found that many grading schemes have been developed:
- 20 systems to evaluate systematic reviews
- 49 for RCTs
- 19 for observational studies
- 18 for diagnostic test studies
- 40 for a body of evidence
Very large effects
On the rare occasions when observational studies yield extremely large
and consistent estimates of a treatment effect, we may be confident in the
results.
For example:
Oral anticoagulation in mechanical heart valves has not been compared
to placebo in an RCT. However, observational studies suggest the
probability of a thromboembolic event without anticoagulation is 12.3%
annually in bileaflet prosthetic aortic valves, and higher for other valve
types. Estimates of the relative risk reduction with oral anticoagulation
are in the range of 80%.
While the observational studies are likely to overestimate the true effect,
the weak study design is very unlikely to explain the entire benefit.
An ACCP guideline panel concluded that these data constitute high
quality evidence of the effectiveness of anticoagulation in bileaflet
prosthetic aortic valves.
Effect sizes
As discussed, large effect sizes may promote observational evidence to be moderate
quality, and very large effect sizes may promote observational evidence to be high
quality.
In general, for consistent, well-obtained observational evidence:
- Large effect sizes are at least two-fold to three-fold relative effects (ie, a relative risk of 2.0-3.0 or 0.33-0.5)
- Very large effect sizes are at least five-fold to ten-fold relative effects (ie, a relative risk of 5.0-10.0 or 0.1-0.2)
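The thresholds above can be sketched as a small helper. This is a simplified illustration: the function name `classify_effect_size` is hypothetical, and using the lower bound of each range (two-fold and five-fold) as the cutoff is an assumption, since the text gives ranges rather than a single number.

```python
def classify_effect_size(relative_risk):
    """Classify a relative risk as 'very large', 'large', or 'not large'.

    Assumed cutoffs (lower bound of each quoted range):
      large      -> at least two-fold (RR >= 2.0 or RR <= 0.5)
      very large -> at least five-fold (RR >= 5.0 or RR <= 0.2)
    """
    if relative_risk <= 0:
        raise ValueError("relative risk must be positive")
    # Treat protective and harmful effects symmetrically
    fold = max(relative_risk, 1 / relative_risk)
    if fold >= 5:
        return "very large"
    if fold >= 2:
        return "large"
    return "not large"


print(classify_effect_size(0.4))   # large
print(classify_effect_size(8.0))   # very large
print(classify_effect_size(0.7))   # not large
```

A relative risk of 0.7, similar in size to the roughly 30% benefit estimated in the Nurses' Health Study, falls below both cutoffs. The classification applies only to consistent, well-obtained observational evidence, which the helper does not check.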
Direction of biases enhance observation
On other occasions, all plausible biases from observational studies may be working to underestimate an apparent treatment effect.
For instance, a rigorous systematic review of observational studies
compared for-profit (FP) and not-for-profit (NFP) hospital care and found
higher death rates in private for-profit hospitals. The investigators postulated
two possible sources of bias. First, residual confounding from disease
severity was possible, but patients in NFP hospitals were sicker than
those in FP hospitals, so if there were residual confounding it would bias
the results against NFP hospitals. Second, a higher number of patients
with excellent private insurance could lead to more hospital resources
that would spill over to benefit those without such coverage. Again, this
bias would tend to be against NFP hospitals that are likely to admit
a lower proportion of well-insured patients.
Because the plausible biases would diminish the demonstrated effect,
one might consider the evidence from these observational studies as
moderate rather than low quality.
Dose response
Another factor that could increase the quality of
evidence from observational studies is when a clear dose response effect is
seen. In such a circumstance the evidence might increase from low to moderate
quality.
To review, the factors that may raise the quality of evidence from observational studies are:
- Large magnitude of effect
- All plausible biases would reduce a demonstrated effect
- Dose-response gradient
Putting it together to grade a recommendation
- Start with PICO — define:
- Population
- Intervention
- Comparator
- Outcomes
- Summarize the relevant evidence
- If RCTs, start by assuming high quality, but then grade down for:
- Serious methodologic limitations
- Indirectness in population, intervention, or outcome
- Inconsistent results
- Imprecision in estimates
- High likelihood of publication bias
- If no RCTs, start by assuming low quality, but then grade up for:
- Large or very large treatment effects
- All plausible biases would diminish the effect of the intervention
- Dose response gradient
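The starting point and the grade-down and grade-up steps above can be sketched in a few lines. This is a simplified illustration, not a rule used by UpToDate: the function name `grade_evidence_quality` is hypothetical, and the caller decides how many levels to move (for example, one level for a large effect and two for a very large effect). Real grading calls for judgment rather than arithmetic.

```python
LEVELS = ["low", "moderate", "high"]  # Grade C, Grade B, Grade A


def grade_evidence_quality(has_rcts, levels_down=0, levels_up=0):
    """Return 'high', 'moderate', or 'low'.

    RCT evidence starts high and can only be graded down;
    observational evidence starts low and can only be graded up.
    The number of levels to move is supplied by the caller (an assumption
    of this sketch; the tutorial leaves that to the grader's judgment).
    """
    if has_rcts:
        index = 2 - levels_down
    else:
        index = 0 + levels_up
    # Clamp to the three available levels
    index = min(max(index, 0), len(LEVELS) - 1)
    return LEVELS[index]


# A well-performed RCT with imprecise estimates: down one level
print(grade_evidence_quality(has_rcts=True, levels_down=1))   # moderate
# Observational studies where all plausible biases diminish the effect
print(grade_evidence_quality(has_rcts=False, levels_up=1))    # moderate
# Observational studies with a very large, consistent effect
print(grade_evidence_quality(has_rcts=False, levels_up=2))    # high
```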
Once you have worked through the evidence:
- Decide on the best estimates of benefits, risks, burdens and costs for the relevant population
- Decide whether the benefits are, overall, worth the risks, burdens and costs
- Grade the recommendation as strong or weak, keeping the following in mind:
- Weak evidence will only rarely warrant a strong recommendation
- If unsure it is better to make a weak recommendation
Colchicine and pericarditis
Observational studies suggested that colchicine might prevent recurrence of acute idiopathic/viral pericarditis.
Consider PICO (Population, Intervention, Comparator, Outcomes):
- Population: Patients with acute pericarditis being treated with an NSAID
- Intervention: Oral colchicine in addition to an NSAID
- Comparator therapy: An NSAID alone
What are the important outcomes we would want to know about from studies of colchicine and pericarditis?
A. Rates of recurrence
B. Time to improvement of pericarditis
C. Gastrointestinal toxicity and rates of discontinuation of therapy
D. Rates of sudden cardiac death
(click the best answer)
A | A and B | A,B,C | A,B,C,D
Picking the right grading system
for UpToDate:
- Many grading systems are not well designed for clinical recommendations because they are:
- Focused on individual studies
- Complex
- Many grading systems use a rigid hierarchy of evidence (RCTs, cohort, case-control, case series) that is simplistic and misunderstands the meaning of evidence
COPE Trial
In addition to the observational evidence, a randomized trial is available. The COPE trial, published in 2005, enrolled 120 patients with a first episode of
acute pericarditis. Patients were treated with aspirin either alone or in
combination with colchicine. The following findings were noted:
- Recurrence rate lower with colchicine (11 versus 32 percent)
- Lower rate of persistent symptoms at 72 hours (12 versus 37 percent)
- Colchicine discontinued for diarrhea in 8 percent of patients
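To put these percentages on a common footing, the following minimal sketch converts them into absolute risk reductions and numbers needed to treat (NNT). The helper names are hypothetical, and rounding the NNT up to a whole patient is a convention assumed here. Rates are passed as whole-number percentages, exactly as quoted above.

```python
import math


def absolute_risk_reduction(control_pct, treated_pct):
    """Absolute risk reduction in percentage points."""
    return control_pct - treated_pct


def number_needed_to_treat(control_pct, treated_pct):
    """Patients to treat to prevent one event, rounded up (assumed convention)."""
    arr = absolute_risk_reduction(control_pct, treated_pct)
    if arr <= 0:
        raise ValueError("no absolute benefit, so NNT is undefined")
    return math.ceil(100 / arr)


# Recurrence: 32 percent with aspirin alone, 11 percent with added colchicine
print(absolute_risk_reduction(32, 11), number_needed_to_treat(32, 11))  # 21 5
# Persistent symptoms at 72 hours: 37 percent versus 12 percent
print(absolute_risk_reduction(37, 12), number_needed_to_treat(37, 12))  # 25 4
```

On these figures, about five patients would need to receive colchicine to prevent one recurrence. In a trial of 120 patients these estimates rest on a small number of events, which bears on how the evidence is graded.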
Grade the evidence
We want to make a
recommendation about whether patients with acute idiopathic pericarditis
who are treated with NSAIDs should also receive colchicine.
Before we decide on the strength of the recommendation, we need to
decide the quality of the evidence for colchicine in this setting.
Is the evidence:
(click the best answer)
Strength of recommendation
Now that we have
decided we have moderate quality evidence, we want to grade the strength
of the recommendation for colchicine in acute pericarditis.
Should this recommendation be:
(click the best answer)
Writing the recommendation
- We've decided we have moderate quality (Grade B) evidence
- We've decided to make a weak recommendation
If the authors decide to
recommend use of colchicine in this setting, we can
write:
"In patients with acute pericarditis that is idiopathic or due to
a presumed viral infection and who are being treated with NSAIDs, we
suggest the addition of colchicine (Grade 2B). A typical dose of colchicine is 1
to 2 mg on the first day, followed by 0.5 mg once or twice daily for three months. Patients who are concerned about gastrointestinal side effects might
reasonably choose not to take colchicine."
Note that if we want to make a recommendation for NSAID therapy in this
situation, we should evaluate the evidence and grade that recommendation
separately. Note also that we did not include the dose of colchicine in
the graded recommendation, since it is very unlikely that we have high
quality evidence for a specific dose and regimen of colchicine. We do not
usually grade doses or regimens.
Some final points to keep in mind:
- Nearly all recommendations for treatment and screening that appear in the Summary and Recommendations section of an UpToDate topic should be graded.
- We are not grading diagnostic recommendations at this point.
- We also do not need to grade "slam dunk" recommendations where there is no reasonable alternative course of action, or safety tips for procedures. For instance, we would not grade a recommendation to administer oxygen to someone who is severely hypoxic.
- There will be situations in which reasonable people can disagree about the quality of the evidence and the strength of the recommendation. We do not need to be perfect about evidence grades, just transparent.
- When in doubt, make weak recommendations (Grade 2) rather than strong recommendations (Grade 1).
Gordon Guyatt and the GRADE collaborative
- In 2001 we began a collaboration with Gordon Guyatt and some of his colleagues from McMaster University
- Dr. Guyatt coined the term "evidence-based medicine" in the early 1990s and has since become a world leader in this area
- We worked with Dr. Guyatt and his colleagues at the GRADE collaborative to implement a grading system for UpToDate
- The GRADE working group is an international collaboration begun in 2000, and aimed at developing a sensible and broadly accepted approach to grading quality of evidence and strength of recommendations
The selected system, GRADE, is straightforward and
works well for clinical recommendations:
- There are two levels of recommendation strength (1 or 2)
- Strong
- Weak
- There are three levels of quality of evidence (A,B,C)
- High
- Moderate
- Low
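The two strength levels and three quality levels combine into six grade labels, 1A through 2C. As a minimal sketch, with hypothetical names, the mapping from a decision to its label and to the wording this tutorial prescribes ("We recommend" or "We suggest") could look like this:

```python
STRENGTH_CODES = {"strong": "1", "weak": "2"}
QUALITY_CODES = {"high": "A", "moderate": "B", "low": "C"}
WORDING = {"strong": "We recommend", "weak": "We suggest"}


def grade_label(strength, quality):
    """Combine recommendation strength and evidence quality, e.g. '2B'."""
    return STRENGTH_CODES[strength] + QUALITY_CODES[quality]


def opening_phrase(strength):
    """Return the phrase the tutorial asks authors to use for this strength."""
    return WORDING[strength]


# A weak recommendation supported by moderate quality evidence
print(grade_label("weak", "moderate"))   # 2B
print(opening_phrase("weak"))            # We suggest
```

Keeping strength and quality as separate inputs mirrors the point that they are judged separately: high quality evidence can still support only a weak recommendation (Grade 2A) when benefits and burdens are closely balanced.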
We are grading treatment and screening
recommendations, but not diagnostic recommendations. We should still make
recommendations about diagnosis, but should not grade these recommendations since we don't feel that
there is currently a good system for grading
the quality of evidence for diagnostic strategies.
Adoption of GRADE
- GRADE is being used by more and more groups including:
- American College of Physicians
- The Cochrane Collaboration
- American College of Chest Physicians
- European Society of Thoracic Surgeons
- American Thoracic Society
- Endocrine Society
- Agency for Healthcare Research and Quality
- Society of Critical Care Medicine
Grading Recommendations in UpToDate
As mentioned, GRADE classifies recommendations into two levels
- Strong recommendations (Grade 1)
- Weak recommendations (Grade 2)
Actually, all of these are types of evidence.
Randomized trials usually provide the highest quality
evidence, but observational studies and sometimes
even unpublished clinical experience can provide
high quality evidence.
Actually, all of these are types of evidence.
Experimental trials generally provide higher quality evidence than observational studies and unpublished clinical experience, but sometimes even clinical experience can provide high quality evidence.
Actually, all of these are types of evidence.
Experimental and observational studies often provide higher quality evidence than published case series and unpublished clinical experience, but sometimes even clinical experience can provide high quality evidence.
You're correct that all these kinds of publications are types of evidence, but so is clinical experience.
Clinical experience is often very low quality evidence, but sometimes even unpublished clinical experience can provide high quality evidence.
Grade of Recommendation | Clarity of risk/benefit | Quality of supporting evidence | Implications |
1A. Strong recommendation. High quality evidence | Benefits clearly outweigh risk and burdens, or vice versa | Consistent evidence from well performed randomized, controlled trials or overwhelming evidence of some other form. Further research is unlikely to change our confidence in the estimate of benefit and risk. | Strong recommendation, can apply to most patients in most circumstances without reservation |
1B. Strong recommendation. Moderate quality evidence | Benefits clearly outweigh risk and burdens, or vice versa | Evidence from randomized, controlled trials with important limitations (inconsistent results, methodologic flaws, indirect or imprecise), or very strong evidence of some other form. Further research (if performed) is likely to have an impact on our confidence in the estimate of benefit and risk and may change the estimate. | Strong recommendation, likely to apply to most patients |
1C. Strong recommendation. Low quality evidence | Benefits appear to outweigh risk and burdens, or vice versa | Evidence from observational studies, unsystematic clinical experience, or from randomized, controlled trials with serious flaws. Any estimate of effect is uncertain. | Relatively strong recommendation; might change when higher quality evidence becomes available |
2A. Weak recommendation. High quality evidence | Benefits closely balanced with risks and burdens | Consistent evidence from well performed randomized, controlled trials or overwhelming evidence of some other form. Further research is unlikely to change our confidence in the estimate of benefit and risk. | Weak recommendation, best action may differ depending on circumstances or patients or societal values |
2B. Weak recommendation. Moderate quality evidence | Benefits closely balanced with risks and burdens, some uncertainty in the estimates of benefits, risks and burdens | Evidence from randomized, controlled trials with important limitations (inconsistent results, methodologic flaws, indirect or imprecise), or very strong evidence of some other form. Further research (if performed) is likely to have an impact on our confidence in the estimate of benefit and risk and may change the estimate. | Weak recommendation, alternative approaches likely to be better for some patients under some circumstances |
2C. Weak recommendation. Low quality evidence | Uncertainty in the estimates of benefits, risks, and burdens; benefits may be closely balanced with risks and burdens | Evidence from observational studies, unsystematic clinical experience, or from randomized, controlled trials with serious flaws. Any estimate of effect is uncertain. | Very weak recommendation; other alternatives may be equally reasonable. |
Grading Recommendations in UpToDate®
This tutorial was created for UpToDate authors, section editors, and peer reviewers, but it is available for anyone who wants to learn about GRADE. We expect it will take you about 30 minutes to complete.
Before you start, we suggest you print this Grading Table.
Actually, the recommendation for aspirin after MI is likely to be stronger.
Even though many more patients require aspirin therapy than pulmonary rehab to improve one outcome, the outcome of preventing death is more important to most patients than mild relief of dyspnea.
Correct! The recommendation for aspirin after MI is likely to be stronger.
Even though many more patients require aspirin therapy than pulmonary rehab to improve one outcome, most patients place a higher value on preventing death than
on mild relief of dyspnea.
This is probably not the best choice
We have an apparently well-performed randomized
trial in exactly the population we are interested in. The trial
examined all the important outcomes
we were concerned about. However, the number of events was small, and so
our confidence in the rates of events with and without colchicine must
be reduced. This is best graded as moderate quality (Grade B) evidence.
Right!
We have an apparently well-performed randomized trial in exactly the
population we are interested in. The trial examined all the important
outcomes we were concerned about. However, the number of events
was small, and so our confidence in the rates of events with and
without colchicine must be reduced. This is best graded as
moderate quality (Grade B) evidence.
No, the quality of evidence is better than that.
We have an apparently well-performed randomized trial in exactly the
population we are interested in. The trial examined all the important
outcomes we were concerned about. Because the number of
events was small, and our confidence in the rates of events with
and without colchicine must be reduced, we can downgrade one level from A
to B. This is best graded as moderate quality (Grade
B) evidence.
Yes, but we would like additional information as well.
While lowering the rate of recurrence is the primary goal of
administering colchicine, we might reasonably wonder about the effects of
this therapy on the time course of the pericarditis.
We would also want to know about gastrointestinal toxicity, a common
side effect of colchicine, and rates of discontinuation of
therapy.
Yes, but we would like additional information as well.
Knowing the potentially beneficial effects of therapy on recurrence
rates and time to resolution is clearly important. However, we also want
to know about likely downsides to therapy.
We would also want to know about gastrointestinal toxicity, a common
side effect of colchicine, and rates of discontinuation of
therapy.
Yes, that's correct.
We want to know about the potential benefits of
therapy as well as the likely side effects of therapy.
We probably do not need information on sudden cardiac death.
Knowing the potentially beneficial effects of therapy on recurrence
rates and time to resolution is clearly important, and we also want to
know about likely side effects of therapy such as gastrointestinal
toxicity.
Acute viral pericarditis is generally a benign condition with a very
low risk of death. Similarly, colchicine in other settings (such as acute
gout) has not been associated with sudden death. As such, it would likely
take an enormous study to evaluate any effect of colchicine on sudden
cardiac death. We should be able to evaluate the evidence without
requiring additional information on sudden death.
Whatever recommendation we make (for or against colchicine) should probably be weak for two reasons:
First, our confidence in certain patient-important outcomes was reduced
by the small number of events in the study.
Second, colchicine appeared to decrease the rate of recurrent
pericarditis by about 20 percentage points (from 32 to 11 percent), but required an extra pill that could be
expected to cause diarrhea in many patients. In the COPE trial 8 percent
of patients stopped the study pill because of diarrhea, and patients
outside of clinical trials often have worse problems with side effects
than those in such trials.
While some patients would likely be better off being treated with
colchicine, a substantial subset of fully informed patients might be
expected to decline treatment with colchicine.
In this situation, a weak recommendation for or against colchicine is the best
choice.
Right!
This should be a weak recommendation for two reasons:
First, our confidence in certain patient-important outcomes was reduced
by the small number of events in the study.
Second, colchicine appeared to decrease the rate of recurrent
pericarditis by about 20 percentage points (from 32 to 11 percent), but required an extra pill that could be
expected to cause diarrhea in many patients. In the COPE trial 8 percent
of patients stopped the study pill because of diarrhea, and patients
outside of clinical trials often have worse problems with side effects
than those in such trials.
While some patients would likely be better off being treated with
colchicine, a substantial subset of fully informed patients might be
expected to decline treatment with colchicine.
In this situation, a weak recommendation for or against colchicine
is the best choice.
Strong, Right!
Short-term aspirin reduces the relative risk of death after MI by approximately 25% with minimal side effects and very low cost. If they understood the choice they were making, virtually all patients suffering an MI would choose to receive aspirin.
No, it should be a weak recommendation.
Long-term treatment with warfarin will decrease the risk of recurrent DVT (by about 10% per year), but warfarin therapy has burdens of taking a daily pill, maintaining a constant dietary intake of vitamin K, monitoring INR, and carries the increased risk of minor and major bleeding. Patients who strongly prefer to avoid the risk of DVT may choose to continue warfarin. Others are likely to consider the benefit not worth the risks and inconvenience.
No, it should be a strong recommendation.
Short-term aspirin reduces the relative risk of death after MI by approximately 25% with minimal side effects and very low cost. If they understood the choice they were making, virtually all patients suffering an MI would choose to receive aspirin.
Weak, Right!
Long-term treatment with warfarin will decrease
the risk of recurrent DVT (by about 10% per year), but warfarin therapy
has burdens of taking a daily pill, maintaining a constant dietary intake
of vitamin K, monitoring INR, and carries the increased risk of minor and
major bleeding. Patients who strongly prefer to avoid the risk of DVT may
choose to continue warfarin. Others are likely to consider the benefit not
worth the risks and inconvenience.