Fragility Index in Cardiovascular Randomized Controlled Trials

Originally published Cardiovascular Quality and Outcomes. 2019;12:e005755



    Efficacy of an intervention is commonly evaluated using P values, in addition to effect size measures such as absolute risk reduction, relative risk reduction, and numbers needed to treat. However, these measures are not always intuitive to clinicians. The fragility index (FI) is a more intuitive number that can facilitate interpretation but can only be used with binary outcomes. FI is the minimum number of patients who must be moved from the nonevent group to the event group to turn a significant result nonsignificant. In this retrospective analysis, we used the FI to assess the robustness of cardiovascular randomized controlled trials (RCTs) that reported a positive (statistically significant) primary outcome.

    Methods and Results:

    We searched Medline from 2007 to 2017 to identify cardiovascular RCTs published in 6 high-impact journals (The Lancet, New England Journal of Medicine, Journal of the American Medical Association, Circulation, Journal of the American College of Cardiology, and European Heart Journal). Only RCTs with sample sizes ≥500 and a 2-by-2 factorial design or dichotomous primary outcomes were selected. FI was calculated using a defined approach. Among the cohort of 123 RCTs that met inclusion criteria, median FI was 13 (interquartile range, 5–26). In 28 trials (22.8%), FI ranged between 1 and 4. In 37 trials (30.1%), the number of patients lost to follow-up was higher than the FI. Pharmaceutical interventions had higher FI compared with other interventions, FI=19 (7–52; P=0.002). Median FI varied according to subspecialty (electrophysiology=12; heart failure=11; interventional cardiology=8; P=0.020), and multiregional RCTs had higher FI=22 (12–53.25; P=0.023). FI did not differ based on risk of bias indicators, funding, or publication year.


    Considerable variations in FI were observed among cardiovascular trials, suggesting the need for careful interpretation of results, particularly when number of patients lost to follow-up exceeds FI.

    What Is New?

    • This is the first study to assess the robustness of cardiovascular randomized controlled trials using the fragility index (FI).

    • More than one-third of cardiovascular randomized controlled trials in 6 top-tier journals had FI below 8, suggesting that the results would importantly change if a handful of patients had different outcomes.

    • In ≈30% of randomized controlled trials, the number of patients lost to follow-up exceeded the FI.

    What Are the Clinical Implications?

    • The FI is an easy to calculate index that provides clinicians and readers with an alternative, more intuitive way to understand the precision of trial results in addition to traditionally used metrics.

    • The comparison of the FI to numbers lost to follow-up can help patients and clinicians understand the uncertainty of the evidence and subsequently make more informed decisions when evaluating the efficacy and safety of an intervention.


    See Editorial by Seligman et al

    Randomized controlled trials (RCTs) are considered the most reliable source of evidence by which to guide clinical care.1 Therefore, it is critical for a trial to report results in a manner that is easy to interpret. Summary estimates are traditionally reported with 95% confidence intervals, whereby a P≤0.05 is considered to be of sufficient statistical significance to reject the null hypothesis. However, in some cases, a different P threshold might also be used. Nevertheless, such statistics are subject to misinterpretation, especially when event rates and sample sizes are small and follow-up times are limited.2,3 Multiple factors such as power of the trial to detect outcome differences, overall event rates, patient loss to follow-up, and the subject’s exposure time to the intervention for a relatively rare outcome to emerge can greatly affect the P value.4,5 The fragility index (FI) has emerged as an innovative tool to ease interpretation of statistical findings, which may provide additional value over the commonly reported P value, risk reductions, and confidence interval.6,7 However, nondichotomous outcomes cannot be assessed using this index. The FI is defined as the minimum number of patients who must be moved from the nonevent group to the event group to turn a significant result to a nonsignificant one.1 While the FI of trials in multiple specialties has been assessed, individual trials in all fields including cardiovascular medicine have been slow to adopt the use of FI.6–13 To fill this knowledge gap, we conducted this study to assess the FI of cardiovascular RCTs published in high-impact journals.


    The authors declare that all supporting data are available within the article and its Data Supplement.

    Data Sources

    In this retrospective analysis, we conducted a Medline search using the search specification: (“Lancet (London, England)”[Journal] OR “The New England journal of medicine”[Journal]) OR “Journal of the American College of Cardiology”[Journal]) OR “Journal of the American Medical Association”[Journal] OR “Circulation”[Journal] OR “European heart journal”[Journal] AND (Randomized Controlled Trial[ptyp] AND (“2007/01/01”[PDAT]: “2017/12/31”[PDAT])). No search restrictions were applied. Since publicly available data were utilized, institutional review board approval was not applicable.

    Journal Selection

    Journals were selected for the present study based on a combination of the following features: impact factor, range of circulation of studies, specialization in publication of cardiovascular RCTs, and global recognition for consistent publication of influential RCTs over the last several decades. The New England Journal of Medicine, The Lancet and Journal of the American Medical Association were selected for having the highest impact factors in General Medicine, while Journal of the American College of Cardiology, European Heart Journal, and Circulation were selected for having the highest impact factors in the field of cardiovascular medicine and being acclaimed for focusing primarily on the publication of cardiovascular studies.

    Study Selection

    All RCTs were assessed for inclusion from the 3 cardiovascular journals, namely Circulation, Journal of the American College of Cardiology, and European Heart Journal. RCTs from The Lancet, New England Journal of Medicine, and Journal of the American Medical Association, the 3 non-cardiovascular journals, were first screened at abstract and title level for determination of possible cardiovascular nature. RCTs which mentioned an intervention, outcome, or recruited sample population as being cardiovascular (related to any of the following cardiovascular clinical subspecialties: heart failure, interventional cardiology, preventive cardiology, electrophysiology, cardiac imaging, and other) or related to stroke, anywhere in the title or abstract, were included.

    The following eligibility criteria were applied: (1) phase 3 or 4 RCTs studying a cardiovascular intervention or outcome; (2) sample size ≥500 patients (an arbitrary cutoff to delineate larger trials that are more likely to impact practice); (3) parallel arm study design; (4) at least 1 statistically significant binary outcome or time-to-event outcome. At abstract and title level screening, letters, editorials, systematic reviews/meta-analyses, opinions, observational studies, economic/cost-effectiveness analyses of RCTs, cohort nonrandomized studies, quasi-randomized trials, and post hoc/secondary analyses of previously reported RCTs were excluded.

    Data Extraction

    Two reviewers (Drs Ochani and Shaikh) independently screened all the RCT abstracts based on a priori eligibility criteria. Data were extracted on a prespecified data collection form, and discrepancies were resolved by referring to a third investigator (Dr Khan). The data collection form focused on details of statistically significant primary outcomes, including outcome type, primary study outcome, event rates and sample sizes of comparative groups, number of participants lost to follow-up, and follow-up duration of each trial. Following the statistical hierarchy, prespecified primary outcome was given preference if a trial had reported multiple outcomes as statistically significant.

    Additional data were also extracted for the location of the trial (Asia, Europe, North America, multiple countries, and other), blinding (double-blind, unblinded, single-blinded), centers (single center, multicenter), type of intervention (pharmaceutical, surgical and other), control (placebo-controlled, active comparator), type of funding (government or private), and whether the trial was conducted on intention to treat principle.

    Statistical Analysis

    The Statistical Package for the Social Sciences (v.23, International Business Machines Corporation, New York, NY) was used to conduct the analysis. Data from each trial were presented on a 2-by-2 contingency table, and the FI was calculated in the manner described by Walsh et al.6 For time-to-event outcomes, total number of events in each group over the entire follow-up time was included.6 Events were added to the smaller event group and nonevents were simultaneously subtracted, while maintaining a constant patient population. The Fisher exact test was then used to recalculate the 2-sided P value, iteratively adding events until the P value reached or exceeded 0.05. The number of additional events required to reach a P of ≥0.05 was defined as the FI. Loss to follow-up was compared with FI for each trial, as it affects both the number of study participants at risk and the number of recorded events.1 We reported the overall FI and FIs by subgroups (clinical subspecialty, year of publication and country of origin, overall sample sizes, and event rates per study outcome) as medians with their interquartile ranges. We used the Pearson correlation to assess for a relationship between FI and impact factor. The Kruskal-Wallis test was applied to find possible statistically significant relationships between FI and nominal variables when there were >2 groups, and the Mann-Whitney U test was applied when there were 2 groups. In addition, we used the Pearson correlation to assess for a relationship between sample size of the RCT and FI. Last, we conducted a descriptive analysis of additional data that were collected and compared frequencies for each item by journal and median FIs of those journals. The FI is an absolute measure that does not account for sample size, which makes it difficult to compare the fragility of different RCTs or to set a standard value below which results are defined as fragile. To overcome this, we calculated the fragility quotient (FQ), as previously reported in the literature,14–17 which is the FI divided by the sample size. This enabled us to see what proportion of patients would need to be moved from the nonevent group to the event group to make the results nonsignificant.
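
    The iterative procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the SPSS workflow used in the study: the function names (fisher_exact_two_sided, fragility_index, fragility_quotient) and the example counts are hypothetical, and the 2-sided Fisher P value uses the common convention of summing every table that is no more probable than the observed one, which may differ slightly from other software.

```python
from math import exp, lgamma


def _log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """2-sided Fisher exact P value for the 2-by-2 table [[a, b], [c, d]].

    Assumed convention: sum the probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    log_total = _log_comb(row1 + row2, col1)

    def log_prob(x: int) -> float:
        return _log_comb(row1, x) + _log_comb(row2, col1 - x) - log_total

    observed = log_prob(a)
    p_value = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        lp = log_prob(x)
        if lp <= observed + 1e-7:  # small tolerance for floating-point ties
            p_value += exp(lp)
    return min(1.0, p_value)


def fragility_index(events_1: int, n_1: int, events_2: int, n_2: int,
                    alpha: float = 0.05) -> int:
    """Events added to the arm with fewer events until P >= alpha.

    Each added event converts one nonevent in that arm, so arm sizes stay
    constant. Returns 0 if the starting table is already nonsignificant.
    """
    e1, e2 = events_1, events_2
    add_to_first = e1 <= e2  # arm with fewer events, fixed at the start
    fi = 0
    while fisher_exact_two_sided(e1, n_1 - e1, e2, n_2 - e2) < alpha:
        if add_to_first:
            if e1 == n_1:  # no nonevents left to convert
                break
            e1 += 1
        else:
            if e2 == n_2:
                break
            e2 += 1
        fi += 1
    return fi


def fragility_quotient(fi: int, total_sample_size: int) -> float:
    """FI divided by the total sample size."""
    return fi / total_sample_size


if __name__ == "__main__":
    # Hypothetical trial: 1/100 events in one arm versus 9/100 in the other.
    fi = fragility_index(1, 100, 9, 100)
    print(fi, fragility_quotient(fi, 200))
```

    Working in log space with lgamma keeps the hypergeometric probabilities numerically stable for trials with tens of thousands of participants, which a direct factorial calculation in floating point would not.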


    Figure 1 shows the detailed literature search process. Out of 4994 search results, 123 RCTs met the eligibility criteria (Data Supplement). The Table displays the number of RCTs in each subgroup, along with median FIs. The range of sample size in the analyzed trials was 500 to 50 156. Of the 123 RCTs, 59 (48%) were published in New England Journal of Medicine, 14 (11.4%) in Journal of the American College of Cardiology, 24 (19.5%) in The Lancet, 9 (7.3%) in Journal of the American Medical Association, 12 (9.8%) in Circulation, and 5 (4.1%) in European Heart Journal. All included trials used a P of ≤0.05 as the threshold for significance.

    Table. Number of RCTs in Each Subgroup, Along With Their Median Fragility Index

    | Category | Subgroup | Number of Trials | Median Fragility Index (Interquartile Range) | P Value |
    |---|---|---|---|---|
    | Journal | New England Journal of Medicine | 59 | 17.00 (6.00–34.00) | 0.440 |
    | | The Lancet | 24 | 11.50 (5.00–21.50) | |
    | | Journal of the American Medical Association | 9 | 8.00 (2.50–19.00) | |
    | | Journal of the American College of Cardiology | 14 | 7.50 (4.75–23.00) | |
    | | European Heart Journal | 5 | 21.00 (10.00–27.50) | |
    | | Circulation | 12 | 12.50 (2.25–23.50) | |
    | Country of origin | Multiple countries | 44 | 22.00 (12.00–53.25) | 0.023* |
    | | United States | 20 | 11.00 (4.25–35.00) | |
    | | Europe | 45 | 8.00 (3.00–20.25) | |
    | | Asia | 8 | 11.00 (4.25–25.25) | |
    | | Others | 5 | 11.00 (4.50–21.50) | |
    | Blinding status | Single-blinded | 22 | 8.00 (5.75–21.25) | 0.097 |
    | | Double-blinded | 50 | 19.00 (5.50–56.50) | |
    | | Unblinded | 30 | 9.00 (4.75–18.50) | |
    | | Not reported | 21 | 15.00 (3.50–23.50) | |
    | Centers | Single center | 11 | 8.00 (3.00–19.00) | 0.170 |
    | | Multicenter | 110 | 14.50 (5.00–28.25) | |
    | | Not reported | 2 | 5.50 (…) | |
    | Trial phase | Phase 3 | 99 | 12.00 (4.00–26.00) | 0.733 |
    | | Phase 4 | 24 | 14.50 (6.00–26.00) | |
    | Control group | Placebo | 51 | 13.50 (5.00–31.50) | 0.747 |
    | | Active | 72 | 12.00 (5.00–25.00) | |
    | Intervention | Surgical | 25 | 8.00 (4.50–13.00) | 0.002* |
    | | Pharmaceutical | 71 | 19.00 (7.00–52.00) | |
    | | Other | 27 | 7.00 (3.00–20.00) | |
    | Funding type | Government | 25 | 9.00 (3.50–19.00) | 0.136 |
    | | Private | 91 | 17.00 (6.00–34.00) | |
    | | No funding | 2 | 9.50 (…) | |
    | | Not reported | 5 | 4.00 (1.00–88.50) | |
    | Treatment principle | Intention to treat used | 105 | 12.00 (5.00–25.00) | 0.319 |
    | | Intention to treat not used | 18 | 14.50 (7.50–32.25) | |

    RCT indicates randomized controlled trial.

    *Statistically significant P value for Mann-Whitney U test or Kruskal-Wallis test.

    Figure 1. Details of the literature search. RCT indicates randomized controlled trial.

    The median FI was 13 (5–26). The median total sample size was 2466 (1005–7513), with medians of 1035 (485–3264) in the control arms and 1043 (499–3241) in the intervention arms. The median total number of events was 241 (92–605), with medians of 128 (51–355) for controls and 102 (40–262) for interventions. The median follow-up time was 12 months (3.5–36), while the median loss to follow-up was 17 (3–32.5).

    Figure 2 shows the number of RCTs within each range of FI. We observed that 13% of the RCTs (n=16) had an FI between 1 and 2, 9.8% (n=12) between 2 and 4, 17.1% (n=21) between 4 and 8, 17.1% (n=21) between 8 and 16, 22.8% (n=28) between 16 and 32, 8.9% (n=11) between 32 and 64, 9.8% (n=12) between 64 and 128, while 1.6% (n=2) had FI between 128 and 256. In 37 RCTs (30.1%), patients lost to follow-up were greater than the FI. In the studies with loss to follow-up greater than the FI, the median FI was 8 (3.5–15.5) and the median loss to follow-up was 29 (17–115.5).
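
    As a quick arithmetic check, the counts reported above can be tallied in a few lines of Python. This is only an illustrative sketch: the dictionary keys are hypothetical labels for the FI ranges as worded in the text (the text does not state which bin a boundary value falls in), and the counts are copied from the paragraph above.

```python
# Trial counts per FI range, copied from the text describing Figure 2.
# The keys are illustrative labels, not the figure's own axis labels.
fi_bins = {
    "1-2": 16,
    "2-4": 12,
    "4-8": 21,
    "8-16": 21,
    "16-32": 28,
    "32-64": 11,
    "64-128": 12,
    "128-256": 2,
}

total_trials = sum(fi_bins.values())

# Percentage of all trials in each range, rounded to 1 decimal place.
shares = {label: round(100 * n / total_trials, 1) for label, n in fi_bins.items()}

# Trials with FI between 1 and 4, and trials in the three lowest ranges.
low_fi_trials = fi_bins["1-2"] + fi_bins["2-4"]
below_8_trials = low_fi_trials + fi_bins["4-8"]
below_8_share = round(100 * below_8_trials / total_trials, 1)

if __name__ == "__main__":
    print(total_trials, low_fi_trials, below_8_trials, below_8_share)
    print(shares)
```

    The tally reproduces the 123 included trials and the 28 trials with FI between 1 and 4, and the three lowest ranges together account for more than one-third of the cohort.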

    Figure 2. Number of randomized controlled trials (RCTs) within each range of fragility index.

    FI and impact factor (r=−0.064; P=0.485) were not statistically related; however, FI and sample size demonstrated a weak statistically significant positive correlation (r=0.324; P<0.001; Figure 3). The median FQ was 0.0042 (0.0020–0.0110). The median FQ of studies with an FI lower than the loss to follow-up was 0.0031 (0.0010–0.0072), while the median FQ of the remaining studies was 0.0050 (0.0023–0.0127).

    Figure 3. Scatterplot of fragility index against sample size. *The x axis of the graph is a logarithmic scale of the sample size.

    FI did not differ significantly among journals (P=0.440): FI was 21 (10–27.5) for European Heart Journal, 17 (6–34) for New England Journal of Medicine, 12.5 (2.25–23.5) for Circulation, 11.5 (5–21.5) for The Lancet, 8 (2.5–19) for Journal of the American Medical Association, and 7.5 (4.75–23) for Journal of the American College of Cardiology. FI did differ by country of origin (P=0.023). Multiregional RCTs were the most robust, FI=22 (12–53.25). Trials conducted exclusively in Asia and in other countries had findings that were as robust (FI=11 [4.25–25.25] and FI=11 [4.5–21.5], respectively) as trials conducted in the United States (FI=11 [4.25–35]).

    Results of RCTs based on study design, double-blind, single-blind and unblinded, did not differ in degree of robustness (FI=19 [5.5–56.5], FI=8 [5.75–21.25] and FI=9 [4.75–18.5], respectively; P=0.097). Locations where RCTs were conducted, multicenter or single center, did not have an impact on the degree of their robustness (FI=14.5 [5–28.25], FI=8 [3–19], respectively; P=0.170). The type of the trial, placebo controlled or active-comparator, also did not predict a difference in robustness (FI=13.5 [5–31.5] and FI=12 [5–25], respectively; P=0.747).

    Intervention wise, trials reporting surgical interventions (FI=8 [4.5–13]), and other interventions, such as offering financial incentives, etc (FI=7 [3–20]) had less robust results, whereas RCTs with pharmaceutical interventions were considerably more robust, FI=19 (7–52; P=0.002).

    Clinical subspecialty analysis yielded substantial differences in FIs. Studies in preventive cardiology had the most robust results (FI=19 [8.5–43]), followed by electrophysiology (FI=12 [7–22]), heart failure (FI=11 [3.25–51.5]), interventional cardiology (FI=8 [5–19]), and other (FI=3 [1.75–13.25]; P=0.020).


    Main Findings

    In a cross-sectional view of positive studies published in top-tier medicine and cardiovascular journals, we observed considerable variation in the robustness of clinical findings. Our 2 main findings were that nearly 1 in 4 studies had a low FI (between 1 and 4), and in 37 trials (30.1%) the number of patients lost to follow-up was higher than the FI.

    Our observed median FI among cardiovascular trials was higher than those previously reported in other subspecialties, including critical care medicine,8 spinal surgery,9 sports medicine,10 and anesthesiology,12 all of which had FIs <5. The proportion of cardiovascular trials in which the number of patients lost to follow-up exceeded the FI (30.1%) also compares favorably with studies from other fields, in which loss to follow-up exceeded the FI in >40% to 70% of trials.6,9,11,13

    Although it is encouraging that cardiovascular trials compare well to trials in other fields,18 there is room for improvement in the design of cardiovascular trials, considering that the statistical significance of outcomes can be overturned by a change in a median of only ≈13 events across these trials.13 Furthermore, the FQ was relatively low (0.0042), meaning that a typical trial's result would become nonsignificant if just ≈0.4 patients per 100 had experienced a different outcome. Statistical significance, in the form of a simple P value, can be misleading and may represent a disconnect with the clinical impact of the intervention.19 While experienced clinicians and researchers may have a more in-depth interpretation of the results, some readers likely rely on the P value, as it remains the primary value around which investigators base their conclusions.20 Furthermore, some readers may not be familiar with the interpretation of other reported statistics such as power of the trial to detect estimate differences, event rates, patient loss to follow-up, and follow-up duration (allowing a sufficient time window for outcome differences to emerge), which can greatly affect the statistical significance of the results.4,5

    Examples of Fragile Results in Cardiovascular Trials

    McIntyre et al21 provide a good example of statistically fragile results when considering the most recent trials of patent foramen ovale closure versus medical therapy for the secondary prevention of cryptogenic stroke. Two of the 3 positive trials (REDUCE [Patent Foramen Ovale Closure or Antiplatelet Therapy for Cryptogenic Stroke] and RESPECT [Randomized Evaluation of Recurrent Stroke Comparing PFO Closure to Established Current Standard of Care Treatment]) had fragility indices lower than the number of patients lost to follow-up. In the long-term publication of RESPECT, the FI for the primary end point was 1, but 56 (11%) patients were lost to follow-up in the active arm and 67 (14%) in the control arm. Similarly, in REDUCE, the 4% absolute risk reduction in the primary end point in the active arm came with an FI of 5, which is considerably lower than the 22 total patients lost to follow-up in both arms.22,23

    In the CSPPT (China Stroke Primary Prevention Trial), which enrolled 20 702 participants with a history of hypertension and no previous cardiac events, Huo et al found that the addition of 0.8 mg folic acid to a regimen of 10 mg enalapril significantly reduced the risk of the primary outcome (first stroke) by 0.7% (2.7% versus 3.4%). The FI of this trial was calculated as 23. However, the number of patients lost to follow-up (n=68) was almost 3 times the FI.24

    The ARISTOTLE trial (Apixaban for Reduction in Stroke and Other Thromboembolic Events in Atrial Fibrillation; 18 201 participants) found that a twice-daily dose of 5 mg apixaban was superior to warfarin (target international normalized ratio, 2.0–3.0) in significantly reducing the primary outcome of hemorrhagic stroke, ischemic stroke, or systemic embolism in atrial fibrillation by 0.33% per year (1.27% versus 1.60%). The FI of this trial was calculated as 12; however, the number of subjects lost to follow-up (69) exceeds the number of events needed to change the results from significant to nonsignificant by almost 6 times.25
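
    The comparison made in these examples, loss to follow-up set against the FI, can be sketched as follows. This is an illustration only: the TrialSummary type and helper names are hypothetical, the FI and loss-to-follow-up figures are those quoted in the preceding paragraphs, and the RESPECT total assumes the two arms (56 and 67) are simply added.

```python
from typing import NamedTuple


class TrialSummary(NamedTuple):
    """Hypothetical container for the figures quoted in the text."""
    name: str
    fragility_index: int
    lost_to_follow_up: int


# Values as quoted above; RESPECT sums both arms (56 + 67).
trials = [
    TrialSummary("RESPECT", 1, 56 + 67),
    TrialSummary("REDUCE", 5, 22),
    TrialSummary("CSPPT", 23, 68),
    TrialSummary("ARISTOTLE", 12, 69),
]


def loss_to_fi_ratio(trial: TrialSummary) -> float:
    """How many times the loss to follow-up exceeds the FI."""
    return trial.lost_to_follow_up / trial.fragility_index


def loss_exceeds_fi(trial: TrialSummary) -> bool:
    """True when more patients were lost than the FI."""
    return trial.lost_to_follow_up > trial.fragility_index


if __name__ == "__main__":
    for trial in trials:
        print(trial.name, loss_exceeds_fi(trial), round(loss_to_fi_ratio(trial), 2))
```

    A ratio above 1 flags a trial in which the outcomes of the patients lost to follow-up could, in principle, have been enough to change the statistical significance of the result.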

    We found no statistically significant associations between the FI and journal of publication, its impact factor, or the year of publication. This suggests that cardiovascular RCTs published in top-tier journals over the last decade have similar degrees of robustness. We did observe higher fragility indices from trials performed on many continents and trials within the subspecialty of preventive cardiology.

    Our findings compare well to a previous analysis by Docherty et al,26 who assessed the FI of clinical trials underpinning heart failure guidelines. Their analysis, which included trials from the 1980s, found a median FI of 26 (range 0–118), an FI of <10 in 35% of trials, and FI less than the number lost to follow-up in 20% of trials.

    Practical Implications

    The FI is an easy to calculate index that any clinician can use. It provides readers with an alternative way to evaluate robustness of trial results in addition to the P value and effect size estimates. In 2016, the American Statistical Association released a statement on the context, process, and purpose of P values.27 Their concluding sentence was that “no single index should substitute for scientific reasoning.”

    Thus, the FI and its comparison to the number of patients lost to follow-up offer users of the medical evidence (patients and clinicians alike) another way to assess the efficacy and safety of a therapeutic intervention. Knowing that the number of patients lost to follow-up exceeds the number of events needed to turn a pivotal trial to statistically insignificant results can be an important factor in decision making. Shared decision-making tools could incorporate the FI as a way to assist in conveying the level of certainty when trading benefits and harms.28 It must be remembered that a high FI does not necessarily indicate a robust result: if the loss to follow-up exceeds the FI, the results still need to be interpreted with caution. Similarly, it is crucial to realize that a low FI does not mean that the results of an RCT should be considered trivial. RCTs are performed with finite resources and are thus constructed to maintain an equilibrium between sample size and anticipated efficacy. This will, by design, lower the FI, making the results depend on fewer events. Multiple factors including trial design and elimination of biases should also be taken into account, in addition to the FI, when interpreting results.29


    Our study has limitations. First, we only included trials that were of 2-by-2 factorial design. We were not able to evaluate RCTs with primary end points that were ordinal or continuous in nature. Thus, our analysis could miss some important cardiovascular RCTs. Therefore, if the FI is used on a wider scale, many studies will not be able to utilize it. Second, time-to-event outcomes were included in the analysis. While multiple studies have included such outcomes in the calculation of the FI,6,30 the FI does not account for the contribution of time to the difference in treatment effects.31 Third, our analysis did not take allocation concealment into account, and hence, any impact it had on FIs could not be evaluated. Last, we did not analyze median FIs of similar and differing intervention strategies and thus could not adequately establish whether actually comparing like-with-like end points yields more accurate differences in FIs.


    Although cardiovascular RCTs may be statistically significant based on the P value (P≤0.05), the results are often fragile, and the statistical significance of their findings may hinge on a small number of events. Therefore, when evaluating findings of trials, especially ones with large numbers of patients lost to follow-up and those in which patients lost to follow-up exceed the FI of that trial, clinicians should be cautious when translating results to bedside practice.


    The Data Supplement is available at

    Muhammad Shahzeb Khan, MD, John H. Stroger Jr. Hospital of Cook County, 1900 W Harrison St, Chicago, IL 60601. Email


    • 1. Narayan VM, Gandhi S, Chrouser K, Evaniew N, Dahm P. The fragility of statistically significant findings from randomised controlled trials in the urological literature. BJU Int. 2018; 122:160–166. doi: 10.1111/bju.14210
    • 2. Breau RH, Dahm P, Fergusson DA, Hatala R. Understanding results. J Urol. 2009; 181:985–992. doi: 10.1016/j.juro.2008.11.029
    • 3. Scales CD, Norris RD, Preminger GM, Vieweg J, Peterson BL, Dahm P. Evaluating the evidence: statistical methods in randomized controlled trials in the urological literature. J Urol. 2008; 180:1463–1467. doi: 10.1016/j.juro.2008.06.026
    • 4. Akl EA, Briel M, You JJ, Sun X, Johnston BC, Busse JW, Mulla S, Lamontagne F, Bassler D, Vera C, Alshurafa M, Katsios CM, Zhou Q, Cukierman-Yaffe T, Gangji A, Mills EJ, Walter SD, Cook DJ, Schünemann HJ, Altman DG, Guyatt GH. Potential impact on estimated treatment effects of information lost to follow-up in randomised controlled trials (LOST-IT): systematic review. BMJ. 2012; 344:e2809. doi: 10.1136/bmj.e2809
    • 5. Thorlund K, Imberger G, Walsh M, Chu R, Gluud C, Wetterslev J, Guyatt G, Devereaux PJ, Thabane L. The number of patients and events required to limit the risk of overestimation of intervention effects in meta-analysis–a simulation study. PLoS One. 2011; 6:e25491. doi: 10.1371/journal.pone.0025491
    • 6. Walsh M, Srinathan SK, McAuley DF, Mrkobrada M, Levine O, Ribic C, Molnar AO, Dattani ND, Burke A, Guyatt G, Thabane L, Walter SD, Pogue J, Devereaux PJ. The statistical significance of randomized controlled trial results is frequently fragile: a case for a Fragility Index. J Clin Epidemiol. 2014; 67:622–628. doi: 10.1016/j.jclinepi.2013.10.019
    • 7. Berti A, Cornec D, Medina Inojosa JR, Matteson EL, Murad MH. Treatments for giant cell arteritis: meta-analysis and assessment of estimates reliability using the fragility index. Semin Arthritis Rheum. 2018; 48:77–82. doi: 10.1016/j.semarthrit.2017.12.009
    • 8. Ridgeon EE, Young PJ, Bellomo R, Mucchetti M, Lembo R, Landoni G. The fragility index in multicenter randomized controlled critical care trials. Crit Care Med. 2016; 44:1278–1284. doi: 10.1097/CCM.0000000000001670
    • 9. Evaniew N, Files C, Smith C, Bhandari M, Ghert M, Walsh M, Devereaux PJ, Guyatt G. The fragility of statistically significant findings from randomized trials in spine surgery: a systematic survey. Spine J. 2015; 15:2188–2197. doi: 10.1016/j.spinee.2015.06.004
    • 10. Khan M, Evaniew N, Gichuru M, Habib A, Ayeni OR, Bedi A, Walsh M, Devereaux PJ, Bhandari M. The fragility of statistically significant findings from randomized trials in sports surgery: a systematic survey. Am J Sports Med. 2017; 45:2164–2170.
    • 11. Shen Y, Cheng X, Zhang W. The fragility of randomized controlled trials in intracranial hemorrhage. Neurosurg Rev. 2019; 42:9–14. doi: 10.1007/s10143-017-0870-8
    • 12. Mazzinari G, Ball L, Serpa Neto A, Errando CL, Dondorp AM, Bos LD, Gama de Abreu M, Pelosi P, Schultz MJ. The fragility of statistically significant findings in randomised controlled anaesthesiology trials: systematic review of the medical literature. Br J Anaesth. 2018; 120:935–941. doi: 10.1016/j.bja.2018.01.012
    • 13. Matics TJ, Khan N, Jani P, Kane JM. The fragility index in a cohort of pediatric randomized controlled trials. J Clin Med. 2017; 6:79.
    • 14. Checketts JX, Scott JT, Meyer C, Horn J, Jones J, Vassar M. The robustness of trials that guide evidence-based orthopaedic surgery. J Bone Joint Surg Am. 2018; 100:e85. doi: 10.2106/JBJS.17.01039
    • 15. Brown J, Lane A, Cooper C, Vassar M. The results of randomized controlled trials in emergency medicine are frequently fragile. Ann Emerg Med. 2019; 73:565–576. doi: 10.1016/j.annemergmed.2018.10.037
    • 16. Bowers A, Meyer C, Tritz D, Cook C, Fuller K, Smith C, Diener B, Vassar M. Assessing quality of randomized trials supporting guidelines for laparoscopic and endoscopic surgery. J Surg Res. 2018; 224:233–239. doi: 10.1016/j.jss.2017.11.061
    • 17. Wayant C, Meyer C, Gupton R, Som M, Baker D, Vassar M. The fragility index in a cohort of HIV/AIDS randomized controlled trials. J Gen Intern Med. 2019; 34:1236–1243. doi: 10.1007/s11606-019-04928-5
    • 18. Schultz MJ, Bos LD, Dondorp AM. How to improve quality of research in intensive care medicine. Ann Transl Med. 2018; 6:35. doi: 10.21037/atm.2018.01.18
    • 19. Schober P, Bossers SM, Schwarte LA. Statistical significance versus clinical importance of observed effect sizes: what do P values and confidence intervals really represent? Anesth Analg. 2018; 126:1068–1072. doi: 10.1213/ANE.0000000000002798
    • 20. Farland LV, Correia KF, Wise LA, Williams PL, Ginsburg ES, Missmer SA. P-values and reproductive health: what can clinical researchers learn from the American Statistical Association? Hum Reprod. 2016; 31:2406–2410. doi: 10.1093/humrep/dew192
    • 21. McIntyre WF, Spence J, Belley-Cote EP. Assessing the quality of evidence supporting patent foramen ovale closure over medical therapy after cryptogenic stroke. Eur Heart J. 2018; 39:3618–3619. doi: 10.1093/eurheartj/ehy496
    • 22. Søndergaard L, Kasner SE, Rhodes JF, Andersen G, Iversen HK, Nielsen-Kudsk JE, Settergren M, Sjöstrand C, Roine RO, Hildick-Smith D, Spence JD, Thomassen L; Gore REDUCE Clinical Study Investigators. Patent foramen ovale closure or antiplatelet therapy for cryptogenic stroke. N Engl J Med. 2017; 377:1033–1042. doi: 10.1056/NEJMoa1707404
    • 23. Saver JL, Carroll JD, Thaler DE, Smalling RW, MacDonald LA, Marks DS, Tirschwell DL; RESPECT Investigators. Long-term outcomes of patent foramen ovale closure or medical therapy after stroke. N Engl J Med. 2017; 377:1022–1032. doi: 10.1056/NEJMoa1610057
    • 24. Huo Y, Li J, Qin X, Huang Y, Wang X, Gottesman RF, Tang G, Wang B, Chen D, He M, Fu J, Cai Y, Shi X, Zhang Y, Cui Y, Sun N, Li X, Cheng X, Wang J, Yang X, Yang T, Xiao C, Zhao G, Dong Q, Zhu D, Wang X, Ge J, Zhao L, Hu D, Liu L, Hou FF; CSPPT Investigators. Efficacy of folic acid therapy in primary prevention of stroke among adults with hypertension in China: the CSPPT randomized clinical trial. JAMA. 2015; 313:1325–1335. doi: 10.1001/jama.2015.2274
    • 25. Granger CB, Alexander JH, McMurray JJ, Lopes RD, Hylek EM, Hanna M, Al-Khalidi HR, Ansell J, Atar D, Avezum A, Bahit MC, Diaz R, Easton JD, Ezekowitz JA, Flaker G, Garcia D, Geraldes M, Gersh BJ, Golitsyn S, Goto S, Hermosillo AG, Hohnloser SH, Horowitz J, Mohan P, Jansky P, Lewis BS, Lopez-Sendon JL, Pais P, Parkhomenko A, Verheugt FW, Zhu J, Wallentin L; ARISTOTLE Committees and Investigators. Apixaban versus warfarin in patients with atrial fibrillation. N Engl J Med. 2011; 365:981–992. doi: 10.1056/NEJMoa1107039
    • 26. Docherty KF, Campbell RT, Jhund PS, Petrie MC, McMurray JJV. How robust are clinical trials in heart failure? Eur Heart J. 2017; 38:338–345. doi: 10.1093/eurheartj/ehw427
    • 27. Wasserstein RL, Lazar NA. The ASA’s statement on p-values: context, process, and purpose. Am Stat. 2016; 70:129–133.
    • 28. Tignanelli CJ, Napolitano LM. The fragility index in randomized clinical trials as a means of optimizing patient care. JAMA Surg. 2019; 154:74–79. doi: 10.1001/jamasurg.2018.4318
    • 29. Carter RE, McKie PM, Storlie CB. The fragility index: a P-value in sheep’s clothing? Eur Heart J. 2017; 38:346–348. doi: 10.1093/eurheartj/ehw495
    • 30. Del Paggio JC, Tannock IF. The fragility of phase 3 trials supporting FDA-approved anticancer medicines: a retrospective analysis. Lancet Oncol. 2019; 20:1065–1069. doi: 10.1016/S1470-2045(19)30338-9
    • 31. Svantesson E, Hamrin Senorski E, Danielsson A, Sundemo D, Westin O, Ayeni OR, Samuelsson K. Strength in numbers? The fragility index of studies from the Scandinavian knee ligament registries [published online June 12, 2019]. Knee Surg Sports Traumatol Arthrosc. doi: 10.1007/s00167-019-05551-x