
      Minimum sample size for developing a multivariable prediction model: PART II ‐ binary and time‐to‐event outcomes


          Abstract

          When designing a study to develop a new prediction model with binary or time‐to‐event outcomes, researchers should ensure their sample size is adequate in terms of the number of participants (n) and outcome events (E) relative to the number of predictor parameters (p) considered for inclusion. We propose that the minimum values of n and E (and subsequently the minimum number of events per predictor parameter, EPP) should be calculated to meet the following three criteria: (i) small optimism in predictor effect estimates, as defined by a global shrinkage factor of at least 0.9; (ii) a small absolute difference of at most 0.05 between the model's apparent and adjusted Nagelkerke's R²; and (iii) precise estimation of the overall risk in the population. Criteria (i) and (ii) aim to reduce overfitting conditional on a chosen p, and require prespecification of the model's anticipated Cox‐Snell R², which we show can be obtained from previous studies. The values of n and E that meet all three criteria provide the minimum sample size required for model development. Upon application of our approach, a new diagnostic model for Chagas disease requires an EPP of at least 4.8, and a new prognostic model for recurrent venous thromboembolism requires an EPP of at least 23. This reinforces why rules of thumb (eg, 10 EPP) should be avoided. Researchers might additionally ensure the sample size gives precise estimates of key predictor effects; this is especially important when key categorical predictors have few events in some categories, as this may substantially increase the numbers required.
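The three criteria can be sketched numerically for a binary outcome. The abstract does not state the formulas, so the closed-form expressions below are an assumption: they are the expressions commonly associated with this approach, and the full article should be checked before relying on them. The function names, the 1.96 normal quantile, the 0.05 margin of error for criterion (iii), and the example inputs (p = 24, anticipated Cox‐Snell R² = 0.288, outcome prevalence 0.174) are likewise illustrative assumptions. Time‐to‐event outcomes need a different null likelihood and are not covered here.

```python
import math


def n_for_shrinkage(p, r2_cs, shrinkage=0.9):
    """Criterion (i): smallest n giving the target global shrinkage factor.

    Assumed formula: n = p / ((S - 1) * ln(1 - R2_CS / S)).
    """
    return math.ceil(p / ((shrinkage - 1.0) * math.log(1.0 - r2_cs / shrinkage)))


def max_r2_cs(prevalence):
    """Largest attainable Cox-Snell R2 for a binary outcome with this prevalence.

    Assumed formula: 1 - exp(2 * lnL_null / n), with the per-participant
    log-likelihood of an intercept-only logistic model.
    """
    ln_l_null = (prevalence * math.log(prevalence)
                 + (1.0 - prevalence) * math.log(1.0 - prevalence))
    return 1.0 - math.exp(2.0 * ln_l_null)


def n_for_r2_difference(p, r2_cs, prevalence, delta=0.05):
    """Criterion (ii): apparent minus adjusted Nagelkerke R2 of at most delta.

    The target difference is converted to a required shrinkage factor,
    which is then plugged into the criterion (i) formula.
    """
    required_shrinkage = r2_cs / (r2_cs + delta * max_r2_cs(prevalence))
    return n_for_shrinkage(p, r2_cs, required_shrinkage)


def n_for_overall_risk(prevalence, margin=0.05):
    """Criterion (iii): estimate the overall risk to within +/- margin.

    Assumed formula: n = (1.96 / margin)^2 * prevalence * (1 - prevalence).
    """
    return math.ceil((1.96 / margin) ** 2 * prevalence * (1.0 - prevalence))


def minimum_sample_size(p, r2_cs, prevalence):
    """Return each criterion's n, the governing n, expected events E, and EPP."""
    n_i = n_for_shrinkage(p, r2_cs)
    n_ii = n_for_r2_difference(p, r2_cs, prevalence)
    n_iii = n_for_overall_risk(prevalence)
    n = max(n_i, n_ii, n_iii)          # all three criteria must hold
    events = n * prevalence            # expected number of outcome events
    return {
        "n_criterion_i": n_i,
        "n_criterion_ii": n_ii,
        "n_criterion_iii": n_iii,
        "n": n,
        "events": events,
        "epp": events / p,
    }


if __name__ == "__main__":
    # Illustrative inputs only; none of these values appear in the abstract.
    for key, value in minimum_sample_size(p=24, r2_cs=0.288, prevalence=0.174).items():
        print(f"{key}: {value}")
```

The largest of the three n values governs, and EPP follows from it as E / p. This is why the required EPP varies with the anticipated R² and the outcome prevalence instead of sitting at a fixed value such as 10.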


                Author and article information

                Contributors
                r.riley@keele.ac.uk
                Journal
                Stat Med
                10.1002/(ISSN)1097-0258
                SIM
                Statistics in Medicine
                John Wiley and Sons Inc. (Hoboken)
                0277-6715
                1097-0258
                24 October 2018
                30 March 2019
                Volume: 38
                Issue: 7 (doiID: 10.1002/sim.v38.7)
                Pages: 1276-1296
                Affiliations
                [1] Centre for Prognosis Research, Research Institute for Primary Care and Health Sciences, Keele University, Staffordshire, UK
                [2] Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, Tennessee
                [3] Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht, The Netherlands
                [4] Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
                Author notes
                [*] Richard D Riley, Centre for Prognosis Research, Research Institute for Primary Care and Health Sciences, Keele University, Staffordshire ST5 5BG, UK.

                Email: r.riley@keele.ac.uk

                Author information
                http://orcid.org/0000-0001-8699-0735
                http://orcid.org/0000-0001-7481-0282
                http://orcid.org/0000-0003-2803-1151
                http://orcid.org/0000-0002-2772-2316
                Article
                SIM7992 SIM-18-0084.R2
                10.1002/sim.7992
                6519266
                30357870
                cc90517c-ec6d-443d-989c-64b59b882d50
                © 2018 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

                This is an open access article under the terms of the http://creativecommons.org/licenses/by-nc/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.

                History
                Received: 04 February 2018
                Revised: 13 September 2018
                Accepted: 13 September 2018
                Page count
                Figures: 3, Tables: 2, Pages: 1, Words: 13366
                Funding
                Funded by: National Institute for Health Research School for Primary Care Research (NIHR SPCR)
                Funded by: Netherlands Organisation for Scientific Research
                Award ID: 9120.8004
                Award ID: 918.10.615
                Funded by: National Centre for Advancing Translational Sciences
                Award ID: UL1 TR002243
                Funded by: NIHR Biomedical Research Centre, Oxford
                Categories
                Research Article
                Research Articles

                Biostatistics
                binary and time‐to‐event outcomes, logistic and Cox regression, multivariable prediction model, pseudo R‐squared, sample size, shrinkage
