Net reclassification improvement
Net reclassification improvement (NRI) is an index that attempts to quantify how well a new model reclassifies subjects, either appropriately or inappropriately, compared with an old model.[1] Although the c-statistic, or area under the ROC curve (AUC), has been the standard metric for quantifying improvements in model performance over the last few decades, several studies have analyzed its limitations, including a lack of clinical relevance and difficulty in interpreting small changes in magnitude.[2][3] These limitations are best seen in the example of HDL and the Framingham Risk Score (FRS). When models with and without HDL were compared using AUC, adding HDL was not found to have a statistically significant effect on the FRS. However, when analyzed in terms of outcomes, HDL was found to be a significant predictor of heart disease and thus should affect the FRS.[4] To overcome this limitation, the concept of reclassification, that is, how well a new model correctly reclassifies cases, was introduced through the NRI metric.[5]
Basic concept
NRI attempts to quantify how well a new model correctly reclassifies subjects. Typically the comparison is between an original model (e.g. hip fractures as a function of age and sex) and a new model consisting of the original model plus one additional component (e.g. hip fractures as a function of age, sex, and weight). NRI is computed separately for two groups: subjects with events and subjects without events. A subject with an event who is reclassified higher, or a subject without an event who is reclassified lower, has been correctly reclassified and is assigned +1. A subject with an event who is reclassified lower, or a subject without an event who is reclassified higher, has been incorrectly reclassified and is assigned -1. Subjects whose classification does not change are assigned 0. Within each group, the scores are summed and divided by the number of subjects in that group. The NRI is the sum of these two values.
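The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `net_reclassification_improvement` and its parameters are hypothetical, and it assumes risk categories are encoded as integers where a larger value means higher predicted risk.

```python
from typing import Sequence, Tuple


def net_reclassification_improvement(
    old_class: Sequence[int],
    new_class: Sequence[int],
    event: Sequence[bool],
) -> Tuple[float, float, float]:
    """Return (NRI for events, NRI for non-events, overall NRI).

    Assumption: categories are ordinal integers (higher = riskier) and
    event[i] is True when subject i experienced the outcome.
    """
    score_events = score_nonevents = 0
    n_events = n_nonevents = 0
    for old, new, had_event in zip(old_class, new_class, event):
        # Sign of the category change: +1 moved up, -1 moved down, 0 unchanged.
        move = (new > old) - (new < old)
        if had_event:
            n_events += 1
            score_events += move      # moving up is correct for events
        else:
            n_nonevents += 1
            score_nonevents -= move   # moving down is correct for non-events
    if n_events == 0 or n_nonevents == 0:
        raise ValueError("both events and non-events are required")
    nri_e = score_events / n_events
    nri_ne = score_nonevents / n_nonevents
    return nri_e, nri_ne, nri_e + nri_ne


# Toy data (invented for illustration): four events, then four non-events.
old = [0, 0, 1, 1, 1, 1, 0, 0]
new = [1, 1, 1, 0, 0, 0, 0, 1]
had_event = [True, True, True, True, False, False, False, False]
print(net_reclassification_improvement(old, new, had_event))  # (0.25, 0.25, 0.5)
```

In the toy data, two events move up and one moves down, giving (2 - 1)/4 for events; two non-events move down and one moves up, giving (2 - 1)/4 for non-events.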
Example
Test 2 | Outcome | Test 1: Abnormal | Test 1: Normal | Total, split | Total
---|---|---|---|---|---
Abnormal | Event | 18 | 4 | 22 | 28
Abnormal | Non-event | 2 | 4 | 6 |
Normal | Event | 2 | 6 | 8 | 72
Normal | Non-event | 8 | 56 | 64 |
Total, split | Event | 20 | 10 | 30 |
Total, split | Non-event | 10 | 60 | 70 |
Total | | 30 | 70 | | 100
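In a perfect test, all subjects with events would be classified as abnormal and all subjects without events would be classified as normal. Subjects correctly classified by both tests are the 18 events rated abnormal by both and the 56 non-events rated normal by both. Subjects incorrectly classified by both tests are the 6 events rated normal by both and the 2 non-events rated abnormal by both. Test 2 correctly reclassifies 4 events (normal to abnormal) and 8 non-events (abnormal to normal), and incorrectly reclassifies 2 events (abnormal to normal) and 4 non-events (normal to abnormal). For events, NRIe = (4 - 2)/30 = 0.067. For non-events, NRIne = (8 - 4)/70 = 0.057. The NRI is their sum, approximately 0.12.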
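The arithmetic in this example can be checked directly from the cell counts. The snippet below is a small sketch: the dictionary layout and the helper name `net_fraction` are illustrative choices, while the counts themselves come from the table.

```python
# Counts from the example table, keyed by (Test 1 result, Test 2 result).
events = {
    ("abnormal", "abnormal"): 18,
    ("normal", "abnormal"): 4,    # moved up: correct for events
    ("abnormal", "normal"): 2,    # moved down: incorrect for events
    ("normal", "normal"): 6,
}
non_events = {
    ("abnormal", "abnormal"): 2,
    ("normal", "abnormal"): 4,    # moved up: incorrect for non-events
    ("abnormal", "normal"): 8,    # moved down: correct for non-events
    ("normal", "normal"): 56,
}

UP = ("normal", "abnormal")
DOWN = ("abnormal", "normal")


def net_fraction(counts, correct, incorrect):
    """(correctly reclassified - incorrectly reclassified) / group size."""
    return (counts[correct] - counts[incorrect]) / sum(counts.values())


nri_events = net_fraction(events, correct=UP, incorrect=DOWN)          # (4 - 2) / 30
nri_non_events = net_fraction(non_events, correct=DOWN, incorrect=UP)  # (8 - 4) / 70
nri = nri_events + nri_non_events

print(f"{nri_events:.3f} {nri_non_events:.3f} {nri:.3f}")  # 0.067 0.057 0.124
```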
Limitations
Limitations of NRI include the difficulty of determining whether a subject has been "correctly" reclassified, and complications that arise when model results are not binary (e.g. low, medium, and high risk).
References
- Leening MJG, Vedder MM, Witteman JCM, Pencina MJ, Steyerberg EW. Net reclassification improvement: computation, interpretation, and controversies: a literature review and clinician’s guide. Ann Intern Med. 2014;160(2):122-131.
- Cook NR. Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation. 2007;115(7):928-935.
- Pencina MJ, D’Agostino RB, Pencina KM, Janssens ACJW, Greenland P. Interpreting incremental value of markers added to risk prediction models. Am J Epidemiol. 2012;176(6):473-481.
- Steyerberg EW, Van Calster B, Pencina MJ. Performance Measures for Prediction Models and Markers: Evaluation of Predictions and Classifications. Revista Española de Cardiología (English Edition). 2011;64(9):788-794.
- Pencina MJ, D’Agostino RB Sr, D’Agostino RB Jr, Vasan RS. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med. 2008;27(2):157-172; discussion 207-212.