Evaluation of Biomarkers in Critical Care and Perioperative Medicine

A Clinician's Overview of Traditional Statistical Methods and Machine Learning Algorithms

Sabri Soussi, M.D., M.Sc.; Gary S. Collins, Ph.D.; Peter Jüni, M.D.; Alexandre Mebazaa, M.D., Ph.D.; Etienne Gayat, M.D., Ph.D.; Yannick Le Manach, M.D., Ph.D.


Anesthesiology. 2020;134(1):15-25. 


Challenges and Common Pitfalls in Studies Evaluating Biomarkers

Properties of Biomarker Assay

The precision of biomarker measurement should be assessed, and the biologic assay and its measurement error should be reported. The assay should be sensitive, detecting low concentrations of the biomarker, and specific, in that it is not affected by other molecules. Interlaboratory reproducibility of the assay should be considered when the performance of a biomarker model is assessed in a cohort collected at a different institution (external validation).
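One common way to summarize assay precision is the coefficient of variation (CV) of replicate measurements of the same sample. The sketch below assumes that replicate values are already available for each sample. The function names and the example values are hypothetical and do not come from a real assay. The same calculation can be applied to one sample measured in several laboratories to gauge interlaboratory reproducibility.

```python
import statistics


def coefficient_of_variation(replicates):
    """Percent CV of replicate measurements of one sample: 100 * SD / mean."""
    mean = statistics.mean(replicates)
    sd = statistics.stdev(replicates)  # sample SD (n - 1 denominator)
    return 100.0 * sd / mean


def mean_intra_assay_cv(samples):
    """Average the per-sample CVs across several samples run in replicate."""
    return statistics.mean(coefficient_of_variation(r) for r in samples)


# Hypothetical triplicate runs of three samples (arbitrary units)
samples = [
    [10.0, 10.4, 9.6],
    [50.0, 52.0, 48.0],
    [100.0, 98.0, 102.0],
]
print(f"mean intra-assay CV: {mean_intra_assay_cv(samples):.2f}%")
```

A low CV at low concentrations is also informative about the sensitivity requirement above. Precision usually worsens near the limit of detection, so CVs are often reported separately for low, medium, and high concentrations.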

Another potential issue is that the same biomarker can be produced by different cells through different pathways. For example, urinary kidney injury molecule-1 (a biomarker of kidney injury) can also be produced by kidney cancer cells in the absence of kidney injury.[58,59] This issue is difficult to control for when analyzing data, because the physiology of a novel biomarker is often incompletely understood.

Role of Time and Biomarker Kinetics

The timing of biomarker measurement is important. For example, in the postoperative period, the most informative troponin I measurement for diagnosing myocardial infarction is obtained at peak concentration (approximately 24 h).

In major surgery and critical care, biomarkers of interest such as troponin T, N-terminal pro-B-type natriuretic peptide, and C-reactive protein may have completely different kinetics.[60] The main issue in these settings is therefore the timing of measurement, which must take into account not only biomarker kinetics but also the time of onset of the various pathophysiologic processes (e.g., major surgery followed by secondary onset of sepsis). Correlation between repeated measurements of a biomarker within the same individual should also be accounted for in the analysis. In many instances, mixed models offer distinct advantages over repeated measures analysis of variance, for example because they accommodate missing measurements and measurements taken at irregular times.[61]
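To see why within-patient correlation matters, it helps to quantify it. A simple measure is the one-way intraclass correlation coefficient, ICC(1), which estimates the share of total variance lying between patients rather than within them. A high ICC means that repeated values from one patient are far from independent, so treating them as separate observations overstates the effective sample size. The sketch below assumes a balanced design (the same number of measurements per patient), and the data are hypothetical. It only quantifies the correlation; it does not fit the mixed model itself, for which dedicated software (e.g., the MixedLM class in statsmodels) is typically used.

```python
import statistics


def icc_oneway(measurements):
    """ICC(1) for a balanced design: one row per patient, k repeated values each.

    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB and MSW are the
    between-patient and within-patient mean squares of a one-way ANOVA.
    Note: systematic changes over time are counted as within-patient
    variance by this simple estimator.
    """
    n = len(measurements)
    k = len(measurements[0])
    if n < 2 or k < 2 or any(len(row) != k for row in measurements):
        raise ValueError("need >= 2 patients with the same number (>= 2) of measurements")
    patient_means = [statistics.mean(row) for row in measurements]
    grand_mean = statistics.mean(patient_means)  # valid because the design is balanced
    ms_between = k * sum((m - grand_mean) ** 2 for m in patient_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(measurements, patient_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical biomarker values, three measurements per patient
measurements = [
    [10.0, 11.0, 12.0],  # patient A
    [30.0, 31.0, 29.0],  # patient B
    [50.0, 52.0, 51.0],  # patient C
]
print(f"ICC(1) = {icc_oneway(measurements):.3f}")
```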

Another issue is that renal or hepatic function can influence the elimination of a biomarker and thus its diagnostic properties. This point is particularly important in elderly patients (who often have chronic organ dysfunction) and in patients undergoing major surgery or receiving critical care, who are more likely to present with organ failure.

Consequently, choosing the "optimal" measurement time and adjusting for covariates (e.g., age, renal function) are real challenges when biomarkers are included, together with clinical parameters gathered in real time, in regression models and machine learning algorithms.

Imperfect Accepted Standard Methods

The choice of the reference test used to define diseased and nondiseased patients (e.g., postoperative AKI, postoperative myocardial infarction) should be carefully considered. Novel biomarkers are frequently evaluated against accepted standards that are assumed to classify patients with perfect accuracy according to the presence or absence of disease. In practice, reference tests are rarely perfect and tend to misclassify patients. When the accepted standard is imperfect (e.g., the delayed increase in serum creatinine in AKI[62]), patient misclassification biases the sensitivity and specificity estimates of the new biomarker. One of the main methods proposed to improve an "imperfect" reference standard is the composite reference standard, the rationale being that combining the results of several imperfect tests yields a more accurate reference. Nevertheless, the accuracy of this approach has been questioned.[63]
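The direction and size of this bias can be illustrated with a short calculation. The sketch assumes that the new biomarker and the reference standard make errors independently of each other once true disease status is known (conditional independence); correlated errors, for example two markers that both depend on renal clearance, would change the result. The function name and all input values are hypothetical.

```python
def apparent_accuracy(prevalence, se_new, sp_new, se_ref, sp_ref):
    """Expected (sensitivity, specificity) of a new biomarker when judged
    against an imperfect reference standard, assuming conditional
    independence of the two tests given true disease status."""
    p = prevalence
    # Joint probability that both tests are positive / both negative
    both_pos = p * se_new * se_ref + (1 - p) * (1 - sp_new) * (1 - sp_ref)
    both_neg = p * (1 - se_new) * (1 - se_ref) + (1 - p) * sp_new * sp_ref
    # Marginal probability of a positive / negative reference result
    ref_pos = p * se_ref + (1 - p) * (1 - sp_ref)
    ref_neg = 1 - ref_pos
    # Apparent Se = P(new+ | ref+); apparent Sp = P(new- | ref-)
    return both_pos / ref_pos, both_neg / ref_neg


# Hypothetical: true Se = Sp = 0.90 for the biomarker; reference Se 0.80, Sp 0.95
se_app, sp_app = apparent_accuracy(0.20, 0.90, 0.90, 0.80, 0.95)
print(f"apparent Se = {se_app:.2f}, apparent Sp = {sp_app:.2f}")
```

With these inputs, a biomarker whose true sensitivity and specificity are both 0.90 would appear to have a sensitivity of about 0.74 and a specificity of about 0.86, purely because of reference-standard misclassification.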

In some situations, the outcome is not dichotomous (diseased vs. nondiseased patients) but continuous (e.g., change in serum creatinine) or ordinal (e.g., Acute Kidney Injury Network stages). In such cases, a nonparametric estimator of the diagnostic accuracy of the novel biomarker, with an interpretation analogous to the AUC, can be applied.[64]
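One simple estimator of this kind counts concordant pairs: among all pairs of patients with different outcome values, it is the proportion in which the patient with the higher outcome also has the higher biomarker value, with biomarker ties counting one half. For a binary outcome this reduces to the usual AUC. The sketch below is illustrative and is not necessarily the exact estimator described in reference [64]; the function name and the data are hypothetical.

```python
from itertools import combinations


def ordinal_concordance(biomarker, outcome):
    """Probability that, for two patients with different outcome values, the
    patient with the higher outcome also has the higher biomarker value
    (biomarker ties count 1/2). Pairs tied on the outcome are skipped."""
    concordant = 0.0
    usable = 0
    for (x_i, t_i), (x_j, t_j) in combinations(zip(biomarker, outcome), 2):
        if t_i == t_j:
            continue  # pair carries no ordering information
        usable += 1
        # Orient the pair so that x_hi belongs to the patient with the higher outcome
        x_hi, x_lo = (x_i, x_j) if t_i > t_j else (x_j, x_i)
        if x_hi > x_lo:
            concordant += 1.0
        elif x_hi == x_lo:
            concordant += 0.5
    if usable == 0:
        raise ValueError("need at least two different outcome values")
    return concordant / usable


# Hypothetical biomarker values and AKI stages (0 = no AKI, 1-3 = stages)
biomarker = [0.8, 1.1, 2.5, 1.9, 3.4, 0.6]
aki_stage = [0, 0, 2, 1, 3, 1]
print(f"concordance = {ordinal_concordance(biomarker, aki_stage):.2f}")
```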

Different Populations

The study population can greatly influence the diagnostic and prognostic performance of a test. For example, the cutoff points of cardiac troponin I used to diagnose postoperative myocardial infarction differ between noncardiac and cardiac surgery, and even between cardiac surgery procedures (coronary artery bypass graft vs. valve surgery).[65] Diagnostic test results may also vary across populations with different demographic characteristics and chronic illnesses (e.g., age, chronic kidney disease). Authors should therefore describe precisely the population about which they intend to make inferences. Adjustment for covariates (external influences) is a major consideration when biomarkers are included in regression models.
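The dependence of a cutoff on the population can be made concrete by deriving it separately in each subgroup. The sketch below uses the Youden index (J = sensitivity + specificity - 1), one common but not the only criterion for choosing a cutoff. The function names, subgroup labels, and values are hypothetical and are not troponin data. In a real study, each cutoff would also need a confidence interval and validation in independent data.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, J) maximizing Youden's J = Se + Sp - 1 for the rule
    'positive if value >= cutoff'. labels: 1 = diseased, 0 = nondiseased."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both diseased and nondiseased patients")
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(values)):  # every observed value is a candidate
        tp = sum(1 for v, y in zip(values, labels) if y == 1 and v >= cutoff)
        tn = sum(1 for v, y in zip(values, labels) if y == 0 and v < cutoff)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # strict '>' keeps the lowest cutoff among ties
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


def cutoffs_by_population(records):
    """records: iterable of (population, value, label); one cutoff per population."""
    groups = {}
    for population, value, label in records:
        vals, labs = groups.setdefault(population, ([], []))
        vals.append(value)
        labs.append(label)
    return {pop: youden_cutoff(v, l) for pop, (v, l) in groups.items()}


# Hypothetical biomarker values in two surgical populations
records = [
    ("cardiac", 10, 0), ("cardiac", 20, 0), ("cardiac", 30, 1), ("cardiac", 40, 1),
    ("noncardiac", 1, 0), ("noncardiac", 2, 0), ("noncardiac", 3, 1), ("noncardiac", 4, 1),
]
for population, (cutoff, j) in cutoffs_by_population(records).items():
    print(f"{population}: cutoff = {cutoff}, J = {j:.2f}")
```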

Associated Clinical Predictors or Multiple Biomarkers

To assess the value of a biomarker in addition to established clinical predictors, or of multiple biomarkers, for a given outcome, a risk prediction model can be developed using logistic regression or Cox regression. Two models are then built, the first with the usual predictors and the second with the usual predictors plus the novel biomarkers, and compared on the basis of the difference in AUC (logistic regression) or in the Harrell C-statistic (Cox regression).[49,66] A multiple biomarker approach can also be applied. For example, adding several novel biomarkers of cardiac failure (N-terminal pro-B-type natriuretic peptide and soluble ST2) and vascular failure (bioactive adrenomedullin) to a multivariable clinical model improves the stratification of long-term outcome.[67]
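The comparison step can be sketched for the logistic regression case. The code assumes that predicted risks from the two models (with and without the novel biomarker) have already been obtained; model fitting and validation are outside its scope. The AUC is computed with the Mann-Whitney formulation: the probability that a randomly chosen diseased patient has a higher predicted risk than a randomly chosen nondiseased one. Function names and data are hypothetical.

```python
def auc(scores, labels):
    """Mann-Whitney estimate of the AUC (score ties count 1/2).
    labels: 1 = diseased, 0 = nondiseased."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need both diseased and nondiseased patients")
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def delta_auc(risk_base, risk_extended, labels):
    """AUC of the model with the novel biomarker minus AUC of the base model."""
    return auc(risk_extended, labels) - auc(risk_base, labels)


# Hypothetical predicted risks from the two models for the same patients
outcome = [0, 0, 1, 1]
risk_base = [0.10, 0.40, 0.35, 0.80]      # usual predictors only
risk_extended = [0.10, 0.30, 0.40, 0.80]  # usual predictors + novel biomarker
print(f"base AUC = {auc(risk_base, outcome):.2f}, "
      f"delta AUC = {delta_auc(risk_base, risk_extended, outcome):.2f}")
```

In a real analysis, the difference would be estimated on far larger samples and reported with a confidence interval (e.g., from bootstrap resampling), and predicted risks evaluated on the data used to fit the models will tend to overstate performance.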

Conceptual issues related to the planning and analysis of biomarker performance are presented in Box 4. This methodologic approach could reduce bias and thus provide a more reliable pooled estimate of biomarker performance. A summary of the most common avoidable pitfalls is presented in Box 5.


Biomarker evaluations need a rigorously documented statistical analysis plan, established before the analysis. Investigators should choose methods based on the clinical question or hypothesis, the biomarker's phase of development (i.e., discovery, evaluation of accuracy, assessment of incremental value), and the weaknesses of the statistical methods. Biomarker studies often contain statistical analysis pitfalls (e.g., not considering the properties of the biomarker assay, biomarker kinetics, imperfect accepted standard methods, or differences between populations) that prevent them from conveying a pragmatic scientific message to anesthesiologists and intensivists. Therefore, the tables and toolboxes provided in this article could be used, in addition to existing guidelines, by investigators, editors, and reviewers to ensure the publication of high-quality biomarker studies for informed readers.

Furthermore, novel biostatistical techniques (e.g., machine learning) are increasingly used in critical care and perioperative medicine research. Machine learning is a promising tool to improve outcome prediction and patient subphenotyping in order to personalize treatment in critically ill patients. However, we believe that further research is needed to better evaluate the role of machine learning in predicting pathology or response to treatment. Implementing machine learning directly in clinical decision making, without such evaluation, is as deleterious for patients as a poorly implemented statistical approach. Tables are provided in this article to help readers better understand machine learning techniques applied in health care and to avoid their misuse (e.g., overfitting, lack of independent validation, lack of comparison with simpler modeling approaches).