Prediction precision

NEW YORK & LOS ANGELES—Prostate cancer, the third most common cause of death and the most prevalent male malignancy worldwide, is second only to lung cancer in annual deaths among American men. Although recent advances in prostate cancer research have saved many lives, objective tools for predicting risk have remained an unmet need.

Researchers from the Icahn School of Medicine at Mount Sinai have collaborated with researchers from the Keck School of Medicine at the University of Southern California (USC) to change that. The team has developed a machine-learning framework to precisely distinguish between low- and high-risk prostate cancer. Described in a Scientific Reports article, the framework is designed to help physicians, especially radiologists, identify treatment options for prostate cancer patients with a lower chance of unnecessary clinical intervention.

Standard methods to assess prostate cancer risk—multiparametric magnetic resonance imaging (mpMRI), which detects prostate lesions, and the Prostate Imaging Reporting and Data System, version 2 (PI-RADS v2), a five-point scoring system that classifies lesions found on the mpMRI—can predict the likelihood of clinically significant prostate cancer. Nonetheless, scoring is subjective and does not distinguish clearly between intermediate and malignant cancer levels, potentially resulting in different interpretations from clinicians.

Combining machine learning with radiomics (a branch of medicine that uses algorithms to extract large amounts of quantitative characteristics from medical images) can address this limitation, provided that enough machine-learning methods are studied. The Mount Sinai and USC researchers developed a predictive framework that rigorously and systematically assesses numerous methods to identify the best-performing one, while drawing on more training and validation data sets than previous studies had. As a result, the researchers could classify patients’ prostate cancer with high sensitivity and a high predictive value.

To assess the candidate classifiers comprehensively, the researchers used the Precision-Recall-F-measure family of evaluation measures in addition to the AUC score. This family is reportedly more informative about classifier performance when class distributions are unbalanced, as they typically are in biomedical studies such as prostate cancer risk stratification, including the cohorts of this and other studies. The final classifier developed by the framework was then evaluated in an independent cohort of prostate cancer patients and compared with the PI-RADS v2 system, to assess the relative utility of a well-developed combination of radiomics and machine learning for objective and accurate prostate cancer risk stratification.
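To make the evaluation measures concrete, the following is a minimal sketch of how precision, recall and the F-measure are computed for the high-risk class. It is an illustration only, not the study's code: the function name, the 1/0 label encoding and both sets of predictions are hypothetical, and only the 14-to-54 class ratio is taken from the development set described in this article.

```python
def precision_recall_f(y_true, y_pred, positive=1, beta=1.0):
    """Return (precision, recall, F-beta) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = beta ** 2 * precision + recall
    f_beta = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_beta


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# 1 = high risk, 0 = lower risk; class sizes mirror the 14 : 54 development set.
y_true = [1] * 14 + [0] * 54

# Hypothetical classifier A: labels every patient "lower risk".
pred_a = [0] * 68
# Hypothetical classifier B: finds 10 of the 14 high-risk patients,
# at the cost of 6 false alarms among the 54 lower-risk patients.
pred_b = [1] * 10 + [0] * 4 + [1] * 6 + [0] * 48

acc_a = accuracy(y_true, pred_a)            # 54/68, about 0.79
prf_a = precision_recall_f(y_true, pred_a)  # (0.0, 0.0, 0.0)
acc_b = accuracy(y_true, pred_b)            # 58/68, about 0.85
prf_b = precision_recall_f(y_true, pred_b)  # (0.625, about 0.714, about 0.667)
```

In this toy setting, classifier A reaches roughly 79 percent accuracy without identifying a single high-risk patient, which its precision, recall and F-measure of zero expose immediately. Accuracy is used as the contrast here only because it is easy to verify by hand; it is not the AUC comparison made in the study.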

According to Dr. Gaurav Pandey, an assistant professor of genetics and genomic sciences at the Icahn School of Medicine at Mount Sinai and senior corresponding author of the publication, “By rigorously and systematically combining machine learning with radiomics, our goal is to provide radiologists and clinical personnel with a sound prediction tool that can eventually translate to more effective and personalized patient care. The pathway to predicting prostate cancer progression with high accuracy is ever improving, and we believe our objective framework is a much-needed advancement.”

The single-institution, retrospective study included 73 prostate cancer patients diagnosed between March 2013 and May 2016, each with a histopathologic diagnosis, mpMRI of the prostate, and a transrectal ultrasound-magnetic resonance (TRUS-MR) imaging fusion-guided biopsy of the prostate within 2 months of mpMRI. Five patients were excluded, resulting in a final development set of 68 patients. The dominant lesion was chosen, and the patients were divided into high, intermediate and low risk categories per National Comprehensive Cancer Network guidelines. These categories were then combined into two classes, “high risk” and “lower risk,” so that the data would fit traditional classification algorithms.
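The collapse from three categories to two can be sketched in a few lines. The mapping below assumes that the intermediate and low categories were merged into "lower risk", which is what the class names suggest but is not spelled out in this article; the function name and the sample list are hypothetical.

```python
# Assumed mapping: "intermediate" and "low" are merged into "lower risk".
BINARY_CLASS = {
    "high": "high risk",
    "intermediate": "lower risk",
    "low": "lower risk",
}


def to_binary(risk_category):
    """Map a three-level risk category onto the two-class scheme."""
    return BINARY_CLASS[risk_category.strip().lower()]


# Hypothetical categories for four patients, for illustration only.
categories = ["low", "High", "intermediate", "low"]
labels = [to_binary(c) for c in categories]
```

Reducing the problem to two classes lets standard binary classifiers be applied directly, at the cost of no longer separating intermediate from low risk.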

To compensate for the substantial class imbalance in the development set (54 lower-risk patients versus 14 high-risk ones), the researchers ran the classification algorithms constituting the framework with and without random oversampling. For all the algorithms, random oversampling improved performance across all the evaluation measures compared with no oversampling.
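Random oversampling itself is simple: examples from the smaller class are duplicated at random until the classes are the same size. The sketch below shows the general technique using only Python's standard library. It is not the researchers' implementation; the function name and the placeholder patient indices are hypothetical, and only the 54 and 14 class sizes come from the study.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every
    class is as large as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())

    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_samples.extend(group + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Placeholder patient indices standing in for radiomic feature vectors.
samples = list(range(68))
labels = ["lower risk"] * 54 + ["high risk"] * 14

bal_samples, bal_labels = random_oversample(samples, labels)
counts = Counter(bal_labels)  # both classes now have 54 entries
```

When oversampling is combined with cross-validation, it is generally applied to the training portion only, so that duplicated patients do not leak into the data used for evaluation. This article does not describe how the study handled that step.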