Pseudoprospective Paraclinical Interaction of Radiology Residents With a Deep Learning System for Prostate Cancer Detection: Experience, Performance, and Identification of the Need for Intermittent Recalibration

Abstract

OBJECTIVES: The aim of this study was to estimate the prospective utility of a previously retrospectively validated convolutional neural network (CNN) for prostate cancer (PC) detection on prostate magnetic resonance imaging (MRI).

MATERIALS AND METHODS: The biparametric (T2-weighted and diffusion-weighted) portion of clinical multiparametric prostate MRI from consecutive men included between November 2019 and September 2020 was fully automatically and individually analyzed by a CNN shortly after image acquisition (pseudoprospective design). Radiology residents performed 2 research Prostate Imaging Reporting and Data System (PI-RADS) assessments of the multiparametric dataset independently of clinical reporting (paraclinical design), one before and one after review of the CNN results, and completed a survey. Presence of clinically significant PC was determined by the presence of an International Society of Urological Pathology grade 2 or higher PC on combined targeted and extended systematic transperineal MRI/transrectal ultrasound fusion biopsy. Sensitivities and specificities on a patient and prostate sextant basis were compared using the McNemar test and compared with the receiver operating characteristic (ROC) curve of the CNN. Survey results were summarized as absolute counts and percentages.

RESULTS: A total of 201 men were included. The CNN achieved an ROC area under the curve (AUC) of 0.77 on a patient basis. Using a PI-RADS ≥3-emulating probability threshold (c3), the CNN had a patient-based sensitivity of 81.8% and specificity of 54.8%, not statistically different from the current clinical routine PI-RADS ≥4 assessment at 90.9% and 54.8%, respectively (P = 0.30/P = 1.0). In general, residents achieved similar sensitivity and specificity before and after CNN review. On a prostate sextant basis, clinical assessment had the highest ROC AUC (0.82), higher than the CNN (AUC = 0.76, P = 0.21) and significantly higher than resident performance before and after CNN review (AUC = 0.76 / 0.76, P ≤ 0.03). The resident survey indicated that the CNN was helpful and clinically useful.

CONCLUSIONS: Pseudoprospective paraclinical integration of fully automated CNN-based detection of suspicious lesions on prostate multiparametric MRI was demonstrated and showed good acceptance among residents, whereas no significant improvement in resident performance was found. General CNN performance was preserved despite an observed shift in CNN calibration, identifying the requirement for continuous quality control and recalibration.
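The paired comparison described in the methods can be illustrated with a minimal Python sketch. It assumes binary calls (for example, CNN probability at or above a threshold, or PI-RADS at or above a cutoff) from two readers on the same cases. The function names and the toy data are hypothetical, and the exact binomial form of the McNemar test is an assumption here; the abstract does not state which variant was used. To compare sensitivities, the test would be applied to biopsy-positive cases only; to compare specificities, to biopsy-negative cases only.

```python
from math import comb


def sensitivity_specificity(calls, truth):
    """Return (sensitivity, specificity) for binary calls against a binary reference."""
    tp = sum(1 for c, t in zip(calls, truth) if c and t)
    fn = sum(1 for c, t in zip(calls, truth) if not c and t)
    tn = sum(1 for c, t in zip(calls, truth) if not c and not t)
    fp = sum(1 for c, t in zip(calls, truth) if c and not t)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def mcnemar_exact(calls_a, calls_b):
    """Two-sided exact (binomial) McNemar p-value for paired binary calls.

    Only discordant pairs carry information: cases where reader A is
    positive and reader B negative (b), and the reverse (c).
    """
    b = sum(1 for a, o in zip(calls_a, calls_b) if a and not o)
    c = sum(1 for a, o in zip(calls_a, calls_b) if not a and o)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    # Probability of a split at least this lopsided under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical toy data: calls by two readers on seven biopsy-positive cases.
reader_a = [1, 1, 1, 1, 1, 1, 0]
reader_b = [0, 0, 0, 0, 0, 1, 1]
p_value = mcnemar_exact(reader_a, reader_b)
```

Because the test uses only the discordant pairs, two readers with identical sensitivity on the same cases yield a P value of 1.0, which matches the pattern reported for specificity (54.8% for both, P = 1.0).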

Publication
Investigative Radiology
Jens Kleesiek
Professor of Translational Image-guided Oncology
Klaus Maier-Hein
Head of Medical Image Computing