Studies and Evidence
Current performance using real world NHS patient data
Sensitivity 95%
Specificity 67%
PAIR-1 Study
The PAIR-1 study is a large, retrospective, multi-centre cohort study which collected data from eight NHS Trusts: Somerset, Bath, Bristol, Mid & South Essex, Cambridge, East & North Herts, Basingstoke and Cornwall. Anonymised data were collected at each NHS site on patients referred for pre-biopsy MRI for suspected prostate cancer.
Data were collected from a variety of 1.5T and 3T scanner vendors and scanner types with different acquisition protocols according to local NHS practice. The data were assessed against study inclusion and exclusion criteria and split, balanced per site, into training (N=794) and external validation datasets (N=252). Men included in the external validation dataset had an average age of 67 years and median pre-biopsy PSA of 6.8 ng/mL.
Pi v2.4 has been shown in the PAIR-1 study to have 95% sensitivity (95% confidence interval (CI): 90-99%) and 67% (60-74%) specificity for the detection of clinically significant prostate cancer, defined as Gleason grade ≥ 3+4 confirmed on prostate biopsy. Overall patient-level ROC AUC is 0.91 (0.87-0.95).
The results from the study have been presented at both national and international conferences, and the PAIR-1 manuscript is currently under peer review at a high-impact journal. The study was led by Dr Aarti Shah as Chief Investigator, with Hampshire Hospitals NHSFT acting as study sponsor.
Lucy Davies
Head of Clinical Strategy, Lucida Medical &
Honorary Research Associate, University of Oxford
Dr Aarti Shah and Prof Richard Hindley discuss the objectives and importance of the PAIR-1 study
We are especially grateful to Dr Aarti Shah, Consultant Radiologist, and Prof Richard Hindley, Consultant Urologist, at Hampshire Hospitals NHS Foundation Trust, who led the PAIR-1 study
Published abstracts from the PAIR-1 study
Multiple centre external validation of an AI solution for prostate cancer diagnostic imaging
Oral research presentation at RSNA 2023
Aarti Shah; Nadia Sofia Moreira Da Silva; Michael Yeung; Francesco Giganti; Lucy Davies; Paul Richard Burn; Richard Hindley; Nikhil Vasdev; John Hayes; Sophie Squire; Alison Jane Bradley; Giles Maskell; Adrian Andreou; Sidath Liyanage; Mark De Bono; Raj Persad; Nimalan Sanmugalingam; Tristan Barrett; Jonathan Aning; Mark Hinton; Antony Rix; Evis Sala
Purpose:
Clinical translation of AI solutions for detection of clinically significant prostate cancer (csPCa) has been limited by the lack of validation on multi-centre datasets including multiple MRI scanners, vendors, field strengths and imaging protocols. Here, we evaluate the ability of an AI solution to generalise to real-world external validation data including blinded validation on an unseen site.
Materials and Methods:
AI-based software [Lucida Medical, Pi v2.2] was developed using PROSTATEx and retrospective data from five sites (794 patients, 34% csPCa). The software was evaluated on a blinded external validation set (252 patients – 42 per site, 31% csPCa, 9% with prior negative biopsy) of multiparametric (mpMRI) data obtained from six sites; one site was unseen during development, and data from other sites was from later time periods than the development set. This external data included six scanner models from two vendors, with different field strengths (1.5T/3.0T) and acquisition protocols.
The software automatically outputs scores intended to identify Gleason score (GS)≥3+4 csPCa per-patient. csPCa was confirmed by biopsy (GS≥3+4 / PI-RADS ≥3), with PI-RADS 1/2 patients that did not receive a biopsy assumed negative. Exclusion criteria included quality issues such as severe motion and metal prostheses, active surveillance, prior prostate or bladder surgery or treatment including brachytherapy, TURP, prostatectomy, ablation, HIFU/focal therapy, or water vapour therapy. Performance was evaluated using ROC analysis, with 95% confidence intervals estimated by bootstrapping.
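The abstract does not describe the bootstrap procedure in detail. The sketch below shows one common variant, a percentile bootstrap over patients, as an illustration only. The function names, the number of resamples and the toy counts (100 patients, not study data) are all assumptions.

```python
import random

def sensitivity_specificity(labels, predictions):
    """Return (sensitivity, specificity) for binary labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def percentile_interval(values, alpha):
    """Return the (alpha/2, 1 - alpha/2) percentile interval of a sample."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int((alpha / 2) * last)], ordered[int((1 - alpha / 2) * last)]

def bootstrap_ci(labels, predictions, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CIs for sensitivity and specificity."""
    rng = random.Random(seed)
    n = len(labels)
    sens_samples, spec_samples = [], []
    for _ in range(n_resamples):
        # Resample patients with replacement, keeping label and prediction paired.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        ps = [predictions[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the two classes
            sens, spec = sensitivity_specificity(ys, ps)
            sens_samples.append(sens)
            spec_samples.append(spec)
    return (percentile_interval(sens_samples, alpha),
            percentile_interval(spec_samples, alpha))

# Toy data (hypothetical): 30 patients with cancer, 27 flagged;
# 70 patients without cancer, 42 correctly not flagged.
labels = [1] * 30 + [0] * 70
predictions = [1] * 27 + [0] * 3 + [0] * 42 + [1] * 28

point_sens, point_spec = sensitivity_specificity(labels, predictions)
sens_ci, spec_ci = bootstrap_ci(labels, predictions)
print(f"sensitivity {point_sens:.2f}, 95% CI {sens_ci[0]:.2f}-{sens_ci[1]:.2f}")
print(f"specificity {point_spec:.2f}, 95% CI {spec_ci[0]:.2f}-{spec_ci[1]:.2f}")
```

Resampling whole patients keeps each label paired with its prediction, so the interval reflects patient-to-patient variability rather than treating the two metrics as independent.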
Results:
For selecting patients for biopsy, the AI identified patients with csPCa with sensitivity 94% (95% CI 88-99%), specificity 57% (49-64%), NPV 95% (90-99%), and AUC 0.85 (0.80-0.90) using mpMRI data from the blinded external validation set. Comparing between sites, the AUC ranged from 0.70-0.98, with a pooled AUC of 0.86±0.11. On the unseen site, the AUC was 0.95 (0.87-1.00).
Reporting radiologists had per-patient sensitivity 99% (95% CI 96-100%) due to the assumed ground truth, specificity 73% (67-80%), NPV 99% (98-100%), and AUC 0.95 (0.92-0.97). In a 2019 Cochrane meta-analysis of 12 major studies (37% csPCa), radiologists identified patients with GS≥3+4 csPCa with sensitivity 86% and specificity 42%.
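As a rough consistency check, the NPV reported for the AI can be approximated from its sensitivity, specificity and the 31% csPCa prevalence of the validation set using Bayes' rule. The sketch below uses the rounded figures quoted in the abstract, so it is only approximate. The study presumably computed NPV directly from patient counts, and the function name is an assumption.

```python
def negative_predictive_value(sensitivity, specificity, prevalence):
    """NPV = P(no csPCa | negative result), derived via Bayes' rule."""
    true_negative_rate = specificity * (1 - prevalence)   # negative result, no disease
    false_negative_rate = (1 - sensitivity) * prevalence  # negative result, disease present
    return true_negative_rate / (true_negative_rate + false_negative_rate)

# Rounded AI figures from the abstract: sensitivity 94%, specificity 57%, prevalence 31%.
npv = negative_predictive_value(0.94, 0.57, 0.31)
print(f"approximate NPV: {npv:.3f}")
```

With these inputs the approximation rounds to 0.95, in line with the reported NPV of 95%.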
Conclusion:
The proposed AI solution shows comparable performance to radiologists in major expert studies, on a large real-world, multi-centre, external validation dataset with different scanners, vendors, field strengths and imaging protocols.
Clinical Relevance:
AI could support prostate cancer detection in clinical practice, generalises to multiple sites, scanners and imaging protocols, and is robust to novel data.
Per-patient sensitivity vs. false positive rate, with comparators
Figure illustrates the patient-level ROC curves compared against ground truth for:
- Pi v2.4 as evaluated in this study
- The original reporting radiologists, who are assumed near-100% sensitive as they determined whether a biopsy was performed
- Major prostate MRI studies PROMIS, MRI-FIRST and 4M
- Comparable academic AI research by Hosseinzadeh et al (radiologists and AI).
95% confidence intervals estimated through bootstrapping are shown where available.
Comparator studies with extensive biopsy ground truth (all cited at PI-RADS/Likert 3 threshold unless stated):
1. PROMIS: Ahmed HU, et al. Lancet 2017;389:815-822
2. MRI-FIRST: Rouvière O, et al. Lancet Oncol. 2019;20(1):100-109
3. 4M: van der Leest M, et al. Eur. Urol. 2019;75(4):570–578
4. 252 patients from blinded external data; note that most MRI-negative cases did not receive biopsy, resulting in the assumption that sensitivity and NPV are near 100%
5. Hosseinzadeh M, et al. Eur Radiol. 2022; 32(4): 2224–2234 (AI study)
Integrating clinical data with AI to optimise decision-making in prostate MRI
Abstract presented at RSNA 2023
Antony Rix; Nadia Moreira Da Silva; Jobie Budd; Michael Yeung; Francesco Giganti; Lucy Davies; Paul Richard Burn; Richard Hindley; Nikhil Vasdev; Alison Jane Bradley; Giles Maskell; Adrian Andreou; Sidath Liyanage; Raj Persad; Jonathan Aning; Tristan Barrett; Mark Hinton; Anwar Roshanali Padhani; Evis Sala; Aarti Shah
Purpose:
To determine whether combining prostate MRI AI-based decision support outputs, clinical data and PI-RADS scores in a multi-modal predictive model enhances detection of clinically significant prostate cancer.
Methods and Materials:
MRI, clinical history, histopathology, and PI-RADS scores were obtained retrospectively from five sites in a multi-vendor, multiple field strength study. After exclusions for AI contraindications including prior treatment and quality issues, model training used data from 352 patients and a held-out test set comprised data from 235 patients (Gleason grade group (GGG)≥2, prevalence 34%).
Our automated multi-stage AI-based software segments and calculates the volume of prostate whole gland and transition zone (TZ) on MRI, and segments and scores lesions/patients for GGG≥2 disease likelihood.
Biopsy-verified GGG≥2 was used as ground truth, with MRI-negative patients not undergoing biopsy assumed negative. Sensitivity, specificity, and AUC were evaluated at patient level on the held-out test set, with 95% confidence intervals obtained through bootstrapping. Combinations of AI, clinical and PI-RADS data were tested for significant improvement to the AI score and PI-RADS assessment, at pre-determined thresholds equivalent to PI-RADS 3.
Results:
mpMRI PI-RADS scores alone detected GGG≥2 with sensitivity 1.00 (95% CI 1.00-1.00), specificity 0.67 (0.61-0.75) and AUC 0.94 (0.91-0.97).
GGG≥2 was detected by bpMRI AI with sensitivity 0.97 (0.93-1.00), specificity 0.55 (0.47-0.62) and AUC 0.88 (0.84-0.92). Combining AI score and TZ-PSA density (PSAD) improved specificity (sensitivity 0.95 (0.90-0.99), specificity 0.70 (0.63-0.77) and AUC 0.90 (0.85-0.93)).
The addition of AI and TZ-PSAD to PI-RADS scores maintained high sensitivity of 0.99 (0.96-1.00), while significantly improving specificity to 0.83 (0.77-0.89, KS p-value<0.001) and AUC to 0.96 (0.93-0.98, DeLong p-value 0.003).
TZ volume based PSAD had modest additional benefit compared to whole-prostate PSAD. Other variables offered <5% specificity improvements or non-significant benefits. Findings with bpMRI and mpMRI AI models were similar.
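PSA density is serum PSA divided by prostate volume, and the TZ variant divides by the transition zone volume instead of the whole-gland volume. The sketch below illustrates the arithmetic only. The function name and the patient values are hypothetical, and it does not show how the combined model uses the result.

```python
def psa_density(psa_ng_per_ml, volume_ml):
    """Return PSA density in ng/mL per mL of tissue volume."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    return psa_ng_per_ml / volume_ml

# Hypothetical patient: PSA 6.8 ng/mL, whole-gland volume 45 mL, TZ volume 25 mL.
psa = 6.8
whole_gland_psad = psa_density(psa, 45.0)
tz_psad = psa_density(psa, 25.0)
print(f"whole-gland PSAD: {whole_gland_psad:.3f} ng/mL/mL")
print(f"TZ-PSAD:          {tz_psad:.3f} ng/mL/mL")
```

Because the transition zone is only part of the gland, TZ-PSAD is always larger than whole-gland PSAD for the same patient, so the two measures need separate cut-offs.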
Limitations:
Most MRI-negative cases did not receive biopsy in this retrospective study.
Conclusion:
The use of PSAD improves the predictive accuracy of prostate MRI AI decision support, with significant improvement in specificity at similar sensitivity. Combining PI-RADS, PSAD and AI offers substantial improvement compared to AI or PI-RADS assessments alone.
Clinical Relevance:
The improved specificity achieved through integrating patient PSAD and radiologists’ PI-RADS scores with AI software can potentially reduce false positive cases, further aiding patient selection for biopsy using MRI.
The combined models were trained on a separate training data set (n=352). The sensitivity and specificity measures reported in the text and plotted in the figure were obtained at thresholds that were pre-determined from the training data to correspond to PI-RADS 3.
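The text does not state how the thresholds were matched to PI-RADS 3. The sketch below assumes one plausible rule: choose the highest score cut-off whose training-set sensitivity reaches a target value, such as the sensitivity of PI-RADS ≥3 on the same training patients, then apply that fixed cut-off unchanged to the test set. All names and values are hypothetical toy data.

```python
def sensitivity_at(threshold, scores, labels):
    """Fraction of positive cases whose score is at or above the threshold."""
    positive_scores = [s for s, y in zip(scores, labels) if y == 1]
    return sum(1 for s in positive_scores if s >= threshold) / len(positive_scores)

def choose_threshold(train_scores, train_labels, target_sensitivity):
    """Highest cut-off on the training data that reaches the target sensitivity."""
    candidates = sorted(set(train_scores), reverse=True)
    for threshold in candidates:
        if sensitivity_at(threshold, train_scores, train_labels) >= target_sensitivity:
            return threshold
    return candidates[-1]  # fallback: lowest score flags every patient

# Toy training data (hypothetical scores and labels).
train_scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.3, 0.2, 0.1]
train_labels = [1, 1, 1, 1, 0, 1, 0, 0]
threshold = choose_threshold(train_scores, train_labels, target_sensitivity=0.8)

# The cut-off is fixed before looking at the test set, then applied as-is.
test_scores = [0.85, 0.5, 0.45, 0.25, 0.15]
test_predictions = [1 if s >= threshold else 0 for s in test_scores]
print(threshold, test_predictions)
```

Fixing the cut-off on training data means the test-set sensitivity and specificity are not tuned to the test patients.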
Ground truth for this evaluation is biopsy-verified Gleason grade group ≥2 cancer. Patients in this study received standard-of-care biopsy according to local practice. MRI-negative (PI-RADS 1-2) patients who did not receive a biopsy were assumed negative. Note that this implies near-100% sensitivity for the clinical PI-RADS assessment. It is therefore appropriate to consider the improvements to specificity (false-positive reductions) when assessing the potential added value of data integration in this analysis.
Per-patient sensitivity vs. false positive rate, identifying patients with any Gleason ≥3+4 cancer
Figure illustrates the patient-level ROC curves obtained on the held-out test data (n=235) for each predictor compared against ground truth:
- bpMRI AI alone
- Combined model using bpMRI AI score together with TZ-PSAD computed using PSA data and AI TZ volume
- PI-RADS category
- Combined model using PI-RADS, bpMRI AI score and TZ-PSAD.