Bhatia, BahadarMorlese, John FYusuf, SarahXie, YitingSchallhorn, BobGruen, David2024-07-102024-07-102023-12-12Bhatia BS, Morlese JF, Yusuf S, Xie Y, Schallhorn B, Gruen D. A real-world evaluation of the diagnostic accuracy of radiologists using positive predictive values verified from deep learning and natural language processing chest algorithms deployed retrospectively. BJR Open. 2023 Dec 12;6(1):tzad009. doi: 10.1093/bjro/tzad009.2513-987810.1093/bjro/tzad00938352188http://hdl.handle.net/20.500.14200/5089Objectives: This diagnostic study assessed the accuracy of radiologists retrospectively, using the deep learning and natural language processing chest algorithms implemented in Clinical Review version 3.2 for: pneumothorax, rib fractures in digital chest X-ray radiographs (CXR); aortic aneurysm, pulmonary nodules, emphysema, and pulmonary embolism in CT images. Methods: The study design was double-blind (artificial intelligence [AI] algorithms and humans), retrospective, non-interventional, and at a single NHS Trust. Adult patients (≥18 years old) scheduled for CXR and CT were invited to enroll as participants through an opt-out process. Reports and images were de-identified, processed retrospectively, and AI-flagged discrepant findings were assigned to two lead radiologists, each blinded to patient identifiers and original radiologist. The radiologist's findings for each clinical condition were tallied as a verified discrepancy (true positive) or not (false positive). Results: The missed findings were: 0.02% rib fractures, 0.51% aortic aneurysm, 0.32% pulmonary nodules, 0.92% emphysema, and 0.28% pulmonary embolism. The positive predictive values (PPVs) were: pneumothorax (0%), rib fractures (5.6%), aortic dilatation (43.2%), pulmonary emphysema (46.0%), pulmonary embolus (11.5%), and pulmonary nodules (9.2%). The PPV for pneumothorax was nil owing to lack of available studies that were analysed for outpatient activity. Conclusions: The number of missed findings was far less than generally predicted. The chest algorithms deployed retrospectively were a useful quality tool and AI augmented the radiologists' workflow. Advances in knowledge: The diagnostic accuracy of our radiologists generated missed findings of 0.02% for rib fractures CXR, 0.51% for aortic dilatation, 0.32% for pulmonary nodule, 0.92% for pulmonary emphysema, and 0.28% for pulmonary embolism for CT studies, all retrospectively evaluated with AI used as a quality tool to flag potential missed findings. It is important to account for prevalence of these chest conditions in clinical context and use appropriate clinical thresholds for decision-making, not relying solely on AI.enRadiologyA real-world evaluation of the diagnostic accuracy of radiologists using positive predictive values verified from deep learning and natural language processing chest algorithms deployed retrospectivelyArticleBJR Open