Diversity, inclusivity and traceability of mammography datasets used in development of Artificial Intelligence technologies: a systematic review.
Publisher version (Open Access)
Author
Laws, Elinor
Palmer, Joanne
Alderman, Joseph
Sharma, Ojasvi
Ngai, Victoria
Salisbury, Thomas
Hussain, Gulmeena
Ahmed, Sumiya
Sachdeva, Gagandeep
Vadera, Sonam
Mateen, Bilal
Matin, Rubeta
Kuku, Stephanie
Calvert, Melanie
Gath, Jacqui
Treanor, Darren
McCradden, Melissa
Mackintosh, Maxine
Gichoya, Judy
Trivedi, Hari
Denniston, Alastair K
Liu, Xiaoxuan
Publication date
2024-11-26
Abstract
Purpose: There are many radiological datasets for breast cancer, some of which have supported the development of AI medical devices for breast cancer screening and image classification. This review aims to identify mammography datasets (including digitised screen film mammography, 2D digital mammography and digital breast tomosynthesis) used in the development of AI technologies and to present their characteristics, including the transparency of their documentation, their content, the populations included and their accessibility.
Materials and methods: Searches of MEDLINE and Google Dataset Search identified studies describing AI technology development and referencing breast imaging datasets up to June 2024. The characteristics of each dataset are summarised. In particular, the accompanying documentation was reviewed with a focus on the diversity and inclusion of the populations represented within each dataset.
Results: Of the 254 datasets referenced in the literature search, 190 were privately held, 36 had barriers that prevented access, and 28 were accessible. Most datasets originated from Europe, East Asia and North America. Reporting of individuals' attributes was poor: 32 datasets (12%) reported race or ethnicity, and 76 (30%) reported female/male categories, with only one dataset explicitly defining whether these categories represented sex or gender attributes.
Conclusion: Through this review, we demonstrate gaps in the data landscape for mammography, highlighting poor representation globally. To ensure that breast imaging datasets have maximum utility for researchers, their characteristics should be documented, and their limitations, such as how representative they are of populations and settings, should inform scientific efforts to translate data-driven insights into technologies and discoveries.
Citation
Laws E, Palmer J, Alderman J, Sharma O, Ngai V, Salisbury T, Hussain G, Ahmed S, Sachdeva G, Vadera S, Mateen B, Matin R, Kuku S, Calvert M, Gath J, Treanor D, McCradden M, Mackintosh M, Gichoya J, Trivedi H, Denniston AK, Liu X. Diversity, inclusivity and traceability of mammography datasets used in development of Artificial Intelligence technologies: a systematic review. Clin Imaging. 2025 Feb;118:110369. doi: 10.1016/j.clinimag.2024.110369. Epub 2024 Nov 26.
Type
Article
Other
Additional Links
https://www.sciencedirect.com/journal/clinical-imaging
PMID
39616879
Journal
Clinical Imaging
Publisher
Elsevier
DOI
10.1016/j.clinimag.2024.110369