Revealing transparency gaps in publicly available COVID-19 datasets used for medical artificial intelligence development-a systematic review

Alderman, Joseph E; Charalambides, Maria; Sachdeva, Gagandeep; Laws, Elinor; Palmer, Joanne; Lee, Elsa; Menon, Vaishnavi; Malik, Qasim; Vadera, Sonam; Calvert, Melanie; Ghassemi, Marzyeh; McCradden, Melissa D; Ordish, Johan; Mateen, Bilal; Summers, Charlotte; Gath, Jacqui; Matin, Rubeta N; Denniston, Alastair K; Liu, Xiaoxuan

dc.contributor.affiliation	University Hospitals Birmingham NHS Foundation Trust; University of Birmingham; University Hospital Southampton NHS Foundation Trust; The Royal Wolverhampton NHS Trust; King's College London; Birmingham Women's and Children's NHS Foundation Trust; University Hospitals of Leicester NHS Trust; NIHR Blood and Transplant Research Unit (BTRU); Massachusetts Institute of Technology; The Hospital for Sick Children; SickKids Research Institute; University of Cambridge; Roche Diagnostics; University College London; PATH; Wellcome Trust; Independent Cancer Patients Voice; Oxford University Hospitals NHS Foundation Trust; Moorfields Eye Hospital; NIHR Biomedical Research Centre	en_US
dc.contributor.author	Alderman, Joseph E
dc.contributor.author	Charalambides, Maria
dc.contributor.author	Sachdeva, Gagandeep
dc.contributor.author	Laws, Elinor
dc.contributor.author	Palmer, Joanne
dc.contributor.author	Lee, Elsa
dc.contributor.author	Menon, Vaishnavi
dc.contributor.author	Malik, Qasim
dc.contributor.author	Vadera, Sonam
dc.contributor.author	Calvert, Melanie
dc.contributor.author	Ghassemi, Marzyeh
dc.contributor.author	McCradden, Melissa D
dc.contributor.author	Ordish, Johan
dc.contributor.author	Mateen, Bilal
dc.contributor.author	Summers, Charlotte
dc.contributor.author	Gath, Jacqui
dc.contributor.author	Matin, Rubeta N
dc.contributor.author	Denniston, Alastair K
dc.contributor.author	Liu, Xiaoxuan
dc.contributor.department	Research and Development	en_US
dc.contributor.department	Ophthalmology	en_US
dc.contributor.role	Admin and Clerical	en_US
dc.contributor.role	Medical and Dental	en_US
dc.contributor.trustauthor	Vadera, Sonam
dc.contributor.trustauthor	Denniston, Alastair
dc.date.accessioned	2024-12-04T12:57:34Z
dc.date.available	2024-12-04T12:57:34Z
dc.date.issued	2024-10-23
dc.description.abstract	During the COVID-19 pandemic, artificial intelligence (AI) models were created to address health-care resource constraints. Previous research shows that health-care datasets often have limitations, leading to biased AI technologies. This systematic review assessed datasets used for AI development during the pandemic, identifying several deficiencies. Datasets were identified by screening articles from MEDLINE and using Google Dataset Search. 192 datasets were analysed for metadata completeness, composition, data accessibility, and ethical considerations. Findings revealed substantial gaps: only 48% of datasets documented individuals' country of origin, 43% reported age, and under 25% included sex, gender, race, or ethnicity. Information on data labelling, ethical review, or consent was frequently missing. Many datasets reused data with inadequate traceability. Notably, historical paediatric chest x-rays appeared in some datasets without acknowledgment. These deficiencies highlight the need for better data quality and transparent documentation to lessen the risk that biased AI models are developed in future health emergencies.	en_US
dc.identifier.citation	Alderman JE, Charalambides M, Sachdeva G, Laws E, Palmer J, Lee E, Menon V, Malik Q, Vadera S, Calvert M, Ghassemi M, McCradden MD, Ordish J, Mateen B, Summers C, Gath J, Matin RN, Denniston AK, Liu X. Revealing transparency gaps in publicly available COVID-19 datasets used for medical artificial intelligence development-a systematic review. Lancet Digit Health. 2024 Nov;6(11):e827-e847. doi: 10.1016/S2589-7500(24)00146-8.	en_US
dc.identifier.doi	10.1016/S2589-7500(24)00146-8
dc.identifier.eissn	2589-7500
dc.identifier.pmid	39455195
dc.identifier.uri	http://hdl.handle.net/20.500.14200/6680
dc.language.iso	en	en_US
dc.publisher	Elsevier	en_US
dc.relation.url	https://www.thelancet.com/journals/landig/home	en_US
dc.rights	Copyright © 2024 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY 4.0 license.
dc.source.beginpage	e827
dc.source.country	England
dc.source.endpage	e847
dc.source.issue	11
dc.source.journaltitle	The Lancet Digital Health	en_US
dc.source.volume	6
dc.subject	Patients. Primary care. Medical profession. Forensic medicine	en_US
dc.subject	Public health. Health statistics. Occupational health. Health education	en_US
dc.subject	Health services. Management	en_US
dc.title	Revealing transparency gaps in publicly available COVID-19 datasets used for medical artificial intelligence development-a systematic review	en_US
dc.type	Article	en_US
dspace.entity.type	Publication
oa.grant.openaccess	na	en_US
rioxxterms.version	NA	en_US

Collections

Health Care Services