Revealing transparency gaps in publicly available COVID-19 datasets used for medical artificial intelligence development-a systematic review

dc.contributor.affiliation: University Hospitals Birmingham NHS Foundation Trust; University of Birmingham; University Hospital Southampton NHS Foundation Trust; The Royal Wolverhampton NHS Trust; King's College London; Birmingham Women's and Children's NHS Foundation Trust; University Hospitals of Leicester NHS Trust; NIHR Blood and Transplant Research Unit (BTRU); Massachusetts Institute of Technology; The Hospital for Sick Children; SickKids Research Institute; University of Cambridge; Roche Diagnostics; University College London; PATH; Wellcome Trust; Independent Cancer Patients Voice; Oxford University Hospitals NHS Foundation Trust; Moorfields Eye Hospital; NIHR Biomedical Research Centre
dc.contributor.author: Alderman, Joseph E
dc.contributor.author: Charalambides, Maria
dc.contributor.author: Sachdeva, Gagandeep
dc.contributor.author: Laws, Elinor
dc.contributor.author: Palmer, Joanne
dc.contributor.author: Lee, Elsa
dc.contributor.author: Menon, Vaishnavi
dc.contributor.author: Malik, Qasim
dc.contributor.author: Vadera, Sonam
dc.contributor.author: Calvert, Melanie
dc.contributor.author: Ghassemi, Marzyeh
dc.contributor.author: McCradden, Melissa D
dc.contributor.author: Ordish, Johan
dc.contributor.author: Mateen, Bilal
dc.contributor.author: Summers, Charlotte
dc.contributor.author: Gath, Jacqui
dc.contributor.author: Matin, Rubeta N
dc.contributor.author: Denniston, Alastair K
dc.contributor.author: Liu, Xiaoxuan
dc.contributor.department: Research and Development
dc.contributor.department: Ophthalmology
dc.contributor.role: Admin and Clerical
dc.contributor.role: Medical and Dental
dc.contributor.trustauthor: Vadera, Sonam
dc.contributor.trustauthor: Denniston, Alastair
dc.date.accessioned: 2024-12-04T12:57:34Z
dc.date.available: 2024-12-04T12:57:34Z
dc.date.issued: 2024-10-23
dc.description.abstract: During the COVID-19 pandemic, artificial intelligence (AI) models were created to address health-care resource constraints. Previous research shows that health-care datasets often have limitations, leading to biased AI technologies. This systematic review assessed datasets used for AI development during the pandemic, identifying several deficiencies. Datasets were identified by screening articles from MEDLINE and using Google Dataset Search. 192 datasets were analysed for metadata completeness, composition, data accessibility, and ethical considerations. Findings revealed substantial gaps: only 48% of datasets documented individuals' country of origin, 43% reported age, and under 25% included sex, gender, race, or ethnicity. Information on data labelling, ethical review, or consent was frequently missing. Many datasets reused data with inadequate traceability. Notably, historical paediatric chest x-rays appeared in some datasets without acknowledgment. These deficiencies highlight the need for better data quality and transparent documentation to lessen the risk that biased AI models are developed in future health emergencies.
dc.identifier.citation: Alderman JE, Charalambides M, Sachdeva G, Laws E, Palmer J, Lee E, Menon V, Malik Q, Vadera S, Calvert M, Ghassemi M, McCradden MD, Ordish J, Mateen B, Summers C, Gath J, Matin RN, Denniston AK, Liu X. Revealing transparency gaps in publicly available COVID-19 datasets used for medical artificial intelligence development-a systematic review. Lancet Digit Health. 2024 Nov;6(11):e827-e847. doi: 10.1016/S2589-7500(24)00146-8.
dc.identifier.doi: 10.1016/S2589-7500(24)00146-8
dc.identifier.eissn: 2589-7500
dc.identifier.pmid: 39455195
dc.identifier.uri: http://hdl.handle.net/20.500.14200/6680
dc.language.iso: en
dc.publisher: Elsevier
dc.relation.url: https://www.thelancet.com/journals/landig/home
dc.rights: Copyright © 2024 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY 4.0 license. All rights reserved.
dc.source.beginpage: e827
dc.source.country: England
dc.source.endpage: e847
dc.source.issue: 11
dc.source.journaltitle: The Lancet Digital Health
dc.source.volume: 6
dc.subject: Patients. Primary care. Medical profession. Forensic medicine
dc.subject: Public health. Health statistics. Occupational health. Health education
dc.subject: Health services. Management
dc.title: Revealing transparency gaps in publicly available COVID-19 datasets used for medical artificial intelligence development-a systematic review
dc.type: Article
dspace.entity.type: Publication
oa.grant.openaccess: na
rioxxterms.version: NA