Artificial intelligence chatbots as sources of patient education material for cataract surgery: ChatGPT-4 versus Google Bard
Azzopardi, M.; Ng, Benjamin; Logeswaran, Abison; Loizou, Constantinos; Cheong, Ryan Chin Taw; Gireesh, Prasanth; Ting, D. S. J.; Chong, Yu Jeat
Affiliation
Royal Free London NHS Foundation Trust; University of Oxford Christ Church; Moorfields Eye Hospital NHS Foundation Trust; Sandwell and West Birmingham NHS Trust; et al.
Publication date
2024-10-17
Abstract
Objective: To conduct a head-to-head comparative analysis of cataract surgery patient education material generated by Chat Generative Pre-trained Transformer (ChatGPT-4) and Google Bard.

Methods and analysis: 98 frequently asked questions on cataract surgery in English were taken in November 2023 from five trustworthy online patient information resources. 59 of these were curated (20 augmented for clarity and 39 duplicates excluded) and categorised into three domains: condition (n=15), preparation for surgery (n=21) and recovery after surgery (n=23). They were formulated into input prompts using 'prompt engineering'. Four ophthalmologists independently graded the ChatGPT-4 and Google Bard responses with the Patient Education Materials Assessment Tool-Printable (PEMAT-P) Auto-Scoring Form. The readability of responses was evaluated using a Flesch-Kincaid calculator. Responses were also subjectively examined for any inaccurate or harmful information.

Results: Google Bard had a higher mean overall Flesch-Kincaid Grade Level (8.02) than ChatGPT-4 (5.75) (p<0.001), a pattern also noted across all three domains. ChatGPT-4 had a higher overall PEMAT-P understandability score (85.8%) than Google Bard (80.9%) (p<0.001), a difference also noted in the 'preparation for cataract surgery' (85.2% vs 75.7%; p<0.001) and 'recovery after cataract surgery' (86.5% vs 82.3%; p=0.004) domains. There was no statistically significant difference in overall (42.5% vs 44.2%; p=0.344) or individual domain actionability scores (p>0.10). None of the generated material contained dangerous information.

Conclusion: ChatGPT-4 fared better overall than Google Bard, scoring higher on the PEMAT-P understandability scale and adhering more closely to the prompt engineering instructions. Since input prompts may differ from real-world patient searches, follow-up studies with patient participation are required.
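For readers unfamiliar with the readability metric cited in the Results, the Flesch-Kincaid Grade Level is a fixed formula over average sentence length and average syllables per word: 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. The Python sketch below is a minimal illustration of that formula; the regex-based syllable heuristic and the sample sentence are simplifying assumptions of this sketch, not the calculator used in the study.

    import re

    def count_syllables(word: str) -> int:
        # Crude heuristic: count vowel groups, dropping a typical silent final 'e'.
        groups = re.findall(r"[aeiouy]+", word.lower())
        n = len(groups)
        if word.lower().endswith("e") and n > 1:
            n -= 1
        return max(1, n)

    def flesch_kincaid_grade(text: str) -> float:
        # Standard formula: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z']+", text)
        n_words = max(1, len(words))
        syllables = sum(count_syllables(w) for w in words)
        return 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59

    # Illustrative sample text (not from the study's dataset).
    sample = ("Cataract surgery replaces the cloudy lens inside your eye "
              "with a clear artificial lens. It is a common and safe operation.")
    print(f"Approximate Flesch-Kincaid Grade Level: {flesch_kincaid_grade(sample):.2f}")

Published calculators typically use dictionary-based syllable counts, so their grade levels can differ from this heuristic by a fraction of a grade; the sketch is meant only to show why shorter sentences and shorter words, as in the ChatGPT-4 responses, yield a lower grade level.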
Citation
Azzopardi M, Ng B, Logeswaran A, Loizou C, Cheong RCT, Gireesh P, Ting DSJ, Chong YJ. Artificial intelligence chatbots as sources of patient education material for cataract surgery: ChatGPT-4 versus Google Bard. BMJ Open Ophthalmol. 2024 Oct 17;9(1):e001824. doi: 10.1136/bmjophth-2024-001824.
Type
Article