Citation
Gao Fei, Gao Xue, Shao Yan, et al. Application of large language models in health education for patients with diabetic retinopathy[J]. Chin J Exp Ophthalmol, 2024, 42(12):1111-1118. DOI: 10.3760/cma.j.cn115989-20240723-00207.
ABSTRACT
Objective To evaluate the accuracy, completeness, and repeatability of domestic open-source large language models (LLMs) in health education for patients with diabetic retinopathy (DR), and to explore their potential as intelligent virtual assistants for DR patient education.
Methods A total of 41 questions and answers related to the diagnosis and treatment of DR were compiled in five categories: risk factors, screening and examination, symptoms and staging, diagnosis, and treatment and prognosis. Each question was posed to the LLM twice, each time in a "new dialogue", and all answers were recorded. Three senior fundus physicians independently rated the answers on a 6-point Likert scale for accuracy and 3-point Likert scales for completeness and repeatability; for each answer, the evaluators were also asked to state a preference between the LLM-generated and the manually written answer. Five randomly selected questions were used to screen three domestic open-source LLMs, ERNIE Bot 3.5, Qwen, and Kimi chat, and the LLM with the best overall performance was evaluated further on the full question bank.
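The collection protocol above can be expressed as a short script. The sketch below is illustrative only, not taken from the paper: ask_llm is a hypothetical stand-in for a fresh "new dialogue" query to whichever model is under test, and the question list is abbreviated.

    # Hypothetical sketch of the question-collection protocol.
    def ask_llm(question: str) -> str:
        # Stand-in for a fresh "new dialogue" query to the model under test;
        # in practice this would wrap a vendor API or a scripted web session.
        return f"[model answer to: {question}]"

    QUESTIONS = [
        "What is diabetic retinopathy?",  # ... 41 items across the five categories
    ]

    records = []
    for q in QUESTIONS:
        for run in (1, 2):  # each question is asked twice, each time in a new dialogue
            records.append({"question": q, "run": run, "answer": ask_llm(q)})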
Results Among the three LLMs, Kimi chat performed best overall: on the 5 screening questions, the proportions of responses scoring 6 points for accuracy, 3 points for completeness, and 3 points for repeatability were 90%, 90%, and 100%, respectively. Across all questions, the word count of manual replies was 106 (70, 202), significantly lower than the 505 (386, 600) of Kimi chat replies (Z=-7.866, P<0.001). The length of Kimi chat replies was not significantly correlated with the accuracy score (rs=-0.044, P=0.492) but was positively correlated with the completeness score (rs=0.239, P<0.001). The intraclass correlation coefficients for accuracy and completeness scores among the three evaluators were all above 0.700, with the highest agreement for repeatability (0.853), followed by completeness of the first response (0.771). The proportion of responses scoring ≥5 points for accuracy was 87.0% (214/246), the proportion scoring ≥2 points for completeness was 98.0% (241/246), and the proportion with repeatability higher than 70% was 78.5% (193/246). Kimi chat excelled at basic questions about the disease, such as its definition, staging, screening frequency, and common risk factors, but performed poorly on questions involving treatment choices that require a doctor's professional judgment. Evaluators chose the Kimi chat response as superior for 69.5% (171/246) of answers; reasons for not choosing it included answers lacking individualization, inclusion of too much irrelevant information, and failure to answer questions requiring a high degree of medical expertise.
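For readers wishing to reproduce this style of analysis, the sketch below shows how the kinds of statistics reported above (a Mann-Whitney test on word counts, Spearman correlations, intraclass correlation coefficients) could be computed in Python. All data here are synthetic placeholders, not the study's data, and pingouin is an assumed dependency for the ICC.

    import numpy as np
    import pandas as pd
    from scipy.stats import mannwhitneyu, spearmanr
    import pingouin as pg  # assumed dependency for the ICC computation

    rng = np.random.default_rng(0)

    # Synthetic word counts: 41 manual replies vs. 82 LLM replies (41 questions x 2 runs)
    manual_words = rng.integers(70, 203, 41)
    llm_words = rng.integers(386, 601, 82)
    u_stat, p_mwu = mannwhitneyu(manual_words, llm_words)  # abstract reports Z=-7.866, P<0.001

    # Synthetic 6-point accuracy ratings: 82 replies x 3 evaluators = 246 ratings
    ratings = pd.DataFrame({
        "reply": np.repeat(np.arange(82), 3),
        "rater": np.tile(["R1", "R2", "R3"], 82),
        "accuracy": rng.integers(1, 7, 246),
    })

    # Spearman correlation between reply length and accuracy score
    rs, p_rs = spearmanr(np.repeat(llm_words, 3), ratings["accuracy"])

    # Inter-rater agreement: intraclass correlation coefficient (ICC)
    icc = pg.intraclass_corr(data=ratings, targets="reply", raters="rater", ratings="accuracy")

    print(f"Mann-Whitney U={u_stat:.1f}, P={p_mwu:.3g}")
    print(f"Spearman rs={rs:.3f}, P={p_rs:.3g}")
    print(icc[["Type", "ICC"]])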
Conclusions Kimi chat answers questions related to the diagnosis and treatment of DR in a detailed and well-organized manner, with high accuracy, completeness, and repeatability.