Application of large language models in health education for patients with diabetic retinopathy

Authors: Gao Fei, Gao Xue, Shao Yan, Ren Xinjun, Liu Boshi, Jiao Mingfei, Li Xiaorong, Liu Juping
DOI: 10.3760/cma.j.cn115989-20240723-00207

Citation

Gao Fei, Gao Xue, Shao Yan, et al. Application of large language models in health education for patients with diabetic retinopathy[J]. Chin J Exp Ophthalmol, 2024, 42(12):1111-1118. DOI: 10.3760/cma.j.cn115989-20240723-00207.

ABSTRACT

Objective  To evaluate the accuracy, completeness, and reproducibility of domestic open-source large language models (LLMs) in diabetic retinopathy (DR) patient education, and to explore their potential as intelligent virtual assistants for DR patient education.

Methods  A total of 41 questions and answers related to the diagnosis and treatment of DR were compiled in five categories: risk factors, screening and examination, symptoms and staging, diagnosis, and treatment and prognosis. Each question was submitted to the LLM twice, each time as a "new dialogue", and all answers were recorded. Three senior fundus physicians independently evaluated the answers on a 6-point Likert scale for accuracy and a 3-point Likert scale for completeness and repeatability, and for each answer the evaluator was asked to indicate which they would recommend, the LLM answer or the manual answer. Five questions were randomly selected to evaluate three open-source LLMs, ERNIE Bot 3.5, Qwen, and Kimi chat, and the LLM with the best overall performance was selected for further evaluation on the full question bank.

Results  Among the three LLMs, Kimi chat had the best overall performance: across the 5 questions, the proportions of responses scoring 6 for accuracy, 3 for completeness, and 3 for repeatability were 90%, 90%, and 100%, respectively. For all questions answered, the number of words in manual replies was 106 (70, 202), which was significantly lower than the 505 (386, 600) in Kimi chat replies (Z=-7.866, P<0.001). There was no significant correlation between the number of words in Kimi chat replies and the accuracy score (r_s=-0.044, P=0.492), but the number of words was positively correlated with the completeness score (r_s=0.239, P<0.001). The intraclass correlation coefficients for accuracy and completeness scores were above 0.700 among the three evaluators, with the highest agreement for repeatability at 0.853, followed by completeness of the first response at 0.771. The proportion of responses scoring ≥5 points for accuracy was 87.0% (214/246), the proportion scoring ≥2 points for completeness was 98.0% (241/246), and the proportion with repeatability higher than 70% was 78.5% (193/246). Kimi chat excelled in answering basic questions about the disease, such as disease definition, staging, frequency of screening, and common risk factors, but performed poorly on questions involving treatment choices that require a doctor's professional judgment. The proportion of evaluations in which the Kimi chat response was chosen as superior was 69.5% (171/246); the reasons for non-selection included lack of characteristic answers, inclusion of too much irrelevant information, and inadequate responses to questions requiring a high degree of medical expertise.
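The percentages above can be re-derived from the reported counts. The following is a minimal sketch, not the authors' analysis code. It assumes (the abstract does not state this explicitly) that the denominator of 246 arises from 41 questions x 2 repetitions x 3 evaluators; the variable names and the `percent` helper are illustrative only.

```python
# Sanity check of the proportions reported in the Results paragraph.
# ASSUMPTION: 246 = 41 questions x 2 repetitions x 3 evaluators.
# This decomposition is inferred from the Methods, not stated by the authors.

N_QUESTIONS = 41
N_REPETITIONS = 2
N_EVALUATORS = 3

total_ratings = N_QUESTIONS * N_REPETITIONS * N_EVALUATORS


def percent(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Counts copied verbatim from the abstract.
reported_counts = {
    "accuracy >= 5 points": 214,
    "completeness >= 2 points": 241,
    "repeatability > 70%": 193,
    "Kimi chat response preferred": 171,
}

percentages = {
    label: percent(count, total_ratings)
    for label, count in reported_counts.items()
}

for label, value in percentages.items():
    print(f"{label}: {value}%")
```

Under that assumption, the computed values agree with the 87.0%, 98.0%, 78.5%, and 69.5% given in the text.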

Conclusions  Kimi chat answers DR-related diagnostic questions in a detailed and well-organized manner, with a high degree of accuracy, completeness and reproducibility.

Diabetic retinopathy; Health education; Deep learning; Large language models; Evaluation

Authors Info & Affiliations 

Gao Fei
Tianjin Key Laboratory of Retinal Functions and Diseases, Tianjin Branch of National Clinical Research Center for Ocular Disease, Eye Institute and School of Optometry, Tianjin Medical University Eye Hospital, Tianjin 300384, China
Gao Xue
Tianjin Key Laboratory of Retinal Functions and Diseases, Tianjin Branch of National Clinical Research Center for Ocular Disease, Eye Institute and School of Optometry, Tianjin Medical University Eye Hospital, Tianjin 300384, China
Shao Yan
Tianjin Key Laboratory of Retinal Functions and Diseases, Tianjin Branch of National Clinical Research Center for Ocular Disease, Eye Institute and School of Optometry, Tianjin Medical University Eye Hospital, Tianjin 300384, China
Ren Xinjun
Tianjin Key Laboratory of Retinal Functions and Diseases, Tianjin Branch of National Clinical Research Center for Ocular Disease, Eye Institute and School of Optometry, Tianjin Medical University Eye Hospital, Tianjin 300384, China
Liu Boshi
Tianjin Key Laboratory of Retinal Functions and Diseases, Tianjin Branch of National Clinical Research Center for Ocular Disease, Eye Institute and School of Optometry, Tianjin Medical University Eye Hospital, Tianjin 300384, China
Jiao Mingfei
Tianjin Key Laboratory of Retinal Functions and Diseases, Tianjin Branch of National Clinical Research Center for Ocular Disease, Eye Institute and School of Optometry, Tianjin Medical University Eye Hospital, Tianjin 300384, China
Li Xiaorong
Tianjin Key Laboratory of Retinal Functions and Diseases, Tianjin Branch of National Clinical Research Center for Ocular Disease, Eye Institute and School of Optometry, Tianjin Medical University Eye Hospital, Tianjin 300384, China
Liu Juping
Tianjin Key Laboratory of Retinal Functions and Diseases, Tianjin Branch of National Clinical Research Center for Ocular Disease, Eye Institute and School of Optometry, Tianjin Medical University Eye Hospital, Tianjin 300384, China