Conclusions
While ChatGPT holds promise in clinical pharmacy practice as a supplementary tool, the ability of ChatGPT to handle complex problems needs further improvement and refinement.
Main text
Introduction
In November 2022, OpenAI launched ChatGPT (Chat Generative Pre-trained Transformer), which evolved from artificial intelligence (AI) large language models (LLMs). It was designed to simulate human conversation in response to prompts or questions based on the context of the input text1,2. As a transformative and disruptive technology, ChatGPT presents both promise and challenges for the scientific community, particularly in healthcare3-5. Despite limited access to medical data, ChatGPT has achieved scores equivalent to those of a third-year undergraduate medical student on the United States Medical Licensing Examination (USMLE)6. Another study described ChatGPT's ability to generate formal discharge summaries within seconds from brief patient profile prompts7. Moreover, ChatGPT has potential applications in radiological decision-making, risk prediction for various ailments, and drug discovery8,9. Beyond its technological impact, ChatGPT offers medical information to patients and may improve personalized healthcare across clinical specialties, including endocrinology10, hepatology11, cardiology12, anti-infective medicine13, obstetrics & gynecology14, and neurosurgery15.
Clinical pharmacists play a crucial role in ensuring patient safety, optimizing medication therapy, and providing patient-centered care16. They possess the skills and knowledge required to offer clinical pharmacy services to healthcare teams and patients17. In clinical pharmacy practice, AI-powered systems have demonstrated the potential to aid clinical pharmacists, as previously discussed18. The development of AI-powered apps and tools has focused primarily on prescription review19,20. AI platforms also assist clinical pharmacists with drug counseling, reducing costs and medical utilization21. AI-powered tools based on neural network models have been developed to detect self-administration errors, allowing for tailored patient education on device techniques22. For adverse drug reaction (ADR) monitoring, AI draws on big data from hospital information systems to screen intelligently for adverse drug reactions/events through electronic medical record analysis23. With increasing workloads and growing demands on clinical pharmacists, AI-driven support tools such as ChatGPT could enhance efficiency, decision-making, and patient care in clinical settings. However, the accuracy, reliability, and real-world applicability of ChatGPT in these settings have not been thoroughly assessed.
This study aims to evaluate the performance of ChatGPT in various aspects of clinical pharmacy practice, including prescription review, patient medication education, ADR recognition, ADR causality assessment, and drug counseling. Both quantitative and qualitative methods were used to compare the accuracy and quality of ChatGPT's responses with those of the clinical pharmacist. Our findings offer insights into ChatGPT's strengths and limitations in clinical pharmacy practice.
Methods
ChatGPT
The "ChatGPT Mar 23 Version", the latest release at the time of this study (March 23, 2023), was used24.
Data source
Twenty-five questions were selected from the practical cases of clinical pharmacists and from assessment questions for resident pharmacists to evaluate the performance of ChatGPT in prescription review, patient medication education, adverse drug reaction (ADR) recognition, ADR causality assessment, and drug counseling. Each aspect comprised five questions covering its main content. For prescription review, verbal descriptions of the patient demographic information, diagnoses, and medication details recorded in prescriptions were extracted and submitted to ChatGPT. For ADR causality assessment, the clinical evaluation process and results, following the WHO-UMC system for standardized case causality assessment25, are documented in a standard table termed the adverse drug reaction/event reporting form. The information necessary to determine causality was extracted from the forms submitted by clinical pharmacists, and ChatGPT then performed the causality assessment based on that information.
Quantitative and qualitative evaluation methods
Each question was separately input into ChatGPT using the "New Chat" function11. The same set of questions was also presented to practicing clinical pharmacists. All questions and answers were documented. Five clinical pharmacy professionals reviewed the answers and rated their accuracy on a scale of 0 (completely incorrect) to 10 (correct and comprehensive)26. The professionals rated the answers independently and were blinded to whether each answer was provided by ChatGPT or the clinical pharmacist. After all answers were reviewed, ChatGPT's capabilities and limitations in clinical pharmacy practice were also qualitatively assessed according to the main content of each aspect.
Statistical analysis
The score for each question was computed as the mean of the scores given by the five professional reviewers. The score for each aspect of clinical pharmacy was then determined by averaging the scores of the corresponding five questions. An unpaired two-tailed Student's t-test was conducted to compare the mean scores of ChatGPT and the clinical pharmacist, with p < 0.05 considered significant. GraphPad Prism software (version 9.0) was used for these analyses. Additionally, interrater reliability was assessed with intraclass correlation coefficients (ICCs)27,28. ICC estimates and their 95% confidence intervals were calculated using the "irr" package (version 0.84.1) in RStudio (R version 4.2.0), based on a single-rating, absolute-agreement, two-way random-effects model. An ICC exceeding 0.70 indicates excellent reliability29.
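The aggregation and reliability calculations described above can be illustrated as follows. This is a hedged sketch in Python rather than the study's actual workflow (the study used GraphPad Prism and the R "irr" package), and the rating matrix shown is hypothetical, not study data; the ICC formula implemented is the standard single-rating, absolute-agreement, two-way random-effects estimate, ICC(2,1), computed from the two-way ANOVA mean squares.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): single-rating, absolute-agreement, two-way random-effects
    model, computed from two-way ANOVA mean squares.
    ratings: (n_subjects x k_raters) array of scores."""
    n, k = ratings.shape
    grand = ratings.mean()
    # Mean squares for subjects (rows), raters (columns), and residual
    msr = k * ((ratings.mean(axis=1) - grand) ** 2).sum() / (n - 1)
    msc = n * ((ratings.mean(axis=0) - grand) ** 2).sum() / (k - 1)
    sse = ((ratings - grand) ** 2).sum() - (n - 1) * msr - (k - 1) * msc
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical example: 5 questions (one aspect) scored by 5 reviewers
scores = np.array([
    [9, 8, 9, 9, 8],
    [7, 7, 6, 7, 7],
    [8, 9, 8, 8, 9],
    [5, 5, 6, 5, 5],
    [9, 9, 9, 8, 9],
], dtype=float)

question_scores = scores.mean(axis=1)  # mean over the 5 reviewers
aspect_score = question_scores.mean()  # mean over the 5 questions
print(round(aspect_score, 2), round(icc_2_1(scores), 2))
```

The same matrix layout (subjects in rows, raters in columns) is what the "irr" package's `icc()` function expects, so this sketch mirrors the R analysis at the formula level.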
Results
The quantitative evaluation of the performance of ChatGPT in clinical pharmacy
Quantitative results indicated that ChatGPT's accuracy in answering questions varied across domains (Table 1). ChatGPT performed best in drug counseling, achieving a score comparable to that of the clinical pharmacist (ChatGPT: 8.76 vs. pharmacist: 9.52, p = 0.0596). In the remaining areas, namely prescription review, patient medication education, ADR recognition, and ADR causality assessment, ChatGPT's performance was significantly weaker. As depicted in Figure 1, ChatGPT's performance in patient medication education was moderate, with scores between 5 and 8 and no response rated as completely incorrect. In prescription review, ADR recognition, and ADR causality assessment, the scores of ChatGPT's answers were highly variable, with some questions scoring 0. The evaluation results for all questions are recorded in Table S1 and Table S2 of the Appendices. The ICC for interrater reliability was 0.93 (95% CI 0.90-0.96).
The qualitative analysis of the performance of ChatGPT in clinical pharmacy
ChatGPT’s capabilities and limitations in clinical pharmacy practice were qualitatively assessed. Results are presented in Table 2. In general, ChatGPT’s responses exhibited coherent spelling and grammar. The answers provided by the model followed a consistent pattern, characterized by clarity, organization, and informativeness.
In drug counseling, ChatGPT accurately provided comprehensive drug-related information and answered all questions. However, it did not take the patients' real-life circumstances into consideration. For instance, when advising on reducing soy milk's interference with Eltroxin taken at breakfast, ChatGPT suggested waiting at least four hours after taking Eltroxin before consuming soy milk, which is impractical in daily life. The clinical pharmacist's recommendation, to move the dose to four hours after dinner, was more feasible. Furthermore, when patients expressed concerns about taking medication incorrectly, the clinical pharmacist attended to their emotions.
In prescription appropriateness review, ChatGPT successfully reviewed prescriptions with fewer medication-related issues but struggled with complex prescriptions. It performed poorly in identifying drug-related problems involving traditional Chinese medicine. For example, it was unable to detect the therapeutic duplication among Ganmao Qingre granules, Banlangen granules, and Lanqin oral liquid, and mistook them for antibiotics. It also failed to identify that glycyrrhizic acid diammonium enteric-coated capsules lacked a corresponding diagnosis and were contraindicated in patients with hypertension. ChatGPT was able to identify inappropriate dosage and frequency, but it did not provide specific adjustment advice.
For patient medication education, ChatGPT provided a well-organized and detailed list of therapeutic indications, dosing regimens, and common adverse reactions for each medication. Its answers, however, were sometimes overly verbose and specialized. In comparison, the clinical pharmacist used layperson's terms to warn patients of common and life-threatening adverse reactions. For example, when discussing ticagrelor's bleeding risk, the clinical pharmacist guided patients to monitor typical signs, including bruises, petechiae, hemoptysis, and black stools. In addition, ChatGPT did not guide patients on necessary monitoring items or lifestyle changes.
Regarding ADR recognition, ChatGPT accurately defined ADRs but misunderstood the scope of the concept, as demonstrated by mistaking events caused by non-drug factors or substandard drugs for ADRs. ChatGPT correctly interpreted clinical indicators and identified simple ADRs but struggled with complex cases. When assessing ADR causality, ChatGPT extracted key information from cases and analyzed causality using the WHO-UMC criteria. It excelled at analyzing temporal relationships and prior knowledge but often incorrectly assessed drug dechallenge, rechallenge, and alternative causes, leading to misjudgments. Finally, ChatGPT tended to classify ADR causality as "possible". All questions and answers are reported in the Appendices.
Discussion
Our study provides a comprehensive evaluation of ChatGPT’s performance in various aspects of clinical pharmacy practice compared to that of the clinical pharmacist. Quantitative evaluation results revealed that ChatGPT excelled in drug counseling, achieving a comparable score to the clinical pharmacist. Its performance in prescription review, patient medication education, ADR recognition, and ADR causality assessment was significantly weaker. The qualitative analysis highlighted the key strengths and limitations of ChatGPT in clinical settings.
Owing to training on extensive text datasets, ChatGPT can comprehend context and generate human-like responses1. This ability enables it to perform well in drug counseling and medication education. ChatGPT could therefore serve as a pharmacy encyclopedia for patients and aid clinical pharmacists in literature search and synthesis, enhancing work efficiency. Several limitations of ChatGPT should nevertheless be noted. First, ChatGPT struggled with drug-related problems involving traditional Chinese medicine, likely due to constraints in its training data: it lacks a medicine-specific database, and its training data extend only to 2021. Seghier pointed out that ChatGPT faces language barriers, with its non-English responses being notably inferior30. In particular, ChatGPT is not open to the public in China, potentially leading to informational gaps regarding Chinese patent medicines. Second, ChatGPT often neglected patients' real-life circumstances and lacked focus, indicating a deficiency in situational awareness13. This issue may be mitigated by incorporating clinical pharmacy professionals' annotations and feedback into ChatGPT's responses31.
ChatGPT's performance in prescription review, ADR recognition, and ADR causality assessment was suboptimal. The model tended to manage simple cases effectively but struggled with complex ones, suggesting difficulties in handling intricate instructions. While ChatGPT exhibited some reasoning capabilities, they remained limited6,14. It misunderstood the scope of the ADR concept and misjudged ADR causality. These analyses suggest that ChatGPT lacks human-like deep understanding and adaptive application in complex real-world situations32.
In this study, ChatGPT's ability to provide emotional support was not specifically assessed, but our observations suggest it may not offer such support proactively. For instance, when patients expressed concerns about taking medication incorrectly, the clinical pharmacist addressed their emotions first, whereas ChatGPT provided information only. In contrast, another study investigating ChatGPT in hepatic disease management found that it could supply empathetic advice to patients and caregivers11.
Prior to implementing ChatGPT in clinical pharmacy practice, ethical issues must be carefully considered. Bias risks are present because AI algorithms are often trained on biased datasets; OpenAI itself acknowledges that ChatGPT's output can perpetuate sexist stereotypes33. Additionally, the data used to train ChatGPT lack transparency, which affects reliability. ChatGPT also occasionally generates plausible-sounding but incorrect or nonsensical answers, known as AI hallucinations, posing significant risks to patient safety7,34. As a result, legal frameworks and accountability for such errors should be established35. Italy temporarily banned ChatGPT over privacy concerns, underscoring the need for attention to data governance and privacy7,36.
This study has several limitations. Firstly, the evaluation focused on specific aspects of clinical pharmacy practice and did not encompass all potential applications. Secondly, the limited number of questions and prompts may not fully capture ChatGPT's capabilities and limitations. Lastly, the GPT-3.5 model, rather than the more advanced GPT-4 model, was employed in this study. This choice was made because the GPT-3.5 model is freely accessible to the public, whereas the GPT-4 model is available only to paid ChatGPT Plus subscribers on a limited basis, reducing its accessibility to patients.
In conclusion, ChatGPT shows promise in specific domains such as drug counseling, but its overall performance across the various aspects of clinical pharmacy practice assessed here was significantly weaker than that of the clinical pharmacist. These findings indicate that ChatGPT has potential as a supplementary tool in clinical settings. Further enhancements and refinements, particularly expanding medicine-specific datasets and augmenting capabilities for advanced reasoning and complex instructions, will be crucial for optimizing its utility in clinical pharmacy practice.