Conclusions
While ChatGPT holds promise in clinical pharmacy practice as a
supplementary tool, the ability of ChatGPT to handle complex problems
needs further improvement and refinement.
Main text
Introduction
In November 2022, OpenAI launched the Chat Generative Pre-trained
Transformer (ChatGPT), a chatbot built on artificial intelligence (AI)
large language models (LLMs). It was designed to simulate human
conversation in response to prompts or questions based on the context of
the input text1,2. As a transformative and disruptive
technology, ChatGPT presents both promise and challenges for the
scientific community, particularly in healthcare3-5.
Despite limited access to medical
data, ChatGPT has achieved scores equivalent to those of a third-year
medical student on the United States Medical Licensing
Examination (USMLE)6. Meanwhile, another study
outlined ChatGPT’s ability to generate formal discharge summaries in a
matter of seconds following brief patient profile
prompts7. Moreover, ChatGPT has potential applications
in enhancing radiological decision-making, risk prediction for various
ailments, and drug discovery processes8,9. Beyond its
technological impact, ChatGPT offers medical information to patients and
may improve personalized healthcare across various clinical practices,
including endocrinology10,
hepatology11, cardiology12,
anti-infective medicine13, obstetrics &
gynecology14, and neurosurgery15.
Clinical pharmacists play a crucial role in ensuring patient safety,
optimizing medication therapy, and providing patient-centered
care16. They possess the skills and knowledge required
to offer clinical pharmacy services to healthcare teams and
patients17. In clinical pharmacy practice, AI-powered
systems have demonstrated the potential to aid clinical pharmacists, as
previously discussed18. The development of AI-powered
apps and tools is primarily focused on prescription
review19,20. Furthermore, AI platforms assist
clinical pharmacists in drug counseling, thereby reducing costs and
medical utilization21. AI-powered tools based on neural
network models have been developed to detect self-administration errors,
allowing for tailored patient medication education on device
techniques22. In adverse drug reaction (ADR)
monitoring, AI employs big data from hospital information systems to
intelligently screen for adverse drug reactions/events through
electronic medical record analysis23. With increasing
workloads and growing demands on clinical pharmacists, AI-driven support
tools like ChatGPT could potentially enhance efficiency,
decision-making, and patient care in clinical settings. However, the
accuracy, reliability, and real-world applicability of ChatGPT in
clinical settings have not been thoroughly assessed.
This study aims to evaluate the performance of ChatGPT in various
aspects of clinical pharmacy practice including prescription review,
patient medication education, ADR recognition, ADR causality assessment,
and drug counseling. Quantitative and qualitative methods were both used
to compare the accuracy and quality of ChatGPT’s responses to those of
the clinical pharmacist. Our findings will offer valuable insights into
ChatGPT’s strengths and limitations in clinical pharmacy practice.
Methods
ChatGPT
“ChatGPT Mar 23 Version” was used in this study, which is the latest
version of ChatGPT released on March 23, 202324.
Data source
Twenty-five questions were selected from practical cases handled by
clinical pharmacists and from assessment questions for resident
pharmacists to evaluate the performance of ChatGPT in prescription
review, patient medication education, ADR recognition, ADR causality
assessment, and drug counseling. Each aspect had five questions covering
its main content. For prescription review, verbal descriptions of
patient demographic information, diagnoses, and medication details
recorded in prescriptions were extracted and submitted to ChatGPT. For
ADR causality assessment, the evaluation process and results are
documented in clinical practice as adverse drug reaction/event reporting
forms, following the WHO-UMC system for standardized case causality
assessment25. The information necessary to determine causality was
extracted from forms submitted by clinical pharmacists, and ChatGPT then
performed the causality assessment based on that information.
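The WHO-UMC assessment hinges on a small set of criteria: the temporal relationship, alternative causes, dechallenge, and rechallenge. As a rough illustration only, and not the actual WHO-UMC algorithm, the core decision logic for the four most common categories might be sketched as follows; the real system also includes the Conditional/Unclassified and Unassessable/Unclassifiable categories and relies on clinical judgment rather than a fixed rule:

```python
def who_umc_category(temporal_plausible, other_causes_excluded,
                     positive_dechallenge, positive_rechallenge):
    """Hypothetical, highly simplified sketch of the four most common
    WHO-UMC causality categories; all arguments are booleans."""
    if not temporal_plausible:
        # An improbable time-to-onset relationship argues against the drug.
        return "Unlikely"
    if other_causes_excluded and positive_dechallenge and positive_rechallenge:
        # Event recurs on re-exposure and cannot be explained otherwise.
        return "Certain"
    if other_causes_excluded and positive_dechallenge:
        # Reasonable dechallenge response; rechallenge not required.
        return "Probable/Likely"
    # The event could also be explained by disease or other drugs,
    # or dechallenge information is lacking or unclear.
    return "Possible"
```

This is exactly the kind of structured information extracted from the reporting forms and supplied to ChatGPT in this study.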
Quantitative and qualitative evaluation methods
Each question was input to ChatGPT separately using the "New
Chat" function11. The same set of questions was also
presented to practicing clinical pharmacists. All questions and
answers were documented. Five clinical pharmacy professionals reviewed
the answers and rated their accuracy on a scale of 0 (completely
incorrect) to 10 (comprehensive)26. The professionals
rated the answers independently and were blinded to whether each answer
was provided by ChatGPT or the clinical pharmacist. After all answers
had been reviewed, ChatGPT's capabilities and limitations in clinical
pharmacy practice were also qualitatively assessed according to the main
content of each aspect.
Statistical analysis
The score for each question was computed as the mean of the ratings
given by the five professional reviewers. The score for each aspect of
clinical pharmacy was then determined by averaging the scores of the
corresponding five questions. An unpaired two-tailed Student's t-test
was conducted to compare the mean scores of ChatGPT and the clinical
pharmacist, with a p-value <0.05 considered significant. GraphPad Prism
software (version 9.0) was used for these analyses. Additionally,
interrater reliability was assessed through intraclass correlation
coefficients (ICCs)27,28. ICC estimates and their
95% confidence intervals were calculated using the "irr" package
(version 0.84.1) in RStudio (powered by R version 4.2.0), based on a
single-rating, absolute-agreement, two-way random-effects model. An ICC
exceeding 0.70 indicates excellent reliability29.
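For illustration, the aggregation and reliability statistics described above can be spelled out in plain Python. The study itself used GraphPad Prism and the R "irr" package; this sketch only makes the underlying formulas explicit, and the example scores are invented:

```python
from statistics import mean, variance
from math import sqrt

def unpaired_t(a, b):
    """Two-sample Student's t statistic with pooled variance
    (equal-variance assumption); df = len(a) + len(b) - 2."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb)), na + nb - 2

def icc_2_1(ratings):
    """ICC(2,1): single-rating, absolute-agreement, two-way random-effects
    model. `ratings` has one row per rated answer, one column per rater."""
    n, k = len(ratings), len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_rows = k * sum((mean(row) - grand) ** 2 for row in ratings)
    ss_cols = n * sum((mean(col) - grand) ** 2 for col in zip(*ratings))
    msr = ss_rows / (n - 1)                                     # between answers
    msc = ss_cols / (k - 1)                                     # between raters
    mse = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))  # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Invented example: each question's score is the mean of five reviewers'
# ratings; an aspect score averages its five question scores.
question_ratings = [[8, 9, 7, 8, 9], [6, 5, 7, 6, 6]]  # hypothetical data
question_scores = [mean(r) for r in question_ratings]
```

Note that ICC(2,1) penalizes systematic rater offsets, which is why the absolute-agreement form is appropriate for comparing raters scoring the same answers.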
Results
The quantitative evaluation of the performance of ChatGPT in clinical
pharmacy
Quantitative results indicated that ChatGPT's accuracy in answering
questions varied across domains (Table 1). ChatGPT excelled in
drug counseling, achieving a score comparable to that of the clinical
pharmacist (ChatGPT: 8.76 vs. pharmacist: 9.52, p-value=0.0596). Its
performance was significantly weaker in the other areas: prescription
review, patient medication education, ADR recognition, and ADR causality
assessment. As depicted in Figure 1, ChatGPT's performance was average
in patient medication education, with scores between 5 and 8 and no
response rated as completely incorrect. In prescription review, ADR
recognition, and ADR causality assessment, the scores of ChatGPT's
answers displayed high variability, with some questions scoring 0. The
evaluation results for all questions are recorded in Table S1 and Table
S2 of the Appendices. The ICC value for interrater reliability was 0.93
(95% CI 0.90-0.96).
The qualitative analysis of the performance of ChatGPT in clinical
pharmacy
ChatGPT’s capabilities and limitations in clinical pharmacy practice
were qualitatively assessed. Results are presented in Table 2. In
general, ChatGPT’s responses exhibited coherent spelling and grammar.
The answers provided by the model followed a consistent pattern,
characterized by clarity, organization, and informativeness.
In terms of drug counseling, ChatGPT accurately provided comprehensive
drug-related information and answered all questions. However, it did not
take patients' real-life circumstances into consideration. For instance,
when advising on reducing soy milk's impact on Eltroxin at breakfast,
ChatGPT suggested waiting at least four hours after taking Eltroxin
before consuming soy milk, which is impractical in real life. The
clinical pharmacist's recommendation to adjust the medication timing to
four hours after dinner was more feasible. Furthermore, when patients
expressed concerns about incorrect medication, the clinical pharmacist
paid attention to their emotions, whereas ChatGPT did not.
In prescription appropriateness review, ChatGPT successfully reviewed
prescriptions with fewer medication-related issues but struggled with
complex prescriptions. It performed poorly in identifying drug-related
problems involving traditional Chinese medicine. For example, it was
unable to detect the therapeutic duplication among Ganmao Qingre
granules, banlangen granules, and lanqin oral liquid, and mistook them
for antibiotics. It also failed to identify that glycyrrhizic acid
diammonium enteric-coated capsules lacked a corresponding diagnosis and
were contraindicated in patients with hypertension. ChatGPT was able to
identify inappropriate dosage and frequency, but it did not provide
specific adjustment advice.
For patient medication education, ChatGPT provided a well-organized and
detailed list of therapeutic indications, dosing regimens, and common
adverse reactions for each medication. However, its answers were
sometimes overly verbose and specialized. In comparison, the clinical
pharmacist used layperson's terms to warn patients of common and
life-threatening adverse reactions. For example, when mentioning
ticagrelor's risk of bleeding, the clinical pharmacist guided patients
to monitor typical signs, including bruises, petechiae, hemoptysis, and
black stools. In addition, ChatGPT did not guide patients on necessary
monitoring items and lifestyle changes.
Regarding ADR recognition, ChatGPT accurately defined ADR but
misunderstood its connotation, as demonstrated by mistaking events
caused by non-drug factors or substandard drugs for ADRs. ChatGPT
correctly interpreted clinical indicators and identified simple ADRs but
struggled with complex cases. When assessing ADR causality, ChatGPT
extracted key information from cases and analyzed causality using the
WHO-UMC criteria. It excelled in analyzing temporal relationships and
previous knowledge but often incorrectly assessed drug dechallenge,
rechallenge, and alternative causes, leading to misjudgments. Finally,
ChatGPT tended to classify ADR causality as "possible". All questions
and answers are reported in the Appendices.
Discussion
Our study provides a comprehensive evaluation of ChatGPT’s performance
in various aspects of clinical pharmacy practice compared to that of the
clinical pharmacist. Quantitative evaluation results revealed that
ChatGPT excelled in drug counseling, achieving a comparable score to the
clinical pharmacist. Its performance in prescription review, patient
medication education, ADR recognition, and ADR causality assessment was
significantly weaker. The qualitative analysis highlighted the key
strengths and limitations of ChatGPT in clinical settings.
Owing to training on extensive text datasets, ChatGPT can comprehend
context and generate human-like responses1. This ability enables it to
perform well in drug counseling and medication education. ChatGPT could
therefore serve as a pharmacy encyclopedia for patients and aid clinical
pharmacists in literature search and synthesis, enhancing work
efficiency in the future. Nevertheless, several limitations of ChatGPT
should be noted. First, ChatGPT struggled with drug-related problems
involving traditional Chinese medicine, likely due to constraints in its
training data: it lacks a medical-specific database and is based on
information available before 2021. Seghier pointed out that ChatGPT
faces language barriers, with its non-English responses being notably
inferior30. In particular, it is not open to the public in China,
potentially leading to informational gaps regarding Chinese patent
medicines. Second, ChatGPT often neglected patients' real-life
circumstances and lacked focus, indicating a deficiency in situational
awareness13. This issue may be mitigated by incorporating clinical
pharmacy professionals' annotations and feedback into ChatGPT's
responses31.
ChatGPT’s performance in prescription review, ADR recognition, and ADR
causality assessment was suboptimal. The model tended to manage simple
cases effectively but struggled with complex ones, suggesting
difficulties in handling intricate instructions. While ChatGPT exhibited
some reasoning capabilities, they remained
limited6,14. It misunderstood the implications of the
ADR concept and misjudged ADR causality. From these analyses, it could
be concluded that ChatGPT lacks human-like deep understanding and
adaptive application in complex real-world
situations32.
In this study, ChatGPT’s ability to provide emotional support was not
specifically assessed, but evidence suggested it may not be proactive in
offering such support. For instance, when patients expressed concerns
about incorrect medication, the clinical pharmacist addressed their
emotions first, whereas ChatGPT provided information only. In contrast,
another study investigating ChatGPT in hepatic disease management
discovered that it could supply empathetic advice to patients and
caregivers11.
Prior to implementing ChatGPT in clinical pharmacy practice, ethical
issues must be carefully considered. Bias risks are present since AI
algorithms are often trained on biased datasets. OpenAI even
acknowledges that ChatGPT’s output can perpetuate sexist
stereotypes33. Additionally, the data used for ChatGPT
training lacks transparency, affecting reliability. ChatGPT also
occasionally generates plausible-sounding but incorrect or nonsensical
answers, called AI hallucinations, posing significant risks to patient
safety7,34. As a result, legal frameworks and
accountability should be established for these
errors35. Italy has banned ChatGPT due to privacy
concerns, necessitating attention towards data governance and
privacy7,36.
This study has several limitations. Firstly, the evaluation focused on
specific aspects of clinical pharmacy practice, not encompassing all
potential applications. Secondly, the limited number of questions and
prompts may not fully capture ChatGPT’s capabilities and limitations.
Lastly, the GPT-3.5 model rather than the more advanced GPT-4 model was
employed in this study. This choice was made because GPT-3.5 is freely
accessible to the public, whereas the GPT-4 model is available only to
paid ChatGPT Plus subscribers on a limited basis, reducing its
accessibility to patients.
In conclusion, ChatGPT shows promise in specific domains such as drug
counseling, but its overall performance across various aspects of
clinical pharmacy practice is significantly weaker compared to the
clinical pharmacist. These findings indicate that ChatGPT has
potential as a supplementary tool in clinical settings. Further
enhancements and refinements to the ChatGPT system, particularly in
expanding medicine-specific datasets and augmenting capabilities for
advanced reasoning and complex instructions, will be crucial for
optimizing its utility in clinical pharmacy practice.