INTRODUCTION
An estimated 2 million adults living in the United States identify as
transgender (TG).1 Unjust discrimination and violence have contributed
to consistently reported health disparities in TG populations, including
a higher prevalence of mental health distress, substance misuse, and HIV
compared with cisgender people (i.e., those whose sex assigned at birth
aligns with their current gender identity).2,3 Although the
health literature on TG individuals is growing, this population remains
largely overlooked in epidemiologic studies because of small sample
sizes and inconsistent collection of gender identity data.4
Because recruiting a large sample of TG people is labor intensive and
costly, researchers have turned to real-world data (RWD) sources, such
as electronic healthcare databases, and have developed efficient methods
for identifying these patients within them.
Computational phenotypes (CPs) have emerged as tools for identifying
groups of patients with shared characteristics within electronic
healthcare databases, and they play an important role in TG
health-related research.5 In brief, CPs are algorithms
that combine diagnostic and procedure codes, medication records, and
demographic characteristics to identify patient populations within
healthcare utilization data.6 Because data models vary
across RWD sources, there is no single standardized method to identify
TG patients. A 2016 systematic review of worldwide variation in TG
prevalence estimates, derived from self-reported gender identity in
surveys and from TG-related diagnosis codes in electronic healthcare
data, identified the lack of standardization and the substantial
heterogeneity in how TG status was ascertained across studies as an
important barrier to research.4
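To make the definition of a CP concrete, the following is a minimal, hypothetical sketch of a rule-based CP in Python; it is not an algorithm taken from any study discussed here. It assumes a simplified patient record (recorded sex, a set of ICD-10-CM diagnosis codes, and medication names) and flags a patient if the record contains a gender identity-related diagnosis code (the F64 category, or Z87.890, personal history of sex reassignment) or a hormone prescription that is discordant with the recorded sex. The code lists, medication names, and the names `Patient`, `has_tg_dx`, `has_discordant_hormones`, and `is_tg_candidate` are illustrative assumptions. Published CPs rely on curated, validated code sets, and hormone-based criteria alone can misclassify patients who receive these drugs for other indications.

```python
from dataclasses import dataclass, field

# Illustrative (assumed) code sets; a real CP would use a curated, validated list.
GENDER_IDENTITY_DX_PREFIXES = ("F64",)  # ICD-10-CM gender identity disorder category
HISTORY_DX_CODES = {"Z87.890"}  # ICD-10-CM personal history of sex reassignment

# Hypothetical, deliberately small medication lists for the sex-discordant hormone rule.
MASCULINIZING = {"testosterone"}
FEMINIZING = {"estradiol"}


@dataclass
class Patient:
    patient_id: str
    recorded_sex: str  # "F" or "M" as recorded in the database
    dx_codes: set[str] = field(default_factory=set)
    medications: set[str] = field(default_factory=set)


def has_tg_dx(p: Patient) -> bool:
    """True if any diagnosis code is in the assumed gender identity code sets."""
    return any(c.startswith(GENDER_IDENTITY_DX_PREFIXES) for c in p.dx_codes) or bool(
        p.dx_codes & HISTORY_DX_CODES
    )


def has_discordant_hormones(p: Patient) -> bool:
    """True if a hormone is prescribed that does not match the recorded sex."""
    meds = {m.lower() for m in p.medications}
    if p.recorded_sex == "F":
        return bool(meds & MASCULINIZING)
    if p.recorded_sex == "M":
        return bool(meds & FEMINIZING)
    return False  # unknown or missing recorded sex: hormone rule not applied


def is_tg_candidate(p: Patient) -> bool:
    """Hypothetical CP: diagnosis-code criterion OR discordant-hormone criterion."""
    return has_tg_dx(p) or has_discordant_hormones(p)


if __name__ == "__main__":
    patients = [
        Patient("a", "M", {"F64.0"}),
        Patient("b", "F", set(), {"Testosterone"}),
        Patient("c", "M", {"I10"}, {"lisinopril"}),
    ]
    print([p.patient_id for p in patients if is_tg_candidate(p)])  # prints ['a', 'b']
```

Combining the two criteria with OR favors sensitivity; requiring both (AND) would favor specificity. That trade-off is exactly what validation studies of CPs are meant to quantify.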
To date, no review of the published literature has examined how well
CPs identify TG people and characterize their healthcare utilization
patterns within electronic healthcare data. Similarly, the validity of
such CP algorithms in this setting has not been comprehensively
assessed. In this narrative review, we discuss the existing literature
that has used CPs to identify TG people within electronic healthcare
data, examine the validity of these CPs, identify potential gaps, and
synthesize recommendations for future research based on current
knowledge.