Using artificial intelligence to identify patients with migraine and associated symptoms and conditions within electronic health records
BMC Medical Informatics and Decision Making volume 23, Article number: 121 (2023)
Real-world evidence (RWE)—based on information obtained from sources such as electronic health records (EHRs), claims and billing databases, product and disease registries, and personal devices and health applications—is increasingly used to support healthcare decision making. There is variability in the collection of EHR data, which includes “structured data” in predefined fields (e.g., problem list, open claims, medication list, etc.) and “unstructured data” as free text or narrative. Healthcare providers are likely to provide more complete information as free text, but extracting meaning from these fields requires newer technologies and a rigorous methodology to generate higher-quality evidence. Herein, an approach to identify concepts associated with the presence and progression of migraine was developed and validated using the complete patient record in EHR data, including both the structured and unstructured portions.
“Traditional RWE” approaches (i.e., capture from structured EHR fields and extraction using structured queries) and “Advanced RWE” approaches (i.e., capture from unstructured EHR data and processing by artificial intelligence [AI] technology, including natural language processing and AI-based inference) were evaluated against a manual chart abstraction reference standard for data collected from a tertiary care setting. The primary endpoint was recall; differences were compared using chi square.
Compared with manual chart abstraction, recall for migraine and headache were 66.6% and 29.6%, respectively, for Traditional RWE, and 96.8% and 92.9% for Advanced RWE; differences were statistically significant (absolute differences, 30.2% and 63.3%; P < 0.001). Recall of 6 migraine-associated symptoms favored Advanced RWE over Traditional RWE to a greater extent (absolute differences, 71.5–88.8%; P < 0.001). The difference between traditional and advanced techniques for recall of migraine medications was less pronounced, approximately 80% for Traditional RWE and ≥ 98% for Advanced RWE (P < 0.001).
Unstructured EHR data, processed using AI technologies, provides a more credible approach to enable RWE in migraine than using structured EHR and claims data alone. An algorithm was developed that could be used to further study and validate the use of RWE to support diagnosis and management of patients with migraine.
There is increasing emphasis on the use of real-world data collected as part of routine clinical practice to augment the interpretation of data from randomized clinical trials (RCTs) . Although appropriately designed RCTs are the gold standard of clinical evidence as observed in a patient population, the highly controlled testing environment and study design (i.e., specific selection of patients, rigorous treatment protocols, monitoring, and statistical hierarchies) do not reflect treatment of individual patients cared for in real-world settings [2, 3]. RCTs are an inefficient method to understand patient subgroups and comparative effectiveness of medications. Nonetheless, healthcare providers (HCPs) and payers almost exclusively use RCTs to establish policies on new therapeutic options, as these are considered the best available evidence .
The US Congress, under the 21st Century Cures Act, supports the use of real-world evidence (RWE) by the US Food and Drug Administration to support or satisfy post-approval study requirements [4, 5]. RWE is based on information obtained from sources other than clinical studies. These include data from electronic health records (EHRs), claims and billing data, product and disease registries, personal devices (e.g., phones, wearable devices), and health applications (i.e., smartphone apps) . The role and use of RWE is gaining prominence in regulatory, drug development, and healthcare decision-making environments . RWE can complement and validate RCT evidence, extending the understanding of interventions and outcomes based on typical clinical practice patterns in a broader treatment population diagnosed and treated in routine care settings (including those who would be ineligible for RCT due to factors such as age or comorbidities) . This could help policy stakeholders, payers, and HCPs better refine their understanding of the significance and relevance of such evidence in the broader patient population that is likely to receive a given treatment and provide patients an opportunity to voice their experiences and preferences [6, 8, 9]. RWE may support regulatory applications (e.g., new drug indications, post-marketing surveillance) , aid payers with coverage and reimbursement decisions, assist physicians in making treatment decisions, and allow patients to be more effective participants in collaborative healthcare decisions [2, 9, 10]. Patients may also benefit because RWE can provide information on treatment patterns across a much broader range of outcomes, including patients who are often excluded from RCTs due to comorbid disease, psychosocial barriers, and ethnicity .
Traditional real-world studies rely on insurance claims data with International Classification of Disease (ICD) or Current Procedural Terminology codes providing context or structured EHR data, which leverages the entries made in the predefined fields clinicians use when recording a physician record (e.g., problem list, open claims, medication list, etc.). However, many conditions or procedures require information at a more discrete level of detail than can be gained from these methods [11,12,13,14], and there is variability in how routinely collected data are entered in claims or structured EHRs among HCPs . Some HCPs are more comprehensive in their approach than others, with many finding the addition of structured data to be time consuming and inefficient [12, 13]. Additionally, structured data are most commonly used in the billing pathway, which requires limited clinical detail .
In contrast, unstructured data from the narrative of the EHRs are used as the medicolegal record and provide HCPs the opportunity to record detail that is essential to patient care, creating an incentive for these data to be more complete; this provides the level of detail necessary for investigators to fully understand the condition or disease progression. However, extracting meaning from the unstructured EHRs for large scale studies requires artificial intelligence (AI) technologies, including natural language processing (NLP) and machine-learned inference, that, if successful, results in the higher-quality evidence for regulators, payers, prescribers, and patients to make their decisions . In this study, data came from the same EHR system which included both structured data and a documentation system for unstructured data. The objective of this study was to develop and validate an approach to identify concepts associated with the presence and progression of migraine using unstructured EHR data. Worldwide, migraine is one of the most prevalent and disabling diseases . The diagnosis of migraine is complex: specific headache history and duration criteria must be met in patients with at least 2 of 4 key migraine-associated characteristics (i.e., unilateral location, pulsating quality, moderate or severe pain intensity, and aggravated by activity) and accompanied by nausea and/or vomiting and/or both photophobia and phonophobia . Furthermore, migraine phenotype requires specific headache and migraine frequency criteria to be met, including a 3-month or longer duration for chronic migraine. Taken together, the prevalence and complex diagnostic criteria make migraine an ideal therapeutic area to assess approaches to obtaining high-quality RWE from EHR data.
Administration claims and EHR data were sampled from a large US-based integrated health system for the years 2010–2012. Using these data, we compared the accuracy of traditional and advanced RWE approaches for identifying migraine-related concepts relying on chart abstraction with data collected in a tertiary care setting as the reference standard. For purposes of this study, we defined “Traditional RWE” as the use of insurance claims or structured EHR data to identify concepts. Traditional RWE included problem and medication lists and encounter-associated open claims, which were extracted by standard queries by structured query language. “Advanced RWE” was defined as the use of unstructured EHRs to identify relevant concepts and attributes, such as negation or temporality, with AI to process physicians’ notes.
To achieve high accuracy, NLP and AI-based inference were used. AI-based technologies were provided by Verantos, Inc. (Menlo Park, CA). NLP supported identification of relevant concepts within the sentence boundary. In situations where insufficient information was available within the sentence or where a concept was suspect, AI-based inference supported identification of patterns within the longitudinal record. Patterns were recognized based on a large corpus of machine-learned healthcare associations as well as inferencing algorithms. For example, in the text “pt with MA”, association with headaches and photophobia may favor disambiguation of “MA” to migraine with aura whereas association with lung cancer and tumor may favor mass. As another example, if a new template includes “-headache, +nausea”, inference may determine that headache is not asserted due to lack of support elsewhere in the encounter.
Manual annotation of the data set was used to create a reference standard supporting estimate of recall and precision for extracted data in the Traditional RWE and Advanced RWE arms. In the manual annotation, each concept—and all the metadata associated with each concept—was identified and labeled in the data set. For example, an annotator might mark the text “mod HA” as headache experienced = true, current = true, and severity = moderate. The concepts identified were selected based on recommendations from a local panel of academic and industry headache specialists. Concepts that were chosen reflect features likely to be inclusion and exclusion criteria or subgroup analyses within an RWE migraine study.
The data set consisted of 6,032 encounters. Each encounter was reviewed by 2 clinical annotators and was defined as a single visit for a patient with a single physician on a specific date. During the course of the study, the annotation was performed by 7 annotators, each with ≥ 1 year of medical annotation experience spanning approximately 3,000 training encounters. Inter-annotator agreement was measured by Cohen’s κ score; an average κ score ≥ 0.8 was required for this study. To ensure there was no systematic error, annotator pairings were rotated between a pair of annotators. Any disagreement between the annotators was noted and resolved with discussion between the 2 annotators to arrive at a common agreement. In cases of dispute between the 2 annotators, a third annotator served as the tiebreaker. The κ score was calculated prior to resolution of disagreement.
Preselecting the data set was done to increase the likelihood of relevant concepts and allow for the selection of relevant encounters to annotate. Filtering was implemented to ensure that low prevalence conditions would have sufficient occurrence rates to provide adequate power to demonstrate statistically significant differences between study arms (i.e., Traditional vs. Advanced RWE). Concepts included the terms migraine and headache and rizatriptan and sumatriptan. Filtering was applied equally to all data to avoid bias.
Outcomes and comparisons
Because concepts such as headache or migraine are discrete, time-specific events, concepts were tested at the encounter level, meaning that if a patient had migraine at a specific encounter, it would only be counted as correct in that encounter. It is not assumed that the patient will have migraine throughout the longitudinal record. In each encounter, a specific concept can occur multiple times (e.g., “patient describes frequent headaches” and “the headaches have been severe throbbing”). Thus, concept occurrence is the sum of all occurrences of a specific concept (e.g., migraine). Encounter occurrence is the number of encounters that had ≥ 1 occurrence of a concept. Recall was the primary endpoint used to determine the performance of Traditional RWE and Advanced RWE against the manually annotated reference standard (Table 1). Precision was a secondary endpoint, and F1 scores—the weighted harmonic mean (reciprocal of arithmetic mean with equal weight to each data point) of precision and recall—were calculated as 2 × ([precision × recall] / [precision + recall]) (Table 1). As these are proportions, the chi-square test was used to compare differences in accuracy. For Advanced RWE to outperform Traditional RWE, the protocol required an average recall of at least 80% or an absolute recall difference of at least 25% between the 2 approaches.
The average interrater reliability Cohen κ score was high (0.9), indicating that manual annotation (i.e., reference standard) was consistent and credible. Concept occurrences for migraine and headache were 2,642 and 6,530, respectively. Recall for migraine and headache were 66.6% and 29.6% for Traditional RWE and 96.8% and 92.9% for Advanced RWE (absolute differences, 30.2% and 63.3% for migraine and headache, respectively; Fig. 1). There were statistically significant differences between Traditional RWE and Advanced RWE for identification of migraine and headache concepts (P < 0.001).
For identification of 6 migraine-associated symptoms (243–4,088 concept occurrences), Traditional RWE recall ranged from 0 to 17.9%, whereas Advanced RWE recall ranged from 79.3 to 96.6%, with respective F1 scores of 0.0–28.9% and 80.7–95.6% (Fig. 2). For symptom identification, the absolute differences in recall between Advanced RWE and Traditional RWE ranged from 71.5 to 88.8%; all differences between Traditional RWE and Advanced RWE were statistically significant (P < 0.001). For identification of migraine medications (i.e., rizatriptan and sumatriptan; 102 and 510 concept occurrences, respectively), recall was approximately 80% for Traditional RWE and ≥ 98% for Advanced RWE, with F1 scores > 88% for Traditional RWE and ≥ 98% for Advanced RWE (P < 0.001; Fig. 3).
In this phenotyping study, we demonstrated that Advanced RWE techniques consistently outperformed Traditional RWE techniques for identification of migraine-related concepts, with a 193% increase in concept recall for advanced relative to traditional approaches. Advanced RWE met the defined study success criteria for identifying patient characteristics (e.g., associated symptoms) in migraine (recall ≥ 80%), whereas Traditional RWE did not (recall range, 0–66.6%). This should not be surprising, as structured data are typically intended for administrative purposes such as filing insurance claims, marketing, or meeting regulatory requirements; as a consequence, critical information necessary for decision making is not likely to be documented in structured data. For instance, there is no ICD-10 code describing the intensity of migraine pain ; thus, this omnipresent structured information is not useful for assessing the severity of migraine-related disability or disease progression. Of course, that was not the intent of this administrative and epidemiological tool.
In contrast, symptoms are typically well documented in the EHR narrative, as that is where physicians document impressions to support decisions regarding patient management and prescribing. However, as demonstrated here, these data require use of sophisticated tools, such as AI, to unlock meaning that can be employed by decision and policy makers in regulatory agencies and by payers. There are some decisions that do not require these sophisticated tools; for instance, Traditional RWE was sufficient for identifying prescribed medications (Traditional RWE recall, ~ 80%) and may be the preferred approach for evaluation of interventions because of ease of use and lower cost. Nonetheless, the results of the current study provide support for using Advanced RWE techniques to more accurately identify migraine concepts for evaluating outcome of migraine treatment in a real-world setting.
The use of real-world data necessitates a thorough assessment of their quality and relevance when used for evaluating data from clinical care settings . Unlike data derived from RCTs, which are affected by selection bias (e.g., specific inclusion/exclusion criteria) and nonresponse, data collected in EHRs and claims may be more representative of the real-world setting . However, scientific evaluations and validation of EHR and claims data greatly depend on the quality of the input data and the technology used to extract the appropriate data .
Despite the prevalence of migraine and the substantial disability due to migraine , very few studies have used EHR data to identify patients with migraine and to characterize their diagnosis, symptoms, or disease progression. A retrospective study using an EHR-based algorithm in a cohort of patients with migraine with multiple comorbidities found that patients with migraine had an increased occurrence of multiple comorbidities compared with a control population . Additional studies using RWE in patients with migraine are needed.
Study arms were separated to provide the best visibility into differential performance of Traditional versus Advanced RWE techniques. This is important in the context of an industry which typically uses single-source claims data only. In practice, combination of datasets can provide the clearest picture into the patient journey. This may include EHR structured and unstructured as well as linkage of national pharmacy and medical claims data and death registry. Combination or linkage of datasets to achieve completeness is outside the scope of this manuscript.
The results of our study provide a novel approach that can be used to extract high-quality data in the migraine population. We believe using this new approach will provide real-world data that allow for a more thorough characterization of patients with migraine and progression of migraine and that support more credible evidence than when low-accuracy data are used. High-validity RWE applied to rich data sources provides a pathway for payers to make informed decisions regarding treatment management, for HCPs to improve understanding of subgroups and comparative effectiveness, and for patients to sustain a better quality of life and reduce the burden of migraine.
A possible limitation of this study is that a tertiary care (i.e., highly specialized care) EHR was used, and the generalization to other healthcare settings is unknown. Variability in language may cause differential performance of technology. Differences in patient populations in terms of severity and effects of treatment are unknown. Additionally, there is a possibility of selection bias favoring Traditional RWE because patients who were likely to have migraine were selected per protocol to provide sufficient frequency of the disease. However, the improvement in recall and precision with Advanced RWE over Traditional RWE methods found in this study is not likely to be an artifact of these factors.
A specific implementation approach for retrospective EHR-based observational studies in migraine was established. Advanced RWE techniques were required to accurately identify patients with migraine and associated symptoms. A visual summary of the methods and findings is shown in Supplemental Fig. 1. Based on the robust findings of this study, a more reliable approach for RWE studies in migraine could be progressed by using advanced RWE approaches. An algorithm was validated that could be used to further study RWE in patients with migraine.
The datasets generated and/or analyzed during the current study are not publicly available. Lundbeck LLC is committed to responsible sharing of data in a manner that is consistent with safeguarding the privacy of patients, respecting the integrity of national regulatory systems, and protecting the intellectual property of the sponsor. The protection of intellectual property ensures continued research and innovation in the pharmaceutical industry. Deidentified data are available to those whose request has been reviewed and approved through an application submitted to https://www.lundbeck.com/global/our-science/clinical-data-sharing.
electronic health record
International Classification of Disease
natural language processing
randomized clinical trial
Hernandez-Boussard T, Monda KL, Crespo BC, Riskin D. Real world evidence in cardiovascular medicine: ensuring data validity in electronic health record-based studies. J Am Med Inform Assoc. 2019;26:1189–94. https://doi.org/10.1093/jamia/ocz119.
Camm AJ, Fox KAA. Strengths and weaknesses of ‘real-world’ studies involving non-vitamin K antagonist oral anticoagulants. Open Heart. 2018;5:e000788. https://doi.org/10.1136/openhrt-2018-000788.
Kennedy-Martin T, Curtis S, Faries D et al. (2015) A literature review on the representativeness of randomized controlled trial samples and implications for the external validity of trial results. Trials 16.
Levenson MS. Regulatory-grade clinical trial design using real-world data. Clin Trails. 2020;17:377–82. https://doi.org/10.1177/1740774520905576.
21st Century Cures Act. Accessed April 22., 2021. https://www.congress.gov/114/plaws/publ255/PLAW-114publ255.pdf.
Sherman RE, Anderson SA, Dal GJ, et al. Real-world evidence - what is it and what can it tell us? N Engl J Med. 2019;375:2293–7.
Miksad RA, Abernethy AP. Harnessing the power of real-world evidence (RWE): a Checklist to Ensure Regulatory-Grade Data Quality. Clin Pharmacol Ther. 2018;103:202–5. https://doi.org/10.1002/cpt.946.
Blonde L, Khunti K, Harris S, et al. Interpretation and impact of real-world clinical data for the practicing clinician. Adv Ther. 2018;35:1763–74. https://doi.org/10.1007/s12325-018-0805-y.
Dhruva SS, Ross JS, Desai NR. Real-world evidence: promise and Peril for Medical product evaluation. P & T. 2018;43:464–72.
Garrison LP, Neumann PJ, Erickson P, et al. Using real-world data for coverage and payment decisions: the ISPOR real-world data Task Force report. Value in Health. 2007;10:326–35. https://doi.org/10.1111/j.1524-4733.2007.00186.x.
Makam AN, Lanham HJ, Batchelor K, et al. Use and satisfaction with key functions of a common commercial electronic health record: a survey of primary care providers. BMC Med Inf Decis Mak. 2013;13. https://doi.org/10.1186/1472-6947-13-86.
Parsons A, McCullough C, Wang J, Shih S. Validity of electronic health record-derived quality measurement for performance monitoring. J Am Med Inform Assoc. 2012;19:604–9. https://doi.org/10.1136/amiajnl-2011-000557.
Singer A, Yakubovich S, Kroeker AL, et al. Data quality of electronic medical records in Manitoba: do problem lists accurately reflect chronic disease billing diagnoses? J Am Med Inform Assoc. 2016;23:1107–12. https://doi.org/10.1093/jamia/ocw013.
Wright A, McCoy AB, Hickman TTT, et al. Problem list completeness in electronic health records: a multi-site study and assessment of success factors. Int J Med Informatics. 2015;84:784–90. https://doi.org/10.1016/j.ijmedinf.2015.06.011.
GBD 2016 Headache Collaborators. Global, regional, and national burden of migraine and tension-type headache, 1990–2016: a systematic analysis for the global burden of Disease Study 2016. Lancet Neurol. 2018;17:954–76. https://doi.org/10.1016/s1474-4422(18)30322-3.
Headache Classification Committee of the International Headache Society (IHS). The International classification of Headache Disorders, 3rd edition. Cephalalgia. 2018;38:1–211. https://doi.org/10.1177/0333102417738202.
Groenhof TKJ, Koers LR, Blasse E, et al. Data mining information from electronic health records produced high yield and accuracy for current smoking status. J Clin Epidemiol. 2020;118:100–6. https://doi.org/10.1016/j.jclinepi.2019.11.006.
Korolainen MA, Tuominen S, Kurki S, et al. Burden of migraine in Finland: Multimorbidity and phenotypic disease networks in occupational healthcare. J Headache Pain. 2020;21:8. https://doi.org/10.1186/s10194-020-1077-x.
Under the direction of the authors, Sonia Mohinta, PhD, and Patrick Little, PhD, employees of ICON (North Wales, PA, USA), provided writing assistance for this manuscript. The authors also thank The Medicine Group, LLC (New Hope, PA, United States) for providing editorial support, which was funded by Lundbeck, LLC (Deerfield, IL, United States) and in accordance with Good Publication Practice guidelines.
This study was sponsored and funded by Lundbeck LLC (Deerfield, IL, USA), including medical writing support for the development of the manuscript. In collaboration with the academic authors, the sponsor participated in the design and conduct of the study and in the collection, management, analysis, and interpretation of the data. The preparation, review, and approval of the manuscript was undertaken by all authors and by a professional medical writer and editor funded by the sponsor. All authors and Lundbeck LLC prepared, reviewed, and approved the final version of the manuscript and made the decision to submit the manuscript for publication. The sponsor did not have the right to veto publication or to control the decision regarding to which journal the manuscript was submitted. This work was supported in part by the National Center for Advancing Translational Sciences of the NIH under Award Number R44TR002437.
Ethics approval and consent to participate
All methods were carried out in accordance with relevant guidelines and regulations and all experimental protocols were approved by an institutional ethics review committee (Ethical & Independent Review Services, Lee’s Summit, MO). Informed consent was waived by the ethics review committee (Ethical & Independent Review Services, Lee’s Summit, MO) under exemption 4 of the Common Rule due to use of deidentified data because of the retrospective nature of the study.
Consent for publication
DR is an employee and stockholder of Verantos Inc., a company that received funding from Lundbeck LLC for time spent conducting this research. RC was an employee of Lundbeck LLC at the time of manuscript development. AS is an employee of Verantos Inc., a company that received funding from Lundbeck LLC for time spent conducting this research. NAH has received honoraria from Lundbeck LLC, Eli Lilly, and Impel. TS reports an advisor/consultancy role with Lundbeck LLC, Amgen, Allergan/AbbVie, Biohaven, Eli Lilly, Impel Neuropharma, Neurolief, Nocira, Teva, Theranica, Vorso/Nesos; Speakers Bureau for Amgen, Allergan/AbbVie, Biohaven, Eli Lilly; clinical trial support from Lundbeck LLC, Aeon Pharmaceuticals, Allergan/AbbVie, Amgen, Biohaven, Boehringer Ingelheim, Charleston Labs, electroCore, Eliem Pharmaceuticals, Impel Neuropharma, Eli Lilly, Nocira, Novartis, Novo Nordisk, Pfizer, Satsuma, Teva, Theranica, Vorso/Nesos; stockholdings in United Health Group; officer and Board of Directors member (unpaid volunteer work for a non-profit organization) for the National Headache Foundation. SK is an employee of Lundbeck LLC.RK Consults and StudyMetrix Research LLC are owned by RC and TS, respectively. The companies did not participate in this study.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Below is the link to the electronic supplementary material.
About this article
Cite this article
Riskin, D., Cady, R., Shroff, A. et al. Using artificial intelligence to identify patients with migraine and associated symptoms and conditions within electronic health records. BMC Med Inform Decis Mak 23, 121 (2023). https://doi.org/10.1186/s12911-023-02190-8
- Real-world evidence
- Electronic health records
- Observational study