
Early temporal characteristics of elderly patient cognitive impairment in electronic health records



The aging population has led to an increase in cognitive impairment (CI), resulting in significant costs to patients, their families, and society. Research on a large cohort to better understand the frequency and severity of CI is urgently needed to respond to the health needs of this population. However, little is known about temporal trends of patient health functions (i.e., activity of daily living [ADL]) and how these trends are associated with the onset of CI in elderly patients. Also, the use of the rich source of clinical free text in electronic health records (EHRs) to facilitate CI research has not been well explored. The aim of this study is to characterize and better understand early signals of elderly patient CI by examining temporal trends of patient ADL and analyzing topics of patient medical conditions in clinical free text using topic models.


The study cohort consists of physician-diagnosed CI patients (n = 1,435) and cognitively unimpaired (CU) patients (n = 1,435) matched by age and sex, selected from patients 65 years of age or older at the time of enrollment in the Mayo Clinic Biobank. A corpus analysis was performed to examine the basic statistics of event types and practice settings where the physician first diagnosed CI. We analyzed the distribution of ADL in three different age groups over time before the development of CI. Furthermore, we applied three different topic modeling approaches on clinical free text to examine how patients’ medical conditions change over time when they were close to CI diagnosis.


The trajectories of ADL deterioration became steeper in CI patients than in CU patients approximately 1 to 1.5 year(s) before the actual physician diagnosis of CI. Topic modeling showed that the topic terms were mostly correlated and captured the underlying semantics relevant to CI as patients approached CI diagnosis.


There exist notable differences in temporal trends of basic and instrumental ADL between CI and CU patients. The trajectories of certain individual ADL, such as bathing and responsibility for own medication, were closely associated with CI development. The topic terms obtained by topic modeling methods from clinical free text have the potential to show how CI patients’ conditions evolve and to reveal overlooked conditions as they approach CI diagnosis.


Medical achievements have produced a population whose lifespan has increased by 30 years since the beginning of the twentieth century [1]. In 2012, there were 40.7 million people aged 65 and over in the United States (13.2% of the total population), with 38.7% reported to have one or more disabilities [2]. The aging population has also led to an increase in persons living with CI, more than 17 million people in the United States [3], costing patients, their families, and society an estimated $18 billion annually in lost income and direct cost of care [4]. Herein, we define CI as either mild cognitive impairment (MCI) or dementia. In 2015, Alzheimer’s Disease International estimated that dementias affected 46.8 million individuals worldwide and projected that the number would nearly triple by 2050, reaching 131.5 million people [5]. In this context, MCI is of paramount importance because it is a transitional zone between normal aging and dementia. One study indicated that clinicians were not aware of CI in more than 40% of their patients [6]. Failure to diagnose cognitive complaints delays appropriate care plans for underlying diseases and comorbid conditions, and may cause safety issues for patients and others [7, 8]. In many cases, CI worsens over time [8,9,10]. Thus, early diagnosis of CI can be of utmost importance and may reduce the later burden on medical and social care.

The impact of CI on ADL has been used as a criterion to differentiate MCI and dementia [11]. ADL is often divided into basic ADL (b-ADL), which includes activities such as personal hygiene, clothing, feeding and toileting [12] and instrumental ADL (i-ADL), which is commonly referred to as independent living abilities such as household activities, handling money, shopping, and transportation [13,14,15]. The i-ADL has a higher demand for cognitive function than the b-ADL and is important for living an independent life in society [16]. ADL is highly dependent on cognitive function and behavior [17]. Therefore, there should be assessments that are capable of detecting changes in ADL as soon as changes in cognition and behavior are detected [17].

In this study, we first examined basic statistics of EHR corpus relevant to CI diagnosis. Temporal trends of ADL in elderly patients (age 65 or up) mined from EHRs before they develop CI were compared between CI and CU patients. We used both structured (current visit information provided by patients) and unstructured data (clinical notes). Furthermore, we applied machine learning techniques (i.e., three topic modeling methods) on clinical notes to extract meaningful semantics (i.e., topics and terms) residing in clinical free text to examine their potential association with future CI development.

Different studies have used machine learning algorithms to differentiate between cognitively normal and MCI individuals [18, 19], to predict conversion from MCI to Alzheimer’s disease (AD) [20], and to predict the time to this conversion [19]. Researchers [21] developed a two-layer model in which the first layer is a screening test that categorizes individuals as normal or abnormal, and the second layer is a closer examination that classifies them as MCI or dementia. They compared the results of various machine learning approaches; support vector machines, multi-layer perceptrons, and logistic regression showed high performance. Conversion from MCI to AD has also been studied using a deep learning model with MRI, neuropsychological, and demographic data [22].

Another study [23] tried to predict MCI from spontaneous spoken utterances. Classifying cognitive profiles using machine learning with fMRI data in addition to cognitive data was explored in [24]; in that work, fMRI data are used only to train the classifier, and classification of new data is based solely on cognitive data. Another study [25] focused on the early diagnosis of AD with deep learning, utilizing a sparse auto-encoder. The authors used neuroimages obtained from a neuroimaging initiative database to identify the regions of brain images that are sensitive to AD progression. These previous studies tried to improve their results by incorporating fMRI data into their models. Although this may have a positive impact on the results, not all patients have fMRI data, so such approaches may not be as broadly applicable as those using routine EHR data in the health care population. Also, they did not try to identify new risk factors associated with CI patients in EHRs but relied only on existing known medical conditions and fMRI data to predict CI.

Some studies have focused on predicting progression from MCI to dementia using neuropsychological data. Researchers [26] examined the applicability of neuropsychological test results for predicting dementia using a machine learning algorithm. They used a feature selection ensemble approach to choose features available in the neuropsychological tests as predictors of developing AD dementia. The use of neuropsychological tests to predict the time to conversion from MCI was also investigated in [27]. In that study, MCI patients were grouped according to whether they progressed to dementia (converter MCI) or remained MCI (stable MCI) during a specified time window. A prognostic model was then developed to predict conversion as early as 5 years before the development of dementia.

Unlike the previous studies, we applied a machine learning approach (i.e., topic modeling) to examine topics and terms in EHR free text that can potentially be used for early detection of CI. A few studies have focused on the early diagnosis of CI [28]; however, these studies followed the conventional approach of assessing patients by i-ADL and b-ADL rather than utilizing machine learning algorithms and EHR free text.


The basic EHR corpus statistics (i.e., distributions of event types and practice settings of the first CI diagnosis, numbers of clinical notes between CI and CU patients) were examined. Temporal trends of patient ADL were compared and topics in the clinical free text were analyzed over time using three machine learning models between physician-diagnosed CI and CU patient groups.


The study cohort was selected from patients 65 years of age or older at the time of enrollment in the Mayo Clinic Biobank (n = 22,772), where we identified physician-diagnosed CI patients (n = 1,435; male 55%) and CU patients (n = 1,435) matched by age (+/− 1 year) and sex. The physician-diagnosed CI patients were determined based on diagnosis (i.e., dementia, cognitive impairment, cognitive deficit, cognitive decline, mild cognitive impairment) under the diagnosis section in clinical notes [29].
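The text does not describe how the matched CU patients were chosen beyond the criteria of same sex and age within one year. As an illustration only, a minimal sketch of one possible approach (greedy 1:1 matching without replacement) is given below; the `Patient` record, the `match_controls` function, and the toy data are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    """Hypothetical minimal patient record used only for this illustration."""
    pid: str
    age: int
    sex: str


def match_controls(cases, candidates, max_age_gap=1):
    """Greedy 1:1 matching without replacement (an assumed strategy).

    Each case receives the unused candidate of the same sex whose age is
    closest, provided the age difference is at most max_age_gap years.
    """
    unused = list(candidates)
    pairs = {}
    for case in cases:
        best = None
        for cand in unused:
            gap = abs(cand.age - case.age)
            if cand.sex == case.sex and gap <= max_age_gap:
                if best is None or gap < abs(best.age - case.age):
                    best = cand
        if best is not None:
            pairs[case.pid] = best.pid
            unused.remove(best)  # a control is used at most once
    return pairs


# Toy data: two CI cases and a pool of four CU candidates.
cases = [Patient("ci1", 70, "M"), Patient("ci2", 82, "F")]
pool = [
    Patient("cu1", 71, "M"),
    Patient("cu2", 70, "M"),
    Patient("cu3", 85, "F"),
    Patient("cu4", 81, "F"),
]
matched = match_controls(cases, pool)
```

Greedy matching is simple but order-dependent; an optimal assignment method could be substituted without changing the matching criteria.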

Corpus analysis

The basic EHR corpus statistics relevant to CI diagnosis (i.e., the distributions of event types and practice settings of the first CI diagnosis) were examined and also the number of clinical notes over time between CI and CU patients was compared.

Analysis of activity of daily living

The ADL was collected from two sources: 1) the current visit information, which is provided and updated by the patients every 6 months when they visit the Mayo Clinic, 2) certain sections in clinical notes (i.e., instructions for continuing care, ongoing care orders, system review). The current visit information includes questionnaires to assess the ability of patients to accomplish ADL (binary assessment assessing the difficulty of ADL: yes or no) in a structured format. The clinical notes were processed by the MedTaggerIE module in MedTagger [30, 31], which is the open-source pipeline developed by Mayo Clinic for pattern-based information extraction with a capability of assertion detection (i.e., negated, possible, hypothetical, associated with a patient) and normalization, to extract ADL related concepts. These concepts were automatically mapped to the corresponding predefined ADL categories through the MedTaggerIE implementation (i.e., rule-based normalization process). We only included non-negated ADL related concepts.

Once we obtained ADL concepts, they were mapped to items in Katz’s index (b-ADL) [12] and the Lawton scale (i-ADL) [13,14,15], which are the most commonly used tools for assessing ADL. The items of ADL used in this study for each ADL category are: 1) b-ADL: bathing, dressing, transferring, toileting, and feeding; 2) i-ADL: using transportation, shopping, preparing food, housekeeping, responsibility for own medications, and handling finances. These items can be mapped to the International Classification of Functioning, Disability, and Health (ICF) [32], allowing for broad information exchange. The temporal trends of b-ADL and i-ADL were compared between CI and CU patients every 6 months for the 5 years before the first physician-diagnosed CI (for CI patients) and before the latest visit (for CU patients).
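The comparison described above can be sketched as follows: each deteriorated-ADL report is assigned to a b-ADL or i-ADL category and to a 6-month window counted back from the index date, and the ratio of patients with at least one report is computed per window. This is a simplified illustration under stated assumptions; the record layout `(patient_id, months_before_index, adl_item)`, the function names, and the toy data are hypothetical.

```python
from collections import defaultdict

# ADL items as listed in the text (Katz index and Lawton scale).
B_ADL = {"bathing", "dressing", "transferring", "toileting", "feeding"}
I_ADL = {
    "using transportation", "shopping", "preparing food", "housekeeping",
    "responsibility for own medications", "handling finances",
}


def adl_category(item):
    """Map an ADL item to its category."""
    if item in B_ADL:
        return "b-ADL"
    if item in I_ADL:
        return "i-ADL"
    raise ValueError(f"unknown ADL item: {item}")


def deteriorated_ratio(records, n_patients, window_months=6, horizon_months=60):
    """Ratio of patients with a deteriorated ADL per category and window.

    records: iterable of (patient_id, months_before_index, adl_item),
    one entry per deteriorated-ADL report (assumed layout).
    Window 0 covers the 6 months immediately before the index date.
    """
    patients = defaultdict(set)
    for pid, months_before, item in records:
        if 0 <= months_before < horizon_months:
            window = int(months_before // window_months)
            patients[(adl_category(item), window)].add(pid)
    return {key: len(pids) / n_patients for key, pids in patients.items()}


# Toy example with four patients; the last record falls outside the 5 years.
records = [
    ("p1", 2, "bathing"),
    ("p2", 4, "dressing"),
    ("p1", 5, "shopping"),
    ("p1", 8, "bathing"),
    ("p3", 70, "feeding"),
]
ratios = deteriorated_ratio(records, n_patients=4)
```

Counting each patient at most once per category and window keeps the result a ratio of patients rather than a count of mentions.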

Analysis of topics in clinical notes

The topics in clinical notes were investigated: 1) how topic terms evolve in CI patients each year for the past 5 years (experiment 1), and 2) how topic terms differ between CI and CU patients over the 5-year period before the development of CI (experiment 2). This step-wise time frame allows us to observe how the topics change over time, motivated by the expert recommendation that people older than 65 years should visit doctors every 6 months to determine whether symptoms are staying the same, improving, or growing worse [17]. We examined the topics in 1) entire clinical notes, 2) individual sections (i.e., history of present illness, diagnosis, current medication) independently, and 3) the set of sections that most likely include medical concepts of interest (i.e., chief complaint, history of present illness, system review, past medical history, physical examination, impression/report/plan, and diagnosis).

For preprocessing, we kept the 2,000 most frequent words as the vocabulary after removing stop words and stemming. We applied three different machine learning models: two conventional topic modeling methods (LDA and TKM) and one deep learning approach (KATE), as follows. The number of topics was determined based on the self-regulatory capability embedded in the TKM model.
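The preprocessing step can be illustrated with a minimal sketch: tokens are lowercased, stop words are removed, the remaining tokens are stemmed, and the most frequent stems are kept as the vocabulary. The stop-word list and the suffix-stripping stemmer below are deliberately tiny stand-ins chosen for this example; the text does not state which stop-word list or stemmer was actually used.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; not the list used in the study).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was",
              "with", "for", "on", "no"}


def naive_stem(token):
    """Very rough suffix stripping, a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def build_vocabulary(documents, max_size=2000):
    """Keep the max_size most frequent stems across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return [term for term, _ in counts.most_common(max_size)]


# Toy notes; max_size is reduced to 3 so the cut-off is visible.
notes = [
    "Patient reports fatigue and memory problems.",
    "Memory complaints; sleeping poorly with fatigue.",
    "No memory issues reported.",
]
vocab = build_vocabulary(notes, max_size=3)
```

With `max_size=2000`, as in the text, the same function would yield the vocabulary passed to the topic models.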

Latent Dirichlet allocation (LDA)

LDA is a generative probabilistic model in which each document is viewed as a mixture of various topics and each topic as a distribution over words [33]. We set the number of topics to 20, with 10 words per topic. Other hyperparameters were set as in the code implementation of [34].
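To make the generative view concrete, the following minimal sketch samples one synthetic document: a topic mixture is drawn from a Dirichlet distribution, and each word is produced by first drawing a topic from that mixture and then a word from that topic's word distribution. The two toy topics, the parameter values, and the function names are assumptions for illustration; this is not the inference code used in the study.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word, alpha, length, rng):
    """Sample one document under the LDA generative process.

    topic_word: list of {word: probability} dicts, one per topic.
    alpha: Dirichlet prior over the document's topic mixture.
    """
    theta = sample_dirichlet(alpha, rng)  # document-specific topic mixture
    topic_ids = list(range(len(topic_word)))
    words = []
    for _ in range(length):
        z = rng.choices(topic_ids, weights=theta)[0]  # topic for this word
        vocab = list(topic_word[z])
        weights = [topic_word[z][w] for w in vocab]
        words.append(rng.choices(vocab, weights=weights)[0])
    return theta, words


# Two toy topics (hypothetical word distributions).
topics = [
    {"memory": 0.5, "confusion": 0.3, "fatigue": 0.2},
    {"glucose": 0.6, "insulin": 0.3, "fatigue": 0.1},
]
rng = random.Random(0)
theta, doc = generate_document(topics, alpha=[0.5, 0.5], length=8, rng=rng)
```

Fitting LDA reverses this process: given only the observed words, it estimates the per-document mixtures and per-topic word distributions.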

Topic keyword model (TKM)

This method addresses a shortcoming of the LDA approach (i.e., ignoring the order of words). In TKM, the score of each word in a topic reflects how common the word is within that topic and how common it is in other topics [35]. Another advantage of this method is that redundant topics are removed automatically. We used the hyperparameters described in [35].

K competitive autoencoder (KATE)

An autoencoder is a neural network that can automatically learn data representations by reconstructing its input at the output layer. Many variants of autoencoders have been proposed, mainly for image data. KATE, however, was designed to overcome the weakness of traditional autoencoders, which are not well suited to textual data [34]. The number of topics in this experiment was set to 20, with 10 words per topic. Other deep learning parameters were set as discussed in the original paper [34].


We first examined basic EHR corpus statistics of the cohort. Then, we analyzed patterns of temporal trends of 1) b-ADL and i-ADL and 2) individual ADL between CI and CU patients before patients develop CI. The outcomes of three topic modeling methods (i.e., terms and topics mined from clinical notes) were analyzed and compared between the two patient groups over time, both qualitatively and quantitatively, in order to better understand patient medical conditions that may contribute more to CI development.

Corpus statistics

Figure 1 shows major event types (i.e., note types) and practice settings along with their occurrences in which a physician first diagnosed CI. The consultation was the most dominant event to diagnose CI (28%), followed by subsequent visit (19%), limited exam (18%), multi-system evaluation (11%), and supervisory (6%), which cover more than 80% of total events of CI diagnosis. For practice setting, neurology (31%) was the most dominant, followed by primary care (26%), general internal medicine (12%), family medicine (6%), and brain (3%).

Fig. 1

Distribution of the first CI diagnosis (CON: consult, SV: subsequent visit, LE: limited exam, ME: multi-system evaluation, SUP: supervisory, SE: specialty evaluation, ADM: admission; GIM: general internal medicine)

Table 1 contains the statistics of clinical notes for the 5 years before CI diagnosis for CI patients and before the latest visit date for CU patients. CI patients consistently had more clinical notes than CU patients, and the difference was largest in the first year before CI diagnosis.

Table 1 Average number of clinical notes for CI and CU patients (SD in parenthesis)

ADL distribution

Figure 2 shows temporal distributions of the deteriorated b-ADL and i-ADL of CI and CU patients in three age groups (65–74, 75–84, and 85 & up). Overall, CI patients had worse b-ADL and i-ADL (i.e., a higher ratio of deteriorated ADL) than CU patients in all age groups, and this difference became more pronounced closer to the physician-diagnosed CI. The deteriorated b-ADL and i-ADL did not differ much between the 65–74 and 75–84 age groups for either CI or CU patients. Interestingly, CU patients’ b-ADL were overall worse than their i-ADL, whereas the opposite held for CI patients: their i-ADL became worse than their b-ADL over time, mainly around 1.5 to 1 year(s) before the physician-diagnosed CI.

Fig. 2

Distribution of b-ADL and i-ADL for CI and CU patient groups (x-axis is year(s) before the 1st physician-diagnosed CI for CI patients and the latest visit for CU patients; y-axis is the ratio of patients who have a deteriorated ADL)

We have also examined individual ADL trajectories for the entire patient cohort between CI and CU patients. Overall, CI patients had more deteriorated ADL than CU patients over time for all ADL categories. The most deteriorated ADL in 6 months prior was transferring (17% for CI and 14% for CU patients) in b-ADL and housekeeping (14% for CI and 10% for CU patients) in i-ADL. The difference between the two groups is relatively small for housekeeping and transferring, but large for bathing and responsibility for own medication (Fig. 3).

Fig. 3

ADL distributions for CU and CI patient groups (x-axis is year(s) before the 1st physician-diagnosed CI for CI patients and the latest clinical visit for CU patients; y-axis is the ratio of patients who have a deteriorated ADL)

Topic modeling

Qualitative analysis

We examined the topic terms extracted by three different models (i.e., LDA, TKM, and KATE) from clinical notes to compare hidden topics in CI patients before they develop CI. This approach may reveal potential patient medical conditions that lead to CI. Tables 2, 3 and 4 include topic terms generated by different topic models in different portions of clinical notes for 6 months before physician-diagnosed CI. The bold font in the topic denotes the correlated words in a given topic relevant to CI.

Table 2 Topic words by TKM (6 months before CI diagnosis)
Table 3 Topic words by KATE (6 months before CI diagnosis)
Table 4 Topic words by LDA (6 months before CI diagnosis)

The words in the tables are stemmed. We included one representative cluster of topics for each section. As can be seen in the tables, the topics are distinguishable from each other, capturing a meaningful representation of the text data. For example, in Table 2, all sections show some symptoms related to “fatigue,” which may be a potential risk of dysfunction [36]; the topic in the set of sections is relevant to “sleep issue,” which can be observed in individuals suffering from cognitive disorders [36, 37]. Among the topic words in the history of present illness section, we can observe glucose, diabetes, insulin, and hydrochlorothiazide, which are related to diabetes, a potential risk factor for cognitive decline [38]. For the topic in the medication section, we observed medications to control high blood sugar [38]. The topic in the diagnosis section includes terms related to cancer [39,40,41,42].

In Table 3, the set of sections and the history of present illness section include hyperlipidemia, which can be considered a risk factor for CI [43], as well as coronary artery disease and hypertension, which are relevant to cognitive decline [44, 45]. Table 4 shows that LDA yields outcomes similar to those of TKM and KATE. Words such as edema, distress, memory, hypertension, coronary, urinary, and hyperlipidemia have been discussed as potential risk factors for cognitive dysfunction [44,45,46,47]. Carcinoma, melanoma, cancer, and squamous in the last row are terms related to cancer [39,40,41,42].

Quantitative analysis

We quantified how the topic terms learned by the topic models are: 1) changed in CI patients when they approach physician-diagnosed CI, comparing year by year for the past 5 years (experiment 1), and 2) distinct between CI and CU patients for the entire past 5 years (experiment 2). We utilized aggregated term frequency in the topic terms over time.

For the first approach (experiment 1), the differences in topic term frequencies between two consecutive years prior to CI diagnosis were computed (starting from 1 year prior to the CI diagnosis), and this was repeated for each year over the whole 5-year period. We used 400 topic terms for each year. This may allow us to identify potential topic terms associated with CI development, because topic terms relevant to CI may become more frequent as the CI diagnosis date approaches. For the second approach (experiment 2), we used the same aggregated term-frequency differences, but between CI and CU patients over the entire 5-year period. In this way, the topic terms common to CI and CU patients can be filtered out, and the remaining terms are likely the ones associated with CI. We used the entire 5 years because we did not observe any significant differences when comparing year by year.
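One plausible reading of the aggregated term-frequency difference is sketched below: topic terms are counted in two collections, and only the terms that are more frequent in the first collection are kept, together with the size of the gain. The function name and the toy term lists are hypothetical, and the exact aggregation used in the study may differ.

```python
from collections import Counter


def frequency_gain(target_terms, reference_terms):
    """Terms that are more frequent in target_terms than in reference_terms.

    Returns {term: positive frequency difference}, largest gain first.
    Experiment 1: target = year closer to diagnosis, reference = year before it.
    Experiment 2: target = CI patients, reference = CU patients (whole 5 years).
    """
    target, reference = Counter(target_terms), Counter(reference_terms)
    gains = {t: target[t] - reference[t]
             for t in target if target[t] > reference[t]}
    return dict(sorted(gains.items(), key=lambda kv: (-kv[1], kv[0])))


# Experiment 1 style: topic terms 1 year vs. 2 years before diagnosis (toy data).
year1 = ["memory", "memory", "fatigue", "sleep", "pain"]
year2 = ["memory", "pain", "pain"]
exp1 = frequency_gain(year1, year2)

# Experiment 2 style: CI vs. CU topic terms over the whole period (toy data).
ci_terms = ["apnea", "apnea", "apnea", "pain"]
cu_terms = ["apnea", "pain"]
exp2 = frequency_gain(ci_terms, cu_terms)
```

In the toy data, "pain" drops out of both results because it is no more frequent in the target collection, which mirrors how terms shared by both groups are filtered out.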

Figure 4 shows the high-level concept of our approach using aggregated term differences. The results of these approaches are visualized in Figs. 5, 6, 7, 8, 9 and 10. Larger words appear more frequently in the results of topic modeling on clinical notes compared to the previous year (experiment 1) or over the whole 5 years (experiment 2); the corresponding raw data for Figs. 5, 6, 7, 8, 9 and 10 are provided in the tables in the Appendix. The results were compared with recent publications to verify whether this approach generates meaningful outcomes relevant to CI.

Fig. 4

Aggregated term frequencies. The first table shows the term frequencies one year before CI development, and the middle table shows the frequencies two years before CI development. The last table shows the result, i.e., which terms were repeated most

Fig. 5

Topic terms for CI patients - TKM (Experiment 1)

Fig. 6

Topic terms for CI patients - KATE (Experiment 1)

Fig. 7

Topic terms for CI patients - LDA (Experiment 1)

Fig. 8

Topic terms in the TKM model (Experiment 2)

Fig. 9

Topic terms in the KATE model (Experiment 2)

Fig. 10

Topic terms in the LDA model (Experiment 2)

The disease term “lymphoma” was seen in multiple results (Figs. 5a, 6b, c, 7a, b, c, 8b, 9b, c and 10a, c). Hodgkin lymphoma patients have been reported to complain about cognitive deterioration and fatigue, and cognitive decline was found to be more severe and frequent in Hodgkin lymphoma patients than in the healthy population [48]. According to a recent study, patients with “nocturnal hypoxia” had poorer memory retention than healthy individuals [49]. Relatedly, “oximetry” (Fig. 5a) refers to the measurement of oxygen saturation in the blood, as used for patients with hypoxemia.

In another study [50], “global cerebral edema” was shown to be an important risk factor for cognitive dysfunction; it appears more frequently in Figs. 5a, 6a, and 10b, c. Researchers have studied the association between cancer and cognitive decline in older ages [39,40,41,42] and concluded that cancer therapy could negatively impact cognition in some patients. In this regard, the words “metastasi,” “squamous,” “chemotherapy,” “oxaliplatin,” and “carcinoma” can be seen in Figs. 5c, 7b, c, 8a, b, 9b, and 10a, c. Tinnitus patients have been reported to be at higher risk of cognitive deficit [51], as reflected in Fig. 5b, c. The word “bevacizumab” in Fig. 8a is a cancer medicine that interferes with the growth of cancer cells in the body; it is used to treat certain types of brain cancer or kidney cancer. The relation between urinary disease and CI has been investigated in several studies [46, 47] (Figs. 6c and 8b). Words such as “depression,” “confusion,” “memory,” and “pressure,” which are already known signs of CI, can be seen in Figs. 6b, 7a, b, c, 9b, c, and 10b, c.

A couple of studies explored the relationship between CI in late life and hyperlipidemia, hypertension, and coronary (Figs. 6a, 9a, 8c, and 10c). Heavy snoring and sleep apnea (Figs. 6a, b, c and 8a) have been investigated extensively and show a strong link to earlier cognitive decline [37]. The apnea/hypopnea index, which is usually used to indicate the severity of sleep apnea in patients, is another extracted topic term; it was repeated 8 times more often in the CI population than in the CU population. CPAP, which is used to treat sleep-related breathing disorders including sleep apnea, also appears (Fig. 8c).

Diabetes has been identified as a potential risk factor for cognitive dysfunction [38]; related topic terms such as diabetes, glucose, and sugar [44, 45, 52] can be seen in Figs. 6c, 9c, and 10a. In [53], researchers showed that memory impairment has a particular association with the presence of left ventricular hypertrophy (Figs. 9b and 10b). Atrial fibrillation has been studied in [54] as a risk factor for cognitive decline (Figs. 6c, 8c, and 9c). The relation between “osteomyel” patients and CI is discussed in [55], as illustrated in Fig. 8b.

In [56], researchers reported that cognitive function is disrupted after ischemia (Fig. 8c). Figures 8b and 10b, c include the word “lung.” Some studies, including [57], have discussed lung diseases as a determinant of cognitive decline.

Apart from the topics and words discussed here, some words had a high frequency in the years close to CI diagnosis, so they appear bold and large in the figures. Some of them, for example, caregiver, care, exercise, and neuropathy, may be indirectly relevant to CI. However, there are also common, boilerplate-like words such as problem, pain, sudden, disease, and status, which can appear for any disease and need to be filtered out.


It is important to identify early signs of CI so that clinicians can plan accordingly and take appropriate actions, relieving potential cost and burden. In this study, we examined basic EHR corpus statistics relevant to CI patients and analyzed temporal trends of patient ADL and topics in clinical notes between CI and CU patient groups in order to characterize and better understand elderly patients’ medical conditions before they develop CI.

Consultation was the most common event type, and neurology was the most dominant practice setting, in which physicians first diagnosed CI. The consistently higher number of clinical notes for CI patients than for CU patients suggests that CI patients likely visit hospitals or clinics more often than CU patients. Temporal trends of individual ADL and the groups of ADL (i.e., b-ADL and i-ADL) were examined over the 5 years before the first physician-diagnosed CI for CI patients and before the latest visit for CU patients. It was observed that the trajectories of ADL deterioration became steeper in CI patients than in CU patients approximately 1 to 1.5 year(s) before the actual physician diagnosis of CI. More notably, the deterioration of i-ADL was worse than that of b-ADL in CI patients during this period, which was not the case in CU patients. Considering the significant delay in CI diagnosis and the missed opportunities for appropriate care planning in current practice [4, 5], this observation may be beneficial for promoting early detection of CI. The trajectories of bathing (b-ADL) and responsibility for own medication (i-ADL) deteriorated much more rapidly in CI patients than in CU patients over time. These measures might also serve as potential surrogate symptoms to facilitate early CI diagnosis.

The results of this study suggest that topic modeling can help discover meaningful and hidden topics and terms in clinical notes. The results were promising, as discussed in the qualitative and quantitative analyses. We observed that the words in each topic were mostly correlated and captured the underlying semantics. The models were able to extract words relevant to CI, such as hypertension, depress, and memory, which are potential indications associated with CI. We were also able to identify other potential factors that may be relevant to CI according to recent publications.

Overall, the more recent models, TKM and KATE, were better than LDA at capturing semantically meaningful representations of the data. Further, the KATE model generated more words related to CI, falling into the memory, depression, hypertension, dizziness, and confusion categories, than the TKM model. We validated the results of the topic modeling based on aggregated term frequencies. The results were visualized to show the hidden potential topics that may contribute to developing CI. These results were validated by recent publications and showed promising outcomes. However, some common topic words, which are not relevant to CI and may appear for any disease, were also captured. Further post-processing would be required to filter them out.

Generally, CI is diagnosed by health professionals who ask patients questions to assess memory, concentration, and understanding. However, such assessment is not routinely performed in many healthcare institutions, causing delays in CI diagnosis. Considering this, our use of EHR free text to analyze early signals of CI could be a potential alternative to automate or support CI assessment and thus facilitate routine practice to detect CI in advance.

The limitations of this study include the use of physician-diagnosed CI, which does not differentiate the severity of CI, instead of a full assessment or test, which was unavailable. However, our study is still useful since its focus is to explore the use of EHR documentation to promote early detection of CI, considering the significant delay in CI diagnosis by clinicians in current health care practice. Another limitation is a potentially imbalanced distribution of clinical notes for certain illnesses (e.g., cancer patients are seen more often than others and have more clinical notes). This may affect the results of topic modeling; however, we examined a broad range of topics and demonstrated good potential applicability.


There exist notable differences in temporal trends of b-ADL and i-ADL between CI and CU patients approximately 1 to 1.5 year(s) before the actual physician-diagnosed CI, namely a steeper slope of overall ADL deterioration and worse i-ADL than b-ADL in CI patients during this period. The trajectories of certain individual ADL (bathing and responsibility for own medication) were closely associated with CI development. The topics and terms over time obtained by topic modeling methods from clinical free text have the potential to show how CI patients’ conditions evolve and to reveal overlooked conditions as they approach CI diagnosis. These observations may promote early detection of CI and thus expedite appropriate care of underlying diseases and comorbid conditions. In the future, we plan to use neuroimaging and assessment data to obtain a more granular classification of cognitive function and to develop a prediction model that leverages our observations to detect patients at high risk of different stages of CI and identify associated longitudinal risk factors.



AD: Alzheimer’s disease

ADL: Activity of daily living

CI: Cognitive impairment

CU: Cognitively unimpaired

EHR: Electronic health record

fMRI: Functional magnetic resonance imaging

ICF: International Classification of Functioning, Disability, and Health

KATE: K competitive autoencoder for text

LDA: Latent Dirichlet allocation

MCI: Mild cognitive impairment

TKM: Topic keyword model


  1. Centers for Disease Control and Prevention (CDC). Ten great public health achievements--United States, 1900-1999. MMWR Morb Mortal Wkly Rep. 1999;48(12):241.

  2. He W, Larsen LJ. Older americans with a disability, 2008–2012. Washington DC: US Census Bureau; 2014.


  3. Family Caregiver Alliance. Incidence and prevalence of the major causes of brain impairment. San Francisco; 2013. Available from: Accessed 22 Feb 2019.

  4. Langa KM, Chernew ME, Kabeto MU, Herzog AR, Ofstedal MB, Willis RJ, et al. National estimates of the quantity and cost of informal caregiving for the elderly with dementia. J Gen Intern Med. 2001;16(11):770–8.


  5. Appendix O. World alzheimer report 2015 the global impact of dementia; 2015.


  6. Chodosh J, Petitti DB, Elliott M, Hays RD, Crooks VC, Reuben DB, et al. Physician recognition of cognitive impairment: evaluating the need for improvement. J Am Geriatr Soc. 2004;52(7):1051–9.


  7. Bradford A, Kunik MEM, Schulz P, Williams SPS, Singh H. Missed and delayed diagnosis of dementia in primary care: prevalence and contributing factors. Alzheimer Dis Assoc Disord. 2009;23(4):306–14.


  8. McPherson S, Schoephoerster G. Screening for dementia in a primary care practice. Minn Med. 2012;95(1):36–40.


  9. Galvin JE, Sadowsky CH. Practical Guidelines for the Recognition and Diagnosis of Dementia. J Am Board Fam Med. 2012;25(3):367–82.


  10. Boustani M, Peterson B, Hanson L, Harris R, Lohr KN. Clinical guidelines. Screening for dementia in primary care: a summary of the evidence for the U.S. preventive services task force. Ann Intern Med. 2003;138(11):927–37.


  11. Petersen RC, Smith GE, Waring SC, Ivnik RJ, Tangalos EG, Kokmen E. Mild cognitive impairment: clinical characterization and outcome. Arch Neurol. 1999;56(3):303–8.


  12. Katz S, Ford AB, Moskowitz RW, Jackson BA, Jaffe MW. Studies of illness in the aged: The index of ADL: a standardized measure of biological and psychosocial function. JAMA. 1963;185(12):914–9.


  13. Lawton MP, Brody EM. Assessment of older people: self-maintaining and instrumental activities of daily living. Gerontologist. 1969;9(3):179–86.


  14. Hartigan I. A comparative review of the Katz ADL and the barthel index in assessing the activities of daily living of older people. Int J Older People Nursing. 2007;2(3):204–12.


  15. Yang M, Ding X, Dong B. The measurement of disability in the elderly: a systematic review of self-reported questionnaires. J Am Med Dir Assoc. 2014;15(2):150.e1–9.


  16. Mlinac ME, Feng MC. Assessment of activities of daily living, self-care, and independence. Arch Clin Neuropsychol. 2016;31(6):506–16.


  17. Marshall GA, Amariglio RE, Sperling RA, Rentz DM. Activities of daily living: where do they fit in the diagnosis of alzheimer’s disease? Neurodegener Dis Manag. 2012;2(5):483–91.


  18. De Marco M, Beltrachini L, Biancardi A, Frangi AF, Venneri A. Machine-learning support to individual diagnosis of mild cognitive impairment using multimodal MRI and cognitive assessments. Alzheimer Dis Assoc Disord. 2017;31(4):1–9.


  19. Thung KH, Yap PT, Adeli E, Lee SW, Shen D. Conversion and time-to-conversion predictions of mild cognitive impairment using low-rank affinity pursuit denoising and matrix completion. Med Image Anal. 2018;45:68–82.


  20. Hojjati SH, Ebrahimzadeh A, Khazaee A, Babajani-Feremi A. Predicting conversion from MCI to AD using resting-state fMRI, graph theoretical approach and SVM. J Neurosci Methods. 2017;282:69–80.


  21. So A, Hooshyar D, Park K, Lim H. Early Diagnosis of dementia from clinical data by machine learning techniques. Appl Sci. 2017;7(7):651.


  22. Spasov S, Passamonti L, Duggento A, Lio P, Toschi N, Alzheimer’s Disease Neuroimaging Initiative. A parameter-efficient deep learning approach to predict conversion from mild cognitive impairment to Alzheimer’s disease. Neuroimage. 2019;189:276–87.


  23. Asgari M, Kaye J, Dodge H. Predicting mild cognitive impairment from spontaneous spoken utterances. Alzheimers Dement Transl Res Clin Interv. 2017;3(2):219–28.


  24. Alahmadi HH, Shen Y, Fouad S, Luft CDB, Bentham P, Kourtzi Z, et al. Classifying Cognitive Profiles Using Machine Learning with Privileged Information in Mild Cognitive Impairment. Front Comput Neurosci. 2016;10(November):1–17.


  25. Pushkar B, Paul M. Early diagnosis of Alzheimer’s disease: a multi-class deep learning framework with modified k-sparse autoencoder classification. In: Int Conf Image Vis Comput New Zeal; 2017. p. 1–6.


  26. Pereira T, Ferreira FL, Cardoso S, Silva D, de Mendonça A, Guerreiro M, et al. Neuropsychological predictors of conversion from mild cognitive impairment to Alzheimer’s disease: a feature selection ensemble combining stability and predictability. BMC Med Inform Decis Mak. 2018;18(1):137.


  27. Pereira T, Lemos L, Cardoso S, Silva D, Rodrigues A, Santana I, et al. Predicting progression of mild cognitive impairment to dementia using neuropsychological data: a supervised learning approach using time windows. BMC Med Inform Decis Mak. 2017;17(1):1–15.


  28. Kato S, Endo H, Homma A, Sakuma T, Watanabe K. Early detection of cognitive impairment in the elderly based on Bayesian mining using speech prosody and cerebral blood flow activation. Conf Proc Annu Int Conf IEEE Eng Med Biol Soc. 2013;2013:5813–6.


  29. Amra S, O’Horo JC, Singh TD, Wilson GA, Kashyap R, Petersen R, et al. Derivation and validation of the automated search algorithms to identify cognitive impairment and dementia in electronic health records. J Crit Care. 2017;37:202–5.


  30. Liu H, Bielinski SJ, Sohn S, Murphy S, Wagholikar KB, Jonnalagadda SR, et al. An Information Extraction Framework for Cohort Identification Using Electronic Health Records. AMIA Summits Transl Sci Proc. 2013;2013:149–53.


  31. Torii M, Wagholikar K, Liu H. Using machine learning for concept extraction on clinical documents from multiple data sources. J Am Med Inform Assoc. 2011;18(5):580–7.


  32. International Classification of Functioning, Disability and Health (ICF). Available from: Accessed 22 Feb 2019.

  33. Blei DM, Ng AY, Jordan MI. Latent Dirichlet Allocation. J Mach Learn Res. 2003;3:993–1022.


  34. Chen Y, Zaki MJ. KATE: K-competitive autoencoder for text. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2017 Aug 13. ACM; 2017. p. 85–94.

  35. Schneider J, Vlachos M. Topic modeling based on keywords and context. In: Proceedings of the 2018 SIAM International Conference on Data Mining; 2018 May 7. Society for Industrial and Applied Mathematics; 2018. p. 369–77.

  36. Neu D, Kajosch H, Peigneux P, Verbanck P, Linkowski P, Le Bon O. Cognitive impairment in fatigue and sleepiness associated conditions. Psychiatry Res. 2011;189(1):128–34.


  37. Spira AP, Chen-Edinboro LP, Wu MN, Yaffe K. Impact of sleep on the risk of cognitive decline and dementia. Curr Opin Psychiatry. 2014;27(6):478–83.


  38. Miller K. Diabetes Linked to Cognitive Problems. Available from: Accessed 22 Feb 2019.

  39. Jean-Pierre P. Management of Cancer-related Cognitive Dysfunction— Conceptualization Challenges and Implications for Clinical Research and Practice. US Oncol. 2010;016820:9–12.


  40. Magnuson A, Mohile S, Janelsins M. Cognition and Cognitive Impairment in Older Adults with Cancer. Curr Geriatr Reports. 2016;5(3):213–9.


  41. Jones D, Vichaya EG, Wang XS, Sailors MH, Cleeland CS, Wefel JS. Acute cognitive impairment in patients with multiple myeloma undergoing autologous hematopoietic stem cell transplant. Cancer. 2013;119(23):4188–95.


  42. Pakzad A, Obad N, Espedal H, Stieber D, Keunen O, Sakariassen PØ, et al. Bevacizumab treatment for human glioblastoma. Can it induce cognitive impairment? Neuro-Oncology. 2014;16(5):754–6.


  43. Cheng Y, Jin Y, Unverzagt FW, Su L, Yang L, Ma F, et al. The relationship between cholesterol and cognitive function is homocysteine-dependent. Clin Interv Aging. 2014;9:1823–9.


  44. Zheng L, MacK WJ, Chui HC, Heflin L, Mungas D, Reed B, et al. Coronary artery disease is associated with cognitive decline independent of changes on magnetic resonance imaging in cognitively normal elderly adults. J Am Geriatr Soc. 2012;60(3):499–504.


  45. Iadecola C, Yaffe K, Biller J, Bratzke LC, Faraci FM, Gorelick PB, et al. Impact of hypertension on cognitive function: a scientific statement from the american heart association. Hypertension. 2016;68(6):e67–94.


  46. Chiang C-H, Wu M-P, Ho C-H, Weng S-F, Huang C-C, Hsieh W-T, et al. Lower urinary tract symptoms are associated with increased risk of dementia among the elderly: a nationwide study. Biomed Res Int. 2015;2015:1–7.


  47. Fenske M. Urinary cortisol excretion: is it really a predictor of incident cognitive impairment? Neurobiol Aging. 2007;28(11):1791–2.


  48. Trachtenberg E, Mashiach T, Ben Hayun R, Tadmor T, Fisher T, Aharon-Peretz J, Dann EJ. Cognitive impairment in hodgkin lymphoma survivors. Br J Haematol. 2018;182(5):670-8.


  49. Park SY, Kim SM, Sung JJ, Lee KM, Park KS, Kim SY, et al. Nocturnal Hypoxia in ALS Is Related to Cognitive Dysfunction and Can Occur as Clusters of Desaturations. PLoS One. 2013;8(9):1–5.


  50. Kreiter KT, Copeland D, Bernardini GL, Bates JE, Peery S, Claassen J, et al. Predictors of cognitive dysfunction after subarachnoid hemorrhage. Stroke. 2002;33(1):200–8.


  51. Wang Y, Zhang JN, Hu W, Li JJ, Zhou JX, Zhang JP, et al. The characteristics of cognitive impairment in subjective chronic tinnitus. Brain Behav. 2018;8(3):1–9.


  52. Ma C, Yin Z, Zhu P, Luo J, Shi X, Gao X. Blood cholesterol in late-life and cognitive decline: A longitudinal study of the Chinese elderly. Mol Neurodegener. 2017;12(1):1–9.


  53. Restrepo C, Patel SK, Rethnam V, Werden E, Ramchand J, Churilov L, et al. Left ventricular hypertrophy and cognitive function: A systematic review. J Hum Hypertens. 2018;32(3):171–9.


  54. Udompanich S, Lip GYH, Apostolakis S, Lane DA. Atrial fibrillation as a risk factor for cognitive impairment: a semi-systematic review. QJM. 2013;106(9):795–802.


  55. Tseng CH, Huang WS, Muo CH, Kao CH. Increased risk of dementia among chronic osteomyelitis patients. Eur J Clin Microbiol Infect Dis. 2014;34(1):153–9.


  56. Stradecki-Cohan HM, Cohan CH, Raval AP, Dave KR, Reginensi D, Gittens RA, et al. Cognitive deficits after cerebral ischemia and underlying dysfunctional plasticity: potential targets for recovery of cognition. J Alzheimers Dis. 2017;60(s1):S87–105.


  57. Dodd JW. Lung disease as a determinant of cognitive decline and dementia. Alzheimers Res Ther. 2015;7(1):1–8.




Not applicable.

Availability of the data and materials

The data are not publicly shareable as they contain protected health information.


This study was partially supported by the National Institute of Allergy and Infectious Diseases (R21AI 142702–1). The publication costs are funded by the Division of Digital Health Sciences at Mayo Clinic.

About this supplement

This article has been published as part of BMC Medical Informatics and Decision Making Volume 19 Supplement 4, 2019: Selected articles from the Third International Workshop on Semantics-Powered Data Analytics (SEPDA 2018). The full contents of the supplement are available online at

Author information




SG and SS conceived and designed the study. SG and SS acquired the data, implemented the algorithms, and drafted the initial manuscript. All authors participated in interpretation of the data and contributed to manuscript revisions. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Sunghwan Sohn.

Ethics declarations

Ethics approval and consent to participate

This project was approved by the Mayo Clinic and Olmsted Medical Center Institutional Review Boards. Because the study involved use of data collected from existing medical records, and included information from deceased patients, the requirement for informed consent was waived. However, the State of Minnesota requires that patients provide a general “research authorization” for their medical records to be used for research, and we only included the records of patients who had provided research authorization.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



Tables 5, 6, and 7 contain sample raw data from experiment 1, corresponding to Figs. 5, 6 and 7. They show how many times more often a given term appeared in the clinical notes of CI patients than in the previous year.

Table 5 Raw data used in Fig. 5 TKM Experiment 1
Table 6 Raw data used in Fig. 6 KATE Experiment 1
Table 7 Raw data used in Fig. 7 LDA Experiment 1

Tables 8, 9, and 10 contain sample raw data from experiment 2, corresponding to Figs. 8, 9 and 10. They show how many times more often a given term appeared in the clinical notes of CI patients than in those of CU patients over the past 5 years.

Table 8 Raw data used in Fig. 8 TKM Experiment 2
Table 9 Raw data used in Fig. 9 KATE Experiment 2
Table 10 Raw data used in Fig. 10 LDA Experiment 2

In these tables, “Rate of increase” denotes how many times more often a given term appeared in the first year before CI diagnosis than in the previous year (Tables 5, 6, 7) or over the past 5 years (Tables 8, 9, 10).
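
As an illustration only, a “rate of increase” of this kind can be sketched as a ratio of term frequencies between two sets of notes. The sketch below is not the study’s implementation: the function names, the per-note normalization, and the whitespace tokenization are all assumptions made for the example, and the sample notes are invented.

```python
def term_rate(notes, term):
    """Average occurrences of `term` per note (assumed normalization).

    Tokenization is a plain lowercase whitespace split, so punctuation
    attached to a word would prevent a match.
    """
    if not notes:
        return 0.0
    target = term.lower()
    total = sum(note.lower().split().count(target) for note in notes)
    return total / len(notes)


def rate_of_increase(recent_notes, baseline_notes, term):
    """Ratio of the term's per-note rate in recent vs. baseline notes.

    Returns None when the term never appears in the baseline notes,
    because the ratio is undefined in that case.
    """
    baseline = term_rate(baseline_notes, term)
    if baseline == 0:
        return None
    return term_rate(recent_notes, term) / baseline


# Invented example notes (hypothetical, not study data).
recent = ["memory loss noted", "memory decline and confusion", "poor memory"]
baseline = ["memory intact", "no complaints", "sleep normal"]

print(rate_of_increase(recent, baseline, "memory"))
print(rate_of_increase(recent, baseline, "confusion"))
```

Under these assumptions, a value above 1 means the term appears more often per note in the recent set than in the baseline set. A real pipeline would need proper clinical tokenization and a decision on how to handle terms absent from the baseline.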

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.


About this article


Cite this article

Goudarzvand, S., St. Sauver, J., Mielke, M.M. et al. Early temporal characteristics of elderly patient cognitive impairment in electronic health records. BMC Med Inform Decis Mak 19 (Suppl 4), 149 (2019).
