
Prevalence and demographic variation of cardiovascular, renal, metabolic, and mental health conditions in 12 million English primary care records



Primary care electronic health records (EHR) are widely used to study long-term conditions in epidemiological and health services research. Therefore, it is important to understand how well the recorded prevalence of these conditions in EHRs compares to other reliable sources overall, and how it varies by socio-demographic characteristics. We aimed to describe the prevalence and socio-demographic variation of cardiovascular, renal, and metabolic (CRM) and mental health (MH) conditions in a large, nationally representative, English primary care database and compare with prevalence estimates from other population-based studies.


This was a cross-sectional study using the Clinical Practice Research Datalink (CPRD) Aurum primary care database. We calculated prevalence of 18 conditions and used logistic regression to assess how this varied by age, sex, ethnicity, and socio-economic status. We searched the literature for population prevalence estimates from other sources for comparison with the prevalences in CPRD Aurum.


Depression (16.0%, 95%CI 16.0–16.0%) and hypertension (15.3%, 95%CI 15.2–15.3%) were the most prevalent conditions among 12.4 million patients. Prevalence of most conditions increased with socio-economic deprivation and age. CRM conditions, schizophrenia and substance misuse were higher in men, whilst anxiety, depression, bipolar and eating disorders were more common in women. Cardiovascular risk factors (hypertension and diabetes) were more prevalent in black and Asian patients compared with white, but the trends in prevalence of cardiovascular diseases by ethnicity were more variable. The recorded prevalences of mental health conditions were typically twice as high in white patients compared with other ethnic groups. However, PTSD and schizophrenia were more prevalent in black patients. The prevalence of most conditions was similar or higher in the primary care database than diagnosed disease prevalence reported in national health surveys. However, screening studies typically reported higher prevalence estimates than primary care data, especially for PTSD, bipolar disorder and eating disorders.


The prevalence of many clinically diagnosed conditions in primary care records closely matched that of other sources. However, we found important variations by sex and ethnicity, which may reflect true variation in prevalence or systematic differences in clinical presentation and practice. Primary care data may underrepresent the prevalence of undiagnosed conditions, particularly in mental health.



Cardiovascular, renal, and metabolic (CRM) and mental health (MH) conditions (listed in Box 1) are amongst the most common causes of death and disability globally, [1,2,3,4,5] with MH conditions alone accounting for almost a third of the global burden of years lived with disability [1]. Primary care electronic health records (EHR) databases are routinely used in observational studies of the epidemiology of these long-term health conditions [6]. Clinical Practice Research Datalink (CPRD) Aurum is a relatively new primary care EHR database, with a number of strengths stemming from the richness of the nationally representative routinely collected data, which captures patient demographics, diagnoses, test results, and prescriptions for over 19 million patients [7]. However, there are recognised limitations to EHR data and there are inevitably disparities between self-reported health status and conditions reported in EHRs, with variation in case-detection rate according to age, sex and other demographic characteristics [8,9,10]. A recent US study found varying agreement between self-reported survey answers and EHR diagnosis data, with 81% positive agreement for type 2 diabetes and 59% positive agreement for depression [9].

Objective clinical investigations are typically used to diagnose CRM conditions (e.g. glycosylated haemoglobin (HbA1c) for diabetes or computed tomography (CT) for strokes), although there is still considerable potential for both under and over diagnosis of these conditions [11]. On the other hand, MH diagnoses are based on clusters of symptoms with an element of subjectivity on the part of the diagnosing clinician, especially in milder cases, and there are also a number of recognised barriers to seeking help for MH conditions including societal stigma and difficulties in asking for and accessing support, which may lead to underdiagnosis [12]. The extent of these barriers is likely to vary according to ethnicity, sex and socio-economic status [13]. Furthermore, conditions that are primarily diagnosed in secondary care might not be as well captured in primary care records where there is inefficient information transfer between hospitals and GP practices. Studies comparing primary care EHR to hospital episode statistics have shown that only around 60% of hospital admissions for stroke were recorded in primary care EHRs [14]. These factors may lead to disparities between the prevalence of diagnoses in EHRs and screen-detected prevalence estimates for MH conditions across both socio-demographic characteristics and when compared with CRM conditions. However, there is a paucity of research that has examined the extent of these disparities in primary care records for this range of conditions, particularly in CPRD and other UK EHR databases.

It is also valuable to compare the prevalence of health conditions in CPRD Aurum with those from other sources (e.g., national health surveys, screening studies) to understand the strengths and limitations of current and future epidemiological research using CPRD Aurum and other similar EHR databases. Therefore, the primary objective of this study was to describe the prevalence of selected CRM and MH conditions within this database and assess variation in the prevalence of reported conditions by categories of age, sex, ethnicity, and socio-economic deprivation. Secondly, we aimed to compare the prevalence of the conditions in this database against the prevalence within the UK (or similar countries) general population in three other sources in the literature: (1) other primary care EHR databases; (2) self-reports of doctor-diagnosed conditions in nationally representative surveys; and (3) screening studies.


Study design, data source and population

This was a cross-sectional analysis of the CPRD Aurum database, which contains routinely-collected primary care EHRs from 1,444 general practices across England using EMIS Web® patient records software [7]. Clinical observations, diagnoses and treatments are recorded as Read Version 2, SNOMED-CT, and EMIS Web® clinical codes. The full data resource profile has been described elsewhere [7]. A cross-sectional dataset was extracted for analyses using the Data Extraction for Epidemiological Research (DExtER) tool [15]. Data for these analyses included all patients who were alive and permanently registered with a participating practice on 1st January 2020 (this date was chosen so that results would not be influenced by the impact of the SARS-CoV-2 pandemic on primary care activity and data recording). Patients were only included if there were at least 12 months of acceptable data recording prior to the index date (1st January 2020). Acceptable data was determined using the “acceptable patient flag” data quality measure provided by CPRD (consistent recording of events including date of birth, practice registration date and transfer out date, and valid age and gender) [7, 16]. The dataset includes patients’ year of birth, sex, ethnicity, and their socio-economic status (index of multiple deprivation (IMD) quintile). Results of this cross-sectional analysis were compared against population prevalence of these same conditions determined from a literature review.
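The inclusion criteria above can be sketched in Python as a simple eligibility filter. This is an illustrative sketch only, not the DExtER extraction logic: the dictionary keys (`acceptable`, `reg_start`, `reg_end`, `death_date`) are hypothetical names, and the handling of boundary dates (for example, treating 1st January 2019 as exactly 12 months before the index date) is an assumption.

```python
from datetime import date

INDEX_DATE = date(2020, 1, 1)
# Assumption: "at least 12 months of recording" means registration began
# on or before 1 January 2019.
MIN_RECORDING_START = date(2019, 1, 1)


def is_eligible(patient):
    """Return True if a patient meets the stated inclusion criteria.

    `patient` is a dict with hypothetical keys:
      acceptable  - bool, stands in for the CPRD acceptable patient flag
      reg_start   - date the patient registered with the practice
      reg_end     - date the patient transferred out, or None
      death_date  - date of death, or None
    """
    alive = patient["death_date"] is None or patient["death_date"] > INDEX_DATE
    registered = patient["reg_start"] <= INDEX_DATE and (
        patient["reg_end"] is None or patient["reg_end"] > INDEX_DATE
    )
    enough_history = patient["reg_start"] <= MIN_RECORDING_START
    return bool(patient["acceptable"] and alive and registered and enough_history)


# Example: registered since 2015, still registered, alive, acceptable record.
example = {
    "acceptable": True,
    "reg_start": date(2015, 6, 1),
    "reg_end": None,
    "death_date": None,
}
print(is_eligible(example))
```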

Selection of cardio-renal-metabolic and mental health conditions

A recent Delphi study has identified key conditions that are important to patient and research stakeholders for inclusion in research into patients with multiple long term conditions [17]. From the results of this study, and after discussions within our clinical team and patient advisory group, eight MH and ten CRM conditions were selected for inclusion in our analyses (see Box 1). We included all recommended cardiovascular conditions from the Delphi study except for venous thromboembolic disease as we have focused on chronic rather than acute conditions. We included all recommended “mental health” conditions except for autism and dementia as these are neurodevelopmental and neurodegenerative conditions, respectively. We also added diabetes and chronic kidney disease (CKD) as these are highly prevalent chronic conditions which are closely related to cardiovascular disease.

Box 1: Included conditions

Outcome measures in CPRD Aurum

Prevalent cases for all conditions were identified using disease-specific clinical codelists. Codelists were developed through collaboration by a team of clinicians in the Universities of Birmingham and Cambridge using a rigorous, systematic process via the DExtER codebuilder tool, with search strategies recorded using a consistent coding checklist. We began by reviewing all existing Quality and Outcomes Framework (QOF) codelists, [18] and published codelists for UK primary care EHR analyses, including HDRUK Phenotype library, [19] OpenCodelists, [20] and CPRD @ Cambridge Codelists [21]. Lists were adapted or, where they did not exist, created anew for CPRD Aurum using the hierarchical Read code system, the NHS Digital SNOMED CT term browser, [22] and the DExtER codebuilder tool to search for relevant text words for symptoms, diagnoses, clinical findings, and interventions that indicated a diagnosis of the condition. Finally, codelists, conventions and queries were reviewed and agreed among the team at regular clinical coding meetings. Codelists can be found at

For hypertension and CKD, prescriptions and clinical biomarkers were also used as a secondary method of determining prevalence estimates. Hypertension was defined (according to the same methods as the Health Survey for England [23] to enable comparison) as prescription of an antihypertensive medication in the six months prior to 1st January 2020, or most recent blood pressure within the past three years > 140/90 mmHg. CKD was defined as the most recent estimated glomerular filtration rate (eGFR) < 60 ml/min/1.73 m² within the past three years prior to 1st January 2020.
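The biomarker-based definitions above can be sketched as follows. This is a hedged illustration rather than the study's implementation: the function names and input shapes are hypothetical, the window boundaries (1st July 2019 and 1st January 2017) are assumed readings of "six months" and "three years" before the index date, and "> 140/90 mmHg" is interpreted here as systolic above 140 or diastolic above 90.

```python
from datetime import date

INDEX_DATE = date(2020, 1, 1)
RX_WINDOW_START = date(2019, 7, 1)         # assumed: six months before index
BIOMARKER_WINDOW_START = date(2017, 1, 1)  # assumed: three years before index


def latest_in_window(readings):
    """Return the value of the most recent (date, value) reading inside the
    three-year window before the index date, or None if there is none."""
    in_window = [r for r in readings if BIOMARKER_WINDOW_START <= r[0] < INDEX_DATE]
    if not in_window:
        return None
    return max(in_window, key=lambda r: r[0])[1]


def has_hypertension(antihypertensive_rx_dates, bp_readings):
    """bp_readings: list of (date, (systolic, diastolic)) tuples."""
    on_treatment = any(RX_WINDOW_START <= d < INDEX_DATE for d in antihypertensive_rx_dates)
    latest = latest_in_window(bp_readings)
    # Assumption: either component above its threshold counts as raised.
    raised = latest is not None and (latest[0] > 140 or latest[1] > 90)
    return on_treatment or raised


def has_ckd(egfr_readings):
    """egfr_readings: list of (date, eGFR in ml/min/1.73 m2) tuples."""
    latest = latest_in_window(egfr_readings)
    return latest is not None and latest < 60
```

Only the most recent reading is used, so a patient whose earlier blood pressure was raised but whose latest reading is normal is not counted unless they also hold a recent antihypertensive prescription.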

Outcome measures in comparator sources

A literature review was undertaken to identify, for each condition, three estimates for UK population prevalence:

UK primary care electronic records prevalence

  • Previous analyses of UK primary care electronic records databases using clinical codes to detect prevalent cases. Where available, QOF data were the ideal comparator as the Quality and Outcomes Framework programme uses data collected from 96% of general practices in England [18]. Practices are financially incentivised via QOF to keep accurate disease registers of patients with specific conditions according to nationally agreed standards. For conditions not included in QOF we used cross-sectional, or cohort studies analysing data from other UK EHRs.

Self-reported doctor-diagnosed prevalence

  • Prevalence estimates identified from studies using methods other than primary care EHRs for detection of cases that have been diagnosed by a healthcare professional. These estimates primarily came from two large cross-sectional studies: the Health Survey for England (HSE) and the Adult Psychiatric Morbidity Survey (APMS) where a nationally representative sample of the UK population were surveyed face-to-face and asked about their health conditions [14, 23].

Screen-detected prevalence

  • Prevalence estimates identified from studies that involved screening of a representative sample of the population using a reference standard diagnostic technique. For example, the Health Survey for England (HSE) used HbA1c blood tests from the representative sample to estimate population prevalence of diabetes [23].

Search strategy

A pragmatic approach was used to identify relevant sources for each condition; where available, prevalence statistics reported within Public Health England fingertips resources, [24] NHS Digital resources, [25] and NICE Clinical Knowledge Summaries were used [26]. Further details of the search strategy can be found in Additional file 1. Where QOF, HSE, or APMS data were not available, PubMed and Google Scholar databases were systematically searched for cross-sectional and longitudinal studies using a Boolean search strategy; “condition name” AND “prevalence” OR “epidemiology”. For EHR prevalence, we added: AND abbreviated and unabbreviated names of these established UK-based primary care EHR databases (e.g., “THIN” and “The Health Improvement Network”). For screen-detected prevalence we added: AND “screening”.

Study selection criteria

Studies were included if they reported the most recent available prevalence of any of the conditions using cross-sectional or cohort study data (or a meta-analysis of these), representative of the general population prior to 1st January 2020. They were excluded if they contained fewer than 500 patients or were based on a subpopulation within a specific disease. The most recent study within a large and comparable population was selected. This was ideally a UK population study, but if this was not available then studies within European or other high-income countries were used. Further details and methods of data collection for all comparator studies were summarised in Additional file 1, and in Additional Table 1, Additional Table 2, and Additional Table 3 within that additional file.

Statistical methods

CPRD Aurum prevalence analysis

Frequencies, percentages, and cross-tabulations were used to describe the prevalence of each condition across the entire population and by sociodemographic characteristics with age, sex, ethnicity groups, and deprivation quintiles all treated as categorical variables. Age at entry was categorised into the following age groups: 0–16, 17–30, 31–40, 41–50, 51–60, 61–70, and ≥ 70 years. Ethnicity was categorised into five groups based on those used in the UK Census: white, Asian, black, mixed, and other ethnicity (which includes Chinese, Middle Eastern and Pacific). Socioeconomic categories were based on the English Index of Multiple Deprivation (IMD) quintiles for the geographic area where the patient lives. Patients with missing data on ethnicity were assigned to a separate “missing” category and included in the regression analysis.
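The categorisation and cross-tabulation described above can be sketched in Python. This is a minimal illustration, not the authors' Stata code: the patient dictionary keys are hypothetical, age is approximated as 2020 minus year of birth, and because the stated bands "61–70" and "≥ 70" overlap at age 70, the sketch assumes that 70-year-olds fall in the 61–70 band and labels the oldest band "71+".

```python
from collections import Counter

INDEX_YEAR = 2020

# (label, inclusive lower bound, inclusive upper bound)
AGE_BANDS = [
    ("0-16", 0, 16),
    ("17-30", 17, 30),
    ("31-40", 31, 40),
    ("41-50", 41, 50),
    ("51-60", 51, 60),
    ("61-70", 61, 70),
]


def age_band(year_of_birth):
    """Map a year of birth to an age band at the index year."""
    age = INDEX_YEAR - year_of_birth
    for label, low, high in AGE_BANDS:
        if low <= age <= high:
            return label
    return "71+"  # assumed label for the oldest band


def ethnicity_group(patient):
    """Missing ethnicity is kept as its own category, as in the study."""
    return patient.get("ethnicity") or "missing"


def prevalence_by_group(patients, group_of, condition):
    """Return {group: percentage of patients in that group with the condition}."""
    totals, cases = Counter(), Counter()
    for patient in patients:
        group = group_of(patient)
        totals[group] += 1
        if condition in patient["conditions"]:
            cases[group] += 1
    return {group: 100 * cases[group] / totals[group] for group in totals}


# Toy data for illustration only; not study data.
patients = [
    {"year_of_birth": 1950, "ethnicity": "white", "conditions": {"hypertension"}},
    {"year_of_birth": 1955, "ethnicity": None, "conditions": set()},
    {"year_of_birth": 1990, "ethnicity": "Asian", "conditions": {"depression"}},
    {"year_of_birth": 1992, "ethnicity": "white", "conditions": set()},
]
by_age = prevalence_by_group(patients, lambda p: age_band(p["year_of_birth"]), "hypertension")
by_ethnicity = prevalence_by_group(patients, ethnicity_group, "depression")
```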

For each point estimate of prevalence, 95% confidence intervals (CI) for proportion were calculated using the Clopper-Pearson exact method [27]. Logistic regression was used to calculate the odds ratios of each condition by sociodemographic characteristics (with mutual adjustment). All statistical analyses were performed using Stata statistical software, V.16 (StataCorp, College Station, Texas, USA). Stata codes used for the analysis are publicly available here:
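For readers unfamiliar with the Clopper-Pearson exact interval, the following Python sketch shows how it can be computed from first principles by inverting the binomial distribution. It is not the authors' Stata code; the function names are illustrative, and the direct summation used here is only practical for modest sample sizes. For a database of millions of patients, a library routine based on beta distribution quantiles would be more appropriate.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summing the pmf in log space."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _root(g):
    """Bisection on [0, 1] for a function g that increases from <0 to >0."""
    low, high = 0.0, 1.0
    for _ in range(100):
        mid = (low + high) / 2
        if g(mid) < 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for k cases in n people."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _root(lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _root(lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper


low, high = clopper_pearson(5, 10)
print(f"5/10: {low:.4f} to {high:.4f}")
```

With very large denominators the interval becomes extremely narrow, which is consistent with the lower and upper bounds reported in this paper often being identical to one decimal place.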

Comparator data prevalence analysis

Numerators (number of cases) and denominators (number of people sampled) and details of the data collection methods were extracted from each source identified in the literature review. Population prevalence and 95% confidence intervals for proportions were calculated for each condition in the same way as for the CPRD Aurum analysis.

Comparisons between prevalence estimates

For each comparison with the prevalence reported in the literature, a sample was created within CPRD Aurum containing all patients who matched the age profile of that population. For aortic aneurysms the only available comparator was from a screening programme that reported incidence within men in their 65th year, therefore a comparison was made with prevalence of aortic aneurysm in men aged 66 (to allow time for the diagnosis to be recorded in their records). For anxiety, the most appropriate comparator population prevalence estimates only measured prevalence of generalised anxiety disorder. Therefore, a new codelist for generalised anxiety disorder was created within CPRD Aurum for comparison. The prevalence estimates were compared using scatter graphs of observed vs. comparator prevalence using Microsoft Excel.


Cross-sectional analysis of primary care EHR

Almost 12.4 million patients within the CPRD Aurum database were eligible for inclusion in this analysis. The median length of follow up in this study was 10.2 years (IQR 4.4–20.9). Males and females were equally represented; 18% of the patients were under 16, 69% were between 16 and 70, and 13% were over 70 years old. Ethnicity was recorded for 80% of patients in the database, and of these 81% were White, 10% were Asian, 5% were Black, 2% were of other ethnicities, and 2% were of mixed ethnicity. Deprivation quintiles were equally distributed (~ 20%). Hypertension, affecting 15% of the study population, and depression, affecting 16%, were the most common CRM and MH conditions respectively. The prevalence of those with each CRM and MH condition in the general population and by socio-demographic characteristics are shown in Tables 1 and 2.

Table 1 Prevalence of cardio-renal-metabolic conditions overall and by socio-demographics in the CPRD Aurum database, 2020
Table 2 Prevalence of mental health conditions overall and by socio-demographics within CPRD Aurum database, 2020

For analysis of prevalence by socio-demographic variables the adjusted odds ratios for prevalence of each condition by sex, deprivation quintile, age categories and ethnicity were calculated. These are presented in forest plots in Additional File 2.

Sex


Cardio-renal-metabolic conditions were more prevalent in men, except for CKD which was more prevalent in women. There was no difference in the prevalence of PTSD between men and women. Affective (depression, anxiety and bipolar) and eating disorders were more prevalent in women, whilst there were higher odds of substance and alcohol misuse and schizophrenia in men. [See Additional File 2; Supplementary Fig. 1]

Socio-economic status

There was a clear trend of increasing prevalence of almost all conditions with increasing socio-economic deprivation, with ORs of approximately 1.4 (aortic aneurysm) to 3.9 (substance misuse) for those in the most deprived compared with the least deprived quintile. Associations were weaker between deprivation and AF, heart valve disorders, and T1DM, and prevalence decreased with increasing deprivation for eating disorders. [See Additional File 2; Supplementary Fig. 2]

Age categories

There was a general trend of increasing lifetime prevalence for all cardio-renal-metabolic conditions (except for T1DM) with increasing age. There was a marked increase in prevalence of all mental health conditions after the age of 16. There was typically a gradual increase in lifetime prevalence of each mental health condition up until the age of 40–60 followed by a gradual decrease in recorded prevalence in the oldest age categories. Lower lifetime prevalence of a MH condition in those over 60 years old was most pronounced for substance abuse and PTSD. [See Additional File 2; Supplementary Fig. 3]

Ethnicity


There was considerable variation in the prevalence of CRM and MH conditions by ethnicity. Among those of black and Asian ethnicities, diabetes, hypertension, and CKD were more prevalent than in those of white ethnicity, whilst aortic aneurysms, AF, PVD, heart valve disorders and T1DM were less prevalent in black or Asian people.

In CPRD data, mental health conditions were typically around twice as prevalent in those of white ethnicity as in those of black or Asian ethnicity, except for PTSD and schizophrenia, which were 33% more prevalent and twice as prevalent in those of black ethnicity, respectively.

[See Additional File 2; Supplementary Fig. 4]

Comparison of prevalence of health conditions in CPRD against literature

Figures 1, 2 and 3 compare the prevalence estimates from the literature within other UK primary care EHRs (Fig. 1), surveys of self-reports of doctor-diagnosed conditions (Fig. 2) and screening studies (Fig. 3), against the prevalence of each condition in an age matched population within CPRD Aurum. Prevalence estimates from the literature, with the data sources and methods of data collection are reported in Additional File 1; Tables 1, 2 and 3.

Prevalence in UK primary care EHRs

Figure 1 shows that for 5/10 CRM conditions, the prevalence in CPRD Aurum was similar to (< 20% difference relative to) available prevalence estimates for age-matched populations in QOF and other UK primary care EHRs [18]. However, the prevalence of heart valve disorders in CPRD Aurum in 65–95-year-olds (5.2% (95%CI 5.2–5.3%)) was more than double the prevalence reported in age-matched patients in THIN data (1.6% (95%CI 1.6–1.7%)) [28]. The prevalences of IHD, T1DM, stroke and HF were between 20 and 55% higher in CPRD Aurum than in other UK primary care EHRs [18, 29,30,31].

Fig. 1 Comparison of condition prevalences in CPRD Aurum with prevalence estimates from other electronic health records. CKD = chronic kidney disease, IHD = ischaemic heart disease, AF = atrial fibrillation, HF = heart failure, PVD = peripheral vascular disease, BPAD = bipolar affective disorder

The prevalence of bipolar disorder in CPRD Aurum (0.4% (95%CI 0.4–0.4%)), was similar to (< 20% higher than) the prevalence estimate in the IQVIA Medical Research Database (IMRD) in 2018 (0.4% (95%CI 0.4–0.4%)) [32]. The prevalence of eating disorders and schizophrenia in CPRD Aurum were 20% and 34% higher respectively than prevalence estimates in CPRD Gold [33,34,35]. For depression and anxiety the age-matched prevalence in CPRD Aurum was around twice as high as in QOF and THIN data [18, 36].
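The "< 20% difference relative to" criterion used throughout these comparisons can be illustrated with a short Python sketch. The function names are hypothetical, and taking the comparator estimate as the denominator is an assumption about how the relative difference was calculated. The inputs below are the rounded percentages quoted in the text, so results may differ slightly from figures derived from unrounded data.

```python
def relative_difference(observed, comparator):
    """Difference between two prevalences as a fraction of the comparator."""
    return (observed - comparator) / comparator


def classify(observed, comparator, threshold=0.20):
    """Label a comparison as 'similar', 'higher' or 'lower' using the 20% rule."""
    rel = relative_difference(observed, comparator)
    if abs(rel) < threshold:
        return "similar"
    return "higher" if rel > 0 else "lower"


# Heart valve disorders: 5.2% in CPRD Aurum vs 1.6% in THIN.
print(classify(5.2, 1.6), round(relative_difference(5.2, 1.6), 2))
# Bipolar disorder: 0.4% in CPRD Aurum vs 0.4% in IMRD.
print(classify(0.4, 0.4))
```

On these rounded figures, heart valve disorders come out at 2.25 (225% higher than the comparator), consistent with the description "more than double".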

Self-reported doctor-diagnosed prevalence

Figure 2 shows the prevalence of stroke, diabetes, and IHD in CPRD Aurum were similar to (< 20% difference relative to) self-reported doctor-diagnosed prevalence estimates in HSE [23, 37]. However, prevalence of CKD in over 16 year olds was more than twice as high in CPRD Aurum (4.4% (95%CI 4.4–4.4%)) than in HSE data (2.0% (95%CI 1.6–2.4%)) [38]. Prevalence of T1DM and hypertension in CPRD Aurum were 23% and 34% higher than were reported by the National Diabetes Audit and HSE respectively [23, 30]. Prevalence of PVD in CPRD Aurum was 43% lower compared with the prevalence reported in UK Biobank [39].

Fig. 2 Comparison of condition prevalences in CPRD Aurum with self-reported doctor-diagnosed prevalence estimates from the literature. CKD = chronic kidney disease, IHD = ischaemic heart disease, HF = heart failure, PVD = peripheral vascular disease, BPAD = bipolar affective disorder, T1 diabetes = type 1 diabetes, PTSD = post-traumatic stress disorder

The prevalence of depression, schizophrenia and bipolar disorder in CPRD Aurum in over 16 year olds closely matched (< 20% relative difference to) those reported in HSE and APMS [14, 37]. The prevalence of eating disorders in CPRD Aurum was 41% lower than that reported in the HSE, [34] whilst for generalised anxiety disorder prevalence was 69% higher in CPRD Aurum than in the HSE [37]. Prevalence of alcohol misuse was three times higher in CPRD Aurum (5.4% (95%CI 5.4–5.4%)) than in HSE (1.2% (95%CI 1.0–1.5%)) [37]. However, prevalence of PTSD in CPRD Aurum (0.6% (95%CI 0.6–0.7%)) was three times lower than that reported by HSE (1.9% (95%CI 1.5–2.2%)) [37].

Screen-detected prevalence

Figure 3 shows that for aortic aneurysms, CKD, IHD, AF and PVD, the prevalence estimates reported in CPRD Aurum matched (< 20% difference relative to) estimates of screen-detected prevalence in the same age groups in the literature [29, 38, 40,41,42]. For diabetes, hypertension, heart failure, and heart valve disorder the prevalence estimates in CPRD Aurum were around a third lower than in screening studies [23, 43, 44].

Fig. 3 Comparison of condition prevalences in CPRD Aurum with screening study prevalence estimates from the literature. BP = blood pressure, CKD = chronic kidney disease, IHD = ischaemic heart disease, AF = atrial fibrillation, HF = heart failure, PVD = peripheral vascular disease, BPAD = bipolar affective disorder, PTSD = post-traumatic stress disorder

For substance misuse disorder, depression, and schizophrenia the prevalence estimates in CPRD Aurum were around 30% lower than in the APMS (2014) [14]. For generalised anxiety disorder and alcohol misuse disorder, the prevalence in CPRD Aurum was around 80% higher than in the European Study of the Epidemiology of Mental Disorders and APMS respectively [14, 45]. However, for eating disorders, bipolar disorder and PTSD, prevalence reported in the APMS and HSE was 4–6 times higher than in CPRD Aurum [14, 33].

Biomarkers for hypertension and CKD

When defined by use of antihypertensive medication or most recent blood pressure reading > 140/90 mmHg (to match methodology in HSE), the prevalence of hypertension in CPRD Aurum in over 16-year-olds was 31.6% (95%CI 31.6–31.6%), which was almost twice as high as the prevalence when defined by using clinical codes (19.1% (95%CI 19.0–19.1%)). However, as shown in Fig. 3, it was similar to (< 20% difference relative to) the HSE screen-detected prevalence estimate (27.9% (95%CI 26.5–29.2%)) [23].

Prevalence of CKD in CPRD in over 16-year-olds was similar (< 20% relative difference) when measured using the most recent eGFR < 60ml/min/1.73 m² (to match methodology in HSE) (5.0% (95%CI 5.0–5.0%)) to both the prevalence in CPRD estimated using clinical codes (4.4% (95%CI 4.4–4.4%)) and the screen-detected prevalence in HSE (5.1% (95%CI 4.4–5.8%)) [38].


Main findings

This was a comprehensive analysis of the prevalence of cardio-renal-metabolic (CRM) and mental health (MH) conditions in 12 million patients in a primary care electronic health records (EHRs) database. There was a high burden of depression, anxiety, and hypertension across the population. As expected, most conditions reported in EHRs were increasingly prevalent with increasing deprivation and age, although mental health conditions were potentially under-represented in children. Most CRM conditions, schizophrenia and substance misuse were more prevalent in men, whilst anxiety, depression, bipolar and eating disorders were more common in women. Hypertension and diabetes were twice as prevalent in black patients compared with white patients and diabetes was three times as common in Asian patients. However, black and Asian patients generally had lower recorded prevalences of cardiovascular disease (aortic aneurysms, AF, PVD, HF, heart valve disorder, IHD, stroke) than white patients. Mental health conditions were reported twice as frequently in those of white ethnicity as in those of black or Asian ethnicity in EHRs, except for PTSD and schizophrenia, which were 33% more prevalent and twice as prevalent in those of black ethnicity respectively.

Estimates for prevalence of most clinically detected CRM conditions, as well as depression, anxiety, bipolar disorder, and schizophrenia in the EHR database were broadly similar to or greater than the self-reported doctor-diagnosed prevalence reported in the Health Survey for England (HSE) and Adult Psychiatric Morbidity Survey (APMS). This suggests these conditions are well represented in EHRs. However, there were sizable differences in the prevalence of hypertension, diabetes, and depression in the EHR compared to other prevalence estimates from studies screening for these conditions. Screen-detected prevalence estimates for PTSD, bipolar disorder and eating disorders were 4–6 times higher than prevalence of these conditions in primary care EHR records, potentially reflecting a significant burden of underdiagnosed or less well documented MH morbidity.

Comparisons with other literature

In EHR the risk factors for cardiovascular disease (i.e., hypertension and diabetes) were more prevalent in black and Asian people than white people, but paradoxically this was not typically matched by higher prevalence of cardiovascular disease itself (i.e., PVD, aortic aneurysms, stroke, IHD). This has also been reported in other cohort studies analysing variation in prevalence of aortic aneurysms and peripheral artery disease by ethnicity [46, 47]. We found that AF was recorded twice as frequently in white patients compared with black and Asian patients. A previous cross-sectional analysis also found lower prevalence of AF recorded in African American patients’ records compared with white American patients, but no difference in prevalence with systematic unbiased testing [48]. Potential explanations have included differential uptake of screening in the case of aortic aneurysms, and under-diagnosis due to language barriers or lower health literacy in Asian people regarding PVD symptoms [20].

These disparities may also reflect the higher premature death rate from IHD in Asian people compared to white people, so that susceptible Asian people do not survive long enough to develop symptoms of PVD [20]. Additionally, although these analyses of variation in prevalence by ethnicity are adjusted for age, given the strength of the association between age and cardiovascular diseases, the lower prevalence of cardiovascular disease in those of black and Asian ethnicity could reflect that in this database these populations were on average significantly younger than those of white ethnicity.

In mental health conditions there was typically a significant reduction in prevalence in the over 70-year-olds compared with those aged 40–50, which may reflect earlier mortality for those diagnosed with these conditions at younger ages [49]. Reduced prevalence in the oldest adults is especially notable in eating disorders (see Additional File 2; Supplementary Fig. 3), which is the MH condition with the highest mortality rate [50]. In the analyses of the prevalence of conditions by socio-demographic factors, it is important to note that those who had died before the index date were excluded from the sample, so those with non-fatal disease may be over-represented in the survivors.

Prevalence of MH conditions recorded in the primary care EHR was comparatively very low in children. Depression was recorded 40 times more frequently in 17–30 year-olds compared with under-16 year-olds. The latest Mental Health of Children and Young People in England survey found that one in six people aged 6–16 years had a “probable” MH condition [51]. However, this reflects a wide range of mental health symptoms from mood and anxiety to attention and hyperactivity, rather than specific diagnoses. Nevertheless, there is likely to be considerable under-representation of the true prevalence of MH conditions in children in EHRs. Qualitative research suggests that whilst parents and children do not always report mental health symptoms to GPs, [52] in turn GPs report feeling ill-equipped to diagnose MH conditions in children, and there are considerable challenges in accessing child and adolescent mental health specialists [52,53,54].

The gap between screen-detected prevalence and primary care EHR prevalence was more apparent for MH conditions than for CRM conditions, notably for depression, bipolar disorder, eating disorders and PTSD. Financial incentives for accurate coding of certain conditions may have impacted the accuracy of recording diagnoses in EHR. All but one (aortic aneurysms) of the CRM conditions are included in QOF, which financially incentivises practices to have accurate disease coding, whilst eating disorders and PTSD are not included in QOF [18]. Longitudinal analysis of atrial fibrillation coding in UK EHRs suggests that the introduction of QOF did lead to practices refining the diagnostic coding for this condition [55].

PTSD showed the most notable discrepancies between prevalence in the EHR and both screen-detected prevalence and self-reported doctor-diagnosed prevalence, which suggests that this condition may be especially under-recognised and under-diagnosed. PTSD is typically diagnosed in secondary care, so case-detection within primary care EHR also depends on accurate transfer of information between primary and secondary care. Studies exploring the accuracy of stroke and cancer diagnoses in primary care EHR have shown that between 10 and 40% of these diagnoses in hospital records are missing from primary care EHR [14]. Many people with symptoms of common MH conditions do not present to primary or secondary care [13]. However, self-reported screening questionnaires also consistently overestimate the prevalence of MH conditions in epidemiological studies [56]; thus CPRD Aurum and other EHR databases may be more reliable for case-detection of these conditions. Results from the SAIL EHR databank showed that the ten-year prevalence of depression and/or anxiety was 16.2% and that of anxiety/depression symptom codes was 21.4%, which is similar to our estimate for depression (16.0%, 95%CI 16.0–16.0%) [57].

Women had double the rates of reported depression and anxiety compared with men in the primary care EHR. However, in the APMS survey, which screened for symptoms of depression and anxiety, the prevalence of these conditions was only around 25–50% higher in women [13]. In the EHR, depression and anxiety were three times as common in those of white ethnicity compared with those of black or Asian ethnicity. However, in the APMS survey, symptoms of depression and anxiety were more common in people of black and Asian ethnicity [13]. Like previous studies, we found that black people were twice as likely to be diagnosed with schizophrenia as people of other ethnicities [13]. Research in this area is limited by small sample sizes. However, it is recognised that there are considerable barriers to accessing mental healthcare for people from black and minority ethnic communities, which may lead to under-diagnosis in primary care [58]. These disparities between screening prevalence and prevalence of mental health conditions in the EHR likely reflect patterns of help-seeking behaviour and barriers to access, which are influenced by both gender and ethnicity [58, 59].

There was also an overall gap between screen-detected prevalence in HSE and CPRD Aurum prevalence for diabetes and hypertension, whilst doctor-diagnosed prevalence estimates were similar [23]. However, it is important to note that the methods used for screening in HSE are not diagnostic, for example, a single raised HbA1c measurement was used to estimate the prevalence of diabetes, whereas clinical guidelines state that two raised HbA1c measurements are required to confirm the diagnosis.
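The difference between a survey-style screening rule and a confirmatory rule can be sketched in a few lines. This is an illustration only: the 48 mmol/mol threshold is a commonly used diagnostic cut-off and is an assumption here rather than a value stated in this paper, and the function names and toy patients are hypothetical.

```python
# Illustrative sketch: single-measurement screening vs. two-measurement confirmation.
# ASSUMPTION: 48 mmol/mol is used as the HbA1c threshold; it is not taken from this paper.
HBA1C_THRESHOLD = 48  # mmol/mol


def screen_positive(hba1c_values):
    """Survey-style rule: any single raised HbA1c counts as a case."""
    return any(v >= HBA1C_THRESHOLD for v in hba1c_values)


def confirmed_positive(hba1c_values):
    """Confirmatory rule: at least two raised HbA1c measurements."""
    return sum(v >= HBA1C_THRESHOLD for v in hba1c_values) >= 2


def prevalence(patients, rule):
    """Proportion of patients classified as cases under the given rule."""
    return sum(rule(values) for values in patients.values()) / len(patients)


# Hypothetical patients, each with a list of HbA1c results (mmol/mol).
patients = {
    "A": [50, 52],  # raised twice: a case under both rules
    "B": [49, 44],  # raised once: a case under the screening rule only
    "C": [41, 40],  # never raised
    "D": [47],      # never raised
}

screen_prev = prevalence(patients, screen_positive)        # 2 of 4 patients
confirmed_prev = prevalence(patients, confirmed_positive)  # 1 of 4 patients
```

In this toy cohort the single-measurement rule classifies twice as many patients as cases, which is the direction of the gap described above; the size of the gap in real data would depend on how often a raised result is followed by a second raised result.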

Replicating the screening methods used in HSE with clinical biomarkers such as blood creatinine and blood pressure produced similar prevalence estimates for hypertension and CKD [23]. These biomarkers may be useful for some studies looking at short-term outcomes. A previous study in CPRD Gold found that clinical codes underestimate the prevalence of CKD and concluded that a combination of codes and test results is most appropriate to detect CKD [60]. However, for studies investigating multimorbidity and the accumulation of disease over several years, clinical codes are likely to be more specific and more reflective of long-term conditions.

The prevalence of all CRM and MH conditions in CPRD Aurum was typically 5 to 50% higher than the prevalence reported in other UK primary care EHR databases (predominantly QOF data). Our codelists were more comprehensive than QOF codelists; for example, the codelists for heart failure and depression included more codes related to interventions, abnormal test results, disease monitoring, and referral to secondary care services. For both of these conditions, the prevalence estimates in CPRD Aurum were similar to the self-reported doctor-diagnosed prevalence estimates. Therefore, our codelists may be more sensitive but less specific than QOF codelists.

A diagnosis of anxiety was more prevalent in CPRD Aurum data (15.8% (95%CI 15.8–15.8%)) than in a previous analysis of THIN data (7.2% (95%CI 7.1–7.2%)) [36]. However, the THIN analysis reported prevalence of anxiety codes entered between 2002 and 2004 only, whereas we included any case prior to 2020. Doctor-diagnosed prevalence of generalised anxiety disorder was also higher in CPRD Aurum (9.4% (95%CI 9.4–9.4%)) than self-reported doctor-diagnosed generalised anxiety in HSE (5.5% (95%CI 4.9–6.1%)) [37]. The most frequently used code within our anxiety codelist, by some margin, was “Anxiety with depression”, reflecting the established overlap between these two conditions.
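The confidence intervals quoted for CPRD Aurum collapse to a single value at one decimal place, whereas the survey intervals are visibly wider. The sketch below shows why, using a simple normal (Wald) approximation rather than the exact binomial method of Clopper and Pearson [27]. The database size of roughly 12.4 million comes from this study; the survey size of 5,500 is a hypothetical figure chosen for illustration and is not the actual HSE sample size.

```python
import math


def wald_ci(prevalence, n, z=1.96):
    """Approximate 95% CI for a proportion (normal approximation, not exact)."""
    se = math.sqrt(prevalence * (1 - prevalence) / n)
    return prevalence - z * se, prevalence + z * se


def as_percent(interval):
    """Convert a (lower, upper) pair of proportions to percentages, 1 d.p."""
    return tuple(round(100 * x, 1) for x in interval)


# Anxiety in the full database: 15.8% of roughly 12.4 million patients.
database_ci = as_percent(wald_ci(0.158, 12_400_000))

# A 5.5% prevalence in a HYPOTHETICAL survey of 5,500 respondents.
survey_ci = as_percent(wald_ci(0.055, 5_500))
```

With millions of patients the standard error is about 0.01 percentage points, so both limits round to the point estimate; with a few thousand respondents the interval spans more than a percentage point. Narrow intervals in EHR data therefore reflect sample size, not freedom from misclassification or recording bias.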

As in previous studies, the prevalence of all conditions increased with increasing socio-economic deprivation (with the exception of eating disorders) [61]. A recent systematic review showed no consistent pattern of association between socio-economic status and eating disorders, but that historically those in more affluent groups were more likely to access diagnosis and treatment, which may explain the inverse association between social deprivation and eating disorders [62].

The prevalence of alcohol misuse in CPRD Aurum in over 16-year-olds (5.4% (95%CI 5.4–5.4%)) was considerably higher than HSE reports of both self-reported doctor-diagnosed alcohol misuse (1.2% (95%CI 1.0–1.5%)) and the screen-detected prevalence of alcohol misuse in the same age group (3.1% (95%CI 2.7–3.5%)). Participants may under-report their true drinking practices in surveys, whilst GPs may be entering clinical codes for alcohol misuse but not conveying the extent of their concerns to patients [63]. On the other hand, substance misuse appears to be under-diagnosed in CPRD Aurum compared with self-reported substance misuse. The prevalence reported in CPRD Aurum (2.1% (95%CI 2.1–2.1%)) was lower than the screen-detected prevalence of drug dependence in the APMS analysis (3.1% (95%CI 2.7–3.5%)), which is in keeping with findings from other studies [64].

Strengths and limitations

The CPRD Aurum database contains EHR from over 12 million patients, reflecting a nationally representative sample of the UK population in terms of geographic spread, deprivation, age, and gender [7]. For half of the 18 conditions (almost all the CRM conditions), primary care clinicians have been financially incentivised via the QOF system since 2004 to record diagnosis codes accurately in EHRs.

Our codelists for identifying conditions within CPRD Aurum were created using a rigorous and systematic process by a team of experienced clinicians, building on a strong foundation of previous research using clinical codes in EHRs. Our findings demonstrate that these codelists appear to have high sensitivity to detect the majority of CRM and MH conditions within EHRs.

The literature review was more pragmatic than a systematic review methodology as it would not have been feasible to do a systematic review for each of the 18 conditions. However, the majority of the comparisons are from the latest official UK government commissioned studies or audits of disease prevalence (e.g., QOF, HSE, APMS, National Diabetes Audit, etc.) [30]. Comparisons with studies reliant on self-reported health status (e.g., HSE) are subject to response bias which may have influenced their findings.

An important limitation is that the prevalences we report are lifetime prevalences, thus conditions that have resolved will still be captured in our results. This was done for comparison with the analysis periods of the comparator data sources. Many of the included conditions are likely to be lifelong conditions (e.g., type 1 diabetes or heart failure). However, others such as depression or anxiety may later resolve or follow a relapsing-remitting course, rather than having persisting symptoms. Therefore, the duration of the data collection period significantly affects the reported prevalence of these types of conditions.
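The effect of the look-back period on reported prevalence can be illustrated with a small sketch. The patients, dates, population size, and the index date of 1 January 2020 are all hypothetical values chosen for illustration; only the contrast between "any code ever" and "code within a recent window" reflects the limitation described above.

```python
from datetime import date


def prevalence_in_window(code_dates_by_patient, population_size, start, end):
    """Share of the population with at least one diagnostic code in [start, end]."""
    cases = sum(
        1
        for dates in code_dates_by_patient.values()
        if any(start <= d <= end for d in dates)
    )
    return cases / population_size


# Hypothetical depression codes for 4 patients in a population of 10.
code_dates = {
    "p1": [date(2003, 5, 1)],                      # single episode long ago
    "p2": [date(2010, 3, 12), date(2018, 9, 2)],   # relapsing course
    "p3": [date(2019, 6, 20)],                     # recent first episode
    "p4": [date(1998, 11, 5)],                     # single episode long ago
}
index_date = date(2020, 1, 1)  # ASSUMED index date for this illustration

# Lifetime prevalence: any code at any time before the index date.
lifetime = prevalence_in_window(code_dates, 10, date.min, index_date)

# Three-year period prevalence: only codes entered in the last three years.
recent = prevalence_in_window(code_dates, 10, date(2017, 1, 1), index_date)
```

Here the lifetime definition counts twice as many cases as the three-year window, because patients whose only episode was long ago remain counted. For lifelong conditions the two definitions would give much closer results.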

For pragmatic reasons, only age (and sex in the case of aortic aneurysms) was used to stratify CPRD Aurum data to make comparisons with prevalence estimates from the literature. Where disease prevalence has changed over time, especially given the ageing population, there can be far less certainty in the comparisons with prevalence estimates from less recent studies in the literature. Caution should be taken in analysis of prevalence of conditions by ethnicity, given that these categories aggregate together very diverse communities and ranges of cultural practices and countries of ethnic origin. Where researchers wish to examine specific conditions or sub-populations in more depth or wish to understand prevalence within a specific sub-population these factors may need to be explored in greater detail.

Implications for policy and practice

Primary care EHR data are a reliable source for clinically diagnosed cases of most cardio-renal-metabolic (CRM) conditions and for depression, bipolar disorder, and schizophrenia. Caution should be taken in interpreting analyses of anxiety disorders using primary care EHR data as the prevalence may be over-reported, whereas cases of PTSD and eating disorders may be under-reported. Policymakers should explore whether incentivising accurate coding for more MH conditions, for example through QOF (in the UK), could improve reporting quality [18]. Policymakers may also wish to consider how both public awareness and primary care and mental health services can be configured to improve case-detection of these more neglected MH conditions, especially in men and children and those of black or Asian ethnicity. Healthcare providers should be encouraged to adopt culturally sensitive practices to ensure that minority populations receive adequate mental health care and support.

We found that almost 40% of patients who were on anti-hypertensive medications or whose latest recorded blood pressure was greater than 140/90 mmHg did not have a clinical code for hypertension in their EHR. This is in keeping with other studies that have demonstrated a significant burden of hypertension that is not well documented or acted on in primary care despite financial incentivisation [65]. Practices should consider implementing more robust follow-up systems once hypertension is initially detected. Policymakers should be aware that longer consultation times and higher GP to patient ratios are associated with better hypertension case-detection and management, especially in more deprived areas [65, 66].
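A minimal sketch of how such an "uncoded" proportion might be computed is given below. The record structure, field names, and toy cohort are hypothetical, and the reading of "greater than 140/90 mmHg" as systolic above 140 or diastolic above 90 is an assumption; the paper does not specify how the two components were combined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    has_hypertension_code: bool
    on_antihypertensive: bool
    latest_systolic: Optional[int]   # mmHg, None if never recorded
    latest_diastolic: Optional[int]  # mmHg, None if never recorded


def meets_criteria(p: Patient) -> bool:
    """On anti-hypertensive medication, or latest BP above 140/90 mmHg.

    ASSUMPTION: 'above 140/90' means systolic > 140 OR diastolic > 90.
    """
    raised = (
        (p.latest_systolic is not None and p.latest_systolic > 140)
        or (p.latest_diastolic is not None and p.latest_diastolic > 90)
    )
    return p.on_antihypertensive or raised


def uncoded_fraction(patients) -> float:
    """Among patients meeting the criteria, the share with no hypertension code."""
    eligible = [p for p in patients if meets_criteria(p)]
    if not eligible:
        return 0.0
    return sum(not p.has_hypertension_code for p in eligible) / len(eligible)


# Toy cohort (hypothetical values only).
cohort = [
    Patient(True, True, 150, 95),     # treated, raised BP, coded
    Patient(False, True, 130, 80),    # treated, no code
    Patient(False, False, 145, 85),   # raised systolic, no code
    Patient(True, False, 128, 78),    # coded but does not meet the criteria
    Patient(True, True, None, None),  # treated, no BP recorded, coded
    Patient(False, False, 120, 70),   # does not meet the criteria
    Patient(True, False, 160, 100),   # raised BP, coded
]

fraction = uncoded_fraction(cohort)  # 2 uncoded of 5 eligible patients
```

Note that anti-hypertensive drugs are also prescribed for other indications, so a rule of this kind trades specificity for sensitivity.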

Implications for future research

The variation in prevalence of conditions by sociodemographic characteristics, especially sex and ethnicity, warrants further exploration to understand the relative contribution of genetic, lifestyle, socio-cultural, and healthcare-related factors to these disparities. This requires both longitudinal analyses, stratified by these demographic subgroups, to understand how these factors mediate risk of CRM and MH conditions, and qualitative research exploring barriers to accurate case-detection at both a patient and practice level (e.g., staffing ratios, funding, and continuity of care).

For future research using EHRs, additional algorithms may be used to adjust sensitivity or specificity of a codelist for case-detection of these conditions, depending on the purpose of the analysis. These might include use of prescription data, or codes for symptoms or referrals (instead of diagnoses), and use of clinical biomarkers such as blood test results. Future research could also explore the prevalence and demographic variation of other common chronic conditions in this database, such as cancer, respiratory conditions, and autoimmune diseases.
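As a rough sketch of how combining data sources shifts the balance between sensitivity and specificity, the example below compares three case definitions against a reference standard. The records, the reference standard, and the resulting figures are entirely hypothetical; they show the direction of the trade-off, not its size in CPRD Aurum.

```python
def sensitivity_specificity(predicted, truth):
    """Return (sensitivity, specificity) of predicted cases vs. a reference standard."""
    tp = sum(p and t for p, t in zip(predicted, truth))
    fn = sum((not p) and t for p, t in zip(predicted, truth))
    tn = sum((not p) and (not t) for p, t in zip(predicted, truth))
    fp = sum(p and (not t) for p, t in zip(predicted, truth))
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical records: (has_diagnosis_code, has_prescription, truly_has_condition)
records = [
    (True, True, True),
    (True, False, True),
    (False, True, True),
    (False, False, True),
    (False, True, False),
    (True, False, False),
    (False, False, False),
    (False, False, False),
]
truth = [r[2] for r in records]

# Three candidate case definitions.
code_only = sensitivity_specificity([r[0] for r in records], truth)
code_or_rx = sensitivity_specificity([r[0] or r[1] for r in records], truth)
code_and_rx = sensitivity_specificity([r[0] and r[1] for r in records], truth)
```

Requiring either source (OR) raises sensitivity at the cost of specificity, whereas requiring both (AND) does the reverse. Which definition is preferable depends on whether the analysis is more harmed by missed cases or by false positives.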


Most clinically diagnosed conditions appeared to be well represented in primary care records. However, we found important variations in prevalence by demographic characteristics, which may reflect true variation in prevalence or systematic differences in the likelihood both of presenting to healthcare professionals and of being diagnosed with these conditions. Primary care data may under-represent the prevalence of undiagnosed conditions, particularly in mental health.

Data Availability

Access to anonymized patient data from CPRD Aurum is subject to a data sharing agreement containing detailed terms and conditions of use following protocol approval from the MHRA Independent Scientific Advisory Committee (CPRD Study ID 22_001903). This study-specific analysable dataset is therefore not publicly available but can be requested from the corresponding author, subject to research data governance approvals. Details about Independent Scientific Advisory Committee applications and data costs are available on the CPRD website. All methods were carried out in accordance with relevant guidelines and regulations. The codelists used for the disease definitions and the analysis are publicly available.




Abbreviations

AF: atrial fibrillation
APMS: Adult Psychiatric Morbidity Survey
BP: blood pressure
BPAD: bipolar affective disorder
CKD: chronic kidney disease
CPRD: Clinical Practice Research Datalink
CT: computed tomography
CRM: cardiovascular, renal, and metabolic conditions
DExtER: Data Extraction for Epidemiological Research
eGFR: estimated glomerular filtration rate
EHR: electronic health records
GP: general practitioner
HbA1c: glycosylated haemoglobin
HSE: Health Survey for England
IHD: ischaemic heart disease
IMD: index of multiple deprivation
IMRD: IQVIA Medical Research Database
IQR: interquartile range
MH: mental health conditions
PTSD: post traumatic stress disorder
PVD: peripheral vascular disease
QOF: Quality and Outcomes Framework
THIN: The Health Improvement Network
T1DM: type 1 diabetes mellitus
T2DM: type 2 diabetes mellitus
UK: United Kingdom
USA: United States of America


References

  1. Vigo D, Thornicroft G, Atun R. Estimating the true global burden of mental illness. Lancet Psychiatry. 2016;3:171–8.

  2. Das P, Naylor C, Majeed A. Bringing together physical and mental health within primary care: a new frontier for integrated care. J R Soc Med. 2016;109:364–6.

  3. GBD 2019 Ageing Collaborators. Global, regional, and national burden of diseases and injuries for adults 70 years and older: systematic analysis for the Global Burden of Disease 2019 Study. BMJ. 2022;376:e068208.

  4. Naylor C. Bringing Together Physical and Mental Health: A New Frontier for Integrated Care. 2016.

  5. Naylor C, Parsonage M, McDaid D, Knapp M, Fossey M, Galea A. Long-term conditions and mental health: the cost of co-morbidities. London: The King’s Fund and Centre for Mental Health; 2012. Available at:

  6. Ho IS-S, Azcoaga-Lorenzo A, Akbari A, Black C, Davies J, Hodgins P, et al. Examining variation in the measurement of multimorbidity in research: a systematic review of 566 studies. Lancet Public Health. 2021;6:e587–97.

  7. Wolf A, Dedman D, Campbell J, Booth H, Lunn D, Chapman J, et al. Data resource profile: Clinical Practice Research Datalink (CPRD) Aurum. Int J Epidemiol. 2019;48:1740–1740g.

  8. Violán C, Foguet-Boreu Q, Hermosilla-Pérez E, Valderas JM, Bolíbar B, Fàbregas-Escurriola M, et al. Comparison of the information provided by electronic health records data and a population health survey to estimate prevalence of selected health conditions and multimorbidity. BMC Public Health. 2013;13:251.

  9. Sulieman L, Cronin RM, Carroll RJ, Natarajan K, Marginean K, Mapes B, Roden D, Harris P, Ramirez A. Comparing medical history data derived from electronic health records and survey answers in the All of Us Research Program. J Am Med Inform Assoc. 2022;29:1131–41.

  10. Okura Y, Urban LH, Mahoney DW, Jacobsen SJ, Rodeheffer RJ. Agreement between self-report questionnaires and medical record data was substantial for diabetes, hypertension, myocardial infarction and stroke but not for heart failure. J Clin Epidemiol. 2004;57:1096–103.

  11. Moynihan R, Doust J, Henry D. Preventing overdiagnosis: how to stop harming the healthy. BMJ. 2012;344:e3502.

  12. Salaheddin K, Mason B. Identifying barriers to mental health help-seeking among young adults in the UK: a cross-sectional survey. Br J Gen Pract. 2016;66:e686–92.

  13. McManus S, University of Leicester. Department of Health Sciences, National Centre for Social Research (Great Britain), Bebbington P, Jenkins R, Brugha T. Mental Health and Wellbeing in England: Adult Psychiatric Morbidity Survey 2014: a Survey carried out for NHS Digital by NatCen Social Research and the Department of Health Sciences, University of Leicester; 2016.

  14. Arhi CS, Bottle A, Burns EM, Clarke JM, Aylin P, Ziprin P, Darzi A. Comparison of cancer diagnosis recording between the Clinical Practice Research Datalink, Cancer Registry and Hospital Episodes Statistics. Cancer Epidemiol. 2018;57:148–57.

  15. Gokhale KM, Chandan JS, Toulis K, Gkoutos G, Tino P, Nirantharakumar K. Data extraction for epidemiological research (DExtER): a novel tool for automated clinical epidemiology studies. Eur J Epidemiol. 2021;36:165–78.

  16. Herrett E, Gallagher AM, Bhaskaran K, et al. Data resource profile: Clinical Practice Research Datalink (CPRD). Int J Epidemiol. 2015;44:827–36.

  17. Ho ISS, Azcoaga-Lorenzo A, Akbari A, Davies J, Khunti K, Kadam UT, et al. Measuring multimorbidity in research: Delphi consensus study. BMJ Med. 2022;1.

  18. Quality and Outcomes Framework (QOF), enhanced services and core contract extraction specifications (business rules). In: NHS Digital. [cited 15 Jul 2022]. Available:

  19. SAIL Databank-Swansea University. HDRUK Phenotype Library. [cited 20 Jul 2022]. Available:

  20. Bennett Institute for Applied Data Science, University of Oxford. OpenSafely Codelists. In: OpenCodelists. [cited 20 Jul 2022]. Available:

  21. CPRD @ Cambridge - Codes Lists (GOLD). In: Primary Care Unit. Primary Care Unit, University of Cambridge; 30 Sep 2018 [cited 20 Jul 2022]. Available:

  22. IHTSDO, NHSDigital. SNOMED CT Browser. [cited 22 Jul 2022]. Available:

  23. NHS Digital. Health Survey for England 2019. 2019. Available:

  24. Office for Health Improvement and Disparities (OHID). Public health profiles - OHID. [cited 15 Jul 2022]. Available:

  25. NHS Digital. NHS Digital Data. [cited 15 Jul 2022]. Available:

  26. The National Institute for Health and Care Excellence. Clinical Knowledge Summaries. [cited 15 Jul 2022]. Available:

  27. Clopper CJ, Pearson ES. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika. 1934;26:404–13.

  28. Cea-Soriano L, Fowkes FGR, Johansson S, Allum AM, García Rodriguez LA. Time trends in peripheral artery disease incidence, prevalence and secondary preventive therapy: a cohort study in The Health Improvement Network in the UK. BMJ Open. 2018;8:e018184.

  29. Scholes S, Mindell JS. Health Survey for England 2017 Cardiovascular diseases. 2017. Available:

  30. Dankner R. Review for completion of annual diabetes care processes and mortality: a cohort study using the National Diabetes Audit for England and Wales. 2021.

  31. Centers for Disease Control and Prevention, National Center for Health Statistics. National Health and Nutrition Examination Survey (NHANES) - National Cardiovascular Disease Surveillance System. 2016. Available:

  32. Ng VWS, Man KKC, Gao L, Chan EW, Lee EHM, Hayes JF, et al. Bipolar disorder prevalence and psychotropic medication utilisation in Hong Kong and the United Kingdom. Pharmacoepidemiol Drug Saf. 2021;30:1588–600.

  33. Wood S, Marchant A, Allsopp M, Wilkinson K, Bethel J, Jones H, et al. Epidemiology of eating disorders in primary care in children and young people: a Clinical Practice Research Datalink study in England. BMJ Open. 2019;9:e026691.

  34. Marcheselli F, Light R. Health Survey for England 2019: Eating Disorders. NHS Digital; 2020. Available:

  35. Rassen JA, Bartels DB, Schneeweiss S, Patrick AR, Murk W. Measuring prevalence and incidence of chronic conditions in claims and electronic health record databases. Clin Epidemiol. 2019;11:1–15.

  36. Martín-Merino E, Ruigómez A, Wallander M-A, Johansson S, García-Rodríguez LA. Prevalence, incidence, morbidity and treatment patterns in a cohort of patients diagnosed with anxiety in UK primary care. Fam Pract. 2010;27:9–16.

  37. Bridges S. Health Survey for England, 2014. NHS Digital; 2015.

  38. Hounkpatin HO, Harris S, Fraser SDS, Day J, Mindell JS, Taal MW, et al. Prevalence of chronic kidney disease in adults in England: comparison of nationally representative cross-sectional surveys from 2003 to 2016. BMJ Open. 2020;10:e038423.

  39. Klarin D, Lynch J, Aragam K, Chaffin M, Assimes TL, Huang J, et al. Genome-wide association study of peripheral artery disease in the Million Veteran Program. Nat Med. 2019;25:1274–9.

  40. Walford H, Ramsay L, Soljak M, Gordon F, Birger R. Coronary Heart Disease Prevalence Modelling Briefing Document. Eastern Region Public Health Observatory (ERPHO); 2011 Dec. Available:$FILE/CHDPrevalenceModellingBriefingDocument2.pdf.

  41. Hobbs FDR, Fitzmaurice DA, Mant J, Murray E, Jowett S, Bryan S, et al. A randomised controlled trial and cost-effectiveness study of systematic screening (targeted and total population screening) versus routine practice for the detection of atrial fibrillation in people aged 65 and over. The SAFE study. Health Technol Assess. 2005;9:iii–iv, ix–x, 1–74.

  42. AAA screening standards: data report 1 April 2018 to 31 March 2019. In: GOV.UK. [cited 18 Jul 2022]. Available:

  43. Galasko GIW, Senior R, Lahiri A. Ethnic differences in the prevalence and aetiology of left ventricular systolic dysfunction in the community: the Harrow heart failure watch. Heart. 2005;91:595–600.

  44. Nkomo VT, Gardin JM, Skelton TN, Gottdiener JS, Scott CG, Enriquez-Sarano M. Burden of valvular heart diseases: a population-based study. Lancet. 2006;368:1005–11.

  45. Ruscio AM, Hallion LS, Lim CCW, Aguilar-Gaxiola S, Al-Hamzawi A, Alonso J, et al. Cross-sectional comparison of the epidemiology of DSM-5 generalized anxiety disorder across the globe. JAMA Psychiatry. 2017;74:465–75.

  46. Jacomelli J, Summers L, Stevenson A, Lees T, Earnshaw JJ. Editor’s Choice - Inequalities in Abdominal Aortic Aneurysm Screening in England: Effects of Social Deprivation and Ethnicity. Eur J Vasc Endovasc Surg. 2017;53:837–43.

  47. Bennett PC, Silverman S, Gill PS, Lip GYH. Ethnicity and peripheral artery disease. QJM. 2009;102:3–16.

  48. Heckbert SR, Austin TR, Jensen PN, Chen LY, Post WS, Floyd JS, et al. Differences by Race/Ethnicity in the prevalence of clinically detected and monitor-detected Atrial Fibrillation: MESA. Circ Arrhythm Electrophysiol. 2020;13:e007698.

  49. Walker ER, McGee RE, Druss BG. Mortality in mental disorders and global disease burden implications: a systematic review and meta-analysis. JAMA Psychiatry. 2015;72:334–41.

  50. Arcelus J, Mitchell AJ, Wales J, Nielsen S. Mortality rates in patients with anorexia nervosa and other eating disorders. A meta-analysis of 36 studies. Arch Gen Psychiatry. 2011;68:724–31.

  51. Newlove-Delgado T, Williams T, Robertson K, McManus S, Sadler K, Vizard T, et al. Mental Health of Children and Young People in England, 2021. NHS Digital; 2021 Sep. Available:

  52. Sayal K, Taylor E. Detection of child mental health disorders by general practitioners. Br J Gen Pract. 2004;54:348–52.

  53. Hinrichs S, Owens M, Dunn V, Goodyer I. General practitioner experience and perception of Child and Adolescent Mental Health Services (CAMHS) care pathways: a multimethod research study. BMJ Open. 2012;2.

  54. O’Brien D, Harvey K, Young B, Reardon T, Creswell C. GPs’ experiences of children with anxiety disorders in primary care: a qualitative study. Br J Gen Pract. 2017;67:e888–98.

  55. Adderley N, Nirantharakumar K, Marshall T. Temporal variation in the diagnosis of resolved atrial fibrillation and the influence of performance targets on clinical coding: cohort study. BMJ Open. 2019;9:e030454.

  56. Thombs BD, Kwakkenbos L, Levis AW, Benedetti A. Addressing overestimation of the prevalence of depression based on self-report screening questionnaires. CMAJ. 2018;190:E44–9.

  57. John A, McGregor J, Fone D, Dunstan F, Cornish R, Lyons RA, et al. Case-finding for common mental disorders of anxiety and depression in primary care: an external validation of routinely collected data. BMC Med Inform Decis Mak. 2016;16:35.

  58. Bignall T, Jeraj S, Helsby E, Butt J. Racial disparities in mental health: Literature and evidence review. Race Equality Foundation; 2019. Available:

  59. Thompson AE, Anisimowicz Y, Miedema B, Hogg W, Wodchis WP, Aubrey-Bassler K. The influence of gender and other patient characteristics on health care-seeking behaviour: a QUALICOPC study. BMC Fam Pract. 2016;17:38.

  60. Ramagopalan S, Leahy TP, Stamp E, Sammon C. Approaches for the identification of chronic kidney disease in CPRD-HES-linked studies. J Comp Eff Res. 2020;9:441–6.

  61. Barnett K, Mercer SW, Norbury M, Watt G, Wyke S, Guthrie B. Epidemiology of multimorbidity and implications for health care, research, and medical education: a cross-sectional study. Lancet. 2012;380:37–43.

  62. Huryk KM, Drury CR, Loeb KL. Diseases of affluence? A systematic review of the literature on socioeconomic diversity in eating disorders. Eat Behav. 2021;43:101548.

  63. Aira M, Kauhanen J, Larivaara P, Rautio P. Factors influencing inquiry about patients’ alcohol consumption by primary health care physicians: qualitative semi-structured interview study. Fam Pract. 2003;20:270–5.

  64. Davies-Kershaw H, Petersen I, Nazareth I, Stevenson F. Factors influencing recording of drug misuse in primary care: a qualitative study of GPs in England. Br J Gen Pract. 2018;68:e234–44.

  65. Baker R, Wilson A, Nockels K, et al. Levels of detection of hypertension in primary medical care and interventions to improve detection: a systematic review of the evidence since 2000. BMJ Open. 2018;8:e019965.

  66. Bankart MJ, Anwar MS, Walker N, et al. Are there enough GPs in England to detect hypertension and maintain access? A cross-sectional study. Br J Gen Pract. 2013;63:339–44.


Acknowledgements

This work uses data provided by patients and collected by the NHS as part of their care and support.

Funding

This study was undertaken as part of a National Institute for Health Research (NIHR) Artificial Intelligence for Multiple Long-Term Conditions (AIM) funded project: OPTIMising therapies, disease trajectories, and AI assisted clinical management for patients Living with complex multimorbidity (OPTIMAL study), Award ID: NIHR202632. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Contributions

JC, KN, SH and TM conceived and directed the study. JC directed the clinical coding team (including NG) in creating the clinical codelists. KG and AA managed the data, JC conducted the data analyses. TJ, JC, and SH gave clinical advice. JC wrote the manuscript, with input from FC, SH, AA, KN and CM. All authors provided critical feedback on the manuscript and approved the final version.

Corresponding author

Correspondence to Krishnarajah Nirantharakumar.

Ethics declarations

Ethics approval and consent to participate

This study has scientific approval from CPRD (CPRD Study ID 22_001903). There is overall ethical approval from the NHS Health Research Authority for collection of anonymised data in CPRD Aurum. GP practices consent to share patient data with CPRD for research purposes and individual patients can opt out of such data-sharing. The requirement for informed consent was waived by the NHS Health Research Authority (East Midlands - Derby Research Ethics Committee, reference 21/EM/0265) because individual patients cannot be identified from the database. Further details about CPRD’s process to safeguard patient data are available from CPRD. All methods were carried out in accordance with relevant guidelines and regulations.

Consent for publication

For EHR data: Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Cooper, J., Nirantharakumar, K., Crowe, F. et al. Prevalence and demographic variation of cardiovascular, renal, metabolic, and mental health conditions in 12 million english primary care records. BMC Med Inform Decis Mak 23, 220 (2023).
