Skip to main content


Case-finding for common mental disorders of anxiety and depression in primary care: an external validation of routinely collected data



The robustness of epidemiological research using routinely collected primary care electronic data to support policy and practice for common mental disorders (CMD) anxiety and depression would be greatly enhanced by appropriate validation of diagnostic codes and algorithms for data extraction. We aimed to create a robust research platform for CMD using population-based, routinely collected primary care electronic data.


We developed a set of Read code lists (diagnosis, symptoms, treatments) for the identification of anxiety and depression in the General Practice Database (GPD) within the Secure Anonymised Information Linkage Databank at Swansea University, and assessed 12 algorithms for Read codes to define cases according to various criteria. Annual incidence rates were calculated per 1000 person years at risk (PYAR) to assess recording practice for these CMD between January 1st 2000 and December 31st 2009. We anonymously linked the 2799 MHI-5 Caerphilly Health and Social Needs Survey (CHSNS) respondents aged 18 to 74 years to their routinely collected GP data in SAIL. We estimated the sensitivity, specificity and positive predictive value of the various algorithms using the MHI-5 as the gold standard.


The incidence of combined depression/anxiety diagnoses remained stable over the ten-year period in a population of over 500,000 but symptoms increased from 6.5 to 20.7 per 1000 PYAR. A ‘historical’ GP diagnosis for depression/anxiety currently treated plus a current diagnosis (treated or untreated) resulted in a specificity of 0.96, sensitivity 0.29 and PPV 0.76. Adding current symptom codes improved sensitivity (0.32) with a marginal effect on specificity (0.95) and PPV (0.74).


We have developed an algorithm with a high specificity and PPV of detecting cases of anxiety and depression from routine GP data that incorporates symptom codes to reflect GP coding behaviour. We have demonstrated that using diagnosis and current treatment alone to identify cases for depression and anxiety using routinely collected primary care data will miss a number of true cases given changes in GP recording behaviour. The Read code lists plus the developed algorithms will be applicable to other routinely collected primary care datasets, creating a platform for future e-cohort research into these conditions.


Common mental disorders (CMD) are an important public health problem comprising depression, anxiety, panic and somatisation and usually presenting as mixed syndromes with mixed symptoms. They are significant contributors to impaired health, well-being and health services utilisation, direct and indirect costs to the economy and overall mortality [14] CMD have a combined community prevalence of between 15 % and 30 %, depending on the population and case definition used [5, 6].

The community prevalence of CMD is significantly greater than in primary are because only around one-third of affected people seek help in primary care [7]. Among attendees, depression and anxiety (the most common CMDs) often go unrecognised [711]. Ten years ago, of those recognised, fewer than one-third received treatment [12]. Up to three-quarters may have visited their physician in the previous year with seemingly unrelated complaints [1316]. In primary care settings, decisions about who should receive treatment for recognised CMD seem to be made on a patient-by-patient basis, influenced by the severity of symptoms demonstrated [17]. In the UK over the last decade, GPs have become more likely to record individual symptoms rather than specific CMD diagnoses [18, 19].

Routinely collected electronic health records have the potential to support policy and enhance the practice of health and social care. The validity and reliability of research using routinely collected primary care electronic data depends upon its quality and completeness. Overall, the validity of primary care diagnoses in the UK tends to be high, although the quality of reporting and detailed descriptions of validation methods are often variable and inadequate [20].

Any method clarifying the most robust way to identify a case of anxiety and depression from routinely collected primary care data would support research in this area. Previously we identified suitable participants for a trial in depression using electronic general practice data by an internal validation methodology involving two independent clinicians [21]. In this study we assess our ability to identify cases of CMD in the primary care dataset by an external validation method using a survey administered mental health score.



We aimed to create a robust research platform for CMD (anxiety and depression) using population-based, routinely collected primary care electronic data.


We planned to create Read code lists and algorithms to identify primary care recorded CMD anxiety and depression in order to assess any changes in diagnostic recording for CMD anxiety and depression in the General Practice Dataset (GPD) between January 1st 2000 and December 31st 2009 in those aged over 18 years. These algorithms for case-finding of primary care recorded anxiety and depression will then be compared to the five-item Mental Health Inventory (MHI-5), as a measure of mental health status.

Ethical approval

The ethical considerations of this project were covered by permissions granted to the Swansea Secure Anonymised Information Linkage (SAIL) Databank and the Caerphilly Health and Social Needs Survey (CHSNS) [22]. Additionally we gained approval from the Swansea University Information Governance Review Panel, an independent body consisting of a range of government, regulatory and professional agencies, including the NHS Research Ethics Service and members of the public, which grants approval to studies conducted within the SAIL Databank.

Data source

Caerphilly health & social needs survey

The CHSNS is a community study of health inequality set in Caerphilly county borough, Wales [2224]. The current dataset comprises a two-wave cross-sectional postal questionnaire survey. The baseline survey sample (wave 1) was drawn from a total population of 132,613 residents aged 18 years and over recorded in the then current local General Medical Practice administrative register on May 31st 2001. This produced a representative dataset on 10,892 residents of the borough, aged 18 to 74 years. The response rate for wave 1 was 63.0 %. In 2008, a follow-up postal questionnaire survey was carried out (wave 2) of surviving baseline respondents, then aged 25 to 81 years, with a response rate of 50.2 % [23, 24].

Measure of mental health

Both waves of the survey included responses to SF-36, version 2, health status questionnaire. The measure of mental health in this study was the five-item MHI-5, a sub-scale of the SF-36. Its validity and reliability, as a measure of mental health status, are well established [25, 26] and reflect the continuously distributed, single dimensional nature of CMD symptoms in the population [27, 28]. We restricted this analysis to those aged under 75 years at baseline, the MHI-5 being less reliable in UK elderly populations [29, 30]. The assessment of CMD using the MHI-5 scale has been performed previously and is well established [22, 3134]. It performs at least as well as the General Health Questionnaire [22, 3136]. The response scores were transformed using the scale developers’ published method, with imputation of missing data, to a scale of range 0 to 100, with lower scores indicating poorer mental health [37].

Our previous work [38] has demonstrated a range of methods for producing a cut-point on the MHI-5 thus overcoming earlier limitations [31, 39]. We state that investigators should consider carefully the cut-point most suitable for their study based primarily on the intended application of the resulting cut-point. We chose a cut-point of ≤60 on the MHI-5 to minimise misclassification rate since we were interested in the within-borough comparisons between MHI-5 scores and GP diagnoses.

SAIL Databank

The SAIL databank is the national data repository of anonymised, person-based, linkable data in Wales covering a population of 3 million [40, 41]. Robust policies, structures and controls are in place to protect privacy through a reliable matching, anonymisation and encryption process achieved in conjunction with the NHS Wales Informatics Service (NWIS) using a split file approach as detailed by Ford et al. (2009) and Lyons et al., (2009). The split file approach involves separation of identifiers from clinical content, identity matching and creation of pseudonymised linkage keys (Anonymised Linking Fields) by NWIS prior to reassembling and further encryption of data sets using different algorithms within the University (Fig. 1).

Fig. 1

Split file approach to linking survey data to primary care data

The primary care dataset in SAIL (GPD) contains Read codes for each registered individual in a SAIL supplying practice. Read codes are a hierarchical nomenclature used to record clinical summary information. Primary care physicians enter medical diagnoses and symptoms using Read codes. The GPD does not contain any accompanying free text on referral or discharge to or from secondary/tertiary care. It is regularly updated.

E-cohort creation

The survey dataset was imported into the SAIL databank and an electronic cohort created by record-linking the baseline survey data demographics to the primary care dataset using the Welsh Demographic Service (WDS) dataset in NWIS. WDS contains the unique NHS number for all individuals who register with the free to use general medical practitioner service (Fig. 1).

Primary care case selection

Read code lists, corresponding to ICD-10 Chapter V [42] diagnoses of anxiety and depression, were developed by clinical members of the study team with reference to Rait et al. [18] and the Quality Outcomes Framework [43] (Additional file 1: Table S1). These included GP recording of i) anxiety diagnoses e.g. generalized anxiety disorder; ii) anxiety symptoms e.g. anxiousness; iii) mixed anxiety and depression; iv) panic attacks and panic disorders; v) depression diagnoses; vi) depression symptoms. We excluded codes for other psychosis, phobias, obsessive compulsive disorders, post traumatic stress disorder, behavioural disorders, hyperkinetic disorders, conduct disorders and disorders of social functioning in keeping with other studies of this type [18, 19]. We excluded adjustment disorders as conceptually they are an intermediate health condition between normal responses to stress and more severe emotional disorders such as anxiety and depression [44]. We also compiled a Read code list of British National Formulary listed antidepressants, anxiolytics and hypnotics [45] (Additional file 2: Table S2).

We queried the GPD using db2 structured query language, implementing Read Codes Version 2 (5-byte set). We used, devised algorithms and evaluated multiple methods to define a case of anxiety and depression (12 in total as listed in Table 2) incorporating, in various combinations, current and historical diagnoses, symptoms and treatments. Our definition of ‘current’ was a search for relevant Read codes over a one-year period with the date of the survey response at the midpoint. This was in order to capture those who may present to their GP with CMD but not be diagnosed for a period of time and also those who may delay seeing their GPs for a period of time. Our definition of ‘historical’ was a search for relevant Read codes through the retrospective longitudinal data housed in the GPD outside the ‘current’ period. The length of retrospective data varied between individuals depending on the length of their registration with a SAIL supplying practice and was longer for wave 2 respondents. Treatment was at least one prescription for an antidepressant, anxiolytic or hypnotic in the one-year current period.

Statistical analysis

The data were analysed using SPSS Statistics software (Version 20). We calculated a ten-year period prevalence of anxiety and depression diagnosis in the GPD population between January 1st 2000 and December 31st 2009 in those aged over 18 years using the study Read code lists. We calculated annual incidence rates of recorded anxiety and depression, based on diagnoses, symptoms and diagnoses and symptoms combined, between 2000 and 2010 to assess changes in diagnostic recording practice for the inclusion of certain codes in our study. A new episode was defined as an entry in the records with no previous entry of that problem recorded in the previous year. We therefore reviewed data from January 1st 1999.

Annual incidence rates were calculated per 1000 person years at risk (PYAR). We calculated person-time at risk using the start of each year (1st January) or start of registration (plus 6 months) whichever was the later for each of the years. The end date was the earliest of the following: date of leaving a SAIL supplying practice, date of death or the end of the target year (31st December). We excluded the first six months of data following registration. This allowed for retrospective recording to reduce the chance of prevalent cases being recorded as incident. It was a requirement that each individual contributed at least one year’s follow-up data.

We then compared the ‘primary care cases’, for each of the 12 definitions, to the ‘survey cases’ separately for both waves of the survey. Sensitivity, specificity and positive predictive values (PPV) for the 12 defined algorithms were calculated, using a cut-point of ≤60 for the MHI-5 (as gold standard) [38]. Sensitivity measures the proportion of cases and specificity the proportion of non-cases, identified in the survey data, correctly identified as such in primary care. The PPV is the probability that the person identified by the algorithm is a survey case and is expressed as a proportion.

We explored the reasons for algorithm 9, based on diagnosis and treatment codes only, identifying false negatives and false positive cases (including the role of symptom codes in this) in the Wave 2 sample since these are the codes used in current literature.


Identification of READ codes corresponding to anxiety and depression diagnosis in the SAIL databank GPD

Five hundred twenty two thousand and five hundred seventy eight patients were registered continuously with a GP within the SAIL databank from 1st January 2000 to 31st December 2009. The ten-year period prevalence of GP recorded diagnoses of anxiety and depression during this time was 16.2 % (n = 84,835). Where individuals had more than one type of diagnosis, the ten-year period prevalence for depressive disorders was 9.3 % (n = 48,382), mixed anxiety and depression 4.8 % (n = 25,256), and anxiety disorders 6.4 % (n = 33,284). If depression and anxiety symptom codes were included, the ten-year prevalence was 21.4 % (n = 111,768).

GPs’ electronic recording of depression and anxiety

Table 1 and Fig. 2 show trends in the incidence of depression and anxiety between 1st January 2000 and 31st December 2009. The incidence of recorded depression and anxiety diagnoses (combined) remained stable over the ten-year period but the incidence of recorded symptoms increased substantially from 6.5 to 20.7 per 1000 PYAR. The combined incidence of diagnoses and symptoms of depression and anxiety increased from 24.4 to 34.9 1000 PYAR.

Table 1 Incidence of GP recorded depression/anxiety broken down by diagnoses and symptoms in the SAIL databank 2000–2010 per 1000 person-years at risk (PYAR)
Fig. 2

Trends in the incidence of GP recorded anxiety and depression (2000–2010)

Study population

There were 691,762 individuals aged 18–74 (38.5 % of relevant population in Wales) registered to a SAIL supplying practice continuously between 1st May 2001 and 30th April 2002 (wave 1 period) and 805,929 (44.8 % of relevant population in Wales) individuals aged 18–74 registered continuously to a SAIL supplying practice between 1st May 2008 and 30th April 2009 (wave 2 period).

The CHSNS dataset included 10,653 MHI-5 scores from wave 1 and 4,426 MHI-5 scores from wave 2. We matched 2,584 of 10,653 (24.3 %) wave 1 survey responders and 1,195 of 4,426 (27.0 %) wave 2 responders, corresponding to 2799 individuals with MHI-5 scores, to their primary care data. This was on the basis of continuous registration with a SAIL supplying practice for the 12-month period at the time of the survey response. Among these, 824 (32 %) had scores of ≤60 on the MHI-5 at Wave 1 and 395 (33 %) at Wave 2.

There were 84 (3.3 %) CHSNS MHI-5 responders in wave 1 and 54 (4.5 %) in wave 2 linked in SAIL with a current diagnosis or symptom of anxiety and depression in the GPD. There were 198 (7.7 %) responders in wave 1 and 188 (15.7 %) in wave 2 linked in SAIL with a historical diagnosis or symptom of anxiety or depression currently treated in GPD.

Validation results

Table 2 shows the sensitivity, specificity and PPV for the comparison of GP Read codes for anxiety and depression diagnoses, symptoms and treatment with MHI-5 data from the CHSNS waves using a cut-point of ≤60. As expected, all the algorithms underestimated the prevalence of CMD compared to the MHI-5, particularly those based only on current diagnosis or symptoms. The addition of historical diagnoses with current treatment contributed most to increasing the sensitivity.

Table 2 Sensitivity, specificity, positive predictive value-results comparing GP Read codes for anxiety and depression diagnoses, symptoms and treatment with MHI-5 data (as the gold standard) from the CHSNS waves

False positives

Based on algorithm 9 (historical diagnosis currently treated plus current diagnosis treated or untreated) using wave 2 data we incorrectly identified 36 of 800 patients (4.5 %) as a case of CMD in SAIL, all of whom had at least one historical code for a diagnosis of depression and/or anxiety (Table 3). Of these, 35 (97.2 %) were being currently treated.

Table 3 False positives and false negatives for Algorithm 9

False negatives

A total of 280 of 395 (70.9 %) survey cases were not identified using this algorithm (Table 3). Of these, 76 (27.1 %) were currently being treated but had no diagnostic Read codes for anxiety or depression recorded in their GP records. However, 25 (8.9 %) individuals did have current symptom codes, and 32 (11.4 %) had historical diagnosis codes (17 of whom had a historical symptom code), but were not being currently treated and nine (3.2 %) were registered with a practice but had no record of any attendances with their GP during that one-year period.

The majority of false negatives 200/280 (71 %) had a historical recording of pain and 139/280 (50 %) had one of the following chronic physical diseases–cardiovascular disease, diabetes, digestive disorder, kidney disorder, thyroid problem, chronic migraines, chronic obstructive pulmonary disease or one of the top seven cancers (as defined by World Health Organisation). Thirty-nine of the 280 (14 %) subjects had CMD other than depression and anxiety such as stress or other neurotic disorders. Alcohol dependency accounted for 6.8 % (19/280). Other mental health related disorders such as psychotic disorders and drug dependencies were present; however, they accounted for a very small number.


Main findings

We have created a set of algorithms to identify cases of CMD anxiety and depression from routinely collected GP data. We linked survey data containing a validated instrument (MHI-5) for CMD to the GP record and compared results with recorded anxiety and depression diagnoses, symptoms and treatment codes to assess the sensitivity, specificity and PPV of the group of codes and algorithms chosen.

We then calculated the ten-year period prevalence and incidence of anxiety and depression diagnosis in a large primary care population. The recorded incidence of combined anxiety and depression diagnoses in Wales has remained stable over time while the incidence of symptoms has increased. We therefore included symptom codes in our analysis.

We found that using algorithm 9 (a historical diagnosis currently treated plus a current diagnosis whether treated or untreated) at wave 2 resulted in a specificity of 0.96, sensitivity 0.29 and PPV 0.76. This method improves sensitivity considerably from using current diagnosis alone, as is done in many studies using routine data, whilst specificity remains high. It had the optimal combination of specificity and PPV.

All false positive cases had historical diagnoses and were currently treated. These may be cases that were benefiting from their treatment and were asymptomatic at the time of survey. False negatives, i.e. those identified in the survey as having anxiety and depression but not identified in primary care, were mainly those with historical diagnoses of pain (71 %) or who had chronic co-morbidities (50 %). Approximately 9 % of false negatives had current symptom codes and 14 % had CMD other than depression and anxiety recorded in primary care. The latter is a limitation of using the MHI-scale in its assessment of mental health status.

The MHI-5 scale asks questions relating to current status (previous four weeks) and therefore current symptoms will play an important role in questionnaire completion. Additionally there appears to be an increasing preference over time for recording symptoms over diagnoses in primary care [18, 19]. Adding current symptom codes to the algorithm improved sensitivity (0.32) with a marginal effect on specificity (0.95) and PPV (0.74). This algorithm (algorithm 10) produces a sample size for analysis that is over five times that where current diagnosis (algorithm 2) alone is used to identify cases (Table 2).

Strengths and limitations

The assessment of CMD using MHI-5 has been made previously [31, 36], although not through data linkage. However the gold standard for identifying cases would be to use the Clinical Interview Schedule-Revised [46]. A strength of this study is the analysis of large population level data giving large sample sizes for calculating the incidence of recorded anxiety and depression. The use of routinely available data minimises the costs of epidemiological research. The linkage of survey data to SAIL data allowed for the validation of Read codes and the use of these Read codes and algorithms in further epidemiological research [47].

Response bias is a potential issue in the CHSNS. The response rate to the first wave of the CHSNS is comparable to many population surveys, but the wave 2 response was lower at 50.2 %. Selection bias is a potential issue in using the GPD. The GPD population studied is large but covers 168 out of 474 practices in Wales and 38.5 % (wave 1 period)/44.8 % (wave 2 period) of the population aged 18–74 years. Those practices that are not currently signed up to SAIL may be in some way different to those that are. This may introduce bias in the estimation of prevalence. However, it is unlikely that the relationship between GP-recorded CMD and CMD measured using the MHI-5 would be different among practices in SAIL and not in SAIL. As such our estimates of the relative performance of the different algorithms are unlikely to have been affected. The sensitivity of our case definitions is higher in wave 2 than wave 1 since more historical data are available and also, possibly, because of the increase in the number of prescriptions issued for antidepressants, anxiolytics and hypnotics over the last decade [18].

A further strength of this study is the inclusion of symptom codes. A recent study [48] found that CMD and sub threshold psychiatric symptoms were both independently associated with new-onset functional disability and significant days lost from work. They suggest leaving symptoms unaccounted for in surveys may lead to gross underestimation of disability related to psychiatric morbidity.

Given the changing patterns of coding behaviour by GPs and the constantly evolving Read code system, assessment of prevalence based on these data alone would be flawed. Analysis of GP utilisation data extrapolates an estimate of community demand and need based upon existing utilisation experience. It says more about mental health resources, practice priorities, help-seeking behaviour and service capacity than about the real prevalence or need in the population. Because we have been able to externally validate these codes using the MHI-5, they can be used as a platform for epidemiological research into the CMD anxiety and depression and as an outcome measure using routinely available GP data for various studies including electronically linking participants in trials and traditional cohorts.

For epidemiological research, using large computerised databases of routinely collected medical records, robust findings rely on validation of methods for case ascertainment. The method of identifying a case of anxiety and depression from routinely collected primary care data, the trade-off between sensitivity and specificity and hence the choice of algorithm, will vary with study design. High specificity and therefore PPV, if necessary at the expense of low sensitivity, is more crucial for validity in creating primary care e-cohorts so that most cases identified have the disorder of interest. This is particularly important for case control studies, a common design in e-studies where identification of large numbers of controls is facilitated. Where the ratio of controls to cases is high, misclassification of cases as controls, i.e. a high false negative rate, may not bias results significantly. However, misclassifying controls as cases has the potential for bias. In such studies high PPVs and specificity are important. We have explored how different combinations of codes and algorithms affect these measures allowing for a more robust understanding when using routine data. For example using the algorithm based on current diagnosis alone is highly specific and comparable with current literature. However the sensitivity is low and the sample size in this study would be small.

Comparison with previous literature

The prevalence of the CMDs of anxiety and depression found in this study is in keeping with other studies [4, 6, 9]. Changes in primary care recording of anxiety and depression in Wales mirror those found in other large primary care datasets, [18, 19] and justify the inclusion of symptom codes in our analysis rather than reflecting a true decrease in the incidence of the diagnosis of anxiety and depression. Strategies adopted in the Quality and Outcomes framework for those with diagnosed depression may be having unintended consequences on coding patterns possibly resulting in a disincentive to record diagnoses [49].

Based on an MHI-5 cut point of ≤60, GPs diagnosed around 30 % of participants defined as a case of CMD by the MHI-5 score as having anxiety or depression (diagnosis or symptom). This is similar to findings in the Dutch population [39]. Several explanations exist why those with probable CMD do not seek healthcare or, if they do, why the general practitioner does not diagnose them. These include stigma, spontaneous resolution and patients presenting with physical symptoms/problems [47].

What this study adds

We have developed an algorithm with excellent specificity and high PPV for detecting CMD (anxiety and depression) with a trade off of low but expected sensitivity from routinely collected electronic primary care data for research purposes. The Read code (diagnosis, symptoms and treatment) lists plus the developed algorithms will be applicable to other primary care datasets of routinely collected data, thus creating a platform for future e-cohort research into these conditions. We have demonstrated that using diagnosis and current treatment alone to identify cases for depression and anxiety using routinely collected primary care data will miss a number of true cases given changes in GP recording behaviour. We have also shown that the algorithms are enhanced by the inclusion of current symptoms.


De-identified databanks of routinely collected clinical data such as SAIL, CPRD (Clinical Practice Research Database) [50] and THIN (The Health Improvement Network) [51] [provide a rich resource for research. The development of algorithms from complex datasets that identify cases with high PPV is an important step in epidemiological research. This work has implications for future research on the common mental disorders anxiety and depression, and on antidepressants. By understanding the performance of the different algorithms we gain a lot of insight into their potential use for research. We are now including the CMD as mental health outcomes in studies across a range of areas, including the environment, housing, suicide and for clinical research [47, 5254]. We plan to externally validate the algorithms developed to assess CMD in children and young people using age-appropriate survey data.


The assessment of cases of CMD (anxiety and depression) based on those with a historical diagnosis currently treated plus a current diagnosis or symptom treated or untreated (algorithm 10) will be useful as a platform for future research in e-cohort studies using routinely collected primary care data. Assessing diagnosis only, in addition, would allow comparison with other literature.

Availability of data and materials

Data will not be shared due to permissions granted under the SAIL databank.



caerphilly health and social needs survey


common mental disorder


clinical practice research database


general practice dataset


mental health inventory (sub-scale of SF-36)


positive predictive value


person years at risk


secure anonymised information linkage


short form health status questionnaire


the health improvement network


welsh demographic service


  1. 1.

    Croft-Jeffreys C, Wilkinson G. Estimated costs of neurotic disorder in UK general practice 1985. Psychol Med. 1989;19(3):549–58.

  2. 2.

    Ford E, Clark C, McManus S, Harris J, Jenkins R, Bebbington P, Brugha R, Meltzer H, Stansfeld SA. Common mental disorders, unemployment and welfare benefits in England. Public Health. 2010;124:675–81.

  3. 3.

    Dewa CS, Thompson AH, Jacobs P. The association of treatment of depressive episodes and work productivity. Can J Psychiatr. 2011;56:743–50.

  4. 4.

    Russ TC, Stamatakis E, Hamer M, Starr JM, Kivimaki M, Batty GD. Association between psychological distress and mortality: individual participant pooled analysis of 10 prospective cohort studies. Br Med J. 2012;345, e4933.

  5. 5.

    Kessler RC, Chui WT, Demler O, Walters EE. Prevalance, severity, and co-morbidity of twelve-month DSM-IV Disorders in the National Comorbidity Survey Replication (NCS-R). Arch Gen Psychiatry. 2005;62(2):617–27.

  6. 6.

    Weich S. Prevention of the common mental disorders: a public health perspective. Psychol Med. 1997;27(4):757–64.

  7. 7.

    Bebbington PE, Meltzer H, Brugha TS, Farrell M, Jenkins R, Ceresa C, Lewis G. Unequal access and unmet need: neurotic disorders and the use of primary care services. Psychol Med. 2000;30(6):1359–67.

  8. 8.

    Egede LE. Failure to recognize depression in primary care: issues and challenges. J Gen Intern Med. 2007;22(5):701–3.

  9. 9.

    Cepoiu M, McCusker J, Cole MG, Sewitch M, Belzile E, Ciampi A. Recognition of depression by non-psychiatric physicians--a systematic literature review and meta-analysis. J Gen Intern Med. 2008;23(1):25–36.

  10. 10.

    Simon GE, Fleck M, Lucas R, Bushnell DM. Prevalence and predictors of depression treatment in an international primary care study. Am J Psychiatr. 2004;161(9):1626–34.

  11. 11.

    Brugha TS, Bebbington PE, Singleton N, Melzer D, Jenkins R, Lewis G, Farrell M, Bhugra D, Lee A, Meltzer H. Trends in service use and treatment for mental disorders in adults throughout Great Britain. Br J Psychiatry. 2004;185:378–84.

  12. 12.

    Bebbington P, Meltzer H, Brugha T, Farrell M, Jenkins R, Ceresa C, Lewis G. Unequal access and unmet need: neurotic disorders and the use of primary care services. Int Rev Psychiatry. 2003;15(1–2):115–22.

  13. 13.

    Jarman B. Identification of underprivileged areas. Br Med J (Clin Res Ed). 1983;286(6379):1705–9.

  14. 14.

    Jarman B. Underprivileged areas: validation and distribution of scores. Br Med J (Clin Res Ed). 1984;289(6458):1587–92.

  15. 15.

    Jarman B. Jarman index. Br Med J. 1991;302(6782):961–2.

  16. 16.

    Jarman B, Hirsch S, White P, Driscoll R. Predicting psychiatric admission rates. Br Med J. 1992;304(6835):1146–51.

  17. 17.

    Hyde J, Evans J, Sharp D, Croudace T, Harrison G, Lewis G, Araya R. Deciding who gets treatment for depression and anxiety: a study of consecutive GP attenders. Br J Gen Pract. 2005;55(520):846–53.

  18. 18.

    Rait G, Walters K, Griffin M, Buszewicz M, Petersen I, Nazareth I. Recent trends in the incidence of recorded depression in primary care. Br J Psychiatry. 2009;195(6):520–4.

  19. 19.

    Walters K, Rait G, Griffin M, Buszewicz M, Nazareth I. Recent trends in the incidence of anxiety diagnoses and symptoms in primary care. PLoS One. 2012;7(8), e41670.

  20. 20.

    Herrett E, Thomas SL, Schoonen WM, Smeeth L, Hall AJ. Validation and validity of diagnoses in the General Practice Research Database: a systematic review. Br J Clin Pharmacol. 2010;69(1):4–14.

  21. 21.

    McGregor J, Brooks C, Chalasani P, Chukwuma J, Hutchings H, Lyons RA, Lloyd K. The Health Informatics Trial Enhancement Project (HITE): Using routinely collected primary care data to identify potential participants for a depression trial. Trials. 2010;11:39.

  22. 22.

    Fone DL, Dunstan F, John A, Lloyd K. Associations between common mental disorders and the Mental Illness Needs Index in community settings. Multilevel analysis. Br J Psychiatry. 2007;191:158–63.

  23. 23.

    Fone DL, Dunstan F, White J, Kelly M, Farewell D, Lyons R, Lloyd K. The Caerphilly Health and Social Needs electronic cohort study (E-CATALYsT). Int J Epidemiol. 2013;42(6):1620–8.

  24. 24.

    Fone DL, White J, Farewell D, Kelly M, John G, Lloyd K, Williams G, Dunstan F. Effect of neighbourhood deprivation and social cohesion on mental health inequality: multilevel population-based longitudinal study. Psychol Med. 2013;44(11):2449–60.

  25. 25.

    Ware Jr JE, Gandek B. Overview of the SF-36 Health Survey and the International Quality of Life Assessment (IQOLA) Project. J Clin Epidemiol. 1998;51(11):903–12.

  26. 26.

    Ware JE, Snow KK, Kosinski M. SF-36 Health Survey: Manual and Interpretation Guide. Lincoln, RI: Quality Metric Incorporated; 2000.

  27. 27.

    Goldberg D, Bridges K, Duncan-Jones P, Grayson D. Detecting anxiety and depression in general medical settings. Br Med J. 1988;297(6653):897–9.

  28. 28.

    Lewis G, Booth M. Regional differences in mental health in Great Britain. J Epidemiol Community Health. 1992;46(6):608–11.

  29. 29.

    Hayes V, Morris J, Wolfe C, Morgan M. The SF-36 health survey questionnaire: is it suitable for use with older adults? Age Ageing. 1995;24(2):120–5.

  30. 30.

    Hill S, Harries U, Popay J. Is the short form 36 (SF-36) suitable for routine health outcomes assessment in health care for older people? Evidence from preliminary work in community based health services in England. J Epidemiol Community Health. 1996;50(1):94–8.

  31. 31.

    McCabe CJ, Thomas KJ, Brazier JE, Coleman P. Measuring the mental health status of a population: a comparison of the GHQ-12 and the SF-36 (MHI-5). Br J Psychiatry. 1996;169(4):516–21.

  32. 32.

    Winston M, Smith J. A trans-cultural comparison of four psychiatric case-finding instruments in a Welsh community. Soc Psychiatry Psychiatr Epidemiol. 2000;35(12):569–75.

  33. 33.

    Fone DL, Dunstan F, Christie S, Jones A, West J, Webber M, Lester N, Watkins J. Council tax valuation bands, socio-economic status and health outcome: a cross-sectional analysis from the Caerphilly Health and Social Needs Study. BMC Public Health. 2006;6:115.

  34. 34.

    Skapinakis P, Lewis G, Araya R, Jones K, Williams G. Mental health inequalities in Wales, UK: multi-level investigation of the effect of area deprivation. Br J Psychiatry. 2005;186:417–22.

  35. 35.

    Weinstein MC, Berwick DM, Goldman PA, Murphy JM, Barsky AJ. A comparison of three psychiatric screening tests using receiver operating characteristic (ROC) analysis. Med Care. 1989;27(6):593–607.

  36. 36.

    Berwick DM, Murphy JM, Goldman PA, Ware Jr JE, Barsky AJ, Weinstein MC. Performance of a five-item mental health screening test. Med Care. 1991;29(2):169–76.

  37. 37.

    Ware JE, Kosinski M, Dewey JE. How to score version two of the SF-36 Health Survey. Licoln: QualityMetric Incorporated; 2000.

  38. 38.

    Kelly MJ, Dunstan FD, Lloyd K, Fone DL. Evaluating cutpoints for the MHI-5 and MCS using the GHQ-12: a comparison of five different methods. BMC Psychiatry. 2008;8:10.

  39. 39.

    Hoeymans N, Garssen AA, Westert GP, Verhaak PF. Measuring mental health of the Dutch population: a comparison of the GHQ-12 and the MHI-5. Health Qual Life Outcomes. 2004;2:23.

  40. 40.

    Ford DV, Jones KH, Verplancke JP, Lyons RA, John G, Brown G, Brooks CJ, Thompson S, Bodger O, Couch T. The SAIL Databank: building a national architecture for e-health research and evaluation. BMC Health Serv Res. 2009;9:157.

  41. 41.

    Lyons RA, Jones KH, John G, Brooks CJ, Verplancke JP, Ford DV, Brown G, Leake K. The SAIL databank: linking multiple health and social care datasets. BMC Med Inform Decis Mak. 2009;9:3.

  42. 42.

    International Classification of Diseases []. Accessed 14 March 2016.

  43. 43.

    NHS UK Read Codes Version 2. In.: Health and Social Care Information Centre. []. Accessed 14 March 2016.

  44. 44.

    Casey P, Doherty A. Adjustment disorder: implications for ICD-11 and DSM-5. Br J Psychiatry. 2012;201:90–2.

  45. 45.

    British National Formulary []. Accessed 14 March 2016.

  46. 46.

    Lewis G, Pelosi AJ, Araya R, Dunn G. Measuring psychiatric disorder in the community: a standardised assessment for use by lay interviewers. Psychol Med. 1992;22:465–86.

  47. 47.

    Rodgers SE, Heaven M, Lacey A, Poortinga W, Dunstan FD, Jones KH, Palmer SR, Phillips CJ, Smith R, John A, et al. Cohort profile: the housing regeneration and health study. Int J Epidemiol. 2012;43(1):52–60.

  48. 48.

    Rai D, Skapinakis P, Wiles N, Lewis G, Araya R. Common mental disorders, subthreshold symptoms and disability: longitudinal study. Br J Psychiatry. 2010;197411–412.

  49. 49.

    Mitchell C, Dwyer R, Hagan T, Mathers N. Impact of the QOF and the NICE guideline in the diagnosis and management of depression: a qualitative study. Br J Gen Pract. 2011;61(586):279–89.

  50. 50.

    Clinical Practice Research Datalink - CPRD []. Accessed 14 March 2016.

  51. 51.

    The Health Improvement Network []. Accessed 14 March 2016.

  52. 52.

    Farroha A, McGregor J, Paget T, John A, Lloyd K. Using anonymized, routinely collected health data in wales to estimate the incidence of depression after burn injury. J Burn Care Res. 2013;34(6):644–8.

  53. 53.

    White J, Greene G, Dunstan F, Rodgers S, Lyons R, Humphreys I, John A, Webster C, Palmer S, Elliot E. The Communities First (ComFi) study: protocol for a prospective controlled quasi-experimental study to evaluate the impact of area-wide regeneration on mental health and social cohesion in deprived communities. BMJ Open. 2014;4(10), e006530.

  54. 54.

    John A, Dennis M, Kosnes L, Gunnell D, Scourfield J, Ford D, Lloyd K. Suicide Information Database Cymru: a protocol for a population-based, routinely collected data linkage study to explore risks and patterns of healthcare contact prior to suicide to identify opportunities for intervention. BMJ Open. 2014;4, e006780.

Download references


This work was supported by the National Institute for Social Care and Health Research (Welsh Government), NISCHR Grant No: H07-3-03 and the Farr Institute. The Farr Institute is supported by a 10-funder consortium: Arthritis Research UK, the British Heart Foundation, Cancer Research UK, the Economic and Social Research Council, the Engineering and Physical Sciences Research Council, the Medical Research Council, the National Institute of Health Research, Health and Care Research Wales (Welsh Assembly Government), the Chief Scientist Office (Scottish Government Health Directorates), the Wellcome Trust, (MRC Grant No: MR/K006525/1). The funding bodies played no role in the design, analysis and interpretation of the data nor in the writing of the manuscript.

Author information

Correspondence to Ann John.

Additional information

Competing interests

The authors declare they have no competing interests.

Authors’ contributions

AJ conceived the study, led in it’s design and coordination, conducted the analysis and drafted the manuscript; JM participated in study design, extracted the data, conducted the analysis and drafted the paper; DF, FD, RC, RAL and KL conceived the study and participated in it’s design. All authors read and approved the final manuscript.

Additional files

Additional file 1:

Depression and Anxiety Read Codes V2 used in algorithms. (DOCX 16 kb)

Additional file 2:

Drug treatment Read Codes V2 used in algorithms. (DOCX 16 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

John, A., McGregor, J., Fone, D. et al. Case-finding for common mental disorders of anxiety and depression in primary care: an external validation of routinely collected data. BMC Med Inform Decis Mak 16, 35 (2016) doi:10.1186/s12911-016-0274-7

Download citation


  • Anxiety
  • Depression
  • Electronic health records
  • Validation