- Research article
- Open Access
- Open Peer Review
Detecting referral and selection bias by the anonymous linkage of practice, hospital and clinic data using Secure and Private Record Linkage (SAPREL): case study from the evaluation of the Improved Access to Psychological Therapy (IAPT) service
© de Lusignan et al; licensee BioMed Central Ltd. 2011
- Received: 19 December 2010
- Accepted: 13 October 2011
- Published: 13 October 2011
The evaluation of demonstration sites set up to provide improved access to psychological therapies (IAPT) comprised the study of all people identified as having common mental health problems (CMHP), those referred to the IAPT service, and a sample of attenders studied in-depth. Information technology makes it feasible to link practice, hospital and IAPT clinic data to evaluate the representativeness of these samples. However, researchers do not have permission to browse and link these data without the patients' consent.
To demonstrate the use of a mixed deterministic-probabilistic method of secure and private record linkage (SAPREL) to describe selection bias in subjects chosen for in-depth evaluation.
We extracted and pseudonymised multiple health records and linked them using fuzzy logic, without the researcher knowing the patient's identity. The method can be characterised as a three-party protocol that mainly uses deterministic algorithms with dynamic linking strategies, though it incorporates some elements of probabilistic linkage. Within the data providers' safe haven we extracted demographic data, hospital utilisation and IAPT clinic data; converted postcodes to the Index of Multiple Deprivation (IMD); and identified people with CMHP. We contrasted the age, gender, ethnicity and IMD of the in-depth evaluation sample with those of people referred to IAPT, people who used hospital services, and the population as a whole.
The IAPT in-depth group had a mean age of 43.1 years (95% CI: 41.0 - 45.2; n = 166); the IAPT-referred group 40.2 years (CI: 39.4 - 40.9; n = 1118); and those with CMHP 43.6 years (SEM 0.15; n = 12210). Around 67% of those with a CMHP were women, compared with 70% of those referred to IAPT and 75% of those subject to in-depth evaluation (Chi square p < 0.001). The mean IMD score for the in-depth evaluation group was 36.6 (CI: 34.2 - 38.9; n = 166); for those referred to IAPT 38.7 (CI: 37.9 - 39.6; n = 1117); and for people with CMHP 37.6 (CI: 37.3 - 37.9; n = 12143).
The sample studied in-depth were older, more likely to be female, and less deprived than people with CMHP, and fewer had a recorded ethnic minority status. Anonymous linkage using SAPREL provides insight into the representativeness of a study population and allows possible adjustment for selection bias.
- Practice Population
- Electronic Patient Record System
- Common Mental Health Problem
- Strong Identifier
- False Negative Count
Selection bias may distort the results of an evaluation of the effectiveness of a new service. In the NHS nearly all the population is registered with a single family practitioner and has a single unique identifier (NHS number) which can be linked to health service utilisation, making it possible in theory to quantify selection bias and, if needed, adjust for it. Although computerised records make it technically straightforward to link population, practice, hospital and clinic data, it is not possible to extract a patient's records without their consent. For a large population-based study obtaining this consent is not feasible, and seeking it may itself introduce further bias. Methods are needed which allow selective mining of key variables from individual patients' records so that researchers can know the extent of any selection bias. Such methods should allow anonymous extraction and linkage, with only the data needed to make comparisons extracted; the privacy of the patient is then primarily maintained because the researcher has no access to any strong identifiers (e.g. name, date of birth) [4–7].
The Improving Access to Psychological Therapies (IAPT) programme is a Department of Health (DH) quality improvement initiative. The DH also commissioned a comprehensive evaluation of the IAPT programme, which included a case-control study comparing those referred to the IAPT clinics with age-sex practice-matched controls, and an in-depth study of a cohort of people who attended the IAPT clinic and consented to provide further information for the evaluation. IAPT offers a series of stepped interventions, including cognitive behavioural therapy (CBT); the programme aims to reduce the economic burden of psychological illness to society and to enable people to cope better with their mental health problems. The target population for the IAPT programme is people with common mental health problems (CMHP) in primary care, specifically people suffering from depression and/or anxiety disorders. The severity thresholds of CMHP for referral to IAPT were not always strictly adhered to by referrers. Access to IAPT is further complicated because patients can also access the service directly without seeing their GP. We linked practice, hospital and IAPT clinic data to conduct this evaluation, linking anonymised data using privacy-enhanced fuzzy matching to maximise join quality. We called this process SAPREL (secure and private record linkage). The resulting merged data table enabled the health service utilisation of individuals to be tracked across primary care, hospital services and the IAPT clinics.
The purpose of this paper is to explore any selection bias in the people referred to IAPT and in those who underwent in-depth evaluation. We compare the characteristics of five populations linked using the SAPREL process: (1) the practice-registered population; (2) those referred to IAPT; (3) users of hospital services; (4) people with CMHP (a subset of the practice population); and (5) the in-depth evaluation group (an enhanced subset of the IAPT population). We contrasted the age, gender, ethnicity and level of deprivation of these five populations.
Twenty practices consented to participate in this study, 10 in each of the two localities which piloted the first IAPT services in England. One was within London in an area with a diverse ethnic population; the other a northern city with a predominantly white population. We extracted data from their electronic patient record (EPR) systems using MIQUEST (Morbidity Information Query and Export Syntax), a Department of Health sponsored application which allows the same data extraction query to be run on different branded EPR systems. These data were extracted, processed and cleaned using well-established methods [10, 11].
The hospital and IAPT clinic data for these 20 practices were exported using each provider's standard data export methods. Hospital episode statistics, or SUS (Secondary Uses Service) data, for the period 01/10/2007 to 30/04/2009 were retrieved by the information services of the primary care trusts of the two study sites. Customised output listing all those referred to the IAPT clinics between 01/10/2007 and 30/09/2008 (on an 'intention to treat' basis) was exported from two different bespoke applications developed specifically to support IAPT clinics. These data were de-identified on the premises where they were held or accessed, and then linked using the SAPREL method. As a result, no person-identifiable data left the premises where such data were held, and at no time did the researchers hold strong identifiers.
The secure and private record linkage (SAPREL) process
SAPREL fuzzy linking can be characterised as a deterministic algorithm with dynamic linking strategies chosen according to the lowest measured link-error estimates (i.e. we flexibly apply whichever algorithm appears to generate the fewest errors). The method also incorporates some probabilistic elements to enable wider record matching. The privacy-preserving nature of SAPREL flows from its ability to operate on data fully de-identified within each contributing site.
- Normalise and cleanse the fields used for linking (dates to ISO format, postcodes to a standard form).
- Create derived values from the fields used for linking (year of birth from date of birth, deprivation index from postcode, and so on).
- Encrypt every field with a key held at the contributing facility. Fields used for cross-site linking are encrypted again with a common key and a salt (random pad) to prevent dictionary attacks by the data intermediary.
- Generate multiple link strategies, e.g. (forename, surname, date of birth (DoB), postcode) vs. (surname, DoB, postcode), and so on.
- Estimate the false positives in each strategy using the "Duplicate Method".
- From the strategies with the lowest false-positive error estimate, choose the one that also has the highest number of distinct links. This helps to eliminate strategies with high false-negative counts.
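The steps above can be illustrated with a minimal Python sketch of the data flow. It is not the SAPREL implementation, whose algorithms are proprietary. Several assumptions are made: HMAC-SHA256 stands in for the second, common-key encryption of linking fields (the first, site-key encryption of every field is not shown); the key, salt, dd/mm/yyyy source date format, derived postcode district and the three strategies are hypothetical; and the false-positive estimate is a simple stand-in that counts shared link keys which are not unique on both sides, not the "Duplicate Method" itself.

```python
import hashlib
import hmac
import re
from collections import Counter
from datetime import datetime

# Hypothetical common key and salt, agreed between contributing sites and
# never disclosed to the intermediary that performs the join.
COMMON_KEY = b"hypothetical-common-key"
SALT = b"hypothetical-random-pad"

# Hypothetical link strategies (tuples of linking fields).
STRATEGIES = [
    ("forename", "surname", "dob", "postcode"),
    ("surname", "dob", "postcode"),
    ("surname", "yob", "pc_district"),
]


def normalise(raw):
    """Normalise linking fields: upper-case names, ISO dates, compact postcodes."""
    dob = (raw.get("dob") or "").strip()
    return {
        "forename": (raw.get("forename") or "").strip().upper(),
        "surname": (raw.get("surname") or "").strip().upper(),
        # Assumes source dates are dd/mm/yyyy; missing dates stay empty.
        "dob": datetime.strptime(dob, "%d/%m/%Y").date().isoformat() if dob else "",
        "postcode": re.sub(r"\s+", "", raw.get("postcode") or "").upper(),
    }


def derive(rec):
    """Add derived linking values: year of birth and postcode district."""
    rec = dict(rec)
    rec["yob"] = rec["dob"][:4]
    rec["pc_district"] = rec["postcode"][:-3]  # outward code of a UK postcode
    return rec


def token(value):
    """Keyed, salted one-way token of a linking value (None if missing)."""
    if not value:
        return None
    return hmac.new(COMMON_KEY, SALT + value.encode("utf-8"), hashlib.sha256).hexdigest()


def prepare(records):
    """Runs inside each safe haven: only tokens leave the site."""
    return [{k: token(v) for k, v in derive(normalise(r)).items()} for r in records]


def evaluate(strategy, left, right):
    """Return (false-positive proxy, distinct one-to-one links) for a strategy."""
    def keys(records):
        ks = (tuple(r[f] for f in strategy) for r in records)
        return Counter(k for k in ks if None not in k)  # skip incomplete keys

    lk, rk = keys(left), keys(right)
    shared = lk.keys() & rk.keys()
    # Stand-in estimate: a shared key duplicated on either side is ambiguous
    # and could produce a false-positive link.
    fp = sum(1 for k in shared if lk[k] > 1 or rk[k] > 1)
    distinct = sum(1 for k in shared if lk[k] == 1 and rk[k] == 1)
    return fp, distinct


def choose_strategy(left, right):
    """Lowest false-positive estimate first, then the most distinct links."""
    scores = {s: evaluate(s, left, right) for s in STRATEGIES}
    best_fp = min(fp for fp, _ in scores.values())
    tied = [s for s, (fp, _) in scores.items() if fp == best_fp]
    return max(tied, key=lambda s: scores[s][1])


if __name__ == "__main__":
    # Fictitious records for illustration only.
    gp = prepare([
        {"forename": "Ann", "surname": "Smith", "dob": "02/03/1970", "postcode": "AB1 2CD"},
        {"forename": "John", "surname": "Smith", "dob": "02/03/1970", "postcode": "AB1 2CD"},
        {"forename": "Raj", "surname": "Patel", "dob": "15/07/1985", "postcode": "XY9 8ZW"},
    ])
    clinic = prepare([
        {"forename": "Ann", "surname": "Smith", "dob": "02/03/1970", "postcode": "AB12CD"},
        {"forename": "Raj", "surname": "Patel", "dob": "15/07/1985", "postcode": "xy9 8zw"},
    ])
    print(choose_strategy(gp, clinic))  # ('forename', 'surname', 'dob', 'postcode')
```

In this toy example the two Smiths share surname, date of birth and postcode, so the shorter strategies produce an ambiguous key and the full four-field strategy is preferred, mirroring the "lowest false positives, then most distinct links" rule.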
The data set
The GP practice data set included: personal identifiers for data linkage (forename, surname, date of birth, NHS number and postcode); demographic information (gender, ethnicity, registration date); and postcode, which was mapped to the Index of Multiple Deprivation (IMD) using Geographical Information System (GIS) methods. IMD is divided into deciles of equal size, where the first decile (IMD ≤ 5.63) is the least deprived and decile ten (IMD ≥ 45.33) the most deprived. Ethnicity codes were mapped to the National Statistics "5+1" categories: not stated, White, Mixed, Asian or Asian British, Black or Black British, and Chinese or other ethnic group. We extracted clinical information which enabled us to report whether a patient had a CMHP, namely a diagnosis of depression or anxiety coded in the clinical computer system. Where we compare people with CMHP with the other groups, we make the comparison with the adult (≥16 years) population. Additional data were extracted for the DH-commissioned evaluation but are not reported in this paper.
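To make the decile definition concrete, the following hypothetical sketch ranks area-level IMD scores (a higher score means more deprived) into ten equal-sized groups and then attaches a decile to each patient through a postcode-to-area lookup. The lookup tables and area codes are invented for illustration. In the study the postcode-to-IMD mapping used GIS methods, and in practice the published national decile cut-points (such as ≤ 5.63 and ≥ 45.33) should be used rather than deciles recomputed from a local subset of areas.

```python
def assign_deciles(area_scores):
    """Map each area's IMD score to a decile: 1 = least deprived, 10 = most.

    Areas are ranked by score and split into ten groups of (near) equal
    size; tied scores keep their input order.
    """
    areas = sorted(area_scores, key=area_scores.get)
    n = len(areas)
    return {area: rank * 10 // n + 1 for rank, area in enumerate(areas)}


def patient_deciles(patients, postcode_to_area, area_decile):
    """Attach an IMD decile to each patient; None if the postcode is unmapped.

    postcode_to_area and area_decile are hypothetical lookup tables.
    """
    out = {}
    for pid, postcode in patients.items():
        area = postcode_to_area.get(postcode.replace(" ", "").upper())
        out[pid] = area_decile.get(area)
    return out


if __name__ == "__main__":
    scores = {f"A{i:02d}": float(i) for i in range(1, 21)}  # invented areas
    deciles = assign_deciles(scores)
    print(patient_deciles({"p1": "ab1 2cd"}, {"AB12CD": "A20"}, deciles))
```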
Missing, temporary or duplicate unique identifiers (NHS number) by practice
Duplicate or default NHS number
Missing NHS Number
Total of NHS Number problems
Missing Forename or Surname
Demographic data errors
Errors as %
Because the NHS number field is poorly populated in the IAPT data, the GP practice and IAPT clinic data had only forename, surname, date of birth and postcode in common. As part of the SAPREL process, various functions of these fields were paired in various combinations, and the link strategy with the lowest estimated false-positive and false-negative counts was selected. Where an IAPT forename, surname, date of birth or postcode was missing, that record was discarded (n = 98: 91 missing postcodes, 7 missing dates of birth).
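The exclusion rule can be sketched as a filter that keeps only records with all four common linking fields present and tallies which field was missing. Field names are hypothetical. A record missing two fields is counted under both, so in general the tallies need not sum to the number discarded; in the study they did (91 + 7 = 98).

```python
from collections import Counter

# Hypothetical names for the four fields shared by GP and IAPT data.
LINK_FIELDS = ("forename", "surname", "dob", "postcode")


def split_linkable(records):
    """Return (linkable records, Counter of missing fields, number discarded)."""
    kept, missing_counts, discarded = [], Counter(), 0
    for rec in records:
        # A field counts as missing if absent, None or blank.
        missing = [f for f in LINK_FIELDS if not str(rec.get(f) or "").strip()]
        if missing:
            discarded += 1
            missing_counts.update(missing)
        else:
            kept.append(rec)
    return kept, missing_counts, discarded
```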
Validity of the data linkage
When the NHS number is present and used to link hospital and practice data, there is a relatively low risk of linkage failure. However, for the clinic data, where the NHS number is less often available, we compared the distributions of two readily available variables, age and IMD, for patients with linked data against those of other cases referred to IAPT who were not part of the study.
We used descriptive statistics to compare the samples. We quoted 95% confidence intervals (CI) and the standard error of the mean (SEM) to allow ages to be compared between groups, and used a t-test to estimate the probability that mean ages differed significantly. We used the Chi square test to compare proportions of categorical variables, and the Wilcoxon non-parametric test to compare the distribution of 5-year age bands between populations.
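The summary statistics named above can be sketched with the Python standard library. Two assumptions apply: the 95% CI uses the normal approximation (mean ± 1.96 × SEM), which suits large groups, whereas a t quantile would be more accurate for small ones; and the Chi square test is shown only for a 2 × 2 table without continuity correction, for which the p-value follows directly from the normal distribution (comparing three groups, as in the gender comparison, needs an r × c table). As a check against the abstract, an SEM of 0.15 around the CMHP group's mean age of 43.6 years gives an interval of roughly 43.3 to 43.9 years.

```python
import math
import statistics

Z95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def mean_sem_ci(values):
    """Mean, standard error of the mean and normal-approximation 95% CI."""
    m = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return m, sem, (m - Z95 * sem, m + Z95 * sem)


def ci_from_sem(mean, sem):
    """95% CI reconstructed from a reported mean and SEM."""
    return mean - Z95 * sem, mean + Z95 * sem


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p-value); with 1 df, p = erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    lo, hi = ci_from_sem(43.6, 0.15)
    print(f"CMHP mean age 95% CI: {lo:.1f} to {hi:.1f}")  # 43.3 to 43.9
```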
A national research ethics committee (reference No: 08/H0715/101) provided ethical review and the research office of the local healthcare organizations involved provided local site approval. Specific approval was obtained from the former Patient Information Advisory Group (PIAG) for Section 60 exemption for the transient holding of patient identifiable information while it is pseudonymised and encrypted on health service premises (Reference number: PIAG 6-06(h)/2008). The SAPREL process complies with both research and information governance frameworks in England in the use and protection of patient information; and was commended by PIAG as an example of best practice.
Age and gender
Comparison by gender of practice list, using hospital services, common mental health problems (CMHP), referred to IAPT and part of the in-depth evaluation cohort
In-depth study cohort
Use hospital services
p = 0.023
p = 0.016
Rates of recording of ethnicity by patient group
In-depth study cohort
IAPT clinic referral
Use hospital services
GP list population
No ethnicity code/not stated
Asian or Asian British
Black or Black British
Chinese/other ethnic group
Index of multiple deprivation (IMD)
Mean deprivation (IMD) score for each study population
In-depth study cohort
IAPT clinic referral
Use hospital services
GP list population
Mean IMD score
95% Confidence Intervals
People with common mental health problems (CMHP)
Comparing those with and without common mental health problems within the four study populations
In-depth study population
Referred to IAPT
Hospital adult population
GP listed adult population
People with NO record of Common Mental Health Problem (CMHP):
% of population
Gender (F%: M%)
People with a Common Mental Health Problem (CMHP):
% of population
Gender (F%: M%)
Total population size and differences between people with and without a CMHP
Total population size
(n = 166)
(n = 1,118)
(n = 53,318)
(n = 121,199)
+ 1.5 years
(p < 0.05)
(p < 0.001)
(p < 0.001)
(p < 0.001)
(p < 0.001)
(p < 0.001)
(p < 0.001)
(p < 0.05)
(p < 0.01)
In the in-depth cohort, those with CMHP were around seven years younger and more likely to be female than those with no CMHP; there were no significant differences in deprivation scores. In the group referred to IAPT, those with CMHP came from slightly less deprived areas and were more likely to be female than those with no CMHP; there was no significant difference in age. In the adult population who had attended hospital, people with CMHP were approximately two years younger, came from slightly more deprived areas, and were more likely to be female. Taking the practice population as a whole, people with CMHP tended to be a little older and more likely to be female.
Inter-practice and inter-locality variation
All the practices except one referred patients to IAPT, and two other practices were low referrers. The referring practices referred between 0.5% and 2.5% of their adult registered list, equivalent to between 3.5% and 12.5% of the number of people with CMHP (though not all those referred came from this group). The age distributions of the sample practices were not significantly different from those of the two localities (Wilcoxon non-parametric tests comparing 5-year age bands: p = 0.58 and p = 0.145 for the northern and southern localities respectively). The northern locality was almost a perfect match, whereas there was an excess of young adults in the southern locality.
Validity of the linkage
We compared those referred to IAPT in the study (n = 1,118) with other cases whose data were not linked (n = 4,353) and found that they were of similar age (mean 40.2 vs. 39.2 years respectively); their mean IMD scores were 38.7 vs. 35.5. The distributions of these variables were similar between the two groups (see Additional file 1). The age distribution was left-skewed, with almost no referrals below age 20 years (as shown for the IAPT-referred population in Figure 1; Chi-square test p = 0.03). The distribution of IMD showed a similar ranking in both groups, with proportions increasing as deprivation worsens (as shown for the IAPT-referred population in Figure 3); a Wilcoxon rank test suggested no statistically significant difference (p = 0.460).
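For readers wishing to repeat a linked-versus-unlinked comparison of this kind, a two-sided Wilcoxon rank-sum test can be sketched in pure Python using average ranks for ties and the large-sample normal approximation. This is a simplified sketch, not the software used in the study: it omits the tie and continuity corrections that library implementations such as SciPy's `mannwhitneyu` can apply, which matter when the data are coarse (for example 5-year age bands or IMD deciles).

```python
import math


def average_ranks(values):
    """1-based ranks of values, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction).

    Returns (W, z, p), where W is the sum of the ranks of sample x.
    """
    ranks = average_ranks(list(x) + list(y))
    n1, n2 = len(x), len(y)
    w = sum(ranks[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    return w, z, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    linked = [3, 5, 7, 8, 9, 10, 10]  # hypothetical IMD deciles
    unlinked = [2, 4, 6, 7, 8, 9, 10]
    print(rank_sum_test(linked, unlinked))
```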
By linking three differently structured health databases, we were able to characterise the population referred and the group who were part of the in-depth evaluation, without the researchers knowing the identities of the people whose records they were linking. The process took place through visits to the practice, hospital or clinic safe haven, and so was mobile and flexible. Had we applied this process a priori, we could have sampled more purposively; applied post hoc, it at least allows adjustment to be made for population differences.
The practice populations were not significantly different from the localities from which they were drawn. The people referred to IAPT were not exclusively drawn from those recorded as having CMHP, and the group studied in-depth for the service evaluation were relatively older, more likely to be women, and included fewer people with a recorded ethnic minority status.
This approach has allowed databases designed to serve different purposes, and using different coding systems, to be linked. The SAPREL process also meets stringent research and information governance requirements for the ethical use and protection of patient information.
The findings also show that over 60% of the patients in the study practices' populations lived in the 20% most deprived areas, as measured by the Index of Multiple Deprivation. However, men, older people and some ethnic groups were apparently less likely to be referred.
Implications of the findings
These data were linked so that we could conduct a case-controlled, before-and-after study of hospital utilisation by people referred to IAPT; by the time permissions were obtained and this linkage was completed, the IAPT evaluation programme was well underway. In future studies, however, SAPREL would allow the sample for in-depth study to be selected purposively and therefore to be more representative of the population under study. This approach has the potential to quantify any selection bias and allow researchers to avoid or adjust for it.
Linking data at the individual level across care boundaries offers the opportunity to evaluate the system-wide impact of policy initiatives, which cannot be measured effectively within separate sectors of health and social care provision. The resulting file tracks a cohort of individual patients through the system across primary and secondary care organisational boundaries, and avoids the potentially biased conclusions that can arise from analysing different cohorts of patients drawn from separate cross-sectional databases.
Comparison with the literature
There are two approaches to record linkage: probabilistic (making the most likely matches) and deterministic (requiring exact matches). Probabilistic linkage requires readable sensitive values from different data stores to be brought together for similarity checking. This increases the risk of a data breach by creating a larger pool of identifiable data, and so reduces the number of sites willing to contribute health data [18, 19]. For these reasons SAPREL is based on the deterministic approach but introduces some probabilistic methods to keep error rates down.
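One way a largely deterministic method can still tolerate spelling variation is to match exactly on derived phonetic codes rather than on raw names. The reference list includes the Soundex indexing system (National Archives); the sketch below implements American Soundex and then compares keyed tokens of the codes, so that "Smith" and "Smyth" still link without either site revealing the name. This is an illustration only: the fuzzy logic actually used in SAPREL is proprietary and is not described in this paper, and the key below is hypothetical.

```python
import hashlib
import hmac

# American Soundex digit groups; vowels, Y, H and W carry no digit.
_CODES = {ch: digit
          for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                 ("L", "4"), ("MN", "5"), ("R", "6"))
          for ch in letters}


def soundex(name):
    """Return the four-character American Soundex code of a name."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same digit
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so repeats across a vowel are kept
        if len(code) == 4:
            break
    return code.ljust(4, "0")


def phonetic_token(name, key=b"hypothetical-common-key"):
    """Keyed token of the Soundex code; only tokens are compared centrally."""
    return hmac.new(key, soundex(name).encode("ascii"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    print(soundex("Smith"), soundex("Smyth"))  # S530 S530
    print(phonetic_token("Smith") == phonetic_token("Smyth"))  # True
```

Matching on phonetic codes raises the match rate at the cost of more false positives (different names can share a code), which is why such derived keys would be combined with other fields in a link strategy.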
Analysis in this paper found that, compared with the 2001 census, the study practices' populations had more people of working age (20 to 50 for men, and 20 to 35 for women) and a larger proportion of children aged under 5. Other studies and household surveys conducted in the UK report a higher prevalence of common mental health problems in women [22, 23]. Deprivation is associated with poor outcomes, including in mental health, so the accessibility of the IAPT service to people of low socioeconomic status is important [24–26].
Strategies such as placing researchers with honorary contracts in practices have been suggested as an alternative method of accessing records, and methods using "agents" (software that flags eligible patients) have also been piloted to meet this need. However, all of these methods, including SAPREL, rely on the clinicians responsible for patients' health data trusting the professionalism of the person extracting these data.
Limitations of the method
Our reporting of ethnicity could have been more complete. We could also have increased ethnicity recording using hospital data, but did not retain this data field.
Recruitment of patients for research projects is challenging, and it is possible that older patients had more time available to participate. The fuzzy logic linkage used in SAPREL was not independently checked for accuracy. We did not find any contradictions in these data; however, this does not mean that the match was perfect. We consider this linkage sufficiently good for a research study, but because there will be errors (false positives and false negatives) it is not recommended as a method for identifying the care of individual patients in practice. Because we only had permission to link these data privately, we cannot comment precisely on their performance metrics; however, we presume linkage accuracy close to the 87-88% reported for similar approaches. Both of these approaches are likely to be progressively improved over time, with higher rates of matching achieved. Where sensitive data can be pooled, probabilistic methods may ultimately outperform deterministic methods.
It is challenging to demonstrate that private data linkage is truly linking the records we claim it is. We could not go beyond comparing age and IMD in the linked sample with those of the unlinked people referred to IAPT. The linked IAPT-referred group was a much better (though not perfect) match to the unlinked IAPT-referred group than either the hospital group or the in-depth evaluation group. In future studies we will build in the capacity to check the match in demographics more thoroughly.
Data quality is always an issue for studies using routine data, and our CMHP category relies on the clinician coding the problem title. Problem titles are not always entered into primary care records, and the short consultation in primary care means that not all data are recorded; incompleteness of data is a considerable limitation in interpretation. Data quality in mental health is challenging. People without a mental health problem coded in their computer record may have had relevant data recorded as free text or under a physical health problem label (e.g. headache).
Call for further research
Further research is needed to assess the acceptability of this approach to patients and to test its reliability - using a dataset where we can do open matching as well as using the SAPREL fuzzy logic.
Patients referred to IAPT were predominantly of working age (16 to 64 years), and the white population was overrepresented in the IAPT-referred group. The sample who volunteered for the in-depth IAPT study was not entirely representative of the total population referred to the IAPT programme: they tended to be drawn from less deprived areas, were even more likely to be female, and were older. These biases must be borne in mind when extrapolating the findings of the study to other populations. Linking data using SAPREL, a flexible, largely deterministic method of private data linkage, has been shown to be technically feasible and ethically acceptable, and has provided insight into selection bias.
RN: Technical Director of Sapior Ltd - who supply privacy enhanced solutions for the collection, de-identification and linking of sensitive data. The algorithms and fuzzy logic used are the intellectual property of Sapior.
Participating practices and colleagues in the participating NHS Trusts and IAPT clinics. This project was funded by Department of Health (IAPT team) and NIHR SDO project funding (Principal Investigator Professor Glenys Parry, Sheffield University).
- Henderson M, Page L: Appraising the evidence: what is selection bias? Evid Based Ment Health. 2007, 10 (3): 67-8. 10.1136/ebmh.10.3.67.
- de Lusignan S, Chan T: The development of primary care information technology in the United Kingdom. J Ambul Care Manage. 2008, 31 (3): 201-10.
- Kho ME, Duffett M, Willison DJ, Cook DJ, Brouwers MC: Written informed consent and selection bias in observational studies using medical records: systematic review. BMJ. 2009, 338: b866. 10.1136/bmj.b866.
- de Lusignan S: Using routinely collected patient data with and without consent: trust and professionalism. Inform Prim Care. 2008, 16 (4): 251-4.
- Navarro R: An ethical framework for sharing patient data without consent. Inform Prim Care. 2008, 16 (4): 257-62.
- Neame R: Privacy and health information: health cards offer a workable solution. Inform Prim Care. 2008, 16 (4): 263-70.
- de Lusignan S, Sullivan F, Krause P: Vault, cloud and agent: choosing strategies for quality improvement and research based on routinely collected health data. Inform Prim Care. 2010, 18 (1): 1-4.
- Clark DM, Layard R, Smithies R, Richards DA, Suckling R, Wright B: Improving access to psychological therapy: Initial evaluation of two UK demonstration sites. Behav Res Ther. 2009, 47 (11): 910-20. 10.1016/j.brat.2009.07.010.
- Parry G, Brazier J, Dent-Brown K, Hardy G, Kendrick T, Rick J, Chambers E, Chan T, Connell J, Hutten R, de Lusignan S, Mukuria C, Saxon D, Bower P, Lovell K: An evaluation of a new service model: Improving Access to Psychological Therapies demonstration sites 2006-2009. Final report. NIHR Service Delivery and Organisation programme. 2010.
- van Vlymen J, de Lusignan S, Hague N, Chan T, Dzregah B: Ensuring the Quality of Aggregated General Practice Data: Lessons from the Primary Care Data Quality Programme (PCDQ). Stud Health Technol Inform. 2005, 116: 1010-5.
- van Vlymen J, de Lusignan S: A system of metadata to control the process of query, aggregating, cleaning and analysing large datasets of primary care data. Inform Prim Care. 2005, 13 (4): 281-91.
- Ng S: Posets and protocols - Picking the right three-party protocol. IEEE J Sel Areas Commun. 2003, 21 (1): 55-61. 10.1109/JSAC.2002.806126.
- National Archives: The Soundex Indexing system. [http://www.archives.gov/research/census/soundex.html]
- Arehart M: Indexing Methods for Faster and More Effective Person Name Search. Proceedings of Language Resources and Evaluation Conf. (LREC). 2010, 3601-5. [http://www.lrec-conf.org/proceedings/lrec2010/pdf/166_Paper.pdf]
- Blakely T, Salmond C: Probabilistic record linkage and a method to calculate the positive predictive value. Int J Epidemiol. 2002, 31 (6): 1246-1252. 10.1093/ije/31.6.1246.
- UK Statistics Authority: Index of Multiple Deprivation (IMD). 2004, [http://data.gov.uk/dataset/imd_2004]
- Kumarapeli P, Stepaniuk R, de Lusignan S, Williams R, Rowlands G: Ethnicity recording in general practice computer systems. J Public Health (Oxf). 2006, 28 (3): 283-7. 10.1093/pubmed/fdl044.
- Wajda A, Roos LL: Simplifying record linkage: software and strategy. Comput Biol Med. 1987, 17 (4): 239-48. 10.1016/0010-4825(87)90010-2.
- Jaro MA: Probabilistic linkage of large public health data files. Stat Med. 1995, 14 (5-7): 491-8.
- Wajda A, Roos LL, Layefsky M, Singleton JA: Record linkage strategies: Part II. Portable software and deterministic matching. Methods Inf Med. 1991, 30 (3): 210-4.
- Office for National Statistics: The 2001 Census. [http://www.ons.gov.uk/ons/guide-method/census/census-2001/index.html]
- Koopmans PC, Roelen CA, Bültmann U, Hoedeman R, van der Klink JJ, Groothoff JW: Gender and age differences in the recurrence of sickness absence due to common mental disorders: a longitudinal study. BMC Public Health. 2010, 10: 426. 10.1186/1471-2458-10-426.
- NHS Information Centre for Health and Social Care: Adult psychiatric morbidity in England, 2007. Results of a household survey. Leeds: NHS Information Centre. 2009, [http://www.ic.nhs.uk/pubs/psychiatricmorbidity07]
- Kingsford R, Webber M: Social deprivation and the outcomes of crisis resolution and home treatment for people with mental health problems: a historical cohort study. Health Soc Care Community. 2010, 18 (5): 456-64. 10.1111/j.1365-2524.2010.00918.x.
- Riva M, Bambra C, Curtis S, Gauvin L: Collective resources or local social inequalities? Examining the social determinants of mental health in rural areas. Eur J Public Health. 2011, 21 (2): 197-203. 10.1093/eurpub/ckq064.
- Glover G, Arts G, Wooff D: A needs index for mental health care in England based on updatable data. Soc Psychiatry Psychiatr Epidemiol. 2004, 39 (9): 730-8. 10.1007/s00127-004-0779-8.
- Gibson-White A, Majeed A: The Wellcome Trust Report: moving forward the use of general practice electronic patient records for research. Inform Prim Care. 2009, 17 (3): 141-2.
- Treweek S, Pearson E, Smith N, Neville R, Sargeant P, Boswell B, Sullivan F: Desktop software to identify patients eligible for recruitment into a clinical trial: using SARMA to recruit to the ROAD feasibility trial. Inform Prim Care. 2010, 18 (1): 51-8.
- de Lusignan S, Chan T, Theadom A, Dhoul N: The roles of policy and professionalism in the protection of processed clinical data: a literature review. Int J Med Inform. 2007, 76 (4): 261-8. 10.1016/j.ijmedinf.2005.11.003.
- Hull SA, Rivas C, Bobby J, Boomla K, Robson J: Hospital data may be more accurate than census data in estimating the ethnic composition of general practice populations. Inform Prim Care. 2009, 17 (2): 67-78.
- Grannis S, Overhage J, McDonald C: Analysis of Identifier Performance using a Deterministic Linkage Algorithm. AMIA 2002 Annual Symposium Proceedings. 2002, 305-9.
- Durham E, Xue Y, Kantarcioglu M, Malin B: Private medical record linkage with approximate matching. AMIA Annu Symp Proc. 2010, 2010: 182-6.
- Tromp M, Ravelli AC, Bonsel GJ, Hasman A, Reitsma JB: Results from simulated data sets: probabilistic record linkage outperforms deterministic record linkage. J Clin Epidemiol. 2011, 64 (5): 565-72. 10.1016/j.jclinepi.2010.05.008.
- de Lusignan S, van Weel C: The use of routinely collected computer data for research in primary care: opportunities and challenges. Family Practice. 2006, 23 (2): 253-63.
- Chan T, van Vlymen J, Dhoul N, de Lusignan S: Using routinely collected data to evaluate a leaflet campaign to increase the presentation of people with memory problems to general practice: a locality based controlled study. Inform Prim Care. 2010, 18 (3): 189-96.
- de Lusignan S, Wells SE, Hague NJ, Thiru K: Managers see the problems associated with coding clinical data as a technical issue whilst clinicians also see cultural barriers. Methods Inf Med. 2003, 42 (4): 416-22.
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1472-6947/11/61/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.