Case identification of mental health and related problems in children and young people using the New Zealand Integrated Data Infrastructure

Background In a novel endeavour we aimed to develop a clinically relevant case identification method for use in research about the mental health of children and young people in New Zealand using the Integrated Data Infrastructure (IDI). The IDI is a linked individual-level database containing New Zealand government and survey microdata. Methods We drew on diagnostic and pharmaceutical information contained within five secondary care service use and medication dispensing datasets to identify probable cases of mental health and related problems. A systematic classification and refinement of codes, including restrictions by age, was undertaken to assign cases into 13 different mental health problem categories. This process was carried out by a panel of eight specialists covering a diverse range of mental health disciplines (a clinical psychologist, four child and adolescent psychiatrists and three academic researchers in child and adolescent mental health). The case identification method was applied to the New Zealand youth estimated resident population for the 2014/15 fiscal year. Results Over 82,000 unique individuals aged 0–24 with at least one specified mental health or related problem were identified using the case identification method for the 2014/15 fiscal year. The most prevalent mental health problem subgroups were emotional problems (31,266 individuals), substance problems (16,314), and disruptive behaviours (13,758). Overall, the pharmaceutical collection was the largest source of case identification data (59,862). Conclusion This study demonstrates the value of utilising IDI data for mental health research. Although the method is yet to be fully validated, it moves beyond incidence rates based on single data sources, and provides directions for future use, including further linkage of data to the IDI.


Background
Mental health problems are common among children and young people, with a worldwide estimated prevalence of 13.4% affected by any mental disorder [35]. In New Zealand, school-based survey results indicate 31% of young people experience at least two weeks of low mood, 15.7% report suicidal ideation, and 24% engage in self-harm each year [14]. The short-term consequences of childhood and adolescent mental health problems can include interference with education [38] and developmental milestones [16]. Longer term, they may be associated with personal costs, such as reduced employment [12,31], poorer quality of life [6], and societal costs such as greater economic burden [39].
Most information regarding the prevalence and treatment of mental health problems originates from small cross-sectional studies with short-term evaluation, and occasional, expensive longitudinal studies with finite long-term outcomes. To date, there has been limited use of administrative data for mental health research [8,19,48], especially in children and young people [36]. However, large amounts of administrative data, including information on hospital attendance, community care specialist services, and medication prescriptions, are routinely collected and stored by national health providers and related institutions and may be valuable for health research [5,17,21]. Use of data for this purpose is permitted in some countries by privacy legislation [34].
The advantages of using administrative data for research include the large, heterogeneous and representative nature of samples, which allows the reflection of real-world populations and practice, ongoing tracking of problems via regular collection of up-to-date data, long observation periods and low cost. Disadvantages include erroneous interpretation of data beyond the scope for which they were intended, variability of data quality, limited clinical detail, and potential public concern about administrative data being used for research purposes [26]. In New Zealand, administrative data on most interactions with government service providers as well as a range of survey data are housed in Statistics New Zealand's Integrated Data Infrastructure (IDI) [45]. The IDI is readily available, free to use, typically national in scope, and linked at the individual level.
Case identification for physical health problems using administrative data, typically utilising International Classification for Diseases (ICD) coding, has been widespread [1,10,20,33,37,49,50], but less so in mental health [15]. A standardised and accessible case identification approach means research can be more comparable, it negates the need to duplicate work, and permits researchers without specialist mental health knowledge to contribute more easily to the field. There are previous examples where New Zealand administrative mental health data have been used for case identification, but these are typically restricted to a narrow range of diagnoses and are not age specific [3,23,40].
This paper describes the development of an administrative data-based case identification method for research into the mental health of children and young people in New Zealand. The method utilises secondary care service use and medication dispensing data held within the IDI. Given that a large number of individuals with mental health problems are treated in primary care, through alternative therapies, or not at all, the method is not intended to calculate prevalence estimates for mental health problems but can be used to identify a relevant population for which we can examine health trajectories, comorbidities and a range of other wellbeing outcomes. These uses are especially important in understanding the burden of disease for mental health conditions and the impact they have on the lives of children and young people.

Integrated Data Infrastructure (IDI)
The IDI is a large research database managed by Statistics New Zealand, containing a wide range of administrative and survey data 1 about people and households [45]. Cabinet directives dating back to 1997 mandated Statistics New Zealand to undertake cross-agency data integration and the IDI was established in 2011. 2 Data in the IDI are held in a secure environment and can be accessed by approved researchers only for projects that are in the public interest. Since 1993 it has been possible to link health datasets together using the National Health Index (NHI) number, however, until the creation of the IDI, it was not possible to routinely combine health and non-health related data. The IDI provides secure access to linked data at an individual level. Data are linked probabilistically by Statistics New Zealand, usually using name, date of birth and sex. 3 After linking, all identifying information is removed before the data are made available to researchers. The IDI enables more extensive use of government data for research, including supporting the evaluation of the long-term impact of health interventions, with the aim of improved public health [2].

Data privacy
Statistics New Zealand's 'five safes' framework [44] is used to ensure data privacy is protected. Only approved researchers can use the IDI for projects that have a statistical purpose and are for the public good. All data are de-identified and only accessible via a secure connection from a secure Datalab. Data and results must be aggregated and confidentialised according to Statistics New Zealand protocols [46], and all results are checked by Statistics New Zealand prior to their release from the secure environment.
The legal requirements to protect the IDI data include the Statistics Act 1975, the Privacy Act 1993, and the Tax Administration Act 1994 [44]. In addition to legal requirements a number of Statistics New Zealand policies, protocols, and guidelines exist [46]. These include guidelines for information privacy, security, confidentiality policy and data integration, microdata access, and privacy and confidentiality. Regular privacy impact assessments for the IDI also provide a systematic evaluation of the benefits and risks associated with integrating data from a number of sources [47].

Data
Five datasets are used in this study. All are from administrative sources held within the IDI accessed in June 2019 using the most recent refresh of IDI data at that time. The datasets are each described below, including associated strengths and weaknesses for case identification of mental health and related problems.

Programme for the integration of mental health data (PRIMHD)
PRIMHD is a national collection of publicly funded specialist mental health service use (PRIMHD activity data) and diagnoses (PRIMHD classification (diagnosis) data). Data are collected from district health boards (DHBs) and non-governmental organisations (NGOs) that provide specialist mental health services and used to report on which services are being provided, and who is providing the services for health consumers across New Zealand's mental health sector [41]. A limitation of PRIMHD data is that it covers only publicly funded specialist mental health care which is targeted to the approximately 3 % of the population with the most serious mental health problems [27]. It does not cover mental health care provided in a primary care setting (the majority of mental health care in New Zealand), or mental health care provided in the private sector.
PRIMHD classification (diagnosis) data PRIMHD classification (diagnosis) data in the IDI were collected from 1 July 2008 to 31 December 2016 and include primary and secondary diagnosis codes (ICD-10-AM and Diagnostic and Statistical Manual of Mental Disorders (DSM)-IV). PRIMHD is the only national collection of formal psychiatric diagnoses in New Zealand but it has some limitations. Some people have contact with mental health services but do not have specific diagnoses recorded in PRIMHD. For most (nearly 85% of these non-specific diagnosis codes), this is because clients were seen for only a brief period and there was insufficient time for a diagnosis to be assigned [29]. This gap in data is reflected in approximately 37% of all classifications (not clients).

PRIMHD activity data
PRIMHD activity data in the IDI were collected between 1 July 2008 and 30 June 2018 and contain a range of variables including (i) an activity type code, i.e. a code that classifies the type of healthcare provided, and/or (ii) a team type code, i.e. a code that identifies which team provided a service. These codes can be used to identify individuals with mental health problems. Using PRIMHD activity data, a diagnosis can sometimes be inferred from the type of service the client receives, as with substance use, and can help to improve the coverage and quality of diagnosis information.

The National Minimum Dataset (NMDS)
NMDS is a national collection of publicly funded New Zealand hospital admissions, including day patients (stays of either 3 hours or more but not overnight) and emergency department visits of greater than 3 hours. Primary and secondary diagnosis codes (ICD-10-AM) are recorded for every hospital event and are used to identify mental health and related problems. NMDS data within the IDI were collected between 1 January 1988 and 31 December 2017. As there have been several changes to NMDS data collection over the years, most notably prior to 1994, for the current study NMDS use is restricted to 1994 onwards [42].
A key advantage of NMDS is the ability to utilise secondary diagnoses, i.e. issues deemed material to an individual's care that are not the main reason for their hospital admission. For example, a patient may be admitted to hospital for a non-mental health reason but their mental health may affect treatment or recovery and is therefore recorded as a secondary diagnosis. In some cases, these individuals will not have accessed services for mental health, or may have done so only in the primary care setting (data not held in the IDI). In these circumstances, NMDS permits case identification of mental health problems not otherwise possible in the IDI.

Socrates
Socrates is the national database of the Ministry of Health's (MoH's) Disability Support Services clients and service providers. Individuals have data recorded in Socrates when they apply for a needs assessment to access home help or other support services via a Needs Assessment and Service Co-ordination Agency (NASC) throughout New Zealand. A range of disabilities can be recorded on an individual's record and these include some mental health diagnoses. These diagnoses come with the referral for the client; for example from a general practitioner, social worker, treatment and rehabilitation provider, or psychologist. Socrates was established in 2008 and data are robust from 1 January 2010. While Socrates data exist in the IDI prior to 2008 they are sparse and not deemed reliable.
There is uncertainty around the diagnostic detail in Socrates; it is not known who provides the diagnosis and therefore the accuracy of the diagnosis may vary. However, people with mental health problems, in particular neurodevelopmental disorders such as Attention Deficit and Hyperactivity Disorder (ADHD) can be identified. Some of these children and young people will not have been referred to specialist mental health services and therefore Socrates offers an additional source of case identifications.

Pharmaceutical collection (pharms)
Pharms contains claim and payment information from pharmacists for government-subsidised medication dispensing throughout New Zealand [43]. Pharms data in the IDI were collected between 1 January 2005 and 31 December 2017.However, as pre-2007 data were collected with less than 90% coverage, only data collected from 2007 onwards were used for research purposes, as per MoH recommendations. 'Chemical IDs' assigned to each dispensing are used to identify the specific medication dispensed and can be used as indications for mental health problems.
The main advantage of pharms data is that they include information about both specialist and general practitioner prescribing. Therefore, pharms data provide some insights into mental healthcare activity at the primary care level. However, diagnostic information can only be inferred from pharmaceutical codes related to dispensing, therefore diagnoses derived from pharmaceutical data are less certain than those derived from PRIMHD and NMDS. They should be considered speculative because, while they provide information on dispensing of psychotropic medications, the use of any individual medication can be for a number of conditions. For the purposes of this paper, we have clustered data by the most likely use for a medication. At present, it is not possible within pharms to identify the health specialty of the prescriber (psychiatrist, general practitioner, or other medical practitioner) or the reason for the prescription [4].

Mortality collection
The mortality collection contains information about the underlying causes of all deaths registered in New Zealand [28]. It uses the ICD-10-AM classification and World Health Organization Rules and Guidelines for Mortality Coding [51]. Mortality data are robust and of high quality. However, due to the timeframe for coronial processes, there is a two-year lag in its availability. For the current analysis, the mortality dataset within the IDI was collected between 1 January 1988 and 31 December 2015 and has been used in this study to identify cases of death by suicide.

Establishing and refining case definitions for mental health problems
Our aim was to create a method for identifying common, clinically relevant mental health problems for New Zealand children and young people (aged 24 and under) using available IDI data, which also included self-harm. The method built on a similar approach taken by the Social Investment Agency (SIA) [40] but was developed specifically for children and young people. The twostage process undertaken to establish the case identification method is summarised below.
In the first stage, a short-list of 13 mental health (and related) problems of interest was derived by a clinically experienced team, based on the IDI data available. Our focus was on disorders that present to primary and secondary services. We were aware of the limitations of finer definition of problems because of the data available; for example, the sub-categories of anxiety disorder. For this reason, we chose broader categories that would allow us to make some limited assumptions about care in the primary setting that were sufficient for the purposes of population surveillance. Our final list comprised anxiety, depression, bipolar disorders, emotional problems (where anxiety and/or depression could not be reliably distinguished), 5 disruptive behaviours, substance problems, eating problems, psychosis, personality disorders, sleep problems, self-harm, other mental health problems, 6 and mental health not defined. 7 The second stage of work involved the systematic classification and refinement of codes used to assign cases into each mental health problem category, by data source. A panel of eight specialists covering a diverse range of mental health disciplines (a clinical psychologist, four child and adolescent psychiatrists, two with dual qualifications as paediatricians, one with specific specialist expertise in substance-use disorders and three academic researchers in child and adolescent mental health) independently assigned diagnostic codes to the 13 mental health problem categories. Most data sources (NMDS, PRIMHD, Socrates, and the mortality collection) provided specific diagnoses (e.g. ICD-10-AM or DSM-IV) that fit naturally into the 13 mental health problem categories. For pharmaceutical data, the panel's clinical expertise was utilised to infer the mental health problem according to the type of medication and patient age. Any disagreements were resolved by discussion and consensus. Five age strata (0-4, 5-9, 10-14, 15-19, and 20-24) were defined and used to increase accuracy of inferred diagnoses. These age bands were used as recommended management strategies depend on developmental level [24]. In addition, secondary mental health services are approximately organised around these age bands. Our method of classification drew on clinical experience and took into account the prevalence of disorder and the likely treatment within these age bands. For example, Amitriptyline is not included in the rubric for depression in children and adolescents but is for those aged 20 years and over. Medications deemed to be used for both anxiety and depression were assigned to the category 'emotional disorders' (e.g. Fluoxetine) and medications used for several mental health problems (e.g. Risperidone, which may be used for the treatment of psychosis, disruptive behaviour, obsessive compulsive disorder, bipolar disorder and emotional dysregulation) to the category 'mental health not defined'. Medications considered more likely to be used for the treatment of non-mental health problems were entirely excluded. Details of all the individual codes used to determine each of the 13 mental health problem categories can be found in the Appendix.
The data sources used to identify each specific mental health problem group are outlined in Table 1. Some mental health problems were derived from as few as two datasets (e.g. eating problems were identified using NMDS and PRIHMD Diagnosis) and others from as many as four datasets (e.g. anxiety was identified using NMDS, PRIMHD, Pharms, and Socrates).

Data management
Data preparation was conducted in SAS 7.1 within the IDI environment. There were three main steps. First, event level data (e.g. medication dispensing for pharms, hospitalisations for NMDS) were extracted separately for each of the five datasets used in the study, for all individuals in the New Zealand youth population (0-24) for the 2014/15 fiscal year. Then, using the coding system for case identification described, 13 dichotomous mental health problem indicator variables were generated for each individual. Each dichotomous indicator was set to one if at least one code from the code list was found in any data source. Finally, data from each of the five datasets were appended and then collapsed to one set of mental health problem indicators per person. For individuals who had a 'mental health not defined' and another specific mental health problem group indicator (excluding self-harm) the 'mental health not defined' indicator was set to zero. The resulting data were analysed using StataMP 15. All counts were randomly rounded to base three in line with Statistics New Zealand confidentiality requirements.

Establishing the New Zealand youth (0-24) population 2014/15
The New Zealand youth population (0-24) was calculated using existing methods for estimating a resident New Zealand population from the IDI [18,52]. More specifically, this method included people whose presence in New Zealand was indicated by activity in key datasets. Individuals who had died 8 or moved overseas were excluded. The total resident population generated using this method was within 2% of the official estimated resident population. Case identifications were restricted to people from within this population and 12-month prevalence rates were derived using this population as a denominator.

Ethics approval
The University of Otago Human Research Ethics Committee reviewed the study for ethics consideration. The study was reviewed as a 'Minimal Risk Health Research -Audit and Audit related Studies' proposal and was approved. Approval to access IDI data was granted by Statistics New Zealand.

Results
The case identification method was applied to data from the 2014/15 fiscal year. Over 82,000 unique individuals aged 0-24 with at least one mental health problem indicator including self-harm, other mental health, and mental health not defined were identified (see Table 2), indicating a 12-month prevalence of 5318 per 100,000 population (equivalent to 5.3%). The most prevalent mental health problem subgroups were 'emotional problems' with 31,266 individuals (2.0% of population), followed by substance problems with 16, 314 individuals (1.7%), and disruptive behaviours with 13,758 individuals (0.9%).
Overall, pharms identified the greatest number of individuals (almost 60,000) and was also the data source used to identify the most individuals in six of the 13 problem groups (anxiety, depression, emotional problems, disruptive behaviours, psychosis, and sleep problems). PRIMHD was the data source that contributed the second most case identifications (over 32,000) and was the biggest contributor to a further six of 13 specific mental health problem groups (bipolar disorders, substance problems, eating problems, personality disorders, mental health not defined, and other mental health). NMDS was the only data source used to identify cases of non-fatal self-harm, and also contributed to the case identifications in all other mental health problem groups. Socrates was used in only eight of 13 mental health problem groups, however, while the corresponding case identifications numbers were generally low, it contributed to identifying nearly 600 disruptive behaviour cases.

Key findings
We have proposed a method to identify and classify mental health problems among New Zealand children and young people using a range of data from the IDI. The method identified over 82,000 individuals aged between 0 and 24 with mental health problems, affecting 5.3% of all youth in 2014/15. Not surprisingly, emotional disorders, when combined with specifically defined anxiety and depressive disorders, comprise, by far, the greatest number of treated mental health problems. This is followed by substance problems.
The method is not designed to estimate prevalence of all diagnosed mental conditions due to an undercount arising from relying mostly on secondary service use data. However, it does provide a method for identifying a population of individuals with mental health problems at least serious enough to require some level of public health funded intervention. Additionally, it can provide information to help to understand the use of mental health services and pharmaceuticals, and more broadly facilitate research on those affected by mental health problems.
The results clearly demonstrate the value of utilising multiple datasets within the IDI, as there was no single dataset that performed well across all categories. The pharmaceutical collection data contributed the highest number of case identifications overall, however, in a number of mental health problem groups, other datasets were the main contributors, e.g. PRIMHD (substance) and NMDS (self-harm).
Having a method that can be used to simultaneously identify a range of mental health problems, including selfharm, at the individual level, and be able to link these data to other data sources (including non-health) could begin to set a framework to address important questions such as risk and protective factors, long term outcomes, health trajectories and burden of disease estimates for individuals with chronic mental health conditions.

Limitations and strengths
A limitation of this method is the current lack of formal validation against other data sources. Formal validation would be useful for two reasons. First, it could establish whether the diagnoses recorded in administrative data and those inferred from pharmaceutical dispensing have been correctly assigned. This is typically done through a detailed review of medical or case notes [13,25]. Second, validation could measure the level of undercount in the identified population, and the extent to which this undercount varies for different age, sex, ethnic, and other groups. One approach to measuring undercount is to compare our method against a dataset in which there is a complete record of mental health diagnoses, such as a sample survey or registry that contains complete information for a subset of the population [7,32]. This may be possible in the future as the New Zealand Health Survey [30], which contains mental health information that may be useful for validation, is scheduled for inclusion in the IDI, but is not currently available. In the absence of a survey or other dataset containing complete mental health diagnosis information, statistical approaches such as capture-recapture may be useful to estimate the extent of undercount. These have been used previously with New Zealand administrative health data [22,23], although they are not without challenges, in particular ensuring that the independence assumption is met [23].
The absence of primary care data means that people treated in primary care without medication (for example, those who are referred to brief intervention services or other publicly or privately funded psychological therapy) are not captured with existing datasets. Pharms data provides a way to account for people treated in primary care, however, it is the weakest dataset used in terms of clinical detail and accuracy. There is an increased risk of false positive case identifications when using pharmaceutical indications because some medications can be prescribed for non-mental health problems (e.g. amitriptyline for neuropathic pain). We have attempted to mitigate this risk by excluding medications that are considered to be used mostly for non-mental health problems, and imposing age restrictions on others to increase the likelihood that they are being used for mental  health. However, until a formal validation process can be undertaken, the risk of over-identification remains and the assignment of diagnostic categories using medications should be considered an informed guess rather than a definitive classification. The structure of the data sets and missing data from major and important sectors, such as the primary healthcare sector, means that this method should be considered with caution and treated as a first attempt to make sense of the national data. We have used a careful and transparent process to assign codes, with input from experts from a range of relevant backgrounds. Although multiple individuals were involved in assigning mental health codes to problem group categories for case identification, limited engagement was undertaken with clinical coders (those who ascribe diagnosis codes based on clinical records) and other clinicians and stakeholders, and this may have limited the accurate interpretation of data.
The case identification method described is based on administrative data that measures service use rather than prevalence of mental health problems. From epidemiological studies we know that for many common disorders such as depression and anxiety, the majority of young people do not access services. Therefore, the prevalence rates in this paper are likely to be lower than rates derived from surveys or other sources that are not based on service use. In addition, as evidenced by the large number of problems classified 'not defined', not all mental health problems can be classified using this method. Given that mental health problems are comprised of overlapping symptom clusters with often limited temporal stability, there may never be a perfect way to identify and track them using administrative data. Furthermore, administrative data lacks clinical detail and often has known quality issues, both of which may affect the accuracy of case identification.
The approach presented in this paper is not a panacea for mental health research in the IDI. Rather, it is an example of a broad approach that could be tailored by other researchers to suit the needs of their individual projects. For example, researchers may wish to exclude cases identified by medications if they want to minimise uncertainty. Furthermore, researchers should be aware of, and make explicit, the limitations of the method and contributing data sources. These limitations notwithstanding, the method provides a better means for identifying mental health problems than existing methods using single source service use data.

Ethical issues
The secondary use of administrative data for research purposes is legal in New Zealand. The development of this administrative data into large linked data sources such as the IDI has raised issues around ethics and guidelines. Further discussion of these issues will be critical to the ongoing development and use of IDI data to ensure ethical use. The increased analytical power of such linked datasets needs to be balanced with the right to privacy for individuals, the lack of true informed consent, issues of data ownership in life and death, the veracity and completeness of available information, mechanisms for managing unexpected findings and agreed limits to the usage of data [11]. The possibility that continual comparison with other ethnic groups could disadvantage Māori and Pasifika people, who already face disparities in health, mental health and in a number of other areas, must be considered. Furthermore, that universal measures may not address the needs of specific cultural populations [9] should be borne in mind when applying data from this source.

Further research and potential uses
Further research is needed to formally validate and potentially refine the described method. This may be undertaken initially using New Zealand Health Survey data that is scheduled to be uploaded into the IDI. Alternative approaches could include medical record review from either primary or secondary care data or capturerecapture methods. The development of a truly robust method is likely to be iterative and might include code weighting and further refinement of age restrictions or code allocations once a data source is available to validate against. Once validity has been demonstrated, the method could be used to track mental health problems in children and young people over time to better understand pathways to risk and resilience. The IDI method could also be used for evaluating the long-term impact of public mental health interventions and, in time, reducing health disparities and inequalities.

Conclusion
We have described how multiple data sources from within the IDI can be used to identify and classify mental health problems according to secondary service use and medication dispensing data among New Zealand children and young people. This novel approach enables improved capabilities for mental health research and evaluation, however its current limitations should be kept firmly in mind. It could be further strengthened by the inclusion of additional data sources in the IDI, in particular primary care data. Undertaking a formal validation would allow for greater confidence in validity and also highlight areas where improvements can be made. The creation of the IDI is an important step forward in tracking health and well-being in New Zealand, but it is a new resource and ongoing work is needed to fully realise its potential for mental health research.