PrAna: an R package to calculate and visualize England NHS primary care prescribing data

Background The objective of this work to calculate prescribed quantity of an active pharmaceutical ingredient (API) in prescription medications for human use, to facilitate research on the prediction of amount of API released to the environment and create an open-data tool to facilitate spatiotemporal and long-term prescription trends for wider usage. Design We have developed an R package, PrAna to calculate the prescribed quantity (in kg) of an APIs by postcode using England’s national level prescription data provided by National Health Service, for the years 2015–2018. Datasets generated using PrAna can be visualized in a real-time interactive web-based tool, PrAnaViz to explore spatiotemporal and long-term trends. The visualisations can be customised by selecting month, year, API, and region. Results PrAnaViz’s targeted API approach is demonstrated with the visualisation of prescribed quantities of 14 APIs in the Bath and North East Somerset (BANES) region during 2018. Once the APIs list is loaded, the back end retrieves relevant data and populates the graphs based on user-defined data features in real-time. These plots include the prescribed quantity of APIs over a year, by month, and individual API by month, general practice, postcode, and medicinal form. The non-targeted API approach is demonstrated with the visualisation of clarithromycin prescribed quantities at different postcodes in the BANES region. Conclusion PrAna and PrAnaViz enables the analysis of spatio-temporal and long-term trends with prescribed quantities of different APIs by postcode. This can be used as a support tool for policymakers, academics and researchers in public healthcare, and environmental scientist to monitor different group of pharmaceuticals emitted to the environment and for prospective risk assessment of pharmaceuticals in the environment.


Background
During the last decade a wide range of pharmaceuticals have been identified and quantified in the aquatic environment across several countries and their impacts on exposed environmental species [1][2][3][4] and humans [5] have been reported. However, currently it is not feasible to monitor all the pharmaceuticals released into the aquatic environment due to analytical feasibility and monetary limitations [6]. Recently, several procedures have been established to predict the total amount of pharmaceuticals emitted to the environment for risk assessment of pharmaceuticals in water by public authorities in both Europe [7,8] and the United States [9]. These guidelines describe how to evaluate the risk of the pharmaceuticals in the environment, based on human pharmaceutical consumption, ecotoxicity of these pharmaceuticals [10] and their removal during wastewater treatment [11], where human pharmaceutical consumption plays a vital role [12].
In addition to its role in the prediction of environmental concentration in water, spatiotemporal data of human pharmaceutical consumption combined with other data could be used to explore and predict public health [13], for example, (1) prescription data for short-acting β2-agonists, and respiratory related prescribing could be used as a predictor for respiratory mortality and/or air pollution health effects [14][15][16], (2) antibiotic prescription data could be used as a predictor to measure antimicrobial resistance in the community [17][18][19], (3) reduction in the number of antidepressant prescriptions can be linked with an increase in the urban green space [20,21], (4) opioids prescription monitoring could be used to assess community mental health [22,23].
Despite its importance in environmental risk assessment and support in predicting public health, estimation of the human pharmaceutical consumption or the total manufactured amount is difficult. In those countries where the data is publicly available, it is complex and not straight-forward to handle. In the UK, since December 2011, the National Health Service (NHS) has made national prescription data publicly available at general practice (GP) level, reporting the item counts, quantity and cost of prescriptions aggregated by British National Formulary (BNF) code [13].
National prescription data from the NHS dataset typically contains over 10 million records per month. And this dataset cannot be used for the direct calculation of the consumption levels of different pharmaceutical drugs. We need to use BNF code and SNOMED code [24,25] to identify the actual pharmaceutical ingredients and contents together with prescribed quantities to calculate the actual amount of each component dispensed. Open Prescribing provides some tools for data exploration for national level spatiotemporal trends for different pharmaceutical drugs using NHS prescription data [26]. However, these tools report used item counts for the drug volume, which has limitations [27], as it does not take the actual dosage or total amount of consumption of pharmaceuticals, which is crucial for the total amount of pharmaceuticals emitted to the environment.
In this article, we present PrAna, an R [28] package to aggregate and estimate the total prescribed quantity of active pharmaceutical ingredients (APIs) in prescription medications for human use in the UK. The PrAna package is complemented by a user-friendly interactive tool, PrAnaViz, developed using R/Shiny framework, where user can explore and visualise the spatiotemporal trends of the total prescribed quantity of APIs and can download the results in various data formats for their further studies.

Data sources NHS prescription and practices location datasets
Monthly national level prescribing datasets for the years 2015 to 2018 were downloaded from NHS Digital [29]. Clinical Commissioning Groups (CCG) names and codes and CCG geographic boundaries, GP practices geographical location were downloaded from NHS Digital [30] and Office for National Statistics data portal [31] used under the terms of the Open Government Licence. Postcode information were downloaded from Office for National Statistics [32] licensed under the Open Government Licence v.3.0. NHS prescription datasets contain dispensed prescriptions from general practitioners and other non-medical prescribers (such as nurses, pharmacists, optometrists, chiropodists and potentially radiographers) but does not cover private prescriptions. Each dataset has more than 10,000 rows, with each row representing a prescription, containing information on the dispensed practice code, British National Formulary (BNF) code, BNF Name, number of items prescribed, net ingredient cost, actual cost, quantity, and period as in Table 1.
As we are interested in calculating the total quantity of an individual active pharmaceutical ingredient (API), the whole dataset needs to be evaluated for each API. The columns BNF code and BNF name could help us in the normalisation, as they carry the information regarding the API. The BNF code is a 15-digit code unique for each formulation, dose and product combination. Together with the quantity prescribed, we use this to identify the amount of each API dispensed with each prescription. '

BNF/SNOMED mapping data
The BNF code in the 2015 to 2018 annual NHS prescription datasets uses the legacy Master Data Replacement (MDR) drug database [29]. We map these BNF codes to individual API with BNF/SNOMED mapping data ( Table 2) provided by the NHS Business Services Authority (BSA) on June 2018 [24], BNF/SNOMED mapping data map legacy MDR drug database and the Dictionary of Medicines and Devices (dm + d) [33]. Once we have mapped each BNF in the dataset to its APIs we can evaluate the quantities prescribed in the NHS prescription dataset.

Data aggregation
First step to quantify an individual API from NHS prescription dataset is to map 'VMPP/AMPP SNOMED' code in the BNF SNOMED mapping dataset to an individual chemical substance. To achieve this, 'VMPP/ Our method also helps to differentiate API from the formulations containing more than one API, for example, consider co-amoxiclav 500 mg/125 mg tablets, with our method we have managed to find 24 unique 'VMPP/ AMPP SNOMED' codes for this and it is matching to 1 unique BNF code, 2 VMPPs, 23 AMPs with 24 AMPPs. For this example, we have identified APIs as amoxicillin (as amoxicillin trihydrate) and clavulanic acid (as potassium clavulanate) with strength 500 mg and 125 mg respectively, medicinal form as tablet, and two different pack levels 21 tablets and 100 cap tablets respectively, from 26 different manufacturers.
In the second step, the 'BNF code' from the dataset generated as above, was matched to the BNF code in the NHS prescription dataset. This enables to identify individual API, along with information on its medicinal form,  strength, and its unit of measurement for each row in the NHS prescription dataset. After the identification of API and its strength, the dataset is grouped by individual API, by GP practice code and ultimately by postcode and the total prescription quantity for individual API was calculated for each GP practice code and postcode. We have used 'Quantity' from the NHS prescription dataset to measure total prescription quantity. Postcode and geographical coordinates were linked to the dataset by matching the GP practice code from the dataset downloaded from NHS Digital and Office for National Statistics data portal. The whole process is summarised below in Fig. 1. Full methodology for this complete matching process is available in the technical documentation online [34].

PrAna: an R package implementation
PrAna is an R package providing a comprehensive workflow including data preparation, data conversion, data visualisation and the ability to export/download the generated plots and data, as outlined in the Table 3. The package is now available from https:// github. com/ PrAna Viz/ PrAna under an MIT license and we are in the process of uploading it in the Comprehensive R Archive Network (CRAN). Deploying the PrAna requires R version 3.5 or greater and a small number of package dependencies, detailed instructions can be found in the online documentation [34].

PrAna installation
Several functions wrapped in the PrAna package utilised in the workflow, we recommend using RStudio to install PrAna, to keep directories specific to a single `.Rproj`. Since the code is published on GitHub, it can be installed using `devtools`. With `devtools` installed you can download and install the latest version of 'PrAna' in the R console with `install_github("PrAnaViz/PrAna")`.

Data preparation
It is strongly suggested to setup the destination folder as the working directory using setwd() function. `csv-2dat()` function is used to combine several monthly NHS prescription dataset "Practice Level Prescribing Presentation-level Data: Practice prescribing data" files, downloaded from NHS digital [29] into a single data table. For example, if the user wants to combine all the monthly prescription dataset downloaded in the "C:/ Datasets/Prescription Datasets/2018/PDPI" location to a destination folder "C:/Datasets/2018", following commands need to be executed in the R console. `importdmd()` function is used to import different extracted dm + d files [33] and return multiple data objects including a data table which map each BNF code to its corresponding API(s), strength and medicinal form.
Recommended to read the documentation of importdmd() function to know more regarding the different data objects it generate [34].
## Read the extracted dm + d files   First step, is to use the function `combine_addr()` to generate a combined ADDR file.
The file will have columns: GP practice code, name, and address with postcode.
The next step is to generate the list of GP practices for the defined CCG region using the function `generate_gp_list_ccg()`.
This function requires following parameters, • shape_file-Generalised Clipped Boundaries of CCG regions in England [35] • postcode_ons-NSPL file [32] • gp_data_file-gp_address_file generated in the using `combine_addr()` function • region-Name of the CCG region Briefly, this function identifies the postcodes of the GP practices from the ADDR files and finds its geographic location from NSPL file by matching the postcodes. To estimate the GP practices within each CCG region, generated GP practices geographic locations were intersected with the freely available geospatial shape file of high resolution CCG maps [35].

Data normalisation and conversion
The final step in the data conversion is to use practice_ wise() to generate the prescription dataset mapped with the individual API, prescription quantity, medicinal form, and strength for the defined GP practices. The practice_ wise() function require following six parameters, as demonstrated in the example below, • Combined NHS prescription dataset, generated using csv2dat() function.

Database service
As a result of the large datasets, to run PrAnaViz we recommend uploading the processed dataset to a local or a remote database service, for example, MySQL, and link it to the PrAnaViz. More information on the linking databases to PrAnaViz is explained in our technical documentation online.

PrAnaViz: an interactive data analysis tool
We developed an interactive tool PrAnaViz to visualise, explore and export different spatiotemporal prescription trends for wider use using the GP specific prescription data generated using PrAna package. Created using R/ Shiny, PrAnaViz, uses a web-based interactive dashboard layout that most users are familiar with from common websites and web-based tools.
To launch PrAnaViz run the following command in your R Console: > library(PrAna) > PrAna::runShiny("PrAnaViz") The PrAna::runShiny("PrAnaViz") function will pop-up the PrAnaViz tool which will allow you to explore different spatiotemporal and long-term prescription trends with the sample dataset.
The basis of PrAnaViz functionality is that any user can explore prescribing trends broken down by chemical substance, and to explore and visualise the variation in prescribing at region, postcode, and individual GP practice level, where both the overall trends and the relative contribution from each API, medicinal form can be seen. PrAnaViz contains two different dashboards tabs: (1) Targeted API approach in which a user can input their target(s), i.e., API(s) of interest and calculate, visualise, and explore total prescribed quantity of targeted API(s) in a selected region, (2) Non-targeted API approach where the user can select an individual API and visualise

PrAna
We extracted the available monthly prescription dataset from January 2015 to December 2018 and mapped it to the prescribed general practices and by postcode. And the data generated will be in comma separate value (.csv) files dedicated to each general practice with file name corresponding to the GP code, we obtained the GP code from NHS Digital [30] and geographical coordinates to the postcodes from Office for National Statistics data portal [31]. This mapping also enables to identify the medicinal form (e.g., tablet, capsule, solution for injection, etc.), strength with its unit of measurement (e.g., mg, ng, µg, etc.), packing level information and manufacturer information. We then excluded compounds related to medical devices/appliances, bath additives, washes, foams, shampoos, sprays, multivitamin capsules, and tablets for our analysis.

PrAnaViz
Information on prescribing quantity of different APIs in a particular year and a year-long trend of an API are interesting for various applications including community health monitoring [23,[36][37][38], environmental monitoring [39,40], policy interventions [41], and environmental and water quality management [21,42,43]. This tool can facilitate research in these aspects and in the identification of most prescribed API in particular region or in a practice and able to identify the variation across the prescription of API over a period or region or practice. For example, Fig. 2A shows the annual prescription (in kg/ year) of 15 APIs corresponding to antibiotics group in Bath and North East Somerset CCG region in 2018 and shows that amoxicillin (209.76 kg/year) and sulfasalazine (231.29 kg/year) are the most prescribed in the group at selected condition, but when comparing the same group of APIs in month wise prescription (in kg/month) Fig. 2B shows that like amoxicillin and clarithromycin, shows the higher prescription rate in winter comparing to summer, but sulfasalazine prescription is same throughout the year, with an average of 19.2 kg/month and standard deviation of 0.88 kg/month. Apart from the prescription trends for the whole region variation across the tool helps to monitor individual API prescription by month (Fig. 3A), by individual GP level (Fig. 3B), by individual postcode level (Fig. 3C) and medicinal form for the selected CCG region (Fig. 3D) over the selected year. The second dashboard in PrAnaViz generates spatiotemporal trends for an API. It helps to compare prescription quantity of an API by individual postcode level in a selected region over a period and it helps to generate annual trends. For example, as shown in Fig. 4A, prescription quantity of clarithromycin in January 2018 at an individual postcode levels in the Bath and North East Somerset CCG region, 77% of the practices in that postcodes prescribed more than 0.1 kg/month; while the top 17% in that prescribed more than 0.35 kg/month, For comparison, the prescription of more than 0.1 kg/month of clarithromycin at the practices in these postcodes for the month June 2018 decreased to 47%; while only one GP practice prescribed more than 0.35 kg/month in that period. Users can also visualise these results as a bar plot, as in Fig. 4C.
The tool also visualises the annual trends of clarithromycin at an individual postcode (for example, BA1 7NP in Fig. 4B) with an information about the corresponding GP practices at the postcode as presented in Fig. 4B.

Summary
We have developed an R package, PrAna to calculate prescribed quantity (in kg) of an APIs by postcode using England's national level prescription data provided by National Health Service, for the years 2015-2018. There were 27,848 unique BNF codes identified in the compiled prescription datasets, after excluding compounds related to medical devices/appliances, bath additives, washes, foams, shampoos, sprays, multivitamin capsules, and tablets, these were matched to 97% of the BNF codes generated. We have also created PrAnaViz, an inbuilt visualisation tool within the R package where user can explore spatiotemporal trends in prescription of an API or group of APIs. This tool helps to understand the GP practice level and postcode level variation of the prescribed drug in the selected CCG region, with resolution to each month and different medicinal forms. The tool also enables to download the generated dataset in comma separated value (.csv) files and publication ready images. Users can modify the colour palette for the navigation bar, side bar, accents bar and the plots colour schemes. The colour palette includes colour-blind/publication/ printer friendly options.

Strength and limitations
The R package enables to process annual data for the whole of England's prescription data for the years 2015-2018, not a sample. The tool successfully calculated the quantity of APIs prescribed in several presentation levels such as tablets, capsules, oral form. We have used 'Quantity' from the NHS prescription dataset to measure total prescription quantity. The major limitations with the calculations are (1) the data used for the calculations are monthly time resolution, (2) delay in the release of the data, and (3) in some cases, people live far from their registered GP practices, for example, GP practices in the university premises, and (4) in some cases, the prescribed medicines were dispensed in a pharmacy located in a different city. We have developed PrAnaViz, a free, openly accessible, in-built interactive tool to visualise and analyse PrAna generated dataset in real-time. PrAnaViz facilitates wider use with spatiotemporal and long-term trends. The tool enables to calculate and visualise prescription quantity (in kg) per postcode, medicinal form, and GP practice. Users can download the produced graphs as a publication ready image or a dataset (.csv) to carry out their own analyses using different software.
We have installed a demo version of the PrAnaViz tool as web application with limited data, hosting only 2018, Bath and North east somerset CCG dataset and it is publicly available to get feedback and to monitor user volume. The tool is under continuous development.

Conclusion
We have developed an R package, PrAna, incorporated with a method to aggregate and normalise NHS prescription dataset to calculate total prescription quantity for an individual API specified to a postcode or GP practice. We have also successfully calculated the total prescription quantity for England for the year 2015-2018. Apart from the R package, we have also developed a standalone, comprehensive, PrAnaViz, with the generated dataset to analyse, visualise, and explore prescription data. With its spatio-temporal prescribing trends of different APIs including resolution to postcode and GP Practices, PrAna and PrAnaViz, can be used as a support tool for policymakers, academics, and researchers in public healthcare, and by environmental scientists to monitor different group of pharmaceuticals emitted to the environment and for prospective risk assessment of pharmaceuticals in the environment.