From: An informatics consult approach for generating clinical evidence for treatment decisions
Task | Traditional approach | Approaches to automate |
---|---|---|
Evidence from randomised trials | ||
Search for recruiting trials | Perform search on the clinicaltrials.gov website. Requires manual decisions on relevant search terms | Perform search on the clinicaltrials.gov website using search terms collated from free text input in the Informatics Consult platform. Potential to leverage developments in computable, machine-readable trial protocols (https://doi.org/10.1177/009286150704100312) and computable phenotypes (i.e., algorithms to identify clinical characteristics derived from electronic health records) to identify potentially eligible patients for trial recruitment (https://doi.org/10.1161/CIRCOUTCOMES.119.006292). Potential to use sentence embeddings and language models such as Google's BERT to match natural language queries with relevant trial protocols |
Summarise data of recruiting trials | Download search results from the clinicaltrials.gov website. Manually format tables. Extract additional information not present in the downloaded data by entering NCT numbers on the website | Download search results from the clinicaltrials.gov website. Generate scripts for automated table formatting to retain relevant information. Create a Python web-scraping tool to extract free text from specific clinical trial records and return information on inclusion and exclusion criteria. Note that some websites do not allow web scraping, and restrictions may apply to the clinicaltrials.gov website |
Evidence from meta-analysis | ||
Search strategy | Requires manual decisions on relevant search terms | Potential for mapping SNOMED-CT terms to MeSH descriptors used in PubMed (PMID:17238584) |
Identifying existing evidence from published sources and assessing eligibility | Perform searches on PubMed. Manual curation and review of publications. Does not scale | Semi-automated systematic reviews using machine learning and natural language processing for expedited evidence synthesis. For example, representing documents as a 'bag of words' to train a classifier, then using the learned coefficients to predict the probability that an unseen document is relevant. Examples of platforms for automating evidence synthesis include RobotReviewer and ExaCT, where the latter employs an information extraction engine that identifies and extracts text fragments describing clinical trial characteristics from unseen articles (https://doi.org/10.1186/s13643-019-1074-9; https://doi.org/10.1186/1472-6947-10-56) |
Extracting data and performing the meta-analysis | Manual extraction of relevant tables and information. Not practical for batch extraction of data | Semi-automated conversion of PDF documents to XML using a rule-based system such as PDFX. Batch extraction of data from PDF documents can also be performed using the open-source CyberPDF, which improves the accuracy and efficiency of batch data processing. Extracted data are formatted into data frames for subsequent meta-analysis using the meta package in R or other existing packages (https://doi.org/10.1145/2494266.2494271; https://doi.org/10.1145/3278576.3281274) |
Evidence from target trial emulation | ||
Specifying the target trial protocol | This process requires a discussion between the clinician and informatician to determine the appropriate criteria, treatment strategies and outcomes | Previous insights on specifying the target trial protocol can be collated automatically and be used to inform future target trial designs |
Cohort creation based on eligibility criteria in the target trial protocol | Manual cohort creation for each target trial. Does not scale | This process can be pipelined using several functions to create cohorts in a consistent format with the covariates of interest. The DExtER tool for automated cohort creation can be employed |
Propensity score matching to match initiators and non-initiators | Once a cohort is created in the correct format containing all the covariates of interest, propensity score matching can be performed using the MatchIt package | Additional approaches for causal inference analyses, including causal machine learning using targeted maximum likelihood estimation, can be investigated and pipelined |
Descriptive summary of the cohort before and after matching | The tableone package can be used to generate the baseline tables before and after propensity score matching | Previous descriptive summaries on other related studies can be collated and featured in future target trials that investigate related clinical queries |
Cox regression on the matched cohort | Cox regression analysis is performed using the coxph function from the survival package | Additional regression analyses can be automated into the pipeline |
Kaplan-Meier analysis on the matched cohort | Survival or cumulative incidence curves are plotted using the survminer package | This can be pipelined to look at multiple outcomes at a time |
Scaling to other examples and datasets | Limited tractability | Pipeline scalable to other datasets for cohort generation. Free text input from the Informatics Consult request form and report will inform additional opportunities to scale to other clinical questions |
Genetic evidence | ||
Identifying genetic variants associated with the exposure (e.g., drug) or risk factor in genome-wide association studies (GWAS). Identifying GWAS summary data for genetic variants associated with the exposure and outcome for a two-sample Mendelian randomisation analysis | Manual curation of GWAS summary data. Literature search for published genetic variants for the risk factor | The MR-Base platform for Mendelian randomisation can be employed to rapidly identify instruments for the exposure and outcome using GWAS summary data from its catalog. Additional GWAS summary data can be obtained from PhenoScanner, the EMBL-EBI GWAS Catalog and the Integrative Epidemiology Unit OpenGWAS database |
Performing Mendelian randomisation | Extract and format data identified above. Run Mendelian randomisation in R | MR-Base also includes an analytical platform for performing Mendelian randomisation analysis. For exposures and outcomes not available in MR-Base, this process can be pipelined to transform the GWAS summary data from other public sources into an analysis-ready format. Mendelian randomisation can then be performed using the MendelianRandomization package in R |
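
The 'bag of words' approach mentioned in the table for screening publications can be illustrated with a minimal Python sketch. It is not the method used by RobotReviewer or ExaCT. It is a toy logistic regression trained by stochastic gradient descent on word counts, and it returns the predicted probability that an unseen document is relevant. The function names, hyperparameters and the four training abstracts are invented for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Map every distinct training token to a column index."""
    vocab = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(vocab)}


def vectorize(text, vocab):
    """Bag of words: count each known token, ignore word order and unknown tokens."""
    counts = Counter(tokenize(text))
    vec = [0.0] * len(vocab)
    for tok, n in counts.items():
        if tok in vocab:
            vec[vocab[tok]] = float(n)
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(vectors, labels, epochs=500, lr=0.1):
    """Fit logistic regression coefficients by stochastic gradient descent."""
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the logit
            bias -= lr * err
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights, bias


def predict_proba(text, vocab, weights, bias):
    """Probability that an unseen document belongs to the relevant class."""
    x = vectorize(text, vocab)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Invented toy corpus: 1 = relevant (trial report), 0 = not relevant.
TRAIN_DOCS = [
    ("randomised controlled trial of anticoagulant therapy in patients", 1),
    ("double blind randomised trial comparing statin therapy with placebo", 1),
    ("case report describing a rare skin condition", 0),
    ("editorial commentary on hospital funding policy", 0),
]

TEXTS = [text for text, _ in TRAIN_DOCS]
LABELS = [label for _, label in TRAIN_DOCS]
VOCAB = build_vocabulary(TEXTS)
WEIGHTS, BIAS = train([vectorize(t, VOCAB) for t in TEXTS], LABELS)

if __name__ == "__main__":
    for query in ("randomised trial of statin therapy",
                  "editorial commentary on a case report"):
        print(query, round(predict_proba(query, VOCAB, WEIGHTS, BIAS), 3))
```

In practice the corpus would be thousands of labelled abstracts and an established library would replace the hand-written optimiser. The sketch only shows how learned coefficients turn word counts into a relevance probability.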
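
For the Mendelian randomisation step, the calculation that can be pipelined once GWAS summary data are in an analysis-ready format is sketched below in Python. It is a hedged illustration of a fixed-effect inverse-variance weighted (IVW) estimate built from per-variant Wald ratios with first-order standard errors. It is not the implementation used by MR-Base or by the R packages named in the table. The field names and the three variants are invented for illustration.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Per-variant causal estimate and its first-order standard error."""
    ratio = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return ratio, se


def ivw_estimate(variants):
    """Fixed-effect inverse-variance weighted mean of the Wald ratios.

    `variants` is a list of dicts with the (hypothetical) keys
    'beta_exposure', 'beta_outcome' and 'se_outcome'.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for v in variants:
        ratio, se = wald_ratio(v["beta_exposure"], v["beta_outcome"], v["se_outcome"])
        weight = 1.0 / se ** 2  # more precise variants count for more
        weighted_sum += weight * ratio
        total_weight += weight
    estimate = weighted_sum / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return estimate, standard_error


# Invented summary statistics for three independent variants.
EXAMPLE_VARIANTS = [
    {"beta_exposure": 0.10, "beta_outcome": 0.05, "se_outcome": 0.01},
    {"beta_exposure": 0.20, "beta_outcome": 0.10, "se_outcome": 0.02},
    {"beta_exposure": 0.40, "beta_outcome": 0.20, "se_outcome": 0.02},
]

if __name__ == "__main__":
    est, se = ivw_estimate(EXAMPLE_VARIANTS)
    print(f"IVW estimate: {est:.3f} (SE {se:.3f})")
```

A production pipeline would also harmonise effect alleles between the exposure and outcome datasets and run sensitivity analyses, both of which this sketch omits.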