Table 1 Traditional versus automated approaches for evidence synthesis

From: An informatics consult approach for generating clinical evidence for treatment decisions

Task

Traditional approach

Approaches to automate

Evidence from randomised trials

Search for recruiting trials

Perform search on the clinicaltrials.gov website. Requires manual decisions on relevant search terms

Perform search on the clinicaltrials.gov website using search terms collated from free text input in the Informatics Consult platform. Potential to leverage developments on computable machine-readable trial protocols (https://doi.org/10.1177/009286150704100312) and computable phenotypes (i.e., algorithms to identify clinical characteristics derived from electronic health records) to identify potentially eligible patients for trial recruitment (https://doi.org/10.1161/CIRCOUTCOMES.119.006292). Potential to use sentence embedding and Google BERT as approaches for matching natural language queries with relevant trial protocols

Summarise data of recruiting trials

Download search results from the clinicaltrials.gov website. Manually format tables. Extract additional information not present in downloaded data from the website by inputting NCT numbers

Download search results from the clinicaltrials.gov website. Generate scripts for automated table formatting to retain relevant information. Create a Python web-scraping tool to extract free text from specific clinical trial records and return information on inclusion and exclusion criteria. Note that some websites do not allow web-scraping and restrictions may apply to the clinicaltrials.gov website

Evidence from meta-analysis

Search strategy

Requires manual decisions on relevant search terms

Potential for mapping SNOMED-CT terms to MeSH descriptors used in PubMed (PMID:17238584)

Identifying existing evidence from published sources and assessing eligibility

Perform searches on PubMed. Manual curation and review of publications. Does not scale

Semi-automated systematic reviews using machine learning and natural language processing for expedited evidence synthesis. For example, documents can be represented as a 'bag of words' and classified with a model whose learned coefficients predict the probability that an unseen document is relevant. Examples of platforms for automating evidence synthesis include RobotReviewer and ExaCT, where the latter employs an information extraction engine that identifies and extracts text fragments describing clinical trial characteristics from unseen articles (https://doi.org/10.1186/s13643-019-1074-9; https://doi.org/10.1186/1472-6947-10-56)

Extracting data and performing the meta-analysis

Manual extraction of relevant tables and information. Not practical for batch extraction of data

Semi-automated tool for converting PDF documents to XML using a rule-based system such as PDFX. Batch extraction of data from PDF documents can also be performed using the open-source CyberPDF, which improves the accuracy and efficiency of batch data processing. Extracted data are formatted into data frames for subsequent meta-analysis using the meta package in R or other existing packages (https://doi.org/10.1145/2494266.2494271; https://doi.org/10.1145/3278576.3281274)
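The 'bag of words' screening idea mentioned in this part of the table can be illustrated with a minimal sketch. It is not the method used by RobotReviewer or ExaCT; it assumes a plain logistic regression trained by stochastic gradient descent on word counts, and the four training abstracts, their labels and all function names are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Lower-case and keep purely alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]

def build_vocab(docs):
    words = sorted({w for d in docs for w in tokenize(d)})
    return {w: i for i, w in enumerate(words)}

def vectorize(text, vocab):
    # Bag of words: one count per vocabulary word, word order ignored.
    vec = [0.0] * len(vocab)
    for word, count in Counter(tokenize(text)).items():
        if word in vocab:
            vec[vocab[word]] = float(count)
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, lr=0.1, epochs=500):
    # Logistic regression fitted by stochastic gradient descent.
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict_proba(text, vocab, w, b):
    x = vectorize(text, vocab)
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical training abstracts: 1 = relevant trial report, 0 = not relevant.
train_docs = [
    "randomised controlled trial of statins in adults",
    "double blind randomised trial of aspirin",
    "case report of a rare rash",
    "editorial on hospital funding policy",
]
labels = [1, 1, 0, 0]

vocab = build_vocab(train_docs)
weights, bias = train([vectorize(d, vocab) for d in train_docs], labels)

# Score documents the model has not seen before.
p_trial = predict_proba("randomised trial of warfarin", vocab, weights, bias)
p_other = predict_proba("editorial on a rare case", vocab, weights, bias)
```

In practice the learned coefficients would come from a much larger labelled corpus, and the predicted probabilities would be used to rank publications for manual review rather than to exclude them outright.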

Evidence from target trial emulation

Specifying the target trial protocol

This process requires a discussion between the clinician and informatician to determine the appropriate criteria, treatment strategies and outcomes

Previous insights on specifying the target trial protocol can be collated automatically and be used to inform future target trial designs

Cohort creation based on eligibility criteria in the target trial protocol

Manual cohort creation for each target trial. Does not scale

This process can be pipelined using several functions to create cohorts in a consistent format with the covariates of interest. The DExtER tool for automated cohort creation can be employed

Propensity score matching to match initiators and non-initiators

Once a cohort is created in the correct format containing all the covariates of interest, propensity score matching can be performed using the MatchIt package

Additional approaches for causal inference analyses, including causal machine learning using the targeted maximum likelihood estimation approach, can be investigated and pipelined

Descriptive summary of the cohort before and after matching

The tableone package can be used to generate the baseline tables before and after propensity score matching

Previous descriptive summaries on other related studies can be collated and featured in future target trials that investigate related clinical queries

Cox regression on the matched cohort

Cox regression analysis is performed by fitting a model with the coxph function from the survival package

Additional regression analyses can be automated into the pipeline

Kaplan-Meier analysis on the matched cohort

Survival or cumulative incidence curves are plotted using the survminer package

This can be pipelined to look at multiple outcomes at a time

Scaling to other examples and datasets

Limited tractability

Pipeline scalable to other datasets for cohort generation. Free text input from the Informatics Consult request form and report will inform additional opportunities to scale to other clinical questions
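To make the idea of pipelining the Kaplan-Meier step over several outcomes concrete, the sketch below computes the product-limit survival estimate for each outcome in a loop. It is an illustration only and does not reproduce the survminer workflow named in the table; the outcome names and the follow-up data are hypothetical, and each outcome is assumed to be given as follow-up times with an event indicator (1 = event, 0 = censored).

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimator)."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # events and censorings both leave the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

# Hypothetical matched-cohort data: outcome -> (follow-up times, event flags).
outcomes = {
    "stroke": ([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]),
    "bleeding": ([2, 3, 5, 5, 6], [0, 1, 1, 1, 0]),
}

# One pass over all outcomes, so adding an outcome only means adding a dict entry.
curves = {name: kaplan_meier(t, e) for name, (t, e) in outcomes.items()}
```

Subjects censored at the same time as an event are counted as still at risk at that time, which is the usual convention.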

Genetic evidence

Identifying genetic variants associated with the exposure (e.g., drug) or risk factor in genome-wide association studies (GWAS). Identifying GWAS summary data for genetic variants associated with the exposure and outcome for a two-sample Mendelian randomisation analysis

Manual curation of GWAS summary data. Literature search for published genetic variants for the risk factor

The MR-Base platform for Mendelian randomisation can be employed to rapidly identify instruments for the exposure and outcome using GWAS summary data from its catalogue. Additional GWAS summary data can be obtained from PhenoScanner, the EMBL-EBI GWAS Catalog and the Integrative Epidemiology Unit OpenGWAS database

Performing Mendelian randomisation

Extract and format data identified above. Run Mendelian randomisation in R

MR-Base also includes an analytical platform for performing MR analysis. For exposures and outcomes not available in MR-Base, this process can be pipelined to transform the GWAS summary data from other public sources into an analysis-ready format. Mendelian randomisation can be performed using the MendelianRandomization package
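As a rough illustration of what a two-sample Mendelian randomisation analysis computes once the summary data are in an analysis-ready format, the sketch below derives per-variant Wald ratios and a fixed-effect inverse-variance weighted (IVW) estimate. IVW is one common estimator, chosen here for illustration rather than taken from the source; the variant summary statistics are hypothetical, and the sketch assumes uncorrelated, valid instruments and ignores uncertainty in the variant-exposure associations.

```python
import math

# Hypothetical GWAS summary statistics, one tuple per genetic variant:
# (beta for exposure, beta for outcome, standard error of outcome beta)
variants = [
    (0.10, 0.050, 0.010),
    (0.20, 0.100, 0.020),
    (0.15, 0.075, 0.015),
]

def wald_ratios(data):
    # Per-variant causal estimate: outcome effect divided by exposure effect.
    return [by / bx for bx, by, _ in data]

def ivw_estimate(data):
    # Fixed-effect IVW: weighted regression of outcome betas on exposure
    # betas through the origin, with weights 1 / se_outcome^2.
    num = sum(bx * by / se ** 2 for bx, by, se in data)
    den = sum(bx ** 2 / se ** 2 for bx, _, se in data)
    return num / den, math.sqrt(1.0 / den)

ratios = wald_ratios(variants)
estimate, std_error = ivw_estimate(variants)
```

Because every hypothetical variant here has an outcome effect exactly half its exposure effect, the Wald ratios and the pooled estimate agree; with real data they would differ, and heterogeneity between the ratios would be one signal of possible pleiotropy.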