Redefining precision care
Individualized treatment, e.g. tailored pharmacotherapy, is not the sole component of precision medicine. From a utilitarian point of view, it may be useful to break down precision medicine by its components across the continuum of care (Fig. 1a), to be met under specific time constraints:
-
Disease prevention, or prediction of disease risk before the disease symptoms manifest,
-
Differential diagnosis, or timely/instantaneous identification of an illness, and
-
Disease treatment, i.e. strategies to cure or optimally treat once disease has been identified.
These components reflect the move from a focus on treatment only in health care to include also prevention, as well as prognosis, and post-disease survivorship, as critical aspects of medicine and health in general.
In the context of disease prevention, the goal of precision medicine is accurate prediction of disease risk –even years in advance. For instance, many risk factors associated with lung cancer are well understood, and even though behavior change is difficult, the timeframe is sufficient to intervene in order to decrease risks. For all diseases with potentially severe health outcomes, whose etiology is not entirely understood, prediction modelling should be used to identify disease markers as early as possible. However, prediction models are only useful if they can include risk factors that are modifiable –such as dietary habits and lifestyle (because genes, age, race are not). Such models would allow changing the odds of the disease onset, and possibly with enough time for an intervention. In many modelling approaches, as we will see in the next sections, the value of the data from the modifiability point of view is not taken in the account, as well as from a ‘comprehensibility’ point of view (i.e. understanding the mechanics of the underlying biological processes by decomposing the model functions).
With differential diagnosis, the timeframe is reduced to a matter of days or even hours. Acute abdominal pain can have very different etiology, ranging from aortic aneurysm to kidney stones, or peritonitis. Mistaking a chronic condition, e.g. Meniere’s, can severely affect quality of life [13]. Besides genetic markers, in this case we can think of high-sensitivity tests with rapid turnaround time, like metagenomics sequencing to screen for multiple pathogens. Many diseases and diagnoses are still not clearly defined: “Descriptions of disease phenotypes often fail to capture the diverse manifestations of common diseases or to define subclasses of those diseases that predict the outcome or response to treatment. Phenotype descriptions are typically sloppy or imprecise” [14, 15]. For instance, asthma is an umbrella disease, with possibly different underlying endotypes [16]. Rheumatoid arthritis and its related gamut of symptoms, or other types of autoimmune conditions, as well as Alzheimer’s disease are considered system-level illnesses [17]. Therefore, one additional constituent of precision medicine is the notion of ‘precise phenotyping’ (Fig. 1b).
The last point, i.e. disease treatment, is the last stand against unfavorable health outcomes. It necessarily builds upon the former two and moves forward by adding more complexity, i.e. the space of treatments. Different outcomes can be obtained by running prediction models that set a patient’s status (e.g. genes, metabolic profile, drug exposures) and vary treatments (e.g. new drugs, diet, physical activity). Treatment optimization, seen as an operational research problem, explores this outcome prediction space looking for the most favorable ones.
Genetic epidemiology: the big short
The widespread availability of sequencing methods along with a drastic reduction of their associated costs were largely responsible for the rise and evolution of precision medicine. Today, genome-wide sequencing can cost about $1000, down from close to $98 M in 2001. Moreover, several commercial companies are offering services that provide partial genome sequencing for a little over $100, along with mapping to demographic traits and specific disease conditions (yet with unproven clinical utility), which isn’t without raising concerns related to privacy of health information [18]. Although a genome stays relatively immutable during a lifetime, a genomic screening obtained at birth or before birth will be optimal for the most accurate disease prediction [19]. Despite the initial enthusiasm for genetics-focused precision medicine, the results have been underwhelming and have not delivered on its promises so far. The predictive ability and ensuing clinical utility of risk assessment from genetic variations has been found to be modest for many diseases [20, 21], and genome-wide association studies (GWAS) have not led to understanding genetic mechanisms underlying the development of many diseases [22].
Among the shortcomings of GWAS, one is the missing heritability problem –heritability is a measure of the proportion of phenotypic variation between people explained by genetic variation– for which single genetic variations cannot account for much of the heritability of diseases, behaviors, and other phenotypes.
Another limitation of GWAS relates to studying a single phenotype or outcome (often imprecise, as we pointed out in the previous section), and accounting for heterogeneous phenotypes would require studies massive in size [23]. Other GWAS issues include design, power, failure of replication, and statistical limitations. In practice, only univariate and multivariable linear regression is performed. Looking at gene-gene interactions, and including other variables rather than basic demographics or clinical traits is rarely done and often computationally burdensome.
There are very few examples of high-effects common genetic variants influencing highly-prevalent diseases, and common genetic variants usually have low predictive ability. The rarer a genetic variant, the harder it is to power a study and ascertain the effect size. There are rare high-effect alleles causing Mendelian diseases and a glut of low-frequency variants with intermediate effects. Low-effect rare variants are very difficult to find and may be clinically irrelevant, unless implicated in more complex pathways [24]. In fact, it is known that gene expression pairs can jointly correlate with a disease phenotype, and higher-order interactions likely play a role too [25,26,27]. Few algorithms have been proposed to seek jointly-expressed genes, and existing methods are computationally inefficient [28].
However, these issues are only partially responsible for precision medicine not yet meeting its original promises. Indeed, for precision medicine and precision public health models to be valid and effective, incorporating and testing factors beyond genetics is key. While genetics remains mostly static over time, other health-related factors are constantly changing and need to be evaluated periodically. Epigenetics, e.g. methylation data, which has a time component, can contribute to a relevant portion of unexplained heritability [29]. Cheaper and faster production of sequence data with next-generation sequencing technologies has opened the post-GWAS era, allowing for a whole new world of –omics [30, 31].
Domain-wide association studies
The GWAS revolution, and arguably saturation, has brought a surfeit of epigenome-wide –methylation-wide, transcriptome-wide– [32], microbiome-wide [33], and environment-wide association studies [34]. Interestingly, phenome-wide association studies reverse the canon, as all health conditions found in medical histories are used as variables and associated to single genetic traits [35].
Genomics, transcriptomics, metabolomics, and all other –omics can be seen as input domains to a prediction model. Merging of two or more domain-wide association studies is the next step toward a better characterization of disease mechanics and risks [36].
However, modelling and computational challenges arise with multi-domain integration, because of increased dimensions, variable heterogeneity, confounding, and causality. Formalizations of cross-domain-wide association studies, under the general umbrella term of multiomics, have been proposed [37,38,39]. Despite the cheaper and faster production of sequence data, most of the multiomics studies are limited by small samples: in general, the more heterogeneous the experimental data to be generated or the data sources to be included in the study are, the more difficult it is to get larger sample size.
The most interesting utility of multiomics, rather than prediction of health outcomes, is their ‘unsupervised’ analysis, i.e. the identification of patterns/endotypes that can help unveiling biological pathways, and eventually redefine disease spectra and phenotypes. However, there is mounting evidence that to ensure that precision medicine and precision public health deliver on their promises across the care continuum, we need to go beyond the –omics.
Beyond traditional domains
In the era of precision medicine, multi-domain studies need to extend beyond ‘omics’ data and consider other domains in a person’s life. Specifically, genetic, behavioral, social, environmental, and clinical domains of life are thought to be the five domains that influence health [40]. Further, the ubiquitous nature of Internet access and the widespread availability and use of smartphone technologies suggest that the clinical domain can be significantly enhanced with patient-generated data, such as physical activity data, dietary intake, blood glucose, blood pressure, and other similar variables that can be seamlessly collected using smartphones and wearable devices [41].
Moreover, such tools, combined with social networks platforms provide a window into the behavioral and social domains of health, data-rich environments that need to be considered in the context of precision medicine and precision public health, to create a ‘digital phenotype’ of disease [42]. For instance, images from Instagram have been used to ascertain dietary habits [43] in lieu of a food diary or dietary intake questionnaires, which can be inaccurate, and are cumbersome and time-consuming; Instagram, again, has been used to identify predictive markers of depression [44]. The passively collected data from Twitter can be used for insomnia types characterization and prediction [42]. Research into the environmental domain has shown that the environment in which we live in impacts our health and mortality [45]. However, research using non-traditional health-related data from these domains have been conducted with some success as well as with some controversy [46, 47].
The health avatar
As precision medicine fundamentally reduces to fine-grained, individual-centric data mining, the observational unit of such data, and pivot for domain linkage, can be defined with the theoretical model of the health avatar. The health avatar is a virtual representation of a person with all their associated health information (Fig. 2), and intelligent ways to manage and predict their future health status [48].
Even with the widespread use of electronic health records (EHR) and integrated data repositories, individuals are generally detached from their health information and opportunities to be actively involved in research remain limited, despite initiatives such as Apple’s HealthKit. Well-known barriers to linking and efficiently exploiting health information across different sites slow down healthcare research and the development of individualized care. Further, EHR are not translationally integrated with diagnostic or treatment optimization tools. A doctor can get and transfer lab results online, but then diagnoses are often made in a traditional manner, based on average population data.
The personal health record (PHR) is a collation of all health information from different healthcare providers or other sources that is stored in the cloud, a directly accessible property of the individual [49, 50]. The PHR is complementary to the EHR, which is usually stored at the provider level, with vendors’ software, such as Epic [51] or Cerner [52]. However, the health avatar should not simplistically be identified with the PHR, as the PHR is inherently passive, with little involvement from the patient. We now propose a model of what the modern health avatar should be, in an era of large patient-generated data sets. An individual can see their health information using a provider’s PHR, but cannot easily merge the information with data from other providers nor ask a provider to upload their data from the EHR to the PHR simply during a doctor’s visit, e.g. via a smartphone app. An intelligent algorithm that matches people to research studies based on their full medical history does not exist yet. Both doctors and patients who are interested in computer-aided diagnosis, usually have to upload information to a third-party service (e.g. to analyze susceptibility to antibiotics). Finally, data shares are cumbersome, not only in terms of steps required to respect ethical principles, practice, and to protect human subjects, which are necessary but could be modernized, but also because the only data considered reliable are those coming from EHR. This means that big data shares happen solely at the population level via institutional or corporations’ liaise. Research and analytics that follow are not streamlined; the long-awaited research objects –semantically rich aggregations of resources that bring together data, methods and people in scientific investigations– are still in their infancy [53]. Integration of different types and sources of data should retain original context and meaning while meaningfully mapping their relationships to other health-related variables; such semantic integration will need to be flexible and comprehensive.
Physical data integration of EHRs requires enormous efforts and resources, but currently is the most successful approach to health information linkage because it is supported by rigorous governance standards and solid infrastructure. Efforts like the national patient-centered clinical research network [54] is a prominent example. Data sharing for matching research participants, one of the long-awaited prerogatives of NIH, is finally being exploited, via ResearchMatch [55].
The health avatar should link all and new types of health-related data, from genomics, to the myriad of -omics, mobile and wearable technology-based, and environmental sources. These data capture information from other domains which impact health far greater than clinical care alone. Such integrations have already begun around the world with healthcare systems like Geisinger conducting genetic sequencing and returning some of the results to the patients and with initiatives such as electronic Medical Records and Genomics (eMERGE) and Implementing Genomics in Practice (IGNITE) networks [56]. However, these efforts have been limited to genomics. More generally, the health avatar should be able to connect with and exploit non-EHR information potentially useful for health assessment, even coming from highly unstructured sources, such as social media. This is exemplified recently with Epic partnering with Apple to allow Apple’s HealthKit to display patient’s EHR data. Epic’s App Orchard also allows the collection of wearable technology data and storage into the EHR. For instance, an artificial intelligence tool could process images from Instagram and Facebook/Twitter posts to ascertain dietary habits, and this information can then be used to populate a food questionnaire, encoded into some type of structured information and stored in the EHR. Moving out of the individual level, environment-level information pertinent to the individual –for instance through residence ascertainment or mobile geolocation– could also populate EHR fields, storing information such as exposure to allergens and pollutants.
However, the collation of non-standard data, e.g. momentary ecological assessment via Twitter, Facebook, or smartphone GPS monitoring, is prone to serious privacy and security concerns. Ubiquitous approaches must be foreseen, as in the Internet of things [57, 58]. Data integration, and even more data share, must be secure to meet popular support. In this sense, the research in differential privacy aims at developing new algorithms not only to protect identities, but also to generate masked or synthetic data that can be shared publicly and freely used for preliminary research [59,60,61]. While differential privacy has facilitated data sharing, it remains challenging to safely anonymize data while preserving all their multivariate statistical properties [62]. The individual-centric approach of the health avatar can facilitate the match of individuals with research programs, with blurred boundaries between clinical care and research, while respecting ethics but modernizing informed consent concepts.
In terms of active features, i.e. not only data storage, the health avatar would feature linkage to personalized predictive tools for health status. Within the context of appropriate ethics bylaws and informed consents, health avatars could directly feed individual-level health information to multiple research projects for creating new and more accurate precision medicine tools. This would require data privacy and protection measures to avoid identity or data theft and misuse. Further, wide access to patient-generated data, along with integration with clinical and health databases provide a unique opportunity to expand precision medicine to the population level. We discuss this specific expansion in the following section.
Precision public health
The Director of Office of Public Health Genomics at the Centers for Diseases Control and Prevention (CDC) defined ‘precision’ in the context of public health as “improving the ability to prevent disease, promote health, and reduce health disparities in populations by: 1) applying emerging methods and technologies for measuring disease, pathogens, exposures, behaviors, and susceptibility in populations; 2) developing policies and targeted implementation programs to improve health” [63]. Top priorities included: early detection of outbreaks, modernizing surveillance, and targeted health interventions. To achieve such improvements, comprehensive and real-time data to learn from are necessary. Epidemiology must expand surveillance on to multiple, different information domains, such as the Internet and social media, e.g. infodemiology [64]. Big data does not only mean large sample size or fine-grained sampling, but also large variety of variables. So far, the big data emphasis is on sequencing genomes population-wide [65, 66], but research has started to consider other domains, such as in integrating classical surveillance with geospatial modelling [67].
In yet another step-by-step guide to precision public health –focused on developing countries– better surveillance data, better data analyses, and rapid actions are urged: again, big data is the key, with emphasis on public data sharing, and on the data attributes of speed-accuracy-equity. Notably, this is an epidemiological projection of the canonical big data characteristics, known as the Vs [68, 69].
Winston Churchill famously stated that “healthy citizens are the greatest asset any country can have”, and to achieve health for all citizens, there needs to be a transition from precision medicine, which is individualized, to precision public health. In fact, precision medicine can be used to improve an individual’s health, but this does not necessarily translate into a uniform benefit for the population [70]. For instance, a precision medicine model tuned for majority of a population may improve the average health outcomes overall yet neglect minorities. To some extent, the term precision put next to population-wise priorities seems conflictual, and this may be due to application of single precision public health model to an entire population, rather than use of multiple segmented/cluster level models. Another fuzzy aspect of current precision public health approach is the lack of consensus on observational units used for inference or intervention [71]. Is it the individual? Is it a common geographic area? Is it a particular subpopulation? A theory-based approach in this sense would be useful, as we will show in the next section.
Precision public health has to face societal challenges, including racial disparities (both in terms of welfare and genetic background), environmental niches (e.g. tropical climate with higher rates of arboviral diseases, industrial areas with high pollution), and general ethical concerns (religious beliefs, political views). An individual-centric model such as the health avatar here poses a number of limitations, because it may lack of higher-level dynamics happening at the societal-environmental level (Fig. 3). Interestingly, such dynamics can also influence the individual itself, and therefore should be accounted for and projected on to the person-centric models.