Learning semi-supervised enrichment of longitudinal imaging-genetic data for improved prediction of cognitive decline

Background Alzheimer’s Disease (AD) is a progressive memory disorder that causes irreversible cognitive decline. Given that there is currently no cure, it is critical to detect AD at an early stage of the disease progression. Recently, many statistical learning methods have been presented to identify cognitive decline from temporal data, but few of these methods integrate heterogeneous phenotype and genetic information together to improve the accuracy of prediction. In addition, many of these models are unable to handle incomplete temporal data, which often forces the removal of records to ensure a consistent number of records across participants. Results To address these issues, in this work we propose a novel approach that integrates genetic data and longitudinal phenotype data to learn a fixed-length “enriched” biomarker representation derived from the temporal heterogeneous neuroimaging records. Armed with this enriched representation, a fixed-length vector per participant, conventional machine learning models can be used to predict clinical outcomes associated with AD. Conclusion The proposed method shows improved prediction performance when applied to data derived from the Alzheimer’s Disease Neuroimaging Initiative cohort. In addition, our approach can be easily interpreted, allowing for the identification and validation of biomarkers associated with cognitive decline.


Background
Alzheimer's disease (AD) is a neurodegenerative condition in which people suffer from the progressive deterioration of cognitive functions, such as memory, language, and judgment. The World Health Organization (WHO) predicts that AD will affect 75 million people by 2030 and 132 million people by 2050 [1]. To address this major public health challenge, it is critical to detect AD at an early stage from both the therapeutic and research standpoints. Recent works [2,3] have analyzed the progression of AD by modeling and predicting clinical assessments. Furthermore, in the last decade [3], rich neuroimaging measurements, such as magnetic resonance imaging (MRI), have been widely used to predict the clinical outcomes associated with AD.
Despite these efforts, many existing approaches [3,4] suffer from the following limitations. First, because many models carry out the learning tasks at each time point of the AD progression separately, they cannot leverage the temporal relationships across the longitudinal records.
Given that AD is a progressive neurodegenerative disorder, multiple consecutive records should be analyzed together to keep track of the disease progression. Ideally, modern statistical learning techniques should capture temporal variations in the records that are consistent with how we expect a progressive disease to behave. Second, temporal records are often missing at certain time points, which results in an inconsistent number of records per participant. This makes it difficult to apply traditional statistical methods that assume data are provided at all time points. Third, current longitudinal methods [4,5] tend to focus on measurements derived from MRI scans, such as FreeSurfer (FS) and voxel-based morphometry (VBM) features, rather than genotype information, such as single-nucleotide polymorphisms (SNPs). It is known [6] that genetic factors can be strong predictors of future cognitive decline; it is therefore important to integrate longitudinal phenotype measurements with genetic data that remain constant as AD develops. Finally, the clinical outcomes of participants assessed by cognitive ability tests, such as Rey's Auditory Verbal Learning Test (RAVLT), are often provided in resources such as the Alzheimer's Disease Neuroimaging Initiative (ADNI), and can be used as data labels for better predicting a future AD diagnosis. It is therefore of great interest to explore how such labeled data can be used to learn data representations with improved predictive power.
In an attempt to overcome the first limitation and uncover the temporal structure of brain phenotypes, several longitudinal prediction models [7,8] have been proposed. However, these models represent the temporal imaging records as a tensor, which inevitably increases the complexity of the prediction problem and requires that each participant has the same number of temporal observations. As a result, users of these approaches must discard samples whose number of records falls below a given threshold, potentially losing valuable information in the input data. Other approaches [9,10] have relied on imputation techniques to estimate the missing records. Yet these imputation methods may incur undesirable artifacts, which may introduce biases into the final predictions of the longitudinal models.
To handle the longitudinal multi-modal prediction problem with incomplete temporal neuroimaging records, in this work we propose a semi-supervised learning method that learns participant-specific projections to enrich the multi-modal phenotypic measurements, extending our earlier short conference paper [11]. We analyze the consecutive imaging records simultaneously and learn a projection for each participant. To take advantage of temporal and modality relationships, we introduce trace-norm regularization over the concatenation of all participant-specific projections to maintain their global consistency. Furthermore, a structured sparsity-inducing norm regularization is applied to learn group-structured representations of the genetic data, which are integrated with the enriched representations of the imaging data. Finally, our model factorizes the enriched biomarker representations, the available clinical scores, and the genetic biomarkers of participants with common participant representations. The aim of these factorizations is to extract the representation of a participant that is shared across the different modalities. As a result, the learned projections from imaging data are tightly coupled with the genetic modality and the available clinical scores. Provided with the learned per-participant projections, we can transform the multi-modal representations extracted from phenotypes of varied data sizes, together with the measurements of the genetic biomarkers, into an enriched biomarker representation with a fixed length. With a fixed-length vector per participant, we can freely make use of conventional machine learning models to predict clinical scores associated with AD.

Methods
In this section, we first formalize the problem of learning a fixed-length biomarker representation for each participant. We then gradually develop our learning objective. Finally, an efficient computational algorithm is derived to solve the proposed objective.

Notations and problem formalization
Throughout this paper, we write matrices as bold uppercase letters and vectors as bold lowercase letters. The i-th row, the j-th column, and the element at the i-th row and j-th column of a matrix M = [m_ij] are denoted as m^i, m_j, and m_ij, or e_i^T M, M e_j, and e_i^T M e_j, respectively, where we define e_j as the j-th column of the identity matrix I. When p ≥ 1, the ℓ_p-norm of a vector v ∈ ℜ^d is defined as ‖v‖_p = (Σ_{i=1}^d |v_i|^p)^{1/p}. The trace norm of M ∈ ℜ^{n×m} is defined as ‖M‖_* = Σ_{i=1}^{min{n,m}} σ_i, where σ_i is the i-th singular value of M.
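Both norms can be computed directly with NumPy; the following minimal sketch (illustrative values only) shows the two definitions above.

```python
import numpy as np

def lp_norm(v, p):
    # ℓp-norm of a vector: (sum_i |v_i|^p)^(1/p)
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

def trace_norm(M):
    # trace (nuclear) norm: sum of the singular values of M
    return np.sum(np.linalg.svd(M, compute_uv=False))

v = np.array([3.0, 4.0])
M = np.diag([2.0, 5.0])
print(lp_norm(v, 2))    # 5.0
print(trace_norm(M))    # 7.0
```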
Given a neuroimaging dataset, phenotypic measurements are usually described by the biomarkers extracted from brain scans. Mathematically, the medical records of the i-th participant in a studied cohort can be denoted as X_i ∈ ℜ^{d×n_i}, which collects the available medical records of the i-th participant from the baseline (first time point) to the second-to-last visit, such that the total number of medical records of the i-th participant is n_i + 1. We note that n_i varies across the dataset due to inconsistent/missing temporal records of the participants. We use x_i ∈ ℜ^d to denote the last medical record of the i-th participant and use X = [x_1, · · · , x_n] to summarize these records for all the participants in the studied cohort. Because multiple types of biomarkers, such as VBM and FS markers, can be extracted from the set of brain scans, we concatenate the vector representations of these biomarkers as the phenotypic assessment of a participant. The records X_i, together with x_i, describe the temporal changes of the phenotypes of the i-th participant over time; X_i is a summarization of the dynamic measurements of the i-th participant, which is also broadly called the longitudinal measurements in the literature of medical image computing [4,7,8,12,13]. To make use of X_i and x_i together, we can use longitudinal enrichment to learn a fixed-length vector from them [14-17]. Specifically, we learn a projection tensor W = {W_1, · · · , W_n} with W_i ∈ ℜ^{d×r_1}, by which we can compute the fixed-length biomarker representations Z = [z_1, · · · , z_n] ∈ ℜ^{r_1×n} for the entire cohort, i.e., we project x_i by W_i for the i-th participant by computing z_i = W_i^T x_i ∈ ℜ^{r_1}. A schematic illustration of the projected (enriched) biomarker representations is shown in Fig. 1.
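The enrichment step above can be sketched as follows. The dimensions are hypothetical, and the random matrices merely stand in for the learned projections W_i (which the paper learns jointly by minimizing the objectives developed below); the point is that a fixed-length z_i is obtained for every participant regardless of how many records n_i that participant has.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r1 = 6, 2                      # biomarker and enriched dimensions (illustrative)
n_records = [3, 5, 4]             # n_i varies across participants

# X_hist[i]: d x n_i matrix of past records; x_last[i]: last record (a d-vector)
X_hist = [rng.standard_normal((d, n)) for n in n_records]
x_last = [rng.standard_normal(d) for _ in n_records]

# W[i]: participant-specific d x r1 projection (random placeholders here)
W = [rng.standard_normal((d, r1)) for _ in n_records]

# enriched fixed-length representation z_i = W_i^T x_i, stacked as Z
Z = np.column_stack([W[i].T @ x_last[i] for i in range(len(W))])
print(Z.shape)   # (2, 3): r1 x n, regardless of each participant's n_i
```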
In addition to the phenotypic measurements in a neuroimaging dataset, genotypes of the same cohort may also be available, such as the SNP profiles of the participants, which can be represented by X^SNP = [x_1^SNP, · · · , x_n^SNP], where x_i^SNP is the vector representation of the SNP profile of the i-th participant.
Here we note that X^SNP is static and does not vary over time as AD develops.
Besides the input phenotypic (dynamic) and genotypic (static) data, the outputs of the prediction tasks are the cognitive statuses of the participants, which are usually assessed by the clinical scores of a set of cognitive tests. We use Y_l ∈ ℜ^{c×l} to list the clinical scores of the first l participants at their last visits, where c is the total number of clinical scores studied in our work. Without loss of generality, we consider the first l samples as the labeled data for training. Y_l can thus be used as labeled data, enabling us to learn the data representation with supervision, which could potentially improve the predictive power of the learned data representations.
In the following subsection, we will develop our learning objective gradually.

Our objective
We start by learning the representations of the static genetic data to utilize the group structures of SNPs [18,19]. Recent developments in high-throughput genotyping techniques allow new methods to investigate the effect of genetic variation on brain structures and functions. Many previous association studies treated the SNPs as independent units and ignored the underlying relationships between them. However, multiple SNPs from the same gene are naturally related, such that these SNPs often jointly perform genetic functions together. To incorporate the group structures associated with SNPs, we propose to learn the representations of the genetic data of a studied cohort by minimizing the following objective:

min_{H_0, G_0} ‖X^SNP − H_0 G_0‖_{2,1} + γ_1 ‖H_0‖_{G_{2,1}}.   (1)

In Eq. (1), the first term factorizes X^SNP into H_0 and G_0, where H_0 can be seen as a compressed view of the SNP features [20] and G_0 describes the new representations of the n participants in the subspace spanned by H_0 [21]. To find the group structure of SNPs, we leverage linkage disequilibrium (LD) [22], which defines the non-random association between alleles at various loci. We then capture the group-wise sparsity in H_0 by making use of the group ℓ_{2,1}-norm (G_{2,1}-norm) regularization term ‖H_0‖_{G_{2,1}} = Σ_{k=1}^K ‖H_0^k‖_F, where H_0 consists of K row groups derived from the LD correlations of the SNPs [18,19]. Here we choose to use ℓ_{2,1}-norm distances to improve the robustness of our model against outliers [23-26].
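The group-wise penalty can be illustrated with a small sketch; the row groups below are hypothetical stand-ins for LD blocks of SNPs. Rows belonging to the same group are penalized jointly, which is what drives entire groups toward zero.

```python
import numpy as np

def group_l21_norm(H, groups):
    # Sum over groups of the Frobenius norm of the rows in each group;
    # this induces sparsity at the group (e.g., LD block) level.
    return sum(np.linalg.norm(H[g, :]) for g in groups)

H = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])
groups = [[0, 1], [2]]            # two hypothetical LD blocks of SNP rows
print(group_l21_norm(H, groups))  # 5.0 + 1.0 = 6.0
```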
Next, we study how to learn a fixed-length vector representation for every participant from their imaging data of varied sizes. While the genetic profiles of the participants remain constant over time, the functions and structures of the participants' brains change as AD progresses. Therefore, AD progression is characterized by the longitudinal imaging records extracted from the multiple brain scans that change over time. However, the longitudinal imaging records pose a critical challenge for building predictive models, because different participants may take brain scans at different times, and the numbers of brain scans of different participants are not the same in general. To deal with this difficulty and summarize the brain variations of every participant individually, we propose to learn a fixed-length vector representation from the imaging data of each participant {X_i, x_i} with the varied size n_i by computing z_i = W_i^T x_i ∈ ℜ^{r_1}. First, to preserve as much dynamic information of X_i as possible, we propose to learn the projection W_i for the i-th participant by minimizing the following objective of principal component analysis (PCA) [27]:

min_{W_i} ‖X_i − W_i W_i^T X_i‖_{2,1},  s.t. W_i^T W_i = I.   (2)

Here again we use the ℓ_{2,1}-norm objective in the PCA to enhance the robustness of the learned projection W_i against outlying samples, which are unavoidable in a large dataset [23-26].
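A minimal sketch of such a robust PCA follows, solved with an iterative reweighting strategy: columns with large reconstruction residuals are down-weighted, which limits the influence of outlying samples. The specific weighting rule below is a standard choice for ℓ2,1 objectives and is an assumption, not necessarily the exact scheme of the paper.

```python
import numpy as np

def robust_pca_l21(X, r, iters=30, eps=1e-8):
    """Sketch of ℓ2,1-norm PCA: min_W ||X - W W^T X||_{2,1}, s.t. W^T W = I,
    solved by iteratively reweighted eigendecomposition (assumed scheme)."""
    d, n = X.shape
    w = np.ones(n)                       # per-sample weights
    for _ in range(iters):
        C = (X * w) @ X.T                # weighted scatter matrix
        vals, vecs = np.linalg.eigh(C)
        W = vecs[:, -r:]                 # top-r eigenvectors
        resid = np.linalg.norm(X - W @ (W.T @ X), axis=0)
        w = 1.0 / (2.0 * np.maximum(resid, eps))   # down-weight outliers
    return W

X = np.random.default_rng(1).standard_normal((6, 40))
W = robust_pca_l21(X, 2)
# the learned projection is orthonormal: W^T W = I
print(np.allclose(W.T @ W, np.eye(2)))   # True
```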
Second, besides using the projection learned from each individual participant separately, to maximize the consistency across all the learned projections for the same cohort, we enforce low-rank consistency on the learned projection matrices by introducing two trace-norm regularization terms, ‖W_(1)‖_* and ‖W_(2)‖_*, as in [7,13,17], where W_(1) ∈ ℜ^{d×(r_1 n)} and W_(2) ∈ ℜ^{r_1×(dn)} are two unfolded matrices of the local projection tensor W.
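These penalty terms can be illustrated as follows, with illustrative dimensions; the exact unfolding convention is an assumption, since the text does not spell out the layout of the two unfoldings.

```python
import numpy as np

# stack participant projections W_i (d x r1) into a tensor of shape (d, r1, n)
d, r1, n = 5, 2, 4
T = np.random.default_rng(2).standard_normal((d, r1, n))

# two unfoldings used for the low-rank (trace-norm) consistency terms
# (assumed layout: mode-1 keeps d as rows, mode-2 keeps r1 as rows)
W1 = T.reshape(d, r1 * n)                     # d x (r1 n)
W2 = np.moveaxis(T, 1, 0).reshape(r1, d * n)  # r1 x (d n)

trace_norm = lambda M: np.linalg.svd(M, compute_uv=False).sum()
penalty = trace_norm(W1) + trace_norm(W2)
print(penalty > 0)   # True
```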
Finally, equipped with the learned representations of the imaging features in multiple modalities and of the genetic features, we integrate them to explore the full potential of an imaging-genetic dataset. First, we write the temporally enriched representations of the imaging data together as Z = [z_1, . . . , z_n] = W^T ⊗ X ∈ ℜ^{r_1×n}. Following the same idea as before, we factorize Z and align the factorized data representation with that learned from the static genetic data, G_0, by minimizing a coupled factorization objective, where γ_1, γ_2, · · · , γ_7 are the hyperparameters of our learning model. We can then perform association studies between the clinical scores and the new data representations learned by our model. Suppose that the clinical scores Y_l ∈ ℜ^{c×l} are obtained from c cognitive assessments for the l training samples; we use F = [F_l, F_u] ∈ ℜ^{c×n} to denote our estimated clinical scores and use the constraint F_l = Y_l to make use of the training data Y_l, by which we can conduct the regression analyses by minimizing the resulting overall objective in Eq. (5). Our new method is schematically illustrated in Fig. 2.

The solution algorithm
Although our objective in Eq. (5) is clearly motivated, it is difficult to solve in general because it is non-smooth. Thus, in this subsection we derive an efficient solution to optimize our objective. Using the optimization framework presented in earlier work [28,29], which proposed an iteratively reweighted method to solve non-smooth objectives, we can solve Eq. (5) by an iterative procedure (Algorithm 1 in [28]) in which the key step is to minimize a smoothed objective, Eq. (6), where d_k denotes the number of rows (the number of SNPs) of the k-th block of H_0. To minimize the smoothed objective in Eq. (6), we use the Alternating Direction Method of Multipliers (ADMM) proposed in [30,31]. By introducing two more constraints, A = U and B_i = W_i, to decouple U and W, we rewrite Eq. (6) as the equivalent objective in Eq. (8), where Λ_1, Λ_{2,i}, Λ_3, and Λ_{4,i} are the Lagrangian multipliers for the constraints F_l = Y_l, W_i^T W_i = I, A = U, and B_i = W_i, respectively. The detailed algorithm to minimize Eq. (8) is presented in Algorithm 1. In Algorithm 1, we use the solution of a Sylvester equation.

Results
In this section, we present our experimental results on the clinical score prediction task using both the enriched and the original biomarker representations, to evaluate the change in prediction performance from enrichment. We then analyze the AD risk factors identified by the learned projections.

Data preparation
We obtained the data used in our experiments from the ADNI database. We downloaded the MRI scans, SNP genotypes, and the longitudinal scores of Rey's Auditory Verbal Learning Test (RAVLT) of 821 ADNI-1 participants. We performed voxel-based morphometry (VBM) and FreeSurfer automated parcellation on the MRI data as described in [12] and extracted mean modulated gray matter (GM) measures for 90 target regions of interest (ROIs). We followed the SNP quality control steps discussed in [32]. Among the 821 ADNI-1 participants, 412 participants were selected on the basis of having MRI records at Month 0, Month 6, Month 12, and Month 24. We then intentionally discarded the Month 24 scans with 50% probability to evaluate the learning capability of our model on longitudinal data with missing records. Our model learns the enrichment from the neuroimaging records from baseline to the second-to-last visit, and projects the last record (Month 12 or Month 24, each with 50% probability) to predict the clinical scores at the last time point.

Experimental settings
In our experiments, we aim to predict the RAVLT clinical scores in the test set using two types of inputs: the learned enriched representation and the original representation of the most recent biomarkers. We use different concatenations of the SNP, FS, and VBM modalities to assess the prediction performance of our model with diverse modalities. We split the dataset into a training set and a test set with proportions of 80% and 20%, so the number of participants is l = 323 in the training set and n − l = 89 in the test set. The SNPs and MRI images of all n participants, and the clinical scores of only the l participants in the training set, are provided to our model to learn the enriched representation. To predict the n − l clinical scores in the test set, we use the following conventional prediction models: Ridge Regression (RR), Convolutional Neural Network (CNN), and Support Vector Regression (SVR), the regression version of the Support Vector Machine. We conduct a 5-fold cross-validation to search for the best hyperparameters of each conventional model. However, a naive grid search can be time-consuming, especially when a combination of many hyperparameters is tuned. Instead of trying all combinations of hyperparameters, we randomly choose a value from the grid of each hyperparameter. To increase the possibility of finding better hyperparameters in fewer searches, a randomly selected half of the hyperparameters retain the best values found in the previous searches. In the 5-fold cross-validation, we search for the best regularization parameter of RR in {10^3, 10^2, 10, 1, 10^{-1}, 10^{-2}, 10^{-3}}. For SVR, we fine-tune the kernel function (sigmoid or radial basis function) and the box constraint in {10^3, 10^2, 10, 1, 10^{-1}, 10^{-2}, 10^{-3}}. We construct a 1-dimensional CNN configured as follows: (1) a convolutional layer with a window size of 5 × 16 (width × depth), followed by a rectified linear unit (ReLU) and a max pooling layer with a window size of 1 × 2; (2) a convolutional layer with a window size of 10 × 32, followed by a ReLU and a max pooling layer with a window size of 1 × 2; (3) three fully connected layers, where the number of nodes and the dropout rate of each layer are fine-tuned by searching the grids {20, 60, 120} and {0.3, 0.5, 0.7}, respectively. The hyperparameters of our enrichment model are tuned in the same manner.
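The search strategy described above can be sketched as follows. Here `score_fn` is a hypothetical stand-in for the 5-fold cross-validation score, and the grids are the ones listed above; each trial re-samples roughly half of the hyperparameters while the other half keep the best values found so far.

```python
import random

grids = {
    "reg":   [1e3, 1e2, 10, 1, 1e-1, 1e-2, 1e-3],
    "nodes": [20, 60, 120],
    "drop":  [0.3, 0.5, 0.7],
}

def random_search(score_fn, n_trials=20, seed=0):
    """Sketch: keep a randomly chosen half of the hyperparameters at their
    best-known values, re-sample the rest, and accept improvements."""
    rng = random.Random(seed)
    best = {k: rng.choice(v) for k, v in grids.items()}
    best_score = score_fn(best)
    for _ in range(n_trials):
        cand = dict(best)
        keys = list(grids)
        rng.shuffle(keys)
        for k in keys[: (len(keys) + 1) // 2]:   # re-sample about half
            cand[k] = rng.choice(grids[k])
        s = score_fn(cand)
        if s > best_score:
            best, best_score = cand, s
    return best

# toy score (hypothetical) that prefers reg=1, nodes=60, drop=0.5
toy = lambda p: -abs(p["reg"] - 1) - abs(p["nodes"] - 60) - abs(p["drop"] - 0.5)
print(random_search(toy)["nodes"] in (20, 60, 120))   # True
```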

Original vs. enriched representation
In the experimental results reported in Fig. 3, we compute the Root Mean Squared Error (RMSE) between the ground-truth clinical scores and the clinical scores predicted from both the original and the enriched representations. The results reveal that the predictions from the enriched representation are mostly more accurate (by 9.84% on average) than the predictions from the most recent record in the original representation. Interestingly, among the various concatenations of biomarker modalities, the performance improvement of our enriched representation is larger when more modalities are given. This indicates that our model fully utilizes the multi-modal dynamic data. In particular, the error gap is smallest when only SNPs are given. We suppose that this is because SNPs are static data that do not change over time; genetic data alone cannot provide enough information about the temporal variations of cognitive decline, while our model is designed to learn the temporal variations when dynamic data are given.
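The reported error metric is computed as follows (toy values for illustration):

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root Mean Squared Error between ground-truth and predicted scores
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # sqrt(4/3) ≈ 1.1547
```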

Identification of disease-relevant biomarkers
In addition to the cognitive outcome prediction task, we identify AD-relevant biomarkers using the weights of the learned projections. Since the p-th feature of the enriched representation, e_p^T W_i^T x_i, is a weighted summation of the original biomarker measurements, the summed weights in the q-th row of the projections, Σ_{i=1}^n ‖e_q^T W_i‖_1, can be interpreted as the AD relevance of the q-th biomarker.
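This relevance score can be computed directly from the learned projections; a small sketch with toy matrices, where each W_i is a participant-specific d × r_1 projection:

```python
import numpy as np

def biomarker_relevance(W_list):
    """Relevance of biomarker q as sum_i ||e_q^T W_i||_1, i.e., the total
    absolute projection weight placed on row q across all the
    participant-specific projections W_i (each d x r1)."""
    return sum(np.abs(W).sum(axis=1) for W in W_list)

W_list = [np.array([[1.0, -2.0], [0.0, 0.5]]),
          np.array([[0.5, 0.0], [1.0, 1.0]])]
print(biomarker_relevance(W_list))   # [3.5 2.5]
```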

Identified Neuroimaging Biomarkers
In this respect, we first identify the AD-relevant imaging biomarkers by plotting the weights of the ROIs of VBM and FS in Fig. 4. These brain regions all appear in the medical literature associated with AD-related dementia. For example, participants with cognitive decline showed atrophy of the caudate nucleus [33]. The volume of the thalamus was significantly reduced [34] in participants diagnosed with probable AD. Furthermore, it has been found that the thalamus plays an important role in generating attention, and the anterior thalamus is in charge of declarative memory functioning [35]. The hippocampus is vulnerable to damage from AD [36] and has been shown to affect long-term memory and spatial navigation in participants with AD. Finally, the amygdala, also identified by our approach, is severely affected by AD [37] and is associated with emotional response and decision-making.
Identified Genotypic Biomarkers In addition, we identify AD-relevant SNPs by plotting the weights on individual SNPs and on AlzGene groups of SNPs in Figs. 5 and 6. In Fig. 6, we download and use the AlzGene grouping information constructed from multiple genome-wide association studies listed at http://www.alzgene.org/ [38]. The standard deviation of the weights of the SNPs in each AlzGene group is displayed as a line at the head of each bar. The top identified individual SNP in Fig. 5, rs10779339, has been found to be related to cognitive decline [39]. Among the AlzGene groups in Fig. 6, the SNPs in the ACE (angiotensin-converting enzyme) group have been found to reduce the amyloid beta peptide (Aβ), which is commonly observed in the progression of AD-related cognitive decline [40]. Furthermore, the APOE (apolipoprotein E) gene, also identified by our approach, is involved in Aβ aggregation and clearance [41].
In summary, the complex relationships between cognitive ability and the identified biomarkers are clearly captured by our method and are well supported by previous AD research studies. This result supports the utility of our approach as a tool to discover and validate AD risk factors from multi-modal data.

Discussion
Because our enrichment model incorporates both genetic and phenotypic biomarkers, it is possible to learn the enrichment from different multi-modal data to predict different target labels. As shown in the experimental results in Fig. 3, the prediction performance is further improved as more modalities of data are given. For example, Diffusion Tensor Imaging (DTI) is an effective tool to investigate the white matter organization of the brain and AD progression [42]. Compared to MRI, blood-based biomarkers can be measured less expensively and less intrusively, and Aβ levels in blood can aid in the early diagnosis of AD [43]. We can learn projections for these additional modalities and concatenate the projections in Eq. (2) and Eq. (5). Even considering that the complexity of Algorithm 1 is proportional to d^3 and that the dimensionality d increases with more modalities of measurements, our model can be flexibly applied to diverse datasets and prediction tasks.

Conclusion
Missing data is a major issue in longitudinal multi-modal healthcare datasets. This research devises a novel methodology to learn a consistent-length representation for all the participants in the ADNI dataset. The learned biomarker representation summarizes the genetic biomarkers and their group structure, the known clinical scores, and all the available longitudinal biomarker records on a per-participant basis. Our experiments show that the learned enriched representation outperforms the original measurements in predicting the clinical scores. Finally, the identified AD-relevant biomarkers are in good accordance with existing research findings, indicating the utility of our approach.

Appendix
In this section, we show how Algorithm 1 is derived from our smoothed objective, Eq. (6). ADMM solves convex optimization problems by breaking them into smaller pieces that are easier to handle. Specifically, given an objective with an equality constraint, as in Eq. (9), Algorithm 2 solves the problem by decoupling it into subproblems and optimizing each variable while fixing the others, where y is the Lagrangian multiplier of the constraint h. We extend Algorithm 2 from two variables to Algorithm 1 over the set of variables {U, F, W, H_0, H_1, G_0, G_1, A, B}. Specifically, we obtain an update equation for each variable in Algorithm 1 by taking the derivative with respect to that variable while fixing the others and setting it to the zero matrix. This is done in turn for steps 2 through 11 of Algorithm 1, where step 7 is taken for 1 ≤ p ≤ c, step 8 for 1 ≤ p ≤ l, step 9 for l + 1 ≤ p ≤ n, and step 11 for 1 ≤ p ≤ r_1. By setting the derivative of each step to the zero matrix, the update equations in Algorithm 1 are obtained.
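The generic ADMM pattern of Algorithm 2 can be sketched on a simpler, self-contained problem. The lasso below is not the paper's objective, but it uses the same split-and-update structure: a smooth subproblem, a proximal step, and a multiplier update for the coupling constraint x = z.

```python
import numpy as np

def soft(v, t):
    # soft-thresholding, the proximal operator of the ℓ1 norm
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(A, b, lam=0.1, rho=1.0, iters=200):
    """Generic ADMM pattern sketched on min (1/2)||Ax-b||^2 + lam||z||_1
    subject to x = z: update each variable in turn, then the multiplier."""
    n = A.shape[1]
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    AtA, Atb = A.T @ A, A.T @ b
    L = np.linalg.inv(AtA + rho * np.eye(n))
    for _ in range(iters):
        x = L @ (Atb + rho * (z - u))    # x-update: smooth subproblem
        z = soft(x + u, lam / rho)       # z-update: proximal step
        u = u + x - z                    # dual (multiplier) update
    return z

rng = np.random.default_rng(3)
A = rng.standard_normal((30, 5))
x_true = np.array([1.5, 0.0, -2.0, 0.0, 0.0])
z = admm_lasso(A, A @ x_true, lam=0.05)
print(np.abs(z - x_true).max() < 0.5)   # True: recovers the sparse signal
```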

Fig. 1
Fig. 1 Illustration of the original and enriched biomarker representations. The goal of the enrichment model is to learn the set of projections W and project the last record. As a result, the dimensionality r_1 of the enriched representation is much smaller than the dimensionality d of the original representation

Fig. 2
Fig. 2 Overview of the proposed semi-supervised learning framework to fully utilize the potential of a longitudinal AD dataset. We use factorization to extract the common representations of participants shared across the genetic, imaging, and clinical score data. As a result, the genetic and clinical score data are reflected in the learned projections W

Here sylvester(P, Q, R) gives the unique and exact solution for X of the equation PX + XQ = R. The time complexity of Algorithm 1 is O(n r_1 d^2 (d + r_1)) per iteration, where step 11 is the most dominant. The detailed derivation of Algorithm 1 is provided in the Appendix.
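A self-contained sketch of the Sylvester solve follows. SciPy's scipy.linalg.solve_sylvester provides the same solution via a Schur decomposition; the direct vectorization below uses only NumPy and is practical only for small matrices.

```python
import numpy as np

def solve_sylvester(P, Q, R):
    """Solve P X + X Q = R by vectorization: using vec(AXB) = (B^T ⊗ A) vec(X),
    the equation becomes (I ⊗ P + Q^T ⊗ I) vec(X) = vec(R) in column-major vec.
    The solution is unique when P and -Q share no eigenvalues."""
    n, m = R.shape
    K = np.kron(np.eye(m), P) + np.kron(Q.T, np.eye(n))
    return np.linalg.solve(K, R.flatten("F")).reshape((n, m), order="F")

P = np.array([[3.0, 1.0], [0.0, 2.0]])
Q = np.array([[1.0]])
X_true = np.array([[1.0], [-2.0]])
R = P @ X_true + X_true @ Q
print(np.allclose(solve_sylvester(P, Q, R), X_true))   # True
```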

Fig. 3
Fig. 3 Comparison of prediction errors from the original (blue) and enriched (orange) representations. The percentage next to each model name indicates the decrease in error of the predictions from the enriched representation. We also plot the standard deviation over the 5-fold cross-validation at the head of each bar

Fig. 4
Fig. 4 Visualization of the weights distribution over the brain regions. The darker color indicates a larger weight on that region. The top four AD-relevant regions identified in FS are the Right Caudate, Brodmann Area 24, Left Thalamus, and Left Caudate; in VBM, the Left Hippocampus, Left Amygdala, Left Thalamus, and Right Medial Orbito-frontal Cortex

Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California. We thank Dr. Heng Huang of the Department of Electrical and Computer Engineering, University of Pittsburgh, and Dr. Li Shen of the Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, for providing us with the parsed ADNI data used in the experiments of this study. About this supplement This article has been published as part of BMC Medical Informatics and Decision Making, Volume 24 Supplement S1, 2024: Selected articles from the 19th Asia Pacific Bioinformatics Conference (APBC 2021). The full contents of the supplement are available at URL.

Fig. 6
Fig. 6 Weights on the AlzGene groups. The number next to each AlzGene group name denotes the number of SNPs in that group

Seo et al. BMC Medical Informatics and Decision Making (2024) 24:61

Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie; Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada.