Table 5 Grouping of predictors from the studies

From: Prediction and diagnosis of depression using machine learning with electronic health records data: a systematic review

Predictor group

Commentary

Comorbidities

Comorbidities were included in 13 studies. They comprised long-term conditions such as diabetes, asthma, epilepsy, and chronic pain, and were used particularly where study authors highlighted theoretical links with depression.

Demographic

Demographic predictors were used in 16 studies. In some cases, specific demographic variables were excluded because of insufficient availability or coverage (often the case for ethnicity). Gender was included as a predictor and occasionally also used to build gender-specific models (e.g., Nichols et al. [59]). Social deprivation was also used as a predictor, and two studies, Nemesure et al. [58] and Nichols et al. [59], used information about missed immunization(s) as a proxy for social deprivation.

The age range of cases was often integral to each study’s specific aims. Age was treated either as a numeric predictor or as a means of dividing the study population into subgroups. Some studies focused on particular age groups. For instance, Sau and Bhakta [62] used data from older patients with an average age of 68.5 years (standard deviation 4.85 years), whereas Nichols et al. [59] focused on early diagnosis among young people aged 15 to 24 years. Some studies restricted the analysis to a narrow age bracket, while others included a wide range of ages. For example, Hochman et al. [51], who studied postpartum depression, reported an average age of 29.4 years (standard deviation 5.4 years), whereas Xu et al. [65] used data from participants aged from 18 to over 65 years.
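The two ways of handling age described above (a numeric predictor versus subgroups) can be illustrated with a minimal sketch. The band boundaries and labels below are hypothetical, chosen only so that one band matches the 15 to 24 range mentioned for Nichols et al. [59]; they are not the bands used by any of the reviewed studies.

```python
from bisect import bisect_right
from typing import Dict, Union

# Hypothetical age bands (lower bounds in years); not taken from any reviewed study.
AGE_BAND_LOWER_BOUNDS = [0, 15, 25, 45, 65]
AGE_BAND_LABELS = ["<15", "15-24", "25-44", "45-64", "65+"]


def age_band(age_years: float) -> str:
    """Map a numeric age to a categorical subgroup label."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    # bisect_right counts the lower bounds <= age, so subtracting 1
    # gives the index of the band that contains the age.
    index = bisect_right(AGE_BAND_LOWER_BOUNDS, age_years) - 1
    return AGE_BAND_LABELS[index]


def encode_age(age_years: float) -> Dict[str, Union[float, str]]:
    """Return both encodings: the raw numeric age and its subgroup label."""
    return {"age": float(age_years), "age_band": age_band(age_years)}
```

A model could use the numeric `age` field directly, while the `age_band` label could instead be used to split the population into separate subgroup models.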

Family History

Family history was used in five studies and included family history of abuse (physical/sexual) and of drug/substance abuse, often because the study authors cited theoretical links with depression. This group of predictors was often under-recorded: Nichols et al. [59], for example, removed family history data from their model because of its low prevalence (< 0.02%) in their data. Insufficient family history data was also highlighted as a limitation in other studies [53, 55].
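Removing a predictor because of low prevalence, as Nichols et al. [59] did for family history, can be sketched as a simple filter. This assumes each patient record is a dict of binary (0/1) predictor flags; the record layout, the predictor names `family_history_abuse` and `diabetes`, and the exact filtering rule are hypothetical illustrations, not the authors' pipeline. Only the 0.02% figure comes from the text above.

```python
from typing import Dict, List

# 0.02% expressed as a fraction; the rule applying it is an assumption.
MIN_PREVALENCE = 0.0002


def prevalence(records: List[Dict[str, int]], predictor: str) -> float:
    """Fraction of records in which a binary predictor is recorded as present (1)."""
    if not records:
        raise ValueError("no records supplied")
    present = sum(1 for record in records if record.get(predictor, 0) == 1)
    return present / len(records)


def drop_rare_predictors(
    records: List[Dict[str, int]],
    predictors: List[str],
    min_prevalence: float = MIN_PREVALENCE,
) -> List[str]:
    """Keep only predictors whose prevalence reaches the threshold."""
    return [p for p in predictors if prevalence(records, p) >= min_prevalence]
```

For example, with 10,000 records in which `family_history_abuse` is set once (0.01%) and `diabetes` is set 500 times (5%), only `diabetes` would be kept.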

Obstetric specific

Obstetric-specific predictors were used in five studies focussed on the prediction of postpartum depression; these included premature birth, use of specific drugs during pregnancy, and obesity. This type of predictor was also used in studies not concerned with postpartum depression, e.g., Abar et al. [49].

Psychiatric symptoms or other diagnoses

Psychiatric symptoms/diagnoses were used in 15 studies. These include depression-related symptoms such as anxiety, low mood, self-harm, sleep and eating disorders, and insufficient sleep. They also include a broader range of conditions, such as post-traumatic stress disorder, obsessive-compulsive disorder, personality disorders, and psychoses. Individual studies did not always distinguish between these two subgroups.

Smoking

Smoking was used in seven studies. However, some authors, for instance Nichols et al. [59], noted that smoking data may not be recorded for all participants and that this could limit the ability to reliably assess its association with depression; to mitigate this, Nichols et al. used “missing smoker” status as a separate predictor. Smoking was a categorical predictor in the selected studies.
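Treating missing smoking status as its own predictor, as described above, amounts to adding an explicit missing-value indicator alongside the categorical encoding. The sketch below assumes three hypothetical smoking categories and one-hot encoding; the actual categories and coding used by Nichols et al. [59] are not stated here.

```python
from typing import Dict, Optional

# Hypothetical smoking categories; real EHR coding schemes will differ.
SMOKING_CATEGORIES = ["never", "former", "current"]


def encode_smoking(status: Optional[str]) -> Dict[str, int]:
    """One-hot encode smoking status with an explicit missing_smoker flag."""
    features = {f"smoker_{c}": 0 for c in SMOKING_CATEGORIES}
    features["missing_smoker"] = 0
    if status is None or status == "":
        # No status recorded: flag it rather than imputing a category.
        features["missing_smoker"] = 1
    elif status in SMOKING_CATEGORIES:
        features[f"smoker_{status}"] = 1
    else:
        raise ValueError(f"unrecognised smoking status: {status!r}")
    return features
```

Keeping missingness as a separate feature lets a model learn whether the absence of a record is itself informative, instead of silently treating unrecorded patients as non-smokers.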

Social/family

Social and family-related factors were used in seven studies; these included bereavement, divorce, single-parent status, involvement of police or social services, and similar factors.

Somatic

Somatic conditions were used in 14 studies; these included physical conditions such as abdominal pain, back pain, dyspepsia, eczema, and headaches, among others.

Substance/alcohol abuse

Alcohol/substance abuse predictors were used in seven studies and identified participants with drug or alcohol abuse problems. These predictors were typically categorical, but some studies included levels of abuse and/or combinations of alcohol and substance abuse.

Visit frequency

Visit frequency was used in six studies and was shown to contribute significantly to model performance. It is an integer variable based on the number of visits to a primary care facility (e.g., an NHS GP) within a specified period.
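Because visit frequency is a count of visits within a specified period, it can be derived directly from visit dates. In this minimal sketch the 365-day look-back window and the "index date" (the date from which the window is measured) are assumptions for illustration; the reviewed studies defined their own periods.

```python
from datetime import date, timedelta
from typing import Iterable


def visit_frequency(visit_dates: Iterable[date], index_date: date, window_days: int = 365) -> int:
    """Count primary care visits in the window_days before index_date.

    Visits on index_date itself are excluded; the default 365-day
    window is a hypothetical choice.
    """
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    window_start = index_date - timedelta(days=window_days)
    return sum(1 for d in visit_dates if window_start <= d < index_date)
```

Excluding visits on or after the index date avoids counting the consultation at which depression is recorded, which would leak the outcome into the predictor.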

Word list/text

Word list/text-derived data was used in only one study, Geraci et al. [50], in which clinical notes were analysed using natural language processing to extract predictors. These predictors are based on specific language or defined terms.
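The general idea of deriving predictors from defined terms in clinical notes can be shown with a simple term-list matcher. This is a much simpler technique than the natural language processing used by Geraci et al. [50] and is not their method; the term list and predictor names are hypothetical.

```python
import re
from typing import Dict, List, Pattern

# Hypothetical term list: a predictor is set if any of its phrases appears.
TERM_LIST: Dict[str, List[str]] = {
    "low_mood": ["low mood", "feeling down"],
    "sleep_problem": ["insomnia", "poor sleep"],
    "self_harm": ["self-harm", "self harm"],
}


def _compile(terms: List[str]) -> Pattern[str]:
    # Case-insensitive, whole-word alternation of the escaped phrases.
    alternation = "|".join(re.escape(t) for t in terms)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


PATTERNS = {name: _compile(terms) for name, terms in TERM_LIST.items()}


def extract_term_features(note: str) -> Dict[str, int]:
    """Return a 0/1 flag per predictor indicating whether any of its terms occurs."""
    return {name: int(bool(p.search(note))) for name, p in PATTERNS.items()}
```

Simple matching like this does not handle negation (e.g. "denies low mood") or spelling variants, which is one reason full NLP pipelines are used on clinical text.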

Other measurements and predictors

Other measurements and predictors were used in 11 studies and included, for example, physical measurements such as blood pressure, cholesterol, assay results, and height/weight.

Note: There may be overlap or gaps in these groupings, as the predictors used and the reasons for their use are study-specific and not always explained.