Can We Improve Diagnosis of Depression with XGBOOST Machine Learning Model & a Large Biomarkers Dutch Dataset?

Research Paper Title

Improving Diagnosis of Depression With XGBOOST Machine Learning Model and a Large Biomarkers Dutch Dataset (n = 11,081).


Machine learning has been on the rise, and healthcare is no exception. Within healthcare, mental health is receiving growing attention. The diagnosis of mental disorders is based on standardised patient interviews that use a defined set of questions and scales, which is a time-consuming and costly process.

The researchers' objective was to apply a machine learning model and evaluate whether biomarker data have the predictive power to enhance the diagnosis of depression.

In this paper, they explored the detection of depression cases in a sample of 11,081 Dutch citizens. Most earlier studies used balanced datasets, in which healthy and unhealthy cases appear in equal proportions. In this study, however, only 570 of the 11,081 cases (about 5.1%) reported depression, which makes it an imbalanced classification problem. A machine learning model trained on an imbalanced dataset produces predictions biased toward the majority class, so it tends to label cases as “no depression” even when they are in fact cases of depression.
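To show why the imbalance matters, the following minimal sketch computes the metrics of a trivial classifier that always predicts “no depression”, using the class counts reported for the study (570 depressed, 10,511 not). It only illustrates the imbalance problem and is not the authors' code. The function name is hypothetical.

```python
# A minimal sketch (not the authors' code): metrics for a trivial classifier
# that always predicts the majority class ("no depression").
N_TOTAL = 11_081                    # participants in the Dutch dataset
N_DEPRESSED = 570                   # self-reported depression cases
N_HEALTHY = N_TOTAL - N_DEPRESSED   # 10,511 cases without depression


def majority_baseline_metrics(n_pos: int, n_neg: int) -> dict:
    """Metrics when every case is predicted as negative (no depression)."""
    tp, fn = 0, n_pos   # every depression case is missed
    tn, fp = n_neg, 0   # every healthy case is trivially correct
    accuracy = (tp + tn) / (n_pos + n_neg)
    recall = tp / (tp + fn)          # sensitivity for the depression class
    specificity = tn / (tn + fp)
    return {
        "prevalence": n_pos / (n_pos + n_neg),
        "accuracy": accuracy,
        "recall": recall,
        "balanced_accuracy": (recall + specificity) / 2,
    }


if __name__ == "__main__":
    for name, value in majority_baseline_metrics(N_DEPRESSED, N_HEALTHY).items():
        print(f"{name}: {value:.3f}")
```

The baseline reaches an accuracy of about 0.949 but a recall of 0.000 and a balanced accuracy of 0.500. Plain accuracy is therefore a misleading target on this data, and metrics such as balanced accuracy, precision, recall and F1 are more informative.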

The researchers used several resampling strategies to address the class imbalance. They created multiple samples using undersampling, oversampling, combined over- and undersampling, and ROSE (Random Over-Sampling Examples) to balance the dataset. They then applied the Extreme Gradient Boosting (XGBoost) algorithm to each sample to distinguish depression cases from healthy cases.
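The random resampling strategies can be sketched in plain Python as shown here. This is an illustration under stated assumptions, not the study's implementation. The function names and the `per_class` parameter are hypothetical, and the sketch resamples row indices only. It does not implement ROSE, which generates new synthetic examples by a smoothed bootstrap rather than copying or dropping existing rows.

```python
import random
from collections import Counter


def _split_indices(y, minority):
    """Return (minority_indices, majority_indices) for a binary label list."""
    pos = [i for i, label in enumerate(y) if label == minority]
    neg = [i for i, label in enumerate(y) if label != minority]
    return pos, neg


def random_undersample(X, y, minority=1, seed=0):
    """Drop random majority-class rows until both classes are the minority size."""
    rng = random.Random(seed)
    pos, neg = _split_indices(y, minority)
    keep = pos + rng.sample(neg, len(pos))   # sample without replacement
    rng.shuffle(keep)
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, minority=1, seed=0):
    """Duplicate random minority-class rows until both classes are the majority size."""
    rng = random.Random(seed)
    pos, neg = _split_indices(y, minority)
    extra = rng.choices(pos, k=len(neg) - len(pos))   # sample with replacement
    keep = pos + extra + neg
    rng.shuffle(keep)
    return [X[i] for i in keep], [y[i] for i in keep]


def over_under_sample(X, y, per_class, minority=1, seed=0):
    """Oversample the minority and undersample the majority to per_class rows each."""
    rng = random.Random(seed)
    pos, neg = _split_indices(y, minority)
    if per_class > len(neg):
        raise ValueError("per_class cannot exceed the majority-class size")
    # Minority rows are drawn with replacement, majority rows without.
    keep = rng.choices(pos, k=per_class) + rng.sample(neg, per_class)
    rng.shuffle(keep)
    return [X[i] for i in keep], [y[i] for i in keep]


if __name__ == "__main__":
    # Toy labels with the study's class counts; X is just a row index here.
    y = [1] * 570 + [0] * 10_511
    X = list(range(len(y)))
    print(Counter(random_undersample(X, y)[1]))                  # 570 per class
    print(Counter(random_oversample(X, y)[1]))                   # 10,511 per class
    print(Counter(over_under_sample(X, y, per_class=5_540)[1]))  # 5,540 per class
```

Each resampled set would then be used to train a classifier, for example `XGBClassifier` from the `xgboost` Python package. Resampling is normally applied to the training data only, so that the test data keep the real class ratio.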

The balanced accuracy, precision, recall and F1 score obtained with oversampling and with combined over- and undersampling all exceeded 0.90.
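The four reported metrics can all be derived from a binary confusion matrix, with depression treated as the positive class. The following sketch shows the standard definitions. The confusion-matrix counts are hypothetical and are not results from the study.

```python
def _safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Binary classification metrics, treating depression as the positive class."""
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)          # sensitivity
    specificity = _safe_div(tn, tn + fp)
    f1 = _safe_div(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "balanced_accuracy": (recall + specificity) / 2,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical confusion-matrix counts, chosen only for illustration.
    print(classification_metrics(tp=90, fp=5, fn=10, tn=95))
```

For these made-up counts, the sketch gives a precision of about 0.947, a recall of 0.90, a balanced accuracy of 0.925 and an F1 score of about 0.923. Balanced accuracy averages recall and specificity, so it cannot be inflated simply by predicting the majority class.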


Sharma, A. & Verbeke, W.J.M.I. (2020) Improving Diagnosis of Depression With XGBOOST Machine Learning Model and a Large Biomarkers Dutch Dataset (n = 11,081). Frontiers in Big Data. doi: 10.3389/fdata.2020.00015.

What is the Geriatric Depression Scale?


The Geriatric Depression Scale (GDS) is a 30-item self-report assessment used to identify depression in the elderly.

The scale was first developed in 1982 by J.A. Yesavage and colleagues.


In the Geriatric Depression Scale, questions are answered “yes” or “no.” A five-category response set is deliberately avoided so that the scale stays simple enough for testing ill or moderately cognitively impaired individuals. For these individuals, a more complex set of answers may be confusing or lead to inaccurate recording of responses.

The GDS is commonly used as a routine part of a Comprehensive Geriatric Assessment. One point is assigned for each answer that indicates depression (for some items this is “yes”, for others “no”), and the cumulative score is rated on a scoring grid. The grid sets a range of 0-9 as “normal”, 10-19 as “mildly depressed”, and 20-30 as “severely depressed”.

A diagnosis of clinical depression should not be based on GDS results alone. Although the test's reliability and validity against other diagnostic criteria are well established, responses should be considered together with the results of a comprehensive diagnostic work-up. A short version of the GDS (GDS-SF) containing 15 questions has been developed, and the scale is available in languages other than English. Research has found the GDS-SF to be an adequate substitute for the original 30-item scale.

The GDS was validated against the Hamilton Rating Scale for Depression (HRS-D) and the Zung Self-Rating Depression Scale (SDS). When evaluated against diagnostic criteria, it was found to have 92% sensitivity and 89% specificity.
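Sensitivity and specificity alone do not tell us what fraction of positive screens are true cases. That fraction also depends on how common depression is in the group being tested. The following sketch applies Bayes' rule to the reported 92% sensitivity and 89% specificity. The prevalence values are hypothetical and chosen only for illustration.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a screening test at a given prevalence (Bayes' rule)."""
    tp = sensitivity * prevalence              # true positives (as proportions)
    fn = (1 - sensitivity) * prevalence        # false negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    tn = specificity * (1 - prevalence)        # true negatives
    return tp / (tp + fp), tn / (tn + fn)


if __name__ == "__main__":
    # Sensitivity and specificity reported for the GDS; prevalences are hypothetical.
    for prevalence in (0.05, 0.30):
        ppv, npv = predictive_values(0.92, 0.89, prevalence)
        print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  NPV={npv:.3f}")
```

At a hypothetical 5% prevalence, the positive predictive value is only about 0.31, so roughly two out of three positive screens would be false positives. This is one reason a diagnosis should not rest on GDS results alone.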

Scale Questions and Scoring

The scale consists of 30 yes/no questions, each scored as 0 or 1 point. The following general cutoffs may be used to classify severity:

  • Normal: 0-9.
  • Mild depression: 10-19.
  • Severe depression: 20-30.
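Scoring and classification can be sketched as follows. The sketch assumes that each answer has already been converted to 0 or 1 using the scale's answer key, which is not reproduced here. The function names and the example respondent are hypothetical.

```python
def gds_total(item_points):
    """Sum 30 item points, each already coded 1 (depressive answer) or 0."""
    if len(item_points) != 30:
        raise ValueError("the GDS has 30 items")
    if any(p not in (0, 1) for p in item_points):
        raise ValueError("each GDS item scores 0 or 1")
    return sum(item_points)


def gds_severity(total):
    """Map a GDS total (0-30) to the general cutoffs."""
    if not 0 <= total <= 30:
        raise ValueError("GDS totals range from 0 to 30")
    if total <= 9:
        return "normal"
    if total <= 19:
        return "mild depression"
    return "severe depression"


if __name__ == "__main__":
    points = [1] * 12 + [0] * 18   # hypothetical respondent with 12 depressive answers
    total = gds_total(points)
    print(total, gds_severity(total))   # 12 mild depression
```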


Yesavage, J.A., Brink, T.L., Rose, T.L., et al. (1982) Development and validation of a geriatric depression screening scale: a preliminary report. Journal of Psychiatric Research. 17(1), pp.37-49.

What is the Hamilton Rating Scale for Depression?


The Hamilton Rating Scale for Depression (HRSD), also called the Hamilton Depression Rating Scale (HDRS) and abbreviated HAM-D, is a multiple-item questionnaire used to provide an indication of depression and as a guide to evaluating recovery.

Max Hamilton originally published the scale in 1960 and revised it in 1966, 1967, 1969, and 1980. The questionnaire is designed for adults and is used to rate the severity of their depression by probing mood, feelings of guilt, suicidal ideation, insomnia, agitation or retardation, anxiety, weight loss, and somatic symptoms.

The HRSD has been criticised for use in clinical practice because it places more emphasis on insomnia than on feelings of hopelessness, self-destructive thoughts, suicidal cognitions and actions. As a result, an antidepressant may show statistical efficacy when sleep improves even though thoughts of suicide increase. Conversely, an antidepressant whose side effects raise sexual and gastrointestinal symptom ratings may register as less effective against the depression itself than it actually is. Hamilton maintained that his scale should not be used as a diagnostic instrument.

The original 1960 version contained 17 items (HDRS-17). Four other questions, not added to the total score, were used to provide additional clinical information. Each item is scored on a 3- or 5-point scale, depending on the item, and the total score is compared with the corresponding descriptor. Assessment takes about 20 minutes.


The patient is rated by a clinician on 17 to 29 items (depending on version) scored either on a 3-point or 5-point Likert-type scale. For the 17-item version, a score of 0-7 is considered to be normal while a score of 20 or higher (indicating at least moderate severity) is usually required for entry into a clinical trial. Questions 18-20 may be recorded to give further information about the depression (such as whether diurnal variation or paranoid symptoms are present), but are not part of the scale. A structured interview guide for the questionnaire is available.

Although Hamilton’s original scale had 17 items, other versions included up to 29 items (HRSD-29).

Unstructured versions of the HAM-D provide general instructions for rating items, while structured versions may provide definitions and/or specific interview questions. Structured versions of the HAM-D show greater reliability than unstructured versions with informed use.

Levels of Depression

The UK National Institute for Health and Clinical Excellence (NICE) established levels of depression for the 17-item HRSD and compared them with those suggested by the American Psychiatric Association (APA):

  • Not depressed: 0-7.
  • Mild (subthreshold): 8-13.
  • Moderate (mild): 14-18.
  • Severe (moderate): 19-22.
  • Very severe (severe): 23 or more.
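These severity levels can be encoded as a simple lookup, as in the following hedged Python sketch. The function and constant names are hypothetical. The sketch returns the main label and the bracketed label exactly as they appear in the list. Because the previous band ends at 22, it treats a total of 23 as the top band. It also encodes the threshold of 20 or higher that is usually required for clinical trial entry with the 17-item version. No upper limit is enforced, because the item count and maximum score depend on the version.

```python
# (inclusive upper bound, label, bracketed label) for the 17-item HRSD.
HAMD17_LEVELS = [
    (7, "not depressed", None),
    (13, "mild", "subthreshold"),
    (18, "moderate", "mild"),
    (22, "severe", "moderate"),
]


def hamd17_level(total: int):
    """Return (label, bracketed label) for a 17-item HRSD total."""
    if total < 0:
        raise ValueError("HRSD totals cannot be negative")
    for upper, label, alt in HAMD17_LEVELS:
        if total <= upper:
            return label, alt
    return "very severe", "severe"   # 23 or more


def meets_usual_trial_threshold(total: int) -> bool:
    """A total of 20 or higher is usually required for clinical trial entry."""
    return total >= 20


if __name__ == "__main__":
    for score in (5, 16, 23):
        print(score, hamd17_level(score), meets_usual_trial_threshold(score))
```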

Other Scales

Other scales include the Montgomery-Åsberg Depression Rating Scale and the Zung Self-Rating Depression Scale.

What is the Montgomery-Asberg Depression Rating Scale?


The Montgomery-Åsberg Depression Rating Scale (MADRS) is a ten-item diagnostic questionnaire which psychiatrists use to measure the severity of depressive episodes in patients with mood disorders.

It was designed in 1979 by British and Swedish researchers as an adjunct to the Hamilton Rating Scale for Depression (HRSD). It was intended to be more sensitive than the Hamilton scale to changes brought about by antidepressants and other forms of treatment. There is, however, a high degree of statistical correlation between scores on the two measures.


Each of the ten MADRS items yields a score of 0 to 6, so the overall score ranges from 0 to 60. A higher MADRS score indicates more severe depression.

The questionnaire includes questions on the following symptoms:

  1. Apparent sadness.
  2. Reported sadness.
  3. Inner tension.
  4. Reduced sleep.
  5. Reduced appetite.
  6. Concentration difficulties.
  7. Lassitude.
  8. Inability to feel.
  9. Pessimistic thoughts.
  10. Suicidal thoughts.

Usual cut-off points are:

  • 0 to 6: normal/symptom absent.
  • 7 to 19: mild depression.
  • 20 to 34: moderate depression.
  • >34: severe depression.
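Summing the ten items and applying the usual cut-off points can be sketched as follows. The function names and the example ratings are hypothetical.

```python
def madrs_total(item_scores):
    """Sum the ten MADRS items, each rated 0 to 6."""
    if len(item_scores) != 10:
        raise ValueError("the MADRS has 10 items")
    if any(not 0 <= s <= 6 for s in item_scores):
        raise ValueError("each MADRS item is rated 0 to 6")
    return sum(item_scores)


def madrs_severity(total):
    """Apply the usual cut-off points to a MADRS total (0-60)."""
    if not 0 <= total <= 60:
        raise ValueError("MADRS totals range from 0 to 60")
    if total <= 6:
        return "normal/symptom absent"
    if total <= 19:
        return "mild depression"
    if total <= 34:
        return "moderate depression"
    return "severe depression"


if __name__ == "__main__":
    ratings = [3, 4, 2, 3, 1, 2, 3, 2, 2, 0]   # hypothetical clinician ratings
    total = madrs_total(ratings)
    print(total, madrs_severity(total))        # 22 moderate depression
```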


A self-rating version of this scale (MADRS-S) is often used in clinical practice and correlates reasonably well with expert ratings.

The MADRS-S instrument has nine questions, with an overall score ranging from 0 to 54 points.

What is the Zung Self-Rating Depression Scale?


The Zung Self-Rating Depression Scale (SDS) was designed by Duke University psychiatrist William W.K. Zung MD (1929-1992) to assess the level of depression in patients diagnosed with depressive disorder.

The Levels

  • 20-44: Normal Range.
  • 45-59: Mildly Depressed.
  • 60-69: Moderately Depressed.
  • 70 and above: Severely Depressed.
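The SDS has 20 items, each rated from 1 to 4, and its ten positively worded items are scored in reverse. The levels begin at 20, which is the lowest possible raw total, so this sketch applies them to the raw sum. The reverse-scored item numbers must come from the official scoring key and are passed in by the caller. The set used in the example is a placeholder chosen for illustration, not the real key. The function names are hypothetical.

```python
def sds_raw_score(answers, reverse_items):
    """Raw Zung SDS score from 20 answers coded 1-4.

    answers: 20 ints, index 0 = item 1, coded 1 ("a little of the time")
             to 4 ("most of the time").
    reverse_items: 1-based numbers of the ten positively worded items,
                   scored in reverse (1 -> 4, 4 -> 1); supplied by the caller.
    """
    rev = set(reverse_items)
    if len(answers) != 20:
        raise ValueError("the SDS has 20 items")
    if any(a not in (1, 2, 3, 4) for a in answers):
        raise ValueError("answers are coded 1 to 4")
    if len(rev) != 10 or not rev <= set(range(1, 21)):
        raise ValueError("expected ten item numbers between 1 and 20")
    return sum(5 - a if item in rev else a
               for item, a in enumerate(answers, start=1))


def sds_level(raw):
    """Map a raw SDS score (20-80) to the levels."""
    if not 20 <= raw <= 80:
        raise ValueError("raw SDS scores range from 20 to 80")
    if raw <= 44:
        return "normal range"
    if raw <= 59:
        return "mildly depressed"
    if raw <= 69:
        return "moderately depressed"
    return "severely depressed"


if __name__ == "__main__":
    # Placeholder reverse-scored items for illustration only; NOT the official key.
    placeholder_reverse = range(11, 21)
    raw = sds_raw_score([1] * 20, placeholder_reverse)
    print(raw, sds_level(raw))   # 50 mildly depressed
```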

The SDS has been translated into many languages, including Arabic, Azerbaijani, Dutch, German, Portuguese, and Spanish.



Zung, W.W.K. (1965) A Self-Rating Depression Scale. Archives of General Psychiatry. 12(1), pp.63-70.

What are Rating Scales for Depression?


A depression rating scale is a psychiatric measuring instrument having descriptive words and phrases that indicate the severity of depression for a time period.

When used, an observer may make judgements and rate a person at a specified scale level with respect to identified characteristics. Rather than being used to diagnose depression, a depression rating scale may be used to assign a score to a person’s behaviour where that score may be used to determine whether that person should be evaluated more thoroughly for a depressive disorder diagnosis. Several rating scales are used for this purpose.

Between 1918 and 2009, more than 280 measures of depressive severity were developed and published (Santor, Gregus & Welch, 2009).

What is the Purpose of a Rating Scale?

To determine the degree of depression.

Who Can Complete Rating Scales?

Scales Completed by Researchers

Some depression rating scales are completed by researchers. For example, the Hamilton Depression Rating Scale includes 21 questions, each with between 3 and 5 possible responses of increasing severity. The clinician chooses a response to each question by interviewing the patient and observing the patient’s symptoms. Designed by psychiatrist Max Hamilton in 1960, the Hamilton Depression Rating Scale is one of the two scales most commonly used by researchers assessing the effects of drug therapy. The other is the Montgomery-Åsberg Depression Rating Scale, which has ten items. Another scale is the Raskin Depression Rating Scale, which rates the severity of the patient’s symptoms in three areas: verbal reports, behaviour, and secondary symptoms of depression.

Scales Completed by Patients

Some depression rating scales are completed by patients. The Beck Depression Inventory, for example, is a 21-question self-report inventory that covers symptoms such as irritability, fatigue, weight loss, lack of interest in sex, and feelings of guilt, hopelessness or fear of being punished. The scale is completed by patients to identify the presence and severity of symptoms consistent with the DSM-IV diagnostic criteria. The Beck Depression Inventory was originally designed by psychiatrist Aaron T. Beck in 1961.

The Geriatric Depression Scale (GDS) is another self-administered scale, in this case used for older patients and for patients with mild to moderate dementia. Instead of presenting a five-category response set, the GDS questions are answered with a simple “yes” or “no”. The Zung Self-Rating Depression Scale is similar to the Geriatric Depression Scale in that the answers are preformatted. The Zung Self-Rating Depression Scale has 20 items, ten positively worded and ten negatively worded. Each question is rated from 1 to 4 according to four possible answers: “a little of the time”, “some of the time”, “good part of the time”, and “most of the time”.

The Patient Health Questionnaire (PHQ) family consists of self-reported depression rating scales. For example, the Patient Health Questionnaire-9 (PHQ-9) is a self-reported, 9-question version of the Primary Care Evaluation of Mental Disorders. The Patient Health Questionnaire-2 (PHQ-2) is a shorter version of the PHQ-9 with two screening questions that assess the presence of a depressed mood and a loss of interest or pleasure in routine activities. A positive response to either question indicates that further testing is required.

The two questions on the Patient Health Questionnaire-2 (PHQ-2) are:

During the past month, have you often been bothered by feeling down, depressed, or hopeless?

During the past month, have you often been bothered by little interest or pleasure in doing things?
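Under the PHQ-2 rule, a positive response to either question triggers further testing, so the screen is a logical OR rather than an AND. The following minimal sketch follows that yes/no rule. The function and parameter names are hypothetical.

```python
def phq2_needs_further_testing(feeling_down: bool, little_interest: bool) -> bool:
    """True when further testing is indicated under the either-question rule."""
    # A "yes" to either question counts as a positive screen.
    return feeling_down or little_interest


if __name__ == "__main__":
    print(phq2_needs_further_testing(feeling_down=False, little_interest=True))   # True
    print(phq2_needs_further_testing(feeling_down=False, little_interest=False))  # False
```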

Scales Completed by Patients and Researchers

The Primary Care Evaluation of Mental Disorders (PRIME-MD) is completed by the patient and a researcher. This depression rating scale includes a 27-item screening questionnaire and a follow-up clinician interview designed to facilitate the diagnosis of common mental disorders in primary care. Its lengthy administration time limited its clinical usefulness, and it has been replaced by the Patient Health Questionnaire.

What is the Validity and Usefulness of Rating Scales?

How Useful are Rating Scales?

Screening programmes using rating scales to search for candidates for a more in-depth evaluation have been advocated to improve detection of depression, but there is evidence that they do not improve detection rates, treatment, or outcome. There is also evidence that a consensus on the interpretation of rating scales, in particular the Hamilton Rating Scale for Depression, is largely missing, leading to misdiagnosis of the severity of a patient’s depression. However, there is evidence that portions of rating scales, such as the somatic section of the PHQ-9, can be useful in predicting outcomes for subgroups of patients like coronary heart disease patients.

How Valid are Rating Scales?

Several research articles published in recent years have questioned the validity of sum-score rating scales for depression. For example, Fried (2017) reported a lack of content overlap among seven common depression scales.

Fried, E.I. (2017) The 52 Symptoms of Major Depression: Lack of Content Overlap Among Seven Common Depression Scales. Journal of Affective Disorders. 208, pp.191-197.

Santor, D.A., Gregus, M. & Welch, A. (2009) Eight Decades of Measurement in Depression. Measurement: Interdisciplinary Research and Perspectives. 4(3), pp.135-155.

Copyrighted vs. Public Domain Rating Scales

The Beck Depression Inventory is copyrighted: a fee must be paid for each copy used, and photocopying it is a violation of copyright. There is no evidence that the BDI-II is more valid or reliable than other depression scales. Public domain scales such as the Centre for Epidemiological Studies Depression Scale (CES-D), the Zung Self-Rating Depression Scale and the Patient Health Questionnaire – Nine Item (PHQ-9) have been studied and found to be useful tools.

Other copyrighted scales allow individual clinicians and researchers to make copies for their own use, but require licenses for electronic versions or large-scale redistribution, including:

  • The Clinically Useful Depression Outcome Scale (CUDOS).
  • The Inventory of Depressive Symptomatology (IDS).
  • The Mood and Feelings Questionnaire (MFQ).
  • The Quick Inventory of Depressive Symptoms (QIDS).

Other depression rating scales include:

  • The Patient Health Questionnaire (PHQ).
    • Research in process – Banner University Medical Centre.
  • Hamilton Rating Scale (HRSD, HDRS, Ham-D).
  • Columbia Suicide Severity Rating Scale (C-SSRS).
  • Depression and Anxiety Stress Scales (DASS).
  • Depression Self-Rating Scale for Children.
  • Brief Psychiatric Rating Scale (BPRS).
  • Geriatric Depression Scale (GDS).
  • Beck’s Depression Inventory (BDI).
  • HEADS-ED, used in hospital emergency departments.
  • Children’s Depression Rating Scale (CDRS).
  • Behavioural Activation for Depression Scale (BADS-SF).
  • Edinburgh Postnatal Depression Scale.
  • Quick Inventory of Depressive Symptomatology Clinician (QIDS-C).
  • Quick Inventory of Depressive Symptomatology Self Report (QIDS-SR).
  • Kutcher Adolescent Depression Scale (KADS-11).
  • Montgomery-Åsberg Depression Rating Scale (MADRS).
  • Clinically Useful Depression Outcome Scale (CUDOS).
  • Hospital Anxiety and Depression Scale.
  • Primary Care Evaluation of Mental Disorders (PRIME-MD).
  • Children’s Depression Inventory (CDI).