Can We Improve Diagnosis of Depression with XGBOOST Machine Learning Model & a Large Biomarkers Dutch Dataset?

Research Paper Title

Improving Diagnosis of Depression With XGBOOST Machine Learning Model and a Large Biomarkers Dutch Dataset ( n = 11,081).

Abstract

Machine Learning has been on the rise and healthcare is no exception to that. In healthcare, mental health is gaining more and more space. The diagnosis of mental disorders is based upon standardised patient interviews with defined set of questions and scales which is a time consuming and costly process.

The objective of the researchers was to apply the machine learning model and to evaluate to see if there is predictive power of biomarkers data to enhance the diagnosis of depression cases.

In this research paper, they aimed to explore the detection of depression cases among the sample of 11,081 Dutch citizen dataset. Most of the earlier studies have balanced datasets wherein the proportion of healthy cases and unhealthy cases are equal but in their study, the dataset contains only 570 cases of self-reported depression out of 11,081 cases hence it is a class imbalance classification problem. The machine learning model built on imbalance dataset gives predictions biased toward majority class hence the model will always predict the case as no depression case even if it is a case of depression.

The researchers used different resampling strategies to address the class imbalance problem. They created multiple samples by under sampling, over sampling, over-under sampling and ROSE sampling techniques to balance the dataset and then, they applied machine learning algorithm “Extreme Gradient Boosting” (XGBoost) on each sample to classify the mental illness cases from healthy cases.

The balanced accuracy, precision, recall and F1 score obtained from over-sampling and over-under sampling were more than 0.90.

Reference

Sharma, A. & Verbeke, W.J.M.I. (2021) Improving Diagnosis of Depression With XGBOOST Machine Learning Model and a Large Biomarkers Dutch Dataset ( n = 11,081). Frontiers in Big Data. doi: 10.3389/fdata.2020.00015. eCollection 2020.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.