Inspiration

  • We are bioengineering students interested in monitoring sleep stages and in how that monitoring can improve quality of life.

What it does

Data Wrangling

  • Clean the dataset
    • Remove NaN data points.
    • Remove epochs whose sleep stage was not scored.
  • Manipulate
    • Use bandpass filters to extract meaningful data.
    • Normalize the data so that it is centered around 0.
    • Balance the dataset by removing Waking-stage data recorded more than 30 minutes away from the sleep stages.
    • Calculate time-domain statistics: std, IQR, skewness, kurtosis, number of zero-crossings, Hjorth mobility, Hjorth complexity, Higuchi fractal dimension, Petrosian fractal dimension, permutation entropy, and binned entropy (4).
    • Calculate frequency-domain statistics: spectral Fourier statistics (4), binned Fourier entropy (7), absolute spectral power in the 0.4-30 Hz band, relative spectral power in the applied frequency bands (6), fast delta+theta spectral power, and the alpha/theta, delta/beta, delta/sigma, and delta/theta spectral power ratios.
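
The filtering, normalization, and a few of the time-domain statistics above can be sketched as follows. This is a minimal illustration, not the project's actual code: the 100 Hz sampling rate, the 4th-order Butterworth design, the 0.4-30 Hz cutoffs (borrowed from the spectral power band mentioned above), the z-score normalization, and all function names are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass(signal, fs, low=0.4, high=30.0, order=4):
    """Zero-phase Butterworth bandpass filter; cutoffs are in Hz (assumed values)."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)


def normalize(signal):
    """Z-score the signal so that it is centered around 0 (assumed method)."""
    return (signal - np.mean(signal)) / np.std(signal)


def zero_crossings(x):
    """Count sign changes between consecutive samples."""
    return int(np.sum(np.signbit(x[:-1]) != np.signbit(x[1:])))


def hjorth(x):
    """Return Hjorth mobility and complexity of a 1-D signal."""
    dx = np.diff(x)
    ddx = np.diff(dx)
    mobility = np.sqrt(np.var(dx) / np.var(x))
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return mobility, complexity


# Synthetic 30 s test signal: a 10 Hz rhythm plus 45 Hz noise and a DC offset.
fs = 100  # Hz, assumed sampling rate
t = np.arange(0, 30, 1 / fs)
clean = np.sin(2 * np.pi * 10 * t)
raw = clean + 0.5 * np.sin(2 * np.pi * 45 * t) + 2.0

filtered = bandpass(raw, fs)      # removes the offset and the 45 Hz component
normalized = normalize(filtered)  # centered around 0 with unit variance
mobility, complexity = hjorth(normalized)
n_crossings = zero_crossings(normalized)
print(mobility, complexity, n_crossings)
```

The zero-phase filter (`sosfiltfilt`) is used here so that the filtered signal stays aligned in time with the sleep stage labels; a causal filter would be needed for real-time use.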

Data Visualization

  • Visualize the data structure.
  • Visualize different channels including EEG Fpz-Cz, EEG Pz-Oz, EOG horizontal, Resp oro-nasal, EMG submental, Temp rectal and the sleep stage labels. Visualize the processed data.
  • Visualize the count of data points in different sleep stages.
  • Visualize the model structure.
  • Visualize the results in a confusion matrix and metric tables.

Process

  1. Collect data information
  2. Load data
  3. Bandpass filter and normalize
  4. Calculate statistics
  5. Save data
  6. Add related temporal statistics for training
  7. Train the model
  8. Evaluate the model
  9. Select the best model and make predictions
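
Step 6 can be illustrated with a small sketch. It assumes that "related temporal statistics" means the statistics of neighboring epochs appended as extra columns; that interpretation, the shift values, and the function name `add_temporal_context` are assumptions, not the project's confirmed implementation.

```python
import numpy as np


def add_temporal_context(features, shifts=(-2, -1, 1, 2)):
    """Append copies of the feature matrix shifted by whole epochs.

    features: array of shape (n_epochs, n_features), ordered in time.
    A shift of -1 adds the previous epoch's features, +1 the next epoch's.
    Epochs without a neighbor at that offset are filled with NaN.
    """
    blocks = [features]
    for s in shifts:
        shifted = np.full_like(features, np.nan, dtype=float)
        if s < 0:
            shifted[-s:] = features[:s]   # row i receives row i + s (earlier epoch)
        elif s > 0:
            shifted[:-s] = features[s:]   # row i receives row i + s (later epoch)
        else:
            shifted[:] = features
        blocks.append(shifted)
    return np.hstack(blocks)


# Four epochs with two statistics each.
features = np.arange(8, dtype=float).reshape(4, 2)
context = add_temporal_context(features, shifts=(-1, 1))
print(context)
```

Restricting `shifts` to negative values keeps only past epochs, which is the variant a real-time predictor would need.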

Machine Learning Models

  • Implemented a linear classifier and a multi-layer perceptron (MLP) classifier with sklearn.
  • Implemented a CatBoost classifier with the catboost library.
  • The models are trained to map statistical features calculated from 6 input channels to 6 distinct sleep stages.
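
A minimal sketch of the sklearn part of this setup is shown below. The data is synthetic, and the choice of `LogisticRegression` as the linear classifier, the hidden layer size, and the scaling step are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature table: 600 epochs, 20 statistics, 6 stages.
rng = np.random.default_rng(0)
n_epochs, n_features, n_stages = 600, 20, 6
y = rng.integers(0, n_stages, size=n_epochs)
X = rng.normal(size=(n_epochs, n_features))
X[np.arange(n_epochs), y] += 3.0  # make stage k visible in feature k

models = {
    "Linear": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "MLP": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
    ),
}

train_accuracy = {}
for name, model in models.items():
    model.fit(X, y)
    train_accuracy[name] = model.score(X, y)

# Each model outputs one probability per sleep stage for every epoch.
proba = models["Linear"].predict_proba(X[:5])
print(train_accuracy, proba.shape)
```

`CatBoostClassifier` from the catboost library follows the same `fit` / `predict_proba` pattern, so it could be added to the `models` dictionary in the same way.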

Evaluation

  • We calculated F1 score, balanced accuracy, accuracy, and log loss for each model on the training set and test set.
  • Due to the time limit, we used 2-fold cross-validation.
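
The evaluation above could look roughly like the following sketch. It runs on synthetic data; the macro averaging for the F1 score, the stratified shuffled folds, and the helper name `cross_validate_metrics` are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             f1_score, log_loss)
from sklearn.model_selection import StratifiedKFold


def cross_validate_metrics(model, X, y, n_splits=2, seed=0):
    """Return one dict of metrics per (fold, split) pair."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    rows = []
    for fold, (train_idx, test_idx) in enumerate(folds.split(X, y)):
        model.fit(X[train_idx], y[train_idx])
        for split, idx in (("train", train_idx), ("test", test_idx)):
            proba = model.predict_proba(X[idx])
            pred = model.classes_[proba.argmax(axis=1)]
            rows.append({
                "fold": fold,
                "split": split,
                "f1": f1_score(y[idx], pred, average="macro"),
                "balanced_accuracy": balanced_accuracy_score(y[idx], pred),
                "accuracy": accuracy_score(y[idx], pred),
                "log_loss": log_loss(y[idx], proba, labels=model.classes_),
            })
    return rows


# Synthetic stand-in for the feature table: 600 epochs, 20 statistics, 6 stages.
rng = np.random.default_rng(0)
y = rng.integers(0, 6, size=600)
X = rng.normal(size=(600, 20))
X[np.arange(600), y] += 3.0  # make stage k visible in feature k

rows = cross_validate_metrics(LogisticRegression(max_iter=1000), X, y)
for row in rows:
    print(row)
```

Epochs from the same recording are correlated, so splitting by recording (for example with sklearn's `GroupKFold`) may give a more honest test score than splitting individual epochs.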

How we built it

Data process

  • We built helper functions to load data.
  • We used existing packages and code from a published paper to calculate the statistics.

Visualization

  • The process can be visualized in a Jupyter notebook.

Model building

  • We used machine learning packages including sklearn and scipy.

Challenges we ran into

  • Model evaluation results
    • It takes a long time to perform thorough evaluations of all 3 models.
    Model     Dataset  F1      Balanced accuracy  Accuracy  Log loss
    Linear    train    0.8378  0.8529             0.8826    0.7567
    Linear    test     0.7242  0.7334             0.8005    1.6495
    CatBoost  train    0.9297  0.9193             0.9501    0.1736
    CatBoost  test     0.7490  0.7440             0.8229    0.4717
    MLP       train    0.9061  0.8920             0.9397    0.1700
    MLP       test     0.7139  0.7125             0.7979    0.6820
  • Overfitting
    • According to the results, the MLP model is overfitted to the training set: it reaches the lowest training log loss (0.1700) and a training accuracy of 0.9397, yet its F1 score, balanced accuracy, and accuracy on the test set are the lowest of the 3 models.
    • The MLP model likely overfits because it is a complex model with many parameters, so it can fit noise in the training statistics. This lowers the bias but raises the variance, which is the bias-variance trade-off at work.
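
One simple way to quantify overfitting is the gap between training and test scores. The sketch below computes that gap from the values in the table above; the dictionary layout and variable names are illustrative only.

```python
# Scores copied from the evaluation table (train, test).
scores = {
    "Linear":   {"f1": (0.8378, 0.7242), "balanced_accuracy": (0.8529, 0.7334), "accuracy": (0.8826, 0.8005)},
    "CatBoost": {"f1": (0.9297, 0.7490), "balanced_accuracy": (0.9193, 0.7440), "accuracy": (0.9501, 0.8229)},
    "MLP":      {"f1": (0.9061, 0.7139), "balanced_accuracy": (0.8920, 0.7125), "accuracy": (0.9397, 0.7979)},
}

# Train-minus-test gap per model and metric; a larger gap suggests more overfitting.
gaps = {
    model: {metric: round(train - test, 4) for metric, (train, test) in metrics.items()}
    for model, metrics in scores.items()
}

for model, model_gaps in gaps.items():
    print(model, model_gaps)

# Model with the largest F1 gap.
largest_gap_model = max(gaps, key=lambda m: gaps[m]["f1"])
print(largest_gap_model)  # MLP
```

The MLP has the largest gap on all three metrics (0.1922 for F1), with CatBoost close behind (0.1807), so CatBoost also overfits to a notable degree even though it generalizes best.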

Accomplishments that we're proud of

  • Finished a meaningful project in 36 hours.
  • Good feature selection and calculation.
  • Selected the best model based on the evaluation results and predicted sleep stages with a test accuracy of up to 0.8229.

What we learned

  • The data processing for time series data.
    • The time series data can be analyzed as chunks of statistics.
    • Time series data has dependencies on nearby data points.
  • Data visualization is important for the pipeline design.
  • Model building
    • Select proper evaluation metrics.
    • Many off-the-shelf models are available, and we have to pick the best one based on the evaluation results.
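
The first lesson above, treating a time series as chunks of statistics, can be sketched as follows. The 30-second epoch length, the 100 Hz sampling rate, and the function names are assumptions; std and IQR are two of the statistics listed under Data Wrangling.

```python
import numpy as np


def split_into_epochs(signal, fs, epoch_seconds=30):
    """Cut a 1-D signal into non-overlapping epochs; a trailing partial epoch is dropped."""
    samples_per_epoch = int(fs * epoch_seconds)
    n_epochs = len(signal) // samples_per_epoch
    trimmed = signal[: n_epochs * samples_per_epoch]
    return trimmed.reshape(n_epochs, samples_per_epoch)


def epoch_statistics(epochs):
    """Return one row per epoch with its standard deviation and interquartile range."""
    q75, q25 = np.percentile(epochs, [75, 25], axis=1)
    return np.column_stack([epochs.std(axis=1), q75 - q25])


# 95 s of synthetic signal at 100 Hz gives 3 full epochs of 3000 samples.
fs = 100  # Hz, assumed sampling rate
rng = np.random.default_rng(0)
signal = rng.normal(size=95 * fs)

epochs = split_into_epochs(signal, fs)
stats = epoch_statistics(epochs)
print(epochs.shape, stats.shape)  # (3, 3000) (3, 2)
```

Each row of `stats` then becomes one training example, paired with the sleep stage label of that epoch.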

What's next for Stimulus

  • Check why sleep stage 1 is not well predicted.
  • Check the contribution of each statistic, and remove redundant statistics to save memory.
  • Try to integrate the model into devices that can report sleep stage predictions in real time. The model will have to change, because in real time no data points are available after the time point being predicted.

References

  • Jeroen Van Der Donckt, Jonas Van Der Donckt, Emiel Deprost, Nicolas Vandenbussche, Michael Rademaker, Gilles Vandewiele, Sofie Van Hoecke, Do not sleep on traditional machine learning: Simple and interpretable techniques are competitive to deep learning for sleep scoring, Biomedical Signal Processing and Control, Volume 81, 2023, 104429, ISSN 1746-8094

  • Sun H, Ganglberger W, Panneerselvam E, Leone MJ, Quadri SA, Goparaju B, Tesh RA, Akeju O, Thomas RJ, Westover MB. Sleep staging from electrocardiography and respiration with deep learning. Sleep. 2020 Jul 13;43(7):zsz306. doi: 10.1093/sleep/zsz306.

  • Bakker JP, Ross M, Vasko R, Cerny A, Fonseca P, Jasko J, Shaw E, White DP, Anderer P. Estimating sleep stages using cardiorespiratory signals: validation of a novel algorithm across a wide range of sleep-disordered breathing severity. J Clin Sleep Med. 2021 Jul 1;17(7):1343-1354. doi: 10.5664/jcsm.9192. PMID: 33660612; PMCID: PMC8314617.

  • Morokuma S, Hayashi T, Kanegae M, Mizukami Y, Asano S, Kimura I, Tateizumi Y, Ueno H, Ikeda S, Niizeki K. Deep learning-based sleep stage classification with cardiorespiratory and body movement activities in individuals with suspected sleep disorders. Sci Rep. 2023 Oct 18;13(1):17730. doi: 10.1038/s41598-023-45020-7.

  • Derk-Jan Dijk, Christian Cajochen, Irene Tobler, Alexander A. Borbély, Sleep Extension in Humans: Sleep Stages, EEG Power Spectra and Body Temperature, Sleep, Volume 14, Issue 4, July 1991, Pages 294–306, https://doi.org/10.1093/sleep/14.4.294

  • Immanuel SA, Pamula Y, Kohler M, Martin J, Kennedy D, Saint DA, Baumert M. Respiratory cycle-related electroencephalographic changes during sleep in healthy children and in children with sleep disordered breathing. Sleep. 2014 Aug 1;37(8):1353-61. doi: 10.5665/sleep.3930. PMID: 25083016.
