State-of-the-Art of Stress Prediction from Heart Rate Variability Using Artificial Intelligence

Rule-Based Approaches

Kumar et al. [85] addressed the issue of explainability in applications of fuzzy-theoretic nonparametric deep models in biology and medicine. They used one previously studied dataset of 50 subjects and a new dataset of 100 subjects, obtaining a Pearson's correlation coefficient (r) of 0.8162 on the old dataset versus 0.6809 on the new one, and an RMSE of 6.8382 versus 9.4872, respectively.
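
As a minimal sketch of the two metrics reported above, assuming the model outputs a continuous stress score that is compared with a reference score, Pearson's r and the RMSE can be computed as follows. The function names and the sample values are illustrative assumptions and are not taken from [85].

```python
import math


def pearson_r(predicted, observed):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


def rmse(predicted, observed):
    """Root-mean-square error between predictions and reference values."""
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


# Hypothetical stress scores, made up for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 41.0]
print(pearson_r(predicted, observed), rmse(predicted, observed))
```

A high r with a large RMSE would indicate that predictions follow the trend of the reference scores but are offset or scaled, which is why the two values are usually read together.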

El-Samahy et al. [83] found a close match between the output of their proposed system and actual measurements acquired from human volunteers. The system was built and evaluated using heart rate and pupil diameter data collected from 5 people. To compare the performance of subjects 1 and 2, an evaluation index (EI) was produced for each of them. During levels 1–3, subject 1 had a high EI of over 90%. Subject 2, on the other hand, showed an EI between 60 and 90% throughout the whole experiment, indicating that the level of mental stress remained unchanged.

Ranganath et al. [86] evaluated stress from HRV using their proposed wavelet transform and neuro-fuzzy (NF) inference system. To investigate the activity of the ANS, the authors performed a time-frequency analysis (TFA) of HRV, which can be used to quantify mental stress. They studied 20 physically fit adults at two points in time, before and after they began smoking, and acquired a spectral decomposition of HRV. The resulting data were used to build the proposed NF-based model.

Kumar et al. [87] proposed a fuzzy clustering method that helped to quantify mental stress and demonstrated a direct functional link between ANS activity and mental stress. The researchers used the NASA Task Load Index to examine subjective ratings of mental workload in 38 physically fit volunteers during air traffic management task simulations.

Wang et al. [84] provided a way of correlating HRV with the human body's salivary response to stress. They used 176 ECG recordings and 264 salivary samples from 22 people. They generated six datasets (three for alpha-amylase and three for cortisol), using the alpha-amylase and cortisol measurements to label ECG feature vectors. The final classifier system classified salivary cortisol from ECG characteristics with an accuracy of 80%, compared to 75% for salivary alpha-amylase. A summary of used algorithms, datasets, evaluation metrics, and obtained outcomes of RB stress prediction research is presented in Table 10.

Table 10 A summary of used algorithms, datasets, evaluation metrics, and obtained outcomes of rule-based stress prediction research

Shallow Machine Learning Approaches

Sriramprakash et al. [88] used ECG, skin conductance, and a Kinect 3D sensor to collect data from selected individuals. The SWELL-KW dataset (149 features and 2688 instances in total) was used for classification, yielding accuracies of 66.52% (KNN) and 72.83% (SVM with RBF kernel).

Huang et al. [89] demonstrated that mental fatigue could be identified with a wearable ECG device. They collected 58 samples of ECG signals and compared the SVM, NB, KNN, and LR algorithms in terms of accuracy (57.08% (SVM) vs 48.84% (NB) vs 65.37% (KNN) vs 59.71% (LR)) and area under the curve (AUC) (0.68 (SVM) vs 0.64 (NB) vs 0.74 (KNN) vs 0.65 (LR)). Wu et al. [90] combined HRV sensors and accelerometers to develop a model for monitoring perceived stress levels in daily life. They collected data from 8 participants during their daily lives over about 2 weeks and compared the performance of the NB, J48, RF, and bagging algorithms, obtaining accuracies of 0.730 (NB), 0.819 (J48), 0.832 (RF), and 0.8392 (bagging).

Sevil et al. [75] addressed the problem of detecting acute psychological stress (APS) using data collected from wristbands. They collected data from 34 samples in 166 clinical experiments and compared different classification algorithms (KNN, SVM, DT, NB, EL, LD, and DL), of which SVM had the highest accuracy at 99.1%.

Pourmohammadi and Maleki [91] collected EMG and ECG signals concurrently from 34 healthy students (23 females and 11 males, ages 20 to 37). They used LIBSVM (a library for SVM) with an RBF (radial basis function) kernel to train the model. Stress identification accuracy was 100%, 97.6%, and 96.2% for two, three, and four stress levels, respectively. Maldonado et al. [92] collected data from 50 engineering students in Chile (33 men and 17 women, aged 22.4 ± 2.8 years). They took HR, SpO2, and temperature readings for use in their SVM model, which yielded an AUC of 0.994 with a variable collecting cost of 16.
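
For reference, the RBF kernel offered by LIBSVM has the form K(u, v) = exp(-gamma * ||u - v||^2). The sketch below evaluates it in plain Python; the function name, the feature vectors, and the gamma value are illustrative assumptions and are not taken from [91] or [92].

```python
import math


def rbf_kernel(u, v, gamma):
    """RBF kernel value exp(-gamma * ||u - v||^2) for two feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


# Hypothetical two-dimensional feature vectors (for example, normalised HRV features).
x1 = [0.2, 0.7]
x2 = [0.3, 0.5]
print(rbf_kernel(x1, x2, gamma=0.5))
```

The kernel equals 1 for identical vectors and decays towards 0 as they move apart; gamma controls how quickly that decay happens and is normally tuned together with the SVM regularisation parameter.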

Pluntke et al. [93] acquired HRV data from subjects in a laboratory setting and trained models using SVM and DT. A set of labelled RR-interval signals was collected as the training set. They used an H7 chest strap sensor to collect data from 26 male and female participants ranging in age from 23 to 59. The best model, based on a C5 DT, showed precision, recall, and F-score of almost 90%.

Giannakakis et al. [94] evaluated 24 participants on 11 tasks, following a research protocol lasting about 45 min. They used KNN, generalised linear model (GLM), NB, linear discriminant analysis (LDA), SVM, and RF classifiers, of which RF outperformed every other method with a classification accuracy of 75.1%. The best result for stress recognition using HRV characteristics alone was a classification accuracy of 84.4% under 10-fold cross-validation.
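
The 10-fold evaluation mentioned above is commonly carried out as k-fold cross-validation, in which every sample is used for testing exactly once. The sketch below is an assumption about the general procedure rather than the protocol of [94]: it splits sample indices into k contiguous folds, omits shuffling and stratification for brevity, and uses a hypothetical function name.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Example: 25 feature vectors split into 10 folds.
for train, test in k_fold_indices(25, k=10):
    print(len(train), len(test))
```

The reported accuracy is then usually the mean of the per-fold accuracies. When several samples come from the same participant, a subject-wise split is often preferred so that one person's data never appears in both the training and the test fold.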

Castaldo et al. [95] used a 3-lead electrocardiogram (ECG) to collect data from 42 students on two distinct days: during an oral examination (stress) and during rest following a holiday. They employed five distinct algorithms (NB, SVM, MLP, AB, and C4.5 (DT)). With sensitivity, specificity, and accuracy rates of 78%, 80%, and 79%, respectively, the C4.5 tree algorithm was the best ML technique for distinguishing between stress and rest.
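
Sensitivity, specificity, and accuracy, as reported above, all derive from the counts of a binary confusion matrix. The following minimal sketch assumes that stress is encoded as 1 and rest as 0; the function name and the labels are illustrative and are not taken from [95].

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, and accuracy for a binary stress/rest task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Assumes both classes occur in y_true; otherwise a division by zero results.
    return {
        "sensitivity": tp / (tp + fn),  # share of stress samples detected
        "specificity": tn / (tn + fp),  # share of rest samples recognised as rest
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical labels: 1 = stress, 0 = rest.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
print(binary_metrics(y_true, y_pred))
```

Reporting all three values matters when the classes are imbalanced, because a classifier can reach high accuracy while missing most stress samples.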

Delmastro et al. [96] collected data in a randomised cross-over observational study in which a Zephyr BioHarness 3 device was used for ECG monitoring and a Shimmer3 GSR+ Development Kit for EDA. Several algorithms (BN, SVM, k-NN, C4.5 DT, AB) were used; the RF and AB learning schemes outperformed the other classifiers (accuracy: 87% for RF and 88.2% for AB).

Table 11 A summary of used algorithms, datasets, evaluation metrics, and obtained outcomes of Shallow ML-based stress prediction research

Lima et al. [97] gathered data using several sensors (such as PPG, Spare, TVOC) from a group of 15 willing participants (9 females and 6 males) ranging in age from 21 to 55 years. Under stress, the model had an accuracy of about 80% with HRV features relative to baseline and about 77% with HRV and EDA features used simultaneously relative to baseline.

Yu et al. [98] used the ensemble learning technique to create a classifier that incorporates three separate work activities: body movement, typing, and browsing. These can be identified with 94.2%, 93.2%, and 91.2% accuracy, respectively. They gathered data from ten office workers, all of whom were around 31 years old.

Padmaja et al. [99] collected data from a smartphone and a Fitbit and then preprocessed and normalised it. They used NB (accuracy: 72%) and DT (accuracy: 62%) for classification. DetectStress has a 72% accuracy rate in recognising perceived stress utilising data from both smartphones and wireless fitness trackers.

Can et al. [100] collected physiological signal and questionnaire data from 21 participants using Samsung Gear S and S2 and Empatica E4 sensors. With HR and ACC signals acquired using the Empatica E4, the MLP algorithm produced the best results (92.19%), while the RF algorithm produced the best classification accuracy (88.26%) with HR and ACC data collected from all devices.

Chen et al. [101] collected data from PPG and Polar H10 sensors, used RF as a classifier, and compared it with SVM, Naïve Bayes, and MLP models. On the PPG dataset, their approach obtained an overall leave-one-participant-out F1-score of 80%, while the ground-truth ECG scored 79.7%. Koldijk et al. [102] used the SWELL-KW dataset (149 features and 2688 instances in total) and compared SVM (accuracy: 90.0298%) with 7 other algorithms: NB (64.7693%), K-star (65.8110%), Bayes net (69.0848%), J48 (78.1994%), IBk (nearest neighbour with Euclidean distance; 84.5238%), RF (87.0908%), and MLP (88.5417%).
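
The leave-one-participant-out F1-score reported for Chen et al. [101] above implies that each participant's data is held out in turn while the model is trained on everyone else. The sketch below illustrates such a split together with the F1-score; it is a generic illustration under that assumption, and the function names and participant identifiers are hypothetical rather than taken from [101].

```python
def leave_one_participant_out(participant_ids):
    """Yield (held_out_id, train_indices, test_indices) for each participant."""
    for held_out in sorted(set(participant_ids)):
        train = [i for i, pid in enumerate(participant_ids) if pid != held_out]
        test = [i for i, pid in enumerate(participant_ids) if pid == held_out]
        yield held_out, train, test


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive (stress) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical participant identifier for each sample in a dataset.
ids = ["p1", "p1", "p2", "p2", "p3"]
for held_out, train, test in leave_one_participant_out(ids):
    print(held_out, train, test)
```

Because no participant contributes to both the training and the test set, this scheme estimates how well a model generalises to an unseen person, which tends to be harder than generalising to unseen samples from known people.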

Ciabattoni et al. [103] utilised KNN to classify stress using a uniform prior probability and the Euclidean distance metric with one neighbour. An overall accuracy of 84.5% was obtained. In stress recognition, a 26% misclassification error was observed when the individual was calm.
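
A KNN classifier with one neighbour and the Euclidean distance, as described above, assigns each query the label of its single closest training sample. The following minimal sketch illustrates that rule; the function name, feature values, and labels are made up for illustration and do not reproduce the setup of [103].

```python
import math


def predict_1nn(train_features, train_labels, query):
    """Return the label of the training sample closest to the query."""
    nearest = min(
        range(len(train_features)),
        key=lambda i: math.dist(train_features[i], query),  # Euclidean distance
    )
    return train_labels[nearest]


# Hypothetical normalised feature vectors with their labels.
train_features = [[0.0, 0.0], [1.0, 1.0]]
train_labels = ["calm", "stress"]
print(predict_1nn(train_features, train_labels, [0.9, 0.8]))
```

With a single neighbour the decision depends entirely on one training sample, so the method is sensitive to noisy labels and to features measured on different scales; normalising the features before computing distances is a common precaution.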

Attaran et al. [104] utilised the ThreatFire belt for data collection and employed several physiological and behavioural factors with both SVM and KNN classifiers to increase the detection accuracy. The best classification accuracy to identify stress was observed for the heart rate (HR) and accelerometer characteristics. For hardware implementation, the SVM classification was utilised, and this system has an overall classification accuracy of 96%.

Table 12 A summary of used algorithms, datasets, evaluation metrics, and obtained outcomes of Shallow ML-based stress prediction research

Castaldo et al. [105] collected 23 ultra-short HRV features from 42 healthy subjects. They found that six of the 23 ultra-short HRV features (MeanNN, StdNN, MeanHR, StdHR, HF, and SD2) were consistent in detecting stress. The authors employed 5 ML algorithms and reported their accuracies: MLP (98%) vs SVM (88%) vs C4.5 DT (94%) vs IBK (94%) vs LDA (94%).

Hantono et al. [106] recorded heart rate data from 41 subjects using PPG sensors in smartphones. They analysed the data and extracted HRV features to detect mental stress. The authors employed the NN, KNN, DA, and NB algorithms, obtaining the following accuracies: NN (73%) vs KNN (82%) vs DA (66%) vs NB (60%).

Tiwari et al. [107] collected ECG and breathing data from 27 police trainees over the course of 15 weeks. They extracted ultra-short-term HRV and breathing features from the data and predicted stress. Results suggested that ultra-short-term analysis for stress prediction results in performance losses lower than 7% when compared to short-term analysis. They used an SVM classifier with RBF kernel, resulting in 80% performance accuracy.

Clark et al. [108] proposed a model for driver stress prediction. They collected data from 17 subjects using ECG, GSR, and respiration sensors after they completed a 20-mile drive. The authors extracted 42 features from the data for use in an RF classifier, which achieved an average accuracy of 94%. Ahmad et al. [109] collected a dataset named Ryerson Multimedia Research Laboratory (RML), consisting of ECG, GSR, and respiration signals recorded from 9 participants. They used raw data obtained from the ECG signal. With the proposed fusion model, they obtained 66.6% and 72.7% on the RML and WESAD datasets, respectively.

Dalmeida et al. [110] investigated the role of HRV features in stress prediction from ECG, EMG, GSR, and respiration sensor data. They used a dataset collected by MIT and available in Physionet. They tested different ML models such as KNN, SVM, MLP, RF, and GB. MLP was considered an appropriate stress classification method, with an 80% sensitivity score. HRV features such as AVNN, SDNN, and RMSSD were found to be relevant for stress identification.

Table 13 A summary of used algorithms, datasets, evaluation metrics, and obtained outcomes of Shallow ML-based stress prediction research

Sandulescu et al. [111] presented an SVM-based approach for stress prediction, collecting PPG, HRV, and EDA sensor data from 5 participants. The results showed 82% accuracy on two participants and a precision of more than 80% for all participants.

Munla et al. [112] studied stress-level detection from HRV features extracted from 16 different subjects in the Stress Recognition in Automobile Drivers database (DRIVEDB). They used three ML models and achieved the following accuracies: KNN (66.66%), SVM (83.33%), and SVM with RBF kernel (83.3%).

A summary of used algorithms, datasets, evaluation metrics, and obtained outcomes of shallow ML-based stress prediction research is presented in Tables 11, 12, and 13.

Deep Machine Learning Approaches

de Vries et al. [113] collected GSR, RSP, and ECG sensor data from 61 participants from the age of 18 to 28 years to perform stress and relaxation classification. They used learning vector quantisation to achieve an accuracy of 88% for the classification.

Rastgoo et al. [115] collected ECG, vehicle, and environmental data from 27 participants in a vehicle simulator. They proposed a CNN and LSTM-based multimodal fusion model, which showed an accuracy of 92.8%, sensitivity of 94.13%, specificity of 97.37%, and precision of 95.00%.

Akbulut et al. [116] developed a stress model that incorporates an algorithm for detecting affective states based on HRV analysis, emotion recognition, and other statistical data. They collected a dataset from 30 volunteers and named it CVDiMo. In categorising the stress levels of all patients, their suggested method had a 90.5% accuracy rate. The average success rate for MES patients was found to be 92%, which is greater than the general performance for healthy people.

Coutts et al. [117] recorded HRV features from 652 participants using a wearable sensor. They employed an LSTM network for the detection of stress, anxiety, and depression levels, finding 85% classification accuracy.

He et al. [118] used ECG sensor data from 20 participants to extract six HRV features (HR, LH, pQ, SD2, SDNN, Comb). They used SVM, LDA, and CNN-based models to detect cognitive stress from these features, where CNN (17.3%) outperformed LDA (25.1 ± 14.2%) and SVM (24.5 ± 13.2%) in terms of detection error rate.

Qin et al. [119] used 10 HRV features extracted from 56 samples of R-R intervals recorded during the modified Stroop test. They used 40 samples as training data and 16 as testing for a stress evaluation system based on the BP neural network, which could detect different levels of stress with an accuracy rate of 93.75%.

Ding et al. [120] recruited 18 healthy individuals and collected heart rate, heart rate variability, electromyography, electrodermal activity, and respiration data to measure changes in physiological activity across varied levels of tasks. When combining physiological signals and task performance, their classification models achieved an accuracy of 96.4%, compared with 78.3% when using physiological features only.

Kalatzis et al. [121] recruited 57 participants to extract time- and frequency-domain features of HR and HRV using ECG sensors. They used an ANN-based model to classify stress and no-stress states, achieving a 90.83% accuracy level.

Table 14 A summary of used algorithms, datasets, evaluation metrics, and obtained outcomes of deep ML-based stress prediction research

Dhaouadi and Ben Khelifa [122] used ECG, EDA, and EMG measurements taken by wearable devices from 15 young gamers to monitor stress in real time. They explored LSTM and DNN networks: the DNN model obtained its best accuracy of 65% at 15 and 30 epochs, whereas the LSTM achieved its best accuracy of 95% at 30 epochs.

Stewart et al. [123] used two publicly available datasets, drivedb and WESAD, both of which contain recordings from multiple sensors, including ECG and GSR. They compared shallow ML models (such as KNN, SVM, and LR) with neural process models; the latter outperformed the former (average precision: 0.957 on WESAD and 0.804 on drivedb) and performed best when using periods of stress and baseline as context.

Silva et al. [124] monitored the stress of 83 medical students by comparing stress levels during academic exams and a regular week. Data was collected from wearable sensors such as the Microsoft Smart Band 2 and PPG. Two models were established to predict stress, and the neural network performed better than the shallow ML algorithms it was compared with (such as SVM, KNN, LR, RF) (Model 1: sensitivity 75.2%, specificity 77.9%; Model 2: sensitivity 74.2%, specificity 78.1%). A summary of used algorithms, datasets, evaluation metrics, and obtained outcomes of deep ML-based stress prediction research is presented in Table 14.

Discussion

Stress can lead to a variety of psychological issues. Many disorders are more likely to develop in a stressful environment, particularly if the stress is intense and long-lasting [125]. Therefore, being able to predict stress effectively is crucial. In this research, we examined HRV characteristics as physiological indicators for stress detection based on a review of 43 studies published between 2016 and 2021. RMSSD, SDNN, pNN50, and AVNN were found to be the most often utilised HRV features in our tables. ECG, PPG, and GSR were the most frequently deployed sensors for data collection.
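
For readers less familiar with the four time-domain HRV features named above, the sketch below computes them from a series of normal-to-normal (NN) intervals in milliseconds using their usual definitions. It is an illustration rather than the pipeline of any reviewed study: the function name and the interval values are made up, and SDNN is computed here with the sample (n - 1) standard deviation, which is an assumption because conventions vary between tools.

```python
import math


def hrv_time_domain(nn_ms):
    """Compute AVNN, SDNN, RMSSD (all in ms) and pNN50 (in %) from NN intervals."""
    n = len(nn_ms)
    avnn = sum(nn_ms) / n  # mean NN interval
    # Standard deviation of NN intervals (sample form, n - 1 in the denominator).
    sdnn = math.sqrt(sum((x - avnn) ** 2 for x in nn_ms) / (n - 1))
    diffs = [b - a for a, b in zip(nn_ms, nn_ms[1:])]  # successive differences
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # Percentage of successive differences larger than 50 ms in absolute value.
    pnn50 = 100.0 * sum(1 for d in diffs if abs(d) > 50) / len(diffs)
    return {"AVNN": avnn, "SDNN": sdnn, "RMSSD": rmssd, "pNN50": pnn50}


# Hypothetical NN intervals in milliseconds.
print(hrv_time_domain([812, 790, 835, 901, 860, 798, 805]))
```

AVNN and SDNN describe the overall level and spread of the intervals, whereas RMSSD and pNN50 depend only on beat-to-beat changes and are therefore commonly read as markers of short-term variability.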

In AI, accuracy is one of the most important performance indicators. This article has examined the existing research in order to provide a full understanding of the field of stress prediction via HRV.

Fig. 8 compares the performance of the papers based on accuracy level. Among RB techniques, only one article, by Wang et al. [84], employed accuracy as a performance measure. Using the fuzzy ARTMAP classifier, they explored the stress association between HRV and salivary measurements, achieving an overall accuracy of 80% for ECG records.

Among shallow ML approaches, Sevil et al. [75] achieved the highest accuracy of the 21 studies that used accuracy as a performance measure. They used wristband data to quantify psychological stress and attained 99.1% accuracy using the SVM classifier, which is also the highest among all the publications covered in this review. Among deep ML techniques, Ding et al. [120] used a BPNN classifier to assess stress based on physiological activity across varying levels of tasks; their classification models reached a 96.4% accuracy rate.

Another performance metric for assessing classifiers is the AUC, the two-dimensional area beneath the ROC curve. This review included 5 studies that employed the AUC measure. The highest AUC value for deep ML techniques was attained by Akbulut et al. [116], as shown in Fig. 9. They created a stress model based on HRV analysis, emotion recognition, and other statistical data from the CVDiMo dataset, which includes an algorithm for recognising affective states. Using FFNN, they attained an AUC of 0.97. Maldonado et al. [92] used shallow ML to obtain the best AUC value of 0.99 for stress detection, which is considerably higher than other models that used AUC as a performance indicator.
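
The AUC can also be read as the probability that a randomly chosen stress sample receives a higher classifier score than a randomly chosen non-stress sample. The sketch below computes it through this pairwise formulation, which is equivalent to the area under the ROC curve; the function name, labels, and scores are illustrative assumptions and do not come from any of the reviewed studies.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    Assumes y_true contains at least one 1 (stress) and one 0 (no stress).
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    correct = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                correct += 1.0
            elif p == q:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical labels (1 = stress) and classifier scores.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Unlike accuracy, the AUC does not depend on a particular decision threshold, which is one reason the two metrics can rank the same set of models differently.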

Fig. 8 Performance comparison of the articles based on accuracy level. The different algorithm types are presented using different colours

Fig. 9 Performance comparison of the articles based on AUC level. The different algorithm types are presented using different colours
