Machine learning for distinguishing Saudi children with and without autism via eye-tracking data

The experiment followed the steps reported in Fig. 1 and described in detail in the next sections. Ethical approval was obtained from King Faisal Specialist Hospital and Research Centre (RAC – 2,201,183). The experiments took place in the Human Behavior Lab of the hospital. Participants were welcomed to the lab and informed about the procedure upon arrival. Consent forms were collected with the permission of the participants’ guardians.

Fig. 1 Experiment procedure steps

Participants

A participant recruitment form was electronically disseminated through email and the hospital’s official social media channels to current patients at the center and those on the waiting list. A total of 130 participants were enrolled for this study. Of these, 28 were later excluded from the analysis due to inadequate calibration or because their measurements were invalid (e.g., they did not look at the screen during one or more visual stimuli). The final number of participants with satisfactory calibration and engagement was 104 (74 males, 30 females): 41 neurotypical (23 male, 18 female) and 63 with ASD (51 male, 12 female). In the ASD group, the youngest participant was one year and ten months old and the oldest was 22 years old; in the neurotypical group, the age range spanned from two to 17 years. The mean age was 8 ± 3.89 years for the ASD group and 8.21 ± 4.12 years for the neurotypical group. The two groups were comparable in terms of age (t(72) = 0.26, p = 0.797, d = -0.05) and gender (χ2(107) = 19.36, p = 1). Consequently, we considered our sample heterogeneous and representative of the population of clinic attendees in Saudi Arabia.

The Social Communication Questionnaire (SCQ) was collected for most participants in both groups (N = 94). A cut-off score of 14 was used to flag participants warranting further assessment for autism, as part of the clinic’s initial screening process. Confirmation of an ASD diagnosis was carried out at the clinic by a multidisciplinary team comprising five divisions: pediatric neurology, psychology, occupational therapy and sensory integration, speech and language pathology, and behavioral analysis and observation. In our ASD sample, 60% (N = 36) of participants also had their diagnosis confirmed using the Autism Diagnostic Observation Schedule (ADOS). In all instances, DSM-5 criteria were applied.

Setup

The testing room was specifically tailored for this experiment, with careful attention given to maintaining consistent lighting, ensuring a constant distance of 50 to 60 cm between the participant and the screen, and standardizing the height at which participants saw the screen by using an adjustable chair and footrest; the average height from the floor to the center of the screen was 114 cm.

The experimental setup minimized distractions by having participants sit in front of the screen, separated from the control computer by a divider. Parents could observe their children from behind a one-way glass mirror positioned behind the participant. Some participants preferred to have a parent present and to sit on their lap. In such cases, the parent was asked to wear black sunglasses to prevent data contamination. In compliance with the ethical approval obtained for the study, one parent was always present during the experiment, either inside the room wearing black sunglasses or behind the one-way glass mirror. Figure 2 shows the experimental setup utilized for data collection.

Fig. 2 Data collection procedure

Visual stimuli were specifically designed for this study to investigate differences in visual behavior during social perception tests, including left visual field (LVF) bias, joint attention, and perception of human and animal cartoon faces. To examine LVF bias and eye gaze, four distinct child avatar faces were created. Each stimulus was displayed for a total of four seconds while participants’ gaze patterns were recorded. Moreover, we included images of four animal cartoon faces to explore potential similarities and differences in gaze patterns compared to human faces.

To assess joint attention, we presented images of child avatars with their gaze directed towards one of two toys, arranged in a numerically balanced and pseudo-randomized fashion on either the left or the right side of the avatar (Fig. 3). In total, four child avatar identities were shown, each appearing twice with the target toy on either the left or right side. The time to first fixation and the time participants spent looking at the object gazed upon by the avatar were recorded as the primary outcome measures. We anticipated that children with autism would be less inclined to direct their attention towards the object viewed by the avatar. Participants were simply instructed to “look at the screen” during the task, with no additional guidance provided. Additional metrics were examined during the experiment, using stimuli probing scene perception, visual disengagement, and pupillary reactions; these results will be reported in a separate paper. The experiments were conducted using iMotions software version 8.3 [17]. Visual stimuli were displayed on a 24-inch screen with a resolution of 1920 × 1080 pixels. Areas of interest (AOIs) were designated for each image shown. A Tobii Pro Fusion screen-based eye tracker, which captures gaze data at up to 120 Hz [18], was used for eye tracking.

Fig. 3 Examples of the visual stimuli and AOIs for the joint attention (JA) task

Data pre-processing

Relevant areas of interest (AOIs) were drawn on each of the stimuli using the iMotions software. For condition one (Attention on Eyes), two AOIs were drawn on the eyes. For condition two (LVF Bias), AOIs were drawn on the left and right sides of the screen. For condition three (Joint Attention), AOIs were drawn for congruent and incongruent objects. From the iMotions AOI Metrics, ‘dwell time’ (defined below) was exported for each AOI per participant per stimulus. In addition, ‘time to first fixation (TTFF)’ (defined below) was exported for condition three. Finally, mean values were calculated per group, and t-tests were conducted to assess between-group differences on these metrics. All statistical analyses were performed in RStudio. Outliers and participants with missing data (one to three per prediction) were removed.
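As an illustration of this step, the between-group comparison on one exported AOI metric could be computed as follows. This is a minimal Python sketch only (the study ran its statistics in RStudio); the file name and the column names ‘group’ and ‘DwellTimeLeftEye’ are assumptions about the exported data, not the actual export format.

```python
# Minimal sketch: between-group t-test on one exported AOI metric.
# 'aoi_metrics.csv', 'group', and 'DwellTimeLeftEye' are hypothetical names;
# the study itself performed these analyses in RStudio.
import pandas as pd
from scipy import stats

aoi = pd.read_csv("aoi_metrics.csv")
aoi = aoi.dropna(subset=["DwellTimeLeftEye"])  # drop participants with missing data

asd = aoi.loc[aoi["group"] == "ASD", "DwellTimeLeftEye"]
nt = aoi.loc[aoi["group"] == "neurotypical", "DwellTimeLeftEye"]

print("ASD mean dwell time (ms):", asd.mean())
print("NT mean dwell time (ms):", nt.mean())

t, p = stats.ttest_ind(asd, nt)  # independent-samples t-test
print(f"t = {t:.2f}, p = {p:.3f}")
```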

In total, the dataset consisted of a binary label (autistic or not) and a set of input features for each test of each child (a sketch of assembling these into a feature table follows the list):

SCQ score.

DwellTimeLeftEye (ms): time spent gazing on the left eye (human avatar).

DwellTimeLeftEye_animal (ms): time spent gazing on the left eye (animal avatar).

DwellTimeLeftFace (ms): time spent gazing on the left side of the face (human avatar).

DwellTimeLeftFace_animal (ms): time spent gazing on the left side of the face (animal avatar).

DwellTimeLeftSide (ms): time spent gazing on the left side of the screen (human avatar).

DwellTimeLeftSide_animal (ms): time spent gazing on the left side of the screen (animal avatar).

DwellTimeCongruent (ms): time spent gazing at the same object gazed upon by the avatar.

TTFFLeftEye (ms): time it took for first fixation on left eye (human avatar).

TTFFLeftEye_animal (ms): time it took for first fixation on left eye (animal avatar).

TTFFLeftFace (ms): time it took for first fixation on left side of the face (human avatar).

TTFFLeftFace_animal (ms): time it took for first fixation on left side of the face (animal avatar).

TTFFLeftSide (ms): time it took for first fixation on the left side of the screen (human avatar).

TTFFLeftSide_animal (ms): time it took for first fixation on the left side of the screen (animal avatar).

TTFFCongruent (ms): time it took for first fixation on the same object gazed upon by the avatar.
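For illustration, the per-participant feature table described above could be assembled as follows. This is a minimal sketch rather than the study’s actual pipeline; the file name and the binary label column ‘asd’ are assumptions.

```python
# Minimal sketch: assembling the feature matrix X and binary labels y.
# 'eye_tracking_features.csv' and the label column 'asd' are hypothetical names.
import pandas as pd

FEATURES = [
    "SCQ",
    "DwellTimeLeftEye", "DwellTimeLeftEye_animal",
    "DwellTimeLeftFace", "DwellTimeLeftFace_animal",
    "DwellTimeLeftSide", "DwellTimeLeftSide_animal",
    "DwellTimeCongruent",
    "TTFFLeftEye", "TTFFLeftEye_animal",
    "TTFFLeftFace", "TTFFLeftFace_animal",
    "TTFFLeftSide", "TTFFLeftSide_animal",
    "TTFFCongruent",
]

data = pd.read_csv("eye_tracking_features.csv")  # one row per participant per test
X = data[FEATURES].to_numpy()
y = data["asd"].to_numpy()  # 1 = ASD, 0 = neurotypical
```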

Machine learning-based classification

To identify the best algorithm for the classification problem at hand, we tested several popular machine learning algorithms, namely: logistic regression, support vector machines (SVMs), random forests (RF), and a custom-made long short-term memory (LSTM) neural network [19].
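For reference, the three traditional classifiers are available directly in scikit-learn; the sketch below shows one way to instantiate them. The hyperparameters and the feature-scaling step are illustrative assumptions, not the settings used in the study.

```python
# Illustrative instantiation of the traditional classifiers compared in the study.
# Hyperparameters and scaling are assumptions, not the study's actual settings.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
```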

Our LSTM model takes three inputs and predicts whether the corresponding participant has ASD. Each input is fed to one of three branches of the model: one that takes the age and gender of the participant; one for the features relating to human avatars, organized as a sequence of 20 values per participant; and one for the features relating to animal avatars, organized as a sequence of 5 values per participant. The two sequence inputs are fed to two separate LSTM blocks, each consisting of a single LSTM layer; LSTM layers are particularly well suited to handling sequence data. The outputs of the two LSTM blocks are then concatenated with the scalar input and fed to a dense layer, which performs the classification via a sigmoid activation function (Fig. 4).
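The Keras sketch below illustrates the three-branch design described above. Layer sizes, input shapes (each sequence element is treated here as a single scalar), and the optimizer are assumptions where the text does not specify them; this is a sketch of the architecture, not the authors’ exact implementation.

```python
# Sketch of the three-branch LSTM classifier (Keras / TensorFlow).
# Hidden sizes, input shapes, and optimizer are assumptions.
from tensorflow.keras import Model, layers

# Branch 1: scalar demographics (age, gender)
demo_in = layers.Input(shape=(2,), name="demographics")

# Branch 2: human-avatar features as a sequence of 20 values
human_in = layers.Input(shape=(20, 1), name="human_features")
human_lstm = layers.LSTM(16)(human_in)

# Branch 3: animal-avatar features as a sequence of 5 values
animal_in = layers.Input(shape=(5, 1), name="animal_features")
animal_lstm = layers.LSTM(8)(animal_in)

# Concatenate the two LSTM outputs with the scalar input and classify
merged = layers.concatenate([human_lstm, animal_lstm, demo_in])
output = layers.Dense(1, activation="sigmoid", name="asd_probability")(merged)

model = Model(inputs=[demo_in, human_in, animal_in], outputs=output)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

Keeping the demographic branch outside the recurrent blocks means static and sequential information are only combined at the classification stage, mirroring the concatenation step described above.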

As we had a limited amount of data (104 participants, each constituting one data point), we performed K-fold cross-validation. This technique involves splitting the dataset into K equally sized folds, training the model on K − 1 folds, and evaluating its performance on the remaining fold. The process is repeated K times, with each fold serving as the validation set once, and the results are averaged to provide an overall estimate of the model’s performance. This mitigates the risk of overfitting to a single train-test split and gives a more reliable picture of how well the model generalizes. In this study we chose K = 5, although K = 10 is another equally valid and popular choice.

We compared the performance of all models using four metrics: accuracy, precision, recall, and F1 score. All traditional models were implemented using the Python library scikit-learn (‘sklearn’), whereas the LSTM model was implemented using TensorFlow (‘tensorflow’).
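Putting these pieces together, a minimal sketch of the 5-fold evaluation with the four reported metrics could look as follows. The synthetic data generated here is only a stand-in for the real feature matrix and labels, and the random forest configuration is an assumption.

```python
# Minimal sketch: 5-fold cross-validation reporting accuracy, precision, recall, and F1.
# Synthetic data stands in for the real 104-participant feature matrix.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_validate

X, y = make_classification(n_samples=104, n_features=15, random_state=0)  # stand-in data

clf = RandomForestClassifier(n_estimators=200, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

scores = cross_validate(clf, X, y, cv=cv, scoring=["accuracy", "precision", "recall", "f1"])
for metric in ["accuracy", "precision", "recall", "f1"]:
    mean_score = np.mean(scores["test_" + metric])
    print(f"{metric}: {mean_score:.3f}")
```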

Fig. 4 The architecture of the proposed LSTM network
