Distinct Neural Components of Visually Guided Grasping during Planning and Execution

Setup

A schematic of our setup is shown in Figure 1A. Participants lay supine inside the MRI scanner with their head placed in a head coil tilted by ∼30° to allow direct viewing of real stimulus objects placed in front of them. Below the head we positioned the bottom 20 channels of a 32-channel head coil, and we suspended a four-channel flex coil via a Loc-Line (Lockwood Products) over the forehead. A black wooden platform placed above the participant's hips enabled the presentation of real objects that participants were required to grasp, lift, and set back down using their right hand. The flat surface of the platform was tilted by ∼15° toward the participant to maximize comfort and visibility. Objects were placed on a black cardboard target ramp (Fig. 1A; ramp dimensions, 15 × 5 × 13 cm) on top of the platform to create a level surface that prevented objects from tipping over. The exact placement of the objects was adjusted so that all required movements were possible and comfortable. Between trials, the participant's right hand rested on a button at a start position on the lower right side of the table. The button monitored movement start and end times. The participant's upper right arm was strapped to their upper body and the MRI table using a hemicylindrical brace (Fig. 1A, brace not shown). This prevented shoulder and head movements, thus minimizing movement artifacts while enabling reach-to-grasp movements through elbow and wrist rotations. A small red LED fixation target was placed above the object and at a slightly closer depth to control for eye movements. Participants were required to maintain fixation on this target at all times during scanning. An MR-compatible camera was positioned on the left side of the head coil to record the participant's actions. Videos of the runs were screened off-line, and trials containing errors were excluded from further analyses. A total of 22 error trials were excluded, 18 of which occurred in one run in which the participant erroneously grasped the objects during the planning phase.

Two bright LEDs illuminated the workspace for the duration of the planning and execution phases of each trial; one was mounted on the head coil, and the other was taped to the ceiling of the bore. Another LED was taped to the outside of the bore and was visible only to the experimenter, cueing the extraction and placement of the objects. The objects were kept on a table next to the MRI scanner; three LEDs on this table cued the experimenter as to which object to place inside the scanner. Participants wore MR-safe headphones through which task instructions were relayed on every trial. The LEDs and headphones were controlled by a MATLAB script on a PC that interfaced with the MRI scanner. Triggers were received from the scanner at the start of every volume acquisition. All other lights in the MRI room were turned off, and any other potential light sources and windows were covered so that no other light could illuminate the participant's workspace.

Stimuli

Stimuli were three L-shaped objects of the same size, created from seven blocks (cubes, 2.5 cm side length). One object was constructed with seven cubes of beech wood (object weight, 67 g), whereas the other two were both constructed of four brass and three wooden cubes (object weight, 557 g). We performed pilot testing to ensure that the objects and their movements did not evoke artifacts related to the movement of masses within the scanner (Barry et al., 2010). Specifically, we placed a spherical MRI phantom (immobile mass) in the scanner and collected fMRI data while the experimenter placed and removed the objects as they would in the actual experiment. Functional time courses were carefully examined to ensure that no artifacts were observed (such as spikes or abrupt changes in signal at the time of action; Culham, 2006; Singhal et al., 2013). The two identical wood/brass objects were positioned in two different orientations, one with the brass arm pointing up (Fig. 1F, BrassUp), the other with the brass arm lying down (BrassDown). In a slow event-related fMRI design, on each trial, participants directly viewed, grasped, and lifted an object placed on a platform.

Task

Participants performed three distinct grasps per object, with each grasp marked on the objects with colored stickers during the experiment. The colors were clearly distinguishable inside the scanner and served to cue participants about which grasp to perform. Participants were instructed to perform three-digit grasps with their right hand by placing the thumb in opposition to the index and middle fingers. This grasp was similar to the precision grip grasps used in our previous work (Maiello et al., 2019, 2020; Klein et al., 2020, 2021) but ensured participants could apply sufficient grip force to lift all objects to a height of ∼2 cm above the platform. Grasp contact locations for the index finger and thumb were selected to produce a set of uncorrelated (and thus linearly independent) representational dissimilarity matrices (RDMs) for the three grasp factors investigated, that is, grasp axis, grasp size, and object mass. Specifically, grasps could be rotated 45° either clockwise or counterclockwise around the vertical axis and could require small (2.5 cm) or large (7.5 cm) grip apertures. In pilot testing we further refined the positioning of the objects and grasps within the magnetic field of the MRI scanner to avoid eddy currents forming within the brass parts of the objects, which could hinder participants from executing the grasps. The complete set of grasp conditions is shown in Figure 1C.
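
As an illustration of this design constraint, the following MATLAB sketch builds binary model RDMs for grasp axis, grasp size, and object mass and checks that they are mutually uncorrelated. The per-condition labels are hypothetical placeholders chosen only to demonstrate the check, not the actual assignment of grasps to objects shown in Figure 1C.

```matlab
% Build binary model RDMs (1 = different level, 0 = same level) for the three
% grasp factors and verify that their upper triangles are mutually uncorrelated.
% Condition order and labels are illustrative placeholders (see Fig. 1C for the
% actual grasp conditions).
axisLab = [-45 -45 -45  45  45  45 -45 -45 -45];  % grasp axis (deg)
sizeLab = [7.5 7.5 7.5 7.5 7.5 7.5 2.5 2.5 2.5];  % grip aperture (cm)
massLab = [ 67  67  67 557 557 557 557 557 557];  % object mass (g)

makeRDM = @(lab) double(bsxfun(@ne, lab(:), lab(:)'));
rdms = cat(3, makeRDM(axisLab), makeRDM(sizeLab), makeRDM(massLab));

ut = triu(true(numel(axisLab)), 1);               % upper triangle, diagonal excluded
vecs = zeros(nnz(ut), 3);
for k = 1:3
    m = rdms(:, :, k);
    vecs(:, k) = m(ut);
end
disp(corr(vecs))  % off-diagonal values near zero indicate uncorrelated model RDMs
```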

Experimental design and statistical analysis

fMRI experimental procedure

We employed a slow event-related fMRI design with trials spaced every 23–31 s. Participants underwent four experimental runs in which they performed each combination of three objects times three grasps twice per run (18 trials per run, 72 trials total) in a pseudorandom order to minimize trial order effects (van Polanen and Davare, 2015a; Maiello et al., 2018; van Polanen et al., 2020). The sequence of events occurring on each trial is schematized in Figure 1B. Before each trial, the experimenter was first cued on which object to place inside the scanner. The experimenter placed the object on the ramp. At trial onset, the illumination LEDs turned on, and the participant heard the instruction "plan" over the headphones, immediately followed by the auditory cue specifying which grasp to execute. The auditory cue was "blue," "green," or "red," corresponding to the colored stickers marking the grasp locations on the objects. The duration of the planning phase of the task was randomly selected to be 6, 8, 10, or 12 s. During this time, the participant was required to hold still and mentally prepare to grasp the object at the cued location. Following previous research (Gallivan et al., 2014, 2016), we used a variable delay between cue and movement onset to distinguish sustained planning-related neural activity from the movement-execution response accompanying action initiation. It is important to note that we use the term "action planning" for a sustained action planning and previewing phase in which participants are thinking about how to execute the movement and must thus access mental representations of the object and task. In this kind of delayed action task, previous work has demonstrated that dorsal stream areas plan and maintain action goals (Singhal et al., 2013). We specifically do not mean the purely feedforward movement planning that occurs only a few hundred milliseconds before movement initiation (Westwood and Goodale, 2003), because it is unfeasible to investigate neural signals at this timescale through fMRI BOLD activity.
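
For concreteness, the MATLAB sketch below generates one run's trial list and timing parameters under the constraints described above. It is an illustrative reconstruction; the actual stimulus script may have applied additional counterbalancing constraints when pseudorandomizing trial order.

```matlab
% Generate one run of 18 trials: each of the 9 object x grasp conditions twice,
% in pseudorandom order, with jittered planning and intertrial durations.
% Illustrative reconstruction only, not the actual experiment script.
objects = {'Wood', 'BrassUp', 'BrassDown'};
grasps  = {'blue', 'green', 'red'};
[o, g]  = ndgrid(1:numel(objects), 1:numel(grasps));
conds   = [o(:), g(:)];                              % 9 unique conditions
trials  = repmat(conds, 2, 1);                       % each condition twice per run
trials  = trials(randperm(size(trials, 1)), :);      % pseudorandom trial order
planDur = randsample([6 8 10 12], size(trials, 1), true);  % planning phase (s)
execDur = repmat(7, size(trials, 1), 1);                   % execution phase (s)
itiDur  = randsample([10 11 12], size(trials, 1), true);   % intertrial interval (s)
```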

Once the planning phase ended, the word “Lift” was spoken through the headphones to cue the participant to execute the grasp. During the execution phase of the task, the participant had 7 s to reach, grasp, and lift the object straight up by ∼2 cm, place it back down on the target ramp, and return their hand to the start position. The illumination LEDs turned off, and the participant waited for a 10–12 s intertrial interval (ITI) for the next trial to begin. During the ITI the experimenter removed the object and placed the next one before the onset of the following trial. We note that we did not include a passive preview phase in our trial design because we have repeatedly shown in previous studies that action intentions cannot be decoded from neural activity recorded during passive stimulus preview (Gallivan et al., 2011, 2013a,b).

Participants were instructed about the task, familiarized themselves with the objects, and practiced the grasps outside the MRI room for ∼5 min before the experiment. Once participants were strapped into the setup, they practiced all grasps again, thus ensuring that they could comfortably grasp each object.

Grasp comfort ratings

At the end of the fMRI experiment, participants remained positioned in the scanner and performed a short rating task. Participants were asked to perform each of the nine grasp conditions one more time. For each grasp, participants verbally reported how comfortable the grasp was on a scale of 1–10 (1 being highly uncomfortable and 10 being highly comfortable). Verbal ratings were manually recorded by the experimenter.

Analyses

Data analyses were conducted using BrainVoyager 20.0 (BV20) and 21.4 software packages (Brain Innovation) as well as MATLAB version R2019b.

fMRI data acquisition

Imaging was performed using a Siemens 3T Magnetom Prisma Fit MRI scanner at the Robarts Research Institute at the University of Western Ontario. Functional MRI volumes were acquired using a T2*-weighted, single-shot, gradient-echo, echoplanar imaging acquisition sequence. Functional scanning parameters were as follows: repetition time (TR) = 1000 ms; echo time (TE) = 30 ms; field of view = 210 × 210 mm in plane; 48 axial 3-mm-thick slices; voxel resolution = 3 mm isotropic; flip angle = 40°; and multiband factor = 4. Anatomical scans were acquired using a T1-weighted MPRAGE sequence with the following parameters: TR = 2300 ms; field of view = 248 × 256 mm in plane; 176 sagittal 1-mm-thick slices; flip angle = 8°; and 1 mm isotropic voxels.

fMRI data preprocessing

Brain imaging data were preprocessed using the BV20 Preprocessing Workflow. First, we performed Inhomogeneity Correction and extracted the brain from the skull. We then coregistered the functional images to the anatomic images and normalized anatomic and functional data to Montreal Neurological Institute (MNI) space. Functional scans underwent motion correction and high-pass temporal filtering (to remove frequencies below three cycles/run). No slice scan time correction and no spatial smoothing were applied.

General linear model

Data were further processed with a random-effects general linear model that included one predictor for each of the 18 conditions [three grasp locations times three objects times two phases (planning versus execution)] convolved with the default BrainVoyager two-gamma hemodynamic response function (Friston et al., 1998) and aligned to trial onset. As predictors of no interest, we included the six motion parameters (x, y, and z translations and rotations) resulting from the 3D motion correction.
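
As a sketch of this step, the MATLAB code below builds a single condition predictor by convolving trial-onset events with a two-gamma HRF. The HRF parameters, run length, and onset times are generic assumptions for illustration, not BrainVoyager's exact internal values.

```matlab
% One column of the GLM design matrix: events aligned to trial onset, convolved
% with a two-gamma HRF. All parameter values are assumptions for illustration.
TR   = 1;                                  % s, matches the acquisition TR
t    = 0:TR:30;                            % HRF support (s)
hrf  = gampdf(t, 6, 1) - gampdf(t, 16, 1) / 6;   % assumed two-gamma shape
hrf  = hrf / max(hrf);

nVols  = 600;                              % illustrative run length in volumes
onsets = [25 52 81];                       % illustrative trial onsets (volume index)
events = zeros(nVols, 1);
events(onsets) = 1;

pred = conv(events, hrf');                 % convolve the event train with the HRF
pred = pred(1:nVols);                      % trim to run length
% The full design matrix would hold one such predictor per condition (18 total)
% plus the six motion parameters as predictors of no interest.
```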

Definition of regions of interest

We investigated a targeted set of regions of interest (ROIs). The locations of these ROIs are shown in Figure 1H. The criteria used to define the regions and their MNI coordinates are provided in Table 1. ROIs were selected from the literature as the regions most likely specialized in the components of visually guided grasping investigated in our study. These included primary visual cortex (V1); the lateral occipital complex (LOC), posterior fusiform sulcus (pFS), and parahippocampal place area (PPA) within the ventral visual stream (occipitotemporal cortex); the superior parieto-occipital cortex (SPOC), anterior intraparietal sulcus (aIPS), ventral premotor cortex (PMv), and dorsal premotor cortex (PMd) within the dorsal visual stream (occipitoparietal and premotor cortex); and primary motor cortex (M1)/primary somatosensory cortex (S1).

Table 1.

Regions of interest and their peak x, y, and z coordinates in MNI space

V1 was included because it represents the first stage of cortical visual processing on which all subsequent visuomotor computations rely. Primary motor area M1 was, in turn, included as the final stage of processing, where motor commands are generated and sent to the arm and hand. In our study, however, we refer to this ROI as M1/S1 because our volumetric data do not allow us to distinguish between the two banks of the central sulcus along which motor and somatosensory regions lie.

We next selected regions believed to perform the sensorimotor transformations that link visual inputs to motor outputs. The dorsal visual stream is thought to be predominantly specialized for visually guided actions, whereas the ventral stream mostly specializes in visual object recognition (Goodale and Milner, 1992; Culham et al., 2003; Cavina-Pratesi et al., 2007; Vaziri-Pashkam and Xu, 2017). Nevertheless, significant cross talk occurs between these streams (Budisavljevic et al., 2018), and visual representations of object material properties have been found predominantly in ventral regions. We therefore selected areas across both dorsal and ventral visual streams that would encode grasp axis, grasp size, and object mass.

We expected grasp axis to be encoded in dorsal stream regions SPOC (Fattori et al., 2004, 2009, 2010; Monaco et al., 2011), aIPS (Taubert et al., 2010), PMv (Murata et al., 1997; Raos et al., 2006; Theys et al., 2012), and PMd (Raos et al., 2004). We expected grasp size to be encoded in dorsal stream regions SPOC, aIPS (Monaco et al., 2015), PMd (Monaco et al., 2015), and PMv (Murata et al., 1997; Raos et al., 2006; Theys et al., 2012), and in ventral stream region LOC (Monaco et al., 2015). We expected visual estimates of object mass to be encoded in ventral stream regions LOC, pFS, and PPA (Cant and Goodale, 2011; Hiramatsu et al., 2011; Gallivan et al., 2014; Goda et al., 2014, 2016). We further hypothesized that the network formed by aIPS, PMv, and PMd might play a role in linking ventral stream representations of object mass to the motor commands generated and sent to the hand through M1 (Murata et al., 2000; Borra et al., 2008; Davare et al., 2009, 2010, 2011; Janssen and Scherberger, 2015; van Polanen and Davare, 2015b; Schwettmann et al., 2019; Schmid et al., 2021).

It should be noted that we do not expect the set of ROIs investigated here to be the exhaustive set of regions involved in visually guided grasping. For example, subcortical regions are also likely to play a role (Nowak et al., 2007; Prodoehl et al., 2009; Cavina-Pratesi et al., 2018). However, cortical and subcortical structures require different imaging protocols (De Hollander et al., 2017; Miletić et al., 2020), and the small size and heterogeneity of subcortical structures also require different normalization, coregistration, and alignment techniques than those used in the cortex (Diedrichsen et al., 2010). Moreover, adding further ROIs would reduce statistical power when correcting for multiple comparisons. We thus chose to focus on a constrained set of cortical regions for which we had a priori hypotheses regarding their involvement in the aspects of visually guided grasping investigated here. Nevertheless, we hope that exploratory analyses on our open access data may guide future studies mapping out the distributed neural circuitry involved in visually guided grasping.

Figure 1H shows our selected ROIs as volumes within the Colin 27 template brain (https://nist.mni.mcgill.ca/colin-27-average-brain-2008/). To locate all left-hemisphere ROIs (except V1) in a standardized fashion, we searched the automated meta-analysis website https://neurosynth.org (Yarkoni et al., 2011) for key words (Table 1), which yielded volumetric statistical maps. Visual inspection of the maps allowed us to locate the ROIs we had preselected based on a combination of activation peaks, anatomic criteria, and expected locations from the relevant literature. For example, aIPS was selected based on the hotspot for grasping nearest to the intersection of the intraparietal and postcentral sulci (Culham et al., 2003). Spherical ROIs of 15 mm diameter, centered on the peak voxel, were selected for all regions except V1. Because Neurosynth is based on a meta-analysis of published studies, search terms like V1 would be biased toward the typical retinotopic locations used in the literature and likely skewed toward the foveal representation (whereas the objects and hand would have been viewed across a larger expanse within the lower visual field). As such, we defined V1 in the left hemisphere using the Wang et al. (2015) atlas, which mapped retinotopic cortex out to approximately ±15° from the fovea. Table 1 presents an overview of our ROI selection, listing all Neurosynth-extracted ROIs with their peak coordinates, search terms, and download dates. We also share our ROIs (in MNI space) in NIfTI format.
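
For illustration, a 15-mm-diameter sphere around a peak coordinate can be constructed as in the MATLAB sketch below; the peak coordinate shown is a placeholder rather than a value from Table 1.

```matlab
% Define a 15 mm diameter spherical ROI around a peak MNI coordinate on a
% 3 mm voxel grid. The peak coordinate is a placeholder value.
peakMNI = [-40 -40 44];                    % placeholder peak (mm, MNI space)
voxSize = 3;                               % mm, functional voxel size
radius  = 7.5;                             % mm (15 mm diameter)

[dx, dy, dz] = ndgrid(-9:voxSize:9);       % local grid of voxel-center offsets (mm)
inSphere = sqrt(dx.^2 + dy.^2 + dz.^2) <= radius;
roiMNI = [dx(inSphere) + peakMNI(1), ...
          dy(inSphere) + peakMNI(2), ...
          dz(inSphere) + peakMNI(3)];      % MNI coordinates of ROI voxel centers
```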

Representational similarity analysis

The analysis of activation patterns within the selected ROIs was performed using multivoxel pattern analysis, specifically RSA (Kriegeskorte, 2008; Kriegeskorte et al., 2008). An activation pattern corresponded to the set of normalized beta-weight estimates of the BOLD response of all voxels within a specific ROI for a specific condition. To construct RDMs for each ROI, we computed the dissimilarity between activation patterns for each pair of conditions. Dissimilarity was defined as 1 − r, where r was the Pearson correlation coefficient. RDMs were computed separately for the grasp planning and grasp execution phases. These neural RDMs were then correlated with model RDMs (Fig. 1D–F) to test whether neural representations encoded grasp axis, grasp size, and object mass. To estimate the maximum correlation values expected in each region given the between-participant variability, we computed the upper and lower bounds of the noise ceiling. The upper bound of the noise ceiling was computed as the average correlation of each participant's RDMs with the average RDM in each ROI. The lower bound of the noise ceiling was computed by correlating each participant's RDMs with the average of the other participants' RDMs. All correlations were performed between the upper triangular portions of the RDMs, excluding the diagonal. We then used one-tailed Wilcoxon signed rank tests to determine whether these correlations were significantly above zero within each ROI. We set statistical significance at p < 0.05 and applied false discovery rate (FDR) correction for multiple comparisons following Benjamini and Hochberg (1995).
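
The MATLAB sketch below outlines this pipeline on simulated data, with random numbers standing in for the beta estimates. Array sizes, the model RDM, and the Benjamini-Hochberg step are illustrative rather than our exact analysis code.

```matlab
% RSA sketch on simulated data: neural RDMs as 1 - Pearson r between condition
% activation patterns, correlation of RDM upper triangles with a model RDM,
% noise ceiling bounds, a one-tailed Wilcoxon test, and Benjamini-Hochberg FDR.
nCond = 9; nVox = 200; nSubj = 21;                 % illustrative sizes
betas = randn(nCond, nVox, nSubj);                 % stand-in for per-ROI beta estimates
modelRDM = double(bsxfun(@ne, (1:nCond)' <= 3, (1:nCond) <= 3));  % stand-in model RDM

ut = triu(true(nCond), 1);                         % upper triangle, diagonal excluded
rdmVec = zeros(nnz(ut), nSubj);
for s = 1:nSubj
    rdm = 1 - corr(betas(:, :, s)');               % condition-by-condition neural RDM
    rdmVec(:, s) = rdm(ut);
end

rModel = corr(rdmVec, modelRDM(ut));               % each participant vs. the model RDM
rUpper = corr(rdmVec, mean(rdmVec, 2));            % per-participant r with group-average RDM (upper bound)
rLower = zeros(nSubj, 1);                          % noise ceiling, lower bound
for s = 1:nSubj
    others = mean(rdmVec(:, setdiff(1:nSubj, s)), 2);
    rLower(s) = corr(rdmVec(:, s), others);
end

p = signrank(rModel, 0, 'tail', 'right');          % one-tailed Wilcoxon signed rank test

% Benjamini-Hochberg FDR across the p-values from all ROI x model tests
pvals = [p 0.02 0.20 0.04];                        % illustrative p-value vector
[ps, idx] = sort(pvals);
crit = (1:numel(ps)) / numel(ps) * 0.05;           % BH critical values at q = 0.05
sig = false(size(ps));
k = find(ps <= crit, 1, 'last');
if ~isempty(k), sig(1:k) = true; end
sigFDR = false(size(pvals));
sigFDR(idx) = sig;                                 % significance flags, original order
```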

To visualize the representational structure of the neural activity patterns within grasp planning and grasp execution phases, we first averaged RDMs across participants in each ROI and task phase. We then correlated average RDMs across ROIs within each phase and used hierarchical clustering and multidimensional scaling to visualize representational similarities across brain regions. We also correlated average RDMs across ROIs and across planning and execution phases. Statistically significant correlations (p < 0.05 with Bonferroni correction) are also shown as topological connectivity plots (within-phase data) and as a Sankey diagram (between-phase data; see Fig. 3F).
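
A minimal MATLAB sketch of this between-ROI comparison is given below, with random numbers standing in for the group-averaged RDM vectors; the clustering and scaling calls illustrate the approach rather than reproduce our figures.

```matlab
% Compare ROIs by correlating their group-averaged RDM vectors, then visualize
% the structure with hierarchical clustering and classical MDS. The data are
% random stand-ins for the averaged RDM upper triangles (nPairs x nROI).
nROI = 9; nPairs = 36;
avgRdmVec = rand(nPairs, nROI);                    % stand-in for averaged RDMs

R = corr(avgRdmVec);                               % ROI x ROI similarity of RDMs
D = 1 - R;                                         % dissimilarity between ROIs
D = (D + D') / 2;                                  % enforce exact symmetry

Z = linkage(squareform(D, 'tovector'), 'average'); % hierarchical clustering
figure; dendrogram(Z);

Y = cmdscale(D);                                   % classical multidimensional scaling
figure; scatter(Y(:, 1), Y(:, 2), 'filled');       % first two MDS dimensions
```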

Grasp comfort ratings

Grasp comfort ratings were analyzed using simple t tests to assess whether ratings varied across grasp axes, grasp sizes, or object masses. The difference between ratings for each pair of conditions was then used to create grasp comfort RDMs for each participant. Grasp comfort RDMs were correlated with model RDMs to further test how strongly grasp comfort corresponded to grasp axis, grasp size, and object mass. To search for brain regions that might encode grasp comfort, the average grasp comfort RDM was correlated with neural RDMs following the RSA procedure described above.
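
For illustration, the sketch below builds a comfort RDM from one participant's nine ratings, assuming absolute rating differences as the dissimilarity measure, and correlates it with a model RDM; the ratings shown are made-up values.

```matlab
% Grasp comfort RDM from one participant's ratings (absolute rating differences
% assumed as the dissimilarity measure), correlated with a model RDM as above.
ratings = [7 5 8 6 4 7 3 6 5];                     % made-up comfort ratings (1-10)
comfortRDM = abs(ratings(:) - ratings(:)');        % pairwise rating differences

ut = triu(true(numel(ratings)), 1);
rComfort = corr(comfortRDM(ut), modelRDM(ut));     % modelRDM as in the RSA sketch above
```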
