Structured light for touchless 3D registration in video-based surgical navigation

The SL system consists of a line-structured light projector and an arthroscope. The design is illustrated in Fig. 2. The laser projector is instrumented with a visual marker so that it can be tracked by the arthroscopic camera at every frame-time instant. Since this provides the extrinsic calibration between the two SL system components, the arthroscope can go through one portal and the laser beam through the other, and both can move independently, making the setup more versatile than existing methods.

Fig. 2

Proposed SL system setup with a standard arthroscope and laser projector instrumented with a fiducial. The relative pose between the arthroscope and the laser projector is known at every frame-time instant by tracking the laser fiducial

In the current implementation of VBSN, the surgeon rigidly attaches a marker to bone (World Marker, WM) whose pose in camera coordinates can be determined at each time instant by tracking the visual markers [16]. This enables 3D points reconstructed in different frames to be represented in the same WM coordinate system that does not move with respect to the anatomy. As depicted in Fig. 1, the final step of registration aligns the 3D points reconstructed intraoperatively with a 3D model obtained preoperatively. Refer to Fig. 1 in the supplementary material for a schematic representation of the different coordinate systems and their relationships. Our pipeline for generating the 3D model consists of segmenting bone and cartilage structures from an MRI of the patient’s joint, applying the marching cubes algorithm [11], and smoothing the resulting 3D model.
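For concreteness, the preoperative model-building step can be sketched as follows; scikit-image for marching cubes and trimesh for Laplacian smoothing are library choices we assume, since the text only names the marching cubes algorithm [11], and the smoothing parameters are illustrative.

```python
import numpy as np
from skimage import measure
import trimesh

def build_preop_model(label_volume, voxel_spacing, smooth_iters=10):
    """Sketch: MRI segmentation -> marching cubes -> smoothed 3D model."""
    # Extract the isosurface of the binary bone/cartilage segmentation.
    verts, faces, _, _ = measure.marching_cubes(
        label_volume.astype(np.float32), level=0.5, spacing=voxel_spacing)
    mesh = trimesh.Trimesh(vertices=verts, faces=faces)
    # Laplacian smoothing of the extracted surface (iteration count is ours).
    trimesh.smoothing.filter_laplacian(mesh, iterations=smooth_iters)
    return mesh
```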

This section provides a comprehensive overview of our laser scanner and presents an automatic segmentation model for identifying the regions of interest within arthroscopic images (femur bone and cartilage) such that 3D points reconstructed in other anatomical structures are not considered during registration. Lastly, the registration algorithm employed in this work is described.

Laser scanner

The proposed SL system involves three main steps: detection of the laser contour in the arthroscopic image, calibration of the laser projector’s plane of light, and 3D point reconstruction through triangulation. These steps are described below.

Detection of the laser contour

Fig. 3 depicts the main steps of the pipeline for detecting the laser projection. First, lens distortion is removed from the input image, and both the green channel and a grayscale version are extracted from the undistorted result. Subtracting these two images and binarizing the difference yields a binary image that segments the laser projection. Since the laser light exhibits significant dispersion, the projection appears as blobs in the binary image. Blob detection is then performed, and a PCA-based approach fits a contour to the blob by (i) extracting the direction of maximum variance, (ii) sampling the blob along that direction, and (iii) determining, for each sample, the midpoint of the line segment contained in the blob in the perpendicular direction. The resulting contour is depicted in red in the final step of Fig. 3.
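A minimal sketch of this detection pipeline in Python with OpenCV follows; the binarization threshold, largest-blob selection, and sampling density are illustrative assumptions, not values reported here.

```python
import cv2
import numpy as np

def detect_laser_contour(frame_bgr, camera_matrix, dist_coeffs, thresh=40):
    """Undistort, green-minus-gray, binarize, then fit a central line by PCA."""
    und = cv2.undistort(frame_bgr, camera_matrix, dist_coeffs)
    gray = cv2.cvtColor(und, cv2.COLOR_BGR2GRAY)
    green = und[:, :, 1]
    diff = cv2.subtract(green, gray)              # laser stands out in green
    _, binary = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)

    # Blob detection: keep the largest connected component (assumption).
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
    if n < 2:
        return None
    blob_id = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
    ys, xs = np.nonzero(labels == blob_id)
    pts = np.column_stack([xs, ys]).astype(np.float64)

    # PCA: direction of maximum variance of the blob pixels.
    mean = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - mean)
    axis, normal = vt[0], vt[1]

    # Sample along the principal axis; take the perpendicular midpoint.
    t = (pts - mean) @ axis
    s = (pts - mean) @ normal
    contour = []
    for ti in np.linspace(t.min(), t.max(), 200):
        sel = np.abs(t - ti) < 0.5
        if sel.any():
            mid = 0.5 * (s[sel].min() + s[sel].max())
            contour.append(mean + ti * axis + mid * normal)
    return np.asarray(contour)                    # (N, 2) pixel coordinates
```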

Fig. 3

Laser detection pipeline: for every input frame, distortion is removed, and the resulting undistorted image is both converted to grayscale and used to retrieve the green channel. The difference between these two images is binarized and blob detection is performed. The central line of the laser projection is then detected using a PCA-based approach, yielding the detected laser line

Projector calibration

Preoperatively, the laser projector must be calibrated, i.e., the equation of its plane of light in laser coordinates must be estimated. To accomplish this, a setup consisting of a planar target instrumented with visual fiducials is used. By pointing the laser at the target, a line is projected, and a calibration image showing the planar target, the laser projection, and the laser marker simultaneously is acquired. Both the camera and the laser projector are moved to acquire a calibration set with distinct poses. For each calibration image, the laser line is detected as described previously and 3D points are reconstructed in camera coordinates by intersecting backprojection rays with the planar target. These points are then transformed to laser coordinates using the projector’s tracked pose. After all calibration images are processed, a set of 3D lines is obtained, which is given as input to a RANSAC-based [5] plane-fitting algorithm. Figure 2 in the supplementary material illustrates the calibration setup.
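The final fitting stage can be sketched as a generic RANSAC plane fit [5] over the accumulated 3D line points in laser coordinates; the iteration count and inlier tolerance below are placeholders, not calibrated values.

```python
import numpy as np

def fit_plane_ransac(points, n_iters=1000, tol=0.5, seed=None):
    """RANSAC plane fit. Returns (n, d) with the plane n.X + d = 0, |n| = 1."""
    rng = np.random.default_rng(seed)
    best_inliers, best_plane = 0, None
    for _ in range(n_iters):
        # Hypothesize a plane from a minimal sample of three points.
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:                      # degenerate (collinear) sample
            continue
        n /= norm
        d = -n @ p0
        # Score by the number of points within the inlier tolerance.
        inliers = (np.abs(points @ n + d) < tol).sum()
        if inliers > best_inliers:
            best_inliers, best_plane = inliers, (n, d)
    # A least-squares refit on the inlier set would typically follow here.
    return best_plane
```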

3D point reconstruction

The basic principle of a SL system involves projecting a light pattern onto the scene and detecting it with a camera. Intersecting the light plane, transformed to the camera coordinate system, with the camera backprojection ray yields the 3D coordinates of the point (refer to Fig. 3 in the supplementary material for a schematic representation). By performing this process for all points within the detected laser contour, we obtain a 3D contour in camera coordinates. These points can then be transformed to the reference frame of the WM attached to the bone. Repeating this process for a set of frames yields a denser point cloud representing the anatomical structures. In Sect. 5.2, registration tests with data obtained by the proposed algorithm are performed.
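A sketch of this triangulation step, assuming an undistorted pinhole model with intrinsics K and a tracked WM pose (R_wm, t_wm) that maps WM coordinates to camera coordinates:

```python
import numpy as np

def reconstruct_point(pixel, K, plane_n, plane_d):
    """Ray-plane intersection for one detected laser pixel.
    plane_n, plane_d: laser light plane in camera coordinates (n.X + d = 0)."""
    # Backprojection ray through the pixel: X = lam * r, lam > 0.
    r = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    lam = -plane_d / (plane_n @ r)      # solve n.(lam * r) + d = 0
    return lam * r                      # 3D point in camera coordinates

def to_world_marker(X_cam, R_wm, t_wm):
    """Express a camera-frame point in WM coordinates, given the tracked
    WM pose (R_wm, t_wm) such that X_cam = R_wm @ X_wm + t_wm."""
    return R_wm.T @ (X_cam - t_wm)
```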

Arthroscopic video segmentation

As previously discussed, the objective of the proposed pipeline is to register a preoperative femur bone and cartilage model with intraoperative data. However, due to the presence of other anatomical structures such as the proximal tibia, the anterior and posterior cruciate ligaments, and the meniscus, it is impossible to capture arthroscopic footage containing solely the structures of interest. These additional structures are not contained in the preoperative model and, hence, should be removed from the reconstructed point set obtained with the structured light system. Therefore, we propose an automatic segmentation model designed for arthroscopic videos. The considered architecture is a standard U-Net [18] and is depicted in the supplementary material (Fig. 4). The loss function used is \(1-\text{DICE}(T,P)\), where \(\text{DICE}(T,P)\) is the DICE score [20] defined as

$$\text{DICE}(T,P)=\frac{2\sum _{i=1}^{N} T_i P_i}{\sum _{i=1}^{N} T_i + \sum _{i=1}^{N} P_i}$$

(1)

where T is the ground-truth segmentation, P is the inferred segmentation, i is a pixel, and N is the number of pixels. The DICE score is one when the inferred and ground-truth segmentations overlap perfectly, and zero when there is no overlap.
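For illustration, Eq. (1) can be written as a differentiable training loss; PyTorch is an assumed framework choice, and the small epsilon guarding against empty masks is ours:

```python
import torch

def dice_loss(pred, target, eps=1e-6):
    """1 - DICE, as in Eq. (1). pred: sigmoid probabilities, target: {0,1}
    masks, both of shape (batch, 1, H, W)."""
    pred = pred.flatten(1)
    target = target.flatten(1)
    inter = (pred * target).sum(dim=1)
    dice = 2.0 * inter / (pred.sum(dim=1) + target.sum(dim=1) + eps)
    return (1.0 - dice).mean()
```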

All input images are normalized using contrast-limited adaptive histogram equalization (CLAHE), as outlined in [14], to enhance image contrast. Furthermore, standard data augmentation, including image rotation, translation, scaling, and flipping, is applied. This step is important because the dataset is limited in size and data augmentation is an effective strategy for artificially increasing the number of training samples [13].
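A sketch of the CLAHE step with OpenCV; applying it to the luminance channel of the LAB color space, as well as the clip limit and tile size, are our assumptions rather than settings reported in [14]:

```python
import cv2

def clahe_normalize(image_bgr):
    """Contrast-limited adaptive histogram equalization on luminance."""
    lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    lab[:, :, 0] = clahe.apply(lab[:, :, 0])   # equalize the L channel only
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
```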

Two segmentation models, the no laser augmentation (NLA) model and the laser augmentation (LA) model, were trained following identical procedures except for the training datasets. The NLA model was trained solely using the dataset described in Sect. 4.1.1, while the LA model was trained by also considering images with synthetic laser projection. This novel data augmentation technique will be described below.

Data augmentation using synthetic laser projection

For each image in the training dataset, a new image containing a synthetic laser projection is generated as follows. Considering the corresponding ground-truth binary mask, a random pixel within the region of interest is initially selected. Using the registration and the camera pose relative to the image, the backprojection ray through the pixel is intersected with the 3D model, yielding a 3D point. Then, a random 3D vector is chosen that, together with the 3D point, defines a 3D plane. This plane corresponds to the synthetic laser plane of light and is then intersected with the 3D model, yielding a 3D contour. By projecting this contour onto the image, a synthetic laser projection is generated. Finally, laser light dispersion is simulated by applying a Gaussian intensity distribution centered on the projected 2D contour. Figure 4 depicts different synthetic laser projections obtained with the described approach.
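The last rendering step can be sketched as below; the ray/mesh intersections that produce the projected 2D contour are omitted, and the Gaussian width and laser color are illustrative assumptions:

```python
import cv2
import numpy as np

def render_synthetic_laser(image_bgr, contour_2d, sigma=2.0,
                           color=(80, 255, 80)):
    """Paint a projected laser contour with a Gaussian intensity falloff."""
    h, w = image_bgr.shape[:2]
    mask = np.zeros((h, w), np.float32)
    # Rasterize the projected 2D contour into a sparse mask.
    for u, v in np.round(contour_2d).astype(int):
        if 0 <= v < h and 0 <= u < w:
            mask[v, u] = 1.0
    # Gaussian falloff around the contour simulates laser dispersion.
    mask = cv2.GaussianBlur(mask, (0, 0), sigma)
    mask /= max(mask.max(), 1e-6)
    # Alpha-blend the (assumed greenish) laser color over the image.
    out = image_bgr.astype(np.float32) * (1.0 - mask[..., None]) \
        + np.float32(color) * mask[..., None]
    return out.astype(np.uint8)
```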

Fig. 4

Example images of the synthetic laser projections. We propose a method that takes as input arthroscopic images without any laser projection and outputs the same image with a realistic projection of the laser. These generated images are used to train the semantic segmentation model

Registration

Following the acquisition of arthroscopic footage with laser projection using the pipeline detailed in Sect. 3.1, and after filtering outlier 3D points with one of the segmentation models from Sect. 3.2, we obtain a dense point cloud representing the anatomical structures. The next step is registration using the method presented in [15] for curve-surface registration. It is a global registration method that automatically finds pairs of matching points on the curve and on the surface, along with their tangents and normals, to estimate the rigid transformation between the preoperative model and the patient’s anatomy. For each pair of curve points with associated tangents, the algorithm finds all matching pairs of points with associated normals on the surface using a set of conditions that depend on the differential information (tangents and normals). Then, a hypothesize-and-test framework finds the rigid transformation that best aligns the reconstructed curve with the preoperative surface. A final standard ICP step [2] refines the solution.
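The refinement step can be sketched with Open3D’s point-to-point ICP (a library choice we assume; the text only cites standard ICP [2]), seeded with the transformation returned by the hypothesize-and-test stage:

```python
import open3d as o3d

def refine_with_icp(source, target, init_T, max_corr_dist=2.0):
    """Point-to-point ICP refinement of the global registration estimate.
    source: reconstructed laser point cloud; target: preoperative model
    points; init_T: 4x4 transform from the curve-surface matching [15]."""
    result = o3d.pipelines.registration.registration_icp(
        source, target, max_corr_dist, init_T,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation
```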
