Computer vision in colorectal surgery: Current status and future challenges

As mainstream surgical approaches have shifted from open to endoscopic surgery, such as laparoscopic and robot-assisted surgery, patients seem to have benefited immensely. The most obvious advantage of endoscopic surgery is the avoidance of large open wounds or incisions. For surgeons, however, the greatest advantage of laparoscopic surgery is its educational benefit. In open abdominal surgery, especially deep pelvic surgery, assistants may struggle to visualize the surgical field even while participating in the surgery; only the primary surgeon (and occasionally the first assistant) can see it. Moreover, on several occasions even the primary surgeon must rely solely on fingertip sensation to proceed with the surgery. Open abdominal surgery can be likened to a “black box,” in which much cannot be fully understood without actually operating on the patient. Even when a ceiling-mounted camera in the operating room (OR) is used to capture the surgical field, the video often shows only the backs of the surgeon's hands and head rather than the areas that actually need to be observed. This makes it difficult to capture good-quality images that effectively convey the subtleties of open abdominal surgery.

During laparoscopic surgery, the primary surgeon's view is projected through the scope, allowing the visual information to be shared. As a result, surgery is no longer conducted blindly, and visual information is no longer exclusive to the primary surgeon: assistants and those not directly involved in the surgery can access the same images equally. Furthermore, the importance of fingertip sensation during surgery, once reserved exclusively for the primary surgeon, has diminished considerably; today, the entire surgical procedure relies on visual information. These changes make it possible to learn from anywhere, at any time, simply by accessing intraoperative videos, without participating as an assistant. Surgeons who were initially unaware of specific procedures and locations, and of the distinction between good and poor technique, have realized that repeatedly viewing a large number of high-quality intraoperative videos broadens their knowledge.

Artificial intelligence (AI) is increasingly used in the medical field. With deep-learning-based computer vision, the videos that form the basis of image analysis are easily accessible. Thus, mirroring the self-learning process by which surgeons come to understand surgery through repeatedly watching intraoperative videos, efforts have been made to build AI models for surgery by feeding them vast amounts of information extracted from intraoperative videos. However, whether AI's understanding of surgery improves sufficiently to become useful in daily surgical practice remains unclear. Therefore, this review discusses the current status and future challenges of using AI in the field of surgery, focusing on laparoscopic colorectal surgery.

In this review, “AI” refers specifically to “deep-learning-based computer vision,” although the term encompasses various concepts. Fig. 1 shows the typical AI algorithms used for image-recognition tasks: image classification determines the meaning of an image as a whole; object detection identifies and locates objects within an image, enclosing each in a bounding box; semantic segmentation classifies the meaning of each pixel in the image; and instance segmentation detects the class and location of each object, encloses it in a bounding box, and then identifies the pixels belonging to that object within the box.
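As a rough illustration, the outputs of the four tasks can be sketched as data structures (a toy example; all labels, coordinates, and scores are invented for illustration, not taken from any actual system):

```python
# Hypothetical outputs of the four image-recognition tasks for one
# laparoscopic frame (all values invented for illustration).

# Image classification: a single label for the whole frame.
classification = {"label": "dissection_phase", "score": 0.93}

# Object detection: a class and bounding box (x1, y1, x2, y2) per object.
detection = [
    {"label": "forceps", "box": (120, 80, 260, 190), "score": 0.88},
    {"label": "scissors", "box": (300, 150, 420, 280), "score": 0.81},
]

# Semantic segmentation: a class id for every pixel
# (toy 4x4 mask; 0 = background, 1 = instrument).
semantic_mask = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

# Instance segmentation: a bounding box plus a per-pixel mask for each
# individual object, so two instruments of the same class stay separate.
instances = [
    {"label": "forceps", "box": (0, 0, 1, 1), "mask": [[1, 1], [1, 1]]},
    {"label": "forceps", "box": (2, 2, 3, 3), "mask": [[1, 1], [1, 0]]},
]
```

The key difference is in the granularity of the output: one label per frame, a box per object, a class per pixel, or a mask per individual object.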

Each AI technology introduced in the subsequent sections uses one of the four algorithms described above. In the field of surgery, AI technologies do not require the development of special surgical algorithms. Instead, it is important to fully understand the characteristics of existing AI technologies and utilize them effectively to develop the most appropriate application for the desired objective.

Each surgery consists of a series of individual procedures. A smooth, continuous surgery accomplishes these tasks step by step, without interruption or stagnation, and can be divided into defined process units known as surgical phases or steps. A trained surgeon can easily predict the surgical phase or step simply by glancing at a single static image extracted from an intraoperative video. Image classification using AI likewise predicts what is shown in a single image; this technology can therefore be applied to automatically recognize the surgical phase or step. By capturing the series of recognition results on a time axis, we can automatically calculate the duration of each surgical phase or step and the frequency of transitions between them.1, 2, 3, 4
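Collapsing per-frame classification results along the time axis into phase segments and transition counts could look like the following minimal sketch (the phase names and frame rate are hypothetical):

```python
from itertools import groupby

def summarize_phases(frame_labels, fps=1.0):
    """Collapse per-frame phase predictions into (phase, duration_s)
    segments and count phase transitions (illustrative only)."""
    segments = [(phase, sum(1 for _ in run) / fps)
                for phase, run in groupby(frame_labels)]
    transitions = max(len(segments) - 1, 0)
    return segments, transitions

# Ten frames of predictions at 1 frame/s (hypothetical phase names).
labels = ["mobilization"] * 5 + ["vessel_ligation"] * 3 + ["mobilization"] * 2
segments, transitions = summarize_phases(labels, fps=1.0)
# segments -> [("mobilization", 5.0), ("vessel_ligation", 3.0), ("mobilization", 2.0)]
# transitions -> 2
```

Note that the same phase appearing twice produces two segments, which is exactly what makes transition counts usable as an efficiency parameter.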

Monitoring the progress of all ORs is important for efficient management and logistics. For ORs that are ahead of schedule, early preparations must be made to admit the next surgical patient. Conversely, for ORs that are behind schedule, it may be necessary to investigate the cause of the delay and implement appropriate measures, such as assigning additional OR staff. It is time-consuming and labor-intensive for OR staff to visit all ORs, personally check the situation, and obtain information firsthand. An AI-based automatic surgical phase or step recognition system makes it possible to automatically extract real-time information from all ORs, such as whether each surgical phase or step is completed within the standard time, whether phase or step transitions are too frequent, and what is causing delays from the standard progress. Real-time sharing of essential deviations from the standard by the entire OR team will greatly contribute to risk management. Because the number of phase or step transitions reflects surgical efficiency, it can also serve as a parameter for an automatic surgical skill assessment system that differentiates expert from novice surgeons. In the future, as various software systems applying AI technology to endoscopic surgery are developed, such as surgical navigation systems, surgical phase or step recognition is expected to be a necessary fundamental technology for controlling the activation timing of each system.
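A deviation check of this kind might be sketched as follows, assuming hypothetical standard phase times and a tolerance threshold chosen purely for illustration:

```python
# Hypothetical standard phase durations in minutes (invented values).
STANDARD_MIN = {"mobilization": 40, "vessel_ligation": 20, "anastomosis": 30}

def flag_deviations(observed_min, tolerance=1.2):
    """Return phases whose observed duration exceeds the standard time
    by more than `tolerance` (1.2 = 20% over schedule)."""
    return [phase for phase, minutes in observed_min.items()
            if minutes > STANDARD_MIN.get(phase, float("inf")) * tolerance]

flag_deviations({"mobilization": 55, "vessel_ligation": 18})
# -> ["mobilization"]  (55 min > 40 min * 1.2)
```

In a real OR dashboard, the observed durations would come from the phase recognition system itself rather than manual entry.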

Recognition of surgical instruments is a fundamental technology used in various systems. For surgical phase or step recognition, the image classification algorithm assigns labels to each static image, while the object detection algorithm is suitable for simultaneously recognizing the type and rough location of surgical instruments.5 Furthermore, for more advanced requirements, such as pinpointing the position of the tip of a surgical instrument or the angle of its shaft using the recognition results, the optimal choice is the semantic segmentation algorithm, which achieves pixel-level recognition.6 However, during a typical laparoscopic colorectal surgery, four instruments are displayed on the monitor: two instruments of the primary surgeon and two instruments of the assistant. Depending on the type of surgery, multiple surgical instruments of the same type may overlap. In semantic segmentation, it is impossible to distinguish and recognize overlapping areas of the same class of objects as individual objects; thus, instance segmentation is employed in such cases.7
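The limitation of semantic segmentation described above can be demonstrated with a toy binary mask: when two instruments of the same class touch, a class-level mask fuses them into one blob, whereas per-instance masks keep them apart. Here, connected-component counting stands in for "how many individual objects can be recovered" (a sketch; real masks are produced by the segmentation network):

```python
from collections import deque

def connected_components(mask):
    """Count 4-connected foreground regions in a binary mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                count += 1                       # new region found
                queue = deque([(y, x)])
                seen[y][x] = True
                while queue:                     # flood-fill the region
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx),
                                   (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count

# Two same-class instrument shafts that touch: the semantic (class-level)
# mask yields a single region, losing the individual objects...
semantic = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
# ...while instance segmentation supplies one mask per instrument.
instance_masks = [
    [[1, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]],
    [[0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 1, 1]],
]
```

Running `connected_components` on `semantic` finds one region, while the two `instance_masks` together yield two, which is why overlapping same-class instruments call for instance segmentation.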

Surgical instrument recognition can also help ensure safety during surgery because most intraoperative complications, such as organ injury, tend to occur around the surgical instruments used to perform dissection and resection. For example, information on the type and location of surgical instruments is crucial for developing a system that alerts surgeons when they dissect or resect the wrong layer or a critical area. It could also serve as a parameter for automatic surgical skill assessment by tracking the trajectories of instrument use by expert and novice surgeons and analyzing the differences in their tendencies. Furthermore, autonomous control of the laparoscope during surgery requires an algorithm that recognizes the main instrument used by the primary surgeon and follows its tip.
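As one illustrative trajectory-based parameter, the total travel distance of an instrument tip can be computed from per-frame tip coordinates (a deliberately simple sketch; real assessment systems combine many such features):

```python
import math

def path_length(tip_positions):
    """Total travel distance of an instrument tip over a sequence of
    per-frame (x, y) coordinates; shorter, smoother paths are one
    commonly cited marker of efficient instrument handling."""
    return sum(math.dist(a, b)
               for a, b in zip(tip_positions, tip_positions[1:]))

path_length([(0, 0), (3, 4), (3, 4)])  # -> 5.0 (one 3-4-5 move, then idle)
```

The tip coordinates themselves would come from the segmentation-based instrument recognition described above.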

In endoscopic surgery, even when the images on the monitor are identical, the amount of information that can be extracted differs considerably between expert and novice surgeons. Although they appear to be observing the same image, they perceive different views. Expert surgeons can recognize objects that require location verification during surgery from cues such as the morphology of the surrounding tissues, the course of microvessels, and subtle variations in color tone; novice surgeons, whose eyes are not yet accustomed to such information, may find this difficult. Furthermore, during surgery, especially in tense situations, attention tends to focus only on the region of interest, narrowing the field of vision and reducing awareness of the area outside it. Consequently, important anatomical structures and organs outside the region of interest may go unrecognized and be unintentionally injured. It is believed that most complications that occur during surgery are not technical errors resulting from a lack of skill but rather cognitive errors based on misrecognition or misunderstanding.8

If an AI that has learned the image features of the target anatomical structures and organs from numerous intraoperative videos can present the location of the recognition objects during surgery in any situation, it may help prevent intraoperative organ injury caused by cognitive errors. Moreover, because the location of particular anatomical structures and organs is often used during surgery as a reference to estimate the location of others, presenting the location of the recognition objects may also provide an important landmark for determining the appropriate intraoperative incision line or dissection layer9, 10, 11, 12, 13, 14 (Fig. 2). Therefore, AI-based recognition of surgical anatomy may provide clinical benefit as an intraoperative navigation system.

However, human anatomy is highly variable. The amount of visceral fat and adhesions associated with a surgical history can cause significant differences in the appearance of intra-abdominal surgical fields, and the redness of the surgical field caused by intraoperative bleeding also hinders AI recognition. Surgical AI systems intended for commercialization must therefore have robust recognition performance that can cope with various situations and individual differences. Excessive false positives not only induce visual stress for surgeons but may also mislead them toward wrong incision directions or dissection layers. In addition, high real-time performance is required for use as an intraoperative navigation system. Recognition, generalization, and real-time performance must be enhanced simultaneously, which is a challenge for commercialization and widespread use in daily surgical practice. If these challenges are addressed, surgical AI systems can quickly be put to practical use.

Tactile information is often used in open surgery, whereas laparoscopic surgery depends heavily on visual information; the affinity for image-based navigation is therefore higher in laparoscopic surgery than in open surgery. Because expert surgeons can determine where to incise based on visual information alone, AI should be able to do the same through image analysis, making it a reliable tool for novice surgeons who cannot yet determine the incision location.

In gastrointestinal surgeries, such as colorectal surgery, when the tissue is grasped and countertraction is applied with appropriate strength, a white, cotton-like sparse connective tissue known as “angel hair” appears in the area to be dissected. To avoid injuring vital anatomical structures and organs, the surgery proceeds by dissecting only this surrounding angel hair. Angel hair can therefore be considered the object to be dissected, and its automatic recognition can be applied as an intraoperative navigation system15,16 (Fig. 3).

In addition, once AI can identify the incision location, automation and autonomy in surgery may be achieved by moving the scalpel to the specified position coordinates. Achieving full autonomy in surgery raises several legal and ethical issues; even so, as the example of automated driving shows, partial automation, such as assisting with steering and maintaining a set distance between cars, has already been achieved and is widely used in daily life. Partial automation can reduce fatigue and stress for surgeons and may eventually benefit patients by improving the quality of surgery and reducing the incidence of postoperative complications.

The transition to laparoscopic surgery has facilitated the storage of high-quality intraoperative videos, enabling easier retrospective surgical skill assessment and feedback, both of which are important for surgical education. For the rater, however, the assessment process, which involves watching several hours of intraoperative video from start to finish, is time-consuming and labor-intensive.

Although the development of an AI-based automatic surgical skill assessment system is underway,17, 18, 19, 20 the first step toward its realization is to objectively and clearly define and verbalize surgical skills.21 For software development, various AI-based image recognition approaches such as surgical phase or step recognition and surgical instrument recognition can be used as fundamental technologies. A three-dimensional convolutional neural network (3D-CNN) algorithm that analyzes videos rather than static images may be suitable for surgical skill assessment. While general CNNs are used for analyzing static images, a 3D-CNN is an algorithm that performs convolutional computation on the time axis and the vertical and horizontal axes of an image, enabling learning that considers time variation and obtains feature representations from videos.22
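The core operation can be sketched in plain Python: a “valid” 3D convolution slides a kernel over the time axis as well as the two spatial axes, so a purely temporal kernel responds to change between frames (a minimal single-channel sketch, not an efficient or complete implementation):

```python
def conv3d_valid(video, kernel):
    """'Valid' 3-D convolution (cross-correlation) over a single-channel
    clip shaped (T, H, W) -- the operation that lets a 3D-CNN learn
    from the time axis as well as the spatial axes."""
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    T, H, W = len(video), len(video[0]), len(video[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                # Sum over the temporal and both spatial kernel axes.
                s = sum(video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                        for dt in range(kt)
                        for dy in range(kh)
                        for dx in range(kw))
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out

# A temporal-difference kernel (kt=2, kh=1, kw=1) responds to change
# between consecutive frames.
clip = [[[0, 0], [0, 0]],      # frame 0: dark
        [[1, 1], [1, 1]]]      # frame 1: bright
kernel = [[[-1]], [[1]]]
conv3d_valid(clip, kernel)  # -> [[[1, 1], [1, 1]]]
```

In an actual 3D-CNN, many such kernels are learned from data and stacked with nonlinearities; libraries implement the same sliding-window computation over batched, multi-channel tensors.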

Automatic surgical skill assessment remains a developing research field, and skill assessment is a demanding task even for human raters, requiring manpower, time, and effort. Therefore, even if perfect scoring is impossible, a system that reliably screens out the clearly low- and high-skill groups would be highly useful.
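Such a screening step might be sketched as a simple triage on an automatic score, with cut-offs chosen purely for illustration on a hypothetical 1-5 scale:

```python
def triage_skill(score, low_cut=2.0, high_cut=4.0):
    """Confidently flag clearly low- and high-skill cases and defer the
    ambiguous middle band to human raters (hypothetical 1-5 scale;
    cut-offs are illustrative, not validated values)."""
    if score <= low_cut:
        return "low"
    if score >= high_cut:
        return "high"
    return "needs_human_review"
```

Even this coarse triage would spare human raters from reviewing the clear-cut cases, concentrating their effort on the ambiguous middle band.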
