In the field of computer- and robot-assisted minimally invasive surgery, enormous progress has been made in recent years based on the recognition of surgical instruments in endoscopic images and videos. In particular, the determination of the position and type of instruments is of great interest. Current work involves both spatial and temporal information, with the idea that predicting the movement of surgical tools over time may improve the quality of the final segmentations. The provision of publicly available datasets has recently encouraged the development of new methods, mainly based on deep learning. In this review, we identify and characterize datasets used for method development and evaluation and quantify their frequency of use in the literature. We further present an overview of the current state of research regarding the segmentation and tracking of minimally invasive surgical instruments in endoscopic images and videos. The paper focuses on methods that work purely visually, without markers of any kind attached to the instruments, considering both single-frame semantic and instance segmentation approaches, as well as those that incorporate temporal information. The publications analyzed were identified through the platforms Google Scholar, Web of Science, and PubMed. The search terms used were "instrument segmentation", "instrument tracking", "surgical tool segmentation", and "surgical tool tracking", resulting in a total of 741 articles published between 01/2015 and 07/2023, of which 123 were included using systematic selection criteria. A discussion of the reviewed literature is provided, highlighting existing shortcomings and emphasizing the available potential for future developments.
«
In the field of computer- and robot-assisted minimally invasive surgery, enormous progress has been made in recent years based on the recognition of surgical instruments in endoscopic images and videos. In particular, the determination of the position and type of instruments is of great interest. Current work involves both spatial and temporal information, with the idea that predicting the movement of surgical tools over time may improve the quality of the final segmentations. The provision of p...
»