The risk of mistargeting a lesion is one of the main concerns in image fusion. This can happen for two reasons: (1) the lesion is usually small, or (2) adjacent pseudo-lesions (e.g., regenerative nodules in a cirrhotic liver) are mistaken for the lesion. Lesions may also be missed because of their location (subphrenic or subcapsular areas) or poor conspicuity.
It is also challenging to synchronize a static image (pre-operative CT) with a dynamic image (intra-operative US) because the patient’s breathing motion deforms the liver. The liver itself is a dynamic organ: during the breathing cycle its periphery moves more than its core, undergoing translations and rotations. Fewer vessels are present in the periphery, which makes image fusion there more difficult because there are fewer anatomical landmarks. The patient may also be positioned differently during the two imaging procedures.
Contrast-enhanced US (CEUS), i.e., ultrasound performed after injection of a contrast agent, may be combined with image fusion to alleviate some of these limitations, since it offers better lesion conspicuity than B-mode US.
Image fusion aims to determine the spatial correspondence between two image sets by minimizing their difference. Given a static image St(x) and a moving image Mv(x), the fusion algorithm determines an optimal transform Tr(x) that minimizes the difference between St(x) and the transformed Mv(x). Fusion algorithms can be either rigid or non-rigid. In rigid fusion, operations such as rotation and translation are applied uniformly, so all pixel-to-pixel relationships are preserved after the transformation; in non-rigid (also called deformable) fusion, the pixel-to-pixel relationships change while St(x) and Mv(x) are kept aligned in the same reference coordinate system. However, a pixel in one image set may not necessarily represent the same anatomical structure as in the other. Thus, local distortions, whether caused by the organ itself, by neighbouring dynamic organs/tissues, or by patient breathing, are bound to occur.
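As a concrete illustration, the search for Tr(x) can be posed as an optimization problem, Tr* = argmin over Tr of D(St(x), Mv(Tr(x))), where D is a dissimilarity measure. The following is a minimal sketch of a rigid CT–US registration using SimpleITK; the file names and parameter values are illustrative assumptions, not part of the system described here.

```python
import SimpleITK as sitk

# Illustrative file names -- replace with the actual pre-operative CT
# and reconstructed 3D US volumes.
fixed = sitk.ReadImage("preop_ct.nii.gz", sitk.sitkFloat32)     # St(x)
moving = sitk.ReadImage("intraop_us.nii.gz", sitk.sitkFloat32)  # Mv(x)

# Initial rigid transform (rotation + translation), centred on the volumes.
initial = sitk.CenteredTransformInitializer(
    fixed, moving, sitk.Euler3DTransform(),
    sitk.CenteredTransformInitializerFilter.GEOMETRY)

reg = sitk.ImageRegistrationMethod()
# Mutual information copes with the different intensity characteristics of CT and US.
reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
reg.SetInterpolator(sitk.sitkLinear)
reg.SetOptimizerAsGradientDescent(learningRate=1.0, numberOfIterations=200)
reg.SetOptimizerScalesFromPhysicalShift()
reg.SetInitialTransform(initial, inPlace=False)

transform = reg.Execute(fixed, moving)          # optimal rigid Tr(x)
resampled = sitk.Resample(moving, fixed, transform, sitk.sitkLinear, 0.0)
```

In this rigid sketch the same rotation and translation are applied to every voxel; a deformable variant would replace the Euler3DTransform with, for example, a B-spline transform so that pixel-to-pixel relationships can change locally.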
Several challenges complicate fusion accuracy: the need for real-time fusion, tissues located in the abdomen/thorax, and respiratory motion causing tissue deformation. Furthermore, intra- and inter-fractional anatomical variations between the image sets can cause dissimilarities. Plain physiological changes, such as tumour growth, patient weight loss, or bladder filling, can also deform soft tissue. In addition, the image fusion should not depend on the medical instruments visible in the ultrasound images.
Although deformable fusion can manage most of these challenges, it requires considerable computational power and time, which may compromise real-time performance. A deformable fusion has three key functional blocks: the deformation model, the optimization method, and the objective function. The objective function requires a definition of similarity between St(x) and Mv(x), which can be feature-based, intensity-based, or a combination of the two. The choice of similarity metric depends on the desired fusion accuracy, the type of images, and the amplitude of misalignment. Intensity-based objective functions are most suitable for single-modality images, whereas feature-based objective functions require image features defined independently of image intensity. However, constructing such features can be time-consuming and difficult, and it introduces intra- and inter-observer dependencies.
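As an illustration of an intensity-based objective function, the sketch below computes mutual information from the joint intensity histogram of two overlapping image arrays; the bin count and array names are assumptions for the example, not values prescribed by any particular fusion system.

```python
import numpy as np

def mutual_information(fixed, moving, bins=32):
    """Intensity-based similarity between two overlapping image arrays."""
    # Joint histogram of corresponding voxel intensities.
    joint, _, _ = np.histogram2d(fixed.ravel(), moving.ravel(), bins=bins)
    pxy = joint / joint.sum()                 # joint probability p(f, m)
    px = pxy.sum(axis=1, keepdims=True)       # marginal p(f)
    py = pxy.sum(axis=0, keepdims=True)       # marginal p(m)
    nonzero = pxy > 0
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])))
```

A registration driven by this metric would search for the transform Tr(x) that maximizes mutual_information(St, Mv composed with Tr), which is why it suits multi-modality data where intensities are related statistically rather than identically.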
From a medical standpoint, real-time US-CT fusion is certainly challenging because the information in the two modalities originates from different physical processes and properties. US images encode changes in acoustic impedance and contain various artefacts and speckle noise, whereas CT is based on X-ray attenuation. With regard to image quality, the information content of US images is relatively limited because of overlying structures, subcutaneous fat, and gas-containing organs. Additionally, US images are acquired in arbitrary planes.
The fusion outcome may be better visualized using augmented reality (AR), with a substantial reduction in the cost incurred [56]. Although virtual reality is another option, it is not appropriate here since it isolates the clinician from the surroundings. AR can integrate virtual objects with real ones in a real environment. Importantly, since the virtual and real objects are aligned and interact in real time, AR could be particularly advantageous for image-guided systems. Operating on HCCs could thus be made easier by using AR, where the lesion is projected onto the patient’s liver at the exact location and depth. We have provided a possible workflow in Fig. 3, which could be explored further. We believe this solution could improve lesion visualization and indirectly reduce the patient’s exposure to radiation. Furthermore, it could be time- and cost-effective, and much more clinically applicable.
The system overview can further be imagined as shown in Fig. 4. This system aims to fuse pre-operative CT images with intra-operative US images for hepatobiliary procedures in interventional radiology, with complete visualization in augmented reality (AR). As described in Fig. 4, the pre-operative CT images of the patient can be retrieved from the picture archiving and communication system (PACS) server, and US images can be sent from the ultrasound machine to the image fusion system in real time. The image fusion PC can run a set of image processing algorithms (including image segmentation [40,41,42,43] and image registration [57]) and render the processed 3D images via AR, providing enhanced visualization to the clinician.
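A minimal sketch of this data flow is given below. All helper functions (query_ct_from_pacs, receive_us_frame, read_tracker_pose, register, render_to_ar) are hypothetical placeholders standing in for the PACS query, the real-time US stream, the optical tracker, the fusion algorithms, and the AR rendering server; they do not correspond to any specific vendor API.

```python
import numpy as np

# Hypothetical placeholders for the components described in Fig. 4.
def query_ct_from_pacs(patient_id):
    """Stand-in for a DICOM query/retrieve of the pre-operative CT from PACS."""
    return np.zeros((128, 128, 128), dtype=np.float32)

def receive_us_frame():
    """Stand-in for one real-time US frame streamed from the scanner."""
    return np.zeros((256, 256), dtype=np.float32)

def read_tracker_pose():
    """Stand-in for the probe pose reported by the optical tracker."""
    return np.eye(4, dtype=np.float32)

def register(ct_volume, us_frame, probe_pose):
    """Stand-in for the segmentation and registration steps on the fusion PC."""
    return np.eye(4, dtype=np.float32)           # fused CT-to-US transform

def render_to_ar(ct_volume, us_frame, transform):
    """Stand-in for sending the fused scene to the AR rendering server."""
    pass

def fusion_loop(patient_id, n_frames=5):
    ct = query_ct_from_pacs(patient_id)          # pre-operative, loaded once
    for _ in range(n_frames):                    # intra-operative, per frame
        frame = receive_us_frame()
        pose = read_tracker_pose()
        transform = register(ct, frame, pose)
        render_to_ar(ct, frame, transform)

if __name__ == "__main__":
    fusion_loop("PATIENT_0001")
```

The sketch only makes the ordering explicit: the CT is retrieved once before the procedure, while US frames, tracker poses, registration, and AR rendering run repeatedly during it.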
Figure 4 describes the hardware architecture of the entire system, which consists of the following major components. The fusion PC (personal computer) is the core of the system: it communicates with all other hardware components and runs all software components, including querying CT images from the PACS, receiving real-time ultrasound images from the ultrasound scanner, running all image processing algorithms, receiving the scanner position via an optical marker, and running the AR rendering server that renders AR images on the AR device. The PACS server stores the DICOM CT images, whereas the ultrasound scanner acquires real-time ultrasound images of the patient and sends them to the fusion PC. Finally, the HoloLens renders the processed images in augmented reality. In this setup, the optical tracker tracks the position of the scanned body part in 3D space. The proposed system can certainly be helpful if some concerns are properly addressed: (1) most ultrasound machines available in hospitals do not provide a way to export raw data (videos), for example to a computer; (2) latency at the computer in receiving the raw data from the US machine; (3) the accuracy of real-time fusion; (4) privacy in using AR; and (5) adoption of an AR-based system by clinicians who are not technology friendly.