
  • Open access
  • Published: 22 January 2024

Segment anything in medical images

  • Jun Ma 1,2,3,
  • Yuting He 4,
  • Feifei Li (ORCID: orcid.org/0000-0002-4004-4134) 1,
  • Lin Han 5,
  • Chenyu You (ORCID: orcid.org/0000-0001-8365-7822) 6 &
  • Bo Wang (ORCID: orcid.org/0000-0002-9620-3413) 1,2,3,7,8

Nature Communications volume 15, Article number: 654 (2024)


  • Computer science
  • Machine learning
  • Medical imaging

Medical image segmentation is a critical component in clinical practice, facilitating accurate diagnosis, treatment planning, and disease monitoring. However, existing methods, often tailored to specific modalities or disease types, lack generalizability across the diverse spectrum of medical image segmentation tasks. Here we present MedSAM, a foundation model designed to bridge this gap by enabling universal medical image segmentation. The model is developed on a large-scale medical image dataset with 1,570,263 image-mask pairs, covering 10 imaging modalities and over 30 cancer types. We conduct a comprehensive evaluation on 86 internal validation tasks and 60 external validation tasks, demonstrating better accuracy and robustness than modality-wise specialist models. By delivering accurate and efficient segmentation across a wide spectrum of tasks, MedSAM holds significant potential to expedite the evolution of diagnostic tools and the personalization of treatment plans.


Introduction

Segmentation is a fundamental task in medical imaging analysis, which involves identifying and delineating regions of interest (ROI) in various medical images, such as organs, lesions, and tissues 1 . Accurate segmentation is essential for many clinical applications, including disease diagnosis, treatment planning, and monitoring of disease progression 2 , 3 . Manual segmentation has long been the gold standard for delineating anatomical structures and pathological regions, but this process is time-consuming, labor-intensive, and often requires a high degree of expertise. Semi- or fully automatic segmentation methods can significantly reduce the time and labor required, increase consistency, and enable the analysis of large-scale datasets 4 .

Deep learning-based models have shown great promise in medical image segmentation due to their ability to learn intricate image features and deliver accurate segmentation results across a diverse range of tasks, from segmenting specific anatomical structures to identifying pathological regions 5 . However, a significant limitation of many current medical image segmentation models is their task-specific nature. These models are typically designed and trained for a specific segmentation task, and their performance can degrade significantly when applied to new tasks or different types of imaging data 6 . This lack of generality poses a substantial obstacle to the wider application of these models in clinical practice. In contrast, recent advances in the field of natural image segmentation have witnessed the emergence of segmentation foundation models, such as segment anything model (SAM) 7 and Segment Everything Everywhere with Multi-modal prompts all at once 8 , showcasing remarkable versatility and performance across various segmentation tasks.

There is a growing demand for universal models in medical image segmentation: models that can be trained once and then applied to a wide range of segmentation tasks. Such models would not only exhibit heightened versatility in terms of model capacity but also potentially lead to more consistent results across different tasks. However, the applicability of the segmentation foundation models (e.g., SAM 7 ) to medical image segmentation remains limited due to the significant differences between natural images and medical images. Essentially, SAM is a promptable segmentation method that requires points or bounding boxes to specify the segmentation targets. This resembles conventional interactive segmentation methods 4 , 9 , 10 , 11 but SAM has better generalization ability, while existing deep learning-based interactive segmentation methods focus mainly on limited tasks and image modalities.

Many studies have applied the out-of-the-box SAM models to typical medical image segmentation tasks 12 , 13 , 14 , 15 , 16 , 17 and other challenging scenarios 18 , 19 , 20 , 21 . For example, the concurrent studies 22 , 23 conducted a comprehensive assessment of SAM across a diverse array of medical images, underscoring that SAM achieved satisfactory segmentation outcomes primarily on targets characterized by distinct boundaries. However, the model exhibited substantial limitations in segmenting typical medical targets with weak boundaries or low contrast. In congruence with these observations, we further introduce MedSAM, a refined foundation model that significantly enhances the segmentation performance of SAM on medical images. MedSAM accomplishes this by fine-tuning SAM on an unprecedented dataset with more than one million medical image-mask pairs.

We thoroughly evaluate MedSAM through comprehensive experiments on 86 internal validation tasks and 60 external validation tasks, spanning a variety of anatomical structures, pathological conditions, and medical imaging modalities. Experimental results demonstrate that MedSAM consistently outperforms the state-of-the-art (SOTA) segmentation foundation model 7 , while achieving performance on par with, or even surpassing specialist models 1 , 24 that were trained on the images from the same modality. These results highlight the potential of MedSAM as a new paradigm for versatile medical image segmentation.

Results

MedSAM: a foundation model for promptable medical image segmentation

MedSAM aims to fulfill the role of a foundation model for universal medical image segmentation. A crucial aspect of constructing such a model is the capacity to accommodate a wide range of variations in imaging conditions, anatomical structures, and pathological conditions. To address this challenge, we curated a diverse and large-scale medical image segmentation dataset with 1,570,263 medical image-mask pairs, covering 10 imaging modalities, over 30 cancer types, and a multitude of imaging protocols (Fig.  1 and Supplementary Tables  1 – 4) . This large-scale dataset allows MedSAM to learn a rich representation of medical images, capturing a broad spectrum of anatomies and lesions across different modalities. Figure  2 a provides an overview of the distribution of images across different medical imaging modalities in the dataset, ranked by their total numbers. It is evident that computed tomography (CT), magnetic resonance imaging (MRI), and endoscopy are the dominant modalities, reflecting their ubiquity in clinical practice. CT and MRI images provide detailed cross-sectional views of 3D body structures, making them indispensable for non-invasive diagnostic imaging. Endoscopy, albeit more invasive, enables direct visual inspection of organ interiors, proving invaluable for diagnosing gastrointestinal and urological conditions. Despite the prevalence of these modalities, others such as ultrasound, pathology, fundus, dermoscopy, mammography, and optical coherence tomography (OCT) also hold significant roles in clinical practice. The diversity of these modalities and their corresponding segmentation targets underscores the necessity for universal and effective segmentation models capable of handling the unique characteristics associated with each modality.

Figure 1. The dataset covers a variety of anatomical structures, pathological conditions, and medical imaging modalities. The magenta contours and mask overlays denote the expert annotations and MedSAM segmentation results, respectively.

Figure 2. a The number of medical image-mask pairs in each modality. b MedSAM is a promptable segmentation method where users can use bounding boxes to specify the segmentation targets. Source data are provided as a Source Data file.

Another critical consideration is the selection of the appropriate segmentation prompt and network architecture. While the concept of fully automatic segmentation foundation models is enticing, it is fraught with challenges that make it impractical. One of the primary challenges is the variability inherent in segmentation tasks. For example, given a liver cancer CT image, the segmentation task can vary depending on the specific clinical scenario. One clinician might be interested in segmenting the liver tumor, while another might need to segment the entire liver and surrounding organs. Additionally, the variability in imaging modalities presents another challenge. Modalities such as CT and MR generate 3D images, whereas others like X-ray and ultrasound yield 2D images. These variabilities in task definition and imaging modalities complicate the design of a fully automatic model capable of accurately anticipating and addressing the diverse requirements of different users.

Considering these challenges, we argue that a more practical approach is to develop a promptable 2D segmentation model. The model can be easily adapted to specific tasks based on user-provided prompts, offering enhanced flexibility and adaptability. It can also handle both 2D and 3D images by processing 3D images as a series of 2D slices. Typical user prompts include points and bounding boxes, and we show segmentation examples with the different prompts in Supplementary Fig. 1. Bounding boxes provide an unambiguous spatial context for the region of interest, enabling the algorithm to more precisely discern the target area. This stands in contrast to point-based prompts, which can introduce ambiguity, particularly when proximate structures resemble each other. Moreover, drawing a bounding box is efficient, especially in scenarios involving multi-object segmentation. We follow the network architecture in SAM 7, including an image encoder, a prompt encoder, and a mask decoder (Fig. 2b). The image encoder 25 maps the input image into a high-dimensional image embedding space. The prompt encoder transforms the user-drawn bounding boxes into feature representations via positional encoding 26. Finally, the mask decoder fuses the image embedding and prompt features using cross-attention 27 (Methods).
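To make the box-prompted workflow concrete, the sketch below runs inference with the open-source `segment_anything` package from Meta AI. The checkpoint filename and the random stand-in image are assumptions for illustration; the MedSAM repository ships its own inference script with model-specific preprocessing, so treat this as a minimal approximation of the workflow, not the authors' exact code.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a ViT-B SAM-style model; "medsam_vit_b.pth" is a hypothetical
# checkpoint path, not a file named in this paper.
sam = sam_model_registry["vit_b"](checkpoint="medsam_vit_b.pth")
predictor = SamPredictor(sam)

# Stand-in for a preprocessed 1024 x 1024 x 3 medical image slice.
image = np.random.randint(0, 255, (1024, 1024, 3), dtype=np.uint8)
predictor.set_image(image)  # the image embedding is computed once here

# User-drawn bounding box prompt: (x_min, y_min, x_max, y_max).
box = np.array([200, 150, 600, 500])
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
print(masks.shape, scores)  # (1, 1024, 1024) binary mask and its confidence
```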

Quantitative and qualitative analysis

We evaluated MedSAM through both internal validation and external validation. Specifically, we compared it to the SOTA segmentation foundation model SAM 7 as well as modality-wise specialist U-Net 1 and DeepLabV3+ 24 models. Each specialist model was trained on images from the corresponding modality, resulting in 10 dedicated specialist models for each method. During inference, these specialist models were used to segment the images from corresponding modalities, while SAM and MedSAM were employed for segmenting images across all modalities (Methods). The internal validation contained 86 segmentation tasks (Supplementary Tables 5–8 and Fig. 2), and Fig. 3a shows the median dice similarity coefficient (DSC) score of these tasks for the four methods. Overall, SAM obtained the lowest performance on most segmentation tasks, although it performed promisingly on some RGB image segmentation tasks, such as polyp segmentation in endoscopy images (DSC: 91.3%, interquartile range (IQR): 81.2–95.1%). This could be attributed to SAM's training on a variety of RGB images, and the fact that many targets in these images are relatively straightforward to segment due to their distinct appearances. The other three models outperformed SAM by a large margin, and MedSAM had a narrower distribution of DSC scores across the 86 internal validation tasks than the two groups of specialist models, reflecting the robustness of MedSAM across different tasks. We further connected the DSC scores corresponding to the same task of the four models with a podium plot (Fig. 3b), which is complementary to the box plot. In the upper part, each colored dot denotes the median DSC achieved with the respective method on one task. Dots corresponding to identical tasks are connected by a line. In the lower part, the frequency of achieved ranks for each method is presented with bar charts. MedSAM ranked first on most tasks, surpassing the U-Net and DeepLabV3+ specialist models, which most frequently ranked second and third, respectively. In contrast, SAM ranked last on almost all tasks. Figure 3c (and Supplementary Fig. 9) visualizes some randomly selected segmentation examples where MedSAM obtained a median DSC score, including liver tumor in CT images, brain tumor in MR images, breast tumor in ultrasound images, and polyp in endoscopy images. SAM struggles with targets that have weak boundaries and is prone to under- or over-segmentation errors. In contrast, MedSAM can accurately segment a wide range of targets across various imaging conditions, achieving results comparable to or even better than those of the specialist U-Net and DeepLabV3+ models.

Figure 3. a Performance distribution of 86 internal validation tasks in terms of median dice similarity coefficient (DSC) score. The center line within the box represents the median value, with the bottom and top bounds of the box delineating the 25th and 75th percentiles, respectively. Whiskers extend to 1.5× the interquartile range. Up-triangles denote the minima and down-triangles denote the maxima. b Podium plots for visualizing the performance correspondence of 86 internal validation tasks. Upper part: each colored dot denotes the median DSC achieved with the respective method on one task. Dots corresponding to identical tasks are connected by a line. Lower part: bar charts represent the frequency of achieved ranks for each method. MedSAM ranks in the first place on most tasks. c Visualized segmentation examples on the internal validation set. The four examples are liver cancer, brain cancer, breast cancer, and polyp in computed tomography (CT), magnetic resonance imaging (MRI), ultrasound, and endoscopy images, respectively. Blue: bounding box prompts; yellow: segmentation results; magenta: expert annotations. Source data are provided as a Source Data file.

The external validation included 60 segmentation tasks, all of which either were from new datasets or involved unseen segmentation targets (Supplementary Tables 9–11 and Figs. 10–12). Figure 4a, b show the task-wise median DSC score distribution and their correspondence across the 60 tasks, respectively. Although SAM continued to exhibit lower performance on most CT and MR segmentation tasks, the specialist models no longer consistently outperformed SAM (e.g., right kidney segmentation in MR T1-weighted images: 90.1%, 85.3%, 86.4% for SAM, U-Net, and DeepLabV3+, respectively). This indicates the limited generalization ability of such specialist models on unseen targets. In contrast, MedSAM consistently delivered superior performance. For example, MedSAM obtained a median DSC score of 87.8% (IQR: 85.0–91.4%) on the nasopharynx cancer segmentation task, demonstrating 52.3%, 15.5%, and 22.7% improvements over SAM, the specialist U-Net, and DeepLabV3+, respectively. Notably, MedSAM also achieved better performance on some unseen modalities (e.g., abdomen T1 Inphase and Outphase), surpassing SAM and the specialist models with improvements of up to 10%. Figure 4c presents four randomly selected segmentation examples for qualitative evaluation, revealing that while all the methods can handle simple segmentation targets, MedSAM performs better at segmenting challenging targets with indistinguishable boundaries, such as cervical cancer in MR images (more examples are presented in Supplementary Fig. 13). Furthermore, we evaluated MedSAM on the multiple myeloma plasma cell dataset, which represents a distinct modality and task in contrast to all previously leveraged validation tasks. Although this task had never been seen during training, MedSAM still exhibited superior performance compared to SAM (Supplementary Fig. 14), highlighting its remarkable generalization ability.

Figure 4. a Performance distribution of 60 external validation tasks in terms of median dice similarity coefficient (DSC) score. The center line within the box represents the median value, with the bottom and top bounds of the box delineating the 25th and 75th percentiles, respectively. Whiskers extend to 1.5× the interquartile range. Up-triangles denote the minima and down-triangles denote the maxima. b Podium plots for visualizing the performance correspondence of 60 external validation tasks. Upper part: each colored dot denotes the median DSC achieved with the respective method on one task. Dots corresponding to identical tasks are connected by a line. Lower part: bar charts represent the frequency of achieved ranks for each method. MedSAM ranks in the first place on most tasks. c Visualized segmentation examples on the external validation set. The four examples are the lymph node, cervical cancer, fetal head, and polyp in CT, MR, ultrasound, and endoscopy images, respectively. Source data are provided as a Source Data file.

The effect of training dataset size

We also investigated the effect of varying dataset sizes on MedSAM's performance, because the training dataset size has been proven to be pivotal for model performance 28. We additionally trained MedSAM on two smaller dataset sizes, 10,000 (10K) and 100,000 (100K) images, and compared their performance with the default MedSAM model. The 10K and 100K training images were uniformly sampled from the whole training set to maintain data diversity, as sketched below. As shown in Fig. 5a (Supplementary Tables 12–14), the performance adhered to the scaling rule: increasing the number of training images significantly improved the performance on both internal and external validation sets.
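A minimal illustration of the uniform subsampling step; the pool size, seed, and implementation are our assumptions, since the paper does not specify how the sampling was coded.

```python
import numpy as np

# Draw the 10K and 100K subsets uniformly at random (without replacement)
# from the full pool of image-mask pairs. Pool size and seed are
# illustrative only.
rng = np.random.default_rng(seed=0)
pool_size = 1_570_263
subset_10k = rng.choice(pool_size, size=10_000, replace=False)
subset_100k = rng.choice(pool_size, size=100_000, replace=False)
```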

Figure 5. a Scaling up the training image size to one million can significantly improve the model performance on both internal and external validation sets. b MedSAM can be used to substantially reduce the annotation time cost. Source data are provided as a Source Data file.

MedSAM can improve the annotation efficiency

Furthermore, we conducted a human annotation study to assess the time cost of two pipelines (Methods). In the first pipeline, two human experts manually annotated 3D adrenal tumors in a slice-by-slice way. In the second pipeline, the experts first drew the long and short tumor axes with a linear marker (initial marker) every 3–10 slices, which is a common practice in tumor response evaluation. Then, MedSAM was used to segment the tumors based on these sparse linear annotations. Finally, the experts manually revised the segmentation results until they were satisfied. We quantitatively compared the annotation time cost between the two pipelines (Fig. 5b). The results demonstrate that with the assistance of MedSAM, the annotation time was substantially reduced, by 82.37% and 82.95% for the two experts, respectively.

Discussion

We introduce MedSAM, a deep learning-powered foundation model designed for the segmentation of a wide array of anatomical structures and lesions across diverse medical imaging modalities. MedSAM is trained on a meticulously assembled large-scale dataset comprising over one million medical image-mask pairs. Its promptable configuration strikes an optimal balance between automation and customization, rendering MedSAM a versatile tool for universal medical image segmentation.

Through comprehensive evaluations encompassing both internal and external validation, MedSAM has demonstrated substantial capabilities in segmenting a diverse array of targets and robust generalization abilities to manage new data and tasks. Its performance not only significantly exceeds that of the existing state-of-the-art segmentation foundation model, but also rivals or even surpasses that of specialist models. By providing precise delineation of anatomical structures and pathological regions, MedSAM facilitates the computation of various quantitative measures that serve as biomarkers. For instance, in the field of oncology, MedSAM could play a crucial role in accelerating the 3D tumor annotation process, enabling subsequent calculations of tumor volume, which is a critical biomarker 29 for assessing disease progression and response to treatment. Additionally, MedSAM provides a successful paradigm for adapting natural image foundation models to new domains, which can be further extended to biological image segmentation 30, such as cell segmentation in light microscopy images 31 and organelle segmentation in electron microscopy images 32.

While MedSAM boasts strong capabilities, it does present certain limitations. One such limitation is the modality imbalance in the training set, with CT, MRI, and endoscopy images dominating the dataset. This could potentially impact the model’s performance on less-represented modalities, such as mammography. Another limitation is its difficulty in the segmentation of vessel-like branching structures because the bounding box prompt can be ambiguous in this setting. For example, arteries and veins share the same bounding box in eye fundus images. However, these limitations do not diminish MedSAM’s utility. Since MedSAM has learned rich and representative medical image features from the large-scale training set, it can be fine-tuned to effectively segment new tasks from less-represented modalities or intricate structures like vessels.

In conclusion, this study highlights the feasibility of constructing a single foundation model capable of managing a multitude of segmentation tasks, thereby eliminating the need for task-specific models. MedSAM, as the inaugural foundation model in medical image segmentation, holds great potential to accelerate the advancement of new diagnostic and therapeutic tools, and ultimately contribute to improved patient care 33 .

Methods

Dataset curation and pre-processing

We curated a comprehensive dataset by collating images from publicly available medical image segmentation datasets, which were obtained from various sources across the internet, including the Cancer Imaging Archive (TCIA) 34, Kaggle, Grand-Challenge, Scientific Data, CodaLab, and segmentation challenges in the Medical Image Computing and Computer Assisted Intervention Society (MICCAI). All the datasets provided segmentation annotations by human experts, which have been widely used in the existing literature (Supplementary Tables 1–4). We incorporated these annotations directly for both model development and validation.

The original 3D datasets consisted of computed tomography (CT) and magnetic resonance (MR) images in DICOM, nrrd, or mhd formats. To ensure uniformity and compatibility with developing medical image deep learning models, we converted the images to the widely used NIfTI format. Additionally, grayscale images (such as X-ray and ultrasound) as well as RGB images (including endoscopy, dermoscopy, fundus, and pathology images) were converted to the png format. Several exclusion criteria were applied to improve dataset quality and consistency, removing incomplete images, segmentation targets with branching structures, inaccurate annotations, and targets with tiny volumes. Notably, image intensities varied significantly across different modalities. For instance, CT images had intensity values ranging from -2000 to 2000, while MR images exhibited a range of 0 to 3000. In endoscopy and ultrasound images, intensity values typically spanned from 0 to 255. To facilitate stable training, we performed intensity normalization across all images, ensuring they shared the same intensity range.

For CT images, we initially normalized the Hounsfield units using typical window width and level values. The employed window width and level values for soft tissues, lung, and brain are (W:400, L:40), (W:1500, L:-160), and (W:80, L:40), respectively. Subsequently, the intensity values were rescaled to the range of [0, 255]. For MR, X-ray, ultrasound, mammography, and optical coherence tomography (OCT) images, we clipped the intensity values to the range between the 0.5th and 99.5th percentiles before rescaling them to the range of [0, 255]. Regarding RGB images (e.g., endoscopy, dermoscopy, fundus, and pathology images), if they were already within the expected intensity range of [0, 255], their intensities remained unchanged. However, if they fell outside this range, we utilized max-min normalization to rescale the intensity values to [0, 255]. Finally, to meet the model’s input requirements, all images were resized to a uniform size of 1024 × 1024 × 3. In the case of whole-slide pathology images, patches were extracted using a sliding window approach without overlaps. The patches located on boundaries were padded to this size with 0. As for 3D CT and MR images, each 2D slice was resized to 1024 × 1024, and the channel was repeated three times to maintain consistency. The remaining 2D images were directly resized to 1024 × 1024 × 3. Bi-cubic interpolation was used for resizing images, while nearest-neighbor interpolation was applied for resizing masks to preserve their precise boundaries and avoid introducing unwanted artifacts. These standardization procedures ensured uniformity and compatibility across all images and facilitated seamless integration into the subsequent stages of the model training and evaluation pipeline.
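The normalization recipe above translates directly into a few lines of NumPy. The sketch below mirrors the stated window/percentile values; the function names are ours, not identifiers from the MedSAM codebase.

```python
import numpy as np

def window_ct(img, width=400, level=40):
    """Clip CT Hounsfield units to a window and rescale to [0, 255].
    The default W/L corresponds to the soft-tissue window described above."""
    lo, hi = level - width / 2, level + width / 2
    img = np.clip(img.astype(np.float32), lo, hi)
    return ((img - lo) / (hi - lo) * 255.0).astype(np.uint8)

def percentile_normalize(img, lower=0.5, upper=99.5):
    """Clip to the 0.5th-99.5th intensity percentiles (MR, X-ray, ultrasound,
    mammography, OCT) and rescale to [0, 255]."""
    lo, hi = np.percentile(img, [lower, upper])
    img = np.clip(img.astype(np.float32), lo, hi)
    return ((img - lo) / (hi - lo + 1e-8) * 255.0).astype(np.uint8)
```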

Network architecture

The network utilized in this study was built on transformer architecture 27 , which has demonstrated remarkable effectiveness in various domains such as natural language processing and image recognition tasks 25 . Specifically, the network incorporated a vision transformer (ViT)-based image encoder responsible for extracting image features, a prompt encoder for integrating user interactions (bounding boxes), and a mask decoder that generated segmentation results and confidence scores using the image embedding, prompt embedding, and output token.

To strike a balance between segmentation performance and computational efficiency, we employed the base ViT model as the image encoder, since extensive evaluation indicated that larger ViT models, such as ViT Large and ViT Huge, offered only marginal improvements in accuracy 7 while significantly increasing computational demands. Specifically, the base ViT model consists of 12 transformer layers 27, with each block comprising a multi-head self-attention block and a multilayer perceptron (MLP) block incorporating layer normalization 35. Pre-training was performed using masked auto-encoder modeling 36, followed by fully supervised training on the SAM dataset 7. The input image (1024 × 1024 × 3) was reshaped into a sequence of flattened 2D patches with a patch size of 16 × 16 × 3, yielding an image embedding of size 64 × 64 after the image encoder, a 16× downscaling. The prompt encoder mapped the corner points of the bounding box prompt to 256-dimensional vectorial embeddings 26. In particular, each bounding box was represented by an embedding pair of the top-left corner point and the bottom-right corner point. To facilitate real-time user interactions once the image embedding had been computed, a lightweight mask decoder architecture was employed. It consists of two transformer layers 27 for fusing the image embedding and prompt encoding, and two transposed convolutional layers to enhance the embedding resolution to 256 × 256. Subsequently, the embedding underwent sigmoid activation, followed by bilinear interpolation to match the input size.
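The stated sizes are mutually consistent, as this small arithmetic check shows (pure arithmetic, no model weights involved):

```python
# Sanity-check of the tensor shapes described above.
input_size, patch_size = 1024, 16
grid = input_size // patch_size   # 64 -> image embedding grid is 64 x 64 (16x downscaled)
num_tokens = grid * grid          # 4096 patch tokens enter the transformer layers
decoder_res = grid * 2 * 2        # two 2x transposed convolutions -> 256 x 256
print(num_tokens, decoder_res)    # 4096 256; masks are then bilinearly upsampled
                                  # from 256 x 256 back to the 1024 x 1024 input
```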

Training protocol and experimental setting

During data pre-processing, we obtained 1,570,263 medical image-mask pairs for model development and validation. For internal validation, we randomly split the dataset into 80%, 10%, and 10% for training, tuning, and validation, respectively. Specifically, for modalities with within-scan continuity, such as CT and MRI, and modalities with continuity between consecutive frames, we performed the data splitting at the 3D scan level and the video level, respectively, thereby preventing any potential data leakage. For pathology images, recognizing the significance of slide-level cohesiveness, we first separated the whole-slide images into distinct slide-based sets. Then, each slide was divided into small patches with a fixed size of 1024 × 1024. This setup allowed us to monitor the model's performance on the tuning set and adjust its parameters during training to prevent overfitting. For the external validation, all datasets were held out and did not appear during model training. These datasets provide a stringent test of the model's generalization ability, as they represent new patients, imaging conditions, and potentially new segmentation tasks that the model has not encountered before. By evaluating the performance of MedSAM on these unseen datasets, we can gain a realistic understanding of how MedSAM is likely to perform in real-world clinical settings, where it will need to handle a wide range of variability and unpredictability in the data. The training and validation sets are independent.

The model was initialized with the pre-trained ViT-Base SAM model. We fixed the prompt encoder since it can already encode the bounding box prompt. All the trainable parameters in the image encoder and mask decoder were updated during training. Specifically, the numbers of trainable parameters for the image encoder and mask decoder are 89,670,912 and 4,058,340, respectively. The bounding box prompt was simulated from the expert annotations with a random perturbation of 0–20 pixels. The loss function is the unweighted sum of dice loss and cross-entropy loss, which has been proven to be robust in various segmentation tasks 1. The network was optimized with the AdamW 37 optimizer (β1 = 0.9, β2 = 0.999) with an initial learning rate of 1e-4 and a weight decay of 0.01. The global batch size was 160 and data augmentation was not used. The model was trained on 20 A100 (80G) GPUs for 150 epochs and the last checkpoint was selected as the final model.
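A sketch of the optimizer setup and prompt simulation under the stated hyperparameters. The placeholder model and the outward-jitter reading of the 0–20 pixel perturbation are our assumptions; the paper does not spell out the jitter direction.

```python
import torch
from torch import nn, optim

model = nn.Linear(1, 1)  # stand-in for the trainable image encoder + mask decoder

# Optimizer hyperparameters taken from the text.
optimizer = optim.AdamW(model.parameters(), lr=1e-4,
                        betas=(0.9, 0.999), weight_decay=0.01)

def simulate_box_prompt(gt_mask: torch.Tensor, max_shift: int = 20):
    """Derive a bounding box from a (non-empty) expert mask and jitter each
    coordinate by a random 0-20 pixel perturbation, as described above."""
    ys, xs = torch.where(gt_mask > 0)
    x0, y0, x1, y1 = xs.min(), ys.min(), xs.max(), ys.max()
    j = torch.randint(0, max_shift + 1, (4,))
    h, w = gt_mask.shape
    return [max(0, int(x0 - j[0])), max(0, int(y0 - j[1])),
            min(w - 1, int(x1 + j[2])), min(h - 1, int(y1 + j[3]))]
```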

Furthermore, to thoroughly evaluate the performance of MedSAM, we conducted comparative analyses against both the state-of-the-art segmentation foundation model SAM 7 and specialist models (i.e., U-Net 1 and DeepLabV3+ 24). The training images covered 10 modalities: CT, MR, chest X-ray (CXR), dermoscopy, endoscopy, fundus, ultrasound, mammography, OCT, and pathology, and we trained U-Net and DeepLabV3+ specialist models for each modality. There were 20 specialist models in total, and the number of corresponding training images is presented in Supplementary Table 5. We employed nnU-Net to conduct all U-Net experiments, which can automatically configure the network architecture based on the dataset properties. In order to incorporate the bounding box prompt into the model, we transformed the bounding box into a binary mask and concatenated it with the image as the model input. This function was originally supported by nnU-Net in the cascaded pipeline, which has demonstrated increased performance in many segmentation tasks by using the binary mask as an additional channel to specify the target location. The training settings followed the default configurations of 2D nnU-Net. Each model was trained on one A100 GPU for 1000 epochs and the last checkpoint was used as the final model. The DeepLabV3+ specialist models used ResNet50 38 as the encoder. Similar to ref. 3, the input images were resized to 224 × 224 × 3. The bounding box was transformed into a binary mask as an additional input channel to provide the object location prompt. Segmentation Models PyTorch (0.3.3) 39 was used to perform training and inference for all the modality-wise specialist DeepLabV3+ models. Each modality-wise model was trained on one A100 GPU for 500 epochs and the last checkpoint was used as the final model. During the inference phase, SAM and MedSAM were used to perform segmentation across all modalities with a single model. In contrast, the U-Net and DeepLabV3+ specialist models were used to segment only their respective modalities.
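The box-as-extra-channel conditioning used for the specialist baselines is straightforward to implement; a minimal sketch (the helper name is ours):

```python
import numpy as np

def box_to_channel(box, height, width):
    """Rasterize a bounding box into a binary mask, used as an additional
    input channel for the specialist U-Net / DeepLabV3+ baselines."""
    x0, y0, x1, y1 = box
    mask = np.zeros((height, width), dtype=np.float32)
    mask[y0:y1 + 1, x0:x1 + 1] = 1.0
    return mask

# Example: an H x W x 3 image plus the box channel -> H x W x 4 network input.
image = np.zeros((224, 224, 3), dtype=np.float32)
prompt = box_to_channel((50, 40, 150, 160), 224, 224)
net_input = np.concatenate([image, prompt[..., None]], axis=-1)
```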

A task-specific segmentation model might outperform a modality-based one for certain applications. Since U-Net obtained better performance than DeepLabV3+ on most tasks, we further conducted a comparison study by training task-specific U-Net models on four representative tasks, including liver cancer segmentation in CT scans, abdominal organ segmentation in MR scans, nerve cancer segmentation in ultrasound, and polyp segmentation in endoscopy images. The experiments included both internal validation and external validation. For internal validation, we adhered to the default data splits, using them to train the task-specific U-Net models and then evaluate their performance on the corresponding validation set. For external validation, the trained U-Net models were evaluated on new datasets from the same modality or segmentation targets. In all these experiments, MedSAM was directly applied to the validation sets without additional fine-tuning. As shown in Supplementary Fig.  15 , while task-specific U-Net models often achieved great results on internal validation sets, their performance diminished significantly for external sets. In contrast, MedSAM maintained consistent performance across both internal and external validation sets. This underscores MedSAM’s superior generalization ability, making it a versatile tool in a variety of medical image segmentation tasks.

Loss function

We used the unweighted sum of cross-entropy loss and dice loss 40 as the final loss function since it has been proven to be robust across different medical image segmentation tasks 41. Specifically, let S, G denote the segmentation result and ground truth, respectively, and let s_i, g_i denote the predicted segmentation and ground truth of voxel i. N is the number of voxels in the image I. Binary cross-entropy loss is defined by

$$L_{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[g_i \log s_i + (1 - g_i)\log(1 - s_i)\right],$$

and dice loss is defined by

$$L_{Dice} = 1 - \frac{2\sum_{i=1}^{N} g_i s_i}{\sum_{i=1}^{N} g_i^2 + \sum_{i=1}^{N} s_i^2}.$$

The final loss L is defined by

$$L = L_{BCE} + L_{Dice}.$$
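These equations map directly to a few lines of PyTorch; a minimal sketch (the function name and the stabilizing epsilon are ours):

```python
import torch

def dice_ce_loss(pred_logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Unweighted sum of binary cross-entropy and dice loss, matching the
    equations above. Inputs are flattened to one score per voxel."""
    s = torch.sigmoid(pred_logits).flatten()
    g = target.float().flatten()
    bce = torch.nn.functional.binary_cross_entropy(s, g)
    dice = 1 - (2 * (s * g).sum()) / (s.pow(2).sum() + g.pow(2).sum() + 1e-8)
    return bce + dice
```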

Human annotation study

The objective of the human annotation study was to quantitatively evaluate how MedSAM can reduce the annotation time cost. Specifically, we used the recent adrenocortical carcinoma CT dataset 34 , 42 , 43 , where the segmentation target, adrenal tumor, was neither part of the training nor of the existing validation sets. We randomly sampled 10 cases, comprising a total of 733 tumor slices requiring annotations. Two human experts participated in this study, both of whom are experienced radiologists with 8 and 6 years of clinical practice in abdominal diseases, respectively. Each expert generated two groups of annotations, one with the assistance of MedSAM and one without.

In the first group, the experts manually annotated the 3D adrenal tumor in a slice-by-slice manner. Annotations by the two experts were conducted independently, with no collaborative discussions, and the time taken for each case was recorded. In the second group, annotations were generated after a one-week cooling period. The experts independently drew the long and short tumor axes as initial markers, which is a common practice in tumor response evaluation. This process was executed every 3–10 slices from the top slice to the bottom slice of the tumor. Then, we applied MedSAM to segment the tumors based on these sparse linear annotations, in three steps.

Step 1. For each annotated slice, a rectangular binary mask that completely covers the linear label was generated.

Step 2. For the unlabeled slices, rectangular binary masks were created by interpolating between the surrounding labeled slices.

Step 3. We transformed the binary masks into bounding boxes and then fed them along with the images into MedSAM to generate segmentation results.

All these steps were conducted automatically, and the model running time was recorded for each case. Finally, human experts manually refined the segmentation results until they were satisfied. To summarize, the time cost of the second group of annotations comprised three parts: initial markers, MedSAM inference, and refinement. All the manual annotation processes were based on ITK-SNAP 44, an open-source software package designed for medical image visualization and annotation.
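Step 2 leaves the interpolation scheme unspecified; one simple possibility is linear interpolation of the rectangle corners between two annotated slices, sketched below (the function name and the linear scheme are our assumptions).

```python
import numpy as np

def interpolate_boxes(z0, box0, z1, box1):
    """Linearly interpolate rectangle corners between two annotated slices
    z0 < z1. Boxes are (x0, y0, x1, y1); returns {slice_index: box}."""
    boxes = {}
    for z in range(z0 + 1, z1):
        t = (z - z0) / (z1 - z0)
        boxes[z] = tuple(int(round((1 - t) * a + t * b))
                         for a, b in zip(box0, box1))
    return boxes

# Example: fill in slices 11-14 between annotated slices 10 and 15.
print(interpolate_boxes(10, (40, 40, 90, 100), 15, (50, 45, 95, 110)))
```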

Evaluation metrics

We followed the recommendations in Metrics Reloaded 45 and used the dice similarity coefficient (DSC) and normalized surface distance (NSD) to quantitatively evaluate the segmentation results. DSC is a region-based segmentation metric, aiming to evaluate the region overlap between expert annotation masks and segmentation results, which is defined by

$$DSC(G, S) = \frac{2|G \cap S|}{|G| + |S|},$$

NSD 46 is a boundary-based metric, aiming to evaluate the boundary consensus between expert annotation masks and segmentation results at a given tolerance, which is defined by

$$NSD(G, S) = \frac{|\partial G \cap B_{\partial S}^{(\tau)}| + |\partial S \cap B_{\partial G}^{(\tau)}|}{|\partial G| + |\partial S|},$$

where \(B_{\partial G}^{(\tau)} = \{x \in \mathbb{R}^3 \mid \exists\, \tilde{x} \in \partial G, \|x - \tilde{x}\| \le \tau\}\) and \(B_{\partial S}^{(\tau)} = \{x \in \mathbb{R}^3 \mid \exists\, \tilde{x} \in \partial S, \|x - \tilde{x}\| \le \tau\}\) denote the border regions of the expert annotation mask boundary and the segmentation boundary at tolerance τ, respectively. In this paper, we set the tolerance τ to 2.
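Both metrics are easy to compute in practice: DSC is a two-line NumPy expression, and NSD is implemented in the DeepMind surface-distance package cited above (ref. 46). The isotropic 1 mm spacing default below is an assumption for illustration.

```python
import numpy as np
import surface_distance  # https://github.com/google-deepmind/surface-distance

def dsc(gt: np.ndarray, seg: np.ndarray) -> float:
    """Region overlap between boolean masks, per the DSC definition above."""
    inter = np.logical_and(gt, seg).sum()
    return 2.0 * inter / (gt.sum() + seg.sum())

def nsd(gt: np.ndarray, seg: np.ndarray,
        spacing=(1.0, 1.0, 1.0), tau: float = 2.0) -> float:
    """NSD at tolerance tau; voxel spacing here is an illustrative assumption."""
    d = surface_distance.compute_surface_distances(gt, seg, spacing_mm=spacing)
    return surface_distance.compute_surface_dice_at_tolerance(d, tolerance_mm=tau)
```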

Statistical analysis

To statistically analyze and compare the performance of the aforementioned four methods (MedSAM, SAM, U-Net, and DeepLabV3+ specialist models), we employed the Wilcoxon signed-rank test. This non-parametric test is well suited for comparing paired samples and is particularly useful when the data do not meet the assumptions of a normal distribution. This analysis allowed us to determine whether any method demonstrated statistically superior segmentation performance compared to the others, providing valuable insights into the comparative effectiveness of the evaluated methods. The Wilcoxon signed-rank test results are marked on the DSC and NSD score tables (Supplementary Tables 6–11).
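In Python this amounts to a single SciPy call on paired per-task scores; the numbers below are illustrative placeholders, not the paper's results.

```python
from scipy.stats import wilcoxon

# Paired per-task median DSC scores for two methods (illustrative values).
medsam_dsc = [0.91, 0.88, 0.86, 0.93, 0.84]
sam_dsc = [0.72, 0.65, 0.80, 0.70, 0.61]

stat, p = wilcoxon(medsam_dsc, sam_dsc)  # two-sided test by default
print(f"Wilcoxon statistic={stat}, p={p:.4f}")
```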

Software utilized

All code was implemented in Python (3.10) using PyTorch (2.0) as the base deep learning framework. We also used several Python packages for data analysis and results visualization, including connected-components-3d (3.10.3), SimpleITK (2.2.1), nibabel (5.1.0), torchvision (0.15.2), numpy (1.24.3), scikit-image (0.20.0), scipy (1.10.1), pandas (2.0.2), matplotlib (3.7.1), opencv-python (4.8.0), ChallengeR (1.0.5), and plotly (5.15.0). BioRender was used to create Fig. 1.

Reporting summary

Further information on research design is available in the  Nature Portfolio Reporting Summary linked to this article.

Data availability

The training and validation datasets used in this study are available in the public domain and can be downloaded via the links provided in Supplementary Tables 16 and 17. All the image datasets in this study are publicly accessible and permitted for research purposes. Source data are provided with this paper in the Source Data file.

Code availability

The training script, inference script, and trained model are publicly available at https://github.com/bowang-lab/MedSAM . A permanent version is released on Zenodo 47.

Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J. & Maier-Hein, K. H. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 18, 203–211 (2021).

De Fauw, J. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat. Med. 24 , 1342–1350 (2018).

Ouyang, D. Video-based AI for beat-to-beat assessment of cardiac function. Nature 580 , 252–256 (2020).

Wang, G. DeepIGeoS: a deep interactive geodesic framework for medical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 41, 1559–1572 (2018).

Antonelli, M. The medical segmentation decathlon. Nat. Commun. 13 , 4128 (2022).

Minaee, S. Image segmentation using deep learning: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 3523–3542 (2021).

Kirillov, A. et al. Segment anything. In IEEE International Conference on Computer Vision. 4015–4026 (IEEE, 2023).

Zou, X. et al. Segment everything everywhere all at once. In Advances in Neural Information Processing Systems (MIT Press, 2023).

Wang, G. Interactive medical image segmentation using deep learning with image-specific fine tuning. IEEE Transactions on Medical Imaging 37, 1562–1573 (2018).

Zhou, T. Volumetric memory network for interactive medical image segmentation. Med. Image Anal. 83 , 102599 (2023).

Luo, X. MIDeepSeg: minimally interactive segmentation of unseen objects from medical images using deep learning. Med. Image Anal. 72, 102102 (2021).

Deng, R. et al. Segment anything model (SAM) for digital pathology: assess zero-shot segmentation on whole slide imaging. Preprint at https://arxiv.org/abs/2304.04155 (2023).

Hu, C., Li, X. When SAM meets medical images: an investigation of segment anything model (SAM) on multi-phase liver tumor segmentation. Preprint at https://arxiv.org/abs/2304.08506 (2023).

He, S., Bao, R., Li, J., Grant, P.E., Ou, Y. Accuracy of segment-anything model (SAM) in medical image segmentation tasks. Preprint at https://doi.org/10.48550/arXiv.2304.09324 (2023).

Roy, S. et al. SAM.MD: zero-shot medical image segmentation capabilities of the segment anything model. Preprint at https://arxiv.org/abs/2304.05396 (2023).

Zhou, T., Zhang, Y., Zhou, Y., Wu, Y. & Gong, C. Can SAM segment polyps? Preprint at https://arxiv.org/abs/2304.07583 (2023).

Mohapatra, S., Gosai, A., Schlaug, G. Sam vs bet: a comparative study for brain extraction and segmentation of magnetic resonance images using deep learning. Preprint at https://arxiv.org/abs/2304.04738 (2023).

Chen, J., Bai, X. Learning to "segment anything" in thermal infrared images through knowledge distillation with a large scale dataset SATIR. Preprint at https://arxiv.org/abs/2304.07969 (2023).

Tang, L., Xiao, H., Li, B. Can SAM segment anything? when SAM meets camouflaged object detection. Preprint at https://arxiv.org/abs/2304.04709 (2023).

Ji, G.-P. et al. SAM struggles in concealed scenes: empirical study on "segment anything". Science China Information Sciences 66, 226101 (2023).

Ji, W., Li, J., Bi, Q., Li, W., Cheng, L. Segment anything is not always perfect: an investigation of SAM on different real-world applications. Preprint at https://arxiv.org/abs/2304.05750 (2023).

Mazurowski, M. A. Segment anything model for medical image analysis: an experimental study. Med. Image Anal. 89 , 102918 (2023).

Huang, Y. et al. Segment anything model for medical images? Med. Image Anal. 92 , 103061 (2024).

Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proc. European Conference on Computer Vision . 801–818 (IEEE, 2018).

Dosovitskiy, A. et al. An image is worth 16x16 words: transformers for image recognition at scale. In: International Conference on Learning Representations (OpenReview.net, 2020).

Tancik, M. Fourier features let networks learn high frequency functions in low-dimensional domains. In Advances in Neural Information Processing Systems 33 , 7537–7547 (Curran Associates, Inc., 2020).

Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems , Vol. 30 (Curran Associates, Inc., 2017).

He, B. Blinded, randomized trial of sonographer versus AI cardiac function assessment. Nature 616 , 520–524 (2023).

Eisenhauer, E. A. New response evaluation criteria in solid tumours: revised recist guideline (version 1.1). Eur. J. Cancer 45 , 228–247 (2009).

Ma, J. & Wang, B. Towards foundation models of biological image segmentation. Nat. Methods 20, 953–955 (2023).

Ma, J. et al. The multi-modality cell segmentation challenge: towards universal solutions. Preprint at https://arxiv.org/abs/2308.05864 (2023).

Xie, R., Pang, K., Bader, G.D., Wang, B. Maester: masked autoencoder guided segmentation at pixel resolution for accurate, self-supervised subcellular structure recognition. In IEEE Conference on Computer Vision and Pattern Recognition . 3292–3301 (IEEE, 2023).

Bera, K., Braman, N., Gupta, A., Velcheti, V. & Madabhushi, A. Predicting cancer outcomes with radiomics and artificial intelligence in radiology. Nat. Rev. Clin. Oncol. 19 , 132–146 (2022).

Clark, K. The cancer imaging archive (TCIA): maintaining and operating a public information repository. J. Digit. Imaging 26 , 1045–1057 (2013).

Ba, J.L., Kiros, J.R., Hinton, G.E. Layer normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016).

He, K. et al. Masked autoencoders are scalable vision learners. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition . 16000–16009 (IEEE, 2022).

Loshchilov, I., Hutter, F. Decoupled weight decay regularization. In International Conference on Learning Representations (OpenReview.net, 2019).

He, K., Zhang, X., Ren, S., Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition . 770–778 (IEEE, 2016).

Iakubovskii, P. Segmentation models pytorch. GitHub https://github.com/qubvel/segmentation_models.pytorch (2019).

Milletari, F., Navab, N., Ahmadi, S.-A. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In International Conference on 3D Vision (3DV). 565–571 (IEEE, 2016).

Ma, J. Loss odyssey in medical image segmentation. Med. Image Anal. 71 , 102035 (2021).

Ahmed, A. Radiomic mapping model for prediction of Ki-67 expression in adrenocortical carcinoma. Clin. Radiol. 75 , 479–17 (2020).

Moawad, A.W. et al. Voxel-level segmentation of pathologically-proven Adrenocortical carcinoma with Ki-67 expression (Adrenal-ACC-Ki67-Seg) [data set]. https://doi.org/10.7937/1FPG-VM46 (2023).

Yushkevich, P.A., Gao, Y., Gerig, G. ITK-SNAP: an interactive tool for semi-automatic segmentation of multi-modality biomedical images. In International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). 3342–3345 (IEEE, 2016).

Maier-Hein, L. et al. Metrics reloaded: Pitfalls and recommendations for image analysis validation. Preprint at https://arxiv.org/abs/2206.01653 (2022).

DeepMind surface-distance. https://github.com/google-deepmind/surface-distance (2018).

Ma, J. bowang-lab/MedSAM: v1.0.0. https://doi.org/10.5281/zenodo.10452777 (2023).


Acknowledgements

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC, RGPIN-2020-06189 and DGECR-2020-00294) and CIFAR AI Chair programs. The authors of this paper highly appreciate all the data owners for providing public medical images to the community. We also thank Meta AI for making the source code of segment anything publicly available to the community. This research was enabled in part by computing resources provided by the Digital Research Alliance of Canada.

Author information

Authors and Affiliations

Peter Munk Cardiac Centre, University Health Network, Toronto, ON, Canada

Jun Ma, Feifei Li & Bo Wang

Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada

Jun Ma & Bo Wang

Vector Institute, Toronto, ON, Canada

Department of Computer Science, Western University, London, ON, Canada

Tandon School of Engineering, New York University, New York, NY, USA

Department of Electrical Engineering, Yale University, New Haven, CT, USA

Department of Computer Science, University of Toronto, Toronto, ON, Canada

UHN AI Hub, Toronto, ON, Canada


Contributions

Conceived and designed the experiments: J.M., Y.H., C.Y., B.W. Performed the experiments: J.M., Y.H., F.L., L.H., C.Y. Analyzed the data: J.M., Y.H., F.L., L.H., C.Y., B.W. Wrote the paper: J.M., Y.H., F.L., L.H., C.Y., B.W. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Bo Wang .

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks David Ouyang, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information, Reporting Summary, Peer Review File, Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Ma, J., He, Y., Li, F. et al. Segment anything in medical images. Nat Commun 15 , 654 (2024). https://doi.org/10.1038/s41467-024-44824-z


Received : 24 October 2023

Accepted : 05 January 2024

Published : 22 January 2024

DOI : https://doi.org/10.1038/s41467-024-44824-z


This article is cited by

Visual interpretability of image-based classification models by generative latent space disentanglement applied to in vitro fertilization.

  • Tamar Schwartz
  • Assaf Zaritsky

Nature Communications (2024)

Holotomography

  • Herve Hugonnet
  • YongKeun Park

Nature Reviews Methods Primers (2024)

An efficient segment anything model for the segmentation of medical images

  • Guanliang Dong
  • Zhangquan Wang
  • Haidong Cui

Scientific Reports (2024)

A Comprehensive Survey of Image Generation Models Based on Deep Learning

  • Chenyang Zhang

Annals of Data Science (2024)

TransDiff: medical image segmentation method based on Swin Transformer with diffusion probabilistic model

  • Xiaoxiao Liu

Applied Intelligence (2024)





ORIGINAL RESEARCH article

Medical image analysis using deep learning algorithms.

Mengfang Li

  • 1 The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
  • 2 Department of Cardiovascular Medicine, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
  • 3 Department of Cardiovascular Medicine, Wencheng People’s Hospital, Wencheng, China

In the field of medical image analysis, the importance of employing advanced deep learning (DL) techniques cannot be overstated. DL has achieved impressive results in many areas, making it particularly noteworthy for medical image analysis in healthcare. The integration of DL with medical image analysis enables real-time analysis of vast and intricate datasets, yielding insights that significantly enhance healthcare outcomes and operational efficiency. This review of the existing literature conducts a thorough examination of the most recent DL approaches designed to address the difficulties faced in medical healthcare, focusing on the use of DL algorithms in medical image analysis. Grouping the investigated papers into five categories according to their techniques, we assessed them against a set of critical parameters. Through a systematic categorization of state-of-the-art DL techniques, namely Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), Long Short-Term Memory (LSTM) models, and hybrid models, this study explores their underlying principles, advantages, limitations, methodologies, simulation environments, and datasets. Based on our results, Python was the programming language most frequently used to implement the methods proposed in the investigated papers, and the majority of the scrutinized papers were published in 2021, underscoring the contemporaneity of the research. Moreover, this review highlights the latest advances in DL techniques and their practical applications in medical image analysis, while also addressing the challenges that hinder the widespread adoption of DL in this domain. These insights serve as impetuses for future studies aimed at advancing image analysis in medical healthcare research. The evaluation metrics employed across the reviewed articles span a broad spectrum, including accuracy, sensitivity, specificity, F-score, robustness, computational complexity, and generalizability.

1. Introduction

Deep learning is a branch of machine learning that employs artificial neural networks comprising multiple layers to learn intricate patterns from extensive datasets ( 1 , 2 ). It has brought about a revolution in various domains, including computer vision, natural language processing, and speech recognition ( 3 ). One of the primary advantages of deep learning is its capacity to automatically learn features from raw data, thereby eliminating the need for manual feature engineering ( 4 ). This makes it especially powerful in domains with large, complex datasets, where traditional machine learning methods may struggle to capture the underlying patterns ( 5 ). Deep learning has also facilitated significant advancements in tasks such as image and speech recognition, natural language understanding, and autonomous driving ( 6 ). For instance, it has enabled computer vision systems that identify objects in images and videos with unprecedented precision, and it has brought substantial improvements to natural language processing, leading to models capable of comprehending and generating human-like language ( 7 ). Overall, deep learning has opened up new opportunities for solving complex problems and has the potential to transform many industries, including healthcare, finance, and transportation.

Medical image analysis is a field of study that involves the processing, interpretation, and analysis of medical images ( 8 ). The emergence of deep learning algorithms has prompted a notable transformation in the field of medical image analysis, as they have increasingly been employed to enhance the diagnosis, treatment, and monitoring of diverse medical conditions in recent years ( 9 ). Deep learning, as a branch of machine learning, encompasses the training of algorithms to acquire knowledge from vast quantities of data. When applied to medical image analysis, deep learning algorithms possess the capability to automatically identify and categorize anomalies in various medical images, including X-rays, MRI scans, CT scans, and ultrasound images ( 10 ). These algorithms can undergo training using extensive datasets consisting of annotated medical images, where each image is accompanied by labels indicating the corresponding medical condition or abnormality ( 11 ). Once trained, the algorithm can analyze new medical images and provide diagnostic insights to healthcare professionals. The application of deep learning algorithms in medical image analysis has exhibited promising outcomes, as evidenced by studies showcasing high levels of accuracy in detecting and diagnosing a wide range of medical conditions ( 12 ). This has led to the development of various commercial and open-source software tools that leverage deep learning algorithms for medical image analysis ( 13 ). Overall, the utilization of deep learning algorithms in medical image analysis has the capability to bring about substantial enhancements in healthcare results and transform the utilization of medical imaging in diagnosis and treatment.

Medical image processing is an area of research that encompasses the creation and application of algorithms and methods to analyze and interpret medical images ( 14 ). Its primary objective is to extract meaningful information from medical images to aid in diagnosis, treatment planning, and therapeutic interventions ( 15 ). Medical image processing involves various tasks such as image segmentation, image registration, feature extraction, classification, and visualization. Each imaging modality has its unique strengths and limitations, and the images produced by different modalities may require specific processing techniques to extract useful information ( 16 ). Medical image processing techniques have revolutionized the field of medicine by providing a non-invasive means to visualize and analyze the internal structures and functions of the body. They have enabled early detection and diagnosis of diseases, accurate treatment planning, and monitoring of treatment response, which has significantly improved patient outcomes, reduced treatment costs, and enhanced the quality of care provided to patients.

Visual depictions of CNNs in medical image analysis portray a layered architecture in which the initial layers capture rudimentary features such as edges and textures, while subsequent layers progressively discern more intricate and abstract characteristics, allowing the network to autonomously extract pertinent information for tasks such as detection, segmentation, and classification. Representations of RNNs illustrate a network structure adept at grasping temporal relationships and sequential patterns, rendering them well suited for video analysis or the processing of time-series medical image data. Depictions of GANs exemplify a dual-network framework: one network, the generator, fabricates synthetic medical images, while the other, the discriminator, assesses their authenticity, enabling the generation of lifelike images that closely resemble actual medical data. LSTM networks appear as a specialized form of recurrent neural network that processes sequential medical image data by preserving long-term dependencies and learning the temporal patterns crucial for tasks such as video analysis and time-series image processing. Finally, hybrid methods are depicted as combinations of diverse neural network architectures, often integrating CNNs with RNNs or other specialized modules, enabling a model to harness both spatial and temporal information for a comprehensive analysis of medical images.

Case studies and real-world examples provide tangible evidence of the effectiveness and applicability of DL algorithms in various medical image analysis tasks. They underscore the potential of this technology to revolutionize healthcare by improving diagnostic accuracy, reducing manual labor, and enabling earlier interventions for patients. Several examples of case studies and real-world applications follow:

1. Skin cancer detection. Case study: In Vijayalakshmi ( 17 ), a DL algorithm was trained to identify skin cancer from images of skin lesions. The algorithm demonstrated accuracy comparable to that of dermatologists, highlighting its potential as a tool for early skin cancer detection.

2. Diabetic retinopathy screening. Case study: De Fauw et al. ( 18 ) at Moorfields Eye Hospital developed a DL system capable of identifying diabetic retinopathy from retinal images. The system was trained on a dataset of over 128,000 images and achieved a level of accuracy comparable to expert ophthalmologists.

3. Tumor segmentation in MRI. Case study: A study conducted by Guo et al. ( 8 ) at Massachusetts General Hospital utilized DL techniques to automate the segmentation of brain tumors from MRI scans. The algorithm significantly reduced the time required for tumor delineation, enabling quicker treatment planning for patients.

4. Chest X-ray analysis for tuberculosis detection. Case study: The National Institutes of Health (NIH) released a dataset of chest X-ray images for the detection of tuberculosis. Researchers have successfully applied deep learning algorithms to this dataset, achieving high accuracy in identifying TB-related abnormalities.

5. Automated bone fracture detection. Case study: Meena and Roy ( 19 ) at Stanford University developed a deep learning model capable of detecting bone fractures in X-ray images. The model demonstrated high accuracy and outperformed traditional rule-based systems in fracture detection.

Within the realm of medical image analysis, DL algorithms are extensively utilized for precise and efficient segmentation tasks. DL approaches, especially convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have demonstrated exceptional proficiency in capturing and leveraging the spatial dependencies and symmetrical properties inherent in medical images. These algorithms enable the analysis of symmetric structures, such as organs or limbs, by leveraging their inherent symmetrical patterns. The utilization of DL mechanisms in medical image analysis encompasses various practical approaches, including generative adversarial networks (GANs), hybrid models, and combinations of CNNs and RNNs. The objective of this research is to offer a thorough examination of the uses of DL techniques in the domain of deep symmetry-based image analysis within medical healthcare. By conducting an in-depth systematic literature review (SLR), analyzing multiple studies, and exploring the properties, advantages, limitations, datasets, and simulation environments associated with different DL mechanisms, this study enhances comprehension of the present state and future pathways for advancing deep symmetry-based image analysis methodologies in medical healthcare. The article is structured as follows: Section 2 covers the key principles and terminology of ML/DL in medical image analysis, followed by an investigation of relevant reviews in Section 3. Section 4 discusses the methodology and tools for paper selection, while Section 5 presents the selected classification. Section 6 presents the results and comparisons, and the remaining concerns and conclusion are discussed in the final section.

2. Fundamental concepts and terminology

The concepts and terms related to medical image analysis using DL algorithms that are covered in this section are essential for understanding the underlying principles and techniques used in medical image analysis.

2.1. The role of image analysis in medical healthcare

The utilization of deep learning algorithms for image analysis has brought about a revolution in medical healthcare by facilitating advanced and automated analysis of medical images ( 20 ). Deep learning methods, including Convolutional Neural Networks (CNNs), have showcased outstanding proficiency in tasks like image segmentation, feature extraction, and classification ( 21 ). By leveraging large amounts of annotated data, deep learning models can learn intricate patterns and relationships within medical images, facilitating accurate detection, localization, and diagnosis of diseases and abnormalities. Deep learning-based image analysis allows for faster and more precise interpretation of medical images, leading to improved patient outcomes, personalized treatment planning, and efficient healthcare workflows ( 22 ). Furthermore, these algorithms have the potential to assist in early disease detection, support radiologists in decision-making, and advance medical research through the analysis of large-scale image datasets. Overall, deep learning-based image analysis is transforming medical healthcare by providing powerful tools for image interpretation, augmenting the capabilities of healthcare professionals, and enhancing patient care ( 23 ).

2.2. Medical image analysis application

The utilization of deep learning algorithms in medical image analysis has found numerous applications within the healthcare sector. Deep learning techniques, notably Convolutional Neural Networks (CNNs), have been widely employed for tasks encompassing image segmentation, object detection, disease classification, and image reconstruction ( 24 ). In medical image analysis, these algorithms can assist in the detection and diagnosis of various conditions, such as tumors, lesions, anatomical abnormalities, and pathological changes. They can also aid in the evaluation of disease progression, treatment response, and prognosis. Deep learning models can automatically extract meaningful features from medical images, enabling efficient and accurate interpretation ( 25 ). The application of this technology holds promise for elevating clinical decision-making, improving patient outcomes, and optimizing resource allocation in healthcare settings. Moreover, deep learning algorithms can be employed for data augmentation, image registration, and multimodal fusion, facilitating a comprehensive and integrated analysis of medical images obtained from various modalities. With continuous advancements in deep learning algorithms, medical image analysis is witnessing significant progress, opening up new possibilities for precision medicine, personalized treatment planning, and advanced healthcare solutions ( 26 ).

2.3. Various aspects of medical image analysis for the healthcare section

Medical image analysis encompasses various crucial aspects in the healthcare sector, enabling in-depth examination and diagnosis based on medical imaging data ( 27 ). Image preprocessing constitutes a crucial element, encompassing techniques like noise reduction, image enhancement, and normalization, aimed at enhancing the quality and uniformity of the images. Another essential aspect is image registration, which aligns multiple images of the same patient or acquired through different imaging modalities, enabling precise comparison and fusion of information ( 28 ). Feature extraction is another crucial step, where relevant characteristics and patterns are extracted from the images, aiding in the detection and classification of abnormalities or specific anatomical structures. Segmentation plays a vital role in delineating regions of interest, enabling precise localization and measurement of anatomical structures, tumors, or lesions ( 29 ). Finally, classification and recognition techniques are applied to differentiate normal and abnormal regions, aiding in disease diagnosis and treatment planning. Deep learning algorithms, notably Convolutional Neural Networks (CNNs), have exhibited extraordinary achievements in diverse facets of medical image analysis by acquiring complex patterns and representations from extensive datasets of medical imaging ( 30 ). However, challenges such as data variability, interpretability, and generalization across different patient populations and imaging modalities need to be addressed to ensure reliable and effective medical image analysis in healthcare applications.
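To ground the preprocessing steps named above, the following minimal Python sketch applies noise reduction and intensity normalization to a single-channel image. The `preprocess` helper, the Gaussian sigma, and the percentile bounds are illustrative assumptions for this review, not settings taken from any surveyed paper.

```python
# A minimal sketch of two preprocessing steps discussed above: noise
# reduction and intensity normalization (all parameter values are assumptions).
import numpy as np
from scipy import ndimage

def preprocess(image: np.ndarray) -> np.ndarray:
    """Denoise and normalize a single-channel medical image."""
    # Noise reduction: a small Gaussian blur suppresses acquisition noise.
    denoised = ndimage.gaussian_filter(image.astype(np.float32), sigma=1.0)
    # Normalization: clip intensity outliers, then rescale to [0, 1] so
    # images from different scanners share a comparable intensity range.
    lo, hi = np.percentile(denoised, [1, 99])
    clipped = np.clip(denoised, lo, hi)
    return (clipped - lo) / (hi - lo + 1e-8)

# Example on a synthetic noisy "scan".
scan = np.random.normal(loc=100.0, scale=15.0, size=(256, 256))
out = preprocess(scan)
print(out.min(), out.max())  # approximately 0.0 and 1.0
```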

3. Relevant reviews

We examine some recent reviews of medical image analysis using DL algorithms in this part. The initial purpose is to properly distinguish the current study's significant contributions from what is discussed in these related works. Owing to advancements in AI technology, there is a growing adoption of AI mechanisms in medical image analysis, and academia has shown a heightened interest in addressing the associated challenges. In this regard, Gupta and Katarya ( 31 ) provided a comprehensive review of the literature on social media-based surveillance systems for healthcare using machine learning. The authors analyzed 50 studies published between 2011 and 2021, covering a wide range of topics related to social media monitoring for healthcare, including disease outbreaks, adverse drug reactions, mental health, and vaccine hesitancy. The review highlighted the potential of machine learning algorithms for analyzing vast amounts of social media data and identifying relevant health-related information. The authors also identified several challenges associated with the use of social media data, such as data quality and privacy concerns, and discussed potential solutions to address these challenges. They noted that social media-based surveillance systems can complement traditional surveillance methods by providing real-time data on health-related events and trends, and suggested that machine learning algorithms can improve the accuracy and efficiency of social media monitoring by automatically filtering out irrelevant information and identifying patterns and trends in the data. The review highlighted the importance of data pre-processing and feature selection in developing effective machine learning models for social media analysis.

As well, Kourou et al. ( 32 ) reviewed machine learning (ML) applications for cancer prognosis and prediction. The authors started by describing the challenges of cancer treatment, highlighting the importance of personalized medicine and the role of ML algorithms in enabling it. The paper then provided an overview of different types of ML algorithms, including supervised and unsupervised learning, and discussed their potential applications in cancer prognosis and prediction. The authors presented examples of studies that have used ML algorithms for diagnosis, treatment response prediction, and survival prediction across different types of cancer. They also discussed the use of multiple data sources for ML algorithms, such as genetic data, imaging data, and clinical data. The paper concluded by addressing the challenges and limitations encountered in using ML algorithms for cancer prognosis and prediction, which include concerns regarding data quality, overfitting, and interpretability. The authors proposed that ML algorithms hold significant potential for enhancing cancer treatment outcomes. However, they emphasized the necessity for further research to optimize their application and tackle the associated challenges in this domain.

Moreover, Razzak et al. ( 33 ) provided a comprehensive overview of the use of deep learning in medical image processing. The authors deliberated on the potential of deep learning algorithms in diverse medical imaging tasks, encompassing image classification, segmentation, registration, and synthesis. They emphasized the challenges encountered when employing deep learning, such as the requirement for extensive annotated datasets, interpretability of deep models, and computational demands. Additionally, the paper delved into prospective avenues in the field, including the integration of multi-modal data, transfer learning, and the utilization of generative models. In summary, the paper offered valuable perspectives on the present status, challenges, and potential advancements of deep learning in the domain of medical image processing.

In addition, Litjens et al. ( 34 ) provided a comprehensive survey of the applications of deep learning in medical image analysis. The authors examine a variety of tasks in medical imaging, including image classification, segmentation, detection, registration, and generation, and give a thorough introduction to the deep learning approaches used in each of these areas. Additionally, they look at the difficulties and restrictions of using deep learning algorithms for medical image analysis, such as the need for sizable annotated datasets and the interpretability of deep models. The paper's conclusion highlights the growth of explainable and interpretable deep learning models, along with other potential future directions in the area, such as the integration of multimodal data. In summary, this survey serves as a valuable resource for researchers and practitioners, offering insights into the current state and future prospects of deep learning in the context of medical image analysis.

Additionally, Bzdok and Ioannidis ( 35 ) discussed the importance of exploration, inference, and prediction in the fields of neuroscience and biomedicine. The authors highlighted the importance of integrating diverse data types, such as neuroimaging, genetics, and behavioral data, in order to achieve a comprehensive understanding of intricate systems. They also delved into the role of machine learning in identifying patterns and making predictions from extensive datasets, and described several specific applications of machine learning in neuroscience and biomedicine, including forecasting disease progression and treatment response, analyzing brain connectivity networks, and identifying biomarkers for disease diagnosis. The paper concluded by discussing the challenges and limitations encountered when employing machine learning in these domains, while emphasizing the need to carefully consider the ethical and social implications of these technologies. Moreover, the paper underscored the potential of machine learning to transform our understanding of complex biological systems and enhance medical outcomes. Table 1 summarizes the related works.


Table 1 . Summary of related works.

4. Methodology of research

We thoroughly examined pertinent documents that partially explored the utilization of DL methods in medical image analysis. By utilizing the Systematic Literature Review (SLR) methodology, this section comprehensively encompasses the field of medical image analysis. The SLR technique encompasses a thorough evaluation of all research conducted on a significant topic. This section concludes with an extensive investigation of ML techniques in the realm of medical image analysis. Furthermore, the reliability of the research selection methods is scrutinized. In the subsequent subsections, we have provided supplementary information concerning research techniques, encompassing the selection metrics and research inquiries.

4.1. Formalization of question

The primary aims of the research are to identify, assess, and differentiate all key papers in the realm of using DL methods in medical image analysis. A systematic literature review (SLR) can be utilized to scrutinize the constituents and characteristics of methods for accomplishing the aforementioned objectives. Furthermore, an SLR facilitates a profound comprehension of the pivotal challenges and difficulties in this domain. The following research questions guide the study:

Research Question 1: How can DL techniques in the field of medical image analysis be categorized? The answer to this question can be found in Part 5.
Research Question 2: What types of techniques do scholars employ to conduct their investigations? Parts 5.1 to 5.5 elucidate this query.
Research Question 3: Which parameters attracted the most attention in the papers, and what are the most popular DL applications utilized in medical image analysis? The answer to this question is included in Part 6.
Research Question 4: What unexplored prospects exist in this area? Part 7 offers the answer to this question.

4.2. The procedure of paper exploration

The present investigation's search and selection methodologies are classified into four distinct phases, as depicted in Figure 1. In the initial phase, a comprehensive list of keywords and phrases was utilized to search various sources, as demonstrated in Table 2. An electronic database was employed to retrieve relevant documents, including chapters, journals, technical studies, conference papers, notes, and special issues, resulting in a total of 616 papers, as shown in Figure 2. These papers were then subjected to an exhaustive analysis based on a set of predetermined standards, and only those meeting the stipulated criteria, illustrated in Figure 3, were selected for further evaluation. The distribution of publishers in this initial phase is shown in Figure 4; 481 articles remained after the first phase.


Figure 1. The phases of the article search and selection process.


Table 2 . Keywords and search criteria.


Figure 2. Frequency of publications of studied papers in the first stage of paper selection.


Figure 3 . Criteria for inclusion in the paper selection process.


Figure 4. Frequency of publications of studied papers in the second stage of paper selection.

In the subsequent phase, a thorough review of the selected papers' titles and abstracts was conducted, focusing on the papers' discussion, methodology, analysis, and conclusion to ensure their relevance to the study. As demonstrated in Figure 5, only 227 papers were retained after this step.


Figure 5. Frequency of publications of studied papers in the third stage of paper selection.

From these, 105 papers were chosen for a more comprehensive review, as illustrated in Figure 6, with the ultimate aim of selecting papers that adhered to the study's predetermined metrics. Finally, after careful consideration, 25 articles were selected for in-depth investigation.


Figure 6. Frequency of publications of studied papers in the fourth stage of paper selection.

5. ML/DL techniques for medical image analysis

In this section, we delve into the implementation of DL methods in the medical healthcare image analysis field. A total of 25 articles satisfying our selection criteria are presented herein. We categorize the techniques into five primary groups: CNNs, RNNs, GANs, LSTMs, and hybrid methodologies encompassing diverse combinations of methods. The proposed taxonomy of DL-based medical image analysis in medical healthcare is depicted in Figure 7.


Figure 7. The proposed taxonomy of DL-based medical image analysis in medical healthcare.

5.1. Convolutional neural network techniques for medical image analysis

Convolutional neural networks (CNNs) play a significant role in deep learning approaches for medical image processing. They perform well in tasks like object localization, segmentation, and classification due to their capacity to automatically extract pertinent characteristics from intricate medical images. By capturing complex patterns and structures, CNNs are able to accurately identify anomalies, diagnose tumors, and segment organs in medical images. The hierarchical structure of CNNs allows important characteristics to be learned at various levels, which improves analysis and diagnosis. Employing CNNs in medical image analysis has notably improved the precision, effectiveness, and automation of diagnostic procedures, ultimately benefiting patient care and treatment outcomes.
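As a concrete illustration of the hierarchical feature extraction described above, the following minimal PyTorch sketch defines a two-block CNN classifier for grayscale scans. The layer sizes, the 224x224 input resolution, and the binary output are assumptions chosen for illustration, not the architecture of any study reviewed here; the reviewed papers typically use far deeper, often pretrained, backbones.

```python
# A minimal CNN classifier sketch: stacked convolutions learn features at
# increasing levels of abstraction, and a linear head classifies the scan.
import torch
import torch.nn as nn

class SimpleMedicalCNN(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # low-level edges/textures
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # higher-level structures
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)                 # (B, 32, 56, 56) for 224x224 input
        return self.classifier(x.flatten(1))

model = SimpleMedicalCNN()
logits = model(torch.randn(4, 1, 224, 224))  # a batch of 4 grayscale scans
print(logits.shape)                          # torch.Size([4, 2])
```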

In this regard, Singh et al. ( 36 ) highlighted the role of artificial intelligence (AI) and machine learning (ML) techniques in advancing biomedical material design and predicting their toxicity. The authors emphasized the need for efficient and safe materials for medical applications and how computational methods can aid in this process. The paper explored diverse categories of AI and ML algorithms, including random forests, decision trees, and support vector machines, which can be employed for predicting toxicity. The authors provided a case study wherein they utilized a random forest algorithm to predict the toxicity of carbon nanotubes. They also highlighted the importance of data quality and quantity for accurate predictions, as well as the need for interpretability and transparency of AI/ML models. The paper concluded by discussing future research directions in this area, including the integration of multi-omics data, network analysis, and deep learning techniques. This paper demonstrated the potential of AI/ML in advancing biomedical material design and reducing the need for animal testing.

Also, Jena et al. ( 37 ) investigated the impact of parameters on the performance of deep learning models for the classification of diabetic retinopathy (DR) in a smart healthcare system. Using retinal fundus images, the authors developed a convolutional neural network (CNN) architecture with two branches to categorize DR: one branch for feature extraction and another for classification. A pre-trained model in the feature extraction branch extracts pertinent characteristics from the input image, and the classification branch uses these features to predict the severity of DR. The authors experimented with variables including the learning rate, number of epochs, batch size, and optimizer in order to evaluate the model's performance. The outcomes showed that, with the ideal parameter configuration, the suggested model achieved an accuracy of 98.12%. The authors also proposed a secure, blockchain-based IoT smart healthcare system for processing and storing medical data. The proposed system could be used for the early diagnosis and treatment of DR, thereby improving patient outcomes.

As well, Thilagam et al. ( 38 ) presented a secure Internet of Things (IoT) healthcare architecture with a deep learning-based access control system. The proposed system is designed to ensure that only authorized personnel can access the sensitive medical information stored in IoT devices. The authors used deep learning algorithms to develop a robust access control system that can identify and authenticate users with high accuracy. The system also included an encryption layer to ensure that all data transmitted between devices is secure. The authors assessed the proposed architecture through a prototype implementation, which revealed that the system can securely access medical data in real-time. Additionally, the authors conducted a comparison with existing solutions and demonstrated that their approach outperforms others in terms of accuracy, security, and scalability. The paper underscored the potential of employing deep learning algorithms in healthcare systems to enhance security and privacy, while facilitating real-time access to medical data.

Besides, Ismail et al. ( 39 ) proposed a CNN-based model for analyzing regular health factors in an IoMT (Internet-of-Medical-Things) environment. The model extracts features from multiple health data sources, such as blood pressure, pulse rate, and body temperature, using CNN-based algorithms; these features are then used to predict the risk of health issues. The proposed model is capable of classifying health data into five categories: normal, pre-hypertension, hypertension, pre-diabetes, and diabetes. The authors utilized a real-world dataset comprising health data from 50 individuals to train and evaluate the model. The findings indicated that the proposed model exhibited a remarkable level of accuracy and surpassed existing machine learning models in terms of both predictive accuracy and computational complexity. The authors expressed their confidence that the proposed model could contribute to the advancement of health monitoring systems, offering real-time monitoring and personalized interventions, thereby preventing health issues and enhancing patient outcomes.

And, More et al. ( 40 ) proposed a security-assured CNN-based model for the reconstruction of medical images on the Internet of Healthcare Things (IoHT) with the goal of ensuring the privacy and security of medical data. The proposed framework comprises two main components: a deep learning-based image reconstruction model and a security-enhanced encryption model. The image reconstruction model relies on a convolutional neural network (CNN) to accurately reconstruct original medical images from compressed versions. To safeguard the transmitted images, the encryption model employs a hybrid encryption scheme that combines symmetric and asymmetric techniques. Through evaluation using a widely recognized medical imaging dataset, the results demonstrated the model’s remarkable reconstruction accuracy and effective security performance. This study underscores the potential of leveraging deep learning models in healthcare, particularly within medical image processing, while emphasizing the crucial need for ensuring the security and privacy of medical data. Table 3 discusses the CNN methods used in medical image analysis and their properties.


Table 3 . The methods, properties, and features of CNN-medical image analysis mechanisms.

5.2. Generative adversarial network techniques for medical image analysis

The importance of GAN methods in medical image analysis using deep learning algorithms lies in their ability to generate realistic synthetic images, augment datasets, and improve the accuracy and effectiveness of diagnosis and analysis for various medical conditions. By the same token, in Vaccari et al. ( 41 ) the authors proposed a generative adversarial network (GAN) technique to address the issue of generating synthetic medical data for Internet of Medical Things (IoMT) applications. The authors detailed the application of their proposed method for generating a wide range of medical data samples encompassing both time series and non-time series data. They emphasized the advantages of employing a Generative Adversarial Network (GAN)-based approach, such as the capacity to generate realistic data capable of enhancing the performance of Internet of Medical Things (IoMT) systems. Through experiments utilizing authentic medical datasets like electrocardiogram (ECG) data and healthcare imaging data, the authors validated the efficacy of their proposed technique. The results demonstrated that their GAN-based method successfully produced synthetic medical data that closely resembled real medical data, both visually and statistically, as indicated by various metrics. The authors concluded that their proposed technique has the potential to be a valuable tool for generating synthetic medical data for use in IoMT applications.
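To make the generator-discriminator game concrete before turning to the individual studies, the following minimal PyTorch sketch runs one adversarial training step on flattened vectors standing in for medical data samples. All dimensions, the placeholder "real" batch, and the learning rates are illustrative assumptions only.

```python
# A minimal GAN training step: the generator fabricates samples from noise,
# the discriminator scores real vs. generated, and each is updated in turn.
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 256

generator = nn.Sequential(
    nn.Linear(latent_dim, 128), nn.ReLU(),
    nn.Linear(128, data_dim), nn.Tanh(),    # emit a synthetic sample in [-1, 1]
)
discriminator = nn.Sequential(
    nn.Linear(data_dim, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1),                      # real-vs-fake logit
)

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

real = torch.rand(32, data_dim) * 2 - 1     # placeholder "real" batch
fake = generator(torch.randn(32, latent_dim))

# Discriminator step: label real samples 1 and generated samples 0.
loss_d = bce(discriminator(real), torch.ones(32, 1)) + \
         bce(discriminator(fake.detach()), torch.zeros(32, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: try to make the discriminator label fakes as real.
loss_g = bce(discriminator(fake), torch.ones(32, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
print(f"D loss {loss_d.item():.3f}, G loss {loss_g.item():.3f}")
```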


As well, Kadri et al. ( 42 ) presented a framework that utilizes a deep learning model to predict the length of stay of patients at emergency departments. The proposed model employed a GAN to generate synthetic training data and address the problem of insufficient training data. The model used multiple input modalities, including demographic information, chief complaint, triage information, vital signs, and lab results, to predict the length of stay of patients. The authors demonstrated that their proposed framework surpassed multiple baseline models, showcasing its exceptional performance in accurately predicting the length of stay for patients in emergency departments. They recommended the deployment of the proposed framework in real-world settings, anticipating its potential to enhance the efficiency of emergency departments and ultimately improve patient outcomes.

Yang et al. ( 43 ) proposed a novel semi-supervised learning approach using GAN for clinical decision support in Health-IoT platform. The proposed model generated new samples from existing labeled data, creating additional labeled data for training. The GAN-based model undergoes training on a vast unlabeled dataset to generate medical images that exhibit enhanced realism for subsequent training purposes. These generated samples are then employed to fine-tune the pre-trained CNN, resulting in an improved classification accuracy. To assess the effectiveness of the proposed model, three medical datasets are utilized, and the findings demonstrate that the GAN-based semi-supervised learning approach surpasses the supervised learning approach, yielding superior accuracy and reduced loss values. The paper concludes that the proposed model presents the potential to enhance the accuracy of clinical decision support systems by generating supplementary training data. Furthermore, the proposed approach can be extended to diverse healthcare applications, including disease diagnosis and drug discovery.

Huang et al. ( 44 ) proposed a deep learning-based model, DU-GAN, for low-dose computed tomography (CT) denoising in the medical imaging field. The architecture of DU-GAN incorporates dual-domain U-Net-based discriminators within a GAN, aiming to enhance denoising performance and generate high-quality CT images. The proposed approach adopts a dual-domain architecture, effectively utilizing both the image domain and the transform domain to differentiate real images from generated ones. DU-GAN is trained on a substantial dataset of CT images to learn the noise distribution and remove noise from low-dose CT images. The results indicate that the DU-GAN model surpasses existing methods in terms of both quantitative and qualitative evaluation metrics. Furthermore, the proposed model exhibits robustness across various noise levels and different types of image data. The study showed the potential of the proposed approach for practical application in the clinical diagnosis and treatment of various medical conditions.

Purandhar et al. ( 45 ) proposed the use of Generative Adversarial Networks (GANs) for classifying clustered health care data. The GAN classifier in this study contains both a discriminator network and a generator network: the generator learns the underlying data distribution, while the discriminator tells genuine samples from false ones. The authors used the MIMIC-III dataset of Electronic Health Records (EHRs) in their research. The outcomes show that the GAN classifier accurately and successfully categorizes patients' medical problems. The authors also demonstrated the superiority of their GAN classifier by contrasting it with conventional machine learning techniques. The suggested GAN-based strategy shows promise for the early detection and diagnosis of illness, with potential for improving healthcare outcomes and lowering costs. Table 4 discusses the GAN methods used in medical image analysis.


Table 4 . The methods, properties, and features of GAN-medical image analysis mechanisms.

5.3. Recurrent neural network techniques for medical image analysis

Recurrent Neural Networks (RNNs) are essential in medical image analysis using deep learning algorithms due to their ability to capture temporal dependencies and contextual information. RNNs excel in tasks involving sequential or time-series data, such as analyzing medical image sequences or dynamic imaging modalities. Their capability to model long-term dependencies and utilize information from previous time steps enables the detection of patterns, disease progression prediction, and tracking tumor growth. RNN variants like LSTM and GRU further enhance their ability to capture complex temporal dynamics, making them vital in extracting meaningful insights from medical image sequences.
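The following minimal PyTorch sketch shows the kind of sequence model this section describes: a bidirectional GRU reads a series of per-frame feature vectors (for instance, features extracted from an image sequence) and classifies the whole sequence. The feature dimension, hidden size, and binary target are assumptions for illustration.

```python
# A minimal bidirectional recurrent classifier: the GRU aggregates forward
# and backward temporal context before a linear head makes the prediction.
import torch
import torch.nn as nn

class SequenceClassifier(nn.Module):
    def __init__(self, feat_dim: int = 32, hidden: int = 64, num_classes: int = 2):
        super().__init__()
        # bidirectional=True supplies both the forward and backward context
        # that this section attributes to bidirectional RNNs.
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.rnn(x)           # (B, T, 2 * hidden)
        return self.head(out[:, -1])   # classify from the final time step

model = SequenceClassifier()
frames = torch.randn(8, 20, 32)        # 8 sequences of 20 feature vectors
print(model(frames).shape)             # torch.Size([8, 2])
```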

Sridhar et al. ( 46 ) proposed a novel approach for reducing the size of medical images while preserving their diagnostic quality. The authors introduced a two-stage framework that combines a Recurrent Neural Network (RNN) and a Genetic Particle Swarm Optimization with Weighted Vector Quantization (GenPSOWVQ). In the first stage, the RNN is employed to learn the spatial and contextual dependencies within the images, capturing important features for preserving diagnostic information. In the second stage, the GenPSOWVQ algorithm optimized the image compression process by selecting the best encoding parameters. The experimental results demonstrated the effectiveness of the proposed model in achieving significant image size reduction while maintaining high diagnostic accuracy. The combination of RNN and GenPSOWVQ enabled an efficient and reliable approach for medical image compression, which can have practical implications in storage, transmission, and analysis of large-scale medical image datasets.

Pham et al. ( 47 ) discussed the use of DL to predict healthcare trajectories from medical records. The authors argued that deep learning can model the complex relationships between different medical conditions and predict how a patient's healthcare journey might evolve over time. The study used data from electronic medical records of patients with various conditions, including diabetes, hypertension, and heart disease. The proposed DL model used CNNs and RNNs to capture both the temporal and spatial relationships in the data, and it was able to forecast patients' future healthcare paths with a notable level of precision. The authors concluded that deep learning has the potential to transform healthcare delivery through more accurate predictions and personalized care. Nevertheless, they acknowledged that the integration of deep learning in healthcare is still at an early phase, necessitating further investigation to fully unleash its potential.

Wang et al. ( 48 ) proposed a new approach for dynamic treatment recommendation using supervised reinforcement learning with RNNs. The authors aimed to address the challenge of making treatment decisions for patients with complex and dynamic health conditions by developing an algorithm that can adapt to changes in patient health over time. The proposed approach involved using an RNN to model patient health trajectories and predict the optimal treatment at each step. The training of the model involves a blend of supervised and reinforcement learning techniques, aimed at optimizing treatment decisions for long-term health benefits. The authors assessed the effectiveness of this approach using a dataset comprising actual patients with hypertension and demonstrated its superiority over conventional machine learning methods in terms of predictive accuracy. The suggested method holds promise in enhancing patient outcomes by offering personalized treatment recommendations that can adapt to variations in the patient’s health status.

Jagannatha and Yu ( 49 ) discusses the use of bidirectional recurrent neural networks (RNNs) for medical event detection in electronic health records (EHRs). Electronic Health Records (EHRs) offer valuable insights for medical research, yet analyzing them can be arduous due to the intricate nature and fluctuations in the data. To address this, the authors introduce a bidirectional RNN model capable of capturing the interdependencies in the sequential data of EHRs, encompassing both forward and backward relations. Through training on an EHR dataset and subsequent evaluation, the model’s proficiency in detecting medical events is assessed. The findings reveal that the bidirectional RNN surpasses conventional machine learning methods in terms of medical event detection. The authors also compare different variations of the model, such as using different types of RNNs and adding additional features to the input. Overall, the study demonstrates the potential of using bidirectional RNNs for medical event detection in EHRs, which could have important implications for improving healthcare outcomes and reducing costs.

Cocos et al. ( 50 ) focused on developing a deep learning model for pharmacovigilance to identify adverse drug reactions (ADRs) mentioned on social media platforms such as Twitter. In the study, two distinct RNN architectures, namely Bidirectional Long Short-Term Memory (Bi-LSTM) and Gated Recurrent Unit (GRU), were trained to classify ADRs. Various feature extraction methods were also examined, and their individual performances were discussed. The outcomes showed that the Bi-LSTM model performed better than the GRU model, obtaining an F1-score of 0.86. A comparison of the deep learning models with conventional machine learning models was also conducted, confirming the higher performance of the deep learning models. The study pointed to the potential of utilizing social media platforms for pharmacovigilance and underlined the efficiency of deep learning models in precisely detecting ADRs. Table 5 discusses the RNN methods used in medical image analysis.


Table 5 . The methods, properties, and features of RNN-medical image analysis mechanisms.

5.4. Long short-term memory techniques for medical image analysis

The importance of Long Short-Term Memory (LSTM) method in medical image analysis using deep learning algorithms lies in its ability to capture and model sequential dependencies within the image data. Medical images often contain complex spatial and temporal patterns that require understanding of contextual information. LSTM, as a type of recurrent neural network (RNN), excels in modeling long-range dependencies and capturing temporal dynamics, making it suitable for tasks such as time series analysis, disease progression modeling, and image sequence analysis. By leveraging the memory and gating mechanisms of LSTM, it can effectively learn and retain relevant information over time, enabling more accurate and robust analysis of medical image data and contributing to improved diagnostic accuracy and personalized treatment in healthcare applications.
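As a concrete sketch of the gated memory mechanism described above, the following minimal PyTorch example classifies a multichannel physiological time series (such as the ECG/EDA/PPG signals several reviewed systems use) with a two-layer LSTM. The channel count, sequence length, and binary target are illustrative assumptions.

```python
# A minimal LSTM classifier: gated memory cells carry long-range context
# through the sequence, and the final hidden state drives the prediction.
import torch
import torch.nn as nn

class VitalsLSTM(nn.Module):
    def __init__(self, channels: int = 3, hidden: int = 64, num_classes: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(channels, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, (h_n, _) = self.lstm(x)     # h_n: (num_layers, B, hidden)
        return self.head(h_n[-1])      # use the top layer's final hidden state

model = VitalsLSTM()
signals = torch.randn(16, 500, 3)      # 16 recordings, 500 steps, 3 channels
print(model(signals).shape)            # torch.Size([16, 2])
```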

Butt et al. ( 51 ) presented a machine learning-based approach for diabetes classification and prediction. They used a dataset of 768 patients and 8 clinical features, including age, BMI, blood pressure, and glucose levels. Three different machine learning techniques, namely logistic regression, decision tree, and k-nearest neighbors, were applied to the preprocessed data with the goal of sorting patients into diabetic and non-diabetic categories. Metrics including accuracy, precision, recall, and F1 score were used to evaluate the effectiveness of each method. In addition, a deep learning system, namely a feedforward neural network, was used to forecast the patients' blood glucose levels. A comparison between the deep learning algorithm and the traditional machine learning algorithms revealed that the deep learning algorithm surpassed the others in prediction accuracy. The authors concluded that their approach can be used for early diagnosis and management of diabetes in healthcare applications.

Awais et al. ( 52 ) proposed an Internet of Things (IoT) framework that utilizes Long Short-Term Memory (LSTM) based emotion detection for healthcare and distance learning during COVID-19. The proposed framework offers the ability to discern individuals’ emotions by leveraging physiological signals such as electrocardiogram (ECG), electrodermal activity (EDA), and photoplethysmogram (PPG). Collected data undergoes preprocessing and feature extraction prior to training an LSTM model. To assess its effectiveness, the framework is tested using the PhysioNet emotion database, where the results demonstrate its accurate emotion detection capabilities, reaching an accuracy level of up to 94.5%. With its potential applications in healthcare and distance learning amid the COVID-19 pandemic, the framework proves invaluable for remotely monitoring individuals’ emotional states and providing necessary support and interventions. The paper highlighted the importance of using IoT and machine learning in healthcare, and how it can help to address some of the challenges posed by the pandemic.

Nancy et al. ( 53 ) proposed an IoT-Cloud-based smart healthcare monitoring system for heart disease prediction using deep learning. The system uses wearable sensors to gather physiological signals from patients and delivers those signals to a cloud server for analysis. A Convolutional Neural Network (CNN)-based deep learning model, trained on a sizable dataset of ECG signals, is used to predict cardiac illness, and transfer learning techniques, especially fine-tuning, are used to optimize the model. The system was rigorously tested on a real-world dataset and achieved exceptional accuracy in forecasting cardiac illness. Additionally, the model exhibits the capability to detect the early onset of heart disease, facilitating timely intervention and treatment. The paper concluded that the proposed system can be an effective tool for real-time heart disease monitoring and prediction, which can help improve patient outcomes and reduce healthcare costs.

Queralta et al. ( 54 ) presents an Edge-AI solution for fall detection in health monitoring using LoRa communication technology, fog computing, and LSTM recurrent neural networks. The proposed system consists of a wearable device, a LoRa gateway, and an edge server that processes and analyzes sensor data locally, reducing the dependence on cloud services and improving real-time fall detection. The system employs a MobileNetV2 convolutional neural network to extract features from accelerometer and gyroscope data, followed by an LSTM network that predicts falls. The authors evaluated the performance of the proposed system using a dataset collected from volunteers and achieved a sensitivity of 93.14% and a specificity of 98.9%. They also compared the proposed system with a cloud-based solution, showing that the proposed system had lower latency and reduced data transmission requirements. Overall, the proposed Edge-AI system can provide a low-cost and efficient solution for fall detection in health monitoring applications.

Gao et al. ( 55 ) introduced a novel approach called Fully Convolutional Structured LSTM Networks (FCSLNs) for joint 4D medical image segmentation. The proposed approach utilized the strengths of fully convolutional networks and structured LSTM networks to overcome the complexities arising from spatial and temporal dependencies in 4D medical image data. By integrating LSTM units into the convolutional layers, the FCSLNs successfully capture temporal information and propagate it throughout the spatial dimensions. Empirical findings strongly indicate the outstanding performance of the FCSLNs when compared to existing methods, achieving precise and resilient segmentation of 4D medical images. The proposed framework demonstrates significant promise in advancing medical image analysis tasks and enhancing clinical decision-making processes. Table 6 discusses the LSTM methods used in medical image analysis.


Table 6 . The methods, properties, and features of LSTM-medical image analysis mechanisms.

5.5. Hybrid techniques for medical image analysis

Hybrid methods in medical image analysis, which combine deep learning algorithms with other techniques or data modalities, are of significant importance. Deep learning has demonstrated remarkable success in tasks like image segmentation and classification, but it may face challenges such as limited training data or interpretability issues. By incorporating hybrid methods, researchers can overcome these limitations and achieve enhanced performance. Hybrid approaches can integrate traditional machine learning techniques, statistical models, or domain-specific knowledge to address data scarcity or improve interpretability. Additionally, combining multiple data modalities, such as medical images with textual reports or physiological signals, enables a more comprehensive understanding of the medical condition and facilitates better decision-making. Ultimately, hybrid methods in medical image analysis empower healthcare professionals with more accurate and reliable tools for diagnosis, treatment planning, and patient care. In this regard, Shahzadi et al. ( 56 ) proposed a novel cascaded framework for accurately classifying brain tumors using a combination of convolutional neural networks (CNNs) and long short-term memory (LSTM) networks. The proposed approach utilized the CNN's capability to extract significant features from brain tumor images and the LSTM's capacity to capture temporal dependencies present in the data. The cascaded framework comprised two stages: firstly, a CNN was utilized to extract features from the tumor images, and subsequently, an LSTM network was employed to model the temporal information within these extracted features. The experimental findings clearly illustrate the exceptional performance of the CNN-LSTM framework when compared to other cutting-edge methods, exhibiting remarkable accuracy in the classification of brain tumors. The proposed method held promise for improving the diagnosis and treatment planning of brain tumors, ultimately benefiting patients and healthcare professionals in the field of neuro-oncology.
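The CNN-to-LSTM cascade that recurs throughout this section can be sketched as follows: a small CNN encodes each image in a sequence into a feature vector, and an LSTM models the order of those vectors before classification. All sizes here are illustrative assumptions rather than the reviewed papers' exact models, which typically use pretrained CNN backbones.

```python
# A minimal hybrid CNN-LSTM sketch: per-image CNN features are fed, in
# order, to an LSTM so the model uses both spatial and temporal information.
import torch
import torch.nn as nn

class CNNLSTMClassifier(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.encoder = nn.Sequential(               # per-image feature extractor
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # -> 16-dim vector per image
        )
        self.lstm = nn.LSTM(16, 32, batch_first=True)
        self.head = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, c, h, w = x.shape
        feats = self.encoder(x.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, (h_n, _) = self.lstm(feats)               # temporal modeling of features
        return self.head(h_n[-1])

model = CNNLSTMClassifier()
clips = torch.randn(4, 10, 1, 64, 64)  # 4 sequences of 10 grayscale slices
print(model(clips).shape)              # torch.Size([4, 2])
```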

Also, Srikantamurthy et al. ( 57 ) proposed a hybrid approach for accurately classifying benign and malignant subtypes of breast cancer using histopathology imaging. Transfer learning was used to combine the strengths of long short-term memory (LSTM) networks and convolutional neural networks (CNNs) in a synergistic manner. The histopathological images were initially processed by the CNN to extract relevant characteristics, which were then fed into the LSTM network for sequential analysis and classification. By harnessing transfer learning, the model capitalized on CNNs pre-trained on extensive datasets, thereby facilitating efficient representation learning. The proposed hybrid approach showed promising results in accurately distinguishing between benign and malignant breast cancer subtypes, contributing to improved diagnosis and treatment decisions in breast cancer patients.

Besides, Banerjee et al. ( 58 ) presented a hybrid approach combining Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) for the classification of histopathological breast cancer images. Using data augmentation approaches, the classifier’s robustness is increased. ResNet50, InceptionV3, and a CNN that has been pretrained on ImageNet are used to extract deep convolutional features. An LSTM Recurrent Neural Network (RNN) is then fed these features for classification. Comparing the performance of three alternative optimizers, it is found that Adam outperforms the others without leading to model overfitting. The experimental findings showed that, for both binary and multi-class classification problems, the suggested strategy outperforms cutting-edge approaches. Furthermore, the method showed promise for application in the classification of other types of cancer and diseases, making it a versatile and potentially impactful approach.

Moreover, Nandhini Abirami et al. (59) explored the application of deep convolutional neural networks (CNNs) and deep generative adversarial networks (GANs) in computational visual perception-driven image analysis. To increase the precision and resilience of image analysis tasks, the authors proposed a framework that combines the advantages of both CNNs and GANs: the deep GAN generates realistic, high-quality synthetic images, while the deep CNN performs feature extraction and captures high-level visual representations. The combination of these two deep learning models enabled more efficient image analysis, especially for tasks such as object detection, image recognition, and image synthesis. Experimental results demonstrated the superiority of the proposed framework over traditional approaches, highlighting the potential of combining deep CNNs and GANs for advanced computational visual perception in image analysis.
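The following is a deliberately minimal, fully connected GAN training step in PyTorch, intended only to illustrate the adversarial setup described above; real medical-image GANs use convolutional generators and discriminators, and the layer sizes and the random "real" batch here are placeholders.

```python
import torch
import torch.nn as nn

latent_dim, img_dim = 100, 64 * 64   # 64x64 grayscale patches, flattened

G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, img_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.rand(32, img_dim) * 2 - 1          # stand-in for a real image batch
z = torch.randn(32, latent_dim)

# Discriminator step: push real images toward label 1, generated ones toward 0.
fake = G(z).detach()
loss_d = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
opt_d.zero_grad()
loss_d.backward()
opt_d.step()

# Generator step: try to make the discriminator label generated images as real.
loss_g = bce(D(G(z)), torch.ones(32, 1))
opt_g.zero_grad()
loss_g.backward()
opt_g.step()
```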

Additionally, Yao et al. (60) proposed a parallel-structure deep neural network for breast cancer histology image classification, combining convolutional neural networks (CNNs) and recurrent neural networks (RNNs) with an attention mechanism. The parallel structure enabled the extraction of both local and global features from the histology images, improving the model's ability to capture pertinent information: the CNN component extracted spatial features from image patches, while the RNN component sequentially captured the dependencies between patches. By focusing attention on key visual areas, the attention mechanism improved the model's discriminative capacity. Experimental findings showed that the proposed method outperformed baseline approaches, demonstrating its potential for accurate breast cancer histology image classification. Table 7 discusses the hybrid methods used in medical image analysis.
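As a rough illustration of attention over patch features, the sketch below (with assumed feature dimensions) learns one weight per patch and pools the patch features into a slide-level representation; the returned weights indicate which patches the model attended to.

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Learns a score per patch and returns a weighted sum of patch features."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, patch_feats):                     # (batch, n_patches, feat_dim)
        weights = torch.softmax(self.score(patch_feats), dim=1)
        pooled = (weights * patch_feats).sum(dim=1)     # attention-weighted average
        return pooled, weights                          # weights show which patches mattered

feats = torch.randn(2, 16, 512)                         # 2 slides, 16 patches each
pooled, attn = AttentionPooling()(feats)
print(pooled.shape, attn.shape)                         # (2, 512) (2, 16, 1)
```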


Table 7. The methods, properties, and features of hybrid medical image analysis mechanisms.

6. Results and comparisons

The utilization of DL algorithms in medical image analysis represents a pioneering stride toward the progress of the medical and healthcare industries. This paper presents various innovative applications that demonstrate this paradigm, showcasing advanced knowledge in medical image analysis and motivating readers to explore the categories of DL algorithms applied to it. The primary focus of this work is the different classes of DL techniques utilized in medical image analysis. The analysis shows that most DL methods in this field concentrate on advanced datasets, combined learning tasks, and annotation protocols. A significant limitation, however, is the inadequacy of large training datasets and of standardized data collection: diverse types of data require larger and more varied datasets to yield reliable outcomes. Detection tasks in this field predominantly employ CNNs or CNN-based techniques. In most of the investigated papers, the authors evaluated their methods on several attributes, including accuracy, F-score, AUC, sensitivity, specificity, robustness, recall, adaptability, and flexibility.

Sections 5.1 to 5.5 describe the medical image analysis DL algorithms, where the majority of the proposed methods use both benchmark and real-time data; the DL methods used in these sections are summarized in Figure 8. The systems employed datasets that varied in number and category, with accuracy, computational complexity, sensitivity, specificity, robustness, generalizability, adaptability, scalability, and F-score being the primary evaluation parameters. Accuracy was the main parameter for image analysis-based systems, whereas transparency was the least applied parameter, as depicted in Figure 9. Accuracy's importance lies in its direct impact on patient outcomes and healthcare decision-making: medical image analysis plays a critical role in diagnosing and monitoring diseases, and any errors in the analysis can have serious consequences. High accuracy ensures that deep learning algorithms can reliably detect abnormalities, classify different tissue types, and provide accurate predictions, enabling healthcare professionals to make well-informed decisions about treatment plans, surgical interventions, and disease management. Accurate analysis also reduces misdiagnosis rates, minimizes unnecessary procedures or tests, and improves overall patient care by enabling timely and appropriate interventions. Accuracy therefore acts as a crucial criterion for guaranteeing the efficiency and dependability of deep learning algorithms in medical image processing. The majority of the solutions used data normalization to combine images of comparable size and quality from various sources. Some of the systems, however, did not report computation time, since different datasets were used across studies; the datasets varied in sample size, accessibility requirements, image size, and classes. The RNN method was one of the most frequently employed algorithms, yet cross-validation was seldom applied.
Because it is then uncertain how the test results fluctuate, this can reduce the robustness of the reported outcomes even when a high-performing model is delivered; cross-validation is crucial for evaluating performance over the entire dataset. Multiple studies employ DL-based methodologies, and it remains challenging to establish clear, robust, and resilient models. Future tasks include minimizing false-positive and false-negative rates, for example to reliably distinguish viral from bacterial pneumonia. Applying DL methods to medical image analysis represents a groundbreaking step forward in technological development. As demonstrated in Figure 10, Python is the most common programming language used in this context, for several key reasons. Firstly, Python offers a rich ecosystem of libraries and frameworks specifically tailored for machine learning and deep learning, such as TensorFlow, PyTorch, and Keras, which provide efficient and user-friendly tools for developing and deploying deep learning models. Additionally, Python's simplicity and readability make it accessible to researchers, clinicians, and developers with varying levels of programming expertise, and its extensive community support and vast online resources further contribute to its popularity. Moreover, Python's versatility allows seamless integration with other scientific computing libraries, enabling researchers to preprocess, visualize, and analyze medical image data efficiently. Its wide adoption in academia, industry, and research communities fosters collaboration and knowledge sharing among experts in the field. Overall, Python's powerful capabilities, ease of use, and collaborative ecosystem make it the preferred choice for implementing deep learning algorithms in medical image analysis.

In this domain, diverse methodologies are employed to extract meaningful insights from complex medical imagery. CNNs are extensively utilized for their ability to automatically identify intricate patterns and features within images. RNNs are crucial when dealing with sequential medical image data, such as video sequences or time-series images, as they capture temporal dependencies. GANs play a pivotal role in tasks requiring image generation or translation. Hybrid models, which integrate different architectures such as CNNs and RNNs, offer a versatile approach for handling medical image data that requires both spatial and temporal analysis. These methodologies are implemented within specialized environments, commonly leveraging Python libraries such as TensorFlow, PyTorch, and Keras, and GPU acceleration is often used to expedite model training given the computational intensity of deep learning. Custom simulation environments may also be created to mimic specific aspects of medical imaging processes. The choice of datasets is paramount: researchers may draw from open-access repositories such as ImageNet for pre-training, but specialized medical imaging repositories such as TCIA or RSNA are crucial for healthcare tasks, and custom-collected datasets tailored to specific analysis tasks are often employed to ensure data relevance and quality.
Data augmentation techniques, such as rotation and scaling, are applied to expand datasets and mitigate the limitations associated with data scarcity. These synergistic efforts across methodologies, simulation environments, and datasets are essential for the successful development and evaluation of deep learning algorithms in medical image analysis, enabling accurate and reliable results for a wide array of healthcare applications.
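Given how seldom cross-validation was applied in the surveyed studies, a minimal example may be useful. The sketch below uses scikit-learn's StratifiedKFold with a placeholder feature matrix and a simple logistic-regression classifier standing in for a DL model; in practice the same loop would wrap model training and evaluation.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X = np.random.rand(200, 64)          # stand-in for flattened image features
y = np.random.randint(0, 2, 200)     # stand-in for binary labels

scores = []
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    # Train on k-1 folds, evaluate on the held-out fold.
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))

print(f"5-fold accuracy: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```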


Figure 8. DL methods used in medical image analysis.


Figure 9. The most important parameters considered in the investigated papers.


Figure 10. Programming languages used in learning algorithms for medical image analysis.

6.1. Convolutional neural network

CNNs have been used successfully in medical image processing applications; however, they also have significant drawbacks and difficulties. Due to the high cost and complexity of image acquisition and annotation, it may be challenging to obtain the vast quantity of labeled data needed to train the network in the context of medical imaging. Additionally, the labeling procedure may introduce subjectivity or inter-observer variability, which can influence the accuracy and dependability of CNN models (61). A further issue is the possible bias of CNN models toward the distribution of the training data, which can result in poor generalization performance on new or unseen data. This is particularly relevant in medical imaging, where the patient population may be diverse and heterogeneous and acquisition conditions may vary across imaging modalities and clinical settings. Furthermore, the interpretability of CNN models in medical imaging remains a major concern, as they typically rely on complex and opaque learned features that are difficult to interpret or explain. This limits the ability of clinicians to understand and trust the decisions made by CNN models and may hinder their adoption in clinical practice. Finally, CNN models are computationally intensive and require significant computational resources, which may limit their scalability and practical use in resource-constrained environments or low-resource settings (62).

The CNN method offers several benefits in the context of healthcare applications. Firstly, CNNs can automatically learn relevant features from raw input data such as medical images or physiological signals, without requiring manual feature extraction. This makes them highly effective for tasks such as image classification, object detection, and segmentation, and can lead to more accurate and efficient analyses. Secondly, CNNs can handle large amounts of complex data and improve classification accuracy, making them well suited for medical diagnosis and prediction (63). Additionally, CNNs can be trained on large datasets, which helps in detecting rare or complex patterns in the data that may be difficult for humans to identify. Finally, the use of deep learning algorithms such as CNNs in healthcare applications has the potential to improve patient outcomes, enable early disease detection, and reduce medical costs.

6.2. Recurrent neural network

Recurrent neural networks (RNNs) have shown great success in modeling sequential data, such as time series and natural language. In medical image analysis, however, there are several challenges and limitations when using RNNs. RNNs are designed to model temporal sequences and have no natural way of handling spatial information in images, which can limit their ability to capture local patterns and spatial relationships between pixels in medical images. RNNs also require substantial computational power to train, especially on large medical image datasets (64), which can make it difficult to train highly accurate models. When training deep RNNs, gradients can vanish or explode, making it difficult to optimize the model parameters effectively; this can lead to longer training times and lower accuracy. RNNs are prone to overfitting when the training dataset is small, resulting in poor generalization to new, unseen data. Finally, medical image datasets may be highly unbalanced, with few positive cases compared to negative ones, which makes it difficult to train an RNN that classifies the data accurately. Researchers have created a variety of RNN-based designs, including long short-term memory (LSTM) networks and gated recurrent units (GRUs), which have demonstrated promising performance in medical image interpretation. Additionally, combining RNNs with other deep learning techniques such as CNNs can improve performance by capturing both spatial and temporal features (65).

The surveyed papers may have faced several challenges when using the RNN method. RNNs can suffer from vanishing gradients, where the gradients used for optimization become very small and make learning slow or even impossible; this is especially problematic for long sequences. Overfitting is another problem: the model becomes too complicated and begins to memorize the training set rather than generalizing to new data, which is particularly difficult when working with limited data, as is common in healthcare applications. RNNs can also be computationally demanding to train, especially on large volumes of data (66), which is a challenge for IoT devices with limited computational resources. Finally, there are many types of RNNs and architectures to choose from, each with its own strengths and weaknesses, and selecting the right architecture for a given task can be difficult. Overall, while RNNs are powerful tools for analyzing time-series data, these potential challenges must be carefully considered when using them.
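One standard remedy for the exploding-gradient side of this problem is gradient-norm clipping. The PyTorch sketch below, with illustrative tensor shapes, clips the global gradient norm of an LSTM classifier before each optimizer step.

```python
import torch
import torch.nn as nn

rnn = nn.LSTM(input_size=32, hidden_size=64, num_layers=2, batch_first=True)
head = nn.Linear(64, 2)
params = list(rnn.parameters()) + list(head.parameters())
opt = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 100, 32)            # 8 sequences of 100 time steps
y = torch.randint(0, 2, (8,))          # binary labels

out, _ = rnn(x)
loss = loss_fn(head(out[:, -1]), y)    # classify from the last time step
opt.zero_grad()
loss.backward()
# Rescale gradients so long sequences cannot blow up a single update.
torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
opt.step()
```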

6.3. Generative adversarial network

Generative adversarial networks (GANs) have shown promising results in various fields, including medical image analysis. However, GANs also face challenges and limitations that can affect their performance. Medical image datasets are often limited due to the cost and difficulty of acquiring large amounts of high-quality data, and GANs need substantial data to learn the underlying distribution correctly; when working with small medical image datasets, their performance may therefore be constrained (67). Medical image datasets may also be imbalanced, with some classes or diseases underrepresented, and GANs can find it difficult to generate realistic examples for underrepresented classes or conditions. Mode collapse occurs when a GAN's generator learns to produce only a small number of samples regardless of the input; in medical image processing, mode collapse can lead to unrealistic images or the loss of crucial information. Overfitting is a further problem, occurring when the model memorizes the training data rather than generalizing to new data. Finally, there is currently no standard for evaluating GANs in medical image analysis, which makes it challenging to compare different GAN models and assess their performance accurately. Addressing these challenges requires careful consideration of the specific analysis task, the available data, and the design of the GAN model. Moreover, a multidisciplinary approach involving clinicians, radiologists, and computer scientists is necessary to ensure that the GAN's outputs are meaningful and clinically relevant (68).

6.4. Long short-term memory

Long short-term memory (LSTM) networks are a type of recurrent neural network that has shown promising results in various applications, including medical image analysis. However, LSTMs also face challenges and limitations that can affect their performance. LSTMs rely on a fixed-length input sequence, and the context provided by that sequence may be limited; in a sequence of medical images, for example, it may be challenging to capture the full context within a fixed-length input. LSTMs can be prone to overfitting, especially with small datasets: the model starts to memorize the training data instead of generalizing, leading to poor performance on new medical images (69). LSTMs are often referred to as "black box" models because it can be difficult to understand how they generate their predictions, a limitation in medical image analysis, where clinicians need to understand how a model arrived at its decision. LSTMs can also be computationally expensive, especially with long input sequences or large medical image datasets, which makes it challenging to train them on standard hardware within a reasonable time frame. Medical image datasets can be imbalanced, with certain classes or conditions underrepresented, and LSTMs may struggle to learn the patterns of those classes. Finally, LSTMs may generalize poorly to new datasets or different medical conditions, especially when trained on a single dataset or condition. Addressing these challenges requires careful consideration of the specific analysis task, the available data, and the design of the LSTM model, along with a multidisciplinary approach involving clinicians, radiologists, and computer scientists to ensure that the model's outputs are meaningful and clinically relevant. Additionally, techniques such as data augmentation, transfer learning, and model compression can be used to improve the performance of LSTMs in medical image analysis (70).

6.5. Hybrid

The reason for using hybrid methods, such as combining CNN and LSTM, is that they have complementary strengths and weaknesses. CNN is particularly good at extracting spatial features from high-dimensional data such as images, while LSTM is good at modeling temporal dependencies in sequences of data. By combining them, one can leverage the strengths of both to improve the accuracy of the prediction. Additionally, hybrid methods can be used to address challenges such as overfitting, where the model may become too specialized on the training data, and underfitting, where the model may not capture the underlying patterns in the data (71). Hybrid models can also provide a more robust approach to dealing with noisy or missing data by allowing for more complex interactions between features and time.

The use of hybrid approaches, like CNN-LSTM, in medical image analysis presents several challenges and limitations. Firstly, the complexity of the network architecture poses a significant hurdle in training these models: integrating different models with diverse parameters, loss functions, and optimization algorithms can lead to suboptimal performance, potentially causing overfitting or underfitting and thereby harming accuracy and generalizability (72). Secondly, a major challenge lies in obtaining enough data to train hybrid models effectively; medical image data is often scarce and costly to acquire, restricting the capacity to train deep learning models comprehensively (73). Furthermore, the high variability and subjectivity of medical image data can compromise training data quality and model performance. Moreover, interpreting the results generated by hybrid models can be problematic: their complexity may obscure how they arrive at predictions or classifications, limiting their practicality in clinical practice and possibly raising doubts or skepticism among medical professionals. Lastly, the computational cost of training and deploying hybrid models can be prohibitive (74); these models demand powerful hardware and are computationally intensive, limiting their applicability in real-world medical settings. On the other hand, hybrid approaches can leverage the capabilities of both constituent models to enhance the accuracy and performance of the entire system (75).

The utilization of hybrid methods such as CNN-LSTM offers various advantages, amalgamating both models' strengths to enhance the overall system's accuracy and performance. For instance, in COVID-19 prediction with a CNN-LSTM model, the CNN layer extracts spatial characteristics from the data while the LSTM layer captures temporal relationships and makes predictions from the time-series data. Similarly, in EEG detection with a low-invasive, low-cost BCI headband, the CNN layer extracts spatial information from the EEG data and the LSTM layer captures temporal relationships and classifies the signals (76). In reconstructing an ECG signal from a Doppler sensor, the hybrid model uses the CNN layer to extract high-level features from the Doppler signal and the LSTM layer uses the extracted features to reconstruct the ECG signal. In summary, hybrid models can yield superior performance and accuracy compared to either model individually, combining spatial and temporal information to enhance applications such as COVID-19 prediction, EEG detection, and ECG signal reconstruction.

6.6. Prevalent evaluation criteria

Due to their capacity to increase the precision and efficacy of medical diagnosis and therapy, deep learning algorithms for medical image analysis have grown in popularity in recent years. Several evaluation criteria are prevalent for assessing their performance, as described below (13).

6.6.1. Accuracy

Accuracy is the most commonly used metric for evaluating the performance of deep learning algorithms in medical image analysis. It measures the percentage of correctly classified images or regions of interest (ROIs) in medical images.

6.6.2. Sensitivity and specificity

Sensitivity measures the proportion of true positive results, which are the number of positive cases that are correctly identified by the algorithm. Specificity measures the proportion of true negative results, which are the number of negative cases that are correctly identified by the algorithm. Both metrics are used to evaluate the diagnostic performance of deep learning algorithms in medical image analysis.

6.6.3. Precision and recall

Precision measures the proportion of true positive results among all the positive cases identified by the algorithm. Recall measures the proportion of true positive results among all the positive cases in the ground truth data. Both metrics are used to evaluate the performance of deep learning algorithms in medical image analysis, particularly in binary classification tasks.

6.6.4. F1-score

The F1-score is a metric that combines precision and recall into a single score. It is often used to evaluate the performance of deep learning algorithms in medical image analysis, particularly in binary classification tasks.

6.6.5. Hausdorff distance

The Hausdorff distance is a metric that measures the maximum distance between the boundaries of two sets of ROIs in medical images. It is often used to evaluate the segmentation accuracy of deep learning algorithms in medical image analysis.

In general, the specific task and setting of the medical image analysis determine the selection of assessment criteria. To evaluate the outcomes of deep learning algorithms in the context of clinical practice, it is crucial to choose assessment criteria that are relevant to the clinical demands.
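As a practical illustration of these criteria, the snippet below computes accuracy, sensitivity, specificity, precision, recall, and F1-score from a toy set of binary predictions with scikit-learn, and a symmetric Hausdorff distance between two small boundary point sets with SciPy; the labels and points are made up for demonstration.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)
from scipy.spatial.distance import directed_hausdorff

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])      # ground-truth labels
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])      # model predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("accuracy   :", accuracy_score(y_true, y_pred))
print("sensitivity:", tp / (tp + fn))             # recall of the positive class
print("specificity:", tn / (tn + fp))
print("precision  :", precision_score(y_true, y_pred))
print("recall     :", recall_score(y_true, y_pred))
print("F1-score   :", f1_score(y_true, y_pred))

# Symmetric Hausdorff distance between two segmentation boundaries (point sets).
a = np.array([[0, 0], [0, 1], [1, 0]])
b = np.array([[0, 0], [0, 2], [2, 0]])
hd = max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])
print("Hausdorff distance:", hd)
```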

6.7. Challenges of the DL applications in medical image analysis

The lack of high-quality annotated data is one of the greatest problems for deep learning (DL) algorithms in medical image analysis. For DL models to perform well and generalize, they need large amounts of labeled data, but obtaining high-quality annotations for medical images is challenging for several reasons. Restricted accessibility: because it is expensive and time-consuming to capture and annotate medical images, the amount of annotated image data is constrained (76); moreover, annotation calls for medical professionals with specialized training and expertise, who are not always available. Complexity and variability: due to variation in patient anatomy, imaging modality, and disease pathology, medical images are complicated and highly varied, and annotating them requires a degree of accuracy and consistency that can be hard to achieve for complex, heterogeneous conditions. Privacy and ethical issues: the annotation process can expose medical images containing sensitive patient data to misuse or unauthorized access, and protecting patient privacy and confidentiality while preserving the caliber of annotated data is a significant difficulty. Subjectivity: annotating medical images involves subjective judgments, which can introduce bias and variability into the annotations; these factors affect the effectiveness and generalizability of DL models, especially when annotations are inconsistent across datasets or annotators (77). To address the challenge of limited availability of high-quality annotated data, several approaches have been proposed, including:

• Transfer learning: To enhance the performance of DL models on smaller datasets, transfer learning uses pre-trained models that were trained on large datasets. This method can decrease the volume of annotated data needed to train DL models and increase their generalizability.

• Data augmentation: Data augmentation creates synthetic data by applying modifications to existing annotated data. This method increases the diversity and quantity of annotated data available for DL model training and can make models more robust to variations in medical images (a minimal sketch follows this list).

• Active learning: Active learning involves selecting the most informative and uncertain samples for annotation, rather than annotating all the data. This approach can reduce the annotation workload and improve the efficiency of DL model training.

• Collaborative annotation: Collaborative annotation involves engaging medical experts, patients, and other stakeholders in the annotation process to ensure the accuracy, consistency, and relevance of annotations to clinical needs and values.
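A minimal torchvision augmentation pipeline is sketched below; the specific transforms and their parameters are illustrative choices, and for medical images each transform (a horizontal flip, for example) should be checked for anatomical plausibility.

```python
from torchvision import transforms
from PIL import Image

# Each epoch sees a slightly different version of every image, which
# expands the effective dataset size without new annotations.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),        # verify this is anatomically valid
    transforms.ColorJitter(brightness=0.1, contrast=0.1),
    transforms.ToTensor(),
])

img = Image.new("RGB", (256, 256))     # stand-in for a loaded medical image
augmented = augment(img)               # a new random variant on every call
print(augmented.shape)                 # torch.Size([3, 224, 224])
```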

Overall, addressing the challenge of limited availability of high-quality annotated data in medical image analysis requires a combination of technical, ethical, and social solutions that can improve the quality, quantity, and diversity of annotated data while ensuring patient privacy and ethical standards.

Data quality is another significant problem for deep learning algorithms in medical image analysis. The caliber of the data used to train the algorithms can considerably affect model performance (78). Obtaining medical images can be difficult, and their quality can vary with several variables, including the acquisition equipment, image resolution, noise, artifacts, and the imaging technique. Furthermore, the annotations or labels used for training also affect data quality: annotations are not always accurate and may suffer from inter- and intra-observer variability, which can lead to biased models or models with poor generalization performance. To overcome this challenge, researchers need to establish robust quality-control procedures for both image acquisition and annotation, develop algorithms that can handle noisy or low-quality data and improve the accuracy of annotations, and develop methods to evaluate the quality of the data used to train the deep learning models (79).

Interpretability poses a significant challenge in medical image analysis with deep learning, primarily due to the black-box nature of these models, which makes it hard to comprehend the reasoning behind their predictions. This lack of interpretability hinders clinical acceptance, as healthcare professionals need to understand and trust a model's decision-making process to use it effectively. Interpretability also plays a vital role in identifying and mitigating biases within the data and model, ensuring that decisions are not influenced by irrelevant or discriminatory features. Various approaches have been developed to enhance the interpretability of deep learning models for medical image analysis (80), including visualization techniques, saliency maps, and model explanations. Nonetheless, complete interpretability remains elusive, as it requires a trade-off with model performance, and striking the right balance is an ongoing endeavor.

Transferability refers to the ability of a deep learning model trained on one dataset to generalize and perform well on new datasets with different characteristics. In medical image analysis, transferability is a significant challenge due to the diversity of imaging data, such as variations in image quality, imaging protocols, and imaging modalities. Models trained on a specific dataset may not perform well on datasets with different quality and imaging characteristics, which is problematic because it is often not feasible to train a new model for every new dataset. To address this challenge, researchers have explored techniques such as transfer learning and domain adaptation. Transfer learning uses a model pre-trained on a different but related dataset to initialize the weights for the new dataset, which can improve performance and reduce the amount of training required. Domain adaptation modifies the model to account for differences between the source and target domains, such as differences in imaging protocols or modalities (81). Nevertheless, transferability remains a significant issue in medical image analysis, and research into more robust and transferable deep learning models for this application is ongoing.

In deep learning-based medical image analysis, overfitting is a frequent problem in which a model becomes overly complicated and fits the training data too closely, leading to poor generalization to new, unseen data. Numerous factors, including noise in the training data, an unbalanced class distribution, or a lack of training data, can lead to overfitting (64); the latter is prevalent in medical imaging, where dataset size is constrained by the scarcity of annotated data. Overfitting can produce erroneous positive or negative findings because it yields high accuracy on training data but poor performance on validation or test data. Several strategies can mitigate overfitting, including regularization, early stopping, and data augmentation; ensuring data quality and increasing the size of the dataset are also essential.
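A generic early-stopping loop is sketched below; `train_step` and `val_loss_fn` are assumed callbacks supplied by the caller, and the patience value is an arbitrary example.

```python
import copy
import torch

def train_with_early_stopping(model, train_step, val_loss_fn,
                              patience=5, max_epochs=100):
    """Stop when validation loss has not improved for `patience` epochs."""
    best_loss = float("inf")
    best_state = None
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_step()                    # one pass over the training data
        val_loss = val_loss_fn()        # loss on held-out validation data
        if val_loss < best_loss:
            best_loss = val_loss
            best_state = copy.deepcopy(model.state_dict())
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                   # further training would likely overfit
    model.load_state_dict(best_state)  # restore the best checkpoint
    return model
```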

Clinical adoption refers to the process of integrating new technologies or methodologies into clinical practice. In the context of medical image analysis using deep learning, clinical adoption is a challenge because it requires a significant change in how physicians and healthcare providers diagnose and treat patients (82). It involves not only technical considerations, such as integrating the algorithms into existing systems and workflows, but also ethical, legal, and regulatory considerations, as well as training healthcare providers to use the new technology effectively and safely. A key challenge is ensuring that the algorithms are accurate and reliable enough for clinical decision-making, which requires rigorous validation and testing as well as addressing concerns about the interpretability and generalizability of the results. Healthcare providers and patients may also have concerns about the use of these algorithms in medical decisions, particularly if the algorithms are seen as replacing or minimizing the role of the human clinician. Another challenge is the need for regulatory approval, particularly where the algorithms support diagnosis or treatment decisions; regulatory bodies such as the FDA may require clinical trials to demonstrate safety and effectiveness before clinical use, and this process can be time-consuming and expensive, slowing adoption. Overall, clinical adoption is an important consideration in the development and deployment of deep learning-based medical image analysis, as it determines the ultimate impact of these technologies on patient care (83).

6.8. Dataset in medical image analysis using ML algorithms

In medical image analysis, a dataset is a collection of medical images used to train machine learning algorithms to detect and classify abnormalities or diseases. Datasets can be obtained from various sources, such as clinical trials, imaging studies, or public repositories (84). Data quality and dataset size have a significant impact on how well a machine learning algorithm performs; a dataset should therefore be diverse and representative of the population under study to ensure the accuracy and generalizability of the algorithm. In addition, datasets may require pre-processing, such as normalization or augmentation, to address issues such as class imbalance, low contrast, or artifacts. Finding and using large, carefully curated medical image databases remains a fundamental issue in the field, but efforts are underway to improve the quality and availability of medical image datasets and thereby advance the development of ML algorithms for medical diagnosis and treatment. Producing a dataset typically involves obtaining and annotating medical images from a variety of sources, including hospitals, clinics, and research organizations (85). The images must be labeled to specify the areas of interest or characteristics the ML model needs to learn; these labels may describe the disease shown in the image, the anatomy of the region being imaged, or other pertinent facts. Once established, the dataset is split into a training set, used to train the ML model, and a test set, used to evaluate it. Ongoing research in the field aims to improve dataset quality and size and to develop better methods for acquiring and labeling medical images (74, 86).
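A common way to form the training and test sets while respecting class proportions is a stratified split; the scikit-learn sketch below uses placeholder features and labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(500, 128)          # stand-in for extracted image features
y = np.random.randint(0, 3, 500)      # stand-in for three diagnosis labels

# Stratify so each class keeps the same proportion in both splits --
# important when some diseases are rare in the dataset.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
print(len(X_train), len(X_test))      # 400 100
```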

6.9. Security issues, challenges, risks, IoT and blockchain usage

Medical image analysis using deep learning algorithms raises several security issues, particularly with regard to patient privacy and data protection. The medical images used for training deep learning models may contain sensitive information, such as personally identifiable information (PII), health records, and demographic data, which must be kept confidential and secure. One of the main security issues is the risk of data breaches, which can occur during data collection, storage, and transmission: hackers or unauthorized personnel can intercept data in transit, gain access to storage systems, or exploit vulnerabilities in the software or hardware infrastructure used to process the data (13). To mitigate this risk, various security measures must be put in place, such as encryption, access controls, and monitoring tools (87). Another security issue is the possibility of malicious attacks on the deep learning models themselves. Attackers can attempt to manipulate a model's outputs by feeding it malicious inputs, exploiting vulnerabilities in the model's architecture or implementation, or using adversarial attacks to deceive the model into making wrong predictions. This can have serious consequences for patient diagnosis and treatment, so it is critical to design and implement secure deep learning models. In summary, security is a critical concern in medical image analysis using deep learning, and appropriate security measures are essential to protect the confidentiality, integrity, and availability of both medical data and the models themselves.

There are several risks associated with medical image analysis using deep learning algorithms. Some of the key risks include:

• Inaccuracy: Deep learning algorithms may sometimes provide inaccurate results, which can lead to incorrect diagnoses or treatment decisions.

• Bias: Deep learning algorithms may exhibit bias in their decision-making processes, leading to unfair or inaccurate results for certain groups of patients.

• Privacy concerns: Medical images often contain sensitive information about patients, and there is a risk that this data could be exposed or compromised during the analysis process.

• Cybersecurity risks: As with any technology that is connected to the internet or other networks, there is a risk of cyberattacks that could compromise the security of medical images and patient data.

• Lack of transparency: Deep learning algorithms can be difficult to interpret, and it may be challenging to understand how they arrive at their conclusions. This lack of transparency can make it difficult to trust the results of the analysis.

Overall, it is important to carefully consider these risks and take steps to mitigate them when using deep learning algorithms for medical image analysis. This includes implementing strong cybersecurity measures, ensuring data privacy and confidentiality, and thoroughly validating the accuracy and fairness of the algorithms.

The term "Internet of Things" (IoT) describes how physical "things" are linked to the internet so they can exchange and gather data. In medical image analysis, IoT can be used to link medical imaging devices and enable real-time data collection and analysis. For instance, medical imaging equipment such as CT scanners, MRI machines, and ultrasound devices can be connected over a network and transfer data to a cloud-based system for analysis (88). This can facilitate remote consultations and diagnostics and speed up the examination of medical images. IoT can also make it possible to combine different medical tools and data sources, leading to more thorough and individualized patient treatment. However, the use of IoT in medical image analysis also raises security and privacy concerns, as sensitive patient data is transmitted and stored on networks that can be vulnerable to cyber-attacks.

7. Open issues

There are several open issues related to medical image analysis using deep learning algorithms. These include:

7.1. Data privacy

Data privacy is a major concern in medical image analysis using deep learning algorithms. Medical images contain sensitive patient information that must be kept confidential and secure, and any algorithm or system used for medical image analysis must protect patient data from illegal access, use, or disclosure. This can be particularly difficult because medical image analysis often involves enormous volumes of data, which raises the risk of data breaches or unwanted access. A primary issue is the need to strike a balance between data access and patient privacy protection: many medical image analysis algorithms rely on large datasets to achieve high accuracy and performance, which may require sharing data between multiple parties (89), and this is especially challenging with sensitive patient information given the risk of data leakage or misuse. Several methods can protect data privacy in medical image analysis, including data anonymization, encryption, access restrictions, and rules and processes that guarantee data is accessed and used only for legitimate purposes. Additionally, healthcare organizations must adhere to pertinent data privacy laws, such as HIPAA in the United States or GDPR in the European Union, to guarantee that patient data is safeguarded and handled properly.

7.2. Data bias

Data bias is a serious open problem when employing deep learning to analyze medical images. It refers to systematic flaws in the data used to train the models (90); these flaws may result from the choice of training data, how the data is labeled, and how representative the data is of the population of interest. Data bias can result in models that underperform on particular segments of the population, such as members of underrepresented groups or patients with unusual medical conditions. This has serious implications for the accuracy and fairness of medical image analysis systems, as well as for the potential harm caused to patients if the models are used in clinical decision-making. Addressing data bias requires careful consideration of data sources, data labeling, and model training strategies to ensure that the models are representative and unbiased (91).

7.3. Limited availability of annotated data

Deep learning algorithms in medical image analysis need large amounts of annotated data to be trained properly. Annotated data refers to medical images that have been labeled by experts to indicate the location and type of abnormalities, such as tumors, lesions, or other pathologies. However, obtaining annotated medical image datasets is particularly challenging for several reasons. First, annotating medical images is time-consuming and requires in-depth expertise: only experienced radiologists or clinicians can accurately identify and label abnormalities in medical images, which limits the availability of annotated data. Second, there are privacy concerns associated with medical image data; patient privacy is a critical concern in healthcare, and medical image data is considered particularly sensitive (92). As a result, obtaining large-scale annotated medical image datasets for deep learning is hampered by privacy concerns and the need to comply with regulations such as HIPAA. Third, the diversity of medical image data can pose a challenge: medical images vary widely in modality, acquisition protocol, and image quality, making it difficult to create large, diverse datasets for deep learning. These difficulties can limit the development and validation of deep learning algorithms for medical image analysis. Researchers have tackled the issue by adopting methods such as transfer learning, data augmentation, and semi-supervised learning to decrease the volume of labeled data needed for training (93). However, these techniques may not be sufficient in all cases, and more annotated medical image datasets need to be made available to researchers to advance the field of medical image analysis using deep learning.

7.4. Interpretability and transparency

Interpretability and transparency are crucial concerns when employing deep learning algorithms for medical image analysis. Deep learning models are often referred to as "black boxes" because they can be difficult to interpret, making it hard to comprehend how they reached their judgments. In medical image analysis, interpretability is essential for clinicians to understand and trust the algorithms, as well as to identify potential errors or biases. Interpretability refers to the ability to understand the reasoning behind a model's decision-making process; convolutional neural networks (CNNs), for example, can contain millions of parameters that interact in intricate ways, and this complexity can make it difficult to understand how the model arrived at a particular decision, especially for clinicians without deep learning experience. Transparency refers to the ability to see inside the model and understand how it works (94): the model's decision-making process is clear, understandable, and can be validated and audited, which is essential for ensuring that the model is working correctly and not introducing errors or biases. To increase the interpretability and transparency of deep learning models in medical image analysis, several techniques have been developed. For instance, visualization approaches can produce heatmaps showing which areas of an image the model uses to make judgments; attention mechanisms can highlight important features in an image and explain the model's decision-making process; and other techniques include explainable AI (XAI) methods and the incorporation of domain knowledge into the models. While these techniques have shown promise, more transparent and interpretable deep learning models are still needed in medical image analysis to improve their utility in clinical practice.

7.5. Generalizability

Generalizability is a significant unresolved problem in deep learning-based medical image analysis. It refers to the capacity of a model to function effectively on data that differs from the data it was trained on: a trained model should be able to generalize to other datasets and still perform well. In medical image analysis, generalizability is critical because it ensures that deep learning algorithms can be used on new patient populations or in different clinical settings. However, deep learning models can be prone to overfitting, performing well on the training data but poorly on new data; in medical image analysis, an overfit model can lead to inaccurate or inconsistent diagnoses. Several variables affect generalizability. The diversity of the training dataset has a significant impact on a model's capacity to generalize (95): if the training dataset is not sufficiently varied, the model may fail to identify anomalies it has never seen before. Performance across image types also matters; for example, a model trained on CT scans may not perform well on MRI scans because the image modality and acquisition protocols differ. Researchers are examining methods including transfer learning, data augmentation, and domain adaptation to increase the generalizability of deep learning models in medical image analysis. Transfer learning fine-tunes a pre-trained model on a fresh dataset as a starting point; data augmentation applies transformations such as rotations and translations to artificially expand the size and variety of the training dataset; and domain adaptation modifies a model trained on one dataset to function on another dataset with different properties. While these approaches have shown promise, the generalizability of deep learning models in medical image processing must be further improved to assure their safe and efficient application in clinical practice (96).

7.6. Validation and regulatory approval

Validation and regulatory approval are important open issues in medical image analysis using deep learning algorithms. Validation refers to the process of verifying that a model is accurate and reliable; regulatory approval refers to the process of obtaining approval from regulatory bodies, such as the FDA in the US, before a model can be used in clinical practice. Validation is critical because inaccurate or unreliable models can lead to incorrect diagnoses and treatment decisions. It involves testing the model on a separate dataset that was not used for training and evaluating its performance on a range of metrics, and it can also involve comparing the model's performance to that of human experts. Regulatory approval ensures that models are safe and effective for clinical use: regulatory bodies require evidence of a model's safety, efficacy, and performance before approving it, and this evidence can include clinical trials, real-world data studies, and other forms of validation. Several challenges complicate the validation and approval of deep learning models in medical image analysis. One is the lack of standardized validation protocols, which makes it difficult to compare the performance of different models (97); another is the lack of interpretability and transparency of deep learning models, which makes it difficult to validate their performance and ensure their safety and efficacy. Researchers and regulatory organizations are collaborating on standardized validation processes and criteria for regulatory approval. For instance, the FDA has published guidelines for the creation and approval of medical devices based on artificial intelligence and machine learning (AI/ML), with recommendations for the design and validation of AI/ML-based medical devices, including those used for medical image analysis. While these efforts are promising, further research and collaboration between researchers and regulatory bodies is needed to ensure the safe and effective use of deep learning models in medical image analysis (98).

7.7. Ethical and legal considerations

Deep learning algorithms for medical image analysis raise a number of significant open questions about ethical and legal dilemmas. These concern the use of patient data in research, the possibility of algorithmic bias, and the duty of researchers and healthcare professionals to guarantee the ethical and safe application of these technologies. One ethical issue is the use of patient data in research: large volumes of patient data are needed for medical image analysis, and this use raises questions about patient privacy and consent. Patients' privacy must be maintained, and researchers and healthcare professionals must ensure that patient data is used responsibly (99). Another ethical issue is the potential for bias in algorithms: deep learning models may be trained on skewed datasets, which can bias the models' outputs, and in medical image analysis such biases can result in incorrect diagnosis and treatment choices with catastrophic repercussions. Researchers must therefore take action to address potential biases in their datasets and algorithms. Legally, deep learning for medical image interpretation raises questions about intellectual property, liability, and regulatory compliance, and there are concerns about unwanted access to patient data and the requirement to uphold data protection regulations in order to preserve patient privacy. To address these ethical and legal considerations, researchers and healthcare providers must follow best practices for data privacy and security, obtain informed consent from patients, and work to mitigate potential biases in their algorithms. It is also important to engage with stakeholders, including patients, regulatory bodies, and legal experts, to ensure that the development and use of these technologies is safe, ethical, and compliant with relevant laws and regulations (100).

7.8. Future works

Future research in the fast-developing field of medical image analysis using deep learning algorithms holds great potential to increase the precision and effectiveness of medical diagnosis and therapy. Promising areas include:

7.8.1. Multi-modal image analysis

Multi-modal image analysis is a key direction for future research in medical image analysis using deep learning. Utilizing a variety of imaging modalities, including MRI, CT, PET, ultrasound, and optical imaging, allows for a more thorough understanding of a patient's anatomy and disease (101). This strategy can enhance diagnostic precision and lower the risk of missed or incorrect diagnoses. Deep learning algorithms can be trained on multi-modal image data for a range of tasks, including segmentation, registration, classification, and prediction. For instance, an algorithm built on MRI and PET data could identify brain regions affected by Alzheimer's disease, and a model trained on ultrasound and CT data could identify tumors in the liver. Multi-modal image analysis also poses several challenges: different imaging modalities have different resolution, noise, and contrast characteristics, which can affect algorithm performance, and multi-modal data can be more complex and difficult to interpret than single-modality data, requiring more advanced algorithms and computational resources (102). To address these challenges, researchers are developing new deep learning models that can integrate and analyze data from multiple modalities; for example, multi-modal fusion networks can combine information from different imaging modalities, while attention mechanisms can focus the algorithm on relevant features in each modality. Overall, multi-modal image analysis holds promise for improving the accuracy and efficiency of medical diagnosis and treatment using deep learning, and as these technologies continue to evolve it will be important to ensure that they are used safely, ethically, and in accordance with relevant laws and regulations.
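As a toy illustration of late fusion, the PyTorch sketch below encodes two hypothetical modalities (labeled MRI and PET here purely for illustration) with separate small CNNs and concatenates their features before classification; real fusion networks are far deeper and may fuse information at multiple levels.

```python
import torch
import torch.nn as nn

class LateFusionNet(nn.Module):
    """Two modality-specific encoders whose features are concatenated."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.mri_encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.pet_encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.classifier = nn.Linear(16 + 16, num_classes)

    def forward(self, mri, pet):
        # Fuse by concatenating the per-modality feature vectors.
        fused = torch.cat([self.mri_encoder(mri), self.pet_encoder(pet)], dim=1)
        return self.classifier(fused)

model = LateFusionNet()
logits = model(torch.randn(4, 1, 128, 128), torch.randn(4, 1, 128, 128))
print(logits.shape)                    # torch.Size([4, 2])
```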

7.8.2. Explainable AI

Explainable AI (XAI) is another focus for future research in deep learning-based medical image analysis. XAI refers to the capacity of an AI system to explain its decision-making process in a way that is intelligible to a human (103). In medical image analysis, XAI can increase confidence in deep learning algorithms, help guarantee that they are used safely and ethically, and allow clinicians to base their judgments more soundly on algorithmic outputs. XAI in this setting involves developing algorithms that not only make accurate predictions or segmentations but also provide clear, interpretable reasons for their decisions. This can be particularly important when the AI system's output contradicts or differs from the clinician's assessment or prior knowledge. One approach is to produce visual explanations, or heatmaps, that highlight the image regions that were most important in the algorithm's decision-making process; these can help identify regions of interest, detect subtle abnormalities, and provide insight into the model's reasoning (104). Another approach is to incorporate external knowledge or prior information into the decision-making process; for example, an algorithm that analyzes brain MRIs could be designed to incorporate known patterns of disease progression or anatomical landmarks. Overall, XAI holds promise for improving the transparency, interpretability, and trustworthiness of deep learning algorithms in medical image analysis. As these technologies continue to evolve, it will be important to ensure that they are used safely, ethically, and in accordance with relevant laws and regulations (105).
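As a sketch of the heatmap idea, the following computes a Grad-CAM-style visual explanation; Grad-CAM is one common technique of this kind, and the ResNet-18 backbone, layer choice, and random stand-in image below are assumptions for illustration only.

```python
# Sketch of a Grad-CAM-style heatmap for a CNN classifier (hypothetical
# setup; any classifier with a final convolutional block would do).
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None).eval()
activations, gradients = {}, {}

def fwd_hook(module, inp, out):
    activations["feat"] = out.detach()

def bwd_hook(module, grad_in, grad_out):
    gradients["feat"] = grad_out[0].detach()

model.layer4.register_forward_hook(fwd_hook)
model.layer4.register_full_backward_hook(bwd_hook)

x = torch.randn(1, 3, 224, 224)          # stand-in image
score = model(x)[0].max()                # score of the top class
score.backward()

# Weight each feature map by its average gradient, then ReLU and normalize.
weights = gradients["feat"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((weights * activations["feat"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=(224, 224), mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # heatmap in [0, 1]
```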

7.8.3. Transfer learning

Transfer learning is a further focus of future research in deep learning-based medical image processing. Transfer learning is the process of reusing previously trained deep learning models to enhance a model's performance on a new task or dataset. It can be particularly helpful in medical image analysis because it reduces the need for large volumes of labeled data, which can be difficult and time-consuming to collect; by exploiting the knowledge and representations acquired by models pre-trained on huge datasets, researchers can increase the precision and effectiveness of their own models (106). A pre-trained model is a useful starting point for a medical image analysis problem, since it allows the model to learn from less data and can reduce the risk of overfitting. Transfer learning may also improve the generalizability of deep learning models for medical image interpretation: by building on pre-trained models that have learned representations of natural images, medical image analysis models may develop more reliable and generalizable representations that transfer to a wider range of tasks and datasets. Transfer learning therefore has the potential to enhance the effectiveness, precision, and generalizability of deep learning models for medical image interpretation. As these technologies continue to evolve, it will be important to ensure that they are used safely, ethically, and in accordance with relevant laws and regulations.
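A minimal sketch of this workflow, assuming a recent torchvision with pretrained ImageNet weights and a hypothetical 3-class medical task:

```python
# Transfer-learning sketch: reuse an ImageNet-pretrained backbone and
# fine-tune only a new classification head on a small medical dataset.
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.DEFAULT)  # pretrained features
for param in model.parameters():
    param.requires_grad = False  # freeze the pretrained representation

# Replace the final layer for a hypothetical 3-class medical task.
model.fc = nn.Linear(model.fc.in_features, 3)  # only this layer is trained

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```

Freezing the backbone and training only the new head is the most data-frugal variant; with somewhat more labeled data, unfreezing the last few blocks at a lower learning rate often helps.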

7.8.4. Federated learning

Federated learning is another direction for future research in deep learning for medical image analysis. Federated learning refers to training machine learning models on data that is distributed across several devices or institutions, without moving the data to a central server. It can be especially helpful in medical image analysis because it permits the exchange of knowledge and expertise between institutions while safeguarding the confidentiality and security of sensitive patient data (107). This is particularly crucial where patient data is subject to strict privacy laws, such as the Health Insurance Portability and Accountability Act (HIPAA) in the United States. Federated learning works by initializing a central model with a set of weights, which are sent to each participating device or institution. Each participant trains the model on its own local data, using those weights as a starting point, and the updated weights are sent back to the central server, where they are aggregated to update the central model. This process is repeated iteratively until the model converges. By training models this way, medical institutions can leverage the collective knowledge and expertise of multiple institutions, improving the accuracy and generalizability of the models, while the confidentiality and privacy of patient data are preserved because the data never leaves the local devices or organizations. Overall, federated learning shows potential for enhancing the generalizability, speed, and privacy of deep learning models in medical image analysis (108). As these technologies continue to evolve, it will be important to ensure that they are used safely, ethically, and in accordance with relevant laws and regulations.
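The aggregation step described above can be sketched as a weighted average of the participants' weights (the FedAvg rule); the two-site usage and sample counts below are hypothetical:

```python
# Sketch of the federated averaging (FedAvg) aggregation step: each site
# trains locally, and only model weights travel to the server.
from typing import Dict, List
import torch

def federated_average(local_states: List[Dict[str, torch.Tensor]],
                      num_samples: List[int]) -> Dict[str, torch.Tensor]:
    """Sample-size-weighted average of per-institution model weights."""
    total = sum(num_samples)
    global_state = {}
    for key in local_states[0]:
        global_state[key] = sum(
            state[key] * (n / total)
            for state, n in zip(local_states, num_samples)
        )
    return global_state

# Each round: broadcast the global weights, let sites run local epochs,
# then aggregate. Hypothetical usage with two sites:
# new_global = federated_average(
#     [site_a_model.state_dict(), site_b_model.state_dict()],
#     num_samples=[1200, 800])
```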

7.8.5. Integration with electronic health records (EHRs)

Integration with electronic health records (EHRs) is another direction for future development of deep learning in medical image analysis. EHRs contain a wealth of clinical information, including patient demographics, medical history, laboratory results, and imaging studies. By merging deep learning algorithms with EHRs, researchers and clinicians may increase the precision and effectiveness of medical image analysis. One potential application is to improve the interpretation of medical images by incorporating patient-specific information from EHRs: for example, deep learning algorithms could be trained to predict the likelihood of certain diseases or conditions based on a patient's clinical history, laboratory results, and imaging studies, which may improve the accuracy of image interpretation and reduce the need for invasive or expensive diagnostic procedures. Another potential application is to use deep learning algorithms to automatically extract information from medical images and incorporate it into EHRs (109): for example, algorithms could automatically segment and measure lesions or tumors in medical images and record this information in the patient's EHR, lessening the burden on physicians and improving the efficiency and precision of clinical decision-making. Overall, integrating deep learning algorithms with EHRs shows potential for enhancing the precision, efficacy, and efficiency of medical image processing. As these technologies continue to advance, it will be crucial to ensure that they are used safely, ethically, and in line with all applicable laws and regulations regarding patient privacy and data security (110).

7.8.6. Few-shot learning

Future research in deep learning-based medical image analysis should also delve into few-shot learning. This approach holds great potential for scenarios where labeled data is limited or difficult to obtain, which is often the case in medical imaging (111). Techniques that enable models to learn from a small set of annotated examples, and potentially even adapt to new, unseen classes, will be instrumental. Meta-learning algorithms, which aim to train models to quickly adapt to new tasks with minimal data, could be explored for their applicability in medical image analysis, and methods for data augmentation and synthesis specifically tailored to few-shot scenarios could be developed. By advancing few-shot learning in medical imaging, we can significantly broaden the scope of applications, improve the accessibility of AI-driven healthcare solutions, and ultimately enhance the quality of patient care (112).
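As one illustration, prototype-based classification (in the spirit of prototypical networks) reduces few-shot inference to nearest-mean matching in an embedding space; the embedding dimension and random features below are stand-ins for the output of a trained encoder:

```python
# Minimal sketch of prototype-based few-shot classification: class
# prototypes are the mean embeddings of the few labeled "support" images,
# and a query image is assigned to the nearest prototype. This is one of
# several few-shot approaches; the embeddings here are stand-ins.
import torch

def classify_by_prototype(support: torch.Tensor, support_labels: torch.Tensor,
                          query: torch.Tensor) -> torch.Tensor:
    """support: (N, D) embeddings, support_labels: (N,), query: (Q, D)."""
    classes = support_labels.unique()
    prototypes = torch.stack([support[support_labels == c].mean(dim=0)
                              for c in classes])     # (C, D) class means
    distances = torch.cdist(query, prototypes)       # (Q, C) Euclidean
    return classes[distances.argmin(dim=1)]          # predicted labels

support = torch.randn(10, 64)                # 5 examples x 2 classes
labels = torch.tensor([0] * 5 + [1] * 5)
preds = classify_by_prototype(support, labels, torch.randn(3, 64))
```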

8. Conclusion and limitations

In recent years, there has been significant progress in medical image analysis using deep learning algorithms, with numerous studies highlighting the effectiveness of DL in areas such as cell, bone, tissue, tumor, vessel, and lesion segmentation. As the field continues to evolve, further research is essential to explore new techniques and methodologies that can improve the performance and robustness of DL algorithms for image analysis. Comprehensive evaluations of DL algorithms in real-world scenarios are needed, along with the development of scalable and robust systems for healthcare settings. Continued research in this area is imperative to fully realize the potential of DL in medical image segmentation and to improve healthcare outcomes. This article presents a systematic review of DL-based methods for image analysis, discussing their advantages, disadvantages, and strategies, and covering the evaluation of DL image analysis platforms and tools. Most papers are assessed on qualitative features, while some important aspects, such as security and convergence time, are overlooked; a variety of programming languages are used to evaluate the proposed methods. The investigation aims to provide valuable guidance for future research on DL applications in medical and healthcare image analysis. The study also faced constraints, including limited access to non-English papers and a scarcity of high-quality research on this topic. Heterogeneity in the methodologies, datasets, and evaluation metrics used across studies makes it difficult to draw conclusive insights or perform quantitative meta-analysis, and the rapidly evolving nature of DL techniques and the emergence of new algorithms may necessitate frequent updates to remain current. Despite these limitations, DL has proven to be a game-changing approach for addressing complex problems, and the study's results are expected to advance DL approaches in real-world applications.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

Author contributions

ML: Investigation, Writing – original draft. YJ: Investigation, Writing – review & editing. YZ: Investigation, Supervision, Writing – original draft. HZ: Investigation, Writing – original draft.

Funding

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

1. Zhang, D, Wu, G, and Zhou, L. Machine learning in medical imaging . Berlin: Springer (2014).


2. Amiri, Z, Heidari, A, Navimipour, NJ, Unal, M, and Mousavi, A. Adventures in data analysis: a systematic review of deep learning techniques for pattern recognition in cyber-physical-social systems. Multimed Tools Appl . (2023) 2023:1–65. doi: 10.1007/s11042-023-16382-x


3. Sudheer Kumar, E, and Shoba Bindu, C. Medical image analysis using deep learning: a systematic literature review In: AK Somani, S Ramakrishna, A Chaudhary, C Choudhary, and B Agarwal, editors. Emerging Technologies in Computer Engineering: Microservices in big data analytics: Second international conference, ICETCE 2019 . Jaipur, India: Springer (2019). 81–97.

4. Klang, E. Deep learning and medical imaging. J Thorac Dis . (2018) 10:1325–8. doi: 10.21037/jtd.2018.02.76


5. Altaf, F, Islam, SM, Akhtar, N, and Janjua, NK. Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access . (2019) 7:99540–72. doi: 10.1109/ACCESS.2019.2929365

6. Barragán-Montero, A, Javaid, U, Valdés, G, Nguyen, D, Desbordes, P, Macq, B, et al. Artificial intelligence and machine learning for medical imaging: a technology review. Phys Med . (2021) 83:242–56. doi: 10.1016/j.ejmp.2021.04.016

7. Salahuddin, Z, Woodruff, HC, Chatterjee, A, and Lambin, P. Transparency of deep neural networks for medical image analysis: a review of interpretability methods. Comput Biol Med . (2022) 140:105111. doi: 10.1016/j.compbiomed.2021.105111

8. Guo, Z, Li, X, Huang, H, Guo, N, and Li, Q. Deep learning-based image segmentation on multimodal medical imaging. IEEE Trans Radiat Plasma Med Sci . (2019) 3:162–9. doi: 10.1109/TRPMS.2018.2890359

9. Zhang, Y, Gorriz, JM, and Dong, Z. Deep learning in medical image analysis. J Imaging . (2021) 7:74. doi: 10.3390/jimaging7040074

10. Van der Velden, BH, Kuijf, HJ, Gilhuijs, KG, and Viergever, MA. Explainable artificial intelligence (XAI) in deep learning-based medical image analysis. Med Image Anal . (2022) 79:102470. doi: 10.1016/j.media.2022.102470

11. Suzuki, K. Overview of deep learning in medical imaging. Radiol Phys Technol . (2017) 10:257–73. doi: 10.1007/s12194-017-0406-5

12. Amiri, Z, Heidari, A, Darbandi, M, Yazdani, Y, Jafari Navimipour, N, Esmaeilpour, M, et al. The personal health applications of machine learning techniques in the internet of behaviors. Sustainability . (2023) 15:12406. doi: 10.3390/su151612406

13. Amiri, Z, Heidari, A, Navimipour, NJ, and Unal, M. Resilient and dependability management in distributed environments: a systematic and comprehensive literature review. Clust Comput . (2023) 26:1565–600. doi: 10.1007/s10586-022-03738-5

14. Kim, J, Hong, J, and Park, H. Prospects of deep learning for medical imaging. Precis. Future Med . (2018) 2:37–52. doi: 10.23838/pfm.2018.00030

15. Shen, D, Wu, G, Zhang, D, Suzuki, K, Wang, F, and Yan, P. Machine learning in medical imaging. Comput Med Imaging Graph . (2015) 41:1–2. doi: 10.1016/j.compmedimag.2015.02.001

16. Dhiman, G, Juneja, S, Viriyasitavat, W, Mohafez, H, Hadizadeh, M, Islam, MA, et al. A novel machine-learning-based hybrid CNN model for tumor identification in medical image processing. Sustainability . (2022) 14:1447. doi: 10.3390/su14031447

17. Vijayalakshmi, M. Melanoma skin cancer detection using image processing and machine learning. Int J Trend Sci Res Dev . (2019) 3:780–4. doi: 10.31142/ijtsrd23936

18. De Fauw, J, Keane, P, Tomasev, N, Visentin, D, van den Driessche, G, Johnson, M, et al. Automated analysis of retinal imaging using machine learning techniques for computer vision. F1000Research . (2016) 5:1573. doi: 10.12688/f1000research.8996.1

19. Meena, T, and Roy, S. Bone fracture detection using deep supervised learning from radiological images: a paradigm shift. Diagnostics . (2022) 12:2420. doi: 10.3390/diagnostics12102420

20. Popescu, AB, Taca, IA, Vizitiu, A, Nita, CI, Suciu, C, Itu, LM, et al. Obfuscation algorithm for privacy-preserving deep learning-based medical image analysis. Appl Sci . (2022) 12:3997. doi: 10.3390/app12083997

21. Singha, A, Thakur, RS, and Patel, T. Deep learning applications in medical image analysis. Biomed Data Min Inf Retr . (2021) 2021:293–350. doi: 10.1002/9781119711278.ch11

22. Mohapatra, S, Swarnkar, T, and Das, J. Deep convolutional neural network in medical image processing In: VE Balas, BK Mishra, and R Kumar, editors. Handbook of deep learning in biomedical engineering . Amsterdam, Netherlands: Elsevier (2021). 25–60.

23. Debelee, TG, Schwenker, F, Ibenthal, A, and Yohannes, D. Survey of deep learning in breast cancer image analysis. Evol Syst . (2020) 11:143–63. doi: 10.1007/s12530-019-09297-2

24. Qureshi, I, Yan, J, Abbas, Q, Shaheed, K, Riaz, AB, Wahid, A, et al. Medical image segmentation using deep semantic-based methods: a review of techniques, applications and emerging trends. Inf Fusion . (2022) 90:316–52. doi: 10.1016/j.inffus.2022.09.031

25. Shin, H-C, Lu, L, and Summers, RM. Natural language processing for large-scale medical image analysis using deep learning. Deep Learn Med Image Anal . (2017) 2017:405–21. doi: 10.1016/B978-0-12-810408-8.00023-7

26. Tajbakhsh, N, Jeyaseelan, L, Li, Q, Chiang, JN, Wu, Z, and Ding, X. Embracing imperfect datasets: a review of deep learning solutions for medical image segmentation. Med Image Anal . (2020) 63:101693. doi: 10.1016/j.media.2020.101693

27. Uwimana, A, and Senanayake, R. Out of distribution detection and adversarial attacks on deep neural networks for robust medical image analysis. ICML 2021 Workshop on A Blessing in Disguise: The Prospects and Perils of Adversarial Machine Learning (2021). doi: 10.48550/arXiv.2107.04882

28. Al-Galal, SAY, Alshaikhli, IFT, and Abdulrazzaq, M. MRI brain tumor medical images analysis using deep learning techniques: a systematic review. Heal Technol . (2021) 11:267–82. doi: 10.1007/s12553-020-00514-6

29. Duncan, JS, Insana, MF, and Ayache, N. Biomedical imaging and analysis in the age of big data and deep learning [scanning the issue]. Proc IEEE . (2019) 108:3–10. doi: 10.1109/JPROC.2019.2956422

30. Zhou, SK, Le, HN, Luu, K, Nguyen, HV, and Ayache, N. Deep reinforcement learning in medical imaging: a literature review. Med Image Anal . (2021) 73:102193. doi: 10.1016/j.media.2021.102193

31. Gupta, A, and Katarya, R. Social media based surveillance systems for healthcare using machine learning: a systematic review. J Biomed Inform . (2020) 108:103500. doi: 10.1016/j.jbi.2020.103500

32. Kourou, K, Exarchos, TP, Exarchos, KP, Karamouzis, MV, and Fotiadis, DI. Machine learning applications in cancer prognosis and prediction. Comput Struct Biotechnol J . (2015) 13:8–17. doi: 10.1016/j.csbj.2014.11.005

33. Razzak, MI, Naz, S, and Zaib, A. Deep learning for medical image processing: overview, challenges and the future. In: N Dey, AS Ashour, and S Borra, Classification in BioApps: Automation of Decision Making , Berlin: Springer, pp. 323–350. (2018).

34. Litjens, G, Kooi, T, Bejnordi, BE, Setio, AAA, Ciompi, F, Ghafoorian, M, et al. A survey on deep learning in medical image analysis. Med Image Anal . (2017) 42:60–88. doi: 10.1016/j.media.2017.07.005

35. Bzdok, D, and Ioannidis, JP. Exploration, inference, and prediction in neuroscience and biomedicine. Trends Neurosci . (2019) 42:251–62. doi: 10.1016/j.tins.2019.02.001

36. Singh, AV, Rosenkranz, D, Ansari, MHD, Singh, R, Kanase, A, Singh, SP, et al. Artificial intelligence and machine learning empower advanced biomedical material design to toxicity prediction. Adv Intell Syst . (2020) 2:2000084. doi: 10.1002/aisy.202000084

37. Jena, M, Mishra, D, Mishra, SP, Mallick, PK, and Kumar, S. Exploring the parametric impact on a deep learning model and proposal of a 2-branch CNN for diabetic retinopathy classification with case study in IoT-Blockchain based smart healthcare system. Informatica . (2022) 46:3906. doi: 10.31449/inf.v46i2.3906

38. Thilagam, K, Beno, A, Lakshmi, MV, Wilfred, CB, George, SM, Karthikeyan, M, et al. Secure IoT healthcare architecture with deep learning-based access control system. J Nanomater . (2022) 2022:1–8. doi: 10.1155/2022/2638613

39. Ismail, WN, Hassan, MM, Alsalamah, HA, and Fortino, G. CNN-based health model for regular health factors analysis in internet-of-medical things environment. IEEE Access . (2020) 8:52541–9. doi: 10.1109/ACCESS.2020.2980938

40. More, S, Singla, J, Verma, S, Kavita, K, Ghosh, U, Rodrigues, JJPC, et al. Security assured CNN-based model for reconstruction of medical images on the internet of healthcare things. IEEE Access . (2020) 8:126333–46. doi: 10.1109/ACCESS.2020.3006346

41. Vaccari, I, Orani, V, Paglialonga, A, Cambiaso, E, and Mongelli, M. A generative adversarial network (Gan) technique for internet of medical things data. Sensors . (2021) 21:3726. doi: 10.3390/s21113726

42. Kadri, F, Dairi, A, Harrou, F, and Sun, Y. Towards accurate prediction of patient length of stay at emergency department: a GAN-driven deep learning framework. J Ambient Intell Humaniz Comput . (2022) 14:11481–95. doi: 10.1007/s12652-022-03717-z

43. Yang, Y, Nan, F, Yang, P, Meng, Q, Xie, Y, Zhang, D, et al. GAN-based semi-supervised learning approach for clinical decision support in health-IoT platform. IEEE Access . (2019) 7:8048–57. doi: 10.1109/ACCESS.2018.2888816

44. Huang, Z, Zhang, J, Zhang, Y, and Shan, H. DU-GAN: generative adversarial networks with dual-domain U-net-based discriminators for low-dose CT denoising. IEEE Trans Instrum Meas . (2021) 71:1–12. doi: 10.1109/TIM.2021.3128703

45. Purandhar, N, Ayyasamy, S, and Siva Kumar, P. Classification of clustered health care data analysis using generative adversarial networks (GAN). Soft Comput . (2022) 26:5511–21. doi: 10.1007/s00500-022-07026-7

46. Sridhar, C, Pareek, PK, Kalidoss, R, Jamal, SS, Shukla, PK, and Nuagah, SJ. Optimal medical image size reduction model creation using recurrent neural network and GenPSOWVQ. J Healthc Eng . (2022) 2022:1–8. doi: 10.1155/2022/2354866

47. Pham, T, Tran, T, Phung, D, and Venkatesh, S. Predicting healthcare trajectories from medical records: a deep learning approach. J Biomed Inform . (2017) 69:218–29. doi: 10.1016/j.jbi.2017.04.001

48. Wang, L, Zhang, W, He, X, and Zha, H. Supervised reinforcement learning with recurrent neural network for dynamic treatment recommendation . In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining . Washington, DC, USA. 2447–2456. (2018).

49. Jagannatha, AN, and Yu, H. Bidirectional RNN for medical event detection in electronic health records . In: Proceedings of the conference. Association for Computational Linguistics. North American chapter. Meeting, p. 473. (2016).

50. Cocos, A, Fiks, AG, and Masino, AJ. Deep learning for pharmacovigilance: recurrent neural network architectures for labeling adverse drug reactions in twitter posts. J Am Med Inform Assoc . (2017) 24:813–21. doi: 10.1093/jamia/ocw180

51. Butt, UM, Letchmunan, S, Ali, M, Hassan, FH, Baqir, A, and Sherazi, HHR. Machine learning based diabetes classification and prediction for healthcare applications. J Healthc Eng . (2021) 2021:1–17. doi: 10.1155/2021/9930985

52. Awais, M, Raza, M, Singh, N, Bashir, K, Manzoor, U, Islam, SU, et al. LSTM-based emotion detection using physiological signals: IoT framework for healthcare and distance learning in COVID-19. IEEE Internet Things J . (2020) 8:16863–71. doi: 10.1109/JIOT.2020.3044031

53. Nancy, AA, Ravindran, D, Raj Vincent, PD, Srinivasan, K, and Gutierrez Reina, D. Iot-cloud-based smart healthcare monitoring system for heart disease prediction via deep learning. Electronics . (2022) 11:2292. doi: 10.3390/electronics11152292

54. Queralta, JP, Gia, TN, Tenhunen, H, and Westerlund, T. Edge-AI in LoRa-based health monitoring: fall detection system with fog computing and LSTM recurrent neural networks . In: 2019 42nd international conference on telecommunications and signal processing (TSP), IEEE, pp. 601–604. (2019).

55. Gao, Y, Phillips, JM, Zheng, Y, Min, R, Fletcher, PT, and Gerig, G. Fully convolutional structured LSTM networks for joint 4D medical image segmentation . In 2018 IEEE 15th international symposium on biomedical imaging (ISBI 2018), IEEE, pp. 1104–1108. (2018).

56. Shahzadi, I, Tang, TB, Meriadeau, F, and Quyyum, A. CNN-LSTM: cascaded framework for brain tumour classification . In: 2018 IEEE-EMBS conference on biomedical engineering and sciences (IECBES), IEEE, 633–637. (2018).

57. Srikantamurthy, MM, Rallabandi, V, Dudekula, DB, Natarajan, S, and Park, J. Classification of benign and malignant subtypes of breast cancer histopathology imaging using hybrid CNN-LSTM based transfer learning. BMC Med Imaging . (2023) 23:1–15. doi: 10.1186/s12880-023-00964-0

58. Banerjee, I, Ling, Y, Chen, MC, Hasan, SA, Langlotz, CP, Moradzadeh, N, et al. Comparative effectiveness of convolutional neural network (CNN) and recurrent neural network (RNN) architectures for radiology text report classification. Artif Intell Med . (2019) 97:79–88. doi: 10.1016/j.artmed.2018.11.004

59. Nandhini Abirami, R, Vincent, PDR, Srinivasan, K, Tariq, U, and Chang, C-Y. Deep CNN and deep GAN in computational visual perception-driven image analysis. Complexity . (2021) 2021:1–30. doi: 10.1155/2021/5541134

60. Yao, H, Zhang, X, Zhou, X, and Liu, S. Parallel structure deep neural network using CNN and RNN with an attention mechanism for breast cancer histology image classification. Cancers . (2019) 11:1901. doi: 10.3390/cancers11121901

61. Sarbaz, M, Soltanian, M, Manthouri, M, and Zamani, I. Adaptive optimal control of chaotic system using backstepping neural network concept . In: 2022 8th international conference on control, instrumentation and automation (ICCIA), IEEE, 1–5. (2022).

62. Bagheri, M, Zhao, H, Sun, M, Huang, L, Madasu, S, Lindner, P, et al. Data conditioning and forecasting methodology using machine learning on production data for a well pad . in Houston, Texas, USA: Offshore technology conference, OTC, No. D031S037R002. (2020).

63. Soleimani, R, and Lobaton, E. Enhancing inference on physiological and kinematic periodic signals via phase-based interpretability and multi-task learning. Information . (2022) 13:326. doi: 10.3390/info13070326

64. Ahmadi, SS, and Khotanlou, H. A hybrid of inference and stacked classifiers to indoor scenes classification of rgb-d images . In: 2022 international conference on machine vision and image processing (MVIP), IEEE, pp. 1–6. (2022).

65. Morteza, A, Yahyaeian, AA, Mirzaeibonehkhater, M, Sadeghi, S, Mohaimeni, A, and Taheri, S. Deep learning hyperparameter optimization: application to electricity and heat demand prediction for buildings. Energ Buildings . (2023) 289:113036. doi: 10.1016/j.enbuild.2023.113036

66. Webber, J, Mehbodniya, A, Hou, Y, Yano, K, and Kumagai, T. Study on idle slot availability prediction for WLAN using a probabilistic neural network . In: 2017 23rd Asia-Pacific conference on communications (APCC), IEEE, 1–6. (2017).

67. Webber, J, Mehbodniya, A, Arafa, A, and Alwakeel, A. Improved human activity recognition using majority combining of reduced-complexity sensor branch classifiers. Electronics . (2022) 11:392. doi: 10.3390/electronics11030392

68. Gera, T, Singh, J, Mehbodniya, A, Webber, JL, Shabaz, M, and Thakur, D. Dominant feature selection and machine learning-based hybrid approach to analyze android ransomware. Secur Commun Netw . (2021) 2021:1–22. doi: 10.1155/2021/7035233

69. Bukhari, SNH, Webber, J, and Mehbodniya, A. Decision tree based ensemble machine learning model for the prediction of Zika virus T-cell epitopes as potential vaccine candidates. Sci Rep . (2022) 12:7810. doi: 10.1038/s41598-022-11731-6

70. Singh, R, Mehbodniya, A, Webber, JL, Dadheech, P, Pavithra, G, Alzaidi, MS, et al. Analysis of network slicing for management of 5G networks using machine learning techniques. Wirel Commun Mob Comput . (2022) 2022:1–10. doi: 10.1155/2022/9169568

71. He, P, Almasifar, N, Mehbodniya, A, Javaheri, D, and Webber, JL. Towards green smart cities using internet of things and optimization algorithms: a systematic and bibliometric review. Sustain Comput . (2022) 36:100822. doi: 10.1016/j.suscom.2022.100822

72. Sadi, M, He, Y, Li, Y, Alam, M, Kundu, S, Ghosh, S, et al. Special session: on the reliability of conventional and quantum neural network hardware . In 2022 IEEE 40th VLSI test symposium (VTS) , IEEE, 1–12. (2022).

73. Moradi, M, Weng, Y, and Lai, Y-C. Defending smart electrical power grids against cyberattacks with deep Q-learning. P R X Energy . (2022) 1:033005. doi: 10.1103/PRXEnergy.1.033005

74. Esmaeili, N, and Bamdad Soofi, J. Expounding the knowledge conversion processes within the occupational safety and health management system (OSH-MS) using concept mapping. Int J Occup Saf Ergon . (2022) 28:1000–15. doi: 10.1080/10803548.2020.1853957

75. Zhang, Y, Mu, L, Shen, G, Yu, Y, and Han, C. Fault diagnosis strategy of CNC machine tools based on cascading failure. J Intell Manuf . (2019) 30:2193–202. doi: 10.1007/s10845-017-1382-7

76. Shen, G, Zeng, W, Han, C, Liu, P, and Zhang, Y. Determination of the average maintenance time of CNC machine tools based on type II failure correlation. Ekspl Niezawodność . (2017) 19:604–14. doi: 10.17531/ein.2017.4.15

77. Shen, G, Han, C, Chen, B, Dong, L, and Cao, P. Fault analysis of machine tools based on grey relational analysis and main factor analysis. J Phys Conf Ser . (2018) 1069:012112. doi: 10.1088/1742-6596/1069/1/012112

78. Han, C, and Fu, X. Challenge and opportunity: deep learning-based stock price prediction by using bi-directional LSTM model. Front Bus Econ Manage . (2023) 8:51–4. doi: 10.54097/fbem.v8i2.6616

79. Dehghani, F, and Larijani, A. A machine learning-Jaya algorithm (ml-Ijaya) approach for rapid optimization using high performance computing. SAE International Journal of Commercial Vehicles-V127-2EJ . (2023).

80. Dehghani, F., and Larijani, A. Average portfolio optimization using multi-layer neural networks with risk consideration . Social Science Research Network (SSRN). (2023).

81. Rezaei, M, Rastgoo, R, and Athitsos, V. TriHorn-net: a model for accurate depth-based 3D hand pose estimation. Expert Syst Appl . (2023) 223:119922. doi: 10.1016/j.eswa.2023.119922

82. Ahmadi, SS, and Khotanlou, H. Enhance support relation extraction accuracy using improvement of segmentation in RGB-D images . In: 2017 3rd international conference on pattern recognition and image analysis (IPRIA), IEEE, 166–169. (2017).

83. Mirzapour, O, and Arpanahi, SK. Photovoltaic parameter estimation using heuristic optimization . In: 2017 IEEE 4th international conference on knowledge-based engineering and innovation (KBEI), IEEE, pp. 792–797. (2017).

84. Khorshidi, M, Ameri, M, and Goli, A. Cracking performance evaluation and modelling of RAP mixtures containing different recycled materials using deep neural network model. Road Mater Pavement Des . (2023) 2023:1–20. doi: 10.1080/14680629.2023.2222835

85. Rastegar, RM, Saghafi Moghaddam, S, Haghnazar, R, and Zimring, C. From evidence to assessment: developing a scenario-based computational design algorithm to support informed decision-making in primary care clinic design workflow. Int J Archit Comput . (2022) 20:567–86. doi: 10.1177/14780771221121031

86. Jafari, BM, Zhao, M, and Jafari, A. Rumi: an intelligent agent enhancing learning management systems using machine learning techniques. J Softw Eng Appl . (2022) 15:325–43. doi: 10.4236/jsea.2022.159019

87. Moradi, MR, Kalhori, SRN, Saeedi, MG, Zarkesh, MR, Habibelahi, A, and Panahi, AH. Designing a remote closed-loop automatic oxygen control in preterm infants. Iran J Pediatr . (2020) 30:101715. doi: 10.5812/ijp.101715

88. Kosarirad, H, Ghasempour Nejati, M, Saffari, A, Khishe, M, and Mohammadi, M. Feature selection and training multilayer perceptron neural networks using grasshopper optimization algorithm for design optimal classifier of big data sonar. J Sens . (2022) 2022:1–14. doi: 10.1155/2022/9620555

89. Wu, D-C, Momeni, M, Razban, A, and Chen, J. Optimizing demand-controlled ventilation with thermal comfort and CO2 concentrations using long short-term memory and genetic algorithm. Build Environ . (2023) 243:110676. doi: 10.1016/j.buildenv.2023.110676

90. Momeni, M, Wu, DC, Razban, A, and Chen, J. Data-driven demand control ventilation using machine learning CO2 occupancy detection method (2020).

91. Zhang, J, Shen, Q, Ma, Y, Liu, L, Jia, W, Chen, L, et al. Calcium homeostasis in Parkinson’s disease: from pathology to treatment. Neurosci Bull . (2022) 38:1267–70. doi: 10.1007/s12264-022-00899-6

92. Qi, M, Cui, S, Chang, X, Xu, Y, Meng, H, Wang, Y, et al. Multi-region nonuniform brightness correction algorithm based on L-channel gamma transform. Secur Commun Netw . (2022) 2022:1–9. doi: 10.1155/2022/2675950

93. Yang, S, Li, Q, Li, W, Li, X, and Liu, A-A. Dual-level representation enhancement on characteristic and context for image-text retrieval. IEEE Trans Circuits Syst Video Technol . (2022) 32:8037–50. doi: 10.1109/TCSVT.2022.3182426

94. Liu, A-A, Zhai, Y, Xu, N, Nie, W, Li, W, and Zhang, Y. Region-aware image captioning via interaction learning. IEEE Trans Circuits Syst Video Technol . (2021) 32:3685–96. doi: 10.1109/TCSVT.2021.3107035

95. Wang, Y, Xu, N, Liu, A-A, Li, W, and Zhang, Y. High-order interaction learning for image captioning. IEEE Trans Circuits Syst Video Technol . (2021) 32:4417–30. doi: 10.1109/TCSVT.2021.3121062

96. Wang, H, Wang, K, Xue, Q, Peng, M, Yin, L, Gu, X, et al. Transcranial alternating current stimulation for treating depression: a randomized controlled trial. Brain . (2022) 145:83–91. doi: 10.1093/brain/awab252

97. Zhang, Z, Guo, D, Zhou, S, Zhang, J, and Lin, Y. Flight trajectory prediction enabled by time-frequency wavelet transform. Nat Commun . (2023) 14:5258. doi: 10.1038/s41467-023-40903-9

98. Shan, Y, Wang, H, Yang, Y, Wang, J, Zhao, W, Huang, Y, et al. Evidence of a large current of transcranial alternating current stimulation directly to deep brain regions. Mol Psychiatry . (2023) 2023:1–9. doi: 10.1038/s41380-023-02150-8

99. Lin, Z, Wang, H, and Li, S. Pavement anomaly detection based on transformer and self-supervised learning. Autom Constr . (2022) 143:104544. doi: 10.1016/j.autcon.2022.104544

100. Shen, Y, Ding, N, Zheng, H-T, Li, Y, and Yang, M. Modeling relation paths for knowledge graph completion. IEEE Trans Knowl Data Eng . (2020) 33:3607–17. doi: 10.1109/TKDE.2020.2970044

101. Cheng, B, Wang, M, Zhao, S, Zhai, Z, Zhu, D, and Chen, J. Situation-aware dynamic service coordination in an IoT environment. IEEE/ACM Trans Networking . (2017) 25:2082–95. doi: 10.1109/TNET.2017.2705239

102. Tang, Y, Liu, S, Deng, Y, Zhang, Y, Yin, L, and Zheng, W. An improved method for soft tissue modeling. Biomed Signal Proc Control . (2021) 65:102367. doi: 10.1016/j.bspc.2020.102367

103. Zhang, Z, Wang, L, Zheng, W, Yin, L, Hu, R, and Yang, B. Endoscope image mosaic based on pyramid ORB. Biomed Signal Proc Control . (2022) 71:103261. doi: 10.1016/j.bspc.2021.103261

104. Lu, S, Yang, B, Xiao, Y, Liu, S, Liu, M, Yin, L, et al. Iterative reconstruction of low-dose CT based on differential sparse. Biomed Signal Proc Control . (2023) 79:104204. doi: 10.1016/j.bspc.2022.104204

105. Liu, M, Zhang, X, Yang, B, Yin, Z, Liu, S, Yin, L, et al. Three-dimensional modeling of heart soft tissue motion. Appl Sci . (2023) 13:2493. doi: 10.3390/app13042493

106. Dang, W, Xiang, L, Liu, S, Yang, B, Liu, M, Yin, Z, et al. A feature matching method based on the convolutional neural network. J Imaging Sci Technol . (2023) 67:030402. doi: 10.2352/J.ImagingSci.Technol.2023.67.3.030402

107. Wang, L, She, A, and Xie, Y. The dynamics analysis of Gompertz virus disease model under impulsive control. Sci Rep . (2023) 13:10180. doi: 10.1038/s41598-023-37205-x

108. Zeng, Q, Bie, B, Guo, Q, Yuan, Y, Han, Q, Han, X, et al. Hyperpolarized Xe NMR signal advancement by metal-organic framework entrapment in aqueous solution. Proc Natl Acad Sci . (2020) 117:17558–63. doi: 10.1073/pnas.2004121117

109. Gao, Z, Pan, X, Shao, J, Jiang, X, Su, Z, Jin, K, et al. Automatic interpretation and clinical evaluation for fundus fluorescein angiography images of diabetic retinopathy patients by deep learning. Br J Ophthalmol . (2022):bjophthalmol-2022-321472. doi: 10.1136/bjo-2022-321472

110. Jin, K, Gao, Z, Jiang, X, Wang, Y, Ma, X, Li, Y, et al. MSHF: a multi-source heterogeneous fundus (MSHF) dataset for image quality assessment. Sci Data . (2023) 10:286. doi: 10.1038/s41597-023-02188-x

111. Wang, Y, Zhai, W, Yang, L, Cheng, S, Cui, W, and Li, J. Establishments and evaluations of post-operative adhesion animal models. Adv Ther . (2023) 6:2200297. doi: 10.1002/adtp.202200297

112. Ye, X, Wang, J, Qiu, W, Chen, Y, and Shen, L. Excessive gliosis after vitrectomy for the highly myopic macular hole: a spectral domain optical coherence tomography study. Retina . (2022):1097. doi: 10.1097/IAE.0000000000003657

Keywords: deep learning, machine learning, medical images, image analysis, convolutional neural networks

Citation: Li M, Jiang Y, Zhang Y and Zhu H (2023) Medical image analysis using deep learning algorithms. Front. Public Health . 11:1273253. doi: 10.3389/fpubh.2023.1273253

Received: 08 August 2023; Accepted: 05 October 2023; Published: 07 November 2023.


Copyright © 2023 Li, Jiang, Zhang and Zhu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Yanzhou Zhang, [email protected]

† These authors have contributed equally to this work and share first authorship



Title: SAM & SAM 2 in 3D Slicer: SegmentWithSAM Extension for Annotating Medical Images

Abstract: Creating annotations for 3D medical data is time-consuming and often requires highly specialized expertise. Various tools have been implemented to aid this process. Segment Anything Model 2 (SAM 2) offers a general-purpose prompt-based segmentation algorithm designed to annotate videos. In this paper, we adapt this model to the annotation of 3D medical images and offer our implementation in the form of an extension to the popular annotation software: 3D Slicer. Our extension allows users to place point prompts on 2D slices to generate annotation masks and propagate these annotations across entire volumes in either single-directional or bi-directional manners. Our code is publicly available on this https URL and can be easily installed directly from the Extension Manager of 3D Slicer as well.
Comments: Future work: support for box and mask inputs for the video predictor of SAM 2
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Software Engineering (cs.SE)



Medical image analysis based on deep learning approach

Muralikrishna Puttagunta

Department of Computer Science, School of Engineering and Technology, Pondicherry University, Pondicherry, India

Medical imaging plays a significant role in different clinical applications, such as medical procedures used for early detection, monitoring, diagnosis, and treatment evaluation of various medical conditions. Basics of the principles and implementations of artificial neural networks and deep learning are essential for understanding medical image analysis in computer vision. The Deep Learning Approach (DLA) in medical image analysis is emerging as a fast-growing research field. DLA has been widely used in medical imaging to detect the presence or absence of disease. This paper presents the development of artificial neural networks and a comprehensive analysis of DLA, which delivers promising medical imaging applications. Most DLA implementations concentrate on X-ray images, computerized tomography, mammography images, and digital histopathology images. The paper provides a systematic review of articles on the classification, detection, and segmentation of medical images based on DLA. This review guides researchers in considering appropriate refinements to DLA-based medical image analysis.

Introduction

In the health care system, there has been a dramatic increase in demand for medical image services, e.g., radiography, endoscopy, Computed Tomography (CT), Mammography Images (MG), ultrasound images, Magnetic Resonance Imaging (MRI), Magnetic Resonance Angiography (MRA), nuclear medicine imaging, Positron Emission Tomography (PET), and pathological tests. In addition, medical images can be challenging and time-consuming to analyze, owing to a shortage of radiologists.

Artificial Intelligence (AI) can address these problems. Machine Learning (ML) is an application of AI that can function without being specifically programmed: it learns from data and makes predictions or decisions based on past data. ML uses three learning approaches, namely supervised learning, unsupervised learning, and semi-supervised learning. ML techniques involve the extraction of features, and selecting suitable features for a specific problem requires a domain expert. Deep learning (DL) techniques solve the problem of feature selection. DL is a subfield of ML that can automatically extract essential features from raw input data [ 88 ]. The concept of DL algorithms originates in cognitive and information theories. In general, DL has two properties: (1) multiple processing layers that can learn distinct features of data through multiple levels of abstraction, and (2) unsupervised or supervised learning of feature representations on each layer. A large number of recent review papers have highlighted the capabilities of advanced DLA in the medical field: MRI [ 8 ], radiology [ 96 ], cardiology [ 11 ], and neurology [ 155 ].

Different forms of DLA were borrowed from the field of computer vision and applied to specific medical image analysis tasks. Recurrent Neural Networks (RNNs) and convolutional neural networks are examples of supervised DL algorithms. Unsupervised learning algorithms have also been studied in medical image analysis; these include Deep Belief Networks (DBNs), Restricted Boltzmann Machines (RBMs), autoencoders, and Generative Adversarial Networks (GANs) [ 84 ]. DLA is generally applicable for detecting abnormalities and classifying specific types of disease. When DLA is applied to medical images, Convolutional Neural Networks (CNNs) are ideally suited for classification, segmentation, object detection, registration, and other tasks [ 29 , 44 ]. A CNN is an artificial visual neural network structure used for medical image pattern recognition based on the convolution operation. Deep learning (DL) applications in medical images are visualized in Fig. 1.

Fig. 1 (a) X-ray image with pulmonary masses [ 121 ]; (b) CT image with lung nodule [ 82 ]; (c) digitized histopathological tissue image [ 132 ]

Neural networks

History of neural networks.

The study of artificial neural networks and deep learning derives from the ambition to create a computer system that simulates the human brain [ 33 ]. In the early 1940s, a neurophysiologist, Warren McCulloch, and a mathematician, Walter Pitts [ 97 ], developed a primitive neural network based on what was then known of biological structure. In 1949, a book titled “Organization of Behavior” [ 100 ] was the first to describe the process of updating synaptic weights, now referred to as the Hebbian learning rule. In 1958, Frank Rosenblatt’s [ 127 ] landmark paper defined the structure of the neural network called the perceptron for the binary classification task.

In 1962, Widrow [ 172 ] introduced a device called the Adaptive Linear Neuron (ADALINE), implementing the design in hardware. The limitations of perceptrons were emphasized by Minsky and Papert (1969) [ 98 ]. The concept of the backward propagation of errors for training purposes is discussed in Werbos (1974) [ 171 ]. In 1979, Fukushima [ 38 ] designed artificial neural networks called the Neocognitron, with multiple pooling and convolution layers. In 1989, Yann LeCun [ 71 ] combined CNNs with backpropagation to effectively perform automated recognition of handwritten digits. One of the most important breakthroughs in deep learning occurred in 2006, when Hinton et al. [ 9 ] implemented the Deep Belief Network, with several layers of Restricted Boltzmann Machines, greedily training one layer at a time in an unsupervised fashion. Figure 2 shows important advancements in the history of neural networks that led to the deep learning era.

Fig. 2 Significant developments in the history of neural networks [ 33 , 134 ]

Artificial neural networks

Artificial Neural Networks (ANN) form the basis for most DLA. An ANN is a computational model structure that has some performance characteristics similar to biological neural networks. An ANN comprises simple processing units, called neurons or nodes, that are interconnected by weighted links. A biological neuron can be described mathematically as in Eq. (1):

$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right) \qquad (1)$

where $x_i$ are the inputs, $w_i$ the connection weights, $b$ the bias, and $f$ the activation function. Figure 3 shows the simplest artificial neural model, known as the perceptron.

Fig. 3 Perceptron [ 77 ]

Training a neural network with Backpropagation (BP)

In neural networks, the learning process is modeled as an iterative optimization of the weights to minimize a loss function. Based on network performance, the weights are modified on a set of examples belonging to the training set. The training procedure contains forward and backward phases. For neural network training, an activation function is selected for forward propagation, and BP training is used to change the weights. The BP algorithm helps a multilayer feed-forward neural network (FFNN) learn input-output mappings from training samples [ 16 ]. Forward propagation and backpropagation are explained for a neural network with one hidden layer in the following algorithm.

The backpropagation algorithm for a one-hidden-layer neural network is as follows:

1. Initialize all weights to small random values.
2. While the stopping condition is false, do steps 3 through 10.
3. For each training pair $(x, t)$, do steps 4 through 9.

Feed-forward propagation:

4. Each input unit $X_i$ ($i = 1, 2, \ldots, n$) receives the input signal $x_i$ and sends this signal to all hidden units in the layer above.
5. Each hidden unit $Z_j$ ($j = 1, \ldots, p$) computes $z_{j\_in} = b_j + \sum_{i=1}^{n} w_{ij} x_i$, applies the activation function $z_j = f(z_{j\_in})$, and transmits the result to the output units.
6. Each output unit $Y_k$ ($k = 1, \ldots, m$) computes $y_{k\_in} = b_k + \sum_{j=1}^{p} z_j w_{jk}$ and calculates the activation $y_k = f(y_{k\_in})$.

Backpropagation:

7. At the output-layer neurons: $\delta_k = (t_k - y_k)\, f'(y_{k\_in})$.
8. At the hidden-layer neurons: $\delta_j = f'(z_{j\_in}) \sum_{k=1}^{m} \delta_k w_{jk}$.
9. Update weights and biases, where $\eta$ is the learning rate. Each output unit $Y_k$ updates its weights ($j = 0, 1, \ldots, p$) and bias: $w_{jk}(\text{new}) = w_{jk}(\text{old}) + \eta \delta_k z_j$ and $b_k(\text{new}) = b_k(\text{old}) + \eta \delta_k$. Each hidden unit $Z_j$ updates its weights ($i = 0, 1, \ldots, n$) and bias: $w_{ij}(\text{new}) = w_{ij}(\text{old}) + \eta \delta_j x_i$ and $b_j(\text{new}) = b_j(\text{old}) + \eta \delta_j$.
10. Test the stopping condition.
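A direct NumPy transcription of steps 1-10 for a one-hidden-layer network with sigmoid activations follows; the layer sizes, learning rate, XOR toy data, and fixed-epoch stopping rule are illustrative choices:

```python
# NumPy sketch of the backpropagation algorithm above (one hidden layer,
# sigmoid activations); data and hyperparameters are illustrative.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n, p, m = 2, 4, 1                                  # input, hidden, output sizes
W1, b1 = rng.normal(0, 0.1, (n, p)), np.zeros(p)   # step 1: small random weights
W2, b2 = rng.normal(0, 0.1, (p, m)), np.zeros(m)
eta = 0.5                                          # learning rate

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
T = np.array([[0], [1], [1], [0]], float)          # XOR targets

for epoch in range(5000):                          # step 2: fixed-epoch stopping rule
    for x, t in zip(X, T):                         # step 3: each training pair
        z_in = b1 + x @ W1                         # step 5: hidden pre-activation
        z = sigmoid(z_in)
        y_in = b2 + z @ W2                         # step 6: output pre-activation
        y = sigmoid(y_in)
        delta_k = (t - y) * y * (1 - y)            # step 7: f'(y_in) = y(1-y)
        delta_j = z * (1 - z) * (W2 @ delta_k)     # step 8
        W2 += eta * np.outer(z, delta_k); b2 += eta * delta_k   # step 9
        W1 += eta * np.outer(x, delta_j); b1 += eta * delta_j
```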

Activation function

The activation function is the mechanism by which artificial neurons process and transfer information [ 42 ]. Various types of activation functions can be used in neural networks, chosen according to the characteristics of the application. Activation functions are non-linear and continuously differentiable; the differentiability property is important mainly when training a neural network using the gradient descent method. Some widely used activation functions are listed in Table 1.

Table 1 Activation functions

| Function name | Function equation | Function derivative |
| --- | --- | --- |
| Sigmoid | $f(x) = \dfrac{1}{1 + e^{-x}}$ | $f'(x) = f(x)(1 - f(x))$ |
| Hyperbolic tangent | $f(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ | $f'(x) = 1 - f(x)^2$ |
| Soft sign | $f(x) = \dfrac{x}{1 + \lvert x \rvert}$ | $f'(x) = \dfrac{1}{(1 + \lvert x \rvert)^2}$ |
| Rectified Linear Unit (ReLU) | $f(x) = \max(0, x)$ | $f'(x) = 1$ if $x > 0$, else $0$ |
| Leaky ReLU | $f(x) = x$ if $x > 0$, else $\alpha x$ (small fixed $\alpha$) | $f'(x) = 1$ if $x > 0$, else $\alpha$ |
| Parameterized ReLU (PReLU) | Same as leaky ReLU; the difference is that $\alpha$ can be learned from training data via backpropagation | |
| Randomized leaky ReLU | Leaky ReLU with $\alpha$ sampled randomly during training | |
| Soft plus | $f(x) = \ln(1 + e^{x})$ | $f'(x) = \dfrac{1}{1 + e^{-x}}$ |
| Exponential Linear Unit (ELU) | $f(x) = x$ if $x > 0$, else $\alpha(e^{x} - 1)$ | $f'(x) = 1$ if $x > 0$, else $\alpha e^{x}$ |
| Scaled Exponential Linear Unit (SELU) | $f(x) = \lambda x$ if $x > 0$, else $\lambda \alpha (e^{x} - 1)$ | |
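A few of the functions from Table 1, with their derivatives, written out in NumPy (the α values shown are common defaults, not values prescribed by the table):

```python
# Selected activation functions and derivatives from Table 1, in NumPy.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # f'(x) = f(x)(1 - f(x))

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2  # f'(x) = 1 - f(x)^2

def relu(x):
    return np.maximum(0.0, x)

def d_relu(x):
    return (x > 0).astype(float)

def leaky_relu(x, alpha=0.01):    # alpha: common default, not prescribed
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def softplus(x):
    return np.log1p(np.exp(x))
```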

Deep learning

Deep learning is a subset of the machine learning field that deals with the development of deep neural networks, inspired by biological neural networks in the human brain.

Autoencoder

The autoencoder (AE) [ 128 ] is one of the deep learning models that exemplifies the principle of unsupervised representation learning, as depicted in Fig. 4a. An AE is useful when the input data has far more unlabeled samples than labeled ones. The AE encodes the input $x$ into a lower-dimensional space $z$; the encoded representation is then decoded back to an approximate reconstruction $x'$ of the input $x$ through one hidden layer $z$.

Fig. 4 (a) Autoencoder [ 187 ]; (b) Restricted Boltzmann machine with n hidden and m visible units [ 88 ]; (c) Deep belief networks [ 88 ]

Basic AE consists of three main steps:

Encode: Convert the input vector $x \in \mathbb{R}^m$ into the hidden representation $h \in \mathbb{R}^n$ by $h = f(Wx + b)$, where $W \in \mathbb{R}^{n \times m}$ and $b \in \mathbb{R}^n$. Here $m$ and $n$ are the dimensions of the input vector and the hidden state; the dimension of the hidden layer $h$ is smaller than that of $x$, and $f$ is an activation function.

Decode: From $h$, reconstruct the input vector as $z = f'(W'h + b')$, where $W' \in \mathbb{R}^{m \times n}$ and $b' \in \mathbb{R}^m$; $f'$ is an activation function as above.

Calculate the squared error: $L_{\text{recons}}(x, z) = \lVert x - z \rVert^2$, which is the reconstruction error cost function (2). Reconstruction error minimization is achieved by optimizing this cost function.

Another unsupervised representation model is the Stacked Autoencoder (SAE). The SAE comprises stacks of autoencoder layers mounted on top of each other, where the output of each layer is wired to the input of the next. The Denoising Autoencoder (DAE), introduced by Vincent et al. [ 159 ], is trained to reconstruct the input from a copy corrupted with random noise. The Variational Autoencoder (VAE) [ 66 ] modifies the encoder so that the latent vector space used to represent the images follows a unit Gaussian distribution. This model has two losses: a mean squared error and the Kullback-Leibler divergence loss, which measures how closely the latent variables match the unit Gaussian. Sparse autoencoders [ 106 ] and variational autoencoders have applications in unsupervised learning, semi-supervised learning, and segmentation.
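A minimal sketch of the encode/decode/reconstruction-loss recipe above, in PyTorch; the layer sizes and sigmoid activations are illustrative, and weights are not tied:

```python
# Minimal autoencoder sketch following the three steps above
# (illustrative sizes: m = 784 inputs, n = 32 hidden units).
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, m: int = 784, n: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(m, n), nn.Sigmoid())  # h = f(Wx + b)
        self.decoder = nn.Sequential(nn.Linear(n, m), nn.Sigmoid())  # z = f'(W'h + b')

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.rand(16, 784)                      # stand-in input batch
loss = nn.functional.mse_loss(model(x), x)   # L_recons(x, z) = ||x - z||^2
loss.backward()
```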

Restricted Boltzmann machine

A Restricted Boltzmann Machine (RBM) is a Markov Random Field (MRF) associated with a two-layer undirected probabilistic generative model, as shown in Fig. 4b. An RBM contains visible (input) units $v$ and hidden (output) units $h$. A significant feature of this model is that there are no direct connections between any two visible units or between any two hidden units. In a binary RBM, the random variables take values $(v, h) \in \{0, 1\}^{m+n}$. Like the general Boltzmann machine [ 50 ], the RBM is an energy-based model. The energy of the state $\{v, h\}$ is defined as (3):

$E(v, h) = -\sum_{j=1}^{m} b_j v_j - \sum_{i=1}^{n} c_i h_i - \sum_{i=1}^{n} \sum_{j=1}^{m} v_j w_{ij} h_i$

where $v_j$, $h_i$ are the binary states of visible unit $j \in \{1, 2, \ldots, m\}$ and hidden unit $i \in \{1, 2, \ldots, n\}$, $b_j$, $c_i$ are the biases of the visible and hidden units, and $w_{ij}$ is the symmetric interaction term between the units $v_j$ and $h_i$. The joint probability of $(v, h)$ is given by the Gibbs distribution in Eq. (4):

$p(v, h) = \frac{1}{Z} e^{-E(v, h)}$

$Z$ is a partition function obtained by summing over all possible pairs of visible $v$ and hidden $h$ (5):

$Z = \sum_{v, h} e^{-E(v, h)}$

Because there are no intra-layer connections, the conditional distributions $p(h \mid v)$ and $p(v \mid h)$ factorize (6):

$p(h \mid v) = \prod_{i=1}^{n} p(h_i \mid v), \qquad p(v \mid h) = \prod_{j=1}^{m} p(v_j \mid h)$

For a binary RBM, the conditional distributions of the hidden and visible units are given by (7) and (8):

$p(h_i = 1 \mid v) = \sigma\!\left(c_i + \sum_{j=1}^{m} w_{ij} v_j\right), \qquad p(v_j = 1 \mid h) = \sigma\!\left(b_j + \sum_{i=1}^{n} w_{ij} h_i\right)$

where $\sigma(\cdot)$ is the sigmoid function.

The RBM parameters $(w_{ij}, b_j, c_i)$ are efficiently learned using the contrastive divergence method [ 150 ]. A batch version of k-step contrastive divergence learning (CD-k) is given in the algorithm below [ 36 ].

[Algorithm: batch k-step contrastive divergence (CD-k) learning]
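In lieu of the algorithm figure, here is a NumPy sketch of a single CD-k update with k = 1, following the conditionals in Eqs. (7) and (8); the batch size, unit counts, and learning rate are illustrative:

```python
# NumPy sketch of one CD-1 update for a binary RBM, using the
# conditionals p(h=1|v) and p(v=1|h) from Eqs. (7)-(8).
import numpy as np

rng = np.random.default_rng(0)
m, n, eta = 6, 3, 0.1                      # visible units, hidden units, rate
W = rng.normal(0, 0.01, (m, n))            # w_ij
b, c = np.zeros(m), np.zeros(n)            # visible and hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

v0 = rng.integers(0, 2, (8, m)).astype(float)   # a batch of binary data

# Positive phase: p(h=1 | v0), then sample h0.
ph0 = sigmoid(c + v0 @ W)
h0 = (rng.random(ph0.shape) < ph0).astype(float)
# Negative phase (k = 1): reconstruct v, then recompute p(h=1 | v1).
pv1 = sigmoid(b + h0 @ W.T)
v1 = (rng.random(pv1.shape) < pv1).astype(float)
ph1 = sigmoid(c + v1 @ W)
# Parameter updates from the difference of data and model statistics.
W += eta * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
b += eta * (v0 - v1).mean(axis=0)
c += eta * (ph0 - ph1).mean(axis=0)
```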

Deep belief networks

The Deep Belief Networks (DBN) proposed by Hinton et al. [ 51 ] is a non-convolution model that can extract features and learn a deep hierarchical representation of training data. DBNs are generative models constructed by stacking multiple RBMs. DBN is a hybrid model, the first two layers are like RBM, and the rest of the layers form a directed generative model. A DBN has one visible layer v and a series of hidden layers h (1) , h (2) , …, h ( l ) as shown in Fig. ​ Fig.4c. 4c . The DBN model joint distribution between the observed units v and the l  hidden layers h k (  k  = 1, … l ) as (9)

where $v = h^{(0)}$, and $P(h^{(k)} \mid h^{(k+1)})$ is the conditional distribution (10) of the units in layer $k$ given the units in layer $k+1$:

$$P(h_i^{(k)} = 1 \mid h^{(k+1)}) = \sigma\Big(b_i^{(k)} + \sum_{j} w_{ji}^{(k+1)} h_j^{(k+1)}\Big)$$

A DBN has $l$ weight matrices $W^{(1)}, \ldots, W^{(l)}$ and $l + 1$ bias vectors $b^{(0)}, \ldots, b^{(l)}$. $P(h^{(l-1)}, h^{(l)})$ is the joint distribution of the top-level RBM (11):

$$P(h^{(l-1)}, h^{(l)}) \propto \exp\big(b^{(l-1)\top} h^{(l-1)} + b^{(l)\top} h^{(l)} + h^{(l-1)\top} W^{(l)} h^{(l)}\big)$$

The probability distribution of the DBN is then obtained by marginalizing out the hidden layers, Eq. (12):

$$P(v) = \sum_{h^{(1)}, \ldots, h^{(l)}} P(v, h^{(1)}, \ldots, h^{(l)})$$

Convolutional neural networks (CNN)

Within neural networks, the CNN is a unique family of deep learning models and a major artificial visual network for identifying patterns in medical images. The CNN family primarily emerged from studies of the animal visual cortex [55, 116]. The major problem with a fully connected feed-forward neural network is that, even for shallow architectures, the number of neurons can be very high, which makes such networks impractical for image applications. The CNN reduces the number of parameters, allowing a network to be deeper with fewer parameters.

CNNs are designed around three architectural ideas: shared weights, local receptive fields, and spatial sub-sampling [70]. The essential element of the CNN is the handling of unstructured data through the convolution operation. Convolving an input signal $x(t)$ with a filter signal $h(t)$ creates an output signal $y(t)$ that may reveal more information than the input signal itself. The 1D convolution of discrete signals $x(t)$ and $h(t)$ is (13)

$$y(t) = x(t) * h(t) = \sum_{\tau=-\infty}^{\infty} x(\tau)\, h(t - \tau)$$

A digital image $x(n_1, n_2)$ is a 2-D discrete signal. The convolution of images $x(n_1, n_2)$ and $h(n_1, n_2)$ is (14)

$$y(n_1, n_2) = \sum_{k_1=0}^{M-1} \sum_{k_2=0}^{N-1} x(k_1, k_2)\, h(n_1 - k_1, n_2 - k_2)$$

where $0 \le n_1 \le M - 1$ and $0 \le n_2 \le N - 1$.

The function of the convolution layer is to detect local features $x^l$ from the input feature maps $x^{l-1}$ using kernels $k^l$ via the convolution operation (*), i.e., $x^{l-1} * k^l$. This convolution operation is repeated for every convolutional layer, subject to a non-linear transform (15)

$$x_n^l = f\Big(\sum_{m \in M^{l-1}} x_m^{l-1} * k_{mn}^l + b_n^l\Big)$$

where $k_{mn}^l$ represents the weights between feature map $m$ at layer $l-1$ and feature map $n$ at layer $l$, $x_m^{l-1}$ is the $m$-th feature map of layer $l-1$, $x_n^l$ is the $n$-th feature map of layer $l$, $b_n^l$ is the bias parameter, $f(\cdot)$ is the non-linear activation function, and $M^{l-1}$ denotes the set of feature maps at layer $l-1$. CNNs significantly reduce the number of parameters compared with fully connected neural networks because of local connectivity and weight sharing. Depth, zero-padding, and stride are the three hyperparameters that control the volume of the convolution layer output.
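To see how depth, zero-padding, and stride determine the output volume, consider this small PyTorch sketch; all sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Input: one grayscale image of assumed size 1 x 28 x 28 (batch, channels, H, W).
x = torch.rand(1, 1, 28, 28)

# depth (out_channels), zero-padding, and stride control the output volume:
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, stride=2, padding=1)
y = conv(x)
# Output spatial size: floor((28 + 2*1 - 3) / 2) + 1 = 14
print(y.shape)  # torch.Size([1, 8, 14, 14])
```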

A pooling layer comes after the convolutional layer to subsample the feature maps. The goal of pooling layers is to achieve spatial invariance by reducing the spatial dimension of the feature maps passed to the next convolution layer. Max pooling and average pooling are the two commonly used pooling operations for downsampling. Let the pooling region be of size $M \times M$ with elements $x_j$, $j = 1, \ldots, M \times M$, and let the pooled output be $x_i$. Max pooling and average pooling are described by Eqs. (16) and (17):

$$x_i = \max_{1 \le j \le M \times M} x_j \qquad (16)$$

$$x_i = \frac{1}{M \times M} \sum_{j=1}^{M \times M} x_j \qquad (17)$$

The max-pooling method keeps the most superior invariant feature in a pooling region, which preserves texture information and can lead to faster convergence; the average pooling method takes the average of all features in the pooling region and is said to keep background information [133]. Spatial pyramid pooling [48], stochastic pooling [175], Def-pooling [109], multi-activation pooling [189], and detail-preserving pooling [130] are other pooling techniques in the literature. A fully connected layer is used at the end of the CNN model and behaves like a traditional neural network [174]: after the pooling layers, the feature maps of the previous layer are flattened into a vector of numbers, which this layer maps to an N-dimensional output vector, where N is the number of classes.
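A tiny PyTorch illustration of Eqs. (16) and (17) on a single 2 x 2 pooling region (the values are chosen arbitrarily):

```python
import torch
import torch.nn as nn

x = torch.tensor([[[[1., 3.], [2., 4.]]]])  # one 2x2 feature map (batch, channel, H, W)
print(nn.MaxPool2d(2)(x))   # tensor([[[[4.0000]]]]) - keeps the strongest activation (texture)
print(nn.AvgPool2d(2)(x))   # tensor([[[[2.5000]]]]) - keeps the mean (background information)
```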

The first successful seven-layer CNN, LeNet-5, was developed by Yann LeCun for handwritten digit recognition. Krizhevsky et al. [68] proposed AlexNet, a deep convolutional neural network composed of 5 convolutional and 3 fully connected layers. AlexNet replaced the sigmoid activation function with the ReLU activation function to make model training easier.

K. Simonyan and A. Zisserman invented VGG-16 [143], which has 13 convolutional and 3 fully connected layers. The Visual Geometry Group (VGG) released a series of CNNs: VGG-11, VGG-13, VGG-16, and VGG-19. The group's main intention was to understand how the depth of convolutional networks affects the accuracy of image classification and recognition models. The largest, VGG-19, has 16 convolutional layers and 3 fully connected layers, while the smallest, VGG-11, has 8 convolutional layers and 3 fully connected layers; the three fully connected layers are the same across the VGG variants.

Szegedy et al. [151] proposed GoogleNet, an image classification network consisting of 22 layers. The main idea behind GoogleNet is the inception module: each inception module convolves the input in parallel with different filter sizes. Kaiming He et al. [49] proposed the ResNet architecture, which (in its 34-layer form) has 33 convolutional layers and one fully connected layer. Many earlier models applied the principle of stacking many hidden layers into extremely deep networks, but it was then realized that such models suffer from the vanishing or exploding gradient problem. To mitigate the vanishing gradient problem, skip layers (shortcut connections) were introduced. DenseNet, developed by Gao Huang et al. [54], consists of several dense blocks with transition blocks placed between adjacent dense blocks; a dense block layer consists of batch normalization followed by a ReLU and a 3 × 3 convolution, while the transition blocks are made of batch normalization, a 1 × 1 convolution, and average pooling.
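A minimal sketch of the shortcut-connection idea behind ResNet; the channel count is an assumption and this is the commonly used "basic block" pattern rather than the exact architecture of [49]:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic ResNet-style block: output = ReLU(F(x) + x); the shortcut bypasses the convolutions."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)  # shortcut connection lets gradients flow past the conv stack

block = ResidualBlock(16)
print(block(torch.rand(1, 16, 32, 32)).shape)  # torch.Size([1, 16, 32, 32])
```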

Compared to state-of-the-art handcrafted feature detectors, CNNs are an efficient technique for detecting object features and achieving good classification performance. CNNs have drawbacks, however: the unique relationships, size, perspective, and orientation of features are not taken into account. To overcome the loss of information caused by pooling operations in CNNs, Capsule Networks (CapsNet) are used to retain spatial information and the most significant features [129]. Special neurons called capsules can efficiently detect distinct information. A capsule network consists of four main components: matrix multiplication, scalar weighting of the input, the dynamic routing algorithm, and the squashing function.

Recurrent neural networks (RNN)

RNNs are a class of neural networks used for processing sequential information. The structure of the RNN shown in Fig. 5a resembles that of a feed-forward neural network (FFNN), the difference being that recurrent connections are introduced among the hidden nodes. In a generic RNN at time $t$, the hidden unit $h_t$ receives activation from the current input $x_t$ and the previous hidden state $h_{t-1}$, and the output $y_t$ is calculated from the hidden state $h_t$. This can be represented by Eqs. (18) and (19):

$$h_t = f(w_{hx} x_t + w_{hh} h_{t-1} + b_h) \qquad (18)$$

$$y_t = f(w_{yh} h_t + b_y) \qquad (19)$$
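A minimal NumPy sketch of the recurrence in Eqs. (18) and (19); the dimensions and the choice of tanh for $f$ are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, d_out = 4, 8, 3                       # assumed dimensions
w_hx = rng.normal(scale=0.1, size=(d_h, d_in))   # input -> hidden
w_hh = rng.normal(scale=0.1, size=(d_h, d_h))    # hidden -> hidden (recurrent)
w_yh = rng.normal(scale=0.1, size=(d_out, d_h))  # hidden -> output
b_h, b_y = np.zeros(d_h), np.zeros(d_out)

h = np.zeros(d_h)                                # initial hidden state
for x_t in rng.normal(size=(5, d_in)):           # a length-5 input sequence
    h = np.tanh(w_hx @ x_t + w_hh @ h + b_h)     # Eq. (18), with f = tanh
    y = np.tanh(w_yh @ h + b_y)                  # Eq. (19)
print(y)                                         # output after the final time step
```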

Fig. 5 a Recurrent Neural Networks [163] b Long Short-Term Memory [163] c Generative Adversarial Networks [64]

Here $f$ is a non-linear activation function, $w_{hx}$ is the weight matrix between the input and hidden layers, $w_{hh}$ is the matrix of recurrent weights between the hidden layer and itself, $w_{yh}$ is the weight matrix between the hidden and output layers, and $b_h$ and $b_y$ are biases that allow each node to learn an offset. While the RNN is a simple and efficient model, in practice it is unfortunately difficult to train properly. The Real-Time Recurrent Learning (RTRL) algorithm [173] and Back-Propagation Through Time (BPTT) [170] are used to train RNNs, but training with these methods frequently fails because of the vanishing (multiplication of many small values) or exploding (multiplication of many large values) gradient problem [10, 112]. Hochreiter and Schmidhuber (1997) designed a new RNN model named Long Short-Term Memory (LSTM) that overcomes error backflow problems with the aid of a specially designed memory cell [52]. Figure 5b shows an LSTM cell, which is typically configured with three gates: an input gate $g_t$, a forget gate $f_t$, and an output gate $o_t$; these gates add or remove information from the cell.

An LSTM can be represented by the following Eqs. (20) to (25).
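The equation bodies for (20) to (25) did not survive extraction; a common formulation consistent with the gate names above (input gate $g_t$, forget gate $f_t$, output gate $o_t$, memory cell $c_t$) is:

```latex
\begin{aligned}
g_t &= \sigma(W_{gx} x_t + W_{gh} h_{t-1} + b_g) && \text{(20) input gate}\\
f_t &= \sigma(W_{fx} x_t + W_{fh} h_{t-1} + b_f) && \text{(21) forget gate}\\
o_t &= \sigma(W_{ox} x_t + W_{oh} h_{t-1} + b_o) && \text{(22) output gate}\\
\tilde{c}_t &= \tanh(W_{cx} x_t + W_{ch} h_{t-1} + b_c) && \text{(23) candidate cell}\\
c_t &= f_t \odot c_{t-1} + g_t \odot \tilde{c}_t && \text{(24) cell update}\\
h_t &= o_t \odot \tanh(c_t) && \text{(25) hidden state}
\end{aligned}
```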

Generative adversarial networks (GAN)

In the field of deep learning, one of the main deep generative models is the Generative Adversarial Network (GAN), introduced by Goodfellow in [43]. GANs are neural networks that can generate synthetic images that closely imitate the original images. In a GAN, shown in Fig. 5c, two neural networks, a generator and a discriminator, are trained simultaneously. The generator $G$ generates counterfeit data samples that aim to "fool" the discriminator $D$, while the discriminator attempts to correctly distinguish true from false samples. In mathematical terms, $D$ and $G$ play a two-player minimax game with the cost function (26) [64]:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

Here $x$ represents an original image and $z$ is a noise vector of random numbers; $p_{data}(x)$ and $p_z(z)$ are the probability distributions of $x$ and $z$, respectively. $D(x)$ represents the probability that $x$ comes from the actual data $p_{data}(x)$ rather than the generated data, and $1 - D(G(z))$ is the probability that a sample was generated from $p_z(z)$. The expectation over $x$ from the real data distribution is $\mathbb{E}_{x \sim p_{data}(x)}$, and the expectation over $z$ sampled from noise is $\mathbb{E}_{z \sim p_z(z)}$. The training goal for the discriminator is to maximize the objective, while the training objective for the generator is to minimize the term $\log(1 - D(G(z)))$. GANs are most used in medical image analysis for data augmentation (generating new data) and image-to-image translation [107]. Trustability of the generated data, unstable training, and evaluation of generated data are three major drawbacks of GANs that might hinder their acceptance in the medical community [183].
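A compact PyTorch sketch of one alternating training step for the minimax game in Eq. (26); the toy two-dimensional data, network sizes, and optimizers are assumptions for illustration:

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))                 # generator: z -> fake sample
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())   # discriminator: sample -> D(x)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

x_real = torch.randn(64, 2) * 0.5 + 3.0   # stand-in for samples from p_data(x)
z = torch.randn(64, 8)                    # noise vector z ~ p_z(z)

# Discriminator step: maximize log D(x) + log(1 - D(G(z))).
d_loss = bce(D(x_real), torch.ones(64, 1)) + bce(D(G(z).detach()), torch.zeros(64, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: minimize log(1 - D(G(z))) (here via the equivalent "fool D" form).
g_loss = bce(D(G(z)), torch.ones(64, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```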

Ronneberger et al. [126] proposed the CNN-based U-Net architecture for segmentation of biomedical image data. The architecture consists of a contracting path (left side) to capture context and a symmetric expansive path (right side) that enables precise localization. U-Net is a general-purpose DLA also used for quantification tasks such as cell detection and shape measurement in medical image data [34].

Software frameworks

There are several software frameworks available for implementing DLAs, which are regularly updated as new approaches and ideas emerge. DLAs encapsulate many levels of mathematical principles based on probability, linear algebra, calculus, and numerical computation. Several deep learning frameworks exist, such as Theano, TensorFlow, Caffe, CNTK, Torch, Neon, and pylearn [138]. Globally, Python is probably the most commonly used programming language for DL, and PyTorch and TensorFlow were the most widely used libraries for research in 2019. Table 2 analyzes various deep learning frameworks by core language and supported interface languages.

Comparison of various Deep Learning Frameworks

| Framework | Core language | Interface provided |
|---|---|---|
| Caffe | C++ | Python, MATLAB, C++ |
| CNTK | C++ | C++, Python, BrainScript |
| Chainer | Python | |
| DL4j | Java | Java, Python, Scala |
| MXNet | C++ | Python, R, Scala, Perl, Julia, C++, etc. |
| MatConvNet | MATLAB | |
| TensorFlow | C++ | |
| Theano | Python | Python |
| Torch | Lua | |

Use of deep learning in medical imaging

X-ray images

Chest radiography is widely used in diagnosis to detect heart pathologies and lung diseases such as tuberculosis, atelectasis, consolidation, pleural effusion, pneumothorax, and hyperinflation. X-ray images are accessible, affordable, and lower-dose compared to other imaging methods, making radiography a powerful tool for mass screening [14]. Table 3 presents a description of the DL methods used for X-ray image analysis.

An overview of the DLA for the study of X-ray images

| Reference | Dataset | Method | Application | Metrics |
|---|---|---|---|---|
| Lo et al. 1995 | | CNN | Two-layer CNN, each layer with twelve 5 × 5 filters, for lung nodule detection | ROC |
| S. Hwang et al. 2016 | KIT, MC, and Shenzhen | Deep CNN | First deep CNN-based tuberculosis screening system with a transfer learning technique | AUC |
| Rajpurkar et al. 2017 | ChestX-ray14 | CNN | Detects pneumonia from chest X-ray images using CheXNet, a 121-layer CNN | F1 score |
| Lopes & Valiati 2017 | Shenzhen and Montgomery | CNN | Comparative analysis of pre-trained CNNs as feature extractors for tuberculosis detection | Accuracy, ROC |
| Mittal et al. 2018 | JSRT | LF-SegNet | Segmentation of lung fields from CXR images using a fully convolutional encoder-decoder network | Accuracy |
| E. J. Hwang et al. 2019 | 57,481 CXR images | CNN | Deep learning-based automatic detection (DLAD) algorithm for tuberculosis detection on CXR | ROC |
| Souza et al. 2019 | Montgomery | CNN | Segmentation of lungs in CXR for detection and diagnosis of pulmonary diseases using two CNN architectures | Dice coefficient |
| Hooda et al. | Shenzhen, Montgomery, Belarus, JSRT | CNN | Ensemble of three pre-trained architectures (ResNet, AlexNet, GoogleNet) for TB detection | Accuracy, ROC |
| Xu et al. 2019 | ChestX-ray14 | CNN, CXNet-m1 | Hierarchical CNN structure for a new network, CXNet-m1, to detect anomalies in chest X-ray images | Accuracy, F1-score, AUC |
| Murphy et al. 2019 | 5565 CXR images | | Evaluation of deep learning-based CAD4TB software | ROC |
| Rajaraman and Antani 2020 | RSNA, Pediatric pneumonia, and Indiana | CNN | Ensemble of modality-specific deep learning models for tuberculosis (TB) detection from CXR | Accuracy, AUC, CI |
| Capizzi et al. 2020 | Open dataset | PNN | Fuzzy system combined with a neural network to detect low-contrast nodules | Accuracy |
| Abbas et al. 2020 | 196 X-ray images | CNN | Classification of COVID-19 CXR images using Decompose, Transfer, and Compose (DeTraC) | Accuracy, SN, SP |
| Basu et al. 2020 | 225 COVID-19 CXR images | CNN | DETL (Domain Extension Transfer Learning) method for screening of COVID-19 from CXR images | Accuracy |
| Wang & Wong 2020 | 13,975 X-ray images | CNN | COVID-Net, a deep convolutional neural network design for detection of COVID-19 cases | Accuracy, SN, PPV |
| Ozturk et al. 2020 | 127 X-ray images | CNN | DarkCovidNet model to detect and classify COVID-19 cases from X-ray images | Accuracy |
| Loey et al. 2020 | 306 X-ray images | AlexNet, GoogleNet, ResNet18 | GAN with deep transfer learning for COVID-19 detection from limited CXR images | Accuracy |
| Apostolopoulos & Mpesiana 2020 | 1427 X-ray images | CNN | Transfer learning-based CNN architectures for detection of COVID-19 | Accuracy, SN, SP |

S. Hwang et al. [57] proposed the first deep CNN-based tuberculosis screening system with a transfer learning technique. Rajaraman et al. [119] proposed modality-specific ensemble learning for the detection of abnormalities in chest X-rays (CXRs); the model predictions are combined using various ensemble techniques to minimize prediction variance, and class-selective relevance mapping (CRM) is used to visualize the abnormal regions in the CXR images. Loey et al. [90] proposed a GAN with deep transfer learning for COVID-19 detection in CXR images; the GAN was used to generate additional CXR images given the scarcity of COVID-19 data. Waheed et al. [160] proposed CovidGAN, a model based on the Auxiliary Classifier Generative Adversarial Network (ACGAN), to produce synthetic CXR images for COVID-19 detection. S. Rajaraman and S. Antani [120] introduced weakly labeled data augmentation to enlarge the training dataset and improve COVID-19 detection performance in CXR images.

Computerized tomography (CT)

CT uses computers and rotating X-ray equipment to create cross-sectional images of the body. CT scans show the soft tissues, blood vessels, and bones in different parts of the body. CT has high detection ability, reveals small lesions, and provides a more detailed assessment. CT examinations are frequently used for pulmonary nodule identification [93], and the detection of malignant pulmonary nodules is fundamental to the early diagnosis of lung cancer [102, 142]. Table 4 summarizes the latest deep learning developments in CT image analysis.

A review of articles that use DL techniques for the analysis of CT images

| Reference | Dataset | Method | Application | Metrics |
|---|---|---|---|---|
| Van Ginneken 2015 | LIDC (865 CT scans) | CNN | Nodule detection in chest CT with pre-trained CNN models applied to orthogonal patches around the candidate | FROC |
| Li et al. 2016 | LIDC database | CNN | Nodule classification with a 2D CNN that processes small patches around a nodule | SN, FP/exam, Accuracy |
| Setio et al. 2016 | LIDC-IDRI, ANODE09 | Multi-view ConvNet | CNN-based algorithm for pulmonary nodule detection with 9 patches per candidate | Sensitivity, FROC |
| Shin et al. 2016 | ILD dataset | CNN | Interstitial lung disease (ILD) classification and lymph node (LN) detection using transfer learning-based CNNs | AUC |
| Qiang, Yan et al. 2017 | Independent dataset | Deep SDAE-ELM | Discriminative features of nodules in CT and PET images combined using a fusion method for nodule classification | SN, SP, AUC |
| Onishi Y et al. 2019 | Independent dataset | CNN | CNN trained by Wasserstein GAN for pulmonary nodule classification | SN, SP, AUC, Accuracy |
| Li et al. 2018 | 2017 LiTS, 3DIRCADb dataset | H-DenseUNet | H-DenseUNet for tumor and liver segmentation from CT volumes | DICE |
| Pezeshk et al. 2018 | LIDC | 3D FCN and 3D CNN | 3D FCN for nodule candidate generation and 3D CNN for reducing the false-positive rate | FROC |
| Balagourouchetty et al. 2019 | 634 liver CT images | GoogLeNet-based FCNet classifier | Liver lesion classification using a GoogLeNet-based ensemble FCNet classifier | Accuracy, ROC |
| Y. Wang et al. 2019 | Independent dataset | Faster R-CNN and ResNet | Intelligent Imaging Layout System (IILS) for detection and classification of pulmonary nodules | SN, SP, AUC, Accuracy |
| Pang et al. 2020 | Shandong Provincial Hospital | CNN (DenseNet) | Classification of lung cancer type from CT images using the DenseNet network | Accuracy |
| Masood et al. 2020 | LIDC | mRFCN | Lung nodule classification and detection using an mRFCN-based automated decision support system | SN, SP, AUC, Accuracy |
| Zhao and Zeng 2019 | KiTS19 challenge | 3D U-Net | Multi-scale supervised 3D U-Net to simultaneously segment kidney and kidney tumors from CT images | DICE, Recall, Accuracy, Precision |
| Fan et al. 2020 | COVID-19 infection dataset | Inf-Net | COVID-19 lung CT infection segmentation network | DICE, SN, SP, MAE |
| Li et al. 2020 | 4356 chest CT images | COVNet | COVID-19 detection neural network (COVNet) for recognizing COVID-19 from volumetric chest CT exams | AUC, SN, SP |

AUC: area under ROC curve; FROC: area under the free-response ROC curve; SN: sensitivity; SP: specificity; MAE: mean absolute error; LIDC: Lung Image Database Consortium; LIDC-IDRI: Lung Image Database Consortium-Image Database Resource Initiative.

Li et al. 2016 [74] proposed a deep CNN for the detection of three types of nodules: semisolid, solid, and ground-glass opacity. Balagourouchetty et al. [5] proposed a GoogLeNet-based ensemble FCNet classifier for liver lesion classification, in which the basic GoogLeNet architecture is modified in three ways for feature extraction. Masood et al. [95] proposed the multidimensional Region-based Fully Convolutional Network (mRFCN) for lung nodule detection/classification and achieved a classification accuracy of 97.91%; in lung nodule detection, the future work is the detection of micronodules (less than 3 mm) without loss of sensitivity or accuracy. Zhao and Zeng 2019 [190] proposed a DLA based on a supervised multi-scale supervised (MSS) U-Net and a 3D U-Net to automatically segment kidneys and kidney tumors from CT images. In the present pandemic situation, Fan et al. [35] and Li et al. [79] used deep learning-based techniques for COVID-19 detection from CT images.

Mammography (MG)

Breast cancer is one of the world's leading causes of cancer death among women. MG is a reliable tool and the most common modality for early detection of breast cancer. MG is a low-dose X-ray imaging method used to visualize the breast structure for the detection of breast diseases [40]. Detecting breast cancer on screening mammography is a difficult image classification task because tumors constitute only a small part of the breast image. Analyzing breast lesions from MG involves three steps: detection, segmentation, and classification [139].

The automatic classification and detection of masses at an early stage in MG is still a hot subject of research. Over the past decade, DLAs have made significant progress on breast cancer detection and classification problems. Table 5 summarizes the latest DLA developments in mammogram image analysis.

Summary of DLA for MG image analysis

| Reference | Dataset | Method | Application | Metrics |
|---|---|---|---|---|
| Sahiner et al. 1996 | Manually extracted ROIs from 168 mammograms | CNN | CNN for classification of masses and normal tissue on MG | ROC, TP, FP |
| Fonseca et al. 2015 | | CNN | CNN for feature extraction combined with an SVM classifier for breast density estimation | Accuracy |
| Huynh et al. 2016 | 607 digital MG images (219 breast lesions) | CNN | Pre-trained CNN models (MG-CNN) for mass classification | AUC |
| Wang et al. 2017 | 840 standard screening FFDMs | Deep CNN | Detection of cardiovascular disease based on vessel calcification | FROC |
| Geras et al. 2017 | 129,208 screening mammogram images | MV-CNN | Multi-view deep CNN for breast cancer screening and the effect of image resolution on prediction accuracy | Accuracy, ROC, TP, FP |
| Zhang et al. 2017 | 3000 MG images | CNN | Data augmentation and transfer learning methods with a CNN for classification | ROC |
| Wu et al. 2017 | 200,000 breast cancer screening exams | DCN | Deep CNN for breast density classification | AUC |
| Kyono et al. 2018 | Private dataset of 8162 patients | MAMMO-CNN | MAMMO, a novel multi-view CNN with multi-task learning (MTL): a clinical decision support system capable of triaging MG | Accuracy |
| Lehman et al. | 41,479 mammogram images | ResNet-18 | Deep learning-based CNN for mammographic breast density classification | Accuracy |
| Kim et al. 2018 | 29,107 digital MG (24,765 normal and 4339 cancer cases) | DIB-MG | DIB-MG, a weakly supervised approach that learns radiologic features without any human annotations | SN, SP, Accuracy |
| Ribli et al. 2018 | DDSM (2620), INbreast (115), private database | Faster R-CNN, VGG16 | CNN detects and classifies malignant or benign lesions on MG images | AUC |
| Chougrad et al. 2018 | MIAS, DDSM, INbreast, BCDR | VGG16, ResNet50, InceptionV3 | Transfer learning and fine-tuning strategy-based CNN to classify MG mass lesions | AUC, Accuracy |
| Karthik et al. 2018 | WBCD | DNN-RFS | Deep neural network (DNN) as a classifier model for breast cancer data | Accuracy, Precision, SP, SN, F-score |
| Cai et al. 2019 | 990 MG images: 540 malignant masses and 450 benign lesions | DCNN | Deep CNN for microcalcification discrimination for breast cancer screening | Accuracy, Precision, SP, AUC, SN |
| Wu et al. 2019 | 1,000,000 images | DCNN | CNN-based breast cancer screening classifier | AUC |
| Conant et al. 2019 | 12,000 cases, including 4000 biopsy-proven cancers | DCNN | Deep CNN-based system that detects soft-tissue and calcific lesions in DBT images | AUC |
| Rodriguez-Ruiz et al. 2019 | 9000 cancer cases and 180,000 normal cases; radiologists | DCNN | CNN-based CAD system | AUC |
| Ionescu et al. 2019 | Private dataset | CNN | Breast density estimation and risk scoring | |

MIAS: Mammographic Image Analysis Society dataset; DDSM: Digital Database for Screening Mammography; BI-RADS: Breast Imaging Reporting and Data System; WBCD: Wisconsin Breast Cancer Dataset; DIB-MG: data-driven imaging biomarker in mammography; FFDMs: full-field digital mammograms; MAMMO: Man and Machine Mammography Oracle; FROC: free-response receiver operating characteristic analysis; SN: sensitivity; SP: specificity.

Fonseca et al. [37] proposed breast composition classification according to the ACR standard, based on a CNN for feature extraction. Wang et al. [161] proposed a twelve-layer CNN to detect breast arterial calcifications (BACs) in mammogram images for risk assessment of coronary artery disease. Ribli et al. [124] developed a CAD system based on Faster R-CNN for the detection and classification of benign and malignant lesions on mammogram images without any human involvement. Wu et al. [176] present a deep CNN trained and evaluated on over 1,000,000 mammogram images for breast cancer screening exam classification. Conant et al. [26] developed a deep CNN-based AI system to detect calcified and soft-tissue lesions in digital breast tomosynthesis (DBT) images. Kang et al. [62] introduced the fuzzy fully connected layer (FFCL) architecture, which primarily fuses fuzzy rules with a traditional CNN for semantic BI-RADS scoring; the proposed FFCL framework achieved superior results in BI-RADS scoring for both triple and multi-class classification.

Histopathology

Histopathology is the study of human tissue mounted on glass slides under a microscope to identify diseases such as kidney cancer, lung cancer, and breast cancer. Staining is used in histopathology to visualize and highlight specific parts of the tissue [45]. For example, Hematoxylin and Eosin (H&E) staining gives a dark purple color to the nucleus and a pink color to other structures. The H&E stain has played a key role in the diagnosis of different pathologies and in cancer diagnosis and grading over the last century. The most recent imaging modality in this field is digital pathology.

Deep learning is emerging as an effective method for the analysis of histopathology images, including nucleus detection, image classification, cell segmentation, and tissue segmentation [178]. Tables 6 and 7 summarize the latest deep learning developments in pathology. The most recent development in digital pathology image analysis is whole slide imaging (WSI), which digitizes glass slides with stained tissue sections at high resolution. Dimitriou et al. [30] reviewed the challenges of analyzing multi-gigabyte WSI images for building deep learning models. A. Serag et al. [135] discuss various public "Grand Challenges" that have driven innovations using DLA in computational pathology.

Summary of articles using DLA for digital pathology image - Organ segmentation

| Reference | Staining/Image modality | Method | Application | Dataset | Metrics |
|---|---|---|---|---|---|
| Ronneberger et al. 2015 | EM | U-Net architecture with deformation augmentation | Segmentation of neuronal structures; cell segmentation | ISBI cell tracking challenge 2014 and 2015 | Warping, Rand, Pixel Error |
| Song et al. 2016 | Pap, H&E | Multi-scale CNN model | Segmentation of cervical cells in Pap smear images | ISBI 2015 Challenge, Shenzhen University (SZU) dataset | Dice coefficient |
| Xing et al. 2016 | IHC, H&E | CNN and sparse shape model | Nuclei segmentation | Private set containing brain tumor (31), pancreatic NET (22), and breast cancer (35) images | |
| Chen et al. 2017 | H&E | Multi-task learning framework with contour-aware FCN model for instance segmentation | Deep contour-aware CNN segmentation of colon glands | GlaS challenge (165 images), MICCAI 2015 nucleus segmentation challenge (33 images) | Dice coefficient |
| Van Eycke et al. 2018 | H&E | Integration of DCAN, U-Net, and ResNet models | Segmentation of glandular epithelium in H&E and IHC staining images | GlaS challenge (165 images) and a private set of colorectal tissue microarray images | F1-score, object Dice coefficient |
| Liang et al. 2018 | H&E | Patch-based FCN + iterative learning approach | First application of deep learning to gastric tumor segmentation | 2017 China Big Data and AI challenge (1900 images) | Mean IoU, mean accuracy |
| Qu et al. 2019 | H&E | FCN trained with perceptual loss | Jointly classifies and segments various types of nuclei from histopathology images | 40 tissue images of lung adenocarcinoma (private set) | F1-score, Dice coefficient, accuracy |
| Pinckaers and Litjens 2019 | H&E | Incorporating NODE in U-Net to allow an adaptive receptive field | Segmentation of colon glands | GlaS challenge (165 images) | Object Dice, F1 score |
| Gadermayr et al. 2019 | Stain agnostic | CycleGAN + U-Net segmentation | Multi-domain unsupervised segmentation of objects of interest in WSIs | 23 PAS, 6 AFOG, 6 Col3 & 6 CD31 WSIs | F1 score |
| Sun et al. 2019 | H&E | Multi-scale modules and specific convolutional operations | Deep learning architecture for gastric cancer segmentation | 500 pathological images of gastric areas with cancerous regions | |

Summary of articles using DLA for digital pathology image - Detection and classification of disease

| Reference | Staining/Image modality | Method | Application | Dataset |
|---|---|---|---|---|
| Xu et al. 2016 | H&E | Stacked sparse autoencoders | Nucleus detection from breast cancer histopathology images | 537 H&E images from Case Western Reserve University |
| Coudray et al. 2018 | H&E | Patch-based Inception-V3 model | Classification of lung cancer histopathology images into LUAD, LUSC, or normal lung tissue | FFPE sections (140), frozen sections (98), and lung biopsies (102) |
| Song et al. 2018 | H&E | Deep autoencoder | Simultaneous detection and classification of cells in bone marrow histology images | |
| Yi et al. 2018 | H&E | FCN | Microvessel prediction in H&E-stained pathology images | 38 images from lung adenocarcinoma (ADC) patients |
| Bulten and Litjens 2018 | H&E, IHC | Self-clustering convolutional adversarial autoencoders | Classification of the prostate into tumor vs. non-tumor | 94 registered WSIs from Radboud University Medical Center |
| Valkonen et al. 2019 | ER, PR, Ki-67 | Fine-tuning a partially pre-trained CNN | Recognition of epithelial cells in breast cancers stained for ER, PR, and Ki-67 | Digital Pan CK (152 invasive breast cancer images) |
| Wei et al. 2019 | H&E | ResNet-18-based patch classifier | Classification of histologic subtypes of lung adenocarcinoma | 143 WSIs (private set) |
| Wang et al. 2019 | H&E | Patch-based FCN with context-aware block selection + feature aggregation strategy | Lung cancer image classification | Private (939 WSIs), TCGA (500 WSIs) |
| Li et al. 2019 | H&E | FCN trained with a concentric loss on weakly annotated centroid labels | Mitosis detection in breast histopathology images | ICPR12 (50 images), ICPR14 (1696 images), AMIDA13 (606 images), TUPAC16 (107 images) |
| Tabibu et al. 2019 | H&E | Pre-trained ResNet-based patch classifier | Classification of renal cell carcinoma subtypes and survival prediction | TCGA (2093 WSIs) |
| Lin et al. 2019 | H&E | Fast ScanNet: FCN-based model | Automatic detection of breast cancer metastases from whole-slide images | 2016 Camelyon Grand Challenge (400 WSIs) |

NODE: Neural Ordinary Differential Equations; IoU: mean Intersection over Union coefficient.

Other images

Endoscopy is the insertion of a long, nonsurgical tube directly into the body for the detailed visual examination of an internal organ or tissue. Endoscopy is beneficial in studying several systems inside the human body, such as the gastrointestinal tract, the respiratory tract, the urinary tract, and the female reproductive tract [60, 101]. Du et al. [31] reviewed the applications of deep learning in the analysis of gastrointestinal endoscopy images. Wireless capsule endoscopy (WCE) is a revolutionary device for the direct, painless, and non-invasive inspection of the gastrointestinal (GI) tract to detect and diagnose GI diseases such as ulcers and bleeding. Soffer et al. [145] performed a systematic analysis of the existing literature on the implementation of deep learning in WCE. The first deep learning-based framework for the detection of hookworm in WCE images was proposed by He et al. [46]: two CNNs (one for edge extraction and one for hookworm classification) are integrated to detect hookworm, with the edge extraction network used for tubular region detection, since tubular structures are crucial elements for hookworm detection. Yoon et al. [185] developed a CNN model for early gastric cancer (EGC) identification and prediction of invasion depth; the depth of tumor invasion in EGC is a significant factor in deciding the method of treatment. For the classification of endoscopic images as EGC or non-EGC, the authors employed a VGG-16 model. Nakagawa et al. [105] applied a CNN-based DL technique to enhance the diagnostic assessment of oesophageal wall invasion using endoscopy. J. Choi et al. [22] discuss future aspects of DL in endoscopy.

Positron Emission Tomography (PET) is a nuclear imaging tool in which particular radioactive tracers are injected to visualize molecular-level activities within tissues. T. Wang et al. [168] reviewed applications of machine learning in PET attenuation correction (PET AC) and low-count PET reconstruction, and discussed the advantages of deep learning over conventional machine learning in PET imaging applications. A. J. Reader et al. [123] reviewed PET image reconstruction in which deep learning is used either directly or as part of traditional reconstruction methods.

The primary purpose of this paper is to review the numerous publications in the field of deep learning applications to medical images. Classification, detection, and segmentation are the essential tasks in medical image processing [144]. For specific deep learning tasks in medical applications, training deep neural networks requires a large amount of labeled data, yet in the medical field datasets with even thousands of labeled examples are often unavailable. This issue is alleviated by a technique called transfer learning, of which two approaches are popular and widely applied: fixed feature extraction and fine-tuning of a pre-trained network (a sketch of both is given below). In classification, deep learning models are used to assign images to two or more classes. In detection, deep learning models identify tumors and organs in medical images. In segmentation, deep learning models delineate the region of interest in medical images for further processing.
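The two transfer-learning approaches just mentioned differ in essentially one step; here is a hedged PyTorch/torchvision sketch (ResNet-18, the two-class head, and the learning rate are illustrative choices, and the `weights=` argument assumes a recent torchvision version):

```python
import torch
import torch.nn as nn
from torchvision import models

# Approach 1: fixed feature extractor - freeze the pre-trained backbone.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False                        # backbone weights stay fixed
model.fc = nn.Linear(model.fc.in_features, 2)      # new trainable head, e.g. benign vs malignant

# Approach 2: fine-tuning - keep all weights trainable, but use a small learning rate.
model_ft = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model_ft.fc = nn.Linear(model_ft.fc.in_features, 2)
optimizer = torch.optim.Adam(model_ft.parameters(), lr=1e-4)  # small lr preserves pre-trained features
```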

Segmentation

Deep learning has been widely used for medical image segmentation, and several articles have documented its progress in the area. Segmentation of breast tissue using deep learning alone has been successfully implemented [104]. Xing et al. [179] used a CNN to acquire the initial shape of the nucleus and then isolated the actual nucleus using a deformable model. Qu et al. [118] suggested a deep learning approach that can segment individual nuclei and classify them as tumor, lymphocyte, or stroma nuclei. Pinckaers and Litjens [115] show on the colon gland segmentation dataset (GlaS) that Neural Ordinary Differential Equations (NODEs) can be used within the U-Net framework to obtain better segmentation results. Sun 2019 [149] developed a deep learning architecture for gastric cancer segmentation that shows the advantage of combining multi-scale modules with specific convolution operations. As Figure 6 illustrates, U-Net is the most commonly used network for segmentation.

Fig. 6 U-Net architecture for segmentation, comprising encoder (downsampling) and decoder (upsampling) sections [135]
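A minimal sketch of the U-Net pattern in Fig. 6, reduced to one level for brevity: a contracting path, a bottleneck, and an expansive path joined by a skip connection (channel counts and sizes are assumptions):

```python
import torch
import torch.nn as nn

def block(cin, cout):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(),
    )

class TinyUNet(nn.Module):
    """One-level U-Net: contracting path, bottleneck, expansive path with a skip connection."""
    def __init__(self, in_ch=1, classes=2):
        super().__init__()
        self.enc = block(in_ch, 16)
        self.down = nn.MaxPool2d(2)
        self.mid = block(16, 32)
        self.up = nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2)
        self.dec = block(32, 16)           # 32 = 16 upsampled + 16 skip, concatenated
        self.head = nn.Conv2d(16, classes, 1)

    def forward(self, x):
        e = self.enc(x)                    # contracting path captures context
        m = self.mid(self.down(e))         # bottleneck
        u = self.up(m)                     # expansive path restores resolution
        return self.head(self.dec(torch.cat([u, e], dim=1)))  # skip enables precise localization

net = TinyUNet()
print(net(torch.rand(1, 1, 64, 64)).shape)  # torch.Size([1, 2, 64, 64])
```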

The main challenge posed by lesion detection methods is that they can give rise to multiple false positives while missing a good proportion of true positives. Deep learning methods have been applied to tuberculosis detection in [53, 57, 58, 91, 119], and pulmonary nodule detection using deep learning has been successfully applied in [82, 108, 136, 157].

Shin et al. [141] discussed the effect of pre-trained CNN architectures and transfer learning on the identification of enlarged thoracoabdominal lymph nodes and the diagnosis of interstitial lung disease on CT scans, and considered transfer learning to be helpful despite the fact that natural images differ from medical images. Litjens et al. [85] introduced a CNN for the identification of prostate cancer in biopsy specimens and of breast cancer metastases in sentinel lymph nodes; the CNN has four convolution layers for feature extraction and three classification layers. Ribli et al. [124] proposed a Faster R-CNN model that detects lesions on mammograms and classifies them as benign or malignant, which finished second in the Digital Mammography DREAM Challenge. Figure 7 shows a VGG-style CNN architecture for detection.

Fig. 7 CNN architecture for detection [144]

An object detection framework named Clustering CNN (CLU-CNNs) was proposed by Z. Li et al. [76] for medical images. CLU-CNNs use Agglomerative Nesting Clustering Filtering (ANCF) and a BN-IN Net to avoid the heavy computation costs of medical images. Image saliency detection aims at locating the most eye-catching regions in a given scene [21, 78]; in different applications it also acts as a pre-processing tool, including video saliency detection [17, 18], object recognition, and object tracking [20]. Saliency maps are a commonly used tool for determining which areas of the input image are most important to the prediction of a trained CNN [92]. N. T. Arun et al. [4] evaluated the performance of several popular saliency methods on the RSNA Pneumonia Detection dataset and found that Grad-CAM was sensitive to the model parameters and model architecture.

Classification

In classification tasks, deep learning techniques based on CNNs have seen several advancements. The success of CNNs in image classification has led researchers to investigate their usefulness as a diagnostic method for identifying and characterizing pulmonary nodules in CT images. The classification of lung nodules using deep learning [74, 108, 117, 141] has also been successfully implemented.

Breast parenchymal density is an important indicator of breast cancer risk, and DL algorithms for density assessment can significantly reduce the radiologist's burden. Breast density classification using DL has been successfully implemented [37, 59, 72, 177]. Ionescu et al. [59] introduced a CNN-based method to predict the Visual Analog Score (VAS) for breast density estimation. Figure 8 shows an AlexNet-style CNN architecture for classification.

Fig. 8 CNN architecture for classification [144]

Alcoholism, or alcohol use disorder (AUD), has effects on the brain whose structure can be observed using neuroimaging. S. H. Wang et al. [162] proposed a 10-layer CNN for the AUD problem using dropout, batch normalization, and PReLU techniques, obtaining a sensitivity of 97.73%, a specificity of 97.69%, and an accuracy of 97.71%. Cerebral microbleeds (CMBs) are small chronic brain hemorrhages that can result in cognitive impairment, long-term disability, and neurologic dysfunction, so early-stage identification of CMBs for prompt treatment is essential. S. Wang et al. [164] proposed a transfer learning-based DenseNet to detect CMBs; the DenseNet-based model attained an accuracy of 97.71%.

Limitations and challenges

The application of deep learning algorithms to medical imaging is fascinating, but many challenges are holding back progress. One limitation to the adoption of DL in medical image analysis is inconsistency in the data itself (resolution, contrast, signal-to-noise), typically caused by clinical acquisition procedures [113]; the non-standardized acquisition of medical images is another limitation. The need for comprehensive medical image annotations also limits the applicability of deep learning. The major challenge is limited data: compared to other domains, the sharing of medical data is incredibly complicated, and medical data privacy is both a sociological and a technological issue that needs to be discussed from both viewpoints. Building a DLA requires a large amount of annotated data, and annotating medical images is another major challenge, since labeling requires radiologists' domain knowledge and is therefore time-consuming. Semi-supervised learning could be implemented to make combined use of the existing labeled data and the vast unlabeled data to alleviate the issue of limited labeled data. Another way to resolve data scarcity is to develop few-shot learning algorithms that use considerably smaller amounts of data. Despite the successes of DL technology, there are many restrictions and obstacles in the medical field. Whether DL can reduce medical costs, increase medical efficiency, and improve patient satisfaction has not yet been adequately verified; in clinical trials, it is still necessary to demonstrate the efficacy of deep learning methods and to develop guidelines for medical image analysis applications of deep learning.

Conclusion and future directions

Medical imaging is a primary source of the information necessary for clinical decisions. This paper discusses the new algorithms and strategies in the area of deep learning. This brief introduction to DLA in medical image analysis has two objectives: the first is to introduce the field of deep learning and its associated theory, and the second is to provide a general overview of medical image analysis using DLA. It began with the history of neural networks since 1940 and ended with breakthroughs in medical applications of recent DL algorithms. Several supervised and unsupervised DL algorithms were first discussed, including autoencoders, recurrent networks, CNNs, and restricted Boltzmann machines, along with optimization techniques and frameworks such as Caffe, TensorFlow, Theano, and PyTorch. After that, the most successful DL methods were reviewed across various medical image applications, including classification, detection, and segmentation. Applications of the RBM network are rarely published in the medical image analysis literature, whereas in classification and detection, CNN-based models have achieved good results and are most commonly used. Several existing solutions to medical challenges are available; however, several issues in medical image processing still need to be addressed with deep learning. Many current DL implementations are supervised algorithms, while the field is slowly moving toward unsupervised and semi-supervised learning to manage real-world data without manual human labels.

DLA can support clinical decision-making for next-generation radiologists. DLA can automate radiologist workflow and facilitate decision-making for inexperienced radiologists; it is intended to aid physicians by automatically identifying and classifying lesions to provide a more precise diagnosis, helping to minimize medical errors and increase efficiency in medical image analysis. DL-based automated diagnostic results from medical images will be widely used for patient treatment in the next few decades; therefore, physicians and scientists should seek the best ways to provide better patient care with the help of DLA. A potential future research direction for medical image analysis is the design of deep neural network architectures, since improvements in network structure have a direct impact on medical image analysis; manual design of DL model structures requires rich knowledge, so neural architecture search will probably replace manual design [73]. The design of various activation functions is another meaningful research direction. Radiation therapy is crucial for cancer treatment, and different medical imaging modalities play a critical role in treatment planning. Radiomics is defined as the extraction of high-throughput features from medical images [28]; in the future, deep-learning analysis of radiomics will be a promising tool in clinical research for clinical diagnosis, drug development, and treatment selection for cancer patients. Owing to limited annotated medical data, unsupervised, weakly supervised, and reinforcement learning methods are emerging research areas in DL for medical image analysis. Overall, deep learning, a new and fast-growing field, offers various obstacles as well as opportunities and solutions for a range of medical image applications.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Muralikrishna Puttagunta, Email: murali93940@gmail.com

S. Ravi, Email: sravicite@gmail.com

The Overview of Medical Image Processing Based on Deep Learning

  • Conference paper
  • First Online: 15 August 2021


  • Qing An 39 ,
  • Bo Jiang 39 &
  • Jupu Yuan 39  

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 784))

Included in the following conference series:

  • International Conference on Medical Imaging and Computer-Aided Diagnosis

927 Accesses

1 Citation

With the rapid development of artificial intelligence technology, deep learning is being applied to the field of medical image analysis. This paper summarizes the deep learning models relevant to medical image analysis and the results of applying these models to medical image classification, detection, segmentation, and registration. It specifically covers image analysis tasks involving the nerve, retina, lung, digital pathology, breast, and musculoskeletal system, among others. Finally, it summarizes the current state of deep learning research related to medical image analysis and discusses the challenges and directions of future research.




Author information

Authors and affiliations.

Wuchang University of Technology, No. 16 of Jiang Xia Avenue, Wuhan, 430223, Hubei, China

Qing An, Bo Jiang & Jupu Yuan




Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper.

An, Q., Jiang, B., Yuan, J. (2022). The Overview of Medical Image Processing Based on Deep Learning. In: Su, R., Zhang, YD., Liu, H. (eds) Proceedings of 2021 International Conference on Medical Imaging and Computer-Aided Diagnosis (MICAD 2021). MICAD 2021. Lecture Notes in Electrical Engineering, vol 784. Springer, Singapore. https://doi.org/10.1007/978-981-16-3880-0_43


DOI : https://doi.org/10.1007/978-981-16-3880-0_43

Published : 15 August 2021

Publisher Name : Springer, Singapore

Print ISBN : 978-981-16-3879-4

Online ISBN : 978-981-16-3880-0




Research on Defect Diagnosis of Transmission Lines Based on Multi-Strategy Image Processing and Improved Deep Network


1. Introduction
2. Multi-Strategy Image Processing
  2.1. Image Enhancement Based on Wavelet Denoising
  2.2. Multi-Threshold Segmentation Based on HSV Color Space
  2.3. Extraction of Transmission Line Regions Based on Morphological Processing
3. Detection of Transmission Line Defects
  3.2. GoogLeNet
  3.3. Focal Loss
  3.4. Proposed Method
4. Experimental Results and Analysis
  4.1. Dataset Introduction
  4.2. Results and Analysis
5. Conclusions
Author Contributions
Data Availability Statement
Conflicts of Interest

References

  • Li, Y.; Liu, M.; Li, Z.; Jiang, X. CSSAdet: Real-time end-to-end small object detection for power transmission line inspection. IEEE Trans. Power Deliv. 2023, 38, 4432–4442.
  • Liu, Z.; Wu, G.; He, W.; Fan, F.; Ye, X. Key target and defect detection of high-voltage power transmission lines with deep learning. Int. J. Electr. Power Energy Syst. 2022, 142, 108277.
  • Mishra, D.; Ray, P. Fault detection, location and classification of a transmission line. Neural Comput. Appl. 2018, 30, 1377–1424.
  • Chen, K.; Hu, J.; He, J. Detection and classification of transmission line faults based on unsupervised feature learning and convolutional sparse autoencoder. IEEE Trans. Smart Grid 2016, 9, 1748–1758.
  • Wang, Y.; Li, Q.; Chen, B. Image classification towards transmission line fault detection via learning deep quality-aware fine-grained categorization. J. Vis. Commun. Image Represent. 2019, 64, 102647.
  • Zheng, X.; Jia, R.; Gong, L.; Zhang, G.; Dang, J. Component identification and defect detection in transmission lines based on deep learning. J. Intell. Fuzzy Syst. 2021, 40, 3147–3158.
  • Deng, F.; Zeng, Z.; Mao, W.; Wei, B.; Li, Z. A novel transmission line defect detection method based on adaptive federated learning. IEEE Trans. Instrum. Meas. 2023, 72, 3508412.
  • Komoda, M.; Kawashima, T.; Minemura, M.; Mineyama, A.; Aihara, M.; Ebinuma, Y.; Kiuchi, M. Electromagnetic induction method for detecting and locating flaws on overhead transmission lines. IEEE Trans. Power Deliv. 1990, 5, 1484–1490.
  • Cheng, H.; Zhai, Y.; Chen, R.; Wang, D.; Dong, Z.; Wang, Y. Self-shattering defect detection of glass insulators based on spatial features. Energies 2019, 12, 543.
  • Yuan, C.; Xie, C.; Li, L.; Zhang, F.; Gubanski, S. Ultrasonic phased array detection of internal defects in composite insulators. IEEE Trans. Power Deliv. 2016, 23, 525–531.
  • Xiao, Y.; Xiong, L.; Zhang, Z.; Dan, Y. A novel defect detection method for overhead ground wire. Sensors 2023, 24, 192.
  • Fu, W.; Yang, K.; Wen, B.; Shan, Y.; Li, S.; Zheng, B. Rotating machinery fault diagnosis with limited multisensor fusion samples by fused attention-guided Wasserstein GAN. Symmetry 2024, 16, 285.
  • Liao, W.; Fu, W.; Yang, K.; Tan, C. Multi-scale residual neural network with enhanced gated recurrent unit for fault diagnosis of rolling bearing. Meas. Sci. Technol. 2024, 35, 056114.
  • Ni, H.; Wang, M.; Zhao, L. An improved Faster R-CNN for defect recognition of key components of transmission line. Math. Biosci. Eng. 2021, 18, 4679–4695.
  • Chen, Y.; Wang, H.; Shen, J.; Zhang, X.; Gao, X. Application of data-driven iterative learning algorithm in transmission line defect detection. Sci. Program. 2021, 2021, 9976209.
  • Fu, Q.; Liu, J.; Zhang, X.; Zhang, Y.; Ou, Y.; Jiao, R.; Mazzanti, G. A small-sized defect detection method for overhead transmission lines based on convolutional neural networks. IEEE Trans. Instrum. Meas. 2023, 72, 3524612.
  • Yu, Z.; Lei, Y.; Shen, F.; Zhou, S.; Yuan, Y. Research on identification and detection of transmission line insulator defects based on a lightweight YOLOv5 network. Remote Sens. 2023, 15, 4552.
  • Bhadra, A.B.; Hasan, K.; Islam, S.S.; Sarker, N.; Tama, I.J.; Khan, S.M. Robust short-circuit fault analysis scheme for overhead transmission line. In Proceedings of the IEEE International Women in Engineering (WIE) Conference on Electrical and Computer Engineering (WIECON-ECE), Dhaka, Bangladesh, 4–5 December 2021; pp. 104–107.
  • Shen, T.; Liang, X.; Zhang, B.; Yang, G.; Li, D.; Zu, J.; Pan, S. Transmission line safety early warning technology based on multi-source data perception. In Proceedings of the IEEE 2nd International Conference on Big Data & Artificial Intelligence & Software Engineering (ICBASE), Zhuhai, China, 24–26 September 2021; pp. 261–264.
  • Wang, Y.; Liang, Y.; Wang, J.; Zhang, S. Image improvement in the wavelet domain for optical coherence tomograms. J. Innov. Opt. Health Sci. 2021, 4, 73–78.
  • Song, Q.; Ma, L.; Cao, J.; Han, X. Image denoising based on mean filter and wavelet transform. In Proceedings of the IEEE 4th International Conference on Advanced Information Technology and Sensor Application (AITS), Harbin, China, 21–23 August 2015; pp. 39–42.
  • Kurniastuti, I.; Wulan, T.D.; Andini, A. Color feature extraction of fingernail image based on HSV color space as early detection risk of diabetes mellitus. In Proceedings of the IEEE International Conference on Computer Science, Information Technology, and Electrical Engineering (ICOMITEE), Banyuwangi, Indonesia, 27–28 October 2021; pp. 51–55.
  • Popuri, A.; Miller, J. Generative adversarial networks in image generation and recognition. In Proceedings of the IEEE International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 13–15 December 2023; pp. 1294–1297.
  • Doi, K.; Iwasaki, A. The effect of focal loss in semantic segmentation of high resolution aerial image. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 6919–6922.
  • Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680.
  • Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. Int. Conf. Mach. Learn. 2017, 70, 214–223.
  • Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A. Improved training of Wasserstein GANs. Adv. Neural Inf. Process. Syst. 2017, 30, 5767–5777.
  • Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
  • LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
  • Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105.
  • Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
  • Howard, A.; Zhmoginov, A.; Chen, L.C.; Sandler, M.; Zhu, M. Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520.
  • Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.


Dataset composition (number of images per defect class):

Defect Class    Normal    Loose    Broken
Train           600       586      294
Test            60        60       60
Label           0         1        2

Classification performance comparison:

Method             Accuracy (%)    Recall (%)    F1-Score (%)    Loss
AlexNet            86.34           90.11         89.10           0.20
DenseNet           94.01           97.20         97.37           0.16
MobileNet-V2       91.76           96.97         96.68           0.19
Proposed method    97.83           97.81         97.78           0.04
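
For reference, the Accuracy, Recall, and F1 scores in the table follow the standard definitions below, computed from per-class true/false positives and negatives. How the three classes are averaged (macro vs. micro) is not stated in this excerpt, so macro-averaging is an assumption:

\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP},
\]
\[
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}.
\]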

Share and Cite

Gou, M.; Tang, H.; Song, L.; Chen, Z.; Yan, X.; Zeng, X.; Fu, W. Research on Defect Diagnosis of Transmission Lines Based on Multi-Strategy Image Processing and Improved Deep Network. Processes 2024, 12, 1832. https://doi.org/10.3390/pr12091832



RELATED RESEARCH ON MEDICAL IMAGE PROCESSING

  1. Medical image analysis based on deep learning approach

Medical imaging plays a significant role in different clinical applications such as medical procedures used for early detection, monitoring, diagnosis, and treatment evaluation of various medical conditions. Basics of the principles and implementations of artificial neural networks and deep learning are essential for understanding medical image analysis in computer vision. Deep Learning ...

  2. A cognitive deep learning approach for medical image processing

    This paper presents CoDLRBVS, a pioneering cognitive-based deep learning model for medical image processing, namely retinal blood vessel segmentation. Our approach combines (1) a Matched Filter to ...

  3. How Artificial Intelligence Is Shaping Medical Imaging Technology: A

    The described innovations have been crucial in advancing the state of the art in medical image processing, covering machine learning tasks, such as classification, segmentation, synthesis (image or video), detection, and captioning [34,45]. By enhancing the model's ability to focus on relevant information and understand complex relationships ...

  4. Medical image analysis using deep learning algorithms

Medical image processing is an area of research that encompasses the creation and application of algorithms and methods to analyze and decipher medical images. ... The paper concluded by discussing future research directions in this area, including the integration of multi-omics data, network analysis, and deep learning techniques. ...

  5. The Constantly Evolving Role of Medical Image Processing in Oncology

    In this paper, it is argued that the evolution of medical image processing has been a gradual process, and the diverse factors that contributed to unprecedented progress in the field with the use of AI are explained. ... During the last decades CAD-driven precision diagnosis has been the holy grail of medical image processing research efforts ...

  6. Segment anything in medical images

    During data pre-processing, we obtained 1,570,263 medical image-mask pairs for model development and validation. For internal validation, we randomly split the dataset into 80%, 10%, and 10% as ...

  7. A review on deep learning in medical image analysis

    In the field of medical image processing methods and analysis, fundamental information and state-of-the-art approaches with deep learning are presented in this paper. The primary goals of this paper are to present research on medical image processing as well as to define and implement the key guidelines that are identified and addressed.

  8. Medical Image Analysis for Detection, Treatment and Planning of Disease

In this paper, a framework for the segmentation of X-ray images using artificial ... Medical image processing for disease prediction is a complex and iterative process that requires expertise in both medical imaging and machine learning. Proper evaluation and ... In the research by Saygılı [9], it was examined how ...

  9. Advances in Computer-Aided Medical Image Processing

    The primary objective of this study is to provide an extensive review of deep learning techniques for medical image recognition, highlighting their potential for improving diagnostic accuracy and efficiency. We systematically organize the paper by first discussing the characteristics and challenges of medical imaging techniques, with a particular focus on magnetic resonance imaging (MRI) and ...

  10. Critical Analysis of the Current Medical Image-Based Processing ...

    Medical image processing and analysis techniques play a significant role in diagnosing diseases. Thus, during the last decade, several noteworthy improvements in medical diagnostics have been made based on medical image processing techniques. In this article, we reviewed articles published in the most important journals and conferences that used or proposed medical image analysis techniques to ...

  11. Recent Advances in Medical Image Processing

Key Message: In this paper, we will review recent advances in artificial intelligence, machine learning, and deep convolutional neural networks, focusing on their applications in medical image processing. To illustrate with a concrete example, we discuss in detail the architecture of a convolutional neural network through visualization to help ...

  12. Research in Medical Imaging Using Image Processing Techniques

Image processing techniques were founded in the 1960s. Those techniques were used in different fields such as space exploration, clinical purposes, the arts, and TV image improvement. In the 1970s, with the ...

  13. Medical images classification using deep learning: a survey

This paper discusses the different evaluation metrics used in medical imaging classification. It provides a conclusion and future directions in the field of medical image processing using deep learning. This is the outline of the survey paper: in Section 2, medical image analysis is discussed in terms of its applications.

  14. Medical image segmentation using deep learning: A survey

IET Image Processing journal publishes the latest research in image and video processing, covering the generation, processing and communication of visual information. Deep learning has been widely used for medical image segmentation, and a large number of papers have been presented recording the success of deep learning in the field.

  15. Viewpoints on Medical Image Processing: From Science to Application

Medical image processing provides core innovation for medical imaging. This paper focuses on recent developments from science to applications, analyzing the past fifteen years of the proceedings of the German annual meeting on medical image processing (BVM). Furthermore, some members of the program committee present their ...

  16. Frontiers

The origin of radiology can be seen as the beginning of medical image processing. The discovery of X-rays by Röntgen and its successful application in clinical practice ended the era of disease diagnosis relying solely on the clinical experience of doctors (Glasser, 1995). The production of medical images provides doctors with more data, enabling them to diagnose and treat ...


  18. BDCC

Federated learning is an emerging technology that enables the decentralised training of machine learning-based methods for medical image analysis across multiple sites while ensuring privacy. This review paper thoroughly examines federated learning research applied to medical image analysis, outlining technical contributions. We followed the guidelines of Okoli and Schabram, a review ...

  19. Medical Image Processing

This paper focuses on one of the recent breakthroughs in the field of deep learning, the Generative Adversarial Network (GAN) (Goodfellow et al., 2014). ... Medical image processing and research are a critical part of study and prognosis using magnetic resonance imaging (MRI). It is used in the study of the brain's anatomical structure, in which ...

  20. Artificial Intelligence Techniques for Medical Image ...

Medical imaging is crucial for diagnosing and monitoring diseases, utilizing various modalities such as X-ray, MRI, computed tomography (CT), and ultrasound. Each modality is selected based on factors such as acquisition speed, image resolution, and patient comfort. Following the acquisition of medical images, healthcare professionals conduct a thorough analysis to detect diseases and assess ...

  21. Medical image processing: A review

With the advent of computer-aided technologies, image processing techniques have become increasingly important in a wide variety of medical applications. A balance between preserving useful diagnostic information and suppressing noise must be struck in medical images. Image denoising is a relevant issue found in diverse image processing and computer vision problems. There are various ...

  22. Deep learning and medical image processing for coronavirus (COVID-19

Motivated by this fact, a large number of research works were proposed and developed in the initial months of 2020. In this paper, we first focus on summarizing the state-of-the-art research works related to deep learning applications for COVID-19 medical image processing.

  23. [2408.15224] SAM & SAM 2 in 3D Slicer: SegmentWithSAM Extension for

    Creating annotations for 3D medical data is time-consuming and often requires highly specialized expertise. Various tools have been implemented to aid this process. Segment Anything Model 2 (SAM 2) offers a general-purpose prompt-based segmentation algorithm designed to annotate videos. In this paper, we adapt this model to the annotation of 3D medical images and offer our implementation in ...

  24. Technologies

Medical imaging (MI) [1] utilizes various technologies to produce images of the human body's internal structures and functions [2]. Healthcare professionals (HPs) [3] use these medical images for four purposes: diagnosis [4], treatment planning [5], monitoring [6], and research. Firstly, the HPs utilize medical images to identify ...

  25. Medical image analysis based on deep learning approach

    Deep Learning Approach (DLA) in medical image analysis emerges as a fast-growing research field. DLA has been widely used in medical imaging to detect the presence or absence of the disease. This paper presents the development of artificial neural networks, comprehensive analysis of DLA, which delivers promising medical imaging applications.

  26. The Overview of Medical Image Processing Based on Deep Learning

    This paper summarizes the deep learning models related to medical image analysis, and the application results of these models in medical image classification, detection, segmentation and registration. It specifically involves the image analysis tasks of nerve, retina, lung, digital pathology, breast, musculoskeletal and other aspects.

  27. Processes

    The current manual inspection of transmission line images captured by unmanned aerial vehicles (UAVs) is not only time-consuming and labor-intensive but also prone to high rates of false detections and missed inspections. With the development of artificial intelligence, deep learning-based image recognition methods can automatically detect various defect categories of transmission lines based ...
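
The "improved deep network" referred to in this abstract pairs a GoogLeNet-style backbone with focal loss (Sections 3.2–3.3 in the outline above). As a hedged sketch of the loss component only, a multi-class focal loss can be written in PyTorch as follows; the gamma and alpha values are common defaults from Lin et al. (2017), not the paper's reported settings.

```python
# Minimal multi-class focal loss sketch (after Lin et al., 2017).
# gamma/alpha are illustrative defaults, not the paper's settings.
import torch
import torch.nn.functional as F


def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               gamma: float = 2.0, alpha: float = 0.25) -> torch.Tensor:
    """logits: (N, C) raw class scores; targets: (N,) integer class labels."""
    log_p = F.log_softmax(logits, dim=-1)                       # (N, C)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)   # log prob of true class
    pt = log_pt.exp()
    # (1 - pt)^gamma down-weights well-classified (easy) examples.
    return (-alpha * (1.0 - pt) ** gamma * log_pt).mean()


# Example: three defect classes (normal / loose / broken), as in the dataset table.
logits = torch.randn(8, 3)
targets = torch.randint(0, 3, (8,))
print(focal_loss(logits, targets))
```

Down-weighting easy examples matters here because the training split shown earlier is imbalanced (600/586/294 images per class).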