Multi-Phase CNN-ViT-Wavelet Fusion with Attention for Robust Kidney Stone Detection from Ultrasound Images

Authors

DOI:

https://doi.org/10.65718/inspireAI.2026.1008

Keywords:

Kidney Stone Detection, Ultrasound Imaging, Convolutional Neural Network (CNN), Vision Transformer (ViT), Wavelet Transform, Attention Mechanism, Feature Fusion, External Validation

Abstract

Accurate detection of kidney stones in ultrasound images is hindered by low contrast, speckle noise, and operator-dependent variability. The objective of this study is to develop a robust multi-phase model that integrates local, global, and frequency-domain features to enable reliable and generalizable kidney stone classification. Three convolutional neural networks (ResNet50, DenseNet121, and EfficientNet-B0) and three vision transformer variants (ViT-Base, Swin-Transformer, and DeiT-Small) were independently trained on a primary renal ultrasound dataset. The dataset was split patient-wise in a 70/15/15 ratio; extensive data augmentation, normalization, and five-fold cross-validation were applied to avoid data leakage and ensure the model's robustness. Features from the best-performing CNN and ViT were fused at the feature level with an attention mechanism and classified using machine learning classifiers, among which XGBoost performed best. Next, a discrete wavelet transform (DWT) branch was incorporated to capture complementary frequency information and further enhance discriminative capability. The multi-phase framework achieved an accuracy of 97.9%, an F1-score of 97.8%, and an AUC of 0.997 on the internal dataset. On the external renal ultrasound dataset, it obtained an accuracy of 94.3%, an F1-score of 94.0%, and an AUC of 0.970. These results indicate robust generalization, supported by Grad-CAM and attention-map visualizations. The Multi-Phase Framework (MPF) offers a consistent, generalizable, and fully automated method for kidney stone detection in ultrasound images, supporting improved diagnostic performance.
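
The patient-wise 70/15/15 split mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `patient_wise_split`, the `(patient_id, image_path)` record layout, the subset names, and the synthetic records are all assumptions made for illustration. The idea is to shuffle and partition patient identifiers rather than images, so that every image of a given patient lands in exactly one subset.

```python
import random
from collections import defaultdict


def patient_wise_split(records, ratios=(0.70, 0.15, 0.15), seed=0):
    """Split (patient_id, image_path) records so no patient spans two subsets.

    Hypothetical helper: the ratios are applied to patients, not to images.
    """
    # Group image paths by patient so each patient moves as a single unit.
    by_patient = defaultdict(list)
    for patient_id, image_path in records:
        by_patient[patient_id].append(image_path)

    # Shuffle patient IDs reproducibly with a seeded generator.
    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)

    # Cut the shuffled patient list into three contiguous groups.
    n = len(patient_ids)
    n_train = round(ratios[0] * n)
    n_val = round(ratios[1] * n)
    groups = {
        "train": patient_ids[:n_train],
        "val": patient_ids[n_train:n_train + n_val],
        "test": patient_ids[n_train + n_val:],
    }

    # Expand each patient group back into its image records.
    return {
        name: [(pid, path) for pid in ids for path in by_patient[pid]]
        for name, ids in groups.items()
    }


# Synthetic example (assumed data): 20 patients with 3 images each.
records = [
    (f"P{p:03d}", f"P{p:03d}_img{i}.png") for p in range(20) for i in range(3)
]
splits = patient_wise_split(records)
for name, subset in splits.items():
    print(name, len({pid for pid, _ in subset}), "patients,", len(subset), "images")
```

Because the ratios apply to patients, the image-level proportions can drift from 70/15/15 when patients contribute different numbers of images. Any cross-validation folds would need to be built over patient groups in the same way to keep the leakage guarantee.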

Published

2026-03-19

How to Cite

Multi-Phase CNN-ViT-Wavelet Fusion with Attention for Robust Kidney Stone Detection from Ultrasound Images. (2026). Inspire Intelligence Journal, 1(2), 85-105. https://doi.org/10.65718/inspireAI.2026.1008
