Multi-Phase CNN-ViT-Wavelet Fusion with Attention for Robust Kidney Stone Detection from Ultrasound Images
DOI:
https://doi.org/10.65718/inspireAI.2026.1008
Keywords:
Kidney Stone Detection, Ultrasound Imaging, Convolutional Neural Network (CNN), Vision Transformer (ViT), Wavelet Transform, Attention Mechanism, Feature Fusion, External Validation
Abstract
Accurate detection of kidney stones in ultrasound images is hindered by low contrast, speckle noise, and operator-dependent variability. This study develops a robust multi-phase model that integrates local, global, and frequency-domain features for reliable and generalizable kidney stone classification. Three convolutional neural networks (ResNet50, DenseNet121, and EfficientNet-B0) and three vision transformer variants (ViT-Base, Swin-Transformer, and DeiT-Small) were independently trained on a primary renal ultrasound dataset. The dataset was split patient-wise in a 70/15/15 ratio, and extensive data augmentation, normalization, and five-fold cross-validation were applied to avoid data leakage and ensure the model's robustness. Features from the best-performing CNN and ViT were fused at the feature level with an attention mechanism and classified using machine learning classifiers, among which XGBoost performed best. A discrete wavelet transform (DWT) branch was then incorporated to capture complementary frequency information and further enhance discriminative capability. On the internal dataset, the multi-phase framework achieved an accuracy of 97.9%, an F1-score of 97.8%, and an AUC of 0.997. On the external renal ultrasound dataset, it achieved an accuracy of 94.3%, an F1-score of 94.0%, and an AUC of 0.970. These results demonstrate robust generalization, supported by Grad-CAM and attention maps. The Multi-Phase Framework (MPF) offers a consistent, generalizable, and fully automated method for kidney stone detection in ultrasound images, supporting improved diagnostic performance.
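The pipeline summarized in the abstract (patient-wise splitting, a wavelet branch, and attention-weighted feature fusion) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the subset names train/val/test, the Haar wavelet with sub-band energies as features, softmax weighting over fixed branch scores as the attention form, and all function names, feature sizes, and random stand-in data. In practice the CNN and ViT features would come from the trained backbones, and the fused vector would be passed to a classifier such as XGBoost.

```python
import numpy as np

def patient_wise_split(patient_ids, fractions=(0.70, 0.15, 0.15), seed=0):
    """Assign whole patients to subsets so no patient appears in two subsets."""
    patient_ids = np.asarray(patient_ids)
    unique = np.unique(patient_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(unique)
    n_train = int(round(fractions[0] * len(unique)))
    n_val = int(round(fractions[1] * len(unique)))
    groups = {
        "train": unique[:n_train],
        "val": unique[n_train:n_train + n_val],
        "test": unique[n_train + n_val:],
    }
    # Return image indices for each subset.
    return {name: np.flatnonzero(np.isin(patient_ids, ids))
            for name, ids in groups.items()}

def haar_dwt2(image):
    """One-level 2D Haar DWT: one approximation band and three detail bands."""
    x = np.asarray(image, dtype=float)
    x = x[: x.shape[0] // 2 * 2, : x.shape[1] // 2 * 2]  # crop to even size
    s = np.sqrt(2.0)
    lo = (x[:, 0::2] + x[:, 1::2]) / s   # low-pass along columns
    hi = (x[:, 0::2] - x[:, 1::2]) / s   # high-pass along columns
    approx = (lo[0::2] + lo[1::2]) / s
    detail_a = (lo[0::2] - lo[1::2]) / s
    detail_b = (hi[0::2] + hi[1::2]) / s
    detail_c = (hi[0::2] - hi[1::2]) / s
    return approx, detail_a, detail_b, detail_c

def attention_fuse(branches, scores):
    """Scale each branch by a softmax weight, then concatenate.

    The scores are fixed here (assumption); a trained model would learn them.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    fused = np.concatenate(
        [w * np.asarray(b, dtype=float) for w, b in zip(weights, branches)]
    )
    return fused, weights

# Hypothetical data: 20 patients with 3 images each, random stand-in features.
rng = np.random.default_rng(0)
patient_ids = np.repeat(np.arange(20), 3)
split = patient_wise_split(patient_ids)

image = rng.random((64, 64))
bands = haar_dwt2(image)
wavelet_feat = np.array([np.sum(b ** 2) for b in bands])  # sub-band energies

cnn_feat = rng.random(8)  # stand-in for CNN backbone features
vit_feat = rng.random(6)  # stand-in for ViT backbone features
fused, weights = attention_fuse([cnn_feat, vit_feat, wavelet_feat],
                                scores=[0.5, 0.3, 0.2])
```

Splitting on patient identifiers rather than on individual images is what prevents leakage: images from one patient are often highly similar, so letting them fall into different subsets would inflate the measured performance.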
License
Copyright (c) 2026 Faizan Ahmad, Uzair Ishtiaq, Malik M. Ali Shahid (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.