Explainability of Tuberculosis Diagnosis based on Chest X-Ray Images with Vision Transformer
DOI:
https://doi.org/10.56042/jsir.v85i1.5865
Keywords:
Deep learning, Machine learning, Medical image analysis, Self-attention, Transformer network
Abstract
Chest X-ray radiography is a relatively inexpensive and widely available diagnostic technology that can aid in identifying illnesses such as tuberculosis (TB), pneumonia, and COVID-19. The need for skilled personnel to interpret X-ray radiographs is a challenge in many health facilities across the globe, particularly in underdeveloped regions. Machine Learning (ML) algorithms have enabled the automated diagnosis of TB from X-ray images. Alongside deep convolutional neural networks (DCNN) for vision applications, the Vision Transformer (ViT) network has also produced outstanding results in image classification. Motivated by the robustness of the transformer network on image processing tasks, the study proposes a transformer-based framework for early screening of TB. Three ViT variants, ViT-Base16 (ViT-B16), ViT-Base32 (ViT-B32), and ViT-Large32 (ViT-L32), were tested to see how well they identify tuberculosis. When the transformer models' results were compared with those of CNN models, the ViT-B32 model performed well in the diagnostic task. On TB classification, the ViT-B32 model attained an accuracy, sensitivity, specificity, precision, F1 score, and AUC of 96.96%, 96.89%, 97.01%, 96.72%, 96.80%, and 0.97, respectively. The ViT-B32 model demonstrated superiority and generalizability. Because of its low cost and ease of use, the ViT-B32 model may offer an accurate diagnostic system for early screening of TB patients.
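The metrics reported in the abstract follow the standard definitions for a binary classifier, with TB taken as the positive class. The sketch below shows how they relate to confusion-matrix counts. The function name `binary_metrics` and the counts are hypothetical and chosen only for illustration; they are not taken from the study. AUC is omitted because it requires per-image prediction scores rather than counts.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute screening metrics from confusion-matrix counts (TB = positive)."""
    sensitivity = tp / (tp + fn)          # recall on TB cases
    specificity = tn / (tn + fp)          # recall on non-TB cases
    precision = tp / (tp + fp)            # fraction of TB predictions that are correct
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for illustration only; not the study's data.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In a screening setting, sensitivity is usually the metric to watch first, since a missed TB case (false negative) is costlier than a false alarm.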
License
Copyright (c) 2026 Journal of Scientific & Industrial Research (JSIR)

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.