Indigenous Knowledge Systems and Machine Learning: Evaluating the Suitability of Colon Classification
DOI:
https://doi.org/10.56042/alis.v72i4.25677
Keywords:
Annif, Colon Classification, Machine learning, Neural Network model, Retrieval metrics
Abstract
This study explores the application of machine learning (ML) techniques to automate subject classification using S.R. Ranganathan’s Colon Classification (CC, 6th edition), a faceted system rooted in traditional Indian knowledge frameworks. The research integrates the scheme with Annif, an open-source AI/ML-based subject indexing tool developed by the National Library of Finland, to predict the CC 6th edition main class of the texts in a corpus. A curated dataset of nearly 100,000 English-language bibliographic records (with CC 6th edition notation) from the Indian National Bibliography (INB) was used for model training. The study evaluates the performance of several machine learning backends, namely fastText, Omikuji (Bonsai), and Support Vector Classification (SVC), as well as two ensemble models, including a neural network-based ensemble created through hyperparameter optimization. Key retrieval metrics such as F1@5 and NDCG were used to assess model efficacy. Among the tested models, the neural network ensemble achieved the highest scores, with F1@5 = 0.5873 and NDCG = 0.9473, indicating strong performance in both prediction and ranking. The study demonstrates that machine learning can effectively support traditional classification systems when backed by well-structured data. Finally, through REST API integration, the framework enables scalable classification, allowing real-time, automated processing of large bibliographic corpora. This work bridges indigenous classification logic with modern AI, contributing to more inclusive knowledge organization systems.
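
For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how F1@5 and NDCG can be computed for a single record, assuming binary relevance (a suggested class is either in the gold set or not). The function names and the sample class codes are illustrative assumptions, not taken from the study, and the exact averaging used by the authors or by Annif may differ.

```python
import math


def f1_at_k(ranked, gold, k=5):
    """F1 between the top-k suggested labels and the gold label set."""
    top = ranked[:k]
    hits = sum(1 for label in top if label in gold)
    if hits == 0:
        return 0.0
    precision = hits / len(top)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


def ndcg(ranked, gold, k=None):
    """Binary-relevance NDCG over the ranked suggestions (optionally cut at k)."""
    top = ranked if k is None else ranked[:k]
    # Discounted gain: a hit at 0-based rank i contributes 1 / log2(i + 2).
    dcg = sum(1.0 / math.log2(i + 2) for i, label in enumerate(top) if label in gold)
    # Ideal ordering places every gold label at the top of the list.
    ideal_hits = min(len(gold), len(top))
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical record: five ranked main-class suggestions, one gold class.
suggestions = ["X", "T", "B", "2", "V"]
gold_classes = {"T"}

print(round(f1_at_k(suggestions, gold_classes), 4))
print(round(ndcg(suggestions, gold_classes), 4))
```

In this sketch F1@5 rewards placing gold classes anywhere in the top five, while NDCG additionally rewards placing them near the top of the list, which is why the two scores can differ considerably for the same model.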