A NOVEL MODIFIED RELATIVE DISCRIMINATION CRITERION FOR FEATURE RANKING IN TEXT CLASSIFICATION

Shakir Ullah; Imad Ullah; Ibad Ullah; Naseer Ullah; Muhammad Taufiq

Abstract

The exponential growth of textual data across diverse domains makes it difficult to extract actionable insights, particularly in text classification tasks. Effective feature selection is essential for mitigating the high dimensionality, sparsity, and semantic complexity inherent in text corpora. Conventional feature selection methods, designed primarily for numerical or categorical data, often underperform on text because of these characteristics. This study proposes the Modified Relative Discrimination Criterion (MRDC), a novel feature ranking approach developed specifically for text classification. The MRDC quantifies the discriminative capacity of each feature within a text corpus, enabling more robust and efficient ranking. Evaluated on standard performance metrics (accuracy, precision, recall, and F1-score), the MRDC achieved 82.12% accuracy, 82.42% precision, 82.12% recall, and an 82.16% F1-score using only 1,500 features, about 1% of the approximately 150,000 original features. In contrast, the baseline Relative Discrimination Criterion yielded a lower F1-score of 74.38% with the same number of selected features. These results indicate that the MRDC outperforms the baseline Relative Discrimination Criterion at feature selection for text classification.

Keywords

Text Classification, Feature Selection, Modified Relative Discrimination Criterion, Feature Ranking, Text Mining, Machine Learning, High-Dimensional Data, Discriminative Features