SPAM GUARD PRO: A LIGHTWEIGHT REAL-TIME SMS SPAM DETECTION SYSTEM USING TF-IDF AND LOGISTIC REGRESSION WITH INTERPRETABLE FEATURE ENGINEERING

Authors

  • Umair Ayaz Kamangar
  • Abdul Sattar Chan
  • Soyam Kapoor
  • Ranjhan Ali
  • Zainab Umair Kamangar
  • Khalid Hussain

Keywords:

SMS spam detection; TF-IDF; logistic regression; NLP; text classification; feature engineering; Streamlit; machine learning

Abstract

Unsolicited SMS messages are growing rapidly in volume, and their exploitative and deceptive nature poses a serious threat to the security, private information, and finances of mobile users, which makes reliable detection essential. This paper proposes SpamGuard Pro, a lightweight yet accurate SMS spam filter that combines a Logistic Regression classifier with TF-IDF vectorization of the message text and three hand-crafted behavioral and linguistic features. The system was trained and evaluated on the well-known UCI SMS Spam Collection dataset of about 5,700 samples, achieving an accuracy of 96.7%, precision of 95.2%, recall of 94.8%, and F1 score of 95.0%. To enrich the feature set without adding complexity, we included three manually designed features: message length, number of exclamation marks, and number of capitalized words, chosen to reflect the linguistic behavior typical of spam messages. Together with the TF-IDF bag-of-words and n-gram features, these contributed most to both interpretability and performance. In addition, we deployed the whole system as a web application on Streamlit, a simple, lightweight, and popular platform that lets users classify their own messages interactively and instantly. From a comparison of three interpretable models, we find that interpretable models, and Logistic Regression in particular, remain competitive for online real-time spam detection.
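
As an illustration of the three hand-crafted features named in the abstract, the following minimal sketch computes them for a single message. It is not the authors' implementation: the function name `handcrafted_features`, the dictionary keys, and the reading of "capitalized words" as fully upper-case words of at least two letters are all assumptions made for this example.

```python
import re
from typing import Dict


def handcrafted_features(message: str) -> Dict[str, int]:
    """Compute three simple, interpretable features for one SMS message.

    Assumption: a "capitalized word" is an alphabetic token of two or more
    letters written entirely in upper case (e.g. "FREE"); the paper may
    define it differently.
    """
    # Alphabetic tokens only, so digits and punctuation are ignored.
    words = re.findall(r"[A-Za-z]+", message)
    return {
        "length": len(message),                # characters, including spaces
        "exclamations": message.count("!"),    # number of '!' characters
        "capitalized_words": sum(
            1 for w in words if len(w) > 1 and w.isupper()
        ),
    }


if __name__ == "__main__":
    print(handcrafted_features("WIN a FREE prize NOW!!!"))
```

In a full pipeline, these values would typically be scaled and concatenated with the TF-IDF vector before being passed to the classifier, which keeps each learned coefficient attributable to a named feature.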

Published

2026-05-03

How to Cite

Umair Ayaz Kamangar, Abdul Sattar Chan, Soyam Kapoor, Ranjhan Ali, Zainab Umair Kamangar, & Khalid Hussain. (2026). SPAM GUARD PRO: A LIGHTWEIGHT REAL-TIME SMS SPAM DETECTION SYSTEM USING TF-IDF AND LOGISTIC REGRESSION WITH INTERPRETABLE FEATURE ENGINEERING. Spectrum of Engineering Sciences, 4(5), 411–427. Retrieved from https://thesesjournal.com/index.php/1/article/view/2711