TRUST SCORE FRAMEWORK FOR GOVERNING AUTONOMOUS DECISION-MAKING IN AGENTIC AI CUSTOMER SERVICE SYSTEMS

Authors

  • Areesha Sami
  • Warda Nadir
  • Aqsa Saleem
  • Aatif Hussain

Keywords:

Agentic AI Systems, Large Language Models (LLMs), Autonomous Decision-Making, Human-AI Collaboration, AI Trust Framework

Abstract

As AI takes on more and more responsibility in customer service, a fundamental question arises: how do we determine when an AI response is trustworthy enough to be sent on its own, and when should a human intervene before it is sent? To address this, our paper introduces the Multi-Dimensional Trust Score (MDTS) Framework, a practical evaluation layer that sits on top of existing AI systems and scores every AI-generated response across five dimensions: Accuracy, Personalization, Transparency, Privacy Safety, and Autonomy Risk. Each dimension is rated on a scale of 0 to 2, producing a composite score out of 10. That score then drives an automatic routing decision: responses scoring 8–10 are sent directly to the customer, responses scoring 5–7 go to a human agent for review before sending, and responses scoring 0–4 are handed off entirely to a human. The framework is validated on a dataset of 1,200 real-world customer service interactions spanning five query categories and six languages, scored by five independent annotators with a Krippendorff's alpha of 0.7675. Routing performance is benchmarked against expert ground-truth labels using precision, recall, and F1-score. A Python-based prototype built on GPT-4 and LangChain confirms that the system is deployable within real agentic pipelines. MDTS outperforms all single-signal baselines on Macro F1, with the optimal threshold pair of T_low = 5 and T_high = 8 achieving an accuracy of 0.614 and a Macro F1 of 0.481. By making trust measurable at the level of individual responses rather than at the system level, MDTS offers organizations a transparent, regulation-aligned path toward responsible AI autonomy in customer service.
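
The scoring and routing rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' prototype: the names `DimensionScores`, `route`, and the three route labels are hypothetical, and the sketch assumes that the composite is the unweighted sum of the five 0–2 dimension ratings and that a higher Autonomy Risk rating means lower risk. The abstract does not state either assumption explicitly.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DimensionScores:
    """Hypothetical container for the five MDTS ratings (each 0, 1, or 2)."""
    accuracy: int
    personalization: int
    transparency: int
    privacy_safety: int
    autonomy_risk: int  # assumption: higher rating = lower risk

    def composite(self) -> int:
        """Return the composite score out of 10 (assumed: unweighted sum)."""
        values = [getattr(self, f.name) for f in fields(self)]
        for value in values:
            if not 0 <= value <= 2:
                raise ValueError("each dimension must be rated 0, 1, or 2")
        return sum(values)


def route(score: int, t_low: int = 5, t_high: int = 8) -> str:
    """Map a composite score to a routing decision.

    Defaults follow the threshold pair reported in the abstract
    (T_low = 5, T_high = 8); the returned labels are illustrative.
    """
    if not 0 <= score <= 10:
        raise ValueError("composite score must lie in 0..10")
    if score >= t_high:
        return "auto_send"       # 8-10: sent directly to the customer
    if score >= t_low:
        return "human_review"    # 5-7: reviewed by a human before sending
    return "human_handoff"       # 0-4: handed off entirely to a human


if __name__ == "__main__":
    example = DimensionScores(accuracy=2, personalization=1, transparency=2,
                              privacy_safety=1, autonomy_risk=1)
    print(example.composite(), route(example.composite()))
```

In this example the ratings sum to 7, which falls in the 5–7 band, so the response would go to a human agent for review. Keeping the thresholds as parameters makes it straightforward to evaluate other threshold pairs against labeled data.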

Published

2026-06-12

How to Cite

Areesha Sami, Warda Nadir, Aqsa Saleem, & Aatif Hussain. (2026). TRUST SCORE FRAMEWORK FOR GOVERNING AUTONOMOUS DECISION-MAKING IN AGENTIC AI CUSTOMER SERVICE SYSTEMS. Spectrum of Engineering Sciences, 4(6), 1253–1274. Retrieved from https://thesesjournal.com/index.php/1/article/view/3198