MULTI-AGENT REINFORCEMENT LEARNING FOR COORDINATED DEMAND RESPONSE IN SIMULATED MICROGRIDS

Roshan Ali Siyal; Abdul Quddus Bhutto; Aadil Jamali; Imtiaz Ali Brohi; Altaf Hussain Bouk; Najma Imtiaz Ali

Keywords:

Multi-Agent Reinforcement Learning; Centralized-Training Decentralized-Execution (CTDE); Demand Response; Microgrids; Safety-Aware Reinforcement Learning; Communication-Efficient Coordination; Constrained Markov Decision Processes (CMDP); Scalability and Robustness; PyPSA-based Simulation; Renewable Integration; Fairness in Demand Response; Privacy-Preserving Learning

Abstract

The rapid growth of distributed energy resources, and the accompanying growth in prosumer-driven variability, requires new control paradigms that balance economic goals, operational safety, and communication limits at the distribution level. This paper presents a centralized-training decentralized-execution (CTDE) multi-agent reinforcement learning framework that combines safety-conscious learning (lagrangREQ, i.e., Lagrangian-constrained critics and a run-time learnability filter) with communication-efficient coordination (budget-aware scheduling and message compression). Evaluation uses a reproducible microgrid simulator built on power-flow primitives and real-world renewable and price traces, covering scenarios that vary renewable penetration, agent population (10 to 100 agents), forecast error, packet loss, and contingency events. Empirical results show that the proposed approach achieves a normalized operational cost of 0.71, compared with 0.78 for a centralized MPC oracle, 0.79 for a leading MARL baseline (MAPPO), and 0.95 for independent agents. It also achieves a peak reduction of 31.6 percent and shifts more than 50 kWh/day of flexible demand in the test scenarios. Safety performance improves markedly as well: composite voltage and state-of-charge (SOC) violations fall to nearly single-digit counts per thousand steps. The statistical significance of the cost and peak gains is verified with paired tests and bootstrap confidence intervals at the 95 percent level. The method degrades gracefully under moderate forecast errors and tolerates up to approximately 20 percent communication loss, and ablation studies isolate the contributions of the safety critics and the communication modules.
The findings show that safety-enhanced, socially-aware CTDE MARL is an experimentally supported approach to coordinated demand response that is practical, scalable, and capable of ensuring reliability. Open artifacts are released to speed translation to the field.
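
To make the idea behind Lagrangian-constrained learning concrete, the following is a minimal, generic sketch of a projected dual-ascent multiplier update and a penalized advantage, as commonly used in constrained MDP methods. It is not the authors' implementation: the class `LagrangeMultiplier`, the function `penalized_advantage`, the constraint limit, and the step size are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class LagrangeMultiplier:
    """Dual variable for one safety constraint E[cost] <= limit (hypothetical helper)."""

    limit: float        # allowed mean constraint cost per step (assumed value)
    lr: float = 0.05    # dual step size (assumed value)
    value: float = 0.0  # current multiplier, kept non-negative

    def update(self, mean_constraint_cost: float) -> float:
        # Projected dual ascent: raise the multiplier while the constraint is
        # violated, lower it (never below zero) once the policy is within limit.
        step = self.lr * (mean_constraint_cost - self.limit)
        self.value = max(0.0, self.value + step)
        return self.value


def penalized_advantage(reward_adv: float, cost_adv: float, lam: float) -> float:
    # Lagrangian objective: reward advantage minus multiplier-weighted cost
    # advantage. Dividing by (1 + lam) keeps the scale bounded as lam grows.
    return (reward_adv - lam * cost_adv) / (1.0 + lam)


if __name__ == "__main__":
    lam = LagrangeMultiplier(limit=0.01)
    # Illustrative (synthetic) mean violation rates from three training batches.
    for observed in (0.05, 0.03, 0.005):
        lam.update(observed)
        print(f"observed={observed:.3f} lambda={lam.value:.5f}")
```

In this sketch the multiplier grows while observed violations (for example, voltage or SOC excursions) exceed the limit and shrinks once they fall below it, so the policy gradient trades cost against safety automatically rather than through a hand-tuned penalty weight.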