Price-based demand response (DR) enables households to provide the flexibility required in power grids with a high share of volatile renewable energy sources. Multi-agent reinforcement learning (MARL) is a powerful, decentralized decision-making tool for autonomous agents participating in DR programs. Unfortunately, MARL algorithms do not inherently incorporate safety guarantees, which prevents their real-world deployment. To meet safety constraints, we propose a safeguarding mechanism with agent-specific safety shields that minimally adjust each agent's actions. We investigate the influence of a reward function that reflects these safety interventions. Results show that accounting for safety in the reward during training improves both the convergence rate and the performance of the MARL agents in the investigated numerical experiments.
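The agent-specific safety shield and the intervention-aware reward described above can be illustrated with a minimal sketch. The abstract does not specify the constraint set or the penalty form, so the box constraint, the function names, and the penalty weight below are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def safety_shield(action, lower, upper):
    """Hypothetical per-agent shield: project the proposed action onto a
    feasible box [lower, upper]. Clipping is the minimal (Euclidean)
    adjustment for box constraints."""
    safe_action = np.clip(action, lower, upper)
    intervention = np.abs(safe_action - action)  # size of the correction
    return safe_action, intervention

def shielded_reward(base_reward, intervention, penalty_weight=1.0):
    """Reward that reflects the shield's intervention: the agent is
    penalized in proportion to how much its action had to be corrected
    (penalty_weight is an assumed tuning parameter)."""
    return base_reward - penalty_weight * float(np.sum(intervention))

# Example: an agent proposes 2.0 kW but its feasible range is [0, 1] kW.
safe, delta = safety_shield(np.array([2.0]), 0.0, 1.0)
reward = shielded_reward(5.0, delta, penalty_weight=0.5)
```

During training, each agent would receive `shielded_reward` instead of the raw reward, so the learned policy is steered toward actions that require no correction.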