Policy Graphs for Byzantine Fault Detection in Communicating Multi-Agent Reinforcement Learning
Multi-agent RL systems that coordinate through learned communication graphs are vulnerable to Byzantine agents transmitting strategically corrupted observations. We propose BARD-MARL, a framework that augments Bayesian latent graph inference with policy-graph-based behavioral verification. Each agent's trajectory is mapped to a discrete directed policy graph capturing its decision-making topology. Spectral and behavioral features extracted from this graph are fed into an Isolation Forest for unsupervised anomaly detection. On a 25-agent traffic signal control environment, our detection pipeline achieves F1 = 0.60 at a 20% Byzantine fraction, three times the random baseline.
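The abstract does not specify BARD-MARL's feature extraction or detector configuration; the following is a minimal illustrative sketch of the kind of pipeline it describes (empirical policy-graph construction from a trajectory, spectral and behavioral features, Isolation Forest scoring). All function names, the trajectory format `(state, action, next_state)`, and the particular feature choices are assumptions for illustration, not the authors' method.

```python
import numpy as np
from collections import Counter
from sklearn.ensemble import IsolationForest

def policy_graph_features(trajectory, n_states):
    """Map one agent's trajectory to a small feature vector.

    trajectory: list of (state, action, next_state) tuples (assumed format).
    Returns [top-3 eigenvalue magnitudes, action entropy, edge count].
    """
    # Empirical policy graph: nodes are discrete states, edge weights
    # count observed state -> next_state transitions.
    A = np.zeros((n_states, n_states))
    for s, _, s_next in trajectory:
        A[s, s_next] += 1
    # Row-normalize into a transition matrix (unvisited states stay zero).
    row_sums = A.sum(axis=1, keepdims=True)
    P = np.divide(A, row_sums, out=np.zeros_like(A), where=row_sums > 0)
    # Spectral features: largest eigenvalue magnitudes of the graph.
    spectral = np.sort(np.abs(np.linalg.eigvals(P)))[::-1][:3]
    # Behavioral features: entropy of the action distribution, edge count.
    counts = np.array(list(Counter(a for _, a, _ in trajectory).values()),
                      dtype=float)
    probs = counts / counts.sum()
    entropy = -(probs * np.log(probs)).sum()
    n_edges = (A > 0).sum()
    return np.concatenate([spectral, [entropy, n_edges]])

def random_traj(rng, n_states=5, n_actions=3, T=200, byzantine=False):
    # Toy stand-in for rollouts: honest agents act uniformly at random;
    # the "Byzantine" agent follows a rigid deterministic cycle.
    traj, s = [], 0
    for _ in range(T):
        a = 0 if byzantine else int(rng.integers(n_actions))
        s_next = (s + 1) % n_states if byzantine else int(rng.integers(n_states))
        traj.append((s, a, s_next))
        s = s_next
    return traj

rng = np.random.default_rng(0)
feats = np.array(
    [policy_graph_features(random_traj(rng), 5) for _ in range(20)]
    + [policy_graph_features(random_traj(rng, byzantine=True), 5) for _ in range(5)]
)
# Unsupervised anomaly detection over per-agent feature vectors;
# predict() labels anomalous agents -1 and inliers +1.
labels = IsolationForest(random_state=0).fit(feats).predict(feats)
```

In this sketch each agent contributes one feature vector, so detection reduces to flagging outliers in a 25-point feature space; the deterministic "Byzantine" trajectories have zero action entropy and far fewer edges than the random ones, which is the kind of behavioral signature an Isolation Forest can isolate.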

