We exploit the model-free paradigm of reinforcement learning to optimize a set of decentralized guidance policies for a swarm of sensor platforms that cooperatively track a constant-velocity target. Each platform is equipped with a noisy bearing sensor that measures the azimuth and elevation angles of the target. The measurements from all platforms are shared, and an onboard Extended Kalman Filter is used to estimate the relative target state. This state-estimation process is taken into account in the optimization of the guidance laws, with the aim of producing trajectories that increase the observability of the target. A multi-agent-capable version of Proximal Policy Optimization is used and benchmarked against a Model Predictive Control reference approach.
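To make the sensing model concrete, the following is a minimal sketch of a bearing-only (azimuth/elevation) measurement function and a single EKF measurement update over a constant-velocity relative state. All function names, the 6-dimensional state layout `[px, py, pz, vx, vy, vz]`, and the noise values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def bearing_measurement(rel_pos):
    """Azimuth and elevation of the target as seen from a platform."""
    x, y, z = rel_pos
    azimuth = np.arctan2(y, x)
    elevation = np.arctan2(z, np.hypot(x, y))
    return np.array([azimuth, elevation])

def measurement_jacobian(rel_pos):
    """Jacobian of (azimuth, elevation) w.r.t. the relative position."""
    x, y, z = rel_pos
    r2 = x**2 + y**2        # squared horizontal range
    rho = np.sqrt(r2)       # horizontal range
    d2 = r2 + z**2          # squared slant range
    return np.array([
        [-y / r2,             x / r2,              0.0],
        [-x * z / (rho * d2), -y * z / (rho * d2), rho / d2],
    ])

def ekf_update(x_est, P, z_meas, R, rel_pos_of):
    """One EKF measurement update with a shared bearing pair.

    x_est: 6-dim relative state [px, py, pz, vx, vy, vz] (assumed layout)
    rel_pos_of: maps the state to the platform-relative target position
    """
    rel = rel_pos_of(x_est)
    # The bearing depends on position only, so the velocity block is zero.
    H = np.hstack([measurement_jacobian(rel), np.zeros((2, 3))])
    innov = z_meas - bearing_measurement(rel)
    innov = (innov + np.pi) % (2 * np.pi) - np.pi  # wrap angle residuals
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x_new = x_est + K @ innov
    P_new = (np.eye(len(x_est)) - K @ H) @ P
    return x_new, P_new
```

In a cooperative setting, each shared bearing pair would trigger one such update with that platform's `rel_pos_of`; the resulting covariance `P` is also the quantity a guidance policy can shape, since trajectories with angular diversity across platforms make the stacked Jacobians better conditioned and hence the target more observable.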