ADVANCES IN OPEN AD HOC TEAMWORK AND TEAMMATE GENERATION


ABSTRACT

Many real-world problems require an agent that can adapt its policy to collaborate efficiently with different groups of teammates whose composition may change over time. The design of agents with such adaptive capabilities has been explored in the field of ad hoc teamwork. Given a predefined set of teammates for training, prior methods for ad hoc teamwork have focused on training an agent to collaborate within a closed team, where teammates remain in the environment throughout their interaction with the trained agent. In this thesis, we consider ad hoc teamwork in open teams, where agents with different fixed policies can enter and leave the environment. This thesis first contributes the Graph-based Policy Learning (GPL) approach for ad hoc teamwork in open teams, assuming full observability of the environment. GPL leverages graph neural networks (GNNs) to predict teammates' actions and estimate their effects on the trained agent's returns. These predictions are then utilised to compute the trained agent's optimal action-value function under open teams. We empirically demonstrate GPL's effectiveness for training agents in open ad hoc teamwork by showing that it achieves significantly higher returns than policies produced by various deep reinforcement learning baselines. Further analysis demonstrates that GPL's success results from effectively learning the effects of teammates' actions on the trained agent's returns.

We also contribute an extension of GPL to partially observable environments, based on different methodologies for maintaining belief estimates over the latent environment state and team composition. These belief estimates are inferred from the trained agent's sequence of observations and utilised to compute its optimal policy under partial observability. Empirical results demonstrate that this extension learns efficient open ad hoc teamwork policies under partial observability, and further analysis shows that this efficiency results from accurately predicting the latent teammate actions and environment state.

The final contribution of this thesis is a method for the automated discovery of diverse training teammate types. This method is a first step towards preventing a trained agent from performing poorly against previously unseen teammates whose behaviour differs significantly from that encountered during training. Our approach assumes closed environments and is based on the idea that an optimal set of training teammates consists of agents that require different best-response policies for optimal collaboration. Training against teammates from this set enables the trained agent to learn the broader range of behaviours necessary for efficient collaboration in ad hoc teamwork. Finally, we demonstrate that our teammate generation approach improves the robustness of a learner's performance in ad hoc teamwork compared to alternative methods.
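To make GPL's core computation concrete, below is a minimal sketch of the action-value marginalisation described above: the learner's value for each of its own actions is the expectation, under learned teammate action models, of a joint action value factorised into per-agent and pairwise utility terms. All names, shapes, and the brute-force enumeration over joint teammate actions are illustrative assumptions rather than the thesis's actual implementation, which operates on GNN outputs.

# Minimal sketch of a GPL-style action-value computation (illustrative only;
# names, shapes, and the brute-force enumeration are assumptions, not the
# thesis's implementation).
import itertools
import numpy as np

def learner_action_values(indiv_q, pair_q, teammate_probs):
    """Marginalise a factorised joint action value over predicted teammate actions.

    indiv_q: (n_agents, n_actions) per-agent utility terms.
    pair_q: (n_agents, n_agents, n_actions, n_actions) pairwise utility terms.
    teammate_probs: (n_agents, n_actions) predicted action distributions;
        agent 0 is taken to be the learner, so row 0 is unused.
    Returns an (n_actions,) array of marginalised values for the learner.
    """
    n_agents, n_actions = indiv_q.shape
    values = np.zeros(n_actions)
    for a_i in range(n_actions):
        # Enumerate joint teammate actions; tractable only for tiny teams.
        for joint in itertools.product(range(n_actions), repeat=n_agents - 1):
            acts = (a_i,) + joint
            # Probability of this joint teammate action under the action model.
            prob = np.prod([teammate_probs[j, acts[j]] for j in range(1, n_agents)])
            # Factorised joint value: individual terms plus pairwise terms.
            q_joint = sum(indiv_q[j, acts[j]] for j in range(n_agents))
            q_joint += sum(pair_q[j, k, acts[j], acts[k]]
                           for j in range(n_agents) for k in range(j + 1, n_agents))
            values[a_i] += prob * q_joint
    return values

# Hypothetical usage with a 3-agent team and 2 actions per agent.
rng = np.random.default_rng(0)
probs = rng.random((3, 2))
probs /= probs.sum(axis=1, keepdims=True)
print(learner_action_values(rng.normal(size=(3, 2)),
                            rng.normal(size=(3, 3, 2, 2)), probs))

In GPL itself the utility and action-model terms are produced by GNNs conditioned on the current team, which is what allows the same computation to handle teams whose size changes as agents enter and leave.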
