In this project we applied several reinforcement learning algorithms and policies, including imitation learning, DQN, DQfD, and A3C, to the Pommerman FFA competition challenge. Using DQN with an architecture inspired by the AlphaGo and Atari papers, our agent performed on par with SimpleAgent, the baseline heuristic agent. Most of our agents developed defensive behaviors, so we trained them further with reward shaping to see whether other behaviors would emerge.