MARLAnts

2019

This is my first foray into reinforcement learning! I designed a custom training and testing environment using Pygame and OpenAI Gym, and then used the Q-Learning algorithm for policy learning.

This project was inspired by Radhika Nagpal's TED Talk, "Taming the Swarm," in which she introduces the idea that agents working towards local goals can achieve global objectives in a group setting.

The end goal of this project is to simulate the behaviour of swarm insects. More specifically, the agents should be able to construct, as a group, a structure that no single agent could complete on its own.

This Pygame + OpenAI Gym environment is a visualization of the construction of a "tower." The building blocks are drawn in blue and the agents in cyan. The number of blocks an agent is holding is drawn on it. In this setup, the objective is to construct a tower with a height of 5 blocks. Each agent is initialized with 2 blocks. Agents have two choices at every step: build or move. Each agent's field of vision is limited to the height of the tower in front of it, and the agents have no explicit way to communicate with one another.
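To make the rules above concrete, here is a minimal sketch of the environment logic in plain Python. It mirrors the reset/step shape of a Gym environment without importing Gym or Pygame, and it is not the project's actual code. The class name TowerWorld, the 1-D track, the tower position, the one-block base that marks the build site, and the default of three agents are all assumptions made for illustration. Only the two actions, the 2 blocks per agent, the target height of 5, and the height-only observation come from the description.

```python
MOVE, BUILD = 0, 1        # the two actions available at every step
TARGET_HEIGHT = 5         # from the description
BLOCKS_PER_AGENT = 2      # from the description


class TowerWorld:
    """Hypothetical 1-D stand-in for the Pygame environment (not the real one)."""

    def __init__(self, n_agents=3, tower_x=4, base_height=1):
        self.n_agents = n_agents        # assumed number of agents
        self.tower_x = tower_x          # assumed cell where the tower stands
        self.base_height = base_height  # assumed one-block base marking the site
        self.reset()

    def reset(self):
        """Start a new episode and return one observation per agent."""
        self.height = self.base_height
        self.positions = [0] * self.n_agents  # agents may share a cell here
        self.blocks = [BLOCKS_PER_AGENT] * self.n_agents
        return [self.observe(i) for i in range(self.n_agents)]

    def facing_tower(self, i):
        return self.positions[i] + 1 == self.tower_x

    def observe(self, i):
        # An agent sees only the height of the tower directly in front of it.
        return self.height if self.facing_tower(i) else 0

    def step(self, i, action):
        """Apply one agent's action and return (observation, done)."""
        if action == MOVE:
            if not self.facing_tower(i):  # the tower blocks further walking
                self.positions[i] += 1
        elif self.facing_tower(i) and self.blocks[i] > 0 and self.height < TARGET_HEIGHT:
            self.blocks[i] -= 1
            self.height += 1
        return self.observe(i), self.height >= TARGET_HEIGHT
```

Under these assumptions a single agent can raise the tower by at most its own 2 blocks, so the target height is only reachable when at least two agents build at the same site, even though none of them can see the others.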

Q-Learning was used to iteratively build the policy.

What's going on behind the scenes: a Q-learning algorithm was run for 1,000 "episodes" on this simple environment. The agents (squares) are simply rewarded for making the tower taller. They also incur a small penalty for walking, which encourages them to start building sooner. From this they are able to learn the optimal policy of walking until they encounter a tower and then performing the "build" action until the tower is completed.
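The training loop described above can be sketched as tabular Q-learning, where each step applies the standard update Q(s, a) += alpha * (r + gamma * max Q(s', .) - Q(s, a)). This is a rough illustration rather than the project's code. The shared Q-table, the 1-D layout, the one-block base, the reward magnitudes, the learning rate, the discount, the decaying exploration rate, and the rule that an agent with no blocks sits out are all assumptions. Only the 1,000 episodes, the build reward, the walking penalty, and the height-only observation come from the text.

```python
import random

MOVE, BUILD = 0, 1
TARGET_HEIGHT, BLOCKS_PER_AGENT = 5, 2    # from the description
N_AGENTS, TOWER_X, BASE_HEIGHT = 3, 4, 1  # assumed layout
ALPHA, GAMMA = 0.1, 0.9                   # assumed learning rate and discount
BUILD_REWARD, WALK_PENALTY = 1.0, -0.1    # assumed reward magnitudes


def train(episodes=1000, max_steps=50, seed=0):
    """Tabular Q-learning with one table shared by all agents (an assumption)."""
    rng = random.Random(seed)
    # q[s][a]: s is the observed tower height (0 means no tower in front).
    q = [[0.0, 0.0] for _ in range(TARGET_HEIGHT + 1)]
    for ep in range(episodes):
        # Assumed schedule: explore heavily at first, then act mostly greedily.
        epsilon = max(0.05, 1.0 - ep / (0.5 * episodes))
        height = BASE_HEIGHT
        pos = [0] * N_AGENTS
        blocks = [BLOCKS_PER_AGENT] * N_AGENTS
        done = False
        for _ in range(max_steps):
            for i in range(N_AGENTS):
                if blocks[i] == 0:
                    continue  # assumed: an agent with no blocks left sits out
                facing = pos[i] + 1 == TOWER_X
                s = height if facing else 0
                if rng.random() < epsilon:
                    a = rng.choice((MOVE, BUILD))
                else:
                    a = max((MOVE, BUILD), key=lambda act: q[s][act])
                reward = 0.0
                if a == MOVE:
                    reward = WALK_PENALTY  # walking always costs a little
                    if not facing:
                        pos[i] += 1
                elif facing:
                    blocks[i] -= 1
                    height += 1
                    reward = BUILD_REWARD  # reward for a taller tower
                done = height >= TARGET_HEIGHT
                s_next = height if pos[i] + 1 == TOWER_X else 0
                # Q-learning target: no bootstrapping once the tower is finished.
                target = reward if done else reward + GAMMA * max(q[s_next])
                q[s][a] += ALPHA * (target - q[s][a])
                if done:
                    break
            if done:
                break
    return q


def greedy_policy(q):
    """Read the preferred action for each non-terminal observation."""
    names = ("move", "build")
    return {s: names[max((MOVE, BUILD), key=lambda a: q[s][a])]
            for s in range(TARGET_HEIGHT)}
```

Calling greedy_policy(train()) returns the preferred action for each observed height, which can be compared against the walk-then-build behaviour described above. Because every agent only sees the tower height, all "no tower in front" positions collapse into a single state, so the learned values in that state are an average over several different situations.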