Labyrant

This project uses Q-learning to train an agent to find the fastest way through a labyrinth. In each field, the agent can choose to move in one of the four directions shown below.

[Figure: the four possible movement directions]

After the agent has moved in one of the four directions, it gains a "reward" based on where it moved to. If it moves to another empty field, it receives a slightly negative reward. If it moves into a wall, it receives a more negative reward than for moving to an empty field. When it finally moves onto the goal field, it gains a positive reward. This reward system is pictured below.
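A minimal sketch of such a reward scheme is shown below. The concrete values (-1, -10 and +100) are assumptions for illustration, not the values used in this project.

```python
EMPTY, WALL, GOAL = 0, 1, 2

def reward(cell_type):
    """Reward for moving onto a field of the given type."""
    if cell_type == GOAL:
        return 100   # reaching the goal gives a positive reward
    if cell_type == WALL:
        return -10   # bumping into a wall is punished harder
    return -1        # every step onto an empty field costs a little
```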

This reward system raises some interesting questions that I can give short answers to. The reason the agent gets a slightly negative reward for moving to an empty field is to encourage it to move to the goal as quickly as possible. If it neither gained nor lost anything from moving to a field, it would essentially be a coin flip whether or not the agent took the quickest route to the goal.
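As a quick illustration, using the assumed values from the sketch above, a shorter route always ends with a higher total reward:

```python
# Assumed values: step reward -1, goal reward +100.
step_reward, goal_reward = -1, 100
five_step_route = 5 * step_reward + goal_reward    # total reward: 95
ten_step_route = 10 * step_reward + goal_reward    # total reward: 90
```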

Another question worth asking is how the agent will ever learn the route to the goal if it only gains a positive reward by moving from an empty field onto the goal itself. That question is answered by Q-learning, which teaches the agent the value of moving from one field to another by combining the immediate reward with the best possible reward that can be reached from the field it ends up in, so the goal reward gradually propagates backwards through the labyrinth.
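Below is a minimal sketch of the Q-learning update this describes. The grid size, learning rate `alpha` and discount factor `gamma` are assumed values, not taken from this project.

```python
import numpy as np

n_states, n_actions = 25, 4          # assumed 5x5 grid, 4 directions
Q = np.zeros((n_states, n_actions))  # Q-table: value of each (field, move)
alpha, gamma = 0.1, 0.9              # assumed learning rate and discount

def update(state, action, r, next_state):
    # Pull the value of (state, action) toward the immediate reward plus
    # the best value reachable from the next field. Repeated over many
    # episodes, this spreads the goal reward backwards through the maze.
    best_next = np.max(Q[next_state])
    Q[state, action] += alpha * (r + gamma * best_next - Q[state, action])
```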