Reinforcement Learning Game

An overview and explanation of how Reinforcement Learning works

Project Overview

Reinforcement learning (RL) is a concept that continues to surface in the common 'AI narrative'. I wanted to explore the underlying concepts and fundamentals of this style of machine learning. This project had more constraints than the other projects I've completed, especially regarding the choice of environments I could use.

Approach

RL relies on an environment for the agent (the gnome, in this case) to learn from. The underlying concept is that for each action the agent takes, a reward (either positive or negative) is provided. These environments are complex to build, and frankly outside the scope of this project. Thus, I utilized a common environment that many courses teach from (Frozen Lake), and added map randomization and optimal-path visuals to better show what is actually happening during training.
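
To make the action-reward loop concrete, here is a minimal sketch of tabular Q-learning on Frozen Lake, assuming the Gymnasium library; the hyperparameter values and episode count are illustrative, not necessarily what the demo uses.

    # Minimal tabular Q-learning sketch on Frozen Lake (assumes Gymnasium).
    # Hyperparameters are illustrative, not necessarily the demo's own.
    import numpy as np
    import gymnasium as gym

    env = gym.make("FrozenLake-v1", map_name="8x8")
    q_table = np.zeros((env.observation_space.n, env.action_space.n))
    alpha, gamma, epsilon = 0.1, 0.99, 0.1  # learning rate, discount, exploration

    for episode in range(5000):
        state, _ = env.reset()
        done = False
        while not done:
            # Epsilon-greedy: occasionally explore; otherwise take the best-known action
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(q_table[state]))
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # Nudge the value estimate toward reward + discounted future value
            q_table[state, action] += alpha * (
                reward + gamma * np.max(q_table[next_state]) - q_table[state, action]
            )
            state = next_state

Each step produces exactly the feedback described above: reaching the goal yields a positive reward, and the update rule gradually propagates that signal back along the path the agent took.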

Directions

Navigate to the demo below to find the agent's initial state. Feel free to keep the map at 8x8, or change it to 4x4 or 16x16 (the larger map takes longer to train). You can also randomize the map to change the locations of the holes in the lake. Then select 'Start Training' and watch the agent (the gnome) move around the map trying to reach the finish; you'll even see it fall into the water. The optimized path appears once the agent has learned and tested that route.
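
The map-randomization option could be implemented with Gymnasium's built-in map generator; this is a sketch under that assumption, and the demo's actual implementation may differ.

    # Sketch of map randomization using Gymnasium's built-in helper.
    import gymnasium as gym
    from gymnasium.envs.toy_text.frozen_lake import generate_random_map

    # generate_random_map returns rows like "SFFH..." with a guaranteed
    # path from start (S) to goal (G); p is the probability that a tile
    # is frozen (F) rather than a hole (H).
    random_desc = generate_random_map(size=8, p=0.8)
    env = gym.make("FrozenLake-v1", desc=random_desc)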

Key Learnings

  • RL has very specific use cases and should be examined for applicability before building a product around it
  • Many of the concepts of RL extend to my doctoral research on Psychological Safety. The underlying idea is that an agent (person or computer) is motivated by an expected reward (or punishment) that influences future behavior
  • Of all my projects, this style of ML, paired with the visualization, felt the most 'real' in showcasing exactly what is being learned

Challenges Overcome

  • I tried many different environments, including Car Racing and Tennis games, but those ultimately had too many constraints when it came to rendering the UI while the model was being trained
  • I chose a policy-free approach (meaning I did not predefine the optimal path), which introduced additional challenges around deciding when to start or stop training, as sketched below
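
One way to decide when to stop, sketched below under the assumption that we track per-episode success in a rolling window; the window size and threshold are illustrative, not the demo's actual rule.

    # Sketch of a stopping rule: halt training once the rolling success
    # rate clears a target. Window size and threshold are illustrative.
    from collections import deque

    recent_outcomes = deque(maxlen=100)  # 1.0 if the episode reached the goal, else 0.0

    def should_stop(episode_reward: float, target_success_rate: float = 0.95) -> bool:
        recent_outcomes.append(episode_reward)
        if len(recent_outcomes) < recent_outcomes.maxlen:
            return False  # not enough episodes yet to judge
        return sum(recent_outcomes) / len(recent_outcomes) >= target_success_rate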

Future Applications

  • RL seems to fit best in gaming or robotics, where environments can be clearly defined as discrete or continuous
  • There could be applicability in other spaces, such as education, but there needs to be a clear expected outcome (such as test performance) to identify learning gaps

Frozen Lake RL Training

[Interactive demo: environment controls, training controls, training parameters (defaults 0.1, 0.99, 0.1), and live training progress readouts for average reward, episodes, and success rate.]