Fundamentals of Reinforcement Learning: Monte Carlo Algorithm

Part 3: Explaining the fundamentals of a model-free RL algorithm: the Monte Carlo algorithm

Chao De-Yu
Published in Level Up Coding · May 31, 2021



Recall the Agent-Environment Interface introduced in Part 1: the observation is the agent's perception of the environment, the action changes the environment's state, and the reward is a scalar value that indicates how well the agent is doing at step t. The agent's objective is to maximize the cumulative reward.

Figure 1: The Agent-Environment Interface. Source: a Stanford University lecture in CME 241.
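
To make this loop concrete, here is a minimal sketch of an agent interacting with a toy episodic task. The RandomWalkEnv class and its reset/step methods are my own illustration (loosely Gym-style), not part of the lecture.

```python
import random

# A toy episodic environment: a 5-state random walk that terminates at
# either boundary. RandomWalkEnv is a hypothetical illustration.
class RandomWalkEnv:
    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = None

    def reset(self):
        self.state = self.n_states // 2  # start in the middle state
        return self.state

    def step(self, action):
        # action is -1 (move left) or +1 (move right)
        self.state += action
        done = self.state in (0, self.n_states - 1)
        reward = 1.0 if self.state == self.n_states - 1 else 0.0
        return self.state, reward, done

# The agent-environment loop: observe the state, act, receive a reward.
env = RandomWalkEnv()
obs, done, total_reward = env.reset(), False, 0.0
while not done:
    action = random.choice([-1, 1])       # a uniform random policy
    obs, reward, done = env.step(action)  # the action changes the state
    total_reward += reward                # the agent accumulates reward
print("episode return:", total_reward)
```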

Reinforcement Learning is different from other machine learning paradigms in two ways: the agent's actions affect the subsequent data it receives, and there is no supervision (no labeled data), only a reward signal.

In Part 2, we computed the value function and found the optimal policy with known transition probabilities. In most cases, however, the transition probabilities are unknown, and we need to learn the value function and find the optimal policy from experience.

In this article, I will explain the Monte Carlo algorithm, a model-free RL algorithm that learns by sampling episodes of experience.

Monte Carlo
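
Before going into the details, here is a minimal sketch of the core idea, first-visit Monte Carlo prediction: estimate v(s) by averaging the returns observed after the first visit to s in each sampled episode. The generate_episode helper and the random-walk task below are my own hypothetical setup for illustration.

```python
import random
from collections import defaultdict

GAMMA = 1.0  # undiscounted episodic task

def generate_episode(n_states=5):
    """Sample one episode of a 5-state random walk under a uniform
    random policy; returns a list of (state, reward) pairs."""
    state, episode = n_states // 2, []
    while True:
        next_state = state + random.choice([-1, 1])
        done = next_state in (0, n_states - 1)
        reward = 1.0 if next_state == n_states - 1 else 0.0
        episode.append((state, reward))
        if done:
            return episode
        state = next_state

# First-visit Monte Carlo prediction: v(s) is estimated by the average
# return following the first visit to s in each sampled episode.
returns_sum, visit_count = defaultdict(float), defaultdict(int)
for _ in range(10_000):
    episode = generate_episode()
    G, first_return = 0.0, {}
    # Walk the episode backwards so G accumulates the return from t onward;
    # overwriting keeps the return from the earliest (first) visit to s.
    for state, reward in reversed(episode):
        G = GAMMA * G + reward
        first_return[state] = G
    for state, G in first_return.items():
        returns_sum[state] += G
        visit_count[state] += 1

V = {s: returns_sum[s] / visit_count[s] for s in returns_sum}
print({s: round(V[s], 2) for s in sorted(V)})  # approx {1: 0.25, 2: 0.5, 3: 0.75}
```

Note that averaging sampled returns in this way requires no knowledge of the transition probabilities, which is exactly what makes Monte Carlo model-free.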
