Environment
In the first chapter, we looked at the different environments provided by the OpenAI Gym toolkit. You might have wondered why they are called environments rather than problems or tasks. Now that you have reached this chapter, the name may be starting to make sense.
The environment is the platform that represents the problem or task that we are interested in, and with which the agent interacts. The following diagram shows the general reinforcement learning paradigm at the highest level of abstraction:
At each time step, denoted by $t$, the agent receives an observation $o_t$ from the environment and then executes an action $a_t$, for which it receives a scalar reward $r_{t+1}$ back from the environment, along with the next observation $o_{t+1}$. This process repeats until a terminal state is reached. What is an observation and what is a state? Let's look into that next.
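Before moving on, it may help to see the interaction loop described above as code. The following is a minimal sketch, not the Gym implementation: `CoinFlipEnv`, `run_episode`, and `copy_policy` are hypothetical names made up for illustration, and the toy environment only mimics the classic Gym convention in which `reset()` returns an observation and `step(action)` returns the next observation, the reward, a done flag, and an info dictionary. Newer Gym releases return additional values from these calls, so treat the exact signatures here as an assumption.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment mimicking the classic Gym interface.

    The observation is a coin face (0 or 1). The agent earns a reward of 1.0
    when its action matches the face it just observed, and 0.0 otherwise.
    """

    def __init__(self, max_steps=5, seed=0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.t = 0
        self.coin = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.t = 0
        self.coin = self.rng.randint(0, 1)
        return self.coin

    def step(self, action):
        """Apply the action and return (next_observation, reward, done, info)."""
        reward = 1.0 if action == self.coin else 0.0
        self.t += 1
        self.coin = self.rng.randint(0, 1)   # produce the next observation
        done = self.t >= self.max_steps      # terminal state reached?
        return self.coin, reward, done, {}


def run_episode(env, policy):
    """Run one episode of the observe, act, receive-reward loop."""
    obs = env.reset()                        # first observation
    done = False
    total_reward = 0.0
    steps = 0
    while not done:
        action = policy(obs)                 # agent picks an action
        obs, reward, done, _info = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward, steps


def copy_policy(obs):
    """Trivial policy for this toy task: repeat the observed coin face."""
    return obs


total_reward, steps = run_episode(CoinFlipEnv(max_steps=5), copy_policy)
print(f"episode finished after {steps} steps, return = {total_reward}")
```

Note that the agent only ever touches the environment through `reset()` and `step()`. That narrow interface is what lets the same agent code be pointed at very different tasks.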