Hands-On Intelligent Agents with OpenAI Gym

Model

A model is an agent's representation of the environment. It is similar to the mental models we have about people and things around us. An agent uses its model of the environment to predict what will happen next. There are two key pieces to it:

  • $P$: The state transition model/probability
  • $R$: The reward model

The state transition model $P$ is a probability distribution or a function that predicts the probability of ending up in state $s'$ at the next time step $t+1$, given the state $s$ and the action $a$ at time step $t$. Mathematically, it is expressed as follows:

$$P_{ss'}^{a} = \mathbb{P}\left[S_{t+1} = s' \mid S_t = s, A_t = a\right]$$
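To make the idea of a state transition model concrete, here is a minimal sketch in Python that estimates the transition probabilities from observed experience by counting. The class name `TabularTransitionModel`, its methods, and the toy states and actions are assumptions made for illustration only; the sketch also assumes states and actions are discrete and hashable.

```python
from collections import defaultdict


class TabularTransitionModel:
    """Hypothetical count-based estimate of P(s' | s, a)."""

    def __init__(self):
        # counts[(s, a)][s_next] = number of times s_next followed (s, a)
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, s, a, s_next):
        """Record one observed transition (s, a) -> s_next."""
        self.counts[(s, a)][s_next] += 1

    def prob(self, s, a, s_next):
        """Return the empirical probability of reaching s_next from (s, a)."""
        outcomes = self.counts.get((s, a))
        if not outcomes:
            return 0.0  # nothing observed yet for this state-action pair
        total = sum(outcomes.values())
        return outcomes.get(s_next, 0) / total


# Toy experience: from state 0, action "right" led to state 1 three times
# and to state 2 once.
transition_model = TabularTransitionModel()
for observed_next_state in [1, 1, 1, 2]:
    transition_model.update(0, "right", observed_next_state)

print(transition_model.prob(0, "right", 1))  # 0.75
print(transition_model.prob(0, "right", 2))  # 0.25
```

A count-based table like this is only practical for small, discrete state and action spaces; for larger problems, an agent would typically learn a function approximator in its place.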

The agent uses the reward model $R$ to predict the immediate next reward that it would get if it were to take action $a$ while in state $s$ at time step $t$. This expectation of the reward at the next time step $t+1$ can be mathematically expressed as follows:

$$R_{s}^{a} = \mathbb{E}\left[R_{t+1} \mid S_t = s, A_t = a\right]$$
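As a rough illustration of a reward model, the following Python sketch keeps a running average of the rewards observed for each state-action pair and uses it as the estimate of the expected immediate reward. The class name `TabularRewardModel`, its methods, and the toy data are hypothetical, and the sketch assumes discrete, hashable states and actions.

```python
from collections import defaultdict


class TabularRewardModel:
    """Hypothetical running-average estimate of E[R_{t+1} | s, a]."""

    def __init__(self):
        self.mean = defaultdict(float)  # current reward estimate per (s, a)
        self.n = defaultdict(int)       # number of samples seen per (s, a)

    def update(self, s, a, reward):
        """Fold one observed reward into the running average for (s, a)."""
        key = (s, a)
        self.n[key] += 1
        self.mean[key] += (reward - self.mean[key]) / self.n[key]

    def expected_reward(self, s, a):
        """Return the estimated immediate reward, or 0.0 if (s, a) is unseen."""
        return self.mean.get((s, a), 0.0)


# Toy experience: taking action "right" in state 0 yielded these rewards.
reward_model = TabularRewardModel()
for observed_reward in [1.0, 3.0, 2.0]:
    reward_model.update(0, "right", observed_reward)

print(reward_model.expected_reward(0, "right"))  # 2.0
```

The incremental update avoids storing every past reward; returning 0.0 for unseen pairs is an arbitrary default chosen for this sketch.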