Model
A model is an agent's representation of the environment. It is similar to the mental models we have about people and things around us. An agent uses its model of the environment to predict what will happen next. There are two key pieces to it:
- $\mathcal{P}$: The state transition model/probability
- $\mathcal{R}$: The reward model
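The state transition model $\mathcal{P}$ is a probability distribution or a function that predicts the probability of ending up in a state $s'$ in the next time step $t+1$, given the state $s$ and the action $a$ at time step $t$. Mathematically, it is expressed as follows:

$$\mathcal{P}^{a}_{ss'} = \mathbb{P}\left[S_{t+1} = s' \mid S_t = s,\ A_t = a\right]$$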
The agent uses the reward model $\mathcal{R}$ to predict the immediate next reward that it would get if it were to take action $a$ while in state $s$ at time step $t$. This expectation of the reward at the next time step $t+1$ can be mathematically expressed as follows:

$$\mathcal{R}^{a}_{s} = \mathbb{E}\left[R_{t+1} \mid S_t = s,\ A_t = a\right]$$
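To make the two pieces of a model concrete, here is a minimal Python sketch of a tabular model that estimates both quantities from observed transitions by simple counting. The class name `TabularModel`, its method names, and the state and action labels are hypothetical and chosen only for illustration. Counting is just one possible way to build a model; the text does not prescribe how the model is obtained.

```python
from collections import defaultdict


class TabularModel:
    """Hypothetical count-based model of an environment.

    Estimates the state transition probability and the expected
    immediate reward from observed (s, a, r, s_next) transitions.
    """

    def __init__(self):
        # next_counts[(s, a)][s_next]: how often s_next followed (s, a)
        self.next_counts = defaultdict(lambda: defaultdict(int))
        # reward_sum[(s, a)]: total immediate reward observed after (s, a)
        self.reward_sum = defaultdict(float)
        # visits[(s, a)]: how often action a was taken in state s
        self.visits = defaultdict(int)

    def update(self, s, a, r, s_next):
        """Record one observed transition."""
        self.next_counts[(s, a)][s_next] += 1
        self.reward_sum[(s, a)] += r
        self.visits[(s, a)] += 1

    def transition_prob(self, s, a, s_next):
        """Estimate of P[S_{t+1} = s_next | S_t = s, A_t = a]."""
        n = self.visits.get((s, a), 0)
        if n == 0:
            return 0.0  # assumption: unseen (s, a) pairs get probability 0
        return self.next_counts[(s, a)].get(s_next, 0) / n

    def expected_reward(self, s, a):
        """Estimate of E[R_{t+1} | S_t = s, A_t = a]."""
        n = self.visits.get((s, a), 0)
        if n == 0:
            return 0.0  # assumption: unseen (s, a) pairs get reward 0
        return self.reward_sum[(s, a)] / n


# Usage: four observed transitions from state "s0" under action "right".
model = TabularModel()
model.update("s0", "right", 0.0, "s1")
model.update("s0", "right", 1.0, "s2")
model.update("s0", "right", 0.0, "s1")
model.update("s0", "right", 1.0, "s2")

print(model.transition_prob("s0", "right", "s1"))  # 0.5
print(model.expected_reward("s0", "right"))        # 0.5
```

In the usage example, "s1" follows ("s0", "right") in two of the four observations, so the estimated transition probability is 0.5, and the average of the four observed rewards is likewise 0.5. Returning 0 for unseen state-action pairs is a simplification made for this sketch.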