Hands-On Intelligent Agents with OpenAI Gym

SARSA and Q-learning

It is also very useful for an agent to learn the action-value function Q(S,A), which tells the agent the long-term value of taking action A in state S, so that it can choose the actions that maximize its expected discounted future reward. The SARSA and Q-learning algorithms enable an agent to learn exactly that. The update equations for the two algorithms are summarized below, where α is the learning rate and γ is the discount factor:

SARSA: Q(S,A) ← Q(S,A) + α [R + γ Q(S',A') − Q(S,A)]
Q-learning: Q(S,A) ← Q(S,A) + α [R + γ max_a Q(S',a) − Q(S,A)]
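As a minimal sketch of how an agent might use a learned action-value function to pick actions, the snippet below stores Q as a nested list indexed by state and then action, and selects the action with the highest estimate. The table size, the values in it, and the helper name greedy_action are assumptions made purely for illustration.

```python
def greedy_action(Q, state):
    """Return the index of the action with the highest Q estimate in `state`.

    Ties are broken in favor of the lowest action index.
    """
    action_values = Q[state]
    return max(range(len(action_values)), key=lambda a: action_values[a])


# Hypothetical Q-table: 3 states x 2 actions, values made up for illustration
Q = [
    [0.1, 0.5],
    [0.7, 0.2],
    [0.0, 0.0],
]

action = greedy_action(Q, state=1)
```

In practice an agent usually mixes this greedy choice with some exploration, since early Q estimates are unreliable.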

SARSA is so named because of the sequence State->Action->Reward->State'->Action' that the algorithm's update step depends on. The description of the sequence goes like this: the agent, in state S, takes an action A and gets a reward R, and ends up in the next state S', after which the agent decides to take an action A' in the new state. Based on this experience, the agent can update its estimate of Q(S,A).
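The SARSA step described above can be sketched as a single function that takes the experience (S, A, R, S', A') and nudges Q(S,A) toward the target R + γ Q(S',A'). This is an illustrative sketch, not a full agent: the function name sarsa_update, the nested-list Q-table, and the default values of alpha (learning rate) and gamma (discount factor) are all assumptions, and terminal-state handling is omitted.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """Apply one SARSA update to Q[s][a] in place and return the new value.

    alpha and gamma are illustrative defaults, not values from the text.
    """
    # Target uses the action a_next that the agent actually chose in s_next
    td_target = r + gamma * Q[s_next][a_next]
    td_error = td_target - Q[s][a]
    Q[s][a] += alpha * td_error
    return Q[s][a]


# Hypothetical Q-table: 2 states x 2 actions
Q = [
    [0.0, 0.0],
    [1.0, 2.0],
]

# Experience: in state 0 take action 1, get reward 1.0, land in state 1,
# and decide to take action 0 there.
new_value = sarsa_update(Q, s=0, a=1, r=1.0, s_next=1, a_next=0)
```

With these numbers the target is 1.0 + 0.9 × 1.0 = 1.9, so Q[0][1] moves one tenth of the way from 0 toward 1.9, to about 0.19.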

Q-learning is a popular off-policy learning algorithm. It is similar to SARSA except for one thing: instead of using the Q-value estimate for the action A' that the agent actually takes in the new state, it uses the largest Q-value estimate available in the new state S', that is, max_a Q(S',a), regardless of which action the agent goes on to take there.
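To make the contrast with SARSA concrete, here is a minimal sketch of a Q-learning update. It needs only (S, A, R, S'), because the target takes the maximum over all actions in S' rather than the action the agent goes on to take. The function name q_learning_update, the nested-list Q-table, and the default alpha and gamma values are assumptions for illustration, and terminal-state handling is omitted.

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update to Q[s][a] in place and return the new value.

    alpha and gamma are illustrative defaults, not values from the text.
    """
    # Target uses the best available estimate in s_next,
    # independent of the action the agent will actually take there
    td_target = r + gamma * max(Q[s_next])
    td_error = td_target - Q[s][a]
    Q[s][a] += alpha * td_error
    return Q[s][a]


# Hypothetical Q-table: 2 states x 2 actions
Q = [
    [0.0, 0.0],
    [1.0, 2.0],
]

# Experience: in state 0 take action 1, get reward 1.0, land in state 1.
new_value = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
```

Here the target is 1.0 + 0.9 × 2.0 = 2.8, because 2.0 is the largest estimate in state 1, so Q[0][1] becomes about 0.28 even if the agent's behavior policy would have picked the lower-valued action.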
