State-Action-Reward-State-Action

State-Action-Reward-State-Action (SARSA) is an algorithm for learning a Markov decision process policy, used in the reinforcement learning area of machine learning. It was introduced in a technical note 〔"Online Q-Learning using Connectionist Systems" by Rummery & Niranjan (1994)〕 in which the alternative name SARSA was mentioned only in a footnote.
The name reflects the fact that the function for updating the Q-value depends on the current state of the agent "S1", the action the agent chooses "A1", the reward "R" the agent receives for choosing this action, the state "S2" the agent is in after taking that action, and finally the next action "A2" the agent chooses in its new state. Taking each element of the quintuple (s_t, a_t, r_t, s_{t+1}, a_{t+1}) in turn yields the name SARSA.〔Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, chapter 6.4〕
== Algorithm ==
:Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]
A SARSA agent interacts with the environment and updates the policy based on the actions it actually takes, which makes SARSA an on-policy learning algorithm. As expressed above, the Q value for a state-action pair is updated by an error term, scaled by the learning rate α. Q values represent the reward expected at the next time step for taking action ''a'' in state ''s'', plus the discounted future reward received from the next state-action observation. Watkins's Q-learning was created as an alternative to the existing temporal-difference technique; it updates the policy based on the maximum reward obtainable from the available actions. The difference may be summarized as follows: SARSA learns the Q values associated with the policy it follows itself, while Watkins's Q-learning learns the Q values associated with the pure exploitation (greedy) policy while following an exploration/exploitation policy. For further information on the exploration/exploitation trade-off, see reinforcement learning.
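To make the update rule concrete, the following is a minimal sketch of tabular SARSA with an ε-greedy behaviour policy, written in Python. The environment interface (env.reset(), env.step(action) returning (next_state, reward, done), and env.actions) and all hyperparameter values are illustrative assumptions, not part of the article.

 import random
 from collections import defaultdict
 
 def sarsa(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
     """Tabular SARSA; env is assumed to expose reset(), step(), and a list env.actions."""
     Q = defaultdict(float)  # Q[(state, action)] -> current estimate of the return
 
     def choose_action(state):
         # epsilon-greedy: explore with probability epsilon, otherwise exploit
         if random.random() < epsilon:
             return random.choice(env.actions)
         return max(env.actions, key=lambda a: Q[(state, a)])
 
     for _ in range(episodes):
         state = env.reset()
         action = choose_action(state)                  # a_t, chosen by the policy being learned
         done = False
         while not done:
             next_state, reward, done = env.step(action)
             next_action = choose_action(next_state)    # a_{t+1}: the action actually taken next
             # On-policy TD error: uses Q(s_{t+1}, a_{t+1}), not max_a Q(s_{t+1}, a)
             td_target = reward + (0.0 if done else gamma * Q[(next_state, next_action)])
             Q[(state, action)] += alpha * (td_target - Q[(state, action)])
             state, action = next_state, next_action
     return Q

Note how the TD target uses Q(s_{t+1}, a_{t+1}) for the action the policy actually selects next; replacing that term with the maximum of Q(s_{t+1}, a) over all actions would turn the sketch into Watkins's Q-learning.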
Some optimizations of Watkins's Q-learning may also be applied to SARSA; for example, the paper "Fast Online Q(λ)" (Wiering and Schmidhuber, 1998) describes the small differences needed for SARSA(λ) implementations as they arise.
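As an illustration of the kind of change involved, here is a minimal sketch of a single SARSA(λ) update with accumulating eligibility traces (the textbook formulation, not the specific optimizations described by Wiering and Schmidhuber); the table layout and parameter names are assumptions made for the example.

 def sarsa_lambda_step(Q, E, state, action, reward, next_state, next_action,
                       alpha=0.1, gamma=0.99, lam=0.9):
     """One SARSA(lambda) update with accumulating eligibility traces.
     Q and E are dict-like tables keyed by (state, action), e.g. collections.defaultdict(float);
     E should be reset to all zeros at the start of each episode."""
     delta = reward + gamma * Q[(next_state, next_action)] - Q[(state, action)]
     E[(state, action)] += 1.0                 # accumulate the trace for the pair just visited
     for key in list(E.keys()):
         Q[key] += alpha * delta * E[key]      # propagate the TD error to recently visited pairs
         E[key] *= gamma * lam                 # decay all traces
     return Q, E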

Excerpt source: Wikipedia, the free encyclopedia.
Read the full "State-Action-Reward-State-Action" article on Wikipedia.


