Translation and Dictionary
Words near each other
・ Markotów Mały
・ Markounda
・ Markout, Texas
・ Markov
・ Markov (crater)
・ Markov additive process
・ Markov algorithm
・ Markov blanket
・ Markov brothers' inequality
・ Markov chain
・ Markov chain approximation method
・ Markov chain geostatistics
・ Markov chain mixing time
・ Markov chain Monte Carlo
・ Markov chains on a measurable state space
・ Markov decision process
・ Markov information source
・ Markov kernel
・ Markov logic network
・ Markov model
・ Markov number
・ Markov partition
・ Markov perfect equilibrium
・ Markov process
・ Markov Processes International
・ Markov property
・ Markov random field
・ Markov renewal process
・ Markov reward model
・ Markov Reward Model Checker



Markov decision process : English Wikipedia
Markov decision process
Markov decision processes (MDPs) provide a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs are useful for studying a wide range of optimization problems solved via dynamic programming and reinforcement learning. MDPs were known at least as early as the 1950s (cf. Bellman 1957). A core body of research on Markov decision processes resulted from Ronald A. Howard's 1960 book, ''Dynamic Programming and Markov Processes''. They are used in many disciplines, including robotics, automatic control, economics, and manufacturing.
More precisely, a Markov decision process is a discrete-time stochastic control process. At each time step, the process is in some state s, and the decision maker may choose any action a that is available in state s. The process responds at the next time step by randomly moving into a new state s', and giving the decision maker a corresponding reward R_a(s,s').
The probability that the process moves into its new state s' is influenced by the chosen action. Specifically, it is given by the state transition function P_a(s,s'). Thus, the next state s' depends on the current state s and the decision maker's action a. But given s and a, it is conditionally independent of all previous states and actions; in other words, the state transitions of an MDP satisfy the ''Markov property''.
Markov decision processes are an extension of Markov chains; the difference is the addition of actions (allowing choice) and rewards (giving motivation). Conversely, if only one action exists for each state and all rewards are the same (e.g., zero), a Markov decision process reduces to a Markov chain.
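A minimal Python sketch can make these dynamics concrete. The toy two-state MDP below is an assumed example, not from the article: the state and action names, the table values, and the `step` helper are all illustrative. `transitions[s][a]` plays the role of P_a(s,·) and `rewards[s][a]` the role of R_a(s,·).

```python
import random

# Toy two-state MDP (illustrative values, not from the article).
# transitions[s][a][s2] = P_a(s, s2); rewards[s][a][s2] = R_a(s, s2).
transitions = {
    "low":  {"wait": {"low": 1.0},
             "work": {"low": 0.3, "high": 0.7}},
    "high": {"wait": {"high": 0.6, "low": 0.4},
             "work": {"high": 1.0}},
}
rewards = {
    "low":  {"wait": {"low": 0.0},
             "work": {"low": -1.0, "high": 2.0}},
    "high": {"wait": {"high": 1.0, "low": 0.0},
             "work": {"high": 3.0}},
}

def step(state, action):
    """Sample s' from P_a(s, .) and return (s', R_a(s, s'))."""
    succ = transitions[state][action]
    next_state = random.choices(list(succ), weights=list(succ.values()))[0]
    return next_state, rewards[state][action][next_state]

# The next state and reward depend only on the current state and the chosen
# action, never on earlier history; this is exactly the Markov property.
s = "low"
for t in range(5):
    s, r = step(s, "work")
    print(f"t={t}: moved to {s!r} with reward {r}")
```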
==Definition==

A Markov decision process is a 5-tuple (S,A,P_\cdot(\cdot,\cdot),R_\cdot(\cdot,\cdot),\gamma), where
* S is a finite set of states,
* A is a finite set of actions (alternatively, A_s is the finite set of actions available from state s),
* P_a(s,s') = \Pr(s_{t+1}=s' \mid s_t = s, a_t=a) is the probability that action a in state s at time t will lead to state s' at time t+1,
* R_a(s,s') is the immediate reward (or expected immediate reward) received after transition to state s' from state s,
* \gamma \in [0,1] is the discount factor, which represents the difference in importance between future rewards and present rewards.
(Note: The theory of Markov decision processes does not require S or A to be finite, but the basic algorithms, such as the value-iteration sketch below, assume that they are.)
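The excerpt ends before the article's algorithms, but a short value-iteration sketch shows how the 5-tuple is used: it repeatedly applies the Bellman optimality update V(s) \leftarrow \max_a \sum_{s'} P_a(s,s') (R_a(s,s') + \gamma V(s')) over the finite state set until the values stop changing. The function below is an assumed illustration, reusing the toy `transitions` and `rewards` tables from the earlier sketch, not code from the article.

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-6):
    """Compute optimal state values of a finite MDP (illustrative sketch).

    P[s][a][s2] = P_a(s, s2), R[s][a][s2] = R_a(s, s2), actions[s] = A_s.
    Sweeps V(s) <- max_a sum_{s2} P_a(s, s2) * (R_a(s, s2) + gamma * V(s2))
    until the largest change in any state value falls below tol.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in V:
            best = max(
                sum(p * (R[s][a][s2] + gamma * V[s2])
                    for s2, p in P[s][a].items())
                for a in actions[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

# Reusing the toy tables above; the actions available in each state are
# those with entries in the transition table.
actions = {s: list(transitions[s]) for s in transitions}
print(value_iteration(list(transitions), actions, transitions, rewards))
```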

Excerpt source: Wikipedia, the free encyclopedia
Read the full text of "Markov decision process" on Wikipedia


