Markov decision processes (MDPs) provide a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs are useful for studying a wide range of optimization problems solved via dynamic programming and reinforcement learning. MDPs were known at least as early as the 1950s (cf. Bellman 1957). A core body of research on Markov decision processes resulted from Ronald A. Howard's book published in 1960, ''Dynamic Programming and Markov Processes''. They are used in a wide range of disciplines, including robotics, automated control, economics, and manufacturing.

More precisely, a Markov decision process is a discrete-time stochastic control process. At each time step, the process is in some state <math>s</math>, and the decision maker may choose any action <math>a</math> that is available in state <math>s</math>. The process responds at the next time step by randomly moving into a new state <math>s'</math>, and giving the decision maker a corresponding reward <math>R_a(s,s')</math>.

The probability that the process moves into its new state <math>s'</math> is influenced by the chosen action. Specifically, it is given by the state transition function <math>P_a(s,s')</math>. Thus, the next state <math>s'</math> depends on the current state <math>s</math> and the decision maker's action <math>a</math>. But given <math>s</math> and <math>a</math>, it is conditionally independent of all previous states and actions; in other words, the state transitions of an MDP satisfy the ''Markov property''.

Markov decision processes are an extension of Markov chains; the difference is the addition of actions (allowing choice) and rewards (giving motivation). Conversely, if only one action exists for each state and all rewards are the same (e.g., zero), a Markov decision process reduces to a Markov chain.

==Definition==
A Markov decision process is a 5-tuple <math>(S, A, P_a, R_a, \gamma)</math>, where
* <math>S</math> is a finite set of states,
* <math>A</math> is a finite set of actions (alternatively, <math>A_s</math> is the finite set of actions available from state <math>s</math>),
* <math>P_a(s,s') = \Pr(s_{t+1} = s' \mid s_t = s, a_t = a)</math> is the probability that action <math>a</math> in state <math>s</math> at time <math>t</math> will lead to state <math>s'</math> at time <math>t+1</math>,
* <math>R_a(s,s')</math> is the immediate reward (or expected immediate reward) received after transitioning to state <math>s'</math> from state <math>s</math> due to action <math>a</math>,
* <math>\gamma \in [0,1]</math> is the discount factor, which represents the difference in importance between future rewards and present rewards.

(Note: The theory of Markov decision processes does not state that <math>S</math> or <math>A</math> are finite, but the basic algorithms below assume that they are finite.)
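To make the definition concrete, the following is a minimal sketch of a finite MDP in Python. The two-state, two-action example, with its particular transition probabilities and rewards, is an invented illustration rather than anything prescribed by the theory; it only shows one way the 5-tuple <math>(S, A, P_a, R_a, \gamma)</math> can be represented and how a single step of the process is sampled.

<syntaxhighlight lang="python">
# Minimal sketch of a finite MDP as a 5-tuple (S, A, P, R, gamma).
# The two-state, two-action example below is made up for illustration.
import random

S = ["s0", "s1"]                      # finite set of states
A = ["stay", "go"]                    # finite set of actions

# P[a][s][s'] = probability that action a in state s leads to state s'
P = {
    "stay": {"s0": {"s0": 0.9, "s1": 0.1},
             "s1": {"s0": 0.1, "s1": 0.9}},
    "go":   {"s0": {"s0": 0.2, "s1": 0.8},
             "s1": {"s0": 0.8, "s1": 0.2}},
}

# R[a][s][s'] = immediate reward received after moving from s to s' under a
R = {
    "stay": {"s0": {"s0": 0.0, "s1": 1.0},
             "s1": {"s0": 0.0, "s1": 2.0}},
    "go":   {"s0": {"s0": 0.0, "s1": 5.0},
             "s1": {"s0": 1.0, "s1": 0.0}},
}

gamma = 0.95                          # discount factor

def step(state, action):
    """Sample the next state and the corresponding reward.

    The distribution of the next state depends only on the current state
    and the chosen action, never on earlier history -- the Markov property.
    """
    next_states = list(P[action][state].keys())
    probs = list(P[action][state].values())
    next_state = random.choices(next_states, weights=probs)[0]
    reward = R[action][state][next_state]
    return next_state, reward

# One short rollout under arbitrary (randomly chosen) actions.
state = "s0"
for t in range(5):
    action = random.choice(A)
    state, reward = step(state, action)
    print(f"t={t}: action={action}, next state={state}, reward={reward}")
</syntaxhighlight>

Note that <code>step</code> consults only the current state and the chosen action when sampling the next state, which is exactly the Markov property described above.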