All Questions

4 questions
2 votes · 1 answer · 72 views

How is the expected value in the loss function of DQN approximated?

In Deep Q-Learning, the parametrized Q-functions $Q_i$ are optimised by performing gradient descent on the sequence of loss functions $L_i(\theta_i) = E_{(s,a)\sim p}[(y_i - Q(s,a;\theta_i))^2]$, where ...
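In practice that expectation is approximated by Monte Carlo sampling: a minibatch of $(s,a)$ pairs drawn from a replay buffer stands in for the distribution $p$. A minimal tabular sketch of this idea (all names, sizes, and values here are hypothetical, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tabular Q-values and fixed targets y_i, for illustration only.
n_states, n_actions = 5, 3
Q = rng.normal(size=(n_states, n_actions))
y = rng.normal(size=(n_states, n_actions))

# A replay buffer of (s, a) pairs drawn from the behaviour distribution p.
buffer = [(rng.integers(n_states), rng.integers(n_actions)) for _ in range(10_000)]

def sampled_loss(batch_size=32):
    """Monte Carlo estimate of E_{(s,a)~p}[(y - Q(s,a))^2] from a minibatch."""
    idx = rng.choice(len(buffer), size=batch_size, replace=False)
    errs = [(y[s, a] - Q[s, a]) ** 2 for s, a in (buffer[i] for i in idx)]
    return float(np.mean(errs))

# Average over the whole buffer; minibatch estimates fluctuate around it.
full = float(np.mean([(y[s, a] - Q[s, a]) ** 2 for s, a in buffer]))
print(full, sampled_loss())
```

Larger minibatches give lower-variance estimates of the same expectation; DQN trades that variance for cheap, frequent gradient steps.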
0 votes · 0 answers · 38 views

Understanding V- and Q-functions

Assume the existence of a Markov Decision Process consisting of: a state space $S$, an action space $A$, a transition model $T: S \times A \times S \to [0,1]$, a reward function $R: S \times A \times S \to \...
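For context, assuming a policy $\pi$ and a discount factor $\gamma$ (neither shown in the truncated excerpt), the two value functions on such an MDP are linked by the standard identities:

$$V^\pi(s) = \sum_{a \in A} \pi(a \mid s)\, Q^\pi(s,a), \qquad Q^\pi(s,a) = \sum_{s' \in S} T(s,a,s')\left[R(s,a,s') + \gamma V^\pi(s')\right].$$

So $V$ evaluates a state under the policy, while $Q$ additionally fixes the first action before following the policy.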
3 votes · 2 answers · 62 views

Why is the max a non-expansive operator?

In certain reinforcement learning (RL) proofs, the operators involved are assumed to be non-expansive. For example, on page 6 of the paper Generalized Markov Decision Processes: Dynamic-programming ...
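A standard one-line argument for why the max operator is non-expansive in the sup-norm (sketched here for illustration, not taken from the question): for any $f, g: A \to \mathbb{R}$ and any $a$,

$$f(a) \le g(a) + |f(a) - g(a)| \le \max_{a'} g(a') + \max_{a'} |f(a') - g(a')|,$$

so $\max_a f(a) - \max_a g(a) \le \|f - g\|_\infty$; swapping $f$ and $g$ gives

$$\left|\max_a f(a) - \max_a g(a)\right| \le \max_a |f(a) - g(a)| = \|f - g\|_\infty.$$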
8 votes · 2 answers · 844 views

How do we prove the n-step return error reduction property?

In section 7.1 (on n-step bootstrapping) of the book Reinforcement Learning: An Introduction (2nd edition), by Andrew Barto and Richard S. Sutton, the authors write about what they call the "n-...
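For reference, the property being asked about (equation 7.3 in that book; stated here from memory, so check against the text) says the worst-case error of the expected n-step return contracts by a factor of $\gamma^n$ relative to the current value estimate:

$$\max_s \left| \mathbb{E}_\pi\!\left[G_{t:t+n} \mid S_t = s\right] - v_\pi(s) \right| \le \gamma^n \max_s \left| V_{t+n-1}(s) - v_\pi(s) \right|.$$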