All Questions

Tagged with
4 questions
2
votes
0 answers
69 views

Is this the correct gradient for log of softmax? [duplicate]

I am currently implementing the very basic version (REINFORCE) of the Monte Carlo policy gradient algorithm. I was wondering if this is the correct gradient for the log of softmax. \begin{align} \...
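
The formula in the excerpt is cut off, so it cannot be checked here. As a rough sketch only, assuming the policy is a softmax taken directly over a vector of logits (the question may use a different parameterization), the gradient of the log-probability of the chosen action with respect to the logits is the one-hot vector of that action minus the softmax probabilities. The helper names `softmax`, `grad_log_softmax`, and `numerical_grad` are illustrative, not taken from the question; the last one compares the analytic result against central finite differences.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_softmax(logits, action):
    """Analytic gradient of log softmax(logits)[action] w.r.t. the logits.

    Assumption: the policy is a softmax over raw logits, so the result is
    one_hot(action) - softmax(logits).
    """
    probs = softmax(logits)
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


def numerical_grad(logits, action, eps=1e-6):
    """Central finite-difference estimate of the same gradient."""
    grads = []
    for i in range(len(logits)):
        up = list(logits)
        down = list(logits)
        up[i] += eps
        down[i] -= eps
        f_up = math.log(softmax(up)[action])
        f_down = math.log(softmax(down)[action])
        grads.append((f_up - f_down) / (2 * eps))
    return grads
```

Comparing `grad_log_softmax(z, a)` with `numerical_grad(z, a)` for a few arbitrary inputs is a quick way to test a hand-derived gradient before plugging it into a REINFORCE update.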
2
votes
1 answer
49 views

How is the log-derivative trick of a trajectory derived?

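I am looking at the decomposition of the gradient of $P(\tau | \theta)$ in this formula. The first part is clear, as is the derivative of $\log(x)$, but I do not see how the first formula is rearranged into...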
3
votes
1 answer
52 views

In the policy gradient equation, is $\pi(a_{t} | s_{t}, \theta)$ a distribution or a function?

I am learning about policy gradient methods from the Deep RL Bootcamp by Pieter Abbeel and I am a bit stumped by the math presented. In the lecture, he derives the gradient of the log-likelihood of a ...
4
votes
1 answer
742 views

Why does the "reward to go" trick in policy gradient methods work?

In policy gradient methods, there's a trick to reduce the variance of the policy gradient. We use causality, and remove part of the sum over rewards so that only actions that happened after the reward are taken ...
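
For readers unfamiliar with the term, the quantity that replaces the full-trajectory return in this trick can be sketched in a few lines. This is only an illustration under assumptions: the function name `rewards_to_go` and the optional discount `gamma` are not from the question, and the rewards are assumed to be a plain list for a single episode. Each log-probability term at step t is then weighted by the sum of rewards from step t onward, rather than by the sum over the whole episode.

```python
def rewards_to_go(rewards, gamma=1.0):
    """For each step t, return the sum of rewards from t to the end of the episode.

    With gamma < 1 the later rewards are discounted relative to step t.
    `gamma` is an illustrative parameter; gamma=1.0 gives the undiscounted sum.
    """
    out = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each entry reuses the partial sum of the steps after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


if __name__ == "__main__":
    episode_rewards = [1.0, 2.0, 3.0]
    print(rewards_to_go(episode_rewards))
```

The backward pass keeps the computation linear in the episode length, instead of re-summing the tail of the reward list for every step.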