Clipping
Figure: Clipped Surrogate Objective Plot [1]
The plot has two subfigures, illustrating the two cases \(A\gt 0\) and \(A\lt 0\). The case \(A=0\) is ignored because both terms of the clipped surrogate objective, \(r_{t}(\theta)\,\hat{A}_{t}\) and \(\mathrm{clip}\bigl(r_{t}(\theta),\,1-\epsilon,\,1+\epsilon\bigr)\,\hat{A}_{t}\), would then be \(0\).
\[
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_{t}\Bigl[\, \min\bigl( r_{t}(\theta)\,\hat{A}_{t},\; \mathrm{clip}\bigl(r_{t}(\theta),\,1-\epsilon,\,1+\epsilon\bigr)\,\hat{A}_{t} \bigr) \Bigr]
\]
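To make the objective concrete, here is a minimal NumPy sketch of the per-sample term inside the expectation. The function name `clipped_surrogate`, the sample ratios, and the default `epsilon=0.2` are assumptions chosen for illustration, not values taken from the plot.

```python
import numpy as np


def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """Per-sample term min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    `epsilon=0.2` is an illustrative default, not a prescribed value.
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    # Taking the element-wise minimum keeps the more pessimistic term.
    return np.minimum(unclipped, clipped)


# Hypothetical probability ratios below, at, and above the clip range.
ratios = np.array([0.5, 1.0, 1.5])

pos = clipped_surrogate(ratios, 1.0)    # case A > 0
neg = clipped_surrogate(ratios, -1.0)   # case A < 0

# The empirical expectation is the mean over sampled timesteps.
l_clip = clipped_surrogate(ratios, np.array([1.0, -1.0, 1.0])).mean()
```

With these inputs, the \(A\gt 0\) values stop growing once the ratio exceeds \(1+\epsilon\), while the \(A\lt 0\) values stop at \(-(1-\epsilon)\) once the ratio drops below \(1-\epsilon\), which matches the flat segments in the two subfigures.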
When \(A\gt 0\), which corresponds to the left subfigure, \(L^{\mathrm{CLIP}}(\theta)\) lies in the first quadrant of the coordinate plane, because \(A\gt 0\) and the probability ratio \(r_{t}(\theta)\) is always greater than \(0\).
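As a short check of the \(A\gt 0\) case, the per-sample term can be written out piecewise. This is a sketch that assumes \(\hat{A}_{t}\gt 0\), \(r_{t}(\theta)\gt 0\), and \(0\lt\epsilon\lt 1\):

\[
\min\bigl( r_{t}(\theta)\,\hat{A}_{t},\; \mathrm{clip}\bigl(r_{t}(\theta),\,1-\epsilon,\,1+\epsilon\bigr)\,\hat{A}_{t} \bigr) =
\begin{cases}
r_{t}(\theta)\,\hat{A}_{t}, & r_{t}(\theta) \le 1+\epsilon \\
(1+\epsilon)\,\hat{A}_{t}, & r_{t}(\theta) \gt 1+\epsilon
\end{cases}
\]

Both branches are positive, so the curve stays in the first quadrant: it rises linearly with \(r_{t}(\theta)\) and becomes flat at height \((1+\epsilon)\,\hat{A}_{t}\) once the ratio exceeds \(1+\epsilon\).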
- Schulman, John, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. “Proximal policy optimization algorithms.” arXiv preprint arXiv:1707.06347 (2017).