Proximal Policy Optimization Keynotes

Clipping

Clipped Surrogate Objective Plot [1]

The plot has two subfigures, one for each sign of the advantage \(A\) (written \(\hat{A}_{t}\) in the objective): \(A\gt 0\) and \(A\lt 0\). The case \(A=0\) is omitted because both terms, \(r_{t}(\theta)\,\hat{A}_{t}\) and \(\mathrm{clip}\bigl(r_{t}(\theta),\,1-\epsilon,\,1+\epsilon\bigr)\,\hat{A}_{t}\), are then \(0\), so the clipped surrogate objective is \(0\) as well.

\[
L^{\mathrm{CLIP}}(\theta)
=
\hat{\mathbb{E}}_{t}\Bigl[\,
  \min\bigl(
    r_{t}(\theta)\,\hat{A}_{t},\;
    \mathrm{clip}\bigl(r_{t}(\theta),\,1-\epsilon,\,1+\epsilon\bigr)\,\hat{A}_{t}
  \bigr)
\Bigr]
\]
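To make the objective concrete, here is a minimal sketch in plain Python. The function names (`clip`, `clipped_surrogate`, `l_clip`) are made up for illustration, and the default \(\epsilon = 0.2\) is an assumed value rather than something fixed by the formula. The ratios and advantages are passed in as plain numbers; computing them from a policy is not covered here.

```python
def clip(x, lo, hi):
    """Clamp x into the closed interval [lo, hi]."""
    return max(lo, min(x, hi))


def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """Per-timestep term: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    epsilon=0.2 is an assumed default, chosen only for illustration.
    """
    unclipped = ratio * advantage
    clipped = clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    return min(unclipped, clipped)


def l_clip(ratios, advantages, epsilon=0.2):
    """Empirical mean of the per-timestep terms over a batch."""
    terms = [clipped_surrogate(r, a, epsilon) for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # A > 0 and r above 1 + eps: the term is capped at (1 + eps) * A.
    print(clipped_surrogate(1.5, 1.0))
    # A < 0 and r below 1 - eps: the term is held at (1 - eps) * A.
    print(clipped_surrogate(0.5, -1.0))
    # A = 0: both branches are zero, whatever the ratio is.
    print(clipped_surrogate(3.0, 0.0))
```

Taking the `min` of the two branches is what makes the objective pessimistic: clipping only ever removes gain, never hides a loss.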

When \(A\gt 0\) (the left subfigure), \(L^{\mathrm{CLIP}}(\theta)\) lies in the first quadrant of the coordinate plane, because the advantage is positive and the probability ratio \(r_{t}(\theta)\) is always greater than \(0\).
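The shape of each subfigure can be worked out directly from the per-timestep term inside the expectation. This is a short derivation sketch, not taken from the plot itself. It uses only the fact that multiplying by a positive \(\hat{A}_{t}\) preserves the order of the two arguments of \(\min\), while multiplying by a negative \(\hat{A}_{t}\) reverses it:

\[
\min\bigl(r_{t}(\theta)\,\hat{A}_{t},\;\mathrm{clip}\bigl(r_{t}(\theta),\,1-\epsilon,\,1+\epsilon\bigr)\,\hat{A}_{t}\bigr)
=
\begin{cases}
\min\bigl(r_{t}(\theta),\,1+\epsilon\bigr)\,\hat{A}_{t}, & \hat{A}_{t} \gt 0,\\[4pt]
\max\bigl(r_{t}(\theta),\,1-\epsilon\bigr)\,\hat{A}_{t}, & \hat{A}_{t} \lt 0.
\end{cases}
\]

So for \(\hat{A}_{t}\gt 0\) the term grows linearly with \(r_{t}(\theta)\) and then flattens at \((1+\epsilon)\,\hat{A}_{t}\), while for \(\hat{A}_{t}\lt 0\) it stays flat at \((1-\epsilon)\,\hat{A}_{t}\) until \(r_{t}(\theta)\) reaches \(1-\epsilon\) and decreases linearly after that.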

  1. Schulman, John, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. “Proximal policy optimization algorithms.” arXiv preprint arXiv:1707.06347 (2017).
