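In 1, the division involving \(d_{\rho}\) is motivated by the statistics of the raw score \(\rho^{\text{raw}}_{\alpha\beta}\). Assuming the components of \(\mathbf{q}_\alpha\) and \(\mathbf{k}_\beta\) are i.i.d. standard normal, the raw score has mean \(0\) and variance \(d_\rho\). Independence and zero mean are what allow the mean and the variance of each product to factor: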
\begin{align*}
\rho^{\text{raw}}_{\alpha\beta}
&\;=\; \mathbf{q}_\alpha \cdot \mathbf{k}_\beta
\;=\; \sum_{i=1}^{d_\rho} (q_\alpha)_i \,(k_\beta)_i,
\quad (q_\alpha)_i,\;(k_\beta)_i \;\stackrel{\text{i.i.d.}}{\sim}\; \mathcal{N}(0,\,1) \\[8pt]
%
\mu\!\left(\rho^{\text{raw}}_{\alpha\beta}\right)
&\;=\; \mu\!\left(\sum_{i=1}^{d_\rho} (q_\alpha)_i (k_\beta)_i\right)
\;=\; \sum_{i=1}^{d_\rho} \mu\!\left((q_\alpha)_i (k_\beta)_i\right)
\;=\; \sum_{i=1}^{d_\rho} \mu\!\left((q_\alpha)_i\right) \cdot \mu\!\left((k_\beta)_i\right)
\;=\; 0 \\[8pt]
%
\sigma^2\!\left(\rho^{\text{raw}}_{\alpha\beta}\right)
&\;=\; \sigma^2\!\left(\sum_{i=1}^{d_\rho} (q_\alpha)_i (k_\beta)_i\right)
\;=\; \sum_{i=1}^{d_\rho} \sigma^2\!\left((q_\alpha)_i (k_\beta)_i\right)
\;=\; \sum_{i=1}^{d_\rho} \sigma^2\!\left((q_\alpha)_i\right) \cdot \sigma^2\!\left((k_\beta)_i\right)
\;=\; d_\rho
\end{align*}

Notation: the Hadamard product \(\circ\) denotes element-wise matrix multiplication; the Frobenius inner product is written \(\langle A, B \rangle_F\) or \(A \cdot B\); the differential of \(L\) with respect to \(W\) is expressed as \(\langle \frac{\partial L}{\partial W}, dW \rangle_F\).
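The mean and variance derived above, and the relation between the Hadamard product and the Frobenius inner product, can be checked numerically. The following is a minimal NumPy sketch, not part of the original derivation: the function names `raw_score_stats` and `frobenius_inner`, the sample count, and the seed are all illustrative assumptions.

```python
import numpy as np


def raw_score_stats(d_rho, n_samples=50_000, seed=0):
    """Empirical mean and variance of q . k with i.i.d. N(0, 1) components.

    Hypothetical helper for illustration; n_samples and seed are arbitrary.
    """
    rng = np.random.default_rng(seed)
    q = rng.standard_normal((n_samples, d_rho))
    k = rng.standard_normal((n_samples, d_rho))
    # One dot product per sampled (q, k) pair.
    raw = np.einsum("nd,nd->n", q, k)
    return raw.mean(), raw.var()


def frobenius_inner(a, b):
    """Frobenius inner product, computed as the sum of the Hadamard product."""
    return np.sum(a * b)


if __name__ == "__main__":
    for d_rho in (16, 64):
        mean, var = raw_score_stats(d_rho)
        # The mean should be close to 0 and the variance close to d_rho.
        print(f"d_rho={d_rho}: mean={mean:.3f}, var={var:.2f}")

    a = np.array([[1.0, 2.0], [3.0, 4.0]])
    b = np.array([[5.0, 6.0], [7.0, 8.0]])
    # sum(A * B) agrees with trace(A^T B), another common form of <A, B>_F.
    print(frobenius_inner(a, b), np.trace(a.T @ b))
```

If the raw scores are divided by \(\sqrt{d_\rho}\) (the usual scaled dot-product convention, assumed here rather than taken from the text), the empirical variance returned by `raw_score_stats` would be divided by \(d_\rho\) and so land near \(1\).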