Weight-sparse transformers have interpretable circuits


Sparse models contain small, disentangled circuits that are both understandable and sufficient to perform the behavior.

Overall Setup

Plot of interpretability versus capability


  1. Understanding neural networks through sparse circuits
  2. Gao, Leo, Achyuta Rajaram, Jacob Coxon, Soham V. Govande, Bowen Baker, and Dan Mossing. “Weight-sparse transformers have interpretable circuits.” arXiv preprint arXiv:2511.13653 (2025).
