This link [1] contains Ilya Sutskever's [2] curated machine learning paper list. The original story behind the list was told in a series of tweets.
I rearranged the sequence so that similar items are grouped together, while keeping each item's original index, rendered as digit emojis, at the beginning of its title for reference. I do not know of any specific reason behind the ordering of the original list. There are 27 papers and materials at the moment; I added placeholders at the end in case the list is extended with additional references.
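As a side note on the formatting, here is a minimal Python sketch of how the zero-padded digit-emoji prefixes used below can be generated. The `emoji_index` helper is purely illustrative, not part of any tooling behind this post:

```python
# Minimal sketch: render an item's original index as keycap-digit emojis,
# e.g. 14 -> 1️⃣4️⃣. Each keycap is digit + U+FE0F (VS-16) + U+20E3 (keycap).
DIGIT_EMOJI = {d: d + "\ufe0f\u20e3" for d in "0123456789"}

def emoji_index(n: int, width: int = 2) -> str:
    """Zero-pad n to `width` digits and map each digit to its keycap emoji."""
    return "".join(DIGIT_EMOJI[d] for d in str(n).zfill(width))

print(emoji_index(14))  # 1️⃣4️⃣
```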
Most of the papers are by authors from Google Brain. Besides papers, the list includes a few blog posts and one CS course. To my surprise, not all of the papers are highly cited. I will try to learn from the blog posts and hope to write blogs of comparable quality myself.
Ilya’s List
- 0️⃣1️⃣ The Annotated Transformer [3]
- 0️⃣2️⃣ The First Law of Complexodynamics [4]
- 0️⃣3️⃣ The Unreasonable Effectiveness of Recurrent Neural Networks [5]
- 0️⃣4️⃣ Understanding LSTM Networks [6]
- 0️⃣5️⃣ Recurrent Neural Network Regularization [7]
- 0️⃣6️⃣ Keeping Neural Networks Simple by Minimizing the Description Length of the Weights [8]
- 0️⃣7️⃣ Pointer Networks [9]
- 0️⃣8️⃣ ImageNet Classification with Deep Convolutional Neural Networks [10]
- 0️⃣9️⃣ Order Matters: Sequence to Sequence for Sets [11]
- 1️⃣0️⃣ GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism [12]
- 1️⃣1️⃣ Deep Residual Learning for Image Recognition [13]
- 1️⃣2️⃣ Multi-Scale Context Aggregation by Dilated Convolutions [14]
- 1️⃣3️⃣ Neural Message Passing for Quantum Chemistry [15]
- 1️⃣4️⃣ Attention Is All You Need [16]
- 1️⃣5️⃣ Neural Machine Translation by Jointly Learning to Align and Translate [17]
- 1️⃣6️⃣ Identity Mappings in Deep Residual Networks [18]
- 1️⃣7️⃣ A Simple Neural Network Module for Relational Reasoning [19]
- 1️⃣8️⃣ Variational Lossy Autoencoder [20]
- 1️⃣9️⃣ Relational Recurrent Neural Networks [21]
- 2️⃣0️⃣ Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton [22]
- 2️⃣1️⃣ Neural Turing Machines [23]
- 2️⃣2️⃣ Deep Speech 2: End-to-End Speech Recognition in English and Mandarin [24]
- 2️⃣3️⃣ Scaling Laws for Neural Language Models [25]
- 2️⃣4️⃣ A Tutorial Introduction to the Minimum Description Length Principle [26]
- 2️⃣5️⃣ Machine Super Intelligence [27]
- 2️⃣6️⃣ Kolmogorov Complexity and Algorithmic Randomness [28]
- 2️⃣7️⃣ CS231n: Convolutional Neural Networks for Visual Recognition [29]
- 2️⃣8️⃣–4️⃣0️⃣ (placeholders reserved for future additions)
A Reinforcement Learning List
This is a bonus section: OpenAI has also released a curated paper list for reinforcement learning [30].
Footnotes
- [1] Ilya 30u30
- [2] Google Scholar (Ilya Sutskever's profile)
- [3] The Annotated Transformer
- [4] The First Law of Complexodynamics
- [5] The Unreasonable Effectiveness of Recurrent Neural Networks
- [6] colah's blog, "Understanding LSTM Networks," August 27, 2015
- [7] Zaremba, Wojciech, Ilya Sutskever, and Oriol Vinyals. "Recurrent neural network regularization." arXiv preprint arXiv:1409.2329 (2014).
- [8] Hinton, Geoffrey E., and Drew Van Camp. "Keeping the neural networks simple by minimizing the description length of the weights." Proceedings of the sixth annual conference on Computational learning theory. 1993.
- [9] Vinyals, Oriol, Meire Fortunato, and Navdeep Jaitly. "Pointer networks." Advances in neural information processing systems 28 (2015).
- [10] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in neural information processing systems 25 (2012).
- [11] Vinyals, Oriol, Samy Bengio, and Manjunath Kudlur. "Order matters: Sequence to sequence for sets." arXiv preprint arXiv:1511.06391 (2015).
- [12] Huang, Yanping, et al. "GPipe: Efficient training of giant neural networks using pipeline parallelism." Advances in neural information processing systems 32 (2019).
- [13] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770-778. 2016.
- [14] Yu, Fisher, and Vladlen Koltun. "Multi-scale context aggregation by dilated convolutions." arXiv preprint arXiv:1511.07122 (2015).
- [15] Gilmer, Justin, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. "Neural message passing for quantum chemistry." International conference on machine learning, pp. 1263-1272. PMLR, 2017.
- [16] Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).
- [17] Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate." arXiv preprint arXiv:1409.0473 (2014).
- [18] He, Kaiming, et al. "Identity mappings in deep residual networks." Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV. Springer International Publishing, 2016.
- [19] Santoro, Adam, et al. "A simple neural network module for relational reasoning." Advances in neural information processing systems 30 (2017).
- [20] Chen, Xi, et al. "Variational lossy autoencoder." arXiv preprint arXiv:1611.02731 (2016).
- [21] Santoro, Adam, et al. "Relational recurrent neural networks." Advances in neural information processing systems 31 (2018).
- [22] Aaronson, Scott, Sean M. Carroll, and Lauren Ouellette. "Quantifying the rise and fall of complexity in closed systems: the coffee automaton." arXiv preprint arXiv:1405.6903 (2014).
- [23] Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural Turing machines." arXiv preprint arXiv:1410.5401 (2014).
- [24] Amodei, Dario, et al. "Deep Speech 2: End-to-end speech recognition in English and Mandarin." International conference on machine learning. PMLR, 2016.
- [25] Kaplan, Jared, et al. "Scaling laws for neural language models." arXiv preprint arXiv:2001.08361 (2020).
- [26] Grünwald, Peter. "A tutorial introduction to the minimum description length principle." Advances in Minimum Description Length: Theory and Applications. MIT Press, 2005.
- [27] Legg, Shane. "Machine super intelligence." PhD thesis, University of Lugano, 2008.
- [28] Shen, Alexander, Vladimir A. Uspensky, and Nikolay Vereshchagin. Kolmogorov Complexity and Algorithmic Randomness. Vol. 220. American Mathematical Society, 2022.
- [29] CS231n: Convolutional Neural Networks for Visual Recognition, Stanford University course.
- [30] Key Papers in Deep RL, OpenAI Spinning Up.