Bayesian Deep Learning (BDL) Reading List
This page collects papers and other resources, organized by category, on Bayesian Deep Learning and Deep Bayesian Learning (see YW Teh’s talk on the dichotomy). Last updated: August 2018.
Some books
- PGM Probabilistic Graphical Models: Principles and Techniques, Koller and Friedman 2009
- PRML Pattern Recognition and Machine Learning, Bishop 2006
- MCTME Monte Carlo theory, methods and examples, Owen (book in progress)
Core
A generic formula for models with latent variables:
- PGM Chapter 19
Markov Chain Monte Carlo (MCMC) theory and classic algorithms:
- PGM Chapter 12
- PRML Chapter 11
Hamiltonian Monte Carlo (HMC):
- MCMC using Hamiltonian dynamics, Neal 2012
- A Conceptual Introduction to Hamiltonian Monte Carlo, Betancourt 2017
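As a companion to the references above, here is a minimal NumPy sketch of one HMC transition (leapfrog integration plus a Metropolis correction) for a standard normal target; the step size `eps` and trajectory length are illustrative choices, not tuned values.

```python
import numpy as np

def hmc_step(q, rng, eps=0.1, n_leapfrog=20):
    """One HMC transition for the target N(0, 1): U(q) = q**2/2, grad U = q."""
    p = rng.standard_normal()          # resample the auxiliary momentum
    q_new, p_new = q, p
    p_new -= 0.5 * eps * q_new         # leapfrog: opening half momentum step
    for _ in range(n_leapfrog - 1):
        q_new += eps * p_new           # full position step
        p_new -= eps * q_new           # full momentum step
    q_new += eps * p_new               # last position step
    p_new -= 0.5 * eps * q_new         # closing half momentum step
    # Metropolis correction on the change in total energy H = U + K
    h_old = 0.5 * q ** 2 + 0.5 * p ** 2
    h_new = 0.5 * q_new ** 2 + 0.5 * p_new ** 2
    return q_new if rng.random() < np.exp(h_old - h_new) else q

rng = np.random.default_rng(0)
q, samples = 0.0, []
for _ in range(5000):
    q = hmc_step(q, rng)
    samples.append(q)
print(np.mean(samples), np.std(samples))   # roughly 0 and 1
```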
Expectation Maximization (EM) and Variational Inference (VI):
- PRML Chapter 9, 10.1-10.6
- Variational Inference: A Review for Statisticians, Blei et al. 2016
- Graphical Models, Exponential Families, and Variational Inference, Wainwright and Jordan 2008
- An Introduction to Variational Methods for Graphical Models, Jordan et al. 1999
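A small numerical check of the identity underlying VI, log p(x) = ELBO + KL(q || p(z|x)), may help fix ideas. The model below (prior z ~ N(0,1), likelihood x | z ~ N(z,1)) and the particular q are arbitrary illustrative choices; everything is tractable, so the Monte Carlo ELBO plus the analytic KL should recover the exact evidence.

```python
import numpy as np

# Prior z ~ N(0,1), likelihood x | z ~ N(z,1): the posterior is N(x/2, 1/2)
# and the evidence is p(x) = N(x; 0, 2). Pick an arbitrary Gaussian q and
# verify log p(x) = ELBO + KL(q || posterior) by Monte Carlo.
rng = np.random.default_rng(0)
x, m, s = 1.0, 0.8, 0.6                      # data and q = N(m, s^2)

z = m + s * rng.standard_normal(500_000)
log_joint = (-0.5 * z ** 2 - 0.5 * np.log(2 * np.pi)          # log p(z)
             - 0.5 * (x - z) ** 2 - 0.5 * np.log(2 * np.pi))  # log p(x|z)
log_q = -0.5 * ((z - m) / s) ** 2 - np.log(s) - 0.5 * np.log(2 * np.pi)
elbo = np.mean(log_joint - log_q)

# Closed-form KL between q = N(m, s^2) and the posterior N(mu, v)
mu, v = x / 2, 0.5
kl = np.log(np.sqrt(v) / s) + (s ** 2 + (m - mu) ** 2) / (2 * v) - 0.5
log_px = -0.5 * x ** 2 / 2 - 0.5 * np.log(2 * np.pi * 2)
print(elbo + kl, log_px)   # the two agree up to Monte Carlo error
```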
Amortized Variational Inference and Reparameterization Trick:
- Auto-Encoding Variational Bayes, Kingma and Welling 2013
- Stochastic Backpropagation and Approximate Inference in Deep Generative Models, Rezende et al. 2014
- The Generalized Reparameterization Gradient, Ruiz et al. 2016
- Inference Suboptimality in Variational Autoencoders, Cremer et al. 2018
- Forward Amortized Inference for Likelihood-Free Variational Marginalization, Ambrogioni et al. 2018
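The reparameterization trick in the papers above can be sketched in a few lines: to differentiate an expectation under N(mu, sigma^2), write the sample as a deterministic function of the parameters and parameter-free noise. The toy objective below (expected square, whose exact gradient in mu is 2*mu) is an illustrative choice, not taken from any of the papers.

```python
import numpy as np

# Estimate d/dmu E_{z ~ N(mu, sigma^2)}[z^2] (exact value: 2*mu) by
# reparameterizing z = mu + sigma * eps with eps ~ N(0, 1).
rng = np.random.default_rng(0)
mu, sigma, n = 1.5, 0.7, 100_000

eps = rng.standard_normal(n)
z = mu + sigma * eps          # differentiable in mu; randomness isolated in eps
# f(z) = z^2, so df/dmu = 2 * z * dz/dmu = 2 * z * 1
grad_estimate = np.mean(2 * z)
print(grad_estimate)          # close to 2 * mu = 3.0
```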
Hierarchical Variational Methods:
- An Auxiliary Variational Method, Agakov and Barber 2004
- Hierarchical Variational Models, Ranganath et al. 2015
- Auxiliary Deep Generative Models, Maaløe et al. 2016
- Markov Chain Monte Carlo and Variational Inference: Bridging the Gap, Salimans et al. 2014
- Variational Inference with Normalizing Flows, Rezende and Mohamed 2015
- The Variational Gaussian Process, Tran et al. 2015
Variance Reduction in VI:
- MCTME Chapter 8, 10
- Sticking the Landing: Simple, Lower-Variance Gradient Estimators for Variational Inference, Roeder et al. 2017
- Reducing Reparameterization Gradient Variance, Miller et al. 2017
- Quasi-Monte Carlo Variational Inference, Buchholz et al. 2018
Expectation Propagation (EP):
- PRML Chapter 10.7
- PGM Chapter 11.4
- Proofs of Alpha Divergence Properties (lecture note), Cevher 2008
- Divergence Measures and Message Passing, Minka 2005
Deep State Space Models
- A Recurrent Latent Variable Model for Sequential Data, Chung et al. 2015
- Deep Kalman Filters, Krishnan et al. 2015
- Filtering Variational Objectives, Maddison et al. 2017
- Variational Sequential Monte Carlo, Naesseth et al. 2017
- Auto-Encoding Sequential Monte Carlo, Le et al. 2017
- Variational Bi-LSTMs, Shabanian et al. 2017
Normalizing Flows
- MCTME Chapter 4
- Variational Inference with Normalizing Flows, Rezende and Mohamed 2015
- Improving Variational Inference with Inverse Autoregressive Flow, Kingma et al. 2016
- Improving Variational Auto-Encoders using Householder Flow, Tomczak and Welling 2016
- Improving Variational Auto-Encoders using Convex Combination Linear Inverse Autoregressive Flow, Tomczak and Welling 2017
- Sylvester Normalizing Flows for Variational Inference, van den Berg et al. 2018
- Neural Autoregressive Flows, Huang et al. 2018
- Density Estimation using Real NVP, Dinh et al. 2016
- Glow: Generative Flow with Invertible 1x1 Convolutions, Kingma and Dhariwal 2018
- Neural Ordinary Differential Equations, Chen et al. 2018
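The mechanics shared by all of the flows above is the change-of-variables formula: push base noise through an invertible map and correct the density by the log-determinant of the Jacobian. A minimal sketch with a single affine layer (parameters `s`, `t` chosen by hand, not learned) makes this concrete:

```python
import numpy as np

# Push a standard normal through the invertible map x = exp(s) * z + t and
# score x with log p(x) = log p(z) - log|dx/dz|, where log|dx/dz| = s.
s, t = 0.5, 2.0                      # illustrative flow parameters

def log_normal(z):
    return -0.5 * z ** 2 - 0.5 * np.log(2 * np.pi)

def flow_log_density(x):
    z = (x - t) * np.exp(-s)         # invert the flow
    return log_normal(z) - s         # subtract the log-det Jacobian

# This flow turns N(0, 1) into N(t, exp(2s)); check the densities agree.
x = 2.3
exact = (-0.5 * (x - t) ** 2 / np.exp(2 * s)
         - 0.5 * np.log(2 * np.pi * np.exp(2 * s)))
print(flow_log_density(x), exact)    # identical up to float error
```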
Importance Weighted Autoencoder
- Importance Weighted Autoencoders, Burda et al. 2015
- Reinterpreting Importance-Weighted Autoencoders, Cremer et al. 2017
- Sequentialized Sampling Importance Resampling and Scalable IWAE, Huang and Courville 2018
- Tighter Variational Bounds are Not Necessarily Better, Rainforth et al. 2018
- On Nesting Monte Carlo Estimators, Rainforth et al. 2018
- Debiasing Evidence Approximations: On Importance-weighted Autoencoders and Jackknife Variational Inference, Nowozin 2018
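The bound-tightening behaviour these papers analyze can be observed directly in a model with a tractable evidence. The setup below (prior as proposal, Gaussian likelihood) is an illustrative choice: the k-sample bound L_k = E[log mean(w_1..w_k)] increases toward log p(x) as k grows.

```python
import numpy as np

# Latent z ~ N(0,1), likelihood x | z ~ N(z,1), so p(x) = N(x; 0, 2).
# With the prior as proposal, the importance weight is w = p(x | z).
rng = np.random.default_rng(0)
x = 1.0
log_px = -0.5 * x ** 2 / 2 - 0.5 * np.log(2 * np.pi * 2)   # exact evidence

def iwae_bound(k, n_runs=20_000):
    z = rng.standard_normal((n_runs, k))
    log_w = -0.5 * (x - z) ** 2 - 0.5 * np.log(2 * np.pi)  # log p(x|z)
    # log-mean-exp over the k importance samples, averaged over runs
    m = log_w.max(axis=1, keepdims=True)
    return np.mean(m.squeeze() + np.log(np.mean(np.exp(log_w - m), axis=1)))

b1, b16 = iwae_bound(1), iwae_bound(16)
print(b1, b16, log_px)   # b1 < b16 <= log p(x)
```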
Implicit Inference
- Adversarially Learned Inference, Dumoulin et al. 2016
- Adversarial Variational Bayes: Unifying Variational Autoencoders and Generative Adversarial Networks, Mescheder et al. 2017
- Variational Inference using Implicit Distributions, Huszar 2017
Transfer Learning and Semi-Supervised Learning
- Semi-Supervised Learning with Deep Generative Models, Kingma et al. 2014
- Towards a Neural Statistician, Edwards and Storkey 2016
- One-Shot Generalization in Deep Generative Models, Rezende et al. 2016
- Uncertainty in Multitask Transfer Learning, Lacoste et al. 2018
- Conditional Neural Processes, Garnelo et al. 2018
- Neural Processes, Garnelo et al. 2018
Representation Learning
- Ladder Variational Autoencoders, Sønderby et al. 2016
- PixelVAE: A Latent Variable Model for Natural Images, Gulrajani et al. 2016
- Variational Lossy Autoencoder, Chen et al. 2016
- Generating Sentences from a Continuous Space, Bowman et al. 2015
- Generating Sentences by Editing Prototypes, Guu et al. 2017
- The Variational Fair Autoencoder, Louizos et al. 2015
- VAE with a VampPrior, Tomczak and Welling 2017
- Hierarchical VampPrior Variational Fair Auto-Encoder, Botros and Tomczak 2018
- Neural Relational Inference for Interacting Systems, Kipf et al. 2018
- Hyperspherical Variational Auto-Encoders, Davidson et al. 2018
- Neural Scene Representation and Rendering, Eslami et al. 2018
Disentanglement in Deep Representations
- Emergence of Invariance and Disentanglement in Deep Representations, Achille and Soatto 2017
- Early Visual Concept Learning with Unsupervised Deep Learning, Higgins et al. 2016
- β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework, Higgins et al. 2017
- Isolating Sources of Disentanglement in Variational Autoencoders, Chen et al. 2018
- Understanding Disentangling in β-VAE, Burgess et al. 2018
Memory Addressing, Localization and Inference
- Learning to Generate with Memory, Li et al. 2016
- Generative Temporal Models with Memory, Gemici et al. 2017
- Variational Memory Addressing in Generative Models, Bornschein et al. 2017
- The Kanerva Machine: A Generative Distributed Memory, Wu et al. 2018
- Generative Temporal Models with Spatial Memory for Partially Observed Environments, Fraccaro et al. 2018
Discrete Latent Variable
- Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams 1992
- Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation, Bengio et al. 2013
- Neural Variational Inference and Learning in Belief Networks, Mnih and Gregor 2014
- MuProp: Unbiased Backpropagation for Stochastic Neural Networks, Gu et al. 2015
- Gradient Estimation Using Stochastic Computation Graphs, Schulman et al. 2015
- Variational Inference for Monte Carlo Objectives, Mnih and Rezende 2016
- The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, Maddison et al. 2016
- Categorical Reparameterization with Gumbel-Softmax, Jang et al. 2016
- The Generalized Reparameterization Gradient, Ruiz et al. 2016
- REBAR: Low-variance, Unbiased Gradient Estimates for Discrete Latent Variable Models, Tucker et al. 2017
- Backpropagation Through the Void: Optimizing Control Variates for Black-Box Gradient Estimation, Grathwohl et al. 2017
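The Concrete/Gumbel-Softmax relaxation from the papers above is easy to sketch: perturb the logits with Gumbel(0,1) noise and apply a temperature-controlled softmax. The probabilities and temperature below are illustrative; at low temperature the samples are near one-hot and their argmax frequencies recover the class probabilities (the Gumbel-max trick).

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.2, 0.5, 0.3])        # illustrative class probabilities
logits = np.log(p)

def gumbel_softmax(logits, tau, n):
    g = -np.log(-np.log(rng.uniform(size=(n, logits.size))))  # Gumbel(0,1)
    y = (logits + g) / tau
    y -= y.max(axis=1, keepdims=True)                         # stable softmax
    e = np.exp(y)
    return e / e.sum(axis=1, keepdims=True)

samples = gumbel_softmax(logits, tau=0.1, n=50_000)
freq = np.bincount(samples.argmax(axis=1), minlength=3) / len(samples)
print(freq)   # approximately [0.2, 0.5, 0.3]
```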
Bayesian Deep Neural Networks (Variational Approaches)
- Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks, Hernández-Lobato and Adams 2015
- Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, Gal and Ghahramani 2015
- Weight Uncertainty in Neural Networks, Blundell et al. 2015
- Variational Dropout and the Local Reparameterization Trick, Kingma et al. 2015
- Dropout Inference in Bayesian Neural Networks with Alpha-divergences, Li and Gal 2017
- Multiplicative Normalizing Flows for Variational Bayesian Neural Networks, Louizos and Welling 2017
- Bayesian Hypernetworks, Krueger et al. 2017
- Deep Prior, Lacoste et al. 2017
- Variational Gaussian Dropout is not Bayesian, Hron et al. 2017
- Variational Bayesian dropout: pitfalls and fixes, Hron et al. 2018
- Noisy Natural Gradient as Variational Inference, Zhang et al. 2018
Bayesian Compression
- Bayesian Compression for Deep Learning, Louizos et al. 2017
- Improved Bayesian Compression, Federici et al. 2017
- Variational Dropout Sparsifies Deep Neural Networks, Molchanov et al. 2017
- Learning Sparse Neural Networks through L0 Regularization, Louizos et al. 2018
- Structured Variational Learning of Bayesian Neural Networks with Horseshoe Priors, Ghosh et al. 2018
Bayesian Deep Neural Networks (MCMC Approaches)
- Bayesian Learning via Stochastic Gradient Langevin Dynamics, Welling and Teh 2011
- Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring, Ahn et al. 2012
- Stochastic Gradient Hamiltonian Monte Carlo, Chen et al. 2014
- Bayesian Sampling Using Stochastic Gradient Thermostats, Ding et al. 2014
- Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks, Li et al. 2015
- Entropy-SGD: Biasing Gradient Descent Into Wide Valleys, Chaudhari et al. 2017
- Adversarial Distillation of Bayesian Neural Network Posteriors, Wang et al. 2018
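The SGLD update of Welling and Teh 2011 is gradient descent on the negative log posterior plus properly scaled Gaussian noise. The sketch below targets a toy N(0,1) posterior, uses the full gradient in place of a minibatch estimate, and keeps the step size fixed rather than annealed, so it is only a rough illustration of the update rule.

```python
import numpy as np

# Langevin update for U(theta) = theta^2 / 2 (posterior N(0, 1)):
#   theta <- theta - (eps/2) * grad U(theta) + N(0, eps)
rng = np.random.default_rng(0)
eps, theta, samples = 0.01, 0.0, []
for t in range(200_000):
    grad_U = theta                                   # gradient of theta^2/2
    theta += -0.5 * eps * grad_U + np.sqrt(eps) * rng.standard_normal()
    if t > 10_000:                                   # discard burn-in
        samples.append(theta)
print(np.mean(samples), np.std(samples))             # roughly 0 and 1
```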
Deep Neural Networks = Gaussian Processes
- Priors for Infinite Networks, Neal 1994
- Bayesian Learning for Neural Networks, Neal 1995
- Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, Gal and Ghahramani 2015
- Avoiding Pathologies in Very Deep Networks, Duvenaud et al. 2016
- Deep Neural Networks as Gaussian Processes, Lee et al. 2018
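Neal's infinite-width argument can be checked empirically: the output of a one-hidden-layer ReLU network with i.i.d. N(0,1) weights, scaled by 1/sqrt(width), tends to a Gaussian whose variance for a single input x is ||x||^2 / 2. The widths, input, and architecture below are illustrative; Lee et al. 2018 extend the correspondence to deep networks.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0])                     # a single input, ||x||^2 = 5
width, n_nets = 1_000, 2_000

W1 = rng.standard_normal((n_nets, width, 2)) # first-layer weights per network
w2 = rng.standard_normal((n_nets, width))    # readout weights per network
h = np.maximum(W1 @ x, 0.0)                  # hidden ReLU activations
f = (w2 * h).sum(axis=1) / np.sqrt(width)    # scaled network outputs
print(f.var())                               # approx ||x||^2 / 2 = 2.5
```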
SGD / Approximate Inference / PAC-Bayes
- PAC-Bayesian Theory Meets Bayesian Inference, Germain et al. 2016
- Stochastic Gradient Descent as Approximate Bayesian Inference, Mandt et al. 2017
- Stochastic Gradient Descent Performs Variational Inference, Converges to Limit Cycles for Deep Networks, Chaudhari and Soatto 2017
- Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints, Mou et al. 2017
- Entropy-SGD Optimizes the Prior of a PAC-Bayes Bound: Generalization properties of Entropy-SGD and data-dependent priors, Dziugaite and Roy 2017
- A Bayesian Perspective on Generalization and Stochastic Gradient Descent, Smith and Le 2018