I am a graduate student at the Montreal Institute for Learning Algorithms (MILA), advised by Aaron Courville. My research is mostly about deep latent variable models and efficient approximate inference. My recent focus is on improving the expressivity of variational inference (see our ICML18 paper NAF!), the optimization process of inference (our NeurIPS18 paper AVO!), and understanding the training dynamics of generative models in general. I am also interested in meta-learning, statistical learning theory and reinforcement learning. Here’s my one-page CV and Google Scholar page.


  • Our paper Probability Distillation: A Caveat and Alternatives was presented at UAI 2019
  • We organized a workshop on Invertible Neural Nets and Normalizing Flows (INNF) at ICML this year
  • Our paper Hierarchical Importance Weighted Autoencoders (H-IWAE) was presented at ICML 2019
  • Our paper PAC Bayes Bound Minimization via Normalizing Flows was presented at the ICML 2019 workshop on Understanding and Improving Generalization in Deep Learning


Probability Distillation: A Caveat and Alternatives
Following Van den Oord et al. (2018), probability distillation has recently been of interest to deep learning practitioners: as a practical workaround for deploying autoregressive models in real-time applications, a student network is used to obtain quality samples in parallel. We identify a pathological optimization issue with the adopted stochastic minimization of the reverse-KL divergence: the curse of dimensionality results in a skewed gradient distribution that renders training inefficient. This means that KL-based “evaluative” training can be susceptible to poor exploration if the target distribution is highly structured. We then explore alternative principles for distillation, including one with an “instructive” signal, and show that it is possible to achieve qualitatively better results than with KL minimization.
Chin-Wei Huang*, Faruk Ahmed*, Kundan Kumar, Alexandre Lacoste, Aaron Courville
Conference on Uncertainty in Artificial Intelligence, 2019
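As a toy illustration of the “evaluative” reverse-KL signal discussed above, here is a minimal sketch (not from the paper) of the stochastic reverse-KL estimate used in distillation, with 1-D Gaussians standing in for the student q and the teacher p:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_normal(x, mu, sigma):
    # log density of N(mu, sigma^2) evaluated at x
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def reverse_kl_mc(mu_q, s_q, mu_p, s_p, n=200_000):
    # sample from the student q and score under both q and the teacher p:
    # KL(q || p) ~= mean_z [ log q(z) - log p(z) ],  z ~ q
    z = rng.normal(mu_q, s_q, size=n)
    return np.mean(log_normal(z, mu_q, s_q) - log_normal(z, mu_p, s_p))

est = reverse_kl_mc(0.0, 1.0, 1.0, 2.0)
```

For these two Gaussians the divergence has the closed form log(s_p/s_q) + (s_q² + (μ_q−μ_p)²)/(2 s_p²) − 1/2 ≈ 0.4431, so the Monte Carlo estimate can be checked directly; in high dimensions this estimator’s gradients are exactly where the paper locates the pathology.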
Hierarchical Importance Weighted Autoencoders [arXiv]
Importance weighted variational inference (Burda et al., 2015) uses multiple i.i.d. samples to obtain a tighter variational lower bound. We believe a joint proposal has the potential to reduce the number of redundant samples, and introduce a hierarchical structure to induce correlation. The hope is that the proposals would coordinate to make up for the error made by one another, reducing the variance of the importance estimator. Theoretically, we analyze the condition under which convergence of the estimator variance can be connected to convergence of the lower bound. Empirically, we confirm that maximization of the lower bound does implicitly minimize variance. Further analysis shows that this is a result of negative correlation induced by the proposed hierarchical meta sampling scheme, and performance of inference also improves when the number of samples increases.
Chin-Wei Huang, Kris Sankaran, Eeshan Dhekane, Alexandre Lacoste, Aaron Courville
International Conference on Machine Learning, 2019
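For context, the importance weighted bound being tightened here can be sketched in a few lines (a toy 1-D model with a deliberately poor proposal, not the paper's setup): with K i.i.d. proposal samples the bound is E[log (1/K) Σ_k w_k], which is tighter for larger K.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_normal(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

# toy model: p(z) = N(0,1), p(x|z) = N(z,1); poor proposal q(z|x) = N(0, 1.5)
def iwae_bound(x, K, n=20_000):
    z = rng.normal(0.0, 1.5, size=(n, K))
    # importance weights w_k = p(z_k) p(x|z_k) / q(z_k|x), in log space
    log_w = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0) - log_normal(z, 0.0, 1.5)
    # log (1/K) sum_k w_k, averaged over n independent K-sample estimates
    return np.mean(np.logaddexp.reduce(log_w, axis=1) - np.log(K))

elbo = iwae_bound(1.0, K=1)    # standard ELBO
iwae = iwae_bound(1.0, K=10)   # 10-sample importance weighted bound
```

With independent samples the K=10 bound is tighter than the ELBO; the paper's point is that correlated (hierarchically coupled) proposals can get the same effect with fewer redundant samples.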
Improving Explorability in Variational Inference with Annealed Variational Objectives [arXiv]
Despite the advances in the representational capacity of approximate distributions for variational inference, the optimization process can still limit the density that is ultimately learned. We demonstrate the drawbacks of biasing the true posterior to be unimodal, and introduce Annealed Variational Objectives (AVO) into the training of hierarchical variational methods. Inspired by Annealed Importance Sampling, the proposed method facilitates learning by incorporating energy tempering into the optimization objective. In our experiments, we demonstrate our method's robustness to deterministic warm up, and the benefits of encouraging exploration in the latent space.
Chin-Wei Huang, Shawn Tan, Alexandre Lacoste, Aaron Courville
Neural Information Processing Systems, 2018
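The energy-tempering idea can be illustrated with the geometric bridge from annealed importance sampling that the paper builds on (a schematic sketch with illustrative unnormalized densities, not the paper's code):

```python
import numpy as np

def log_init(z):
    # broad unimodal initial density (unnormalized)
    return -0.5 * z**2

def log_target(z):
    # bimodal target density (unnormalized), modes at -3 and +3
    return np.logaddexp(-0.5 * (z - 3)**2, -0.5 * (z + 3)**2)

def log_annealed(z, beta):
    # geometric bridge: f_t ∝ f_0^(1-beta) * f_T^beta
    return (1 - beta) * log_init(z) + beta * log_target(z)

z = np.linspace(-6, 6, 101)
# sweeping beta from 0 to 1 gradually deforms the easy initial
# density into the multimodal target, which is what AVO bakes
# into the intermediate variational objectives
paths = [log_annealed(z, b) for b in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

The endpoints of the bridge recover the initial and target densities exactly; the intermediate betas give the tempered objectives that encourage exploration before the modes sharpen.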
Neural Autoregressive Flows [arXiv] [bib] [slides]
Normalizing flows and autoregressive models have been successfully combined to produce state-of-the-art results in density estimation, via Masked Autoregressive Flows (MAF), and to accelerate state-of-the-art WaveNet-based speech synthesis to 20x faster than real-time, via Inverse Autoregressive Flows (IAF). We unify and generalize these approaches, replacing the (conditionally) affine univariate transformations of MAF/IAF with a more general class of invertible univariate transformations expressed as monotonic neural networks. We demonstrate that the proposed neural autoregressive flows (NAF) are universal approximators for continuous probability distributions, and their greater expressivity allows them to better capture multimodal target distributions. Experimentally, NAF yields state-of-the-art performance on a suite of density estimation tasks and outperforms IAF in variational autoencoders trained on binarized MNIST.
Chin-Wei Huang*, David Krueger*, Alexandre Lacoste, Aaron Courville
International Conference on Machine Learning, 2018 (LONG TALK!)
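The key construction, an invertible univariate transformation given by a monotonic neural network, can be sketched as a single "deep sigmoidal flow"-style step (a simplified illustration with fixed pseudo-parameters; a real NAF conditions these parameters on previous dimensions via a hypernetwork):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p) - np.log(1.0 - p)

rng = np.random.default_rng(0)
# strictly positive pre-activation slopes and simplex mixture weights
# guarantee monotonicity of the transform
a = np.exp(0.5 * rng.normal(size=5))            # a_i > 0
b = rng.normal(size=5)
w = np.exp(rng.normal(size=5)); w /= w.sum()    # w on the simplex

def dsf(x):
    # y = logit( sum_i w_i * sigmoid(a_i * x + b_i) )
    # monotone (hence invertible) because a_i > 0 and w_i >= 0 sum to 1
    return logit(w @ sigmoid(np.outer(a, x) + b[:, None]))

xs = np.linspace(-3, 3, 121)
ys = dsf(xs)
```

Because each sigmoid is strictly increasing and the mixture weights are non-negative, the composite map is strictly increasing everywhere, which is the property that lets such networks replace the affine transformations of MAF/IAF while remaining invertible.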
Neural Language Modeling by Jointly Learning Syntax and Lexicon [arXiv] [openreview] [bib]
We propose a neural language model capable of unsupervised syntactic structure induction, which leverages the induced structure to form better semantic representations and a better language model. Standard recurrent neural networks are limited by their sequential structure and fail to efficiently use syntactic information. Tree-structured recursive networks, on the other hand, usually require additional structural supervision at the cost of human expert annotation. Our model, the Parsing-Reading-Predict Network (PRPN), can simultaneously induce the syntactic structure from unannotated sentences and leverage the inferred structure to learn a better language model. In our model, the gradient can be directly back-propagated from the language model loss into the neural parsing network. Experiments show that the proposed model can discover the underlying syntactic structure and achieve state-of-the-art performance on word/character-level language modeling tasks.
Yikang Shen, Zhouhan Lin, Chin-Wei Huang, Aaron Courville
International Conference on Learning Representations, 2018
  • Generating Contradictory, Neutral, and Entailing Sentences [arXiv]
    • Yikang Shen, Shawn Tan, Chin-Wei Huang, Aaron Courville
  • Bayesian Hypernetworks [arXiv] [openreview] [DLRL video] [bib]
    • David Krueger*, Chin-Wei Huang*, Riashat Islam, Ryan Turner, Alexandre Lacoste, Aaron Courville
    • presented at the Deep Learning and Reinforcement Learning Summer School ('17)
    • presented at the Montreal AI Symposium ('17)
    • presented at the NIPS ('17) workshop on Bayesian Deep Learning (BDL)
  • PAC Bayes Bound Minimization via Normalizing Flows [Generalization, arXiv]
    • Chin-Wei Huang, Ahmed Touati, Pascal Vincent, Gintare Karolina Dziugaite, Alexandre Lacoste and Aaron Courville
    • presented at the ICML (‘19) workshop on Understanding and Improving Generalization in Deep Learning (Generalization)
  • Facilitating Multimodality in Normalizing Flows [BDL]
    • Chin-Wei Huang*, David Krueger*, Aaron Courville
    • presented at the NIPS ('17) workshop on Bayesian Deep Learning (BDL)
  • Sequentialized Sampling Importance Resampling and Scalable IWAE [BDL] [bib]
    • Chin-Wei Huang, Aaron Courville
    • presented at the NIPS ('17) workshop on Bayesian Deep Learning (BDL)
  • Learnable Explicit Density for Continuous Latent Space and Variational Inference [arXiv] [padl] [poster] [bib]
    • Chin-Wei Huang, Ahmed Touati, Laurent Dinh, Michal Drozdzal, Mohammad Havaei, Laurent Charlin, Aaron Courville
    • presented at the ICML ('17) workshop on Principled Approaches to Deep Learning (padl)
  • Deconstructive Defense Against Adversarial Attacks [poster]
    • Chin-Wei Huang*, Nan Rosemary Ke*, Chris Pal
    • presented at the Montreal AI Symposium ('17)
  • Data Imputation with Latent Variable Models
    • Michal Drozdzal, Mohammad Havaei, Chin-Wei Huang, Laurent Charlin, Nicolas Chapados, Aaron Courville
    • presented at the Montreal AI Symposium ('17)
Technical reports
  • Multilabel Topic Model and User-Item Representation for Personalized Display of Review [report]
    • Chin-Wei Huang, Pierre-André Brousseau
    • A final project report for IFT6266 (Probabilistic Graphical Models, 2016A)


  • Autoregressive Flows for Image Generation and Density Estimation [slides]
    • Presented at 2018 AI Summer School: Vision and Learning @ NTHU
    • Presented at Speech Processing and Machine Learning Lab @ NTU