I am a graduate student at the Montreal Institute for Learning Algorithms (MILA), advised by Aaron Courville. My research is mostly about Deep Latent Variable models and efficient approximate inference. My recent focus is on improving the expressivity of deep probabilistic models and the optimization process of inference, and on understanding the training dynamics of generative models in general. Check out a list of reading materials on Bayesian Deep Learning I curated some time ago here if you want to collaborate or have a chat. I am also interested in meta-learning, statistical learning theory, and reinforcement learning.

Some notes I’ve written.



AR-DAE: Towards Unbiased Neural Entropy Gradient Estimation
Entropy is ubiquitous in machine learning, but it is in general intractable to compute the entropy of the distribution of an arbitrary continuous random variable. In this paper, we propose the amortized residual denoising autoencoder (AR-DAE) to approximate the gradient of the log density function, which can be used to estimate the gradient of entropy. Amortization allows us to significantly reduce the error of the gradient approximator by approaching asymptotic optimality of a regular DAE, in which case the estimation is in theory unbiased.
Jae Hyun Lim, Aaron Courville, Chris Pal, Chin-Wei Huang
International Conference on Machine Learning, 2020
Augmented Normalizing Flows: Bridging the Gap Between Generative Flows and Latent Variable Models
In this work, we propose a new family of generative flows on an augmented data space, with an aim to improve expressivity without drastically increasing the computational cost of sampling and evaluation of a lower bound on the likelihood. Theoretically, we prove the proposed flow can approximate a Hamiltonian ODE as a universal transport map. Empirically, we demonstrate state-of-the-art performance on standard benchmarks of flow-based generative modeling.
Chin-Wei Huang, Laurent Dinh, Aaron Courville
Under review
See the workshop paper for additional theoretical development.
Stochastic Neural Network with Kronecker Flow
Recent advances in variational inference enable the modelling of highly structured joint distributions, but are limited in their capacity to scale to the high-dimensional setting of stochastic neural networks. This limitation motivates a need for scalable parameterizations of the noise generation process, in a manner that adequately captures the dependencies among the various parameters. In this work, we address this need and present the Kronecker Flow, a generalization of the Kronecker product to invertible mappings designed for stochastic neural networks. We apply our method to variational Bayesian neural networks on predictive tasks, PAC-Bayes generalization bound estimation, and approximate Thompson sampling in contextual bandits. In all setups, our methods prove to be competitive with existing methods and better than the baselines.
Chin-Wei Huang, Ahmed Touati, Pascal Vincent, Gintare Karolina Dziugaite, Alexandre Lacoste, Aaron Courville
International Conference on Artificial Intelligence and Statistics, 2020
vGraph: A Generative Model for Joint Community Detection and Node Representation Learning
This paper focuses on two fundamental tasks of graph analysis: community detection and node representation learning, which capture the global and local structures of graphs, respectively. In the current literature, these two tasks are usually independently studied while they are actually highly correlated. We propose a probabilistic generative model called vGraph to learn community membership and node representation collaboratively. Specifically, we assume that each node can be represented as a mixture of communities, and each community is defined as a multinomial distribution over nodes. Both the mixing coefficients and the community distribution are parameterized by the low-dimensional representations of the nodes and communities. We designed an effective variational inference algorithm which regularizes the community membership of neighboring nodes to be similar in the latent space. ...
Fan-Yun Sun, Meng Qu, Jordan Hoffmann, Chin-Wei Huang, Jian Tang
Neural Information Processing Systems, 2019
Probability Distillation: A Caveat and Alternatives
Due to Van den Oord et al. (2018), probability distillation has recently been of interest to deep learning practitioners, where, as a practical workaround for deploying autoregressive models in real-time applications, a student network is used to obtain quality samples in parallel. We identify a pathological optimization issue with the adopted stochastic minimization of the reverse-KL divergence: the curse of dimensionality results in a skewed gradient distribution that renders training inefficient. This means that KL-based “evaluative” training can be susceptible to poor exploration if the target distribution is highly structured. We then explore alternative principles for distillation, including one with an “instructive” signal, and show that it is possible to achieve qualitatively better results than with KL minimization.
Chin-Wei Huang*, Faruk Ahmed*, Kundan Kumar, Alexandre Lacoste, Aaron Courville
Association for Uncertainty in Artificial Intelligence, 2019
Hierarchical Importance Weighted Autoencoders
Importance weighted variational inference (Burda et al., 2015) uses multiple i.i.d. samples to have a tighter variational lower bound. We believe a joint proposal has the potential of reducing the number of redundant samples, and introduce a hierarchical structure to induce correlation. The hope is that the proposals would coordinate to make up for the error made by one another to reduce the variance of the importance estimator. Theoretically, we analyze the condition under which convergence of the estimator variance can be connected to convergence of the lower bound. Empirically, we confirm that maximization of the lower bound does implicitly minimize variance. Further analysis shows that this is a result of negative correlation induced by the proposed hierarchical meta sampling scheme, and performance of inference also improves when the number of samples increases.
Chin-Wei Huang, Kris Sankaran, Eeshan Dhekane, Alexandre Lacoste, Aaron Courville
International Conference on Machine Learning, 2019
Improving Explorability in Variational Inference with Annealed Variational Objectives
Despite the advances in the representational capacity of approximate distributions for variational inference, the optimization process can still limit the density that is ultimately learned. We demonstrate the drawbacks of biasing the true posterior to be unimodal, and introduce Annealed Variational Objectives (AVO) into the training of hierarchical variational methods. Inspired by Annealed Importance Sampling, the proposed method facilitates learning by incorporating energy tempering into the optimization objective. In our experiments, we demonstrate our method's robustness to deterministic warm up, and the benefits of encouraging exploration in the latent space.
Chin-Wei Huang, Shawn Tan, Alexandre Lacoste, Aaron Courville
Neural Information Processing Systems, 2018
Neural Autoregressive Flows 
Normalizing flows and autoregressive models have been successfully combined to produce state-of-the-art results in density estimation, via Masked Autoregressive Flows (MAF), and to accelerate state-of-the-art WaveNet-based speech synthesis to 20x faster than real-time, via Inverse Autoregressive Flows (IAF). We unify and generalize these approaches, replacing the (conditionally) affine univariate transformations of MAF/IAF with a more general class of invertible univariate transformations expressed as monotonic neural networks. We demonstrate that the proposed neural autoregressive flows (NAF) are universal approximators for continuous probability distributions, and their greater expressivity allows them to better capture multimodal target distributions. Experimentally, NAF yields state-of-the-art performance on a suite of density estimation tasks and outperforms IAF in variational autoencoders trained on binarized MNIST.
Chin-Wei Huang*, David Krueger*, Alexandre Lacoste, Aaron Courville
International Conference on Machine Learning, 2018 (LONG TALK!)
Neural Language Modeling by Jointly Learning Syntax and Lexicon
We propose a neural language model capable of unsupervised syntactic structure induction. The model leverages the structure information to form better semantic representations and better language modeling. Standard recurrent neural networks are limited by their structure and fail to efficiently use syntactic information. On the other hand, tree-structured recursive networks usually require additional structural supervision at the cost of human expert annotation. In this paper, we propose a novel neural language model, called the Parsing-Reading-Predict Networks (PRPN), that can simultaneously induce the syntactic structure from unannotated sentences and leverage the inferred structure to learn a better language model. In our model, the gradient can be directly back-propagated from the language model loss into the neural parsing network. Experiments show that the proposed model can discover the underlying syntactic structure and achieve state-of-the-art performance on word/character-level language model tasks.
Yikang Shen, Zhouhan Lin, Chin-Wei Huang, Aaron Courville
International Conference on Learning Representations, 2018
  • Generating Contradictory, Neutral, and Entailing Sentences [arXiv]
    • Yikang Shen, Shawn Tan, Chin-Wei Huang, Aaron Courville
  • Bayesian Hypernetworks [arXiv] [openreview] [DLRL video] [bib]
    • David Krueger*, Chin-Wei Huang*, Riashat Islam, Ryan Turner, Alexandre Lacoste, Aaron Courville
    • presented at the Deep Learning and Reinforcement Learning Summer School (‘17)
    • presented at the Montreal AI Symposium (‘17)
    • presented at the NIPS (‘17) workshop on Bayesian Deep Learning (BDL)
  • Solving ODE with Universal Flows: Approximation Theory for Flow-Based Models [deepdiffeq, arXiv]
    • Chin-Wei Huang, Laurent Dinh, Aaron Courville
    • presented as a contributed talk at the ICLR (‘20) workshop on Integration of Deep Neural Models and Differential Equations (deepdiffeq)
  • PAC Bayes Bound Minimization via Normalizing Flows [Generalization, arXiv]
    • Chin-Wei Huang, Ahmed Touati, Pascal Vincent, Gintare Karolina Dziugaite, Alexandre Lacoste, Aaron Courville
    • presented at the ICML (‘19) workshop on Understanding and Improving Generalization in Deep Learning (Generalization)
  • Facilitating Multimodality in Normalizing Flows [BDL]
    • Chin-Wei Huang*, David Krueger*, Aaron Courville
    • presented at the NIPS (‘17) workshop on Bayesian Deep Learning (BDL)
  • Sequentialized Sampling Importance Resampling and Scalable IWAE [BDL] [bib]
    • Chin-Wei Huang, Aaron Courville
    • presented at the NIPS (‘17) workshop on Bayesian Deep Learning (BDL)
  • Learnable Explicit Density for Continuous Latent Space and Variational Inference [arXiv] [padl] [poster] [bib]
    • Chin-Wei Huang, Ahmed Touati, Laurent Dinh, Michal Drozdzal, Mohammad Havaei, Laurent Charlin, Aaron Courville
    • presented at the ICML (‘17) workshop on Principled Approaches to Deep Learning (padl)
  • Deconstructive Defense Against Adversarial Attacks [poster]
    • Chin-Wei Huang*, Nan Rosemary Ke*, Chris Pal
    • presented at the Montreal AI Symposium (‘17)
  • Data Imputation with Latent Variable Models
    • Michal Drozdzal, Mohammad Havaei, Chin-Wei Huang, Laurent Charlin, Nicolas Chapados, Aaron Courville
    • presented at the Montreal AI Symposium (‘17)
Technical reports
  • Multilabel Topic Model and User-Item Representation for Personalized Display of Review [report]
    • Chin-Wei Huang, Pierre-André Brousseau
    • A final project report for IFT6266 (Probabilistic Graphical Models, 2016A)


  • Solving ODE with Universal Flows: Approximation Theory for Flow-Based Models [recording starting from 34:30]
    • Presented @ the Integration of Deep Neural Models and Differential Equations workshop (ICLR 2020)
  • Roadmap to expressive flows [slides]
    • Presented @ Google Cambridge Perception Team, October 2019
  • Go with the flow: recent advances in invertible models
    • Presented @ Element AI, May 2019
  • Autoregressive Flows for Image Generation and Density Estimation [slides]
    • Presented at 2018 AI Summer School: Vision and Learning @ NTHU
    • Presented at Speech Processing and Machine Learning Lab @ NTU  

Teaching Experience

Teaching assistantship
  • 2020 IFT6135 (UdeM) Representation Learning (head TA) [course site]
  • 2019 IFT6135 (UdeM) Representation Learning (head TA) [course site]
  • 2018 IFT6135 (UdeM) Representation Learning [course site]
  • Normalizing Flows [slides] [lecture page]
    • 20/03/2019 IFT6135 (UdeM) Representation Learning
  • Variational Autoencoders [slides] [lecture page]
    • 18/03/2019 IFT6135 (UdeM) Representation Learning
  • PyTorch Tutorial [notebooks] [lecture page]
    • 21/01/2019 IFT6135 (UdeM) Representation Learning
  • Variational Autoencoders [slides] [lecture page]
    • 21/03/2018 IFT6135 (UdeM) Representation Learning