I am a graduate student at the Montreal Institute for Learning Algorithms (MILA), advised by Aaron Courville. I work on deep latent variable models and efficient algorithms for approximate inference. I primarily focus on improving the expressiveness of deep probabilistic models and the optimization process of inference, and on understanding the training dynamics of generative models in general. Check out a list of reading materials on Bayesian Deep Learning I curated some time ago here. I am also interested in meta-learning, statistical learning theory, and reinforcement learning.

Some notes I’ve written.



Convex Potential Flows: Universal Probability Distributions with Optimal Transport and Convex Optimization
Flow-based models are powerful tools for designing probabilistic models with tractable density. This paper introduces Convex Potential Flows (CP-Flow), a natural and efficient parameterization of invertible models inspired by the optimal transport (OT) theory. CP-Flows are the gradient map of a strongly convex neural potential function. The convexity implies invertibility and allows us to resort to convex optimization to solve the convex conjugate for efficient inversion. To enable maximum likelihood training, we derive a new gradient estimator of the log-determinant of the Jacobian, which involves solving an inverse-Hessian vector product using the conjugate gradient method. The gradient estimator has constant-memory cost, and can be made effectively unbiased by reducing the error tolerance level of the convex optimization routine. Theoretically, we prove that CP-Flows are universal density approximators and are optimal in the OT sense. Our empirical results show that CP-Flow performs competitively on standard benchmarks of density estimation and variational inference.
Chin-Wei Huang, Ricky TQ Chen, Christos Tsirigotis, Aaron Courville
International Conference on Learning Representations, 2021
RealCause: Realistic Causal Inference Benchmarking
There are many different causal effect estimators in causal inference. However, it is unclear how to choose between these estimators because there is no ground-truth for causal effects. A commonly used option is to simulate synthetic data, where the ground-truth is known. However, the best causal estimators on synthetic data are unlikely to be the best causal estimators on realistic data. An ideal benchmark for causal estimators would both (a) yield ground-truth values of the causal effects and (b) be representative of real data. Using flexible generative models, we provide a benchmark that both yields ground-truth and is realistic. Using this benchmark, we evaluate 66 different causal estimators.
Brady Neal, Chin-Wei Huang, Sunand Raghupathi
Preprint, 2020
A benchmark of medical out of distribution detection
There is a rise in the use of deep learning for automated medical diagnosis, most notably in medical imaging. Such an automated system uses a set of images from a patient to diagnose whether they have a disease. However, systems trained for one particular domain of images cannot be expected to perform accurately on images of a different domain. These images should be filtered out by an Out-of-Distribution Detection (OoDD) method prior to diagnosis. This paper benchmarks popular OoDD methods in three domains of medical imaging: chest x-rays, fundus images, and histology slides. Our experiments show that despite methods yielding good results on some types of out-of-distribution samples, they fail to recognize images close to the training distribution.
Tianshi Cao, Chin-Wei Huang, David Yu-Tung Hui, Joseph Paul Cohen
Preprint, 2020
AR-DAE: Towards Unbiased Neural Entropy Gradient Estimation
Entropy is ubiquitous in machine learning, but it is in general intractable to compute the entropy of the distribution of an arbitrary continuous random variable. In this paper, we propose the amortized residual denoising autoencoder (AR-DAE) to approximate the gradient of the log density function, which can be used to estimate the gradient of entropy. Amortization allows us to significantly reduce the error of the gradient approximator by approaching asymptotic optimality of a regular DAE, in which case the estimation is in theory unbiased.
Jae Hyun Lim, Aaron Courville, Chris Pal, Chin-Wei Huang
International Conference on Machine Learning, 2020
Augmented Normalizing Flows: Bridging the Gap Between Generative Flows and Latent Variable Models
In this work, we propose a new family of generative flows on an augmented data space, with an aim to improve expressivity without drastically increasing the computational cost of sampling and evaluation of a lower bound on the likelihood. Theoretically, we prove the proposed flow can approximate a Hamiltonian ODE as a universal transport map. Empirically, we demonstrate state-of-the-art performance on standard benchmarks of flow-based generative modeling.
Chin-Wei Huang, Laurent Dinh, Aaron Courville
See the workshop paper for additional theoretical development.
Stochastic Neural Network with Kronecker Flow
Recent advances in variational inference enable the modelling of highly structured joint distributions, but are limited in their capacity to scale to the high-dimensional setting of stochastic neural networks. This limitation motivates a need for scalable parameterizations of the noise generation process, in a manner that adequately captures the dependencies among the various parameters. In this work, we address this need and present the Kronecker Flow, a generalization of the Kronecker product to invertible mappings designed for stochastic neural networks. We apply our method to variational Bayesian neural networks on predictive tasks, PAC-Bayes generalization bound estimation, and approximate Thompson sampling in contextual bandits. In all setups, our methods prove to be competitive with existing methods and better than the baselines.
Chin-Wei Huang, Ahmed Touati, Pascal Vincent, Gintare Karolina Dziugaite, Alexandre Lacoste, Aaron Courville
International Conference on Artificial Intelligence and Statistics, 2020
vGraph: A Generative Model for Joint Community Detection and Node Representation Learning
This paper focuses on two fundamental tasks of graph analysis: community detection and node representation learning, which capture the global and local structures of graphs, respectively. In the current literature, these two tasks are usually independently studied while they are actually highly correlated. We propose a probabilistic generative model called vGraph to learn community membership and node representation collaboratively. Specifically, we assume that each node can be represented as a mixture of communities, and each community is defined as a multinomial distribution over nodes. Both the mixing coefficients and the community distribution are parameterized by the low-dimensional representations of the nodes and communities. We designed an effective variational inference algorithm which regularizes the community membership of neighboring nodes to be similar in the latent space. ...
Fan-Yun Sun, Meng Qu, Jordan Hoffmann, Chin-Wei Huang, Jian Tang
Neural Information Processing Systems, 2019
Probability Distillation: A Caveat and Alternatives
Due to Van den Oord et al. (2018), probability distillation has recently been of interest to deep learning practitioners, where, as a practical workaround for deploying autoregressive models in real-time applications, a student network is used to obtain quality samples in parallel. We identify a pathological optimization issue with the adopted stochastic minimization of the reverse-KL divergence: the curse of dimensionality results in a skewed gradient distribution that renders training inefficient. This means that KL-based “evaluative” training can be susceptible to poor exploration if the target distribution is highly structured. We then explore alternative principles for distillation, including one with an “instructive” signal, and show that it is possible to achieve qualitatively better results than with KL minimization.
Chin-Wei Huang*, Faruk Ahmed*, Kundan Kumar, Alexandre Lacoste, Aaron Courville
Association for Uncertainty in Artificial Intelligence, 2019
Hierarchical Importance Weighted Autoencoders
Importance weighted variational inference (Burda et al., 2015) uses multiple i.i.d. samples to have a tighter variational lower bound. We believe a joint proposal has the potential of reducing the number of redundant samples, and introduce a hierarchical structure to induce correlation. The hope is that the proposals would coordinate to make up for the error made by one another to reduce the variance of the importance estimator. Theoretically, we analyze the condition under which convergence of the estimator variance can be connected to convergence of the lower bound. Empirically, we confirm that maximization of the lower bound does implicitly minimize variance. Further analysis shows that this is a result of negative correlation induced by the proposed hierarchical meta sampling scheme, and performance of inference also improves when the number of samples increases.
Chin-Wei Huang, Kris Sankaran, Eeshan Dhekane, Alexandre Lacoste, Aaron Courville
International Conference on Machine Learning, 2019
Improving Explorability in Variational Inference with Annealed Variational Objectives
Despite the advances in the representational capacity of approximate distributions for variational inference, the optimization process can still limit the density that is ultimately learned. We demonstrate the drawbacks of biasing the true posterior to be unimodal, and introduce Annealed Variational Objectives (AVO) into the training of hierarchical variational methods. Inspired by Annealed Importance Sampling, the proposed method facilitates learning by incorporating energy tempering into the optimization objective. In our experiments, we demonstrate our method's robustness to deterministic warm up, and the benefits of encouraging exploration in the latent space.
Chin-Wei Huang, Shawn Tan, Alexandre Lacoste, Aaron Courville
Neural Information Processing Systems, 2018
Neural Autoregressive Flows 
Normalizing flows and autoregressive models have been successfully combined to produce state-of-the-art results in density estimation, via Masked Autoregressive Flows (MAF), and to accelerate state-of-the-art WaveNet-based speech synthesis to 20x faster than real-time, via Inverse Autoregressive Flows (IAF). We unify and generalize these approaches, replacing the (conditionally) affine univariate transformations of MAF/IAF with a more general class of invertible univariate transformations expressed as monotonic neural networks. We demonstrate that the proposed neural autoregressive flows (NAF) are universal approximators for continuous probability distributions, and their greater expressivity allows them to better capture multimodal target distributions. Experimentally, NAF yields state-of-the-art performance on a suite of density estimation tasks and outperforms IAF in variational autoencoders trained on binarized MNIST.
Chin-Wei Huang*, David Krueger*, Alexandre Lacoste, Aaron Courville
International Conference on Machine Learning, 2018 (LONG TALK!)
Neural Language Modeling by Jointly Learning Syntax and Lexicon
We propose a neural language model capable of unsupervised syntactic structure induction. The model leverages the structure information to form better semantic representations and better language modeling. Standard recurrent neural networks are limited by their structure and fail to efficiently use syntactic information. On the other hand, tree-structured recursive networks usually require additional structural supervision at the cost of human expert annotation. In this paper, we propose a novel neural language model, called the Parsing-Reading-Predict Networks (PRPN), that can simultaneously induce the syntactic structure from unannotated sentences and leverage the inferred structure to learn a better language model. In our model, the gradient can be directly back-propagated from the language model loss into the neural parsing network. Experiments show that the proposed model can discover the underlying syntactic structure and achieve state-of-the-art performance on word/character-level language model tasks.
Yikang Shen, Zhouhan Lin, Chin-Wei Huang, Aaron Courville
International Conference on Learning Representations, 2018
Older Pre-prints
  • Generating Contradictory, Neutral, and Entailing Sentences [arXiv]
    • Yikang Shen, Shawn Tan, Chin-Wei Huang, Aaron Courville
  • Bayesian Hypernetworks [arXiv] [openreview] [DLRL video] [bib]
    • David Krueger*, Chin-Wei Huang*, Riashat Islam, Ryan Turner, Alexandre Lacoste, Aaron Courville
    • presented at the Deep Learning and Reinforcement Learning Summer School (‘17)
    • presented at the Montreal AI Symposium (‘17)
    • presented at the NIPS (‘17) workshop on Bayesian Deep Learning (BDL)
  • Solving ODE with Universal Flows: Approximation Theory for Flow-Based Models [deepdiffeq, arXiv]
    • Chin-Wei Huang, Laurent Dinh, Aaron Courville
    • presented as a contributed talk at the ICLR (‘20) workshop on Integration of Deep Neural Models and Differential Equations (deepdiffeq)
  • PAC Bayes Bound Minimization via Normalizing Flows [Generalization, arXiv]
    • Chin-Wei Huang, Ahmed Touati, Pascal Vincent, Gintare Karolina Dziugaite, Alexandre Lacoste, Aaron Courville
    • presented at the ICML (‘19) workshop on Understanding and Improving Generalization in Deep Learning (Generalization)
  • Facilitating Multimodality in Normalizing Flows [BDL]
    • Chin-Wei Huang*, David Krueger*, Aaron Courville
    • presented at the NIPS (‘17) workshop on Bayesian Deep Learning (BDL)
  • Sequentialized Sampling Importance Resampling and Scalable IWAE [BDL] [bib]
    • Chin-Wei Huang, Aaron Courville
    • presented at the NIPS (‘17) workshop on Bayesian Deep Learning (BDL)
  • Learnable Explicit Density for Continuous Latent Space and Variational Inference [arXiv] [padl] [poster] [bib]
    • Chin-Wei Huang, Ahmed Touati, Laurent Dinh, Michal Drozdzal, Mohammad Havaei, Laurent Charlin, Aaron Courville
    • presented at the ICML (‘17) workshop on Principled Approaches to Deep Learning (padl)
  • Deconstructive Defense Against Adversarial Attacks [poster]
    • Chin-Wei Huang*, Nan Rosemary Ke*, Chris Pal
    • presented at the Montreal AI Symposium (‘17)
  • Data Imputation with Latent Variable Models
    • Michal Drozdzal, Mohammad Havaei, Chin-Wei Huang, Laurent Charlin, Nicolas Chapados, Aaron Courville
    • presented at the Montreal AI Symposium (‘17)
Technical reports
  • Multilabel Topic Model and User-Item Representation for Personalized Display of Review [report]
    • Chin-Wei Huang, Pierre-André Brousseau
    • A final project report for IFT6266 (Probabilistic Graphical Models, 2016A)


Talks
  • Solving ODE with Universal Flows: Approximation Theory for Flow-Based Models [recording starting from 34:30]
    • Presented @ the Integration of Deep Neural Models and Differential Equations workshop (ICLR 2020)
  • Roadmap to expressive flows [slides]
    • Presented @ Google Cambridge Perception Team, October 2019
  • Go with the flow: recent advances in invertible models
    • Presented @ Element AI, May 2019
  • Autoregressive Flows for Image Generation and Density Estimation [slides]
    • Presented at 2018 AI Summer School: Vision and Learning @ NTHU
    • Presented at Speech Processing and Machine Learning Lab @ NTU  

Teaching Experience

Teaching assistantship
  • 2020 IFT6135 (UdeM) Representation Learning (head TA) [course site]
  • 2019 IFT6135 (UdeM) Representation Learning (head TA) [course site]
  • 2018 IFT6135 (UdeM) Representation Learning [course site]
  • Normalizing Flows [slides] [lecture page]
    • 20/03/2019 IFT6135 (UdeM) Representation Learning
  • Variational Autoencoders [slides] [lecture page]
    • 18/03/2019 IFT6135 (UdeM) Representation Learning
  • PyTorch Tutorial [notebooks] [lecture page]
    • 21/01/2019 IFT6135 (UdeM) Representation Learning
  • Variational Autoencoders [slides] [lecture page]
    • 21/03/2018 IFT6135 (UdeM) Representation Learning