ADS Machine Learning: Advances in Deep Generative Models & Unsupervised Learning
Amsterdam Data Science Coffee & Data: Machine Learning with a focus on Advances in Deep Generative Models and Unsupervised Learning
Date: Wednesday 25 October 2017
Location: Aula (UvA), Singel 411, Amsterdam
The ADS Meet-up offers researchers and business the opportunity to share their knowledge and give insight into a central theme; on the morning of Wednesday 25 October this will be Machine Learning, with a focus on Advances in Deep Generative Models and Unsupervised Learning.
We are delighted to announce that an international line-up of top Machine Learning experts from London, Montreal & New York will present at this ADS Machine Learning Meet-up. There will also be the chance to network during the coffee break.
Mini-Symposium on Deep Generative Models and Unsupervised Machine Learning
– Yoshua Bengio, Full Professor of the Department of Computer Science and Operations Research, Université de Montréal, Canada.
Title: From Deep Learning of Disentangled Representations to Higher-Level Cognition
Abstract: One of the main challenges for AI remains unsupervised learning, at which humans are much better than machines. We review recent work in deep generative models and propose research directions towards learning of high-level abstractions. This follows the ambitious objective of disentangling the underlying causal factors explaining the observed data. We argue that in order to efficiently capture these, a learning agent can acquire information by acting in the world, moving our research from traditional deep generative models of given datasets to that of autonomous learning or unsupervised reinforcement learning. We propose two priors which could be used by an agent acting in its environment in order to help discover such high-level disentangled representations of abstract concepts. The first one is based on the discovery of independently controllable factors, i.e., on jointly learning policies and representations such that each of these policies can independently control one aspect of the world (a factor of interest) computed by the representation, while keeping the other uncontrolled aspects mostly untouched. The second prior is called the consciousness prior and is based on the observation that our conscious thoughts are low-dimensional objects with a strong predictive or explanatory power (or are very useful for planning). A conscious thought thus selects a few abstract factors (using the attention mechanism which brings these variables to consciousness) and combines them to make a useful statement or prediction. In addition, the concepts brought to consciousness often correspond to words or short phrases, and the thought itself can be transformed into a brief linguistic expression, like a sentence. Natural language could thus be used as an additional hint about the abstract representations and disentangled factors which humans have discovered to explain their world.
A conscious thought also corresponds to the kind of small nugget of knowledge (like a fact or a rule) which has been the main building block of classical symbolic AI. This therefore raises the interesting possibility of addressing some of the objectives of classical symbolic AI using the deep learning machinery augmented by the architectural elements necessary to implement conscious thinking.
– Aapo Hyvärinen, Professor, Gatsby Computational Neuroscience Unit, University College London, UK.
Title: Towards nonlinear independent component analysis
Abstract: Unsupervised learning, in particular learning general nonlinear representations, is one of the deepest problems in machine learning. Estimating latent quantities in a generative model provides a principled framework, and has been successfully used in the linear case, e.g. with independent component analysis (ICA) and sparse coding. However, extending ICA to the nonlinear case has proven to be extremely difficult: a straightforward extension is unidentifiable, i.e. it is not possible to recover those latent components that actually generated the data. Here, we show that this problem can be solved by using temporal structure. We formulate two generative models in which the data is an arbitrary but invertible nonlinear transformation of time series (components) which are statistically independent of each other. Drawing from the theory of linear ICA, we formulate two distinct classes of temporal structure of the components which enable identification, i.e. recovery of the original independent components. We show that in both cases, the actual learning can be performed by ordinary neural network training where only the input is defined in an unconventional manner, making software implementations trivial. We can rigorously prove that after such training, the units in the last hidden layer will give the original independent components. [With Hiroshi Morioka, published at NIPS 2016 and AISTATS 2017.]
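The segment-based training recipe in the abstract can be sketched numerically. The toy below is an illustrative simplification, not the published method: it uses a linear mixture in place of a general invertible nonlinearity and hand-crafted squared features in place of a learned deep network, but it shows the core time-contrastive idea of training an ordinary classifier to predict which time segment each sample came from, exploiting nonstationarity of the sources.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all sizes are assumptions): 2 independent sources whose
# variances change across time segments -- the nonstationarity that
# makes the components identifiable.
n_seg, seg_len, n_src = 8, 200, 2
scales = rng.uniform(0.5, 3.0, size=(n_seg, n_src))      # per-segment std devs
s = np.concatenate([rng.normal(0.0, scales[k], size=(seg_len, n_src))
                    for k in range(n_seg)])              # sources, shape (T, 2)
labels = np.repeat(np.arange(n_seg), seg_len)            # segment index per sample

# Mix the sources (a linear mixture stands in for the invertible nonlinearity).
A = np.array([[1.0, 0.7], [0.4, 1.0]])
x = s @ A.T

# Time-contrastive learning idea: train a multinomial logistic regression
# to predict the segment label.  For variance-nonstationary sources the
# squares of the signals carry the discriminative information, so squared
# observations are included as features for this linear toy.
feats = np.concatenate([x, x**2], axis=1)
feats = (feats - feats.mean(0)) / feats.std(0)

W = np.zeros((feats.shape[1], n_seg))
b = np.zeros(n_seg)
y_onehot = np.eye(n_seg)[labels]
for _ in range(300):                                     # plain gradient descent
    logits = feats @ W + b
    p = np.exp(logits - logits.max(1, keepdims=True))
    p /= p.sum(1, keepdims=True)
    W -= 0.5 * feats.T @ (p - y_onehot) / len(feats)
    b -= 0.5 * (p - y_onehot).mean(0)

acc = (p.argmax(1) == labels).mean()
print(f"segment-classification accuracy: {acc:.2f}")     # should beat 1/8 chance
```

In the full method a deep network replaces the fixed squared features, and the theory guarantees that its last hidden layer recovers the independent components up to simple indeterminacies.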
– David Blei, Professor of Statistics and Computer Science, Columbia University, New York, USA.
Title: Black Box Variational Inference and Deep Exponential Families
Abstract: Bayesian statistics and expressive probabilistic modeling have become key tools for the modern statistician. They let us express complex assumptions about the hidden elements that underlie our data, and they have been successfully applied in numerous fields. The central computational problem in Bayesian statistics is posterior inference, the problem of approximating the conditional distribution of the hidden variables given the observations. Approximate posterior inference algorithms have revolutionized the field, revealing its potential as a usable and general-purpose language for data analysis.
In this talk, I will discuss two related innovations in modeling and inference: deep exponential families and black box variational inference. Deep exponential families (DEFs) adapt the main ideas behind deep learning to expressive probabilistic models. DEFs provide principled probabilistic models that can uncover layers of representations of high-dimensional data. I will show how to use DEFs to analyze text, recommendation data, and electronic health records.
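As a rough illustration of the layered structure described above, the sketch below ancestrally samples from a small two-layer gamma DEF with a Poisson observation layer, a natural choice for count data such as text. All dimensions and shape parameters are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sizes: 5 documents, 2 latent layers, 20-word vocabulary.
n_docs, k2, k1, vocab = 5, 3, 5, 20

# Gamma-distributed weights link each layer to the one below.
W2 = rng.gamma(1.0, 1.0, size=(k2, k1))      # layer 2 -> layer 1
W1 = rng.gamma(1.0, 1.0, size=(k1, vocab))   # layer 1 -> observations

# Ancestral sampling, top down: each layer's latents are exponential-family
# variables whose mean is a linear function of the layer above.
z2 = rng.gamma(0.3, 1.0, size=(n_docs, k2))  # top-layer latents (sparse)
z1 = rng.gamma(0.3, (z2 @ W2) / 0.3)         # gamma with mean z2 @ W2
x = rng.poisson(z1 @ W1)                     # word counts per document

print(x.shape)                               # (5, 20): one count row per document
```

A small gamma shape parameter (0.3 here) keeps the latents sparse, which is what lets the layers act as increasingly abstract representations of the data.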
I will then discuss the key algorithm that enables DEFs: Black box variational inference (BBVI). BBVI is a generic and scalable algorithm for approximating the posterior. BBVI easily applies to many models, with little model-specific derivation and few restrictions on their properties.
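The "black box" character of the algorithm, needing only evaluations of the joint log-density and the score of the variational family, can be shown with a minimal sketch. The toy model, variational family, learning rate, and sample count below are illustrative assumptions, and a simple mean baseline stands in for the more sophisticated variance-reduction techniques used in practice.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model treated as a black box: z ~ N(0, 1), x | z ~ N(z, 1),
# one observation x = 2.0.  The exact posterior is N(1.0, 0.5).
x_obs = 2.0
def log_joint(z):
    return -0.5 * z**2 - 0.5 * (x_obs - z)**2        # up to a constant

# Variational family q(z) = N(mu, exp(log_sigma)^2).  BBVI only needs
# log_joint evaluations and the score of log q -- no model-specific math.
mu, log_sigma = 0.0, 0.0
lr, n_samples = 0.05, 200
for _ in range(2000):
    sigma = np.exp(log_sigma)
    z = rng.normal(mu, sigma, size=n_samples)
    log_q = -0.5 * ((z - mu) / sigma)**2 - np.log(sigma)
    f = log_joint(z) - log_q                         # instantaneous ELBO terms
    fc = f - f.mean()                                # baseline to cut variance
    score_mu = (z - mu) / sigma**2                   # d log q / d mu
    score_ls = ((z - mu) / sigma)**2 - 1.0           # d log q / d log_sigma
    mu += lr * np.mean(score_mu * fc)                # score-function gradients
    log_sigma += lr * np.mean(score_ls * fc)

# mu and exp(log_sigma) should approach 1.0 and sqrt(0.5) ~ 0.71.
print(mu, np.exp(log_sigma))
```

The key point of the sketch is that swapping in a different model changes only `log_joint`; the gradient estimator and optimization loop are untouched, which is what makes the method generic.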
The event will be in English and is open to all.
Amsterdam Data Science (ADS) accelerates data science research by connecting, sharing and showcasing world-class technology, expertise and talent from Amsterdam on a regional, national and international level. Our research enables business and society to better gather, store, analyse and present data in order to gain valuable insights and make informed decisions.
Find out more about ADS at http://amsterdamdatascience.nl