Posted in 2021

All you need is relative information

  • 09 September 2021

Relative information (relative entropy, KL divergence) and variational inference are powerful tools for deriving learning algorithms and their asymptotic properties, for both static systems and dynamic systems. The goal of this talk is to motivate a general online stochastic learning algorithm for stochastic processes with latent variables or memory, that provably converges under some regularity conditions. Please visit https://bit.ly/3kmovql for details.

In the first half of the talk, we study static systems, viewing maximum likelihood and Bayesian inference through the lens of relative information. In particular, their generalization errors may be derived by resolving the singularities of relative information. We then frame the two learning algorithms as special cases of variational inference with different computational constraints.

Read more ...


Spiking neural networks

  • 05 June 2021

In this post, we study a class of spiking network models based on continuous-time Markov chains with mutable variables.

Using a relative inference recipe for online learning, we derive local Hebbian learning rules for the spiking network which are provably convergent to local minima of the relative information objective.

Read more ...


Convergence of biased stochastic approximation

  • 01 June 2021

Using techniques from biased stochastic approximation [KMMW19], we prove under some regularity conditions the convergence of the online learning algorithm proposed previously for mutable Markov processes.

Recall that the algorithm is described by the following updates.

Read more ...


Path integrals and the Dyson formula

  • 10 May 2021

One of the deepest results in quantum field theory, to me, is the Dyson formula [nLa]. It describes the solution to the differential equation

in terms of the exponential of the path integral of the operator \(A(t)\),

Read more ...


Proofs as programs - challenges and strategies for program synthesis

  • 22 April 2021

The Curry-Howard correspondence between proofs and programs suggests that we can exploit proof assistants for writing software. I will discuss the challenges behind a naïve execution of this idea, and some preliminary strategies for overcoming them. As an example, we will organize higher-order information in knowledge graphs using dependent type theory, and automate the answering of queries using a proof assistant. In another example, we will explore how decentralized proof assistants can enable mathematicians or programmers to work collaboratively on a theorem or application. If time permits, I will outline connections to canonical structures, reflection (ssreflect), transport, unification and universe management.

Topos Institute Colloquium

Read more ...


Biased stochastic approximation with mutable processes

  • 23 March 2021

The goal of this post is to derive a general online learning recipe for training a mutable process \(\{Z_t,X_t\}\) to learn the true distribution \(Q_*(X)\) of a partially-observed Markov process \(\{X_t\}\). The recipe returns a generative distribution \(P(Z,X)\) whose marginal \(P(X)\) approximates \(Q_*(X).\)

The variables \(Z\) of the mutable process are auxiliary variables that assist in inference and computation. During training, the distribution of \(Z\) given \(X\) is controlled by a discriminative model \(\{Q(Z\vert X)\}.\) Our method works in both discrete time and continuous time. We assume in the mutable process that for each time \(t,\) the variables \(Z_t\) and \(X_t\) are conditionally independent of each other given their past.

Read more ...


Relative inference with mutable processes

  • 22 March 2021

We introduce a information-theoretic objective, which is a form of relative information between a discriminative model and a generative model, for learning processes using models with mutable variables. This technique is known as relative inference (also called approximate inference, variational inference or variational Bayes). Such a technique is useful, for instance, for learning processes that contain latent variables.

We discuss natural constraints on the discriminative and generative models, and the consequences of these constraints on:

Read more ...


Process learning with relative information

  • 21 March 2021

Over the next few posts, we will derive a distributed learning algorithm for spiking neural networks with mutable variables that minimizes some natural notion of relative information and provably converges over time. We will model these spiking neural networks with stochastic processes: both discrete-time and continuous-time processes, with or without mutable variables.

In this post, we give a general overview of information-theoretic approaches to training stochastic processes, while postponing discussions about issues that arise from mutable variables.

Read more ...