Paper-1 : arXiv

Paper-2 : arXiv

Variational Bayesian Monte Carlo

Results

  1. A nonparametric analytical approximation of the posterior distribution of unobserved variables (parameters + latent variables), which can be used for statistical inference over them.
  2. An approximate lower bound on the model evidence (marginal likelihood) of the observed data, which can be used for model selection, e.g. via Bayes factors. The idea: the higher the marginal likelihood for a given model, the better that model fits the data, and hence the greater the probability that this model generated the data (tiny numeric illustration below).
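
For concreteness, a toy model-selection comparison with made-up log evidences (in practice, the ELBO returned by VBMC serves as the approximate log marginal likelihood):

```python
import numpy as np

# Hypothetical approximate log evidences for two candidate models
# (e.g. ELBOs returned by VBMC, used as proxies for log marginal likelihood).
log_Z_m1 = -142.3
log_Z_m2 = -145.1

# Bayes factor of model 1 over model 2; values > 1 favour model 1.
bayes_factor = np.exp(log_Z_m1 - log_Z_m2)
print(bayes_factor)  # ~16.4, i.e. these (made-up) data favour model 1
```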

We can also use sampling-based methods (MCMC, e.g. Gibbs sampling) to approximate the intractable solution of an inference problem.

Why not MCMC?

MCMC typically needs a large number of likelihood evaluations to converge, which is impractical when each evaluation is expensive; this costly black-box setting is exactly the one VBMC targets.

Variational Inference (Variational Bayesian Methods, Variational Bayes, VI)

Approximate an intractable distribution p by a distribution q belonging to a tractable family Q.

Kullback-Leibler Divergence

KL(q || p) = \sum_x q(x) \log \frac{q(x)}{p(x)}
Optimisation objective J(q): measures how close q is to p; VI minimises it over q ∈ Q.
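
As a quick sanity check on the definition, a minimal NumPy snippet (my own illustration) computing KL(q || p) for discrete distributions; note that KL is asymmetric:

```python
import numpy as np

def kl_divergence(q, p, eps=1e-12):
    """KL(q || p) = sum_x q(x) log(q(x) / p(x)) for discrete distributions."""
    q = np.clip(np.asarray(q, dtype=float), eps, None)  # clip to avoid log(0)
    p = np.clip(np.asarray(p, dtype=float), eps, None)
    return float(np.sum(q * np.log(q / p)))

q = [0.5, 0.4, 0.1]
p = [0.3, 0.3, 0.4]
print(kl_divergence(q, p), kl_divergence(p, q))  # differ: KL is asymmetric
```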

Bayesian Quadrature (BQ)

\langle f \rangle = \int f(x) \, \pi(x) \, dx
Here, f(x) has a GP prior and \pi(x) is a known probability distribution.
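
A minimal 1-D Bayesian quadrature sketch (my own illustration, not code from the papers): with an SE kernel and Gaussian \pi(x), the kernel integrals z_i = \int k(x, x_i) \pi(x) dx have closed form, and the posterior mean of \langle f \rangle is z^T (K + \sigma_n^2 I)^{-1} y:

```python
import numpy as np

def se_kernel(x1, x2, sf2=1.0, ell=1.0):
    """SE kernel k(x, x') = sf2 * exp(-(x - x')^2 / (2 ell^2)) on 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return sf2 * np.exp(-0.5 * (d / ell) ** 2)

def bq_posterior_mean(x, y, mu=0.0, tau=1.0, sf2=1.0, ell=1.0, sn2=1e-6):
    """Posterior mean of <f> = int f(x) N(x; mu, tau^2) dx under a GP prior.

    Closed-form kernel integrals for the SE kernel against a Gaussian:
      z_i = sf2 * sqrt(ell^2 / (ell^2 + tau^2))
                * exp(-(x_i - mu)^2 / (2 * (ell^2 + tau^2)))
    """
    K = se_kernel(x, x, sf2, ell) + sn2 * np.eye(len(x))
    s2 = ell ** 2 + tau ** 2
    z = sf2 * np.sqrt(ell ** 2 / s2) * np.exp(-0.5 * (x - mu) ** 2 / s2)
    return float(z @ np.linalg.solve(K, y))

# Toy check: f(x) = x^2 and pi = N(0, 1), so the true integral is E[x^2] = 1.
x = np.linspace(-3.0, 3.0, 15)
print(bq_posterior_mean(x, x ** 2, sf2=10.0))  # close to 1
```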

Gaussian Process

Active Sampling

VBMC Algo

In each iteration t,

  1. sequentially sample a batch of n_active new points that maximise the acquisition function a(\theta), and evaluate the log joint f at each point
  2. train a GP surrogate model of the log joint f; the training set is all points evaluated so far
  3. update the variational posterior approximation by optimising the surrogate ELBO, computed via Bayesian quadrature.

VI using a GP as the surrogate model f for the expensive log posterior; the GP is kept up to date via active sampling.
In each iteration except the first, VBMC samples n_active (= 5) points. Each point is selected sequentially by optimising the acquisition function, with fast rank-one updates of the GP posterior applied after each acquisition. There is no sampling in the first iteration, so that the variational posterior can adapt first. A toy sketch of this loop follows.
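
A runnable toy version of the loop above, loosely following the three numbered steps (my own illustration, not the reference implementation; see the caveats in the docstring):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def log_joint(theta):
    """Toy 1-D unnormalized log posterior (stands in for an expensive model)."""
    return -0.5 * (theta - 1.0) ** 2 / 0.5 ** 2

def vbmc_like_loop(n_iters=8, n_active=5, seed=0):
    """Toy VBMC-flavoured loop: GP surrogate + pointwise uncertainty sampling.

    Simplifications vs. the real algorithm: q is a single Gaussian fitted by
    moment matching on a grid (not a mixture optimised via a Bayesian-quadrature
    ELBO), and the GP is refit instead of rank-one updated.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(-4.0, 4.0, size=(3, 1))      # initial design
    y = np.array([log_joint(x[0]) for x in X])
    grid = np.linspace(-6.0, 6.0, 400)[:, None]
    gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(),
                                  normalize_y=True)
    for t in range(n_iters):
        gp.fit(X, y)                             # 2. train GP surrogate
        m = gp.predict(grid)                     # 3. crude "variational" step:
        w = np.exp(m - m.max())                  #    moment-match a Gaussian
        w /= w.sum()                             #    to exp(GP mean) on a grid
        mu_q = float(w @ grid[:, 0])
        sd_q = float(np.sqrt(w @ (grid[:, 0] - mu_q) ** 2))
        if t < n_iters - 1:                      # 1. active sampling
            for _ in range(n_active):
                _, s = gp.predict(grid, return_std=True)
                q_pdf = norm.pdf(grid[:, 0], mu_q, sd_q)
                # uncertainty sampling a_us = Var * q^2 (pointwise)
                theta_new = grid[np.argmax(s ** 2 * q_pdf ** 2)]
                X = np.vstack([X, theta_new[None, :]])
                y = np.append(y, log_joint(theta_new[0]))  # expensive call
                gp.fit(X, y)
    return mu_q, sd_q

print(vbmc_like_loop())  # should approach the true (mean, sd) = (1.0, 0.5)
```
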
The algo works in an unconstrained inference space R^D, but parameters with bound constraints can be handled via a nonlinear remapping of the input space (a shifted and rescaled logit transform), with a Jacobian correction of the log probability density. Solutions are mapped back to the original space via a matched inverse transform, e.g. a shifted and rescaled logistic function for bounded parameters (see the sketch below).
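
A sketch of this remapping (my own illustration following the description above; `log_p` is a user-supplied log density on the original bounded space):

```python
import numpy as np

def to_unconstrained(theta, a, b):
    """Shifted/rescaled logit: maps theta in (a, b) to x in R."""
    z = (theta - a) / (b - a)
    return np.log(z / (1.0 - z))

def to_constrained(x, a, b):
    """Inverse map: shifted/rescaled logistic, R -> (a, b)."""
    return a + (b - a) / (1.0 + np.exp(-x))

def log_density_unconstrained(x, log_p, a, b):
    """Log density in the unconstrained space, with Jacobian correction.

    If theta = g(x), then p_x(x) = p_theta(g(x)) * |dg/dx|, where
    dg/dx = (b - a) * sigmoid(x) * (1 - sigmoid(x)).
    """
    theta = to_constrained(x, a, b)
    s = 1.0 / (1.0 + np.exp(-x))
    log_jac = np.log(b - a) + np.log(s) + np.log1p(-s)
    return log_p(theta) + log_jac
```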

Variational Posterior q(\theta)
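
For reference, in the papers q is a flexible mixture of K Gaussians with a shared diagonal covariance (weights w_k, means \mu_k, per-component scales \sigma_k):

q(\theta) = \sum_{k=1}^{K} w_k N(\theta; \mu_k, \sigma_k^2 \Sigma), \quad \Sigma = \mathrm{diag}(\lambda_1^2, \ldots, \lambda_D^2)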

ELBO (Evidence Lower Bound, negative free energy)
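
Spelling out the standard identity (with f = log joint):

ELBO(q) = E_q[ \log p(D, \theta) ] - E_q[ \log q(\theta) ] = E_q[f] + H[q]

In VBMC, the expected log joint E_q[f] is computed via Bayesian quadrature against the GP surrogate, while the entropy H[q] of the mixture is estimated separately (via Monte Carlo in the papers).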

GP f : SE (squared exponential) kernel, Gaussian likelihood, negative quadratic mean function (so that exp(f) is integrable, i.e. the surrogate behaves like a proper unnormalized log-density). GP hyperparameters are estimated via MCMC while there is large uncertainty about the GP, and later via MAP estimates using gradient-based optimisation.
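
A small sketch of the negative quadratic mean function (hyperparameter names m0, x_m, omega are my own; the SE kernel itself appears in the BQ sketch above):

```python
import numpy as np

def neg_quad_mean(X, m0=0.0, x_m=None, omega=1.0):
    """Negative quadratic mean m(x) = m0 - sum_i (x_i - x_m_i)^2 / (2 omega_i^2).

    Because m(x) -> -inf quadratically away from x_m, exp(f) decays like a
    Gaussian far from x_m, so the surrogate behaves like a proper (integrable)
    unnormalized log-density.
    """
    X = np.atleast_2d(X)
    if x_m is None:
        x_m = np.zeros(X.shape[1])
    return m0 - 0.5 * np.sum((X - x_m) ** 2 / np.asarray(omega) ** 2, axis=1)

print(neg_quad_mean(np.array([[0.0, 0.0], [3.0, -3.0]]), m0=1.0, omega=2.0))
# [ 1.   -1.25]
```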

ELCBO (Evidence Lower Confidence Bound)
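
From the paper, the ELCBO penalises the ELBO estimate by its Bayesian-quadrature uncertainty (\beta_{LCB} is a confidence parameter); it is used e.g. to rank solutions and in convergence checks:

ELCBO(\phi, f) = E[ELBO(\phi)] - \beta_{LCB} \sqrt{ V[ E_q[f] ] }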

In the VBMC algo, active sampling is done sequentially to compute a sequence of integrals \langle f \rangle_1, \ldots, \langle f \rangle_T across iterations 1, ..., T, such that the sequence converges to the expected log joint under the true posterior while keeping the variance of each estimate low.

Two acquisition functions for VBMC based on uncertainty sampling (they operate pointwise on the posterior density):
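
From the 2018 paper (V = GP posterior variance, \bar{f} = GP posterior mean, q_\phi = current variational posterior):

a_{us}(\theta) = V(\theta) \, q_\phi(\theta)^2   (vanilla uncertainty sampling)
a_{pro}(\theta) = V(\theta) \, q_\phi(\theta) \, e^{\bar{f}(\theta)}   (prospective uncertainty sampling)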

Adaptive treatment of GP hyperparam

Initialization

Warm up

Adaptive number of mixture components K

Termination

Future Work

Variational whitening

Acq func:

  1. a_npro (noisy prospective uncertainty sampling)
  2. Global acq funcs:
    • driven by uncertainty in the posterior mass
    • account for non-local changes in the GP model when making new observations
    EIG (Expected Information Gain)
    • sample points that maximise the EIG about the integral G (eqn 2)
    • choose the next location θ* that maximises the mutual information I[G; y*], where G is the expected log joint and y* the new observation
    IMIQR/VIQR
    • IQR (interquantile range): an estimate of the uncertainty of the unnormalized posterior
    • the resulting integral is intractable, so approximate it using MCMC and importance sampling (a small IQR sketch follows this list)
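
Since f(θ) is Gaussian under the GP, exp(f(θ)) is lognormal, so the pointwise interquantile range of the unnormalized posterior has a closed form; a small sketch (my own illustration):

```python
import numpy as np
from scipy.stats import norm

def pointwise_iqr(mean_f, sd_f, p=0.75):
    """IQR of exp(f) when f ~ N(mean_f, sd_f^2) under the GP posterior.

    The p and (1 - p) quantiles of the lognormal are exp(mean +/- u * sd)
    with u = Phi^{-1}(p), so IQR = 2 * exp(mean) * sinh(u * sd).
    """
    u = norm.ppf(p)
    return 2.0 * np.exp(mean_f) * np.sinh(u * sd_f)

# Larger GP uncertainty sd_f -> larger IQR -> more attractive point to sample.
print(pointwise_iqr(mean_f=0.0, sd_f=np.array([0.1, 1.0, 2.0])))
```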

Applications

Application of Kriging and Variational Bayesian Monte Carlo method for improved prediction of doped UO2 fission gas release