Belief Propagation Converges to Gaussian Distributions in Sparsely-Connected Factor Graphs

1Imperial College London 2University of Cambridge
43rd International Conference on Machine Learning (ICML 2026)

We provide the first mathematical proof that local variable beliefs in sparse, spatial AI factor graphs converge to Gaussian distributions via a convolutional CLT mechanism. This theoretical guarantee justifies using efficient Gaussian Belief Propagation over expensive non-parametric methods in non-Gaussian settings.

Abstract

Belief Propagation is a powerful algorithm for distributed inference in probabilistic graphical models, however it quickly becomes infeasible for practical compute and memory budgets. Many efficient, non-parametric forms of BP have been developed, but the most popular is Gaussian Belief Propagation, a variant that assumes all distributions are locally Gaussian. GBP is widely used due to its efficiency and empirically strong performance in applications like computer vision or sensor networks - even when modelling non-Gaussian problems.

In this paper, we seek to provide a theoretical guarantee for when Gaussian approximations are valid in highly non-Gaussian, sparsely-connected factor graphs performing BP (common in spatial AI). We leverage the Central Limit Theorem to prove mathematically that variables' beliefs under BP converge to a Gaussian distribution in complex, loopy factor graphs obeying our 4 key assumptions. We then confirm experimentally that variable beliefs become increasingly Gaussian after just a few BP iterations in a stereo depth estimation task.

Belief Propagation

Belief Propagation (BP) is a distributed probabilistic inference algorithm that computes marginal distributions via iterative message passing on factor graphs.

A factor graph encodes a joint distribution \(p(\mathbf{x}) \propto \prod_s f_s(\mathbf{x}_s)\) as a bipartite graph of variable nodes \(x_i\) and factor nodes \(f_s\). BP passes messages between them with three update rules:

Variable → Factor:  \(\mu_{x_i \to f_s}(x_i) = \prod_{t \in \mathrm{ne}(x_i) \setminus f_s} \mu_{f_t \to x_i}(x_i)\)

Variables take the product of all incoming messages from neighbouring factors (excluding the one they are sending to).

Factor → Variable:  \(\mu_{f_s \to x_i}(x_i) = \sum_{\mathbf{x}_s \setminus x_i} f_s(\mathbf{x}_s) \prod_{j \in \mathrm{ne}(f_s) \setminus x_i} \mu_{x_j \to f_s}(x_j)\)

Factors take the product of all incoming messages from neighbouring variables (excluding the one they are sending to), then marginalise over all other variables.

Belief update:  \(b_i(x_i) \propto \prod_{s \in \mathrm{ne}(x_i)} \mu_{f_s \to x_i}(x_i)\)

Variables take the product of all incoming messages.

The widget below lets you explore BP on different graph topologies. Each variable has an unknown 2D position; its graph node shows the current belief mean, and the variance ring shows uncertainty (a larger ring means higher uncertainty). The purple diamond marks the true posterior mean for comparison. As messages pass, nodes converge toward the diamond. Click a variable to send messages to its neighbours or send a single random message or run a full synchronous iteration with the buttons below.

Graph Topology:
Message Schedule:
Click a variable node to send messages from it, or use the buttons above.
Uncertainty (ring radius) True posterior position Prior factor

For a detailed, interactive introduction to Gaussian Belief Propagation see gaussianbp.github.io, which this was heavily inspired by.

Key Intuitions

We show that Belief Propagation messages converge to a Gaussian distribution in a range of topologies. This is because message passing at factors is equivalent to convolution, and repeated convolution drives distributions to a Gaussian via the Central Limit Theorem. Then at variables, message passing involves multiplying incoming messages, which we show preserves Gaussianity. A surprising range of non-Gaussian problems can then be approximated as Gaussian, and our goal is to encourage the use of GBP over expensive non-parametric methods.

In the demo below, see how factor distributions are messy & noisy, but as you click on variables further from a prior factor, their beliefs become increasingly Gaussian.

Graph Topology:
Factor Graph
Click any node to inspect its distribution

Assumptions

Gaussian emergence relies on 4 key assumptions:

  • 1: Finite Moments - All noise models (distributions) in the graph have finite moments.
  • 2: Lindeberg Condition - Along any infinite path the summed variance of the factors must approach infinity.
  • 3: Low-Degree Factors - The graph contains only unary and binary factors.
  • 4: Shift Invariance - Non-prior factors only depend on the difference between their neighbours, \(f(x_i, x_j) = g(x_i - x_j)\).
Try breaking an assumption!
Factor Graph (3×3 Grid)
Click any node to inspect its distribution

Experimental Results

We validate our theoretical findings in stereo depth experiments. In stereo depth estimation, we estimate depth between two stereo images using per-pixel disparity. Disparity is the difference in position between corresponding pixels in the left and right images.

We take a patch around each pixel and compare it to the corresponding patches in the right image, searching across a range from the original position. From this, we build up a probability for each disparity value in the search range.

Left Image — click any pixel
Right Image
Same position as left image True match in right image
Click a pixel in the left image to see its disparity in the right image.
Left patch (7×7)
8 sampled right image patches (d=63 → d=0) — click to select
Disparity probability distribution (SSD cost → p(d) ∝ exp(−λ·cost))

From these per-pixel disparity distributions, we construct a factor graph that connects each pixel variable to its neighbours, with smoothness factors that encourage nearby pixels to have similar disparities. We run GBP to optimise this non-Gaussian graph because, as our theory predicts, the majority of variable beliefs converge to Gaussian distributions during inference. In general, it is only the highly confident beliefs around textured regions (e.g. edges) that remain non-Gaussian.

Stereo depth estimation results for Teddy, Cones, and Moebius scenes

We see that GBP substantially improves disparity estimates over the initial patch-matching baseline across all three scenes. The KL divergence maps then confirm the theoretical prediction: beliefs that are highly non-Gaussian before inference converge to near-Gaussian distributions after BP, particularly in textureless regions where patch-matching leads to high uncertainty.

Limitations & Future Work

There are several limitations to our work. In particular, our guarantees do not directly extend to:

Accordingly, our work opens several useful goals for further research, such as analysis of the speed and stability of cumulant decay under asynchronous BP schedules, or for graphs which do not obey assumptions 3 and 4 (low-degree factors and shift invariance). Moreover, extending our cumulant analysis to uncertainties on Lie Groups presents a critical direction for future work in robust robot perception.

Citation

@inproceedings{yates2026gbpclt,
    title={Belief Propagation Converges to Gaussian Distributions in Sparsely-Connected Factor Graphs},
    author={Yates, Tom and Cheng, Yuzhou and Alzugaray, Ignacio and Akarca, Danyal and Mediano, Pedro A. M. and Davison, Andrew J.},
    booktitle={Proceedings of the 43rd International Conference on Machine Learning (ICML)},
    year={2026}
}