High-accuracy log-concave sampling with stochastic queries
Authors
Fan Chen, Sinho Chewi, Constantinos Daskalakis, Alexander Rakhlin
Abstract
We show that high-accuracy guarantees for log-concave sampling -- that is, iteration and query complexities which scale as $\mathrm{poly}\log(1/δ)$, where $δ$ is the desired target accuracy -- are achievable using stochastic gradients with subexponential tails. Notably, this exhibits a separation with the problem of convex optimization, where stochasticity (even additive Gaussian noise) in the gradient oracle incurs $\mathrm{poly}(1/δ)$ queries. We also give an information-theoretic argument that light-tailed stochastic gradients are necessary for high accuracy: for example, in the bounded variance case, we show that the minimax-optimal query complexity scales as $Θ(1/δ)$. Our framework also provides similar high accuracy guarantees under stochastic zeroth order (value) queries.
Concepts
The Big Picture
Imagine navigating a foggy mountain range. In one version, you need to reach the single highest peak; that’s optimization. In another, you need to map the entire landscape; that’s sampling. For decades, researchers treated these as close cousins. A new paper from MIT and Yale shows that when your GPS is unreliable, the two tasks diverge completely.
The specific problem is log-concave sampling: drawing random examples from a well-behaved, hump-shaped probability distribution. This underlies modern statistical inference, from estimating hidden parameters in noisy astronomical data to quantifying uncertainty in machine learning models to running Bayesian analysis at scale. But you rarely have the precise mathematical description of the distribution. Instead, you get stochastic gradient estimates, rough directional clues computed from random data subsets rather than the full picture.
Fan Chen, Sinho Chewi, Constantinos Daskalakis, and Alexander Rakhlin have proven that log-concave sampling tolerates this kind of noise far better than optimization does, but only when the noise is well-behaved. Their result draws a mathematically precise dividing line. Noise with rare, controllable extreme values permits near-perfect sampling at essentially the same small query cost as having no noise at all. Noise that merely has finite average spread forces a query count that grows polynomially in the accuracy, just as in the hardest stochastic optimization problems.
Key Insight: When stochastic gradient errors have light (subexponential) tails, high-accuracy log-concave sampling requires only polylog(1/δ) queries, matching the exact-gradient rate. Bounded-variance noise is not enough: it forces Θ(1/δ) queries, the same polynomial barrier that afflicts stochastic convex optimization.
How It Works
The central quantity is δ, the target accuracy in total variation distance, which measures how close the generated distribution is to the true target. “High accuracy” means the query count grows only logarithmically as δ shrinks: reaching δ = 0.001 instead of δ = 0.01 costs only a constant factor, not a tenfold increase.
With exact gradients, current best methods achieve O(√(κd) · polylog(1/δ)) oracle calls, where κ = β/α is the condition number (how elongated the distribution is) and d is the dimension. The new result shows this polylog(1/δ) scaling survives stochastic noise under two conditions:
- Unbiasedness: On average, stochastic estimates must equal the true gradient.
- Subexponential tails: The probability of a large error must decay at least as fast as e^{−|error|/σ} for some noise scale σ (one standard formalization of both conditions is written out below).
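One standard way to write both requirements, where $\hat g$ denotes the stochastic gradient oracle and $f$ the negative log-density (the paper's precise assumptions may differ in constants and norms):

$$
\mathbb{E}\big[\hat g(x)\big] = \nabla f(x), \qquad \Pr\big[\|\hat g(x) - \nabla f(x)\| \ge t\big] \le 2\, e^{-t/\sigma} \quad \text{for all } t \ge 0 .
$$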
Query complexity under these conditions becomes O((√(κd) + σ²/α) · polylog(1/δ)), where σ characterizes the noise level and α is the strong log-concavity parameter. The noise term enters additively, inflating the constant prefactor without touching the critical 1/δ exponent.
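To make the separation concrete, here is a toy back-of-the-envelope comparison in Python. The prefactors and the exact power of the logarithm are hypothetical placeholders (the paper states its bounds only up to polylog factors); only the qualitative scaling mirrors the rates quoted above.

```python
import math

# Hypothetical illustration of the two regimes; constants are placeholders.
def light_tail_queries(kappa, d, sigma, alpha, delta):
    # O((sqrt(kappa*d) + sigma^2/alpha) * polylog(1/delta)), sketched with log^2
    return (math.sqrt(kappa * d) + sigma**2 / alpha) * math.log(1 / delta) ** 2

def bounded_variance_queries(delta):
    # Theta(1/delta): the minimax rate when only the variance is controlled
    return 1 / delta

for delta in (1e-2, 1e-3, 1e-4):
    print(f"delta={delta:.0e}  "
          f"light-tailed ~{light_tail_queries(10, 100, 1.0, 1.0, delta):,.0f}  "
          f"bounded-variance ~{bounded_variance_queries(delta):,.0f}")
```

Shrinking δ by two orders of magnitude roughly quadruples the light-tailed count in this sketch but multiplies the bounded-variance count by a hundred.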
The algorithm builds on the Metropolis-adjusted Langevin algorithm (MALA): a particle drifts under the gradient's pull plus random thermal jostling, and each proposed step is accepted or rejected to keep the samples honest. The key ingredient is batch averaging: query n independent stochastic gradients per step and average them, with the batch size tuned to the accuracy that step requires.
Because the noise has subexponential tails, averaging suppresses extreme errors exponentially with n. That’s far stronger than the √n improvement you’d get from bounded-variance noise alone, and it’s what unlocks the polylog(1/δ) guarantee.
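Below is a minimal, self-contained Python sketch of that mechanism: a MALA chain on a toy Gaussian target, driven by batch-averaged stochastic gradients corrupted with Laplace (subexponential) noise. It illustrates the idea rather than reproducing the paper's algorithm; in particular, the accept/reject step here uses exact values of f, whereas the paper also treats stochastic zeroth-order queries, and the target, step size, noise level, and batch size are all placeholder choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target: standard Gaussian, f(x) = ||x||^2 / 2 (so alpha = beta = kappa = 1).
def f(x):
    return 0.5 * np.dot(x, x)

def grad_f(x):
    return x

def stochastic_grad(x, sigma=1.0):
    # Unbiased oracle with Laplace (subexponential) noise in each coordinate.
    return grad_f(x) + rng.laplace(scale=sigma, size=x.shape)

def batched_grad(x, n, sigma=1.0):
    # Average n independent oracle calls; with subexponential tails the
    # averaged error concentrates exponentially fast in n.
    return np.mean([stochastic_grad(x, sigma) for _ in range(n)], axis=0)

def mala_step(x, h, n):
    # One Metropolis-adjusted Langevin step with a batch-averaged gradient.
    g_x = batched_grad(x, n)
    y = x - h * g_x + np.sqrt(2 * h) * rng.standard_normal(x.shape)
    g_y = batched_grad(y, n)
    # Log-densities of the forward and reverse Gaussian proposals.
    log_q_fwd = -np.sum((y - (x - h * g_x)) ** 2) / (4 * h)
    log_q_rev = -np.sum((x - (y - h * g_y)) ** 2) / (4 * h)
    log_accept = (f(x) - f(y)) + (log_q_rev - log_q_fwd)
    return y if np.log(rng.uniform()) < log_accept else x

d, h, n_batch, n_iters = 10, 0.05, 16, 2000
x = np.zeros(d)
samples = []
for _ in range(n_iters):
    x = mala_step(x, h, n_batch)
    samples.append(x.copy())
# For the standard Gaussian, E[||x||^2 / d] = 1; the chain should hover near that.
print("empirical second moment per coordinate:",
      np.mean([np.dot(s, s) / d for s in samples[500:]]))
```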
Why It Matters
Why does sampling behave so differently from optimization? In optimization, you’re hunting for a single point. Every noisy step risks permanent deviation, and recovery costs additional queries. Even Gaussian noise forces poly(1/δ) queries in convex optimization; there’s no escape.
Sampling is different because you’re characterizing an entire distribution. The ergodicity of Markov chains, their tendency to explore and revisit the whole space over time, provides a self-correcting mechanism that optimization lacks. Stochastic errors get absorbed into the chain’s natural fluctuations, provided they don’t systematically bias the chain or produce catastrophic outliers. Subexponential tails guarantee exactly that.
The paper also establishes matching lower bounds: proofs that no algorithm can beat a certain minimum cost. If the stochastic gradients are only guaranteed to have bounded variance, the minimax-optimal query complexity is Θ(1/δ), so a cost of order 1/δ is unavoidable. If only finitely many noise moments are bounded, the complexity must scale as Ω(1/δ^c) for some c > 0. Light tails aren't just sufficient for high accuracy; they're necessary.
For machine learning practitioners, this settles a long-standing question: when are mini-batch gradient estimates good enough for high-quality posterior samples? Sub-Gaussian or subexponential noise puts you in the polylog regime. Bounded variance only? Expect polynomial cost.
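One rough, heuristic diagnostic (not taken from the paper) is to probe the exponential moments of the gradient-error norms over a held-out collection of mini-batches: if E[exp(‖error‖/s)] stays finite and stable for some moderate scale s, subexponential behavior is plausible, whereas values that blow up as more batches are added point to heavy tails and the polynomial regime. The example data below are synthetic stand-ins for real mini-batch gradient errors.

```python
import numpy as np

def tail_diagnostic(errors, scales=(0.5, 1.0, 2.0)):
    # Heuristic check: finite, stable exponential moments E[exp(||err|| / s)]
    # suggest subexponential noise; divergence suggests heavy tails.
    norms = np.linalg.norm(errors, axis=1)
    with np.errstate(over="ignore"):  # heavy tails may overflow to inf, which is the point
        return {s: float(np.mean(np.exp(norms / s))) for s in scales}

# Synthetic example with hypothetical gradient errors:
# light-tailed (Laplace) vs heavy-tailed (Student-t with 2 degrees of freedom).
rng = np.random.default_rng(1)
light = rng.laplace(scale=0.3, size=(5000, 10))
heavy = rng.standard_t(df=2, size=(5000, 10))
print("light-tailed:", tail_diagnostic(light))
print("heavy-tailed:", tail_diagnostic(heavy))
```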
The results apply directly to computational physics as well. Lattice quantum field theory, cosmological parameter inference, and gravitational wave data analysis all involve sampling from complex, high-dimensional distributions using gradient estimators from large datasets. The condition number κ and noise parameter σ are concrete, measurable properties of these systems, giving practitioners a principled basis for choosing mini-batch sizes and diagnosing when stochastic approximations will degrade results.
Bottom Line: Sampling is fundamentally more robust to gradient noise than optimization, but only when the noise has light tails. Subexponential noise allows polylog(1/δ) sampling, while bounded-variance noise forces Θ(1/δ) cost, the same polynomial barrier that limits stochastic convex optimization.
IAIFI Research Highlights
This work connects statistical sampling theory and machine learning optimization, proving a fundamental separation between two problems long treated as analogous, with direct implications for Bayesian inference across computational physics and AI.
The paper establishes tight upper and lower bounds for stochastic log-concave sampling, showing that light-tailed gradient noise is both necessary and sufficient for high-accuracy guarantees. This gives practitioners a rigorous criterion for evaluating mini-batch samplers.
High-accuracy sampling under stochastic queries is directly relevant to lattice QCD, cosmological posterior inference, and gravitational wave parameter estimation, where exact gradient access is computationally prohibitive and mini-batch methods are standard practice.
Future work could extend these guarantees to non-log-concave distributions and explore connections to score-based generative models. The paper is available as [arXiv:2602.14342](https://arxiv.org/abs/2602.14342) (February 2026) from authors at MIT and Yale.
Original Paper Details
High-accuracy log-concave sampling with stochastic queries
[arXiv:2602.14342](https://arxiv.org/abs/2602.14342)
Fan Chen, Sinho Chewi, Constantinos Daskalakis, Alexander Rakhlin
We show that high-accuracy guarantees for log-concave sampling -- that is, iteration and query complexities which scale as $\mathrm{poly}\log(1/δ)$, where $δ$ is the desired target accuracy -- are achievable using stochastic gradients with subexponential tails. Notably, this exhibits a separation with the problem of convex optimization, where stochasticity (even additive Gaussian noise) in the gradient oracle incurs $\mathrm{poly}(1/δ)$ queries. We also give an information-theoretic argument that light-tailed stochastic gradients are necessary for high accuracy: for example, in the bounded variance case, we show that the minimax-optimal query complexity scales as $Θ(1/δ)$. Our framework also provides similar high accuracy guarantees under stochastic zeroth order (value) queries.