Randomized Splitting Methods and Stochastic Gradient Algorithms
ABSTRACT
We explore an explicit link between stochastic gradient algorithms using common batching strategies and splitting methods for ordinary and stochastic differential equations. From this perspective, we introduce a new minibatching strategy (called the Symmetric Minibatching Strategy) for stochastic gradient optimisation and MCMC which exhibits a greatly reduced stochastic gradient bias (from O(h) to O(h^2) in the stepsize h) when combined with momentum-based optimisers. We explain why momentum is needed to obtain this improved performance using the theory of backward error analysis for splitting integrators, and we provide a detailed analytic computation of the stochastic gradient bias on a simple example. Furthermore, we provide improved convergence guarantees for this new minibatching strategy using Lyapunov techniques, which show reduced stochastic gradient bias at a fixed stepsize (or learning rate) over the class of strongly convex and smooth objective functions. We argue that this also leads to a faster convergence rate when a decreasing stepsize schedule satisfying the Robbins-Monro criterion is used. Both the reduced bias and the efficacy of decreasing stepsizes are demonstrated numerically on several motivating examples in both MCMC and optimisation.