optiml.opti.unconstrained.stochastic.amsgrad module

class optiml.opti.unconstrained.stochastic.amsgrad.AMSGrad(f, x=None, batch_size=None, eps=1e-06, tol=1e-08, epochs=1000, step_size=0.001, momentum_type='none', momentum=0.9, beta1=0.9, beta2=0.999, offset=1e-08, callback=None, callback_args=(), shuffle=True, random_state=None, verbose=False)

Bases: StochasticMomentumOptimizer

AMSGrad for the minimization of the provided function f.

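It is a variant of Adam that scales the step by the element-wise maximum of all past second raw moment estimates instead of the bias-corrected second moment estimate. Because this maximum never decreases, the effective per-coordinate learning rate is non-increasing (for a constant step_size), which addresses cases where Adam fails to converge because its second moment estimate shrinks and the effective step grows again.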
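The update described above can be sketched in NumPy as follows. This is a minimal illustration of the textbook AMSGrad rule without bias correction, not the code of this class: the function name amsgrad_step and the toy quadratic objective are assumptions made for the example, and the class may differ in details (e.g., bias correction or the outer momentum).

```python
import numpy as np


def amsgrad_step(x, grad, m, v, v_hat,
                 step_size=0.001, beta1=0.9, beta2=0.999, offset=1e-8):
    """One textbook AMSGrad update (hypothetical helper, not optiml's code)."""
    m = beta1 * m + (1 - beta1) * grad        # first moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second raw moment estimate
    v_hat = np.maximum(v_hat, v)              # element-wise max over the past
    x = x - step_size * m / (np.sqrt(v_hat) + offset)
    return x, m, v, v_hat


# Toy problem (assumption): f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x = np.array([1.0, -2.0])
m, v, v_hat = np.zeros_like(x), np.zeros_like(x), np.zeros_like(x)
v_hat_history = []
for _ in range(2000):
    x, m, v, v_hat = amsgrad_step(x, x, m, v, v_hat, step_size=0.01)
    v_hat_history.append(v_hat.copy())
```

In this sketch v decays once the gradient becomes small, while v_hat stays at its running maximum, so the denominator never shrinks.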

Parameters:

f – the objective function.
x – ([n x 1] real column vector, optional, default value None): the starting point of the algorithm.
batch_size – (integer scalar or None, optional, default value None): the size of the mini batches used to estimate the gradient; if None the full sample is used.
eps – (real scalar, optional, default value 1e-6): the accuracy in the stopping criterion: the algorithm is stopped when the norm of the gradient is less than or equal to eps.
tol – (real scalar, optional, default value 1e-8): the tolerance used in the optimality conditions of the Lagrangian dual (when f is a Lagrangian dual).
epochs – (integer scalar, optional, default value 1000): the maximum number of epochs before the algorithm is stopped.
step_size – (real scalar > 0, callable or iterable, optional, default value 0.001): the learning rate, i.e., the base size of the step taken along the search direction.
momentum_type – (string in {‘none’, ‘polyak’, ‘nesterov’}, optional, default value ‘none’): the kind of (outer) momentum applied on top of the AMSGrad step.
momentum – (real scalar in [0, 1) or iterable, optional, default value 0.9): the (outer) momentum factor, i.e., the fraction of the previous step retained.
beta1 – (real scalar in [0, 1), optional, default value 0.9): the exponential decay rate for the first moment (mean) estimate of the gradient.
beta2 – (real scalar in [0, 1), optional, default value 0.999): the exponential decay rate for the second raw moment (uncentered variance) estimate of the gradient.
offset – (real scalar > 0, optional, default value 1e-8): a small constant added to the denominator to avoid division by zero and improve numerical stability.
callback – (callable, optional, default value None): a function called at each iteration with the optimizer instance (and callback_args) as arguments; it can raise StopIteration to interrupt the optimization.
callback_args – (tuple, optional, default value ()): additional positional arguments passed to the callback at each call.
shuffle – (boolean, optional, default value True): whether to shuffle the order of the mini batches at the beginning of each epoch.
random_state – (integer scalar or None, optional, default value None): seed for the random number generator, for reproducibility.
verbose – (boolean or integer, optional, default value False): print details about each iteration if True (or every verbose epochs if an integer), nothing otherwise.
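The callback protocol described above (a callable that receives the optimizer instance plus callback_args and may raise StopIteration) can be illustrated with a generic driver loop. Everything in this sketch is hypothetical: run_with_callback, toy_step, stop_after and the iter attribute are stand-ins invented for the example and are not part of this class.

```python
from types import SimpleNamespace


def run_with_callback(optimizer, step, max_iter, callback=None, callback_args=()):
    """Hypothetical driver loop showing the callback protocol (not optiml's code)."""
    for _ in range(max_iter):
        step(optimizer)
        if callback is not None:
            try:
                # The callback receives the optimizer and the extra arguments.
                callback(optimizer, *callback_args)
            except StopIteration:
                break  # the callback asked to interrupt the optimization
    return optimizer


def toy_step(opt):
    # Stand-in for one optimization step: only counts iterations.
    opt.iter += 1


def stop_after(opt, limit):
    # Example callback: interrupt once `limit` iterations have been done.
    if opt.iter >= limit:
        raise StopIteration


opt = SimpleNamespace(iter=0)  # stand-in for an optimizer instance
run_with_callback(opt, toy_step, max_iter=100,
                  callback=stop_after, callback_args=(5,))
```

Here the loop ends after 5 of the 100 allowed iterations, because stop_after raises StopIteration on the fifth call.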

minimize()

callback(args=())

check_lagrangian_dual_conditions()

check_lagrangian_dual_optimality()

is_augmented_lagrangian_dual()

is_batch_end()

is_lagrangian_dual()

is_verbose()

iter_mini_batches()

Return an iterator that successively yields tuples of aligned mini batches of size batch_size from the sliceable arrays returned by f.args(), in random order (when shuffle is True) without replacement.

Returns: an infinite iterator of mini batches (one tuple of aligned slices per step).
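The behaviour described above can be sketched as a NumPy generator. This is an illustration under assumptions, not the method's source: the name iter_mini_batches_sketch is hypothetical, the arrays are passed in explicitly instead of being read from f.args(), and the sketch shuffles sample indices once per epoch.

```python
from itertools import islice

import numpy as np


def iter_mini_batches_sketch(arrays, batch_size, shuffle=True, random_state=None):
    """Endlessly yield tuples of aligned mini batches (hypothetical sketch)."""
    rng = np.random.default_rng(random_state)
    n = len(arrays[0])
    while True:  # one pass of the inner loop corresponds to one epoch
        idx = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            # The same indices slice every array, so the batches stay aligned.
            yield tuple(a[batch] for a in arrays)


X = np.arange(10).reshape(5, 2)  # 5 samples, 2 features
y = np.arange(5)                 # 5 targets, with X[i, 0] == 2 * y[i]
# Three batches of size <= 2 cover exactly one epoch of 5 samples.
batches = list(islice(
    iter_mini_batches_sketch((X, y), batch_size=2, random_state=0), 3))
```

Since the iterator never ends, the caller bounds it, here with itertools.islice.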