optiml.opti.unconstrained.stochastic.amsgrad module

class optiml.opti.unconstrained.stochastic.amsgrad.AMSGrad(f, x=None, batch_size=None, eps=1e-06, tol=1e-08, epochs=1000, step_size=0.001, momentum_type='none', momentum=0.9, beta1=0.9, beta2=0.999, offset=1e-08, callback=None, callback_args=(), shuffle=True, random_state=None, verbose=False)

Bases: StochasticMomentumOptimizer

AMSGrad for the minimization of the provided function f.

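It is a variant of Adam that scales the step by the element-wise maximum of all past second raw moment estimates, instead of by the bias-corrected second moment. Because this maximum never decreases, the effective per-coordinate learning rate is non-increasing for a constant step_size, which avoids the non-convergence that Adam can exhibit when its effective learning rate grows between iterations.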
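The update described above can be sketched with plain NumPy. This is a minimal illustration, not the optiml implementation: the function name amsgrad_sketch, the grad argument and max_iter are hypothetical, and the sketch assumes that no bias correction is applied to the first moment and that no outer momentum is used.

```python
import numpy as np


def amsgrad_sketch(grad, x0, step_size=0.001, beta1=0.9, beta2=0.999,
                   offset=1e-8, eps=1e-6, max_iter=1000):
    """Illustrative AMSGrad loop (hypothetical helper, not the optiml code)."""
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)      # first moment (mean) estimate of the gradient
    v = np.zeros_like(x)      # second raw moment estimate of the gradient
    v_max = np.zeros_like(x)  # element-wise maximum of all past values of v
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:  # stop when the gradient norm is small
            break
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        v_max = np.maximum(v_max, v)  # never decreases, unlike v in Adam
        x = x - step_size * m / (np.sqrt(v_max) + offset)
    return x


# Toy problem (assumption): f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x_min = amsgrad_sketch(lambda x: x, x0=[1.0, -2.0],
                       step_size=0.05, max_iter=5000)
```

Keeping v_max separate from v is the only difference from a plain Adam-style loop in this sketch: the denominator can grow but never shrink.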


Parameters:
  • f – the objective function.

  • x – ([n x 1] real column vector, optional, default value None): the starting point of the algorithm.

  • batch_size – (integer scalar or None, optional, default value None): the size of the mini batches used to estimate the gradient; if None the full sample is used.

  • eps – (real scalar, optional, default value 1e-6): the accuracy in the stopping criterion: the algorithm is stopped when the norm of the gradient is less than or equal to eps.

  • tol – (real scalar, optional, default value 1e-8): the tolerance used in the optimality conditions of the Lagrangian dual (when f is a Lagrangian dual).

  • epochs – (integer scalar, optional, default value 1000): the maximum number of epochs before the algorithm is stopped.

  • step_size – (real scalar > 0, callable or iterable, optional, default value 0.001): the learning rate, i.e., the base size of the step taken along the search direction.

  • momentum_type – (string in {'none', 'polyak', 'nesterov'}, optional, default value 'none'): the kind of (outer) momentum applied on top of the AMSGrad step.

  • momentum – (real scalar in [0, 1) or iterable, optional, default value 0.9): the (outer) momentum factor, i.e., the fraction of the previous step retained.

  • beta1 – (real scalar in [0, 1), optional, default value 0.9): the exponential decay rate for the first moment (mean) estimate of the gradient.

  • beta2 – (real scalar in [0, 1), optional, default value 0.999): the exponential decay rate for the second raw moment (uncentered variance) estimate of the gradient.

  • offset – (real scalar > 0, optional, default value 1e-8): a small constant added to the denominator to avoid division by zero and improve numerical stability.

  • callback – (callable, optional, default value None): a function called at each iteration with the optimizer instance (and callback_args) as arguments; it can raise StopIteration to interrupt the optimization.

  • callback_args – (tuple, optional, default value ()): additional positional arguments passed to the callback at each call.

  • shuffle – (boolean, optional, default value True): whether to shuffle the order of the mini batches at the beginning of each epoch.

  • random_state – (integer scalar or None, optional, default value None): seed for the random number generator, for reproducibility.

  • verbose – (boolean or integer, optional, default value False): if True, print details about each iteration; if an integer, print them every verbose epochs; otherwise print nothing.
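The callback protocol described for callback and callback_args can be illustrated with a stand-in class. This is only a sketch: ToyOptimizer, its iter attribute and the stop_after function are hypothetical, and it assumes that callback_args is unpacked as extra positional arguments after the optimizer instance.

```python
class ToyOptimizer:
    """Hypothetical stand-in for an optimizer; not part of optiml."""

    def __init__(self, callback=None, callback_args=(), epochs=1000):
        self.callback = callback
        self.callback_args = callback_args
        self.epochs = epochs
        self.iter = 0  # number of iterations performed so far

    def minimize(self):
        for _ in range(self.epochs):
            self.iter += 1  # a real optimizer would update x here
            if self.callback is not None:
                try:
                    # assumption: the instance first, then the extra arguments
                    self.callback(self, *self.callback_args)
                except StopIteration:
                    break  # the callback asked to interrupt the optimization
        return self


def stop_after(opt, limit):
    """Example callback: interrupt once `limit` iterations have run."""
    if opt.iter >= limit:
        raise StopIteration


opt = ToyOptimizer(callback=stop_after, callback_args=(5,)).minimize()
```

Here the loop ends after five iterations rather than after the full epochs budget, because the callback raises StopIteration.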

minimize()
callback(args=())
check_lagrangian_dual_conditions()
check_lagrangian_dual_optimality()
is_augmented_lagrangian_dual()
is_batch_end()
is_lagrangian_dual()
is_verbose()
iter_mini_batches()

Return an iterator that successively yields tuples of aligned mini batches of size batch_size from the sliceable arrays returned by f.args(), in random order (when shuffle is True) without replacement.

Returns:

an infinite iterator of mini batches (one tuple of aligned slices per step).
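An iterator with this behaviour can be sketched as a NumPy generator. This is a minimal illustration under stated assumptions, not the optiml code: iter_mini_batches_sketch is a hypothetical name, the arrays are passed in directly rather than obtained from f.args(), sample indices are reshuffled at the start of each epoch, and the last batch of an epoch may be smaller than batch_size.

```python
import numpy as np


def iter_mini_batches_sketch(arrays, batch_size, shuffle=True,
                             random_state=None):
    """Yield tuples of aligned mini batches forever (illustrative only)."""
    n = len(arrays[0])
    rng = np.random.default_rng(random_state)
    while True:  # infinite iterator: one pass over all indices per epoch
        # a permutation samples every index exactly once (no replacement)
        idx = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            # the same indices slice every array, so rows stay aligned
            yield tuple(a[batch] for a in arrays)


X = np.arange(10).reshape(5, 2)  # 5 samples with 2 features each
y = np.arange(5)                 # 5 matching targets
batches = iter_mini_batches_sketch((X, y), batch_size=2, random_state=0)
X_b, y_b = next(batches)         # one tuple of aligned slices per step
```

Because the generator never terminates, the caller decides when to stop, for example by counting epochs.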