optiml.opti.unconstrained.stochastic.rmsprop module

class optiml.opti.unconstrained.stochastic.rmsprop.RMSProp(f, x=None, step_size=0.001, momentum_type='none', momentum=0.9, batch_size=None, eps=1e-06, tol=1e-08, epochs=1000, decay=0.9, offset=1e-08, callback=None, callback_args=(), shuffle=True, random_state=None, verbose=False)

Bases: StochasticMomentumOptimizer

RMSProp for the minimization of the provided function f.

It divides the learning rate of each coordinate by the square root of an exponentially decaying average of the squared gradients (the moving root mean square), so that the effective step size adapts to the recent magnitude of the gradients; an optional Polyak or Nesterov momentum can be applied on top.
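A single RMSProp update without momentum can be sketched in NumPy as follows. The function name rmsprop_step and its signature are hypothetical; the arguments only mirror step_size, decay and offset from the constructor. Adding offset outside the square root is an assumption, since implementations differ on where that constant is placed.

```python
import numpy as np


def rmsprop_step(x, grad, sq_avg, step_size=0.001, decay=0.9, offset=1e-8):
    """One plain RMSProp update (no momentum).

    Hypothetical helper, not optiml's internal code: sq_avg is the running
    average of the squared gradients, returned updated together with x.
    """
    # exponentially decaying average of the squared gradient, per coordinate
    sq_avg = decay * sq_avg + (1.0 - decay) * grad ** 2
    # divide each coordinate's step by its moving root mean square
    # (offset added outside the square root: an assumption of this sketch)
    x = x - step_size * grad / (np.sqrt(sq_avg) + offset)
    return x, sq_avg
```

On the quadratic f(x) = 0.5 * ||x||^2, whose gradient is x itself, repeated calls drive x toward the origin; because the per-coordinate step is bounded by roughly step_size / sqrt(1 - decay), the iterates end up oscillating in a small neighbourhood of the minimum rather than landing on it exactly.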


Parameters:
  • f – the objective function.

  • x – ([n x 1] real column vector): the starting point of the algorithm.

  • step_size – (real scalar > 0, callable or iterable, optional, default value 0.001): the learning rate, i.e., the base size of the step taken along the search direction.

  • momentum_type – (string in {‘none’, ‘polyak’, ‘nesterov’}, optional, default value ‘none’): the kind of momentum applied on top of the RMSProp step.

  • momentum – (real scalar in [0, 1) or iterable, optional, default value 0.9): the momentum factor, i.e., the fraction of the previous step retained in the current one.

  • batch_size – (integer scalar or None, optional, default value None): the size of the mini batches used to estimate the gradient; if None the full sample is used.

  • eps – (real scalar, optional, default value 1e-6): the accuracy in the stopping criterion: the algorithm is stopped when the norm of the gradient is less than or equal to eps.

  • tol – (real scalar, optional, default value 1e-8): the tolerance used in the optimality conditions of the Lagrangian dual (when f is a Lagrangian dual).

  • epochs – (integer scalar, optional, default value 1000): the maximum number of epochs before the algorithm is stopped.

  • decay – (real scalar in [0, 1), optional, default value 0.9): the exponential decay rate of the moving average of the squared gradients.

  • offset – (real scalar > 0, optional, default value 1e-8): a small constant added to the denominator to avoid division by zero and improve numerical stability.

  • callback – (callable, optional, default value None): a function called at each iteration with the optimizer instance (and callback_args) as arguments; it can raise StopIteration to interrupt the optimization.

  • callback_args – (tuple, optional, default value ()): additional positional arguments passed to the callback at each call.

  • shuffle – (boolean, optional, default value True): whether to shuffle the order of the mini batches at the beginning of each epoch.

  • random_state – (integer scalar or None, optional, default value None): seed for the random number generator, for reproducibility.

  • verbose – (boolean or integer, optional, default value False): if True, print details about each iteration; if an integer k, print them once every k epochs; otherwise print nothing.
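Several of the parameters above can be tied together in a small full-batch loop. This is a hypothetical sketch, not the library's minimize(): grad is a plain callable returning the gradient, step_size is a constant scalar (no schedules), there are no mini batches, and the callback receives the current point rather than the optimizer instance. The Polyak and Nesterov variants follow one common formulation (velocity = momentum * velocity + RMSProp step, with Nesterov evaluating the gradient at the look-ahead point), which may differ in detail from RMSProp's own implementation.

```python
import numpy as np


def rmsprop_minimize(grad, x0, step_size=0.001, momentum_type='none',
                     momentum=0.9, eps=1e-6, epochs=1000, decay=0.9,
                     offset=1e-8, callback=None, callback_args=()):
    """Full-batch RMSProp with optional Polyak or Nesterov momentum.

    Hypothetical sketch: argument names mirror the RMSProp constructor,
    but grad is a plain callable returning the gradient at a point.
    """
    if momentum_type not in ('none', 'polyak', 'nesterov'):
        raise ValueError(f'unknown momentum_type: {momentum_type!r}')
    x = np.asarray(x0, dtype=float).copy()
    sq_avg = np.zeros_like(x)    # moving average of the squared gradients
    velocity = np.zeros_like(x)  # previous step, reused when momentum is on
    for _ in range(epochs):
        g_x = grad(x)
        # stopping criterion: norm of the gradient at x <= eps
        if np.linalg.norm(g_x) <= eps:
            break
        if momentum_type == 'nesterov':
            # gradient at the look-ahead point x + momentum * velocity
            g = grad(x + momentum * velocity)
        else:
            g = g_x
        sq_avg = decay * sq_avg + (1.0 - decay) * g ** 2
        step = -step_size * g / (np.sqrt(sq_avg) + offset)
        if momentum_type == 'none':
            velocity = step
        else:
            # retain a fraction `momentum` of the previous step
            velocity = momentum * velocity + step
        x = x + velocity
        if callback is not None:
            try:
                callback(x, *callback_args)
            except StopIteration:  # the callback asked to interrupt
                break
    return x
```

With momentum enabled the accumulated velocity lets the iterate travel farther per epoch than plain RMSProp, while a callback that raises StopIteration ends the run right after the current update.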

minimize()
callback(args=())
check_lagrangian_dual_conditions()
check_lagrangian_dual_optimality()
is_augmented_lagrangian_dual()
is_batch_end()
is_lagrangian_dual()
is_verbose()
iter_mini_batches()

Return an iterator that successively yields tuples of aligned mini batches of size batch_size from the sliceable arrays returned by f.args(); the batches are drawn without replacement, in random order when shuffle is True.

Returns:

an infinite iterator of mini batches (one tuple of aligned slices per step).
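Such an iterator can be sketched with NumPy, assuming f.args() returns a tuple like (X, y) whose arrays share their first dimension. The standalone function below and its signature are hypothetical. It reshuffles the sample indices at each epoch (which also randomizes the batch order; the library may instead permute fixed batches), treats batch_size=None as the full sample, and yields a smaller last batch when the sample size is not a multiple of batch_size.

```python
import numpy as np


def iter_mini_batches(arrays, batch_size=None, shuffle=True, random_state=None):
    """Infinitely yield tuples of aligned mini batches from `arrays`.

    Hypothetical sketch: `arrays` stands in for what f.args() returns;
    each epoch visits every sample exactly once (without replacement).
    """
    n = len(arrays[0])
    if batch_size is None:
        batch_size = n  # None means the full sample
    rng = np.random.default_rng(random_state)
    while True:  # one pass of this loop is one epoch
        idx = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            # the same indices slice every array, so rows stay aligned
            yield tuple(a[batch] for a in arrays)
```

Slicing every array with one shared index vector is what keeps inputs and targets paired, whatever order the epoch visits them in.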