optiml.opti.unconstrained.stochastic.adadelta module

class optiml.opti.unconstrained.stochastic.adadelta.AdaDelta(f, x=None, batch_size=None, eps=1e-06, tol=1e-08, epochs=1000, step_size=1.0, decay=0.9, offset=1e-06, callback=None, callback_args=(), shuffle=True, random_state=None, verbose=False)

Bases: StochasticOptimizer

AdaDelta for the minimization of the provided function f.

It is an extension of AdaGrad that replaces the ever-growing sum of squared gradients with an exponentially decaying average. It also tracks an exponentially decaying average of the squared updates and scales each step by the ratio of these two running averages, which removes the need for a manually tuned global learning rate.
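The update described above can be sketched in NumPy as follows. This is a minimal illustration of the standard AdaDelta rule, not the library's actual implementation: the helper names adadelta_step and run_adadelta are hypothetical, and applying step_size as a plain multiplier on the update is an assumption.

```python
import numpy as np


def adadelta_step(x, grad, avg_sq_grad, avg_sq_delta,
                  decay=0.9, offset=1e-6, step_size=1.0):
    """One AdaDelta update (hypothetical helper, not part of optiml)."""
    # Exponentially decaying average of the squared gradients.
    avg_sq_grad = decay * avg_sq_grad + (1 - decay) * grad ** 2
    # Scale the gradient by the ratio of the two running RMS values.
    delta = -np.sqrt(avg_sq_delta + offset) / np.sqrt(avg_sq_grad + offset) * grad
    # Exponentially decaying average of the squared updates.
    avg_sq_delta = decay * avg_sq_delta + (1 - decay) * delta ** 2
    # Assumption: step_size simply multiplies the update.
    return x + step_size * delta, avg_sq_grad, avg_sq_delta


def run_adadelta(grad_fn, x0, steps=100, **kwargs):
    """Apply adadelta_step repeatedly, starting from zeroed running averages."""
    x = np.asarray(x0, dtype=float)
    avg_sq_grad = np.zeros_like(x)
    avg_sq_delta = np.zeros_like(x)
    for _ in range(steps):
        x, avg_sq_grad, avg_sq_delta = adadelta_step(
            x, grad_fn(x), avg_sq_grad, avg_sq_delta, **kwargs)
    return x


# Example: a few steps on f(x) = ||x||^2, whose gradient is 2x.
x = run_adadelta(lambda x: 2 * x, [1.0], steps=30)
```

Because both running averages start at zero, the first steps are tiny (on the order of the square root of offset) and grow as the average of the squared updates accumulates.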

Parameters:

f – the objective function.
x – ([n x 1] real column vector): the starting point of the algorithm.
batch_size – (integer scalar or None, optional, default value None): the size of the mini batches used to estimate the gradient; if None the full sample is used.
eps – (real scalar, optional, default value 1e-6): the accuracy in the stopping criterion: the algorithm is stopped when the norm of the gradient is less than or equal to eps.
tol – (real scalar, optional, default value 1e-8): the tolerance used in the optimality conditions of the Lagrangian dual (when f is a Lagrangian dual).
epochs – (integer scalar, optional, default value 1000): the maximum number of epochs before the algorithm is stopped.
step_size – (real scalar > 0, callable or iterable, optional, default value 1.): the learning rate, i.e., the base size of the step taken along the negative gradient.
decay – (real scalar in [0, 1), optional, default value 0.9): the exponential decay rate of the running averages of the squared gradients and of the squared updates.
offset – (real scalar > 0, optional, default value 1e-6): a small constant added to the running averages to avoid division by zero and improve numerical stability.
callback – (callable, optional, default value None): a function called at each iteration with the optimizer instance (and callback_args) as arguments; it can raise StopIteration to interrupt the optimization.
callback_args – (tuple, optional, default value ()): additional positional arguments passed to the callback at each call.
shuffle – (boolean, optional, default value True): whether to shuffle the order of the mini batches at the beginning of each epoch.
random_state – (integer scalar or None, optional, default value None): seed for the random number generator, for reproducibility.
verbose – (boolean or integer, optional, default value False): print details about each iteration if True (or every verbose epochs if an integer), nothing otherwise.
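The callback protocol described above (called at each iteration with the optimizer instance and callback_args, and allowed to raise StopIteration) can be illustrated with a self-contained sketch. Everything here is hypothetical: run_with_callback is a stand-in loop, and the attributes iter and f_x on the stand-in object are assumptions, not documented attributes of AdaDelta.

```python
from types import SimpleNamespace


def run_with_callback(callback=None, callback_args=(), max_iter=100):
    """Toy loop showing how a StopIteration raised by a callback ends the run."""
    # Hypothetical stand-in for the optimizer instance.
    opt = SimpleNamespace(iter=0, f_x=1.0)
    try:
        while opt.iter < max_iter:
            opt.f_x *= 0.5  # pretend the objective value halves each iteration
            opt.iter += 1
            if callback is not None:
                # The instance comes first, then the extra positional arguments.
                callback(opt, *callback_args)
    except StopIteration:
        pass  # the callback asked to interrupt the optimization
    return opt


def stop_below(opt, threshold):
    """Example callback: stop as soon as the objective drops below threshold."""
    if opt.f_x < threshold:
        raise StopIteration


result = run_with_callback(stop_below, callback_args=(1e-3,))
```

Raising StopIteration keeps early stopping out of the optimizer's own parameters: any condition that can be computed from the instance can end the run.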

minimize()

callback(args=())

check_lagrangian_dual_conditions()

check_lagrangian_dual_optimality()

is_augmented_lagrangian_dual()

is_batch_end()

is_lagrangian_dual()

is_verbose()

iter_mini_batches()

Return an iterator that successively yields tuples of aligned mini batches of size batch_size from the sliceable arrays returned by f.args(), in random order (when shuffle is True) without replacement.

Returns: an infinite iterator of mini batches (one tuple of aligned slices per step).
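A generator with this behaviour might look like the sketch below. It is an illustration under stated assumptions rather than the library's code: the function name iter_mini_batches_sketch is hypothetical, the arrays are passed in directly instead of being read from f.args(), and shuffling is done on sample indices once per epoch.

```python
import numpy as np


def iter_mini_batches_sketch(arrays, batch_size, shuffle=True, random_state=None):
    """Yield aligned mini batches forever, one pass over the data per epoch."""
    rng = np.random.default_rng(random_state)
    n_samples = len(arrays[0])
    while True:  # infinite iterator: the caller decides when to stop
        # Assumption: a fresh permutation of the sample indices each epoch.
        idx = rng.permutation(n_samples) if shuffle else np.arange(n_samples)
        for start in range(0, n_samples, batch_size):
            batch = idx[start:start + batch_size]
            # The same indices slice every array, so the batches stay aligned.
            yield tuple(a[batch] for a in arrays)


# Example: 5 samples with batch_size=2 give batches of sizes 2, 2 and 1 per epoch.
X = np.arange(10).reshape(5, 2)
y = np.arange(5)
batches = iter_mini_batches_sketch((X, y), batch_size=2, random_state=0)
X_batch, y_batch = next(batches)
```

Sampling without replacement within an epoch means every sample is used exactly once before any sample is seen again.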