optiml.opti.unconstrained.stochastic.adam module

class optiml.opti.unconstrained.stochastic.adam.Adam(f, x=None, batch_size=None, eps=1e-06, tol=1e-08, epochs=1000, step_size=0.001, momentum_type='none', momentum=0.9, beta1=0.9, beta2=0.999, offset=1e-08, callback=None, callback_args=(), shuffle=True, random_state=None, verbose=False)

Bases: StochasticMomentumOptimizer

Adam (Adaptive Moment Estimation) for the minimization of the provided function f.

It keeps exponentially decaying running averages of the gradient (first moment) and of the squared gradient (second raw moment), both bias-corrected, and scales the step element-wise by the inverse of the square root of the second moment, thus adapting the learning rate to each coordinate.
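
The update described above can be sketched in NumPy as follows. This is a minimal illustration of the textbook Adam step, not the library's own code: the function name adam_step and the iteration counter t are assumptions introduced here, while step_size, beta1, beta2 and offset mirror the constructor parameters. The outer momentum selected by momentum_type is left out.

```python
import numpy as np


def adam_step(x, grad, m, v, t, step_size=0.001, beta1=0.9, beta2=0.999, offset=1e-8):
    """One bias-corrected Adam update (hypothetical helper); t is the 1-based step counter."""
    m = beta1 * m + (1. - beta1) * grad        # running average of the gradient (first moment)
    v = beta2 * v + (1. - beta2) * grad ** 2   # running average of the squared gradient (second raw moment)
    m_hat = m / (1. - beta1 ** t)              # bias correction of the first moment
    v_hat = v / (1. - beta2 ** t)              # bias correction of the second moment
    # element-wise scaling: each coordinate gets its own effective learning rate
    x = x - step_size * m_hat / (np.sqrt(v_hat) + offset)
    return x, m, v


# Toy problem: f(x) = 0.5 * ||x||^2, whose gradient at x is x itself.
x = np.array([1.0, -2.0])
m, v = np.zeros_like(x), np.zeros_like(x)
for t in range(1, 101):
    x, m, v = adam_step(x, x, m, v, t, step_size=0.01)
```

On the very first step the bias-corrected ratio is close to the sign of the gradient, so every coordinate moves by roughly step_size regardless of the gradient's magnitude.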

Parameters:

f – the objective function.
x – ([n x 1] real column vector): the starting point of the algorithm.
batch_size – (integer scalar or None, optional, default value None): the size of the mini batches used to estimate the gradient; if None the full sample is used.
eps – (real scalar, optional, default value 1e-6): the accuracy in the stopping criterion: the algorithm is stopped when the norm of the gradient is less than or equal to eps.
tol – (real scalar, optional, default value 1e-8): the tolerance used in the optimality conditions of the Lagrangian dual (when f is a Lagrangian dual).
epochs – (integer scalar, optional, default value 1000): the maximum number of epochs before the algorithm is stopped.
step_size – (real scalar > 0, callable or iterable, optional, default value 0.001): the learning rate, i.e., the base size of the step taken along the search direction.
momentum_type – (string in {‘none’, ‘polyak’, ‘nesterov’}, optional, default value ‘none’): the kind of (outer) momentum applied on top of the Adam step.
momentum – (real scalar in [0, 1) or iterable, optional, default value 0.9): the (outer) momentum factor, i.e., the fraction of the previous step retained.
beta1 – (real scalar in [0, 1), optional, default value 0.9): the exponential decay rate for the first moment (mean) estimate of the gradient.
beta2 – (real scalar in [0, 1), optional, default value 0.999): the exponential decay rate for the second raw moment (uncentered variance) estimate of the gradient.
offset – (real scalar > 0, optional, default value 1e-8): a small constant added to the denominator to avoid division by zero and improve numerical stability.
callback – (callable, optional, default value None): a function called at each iteration with the optimizer instance (and callback_args) as arguments; it can raise StopIteration to interrupt the optimization.
callback_args – (tuple, optional, default value ()): additional positional arguments passed to the callback at each call.
shuffle – (boolean, optional, default value True): whether to shuffle the order of the mini batches at the beginning of each epoch.
random_state – (integer scalar or None, optional, default value None): seed for the random number generator, for reproducibility.
verbose – (boolean or integer, optional, default value False): print details about each iteration if True (or every verbose epochs if an integer), nothing otherwise.
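
As a rough illustration of how eps, epochs, callback and callback_args may interact, the sketch below runs plain gradient descent (not Adam) with the same stopping logic. The names minimize_sketch and stop_when_small are hypothetical. As a simplification, the callback here receives the current point rather than the optimizer instance.

```python
import numpy as np


def minimize_sketch(grad, x, step_size=0.1, eps=1e-6, epochs=1000,
                    callback=None, callback_args=()):
    """Plain gradient descent (not Adam), shown only for its stopping logic."""
    x = np.asarray(x, dtype=float)
    for _ in range(epochs):                # epochs: upper bound on the number of passes
        g = grad(x)
        if np.linalg.norm(g) <= eps:       # eps: stop once the gradient norm is small enough
            break
        if callback is not None:
            try:
                callback(x, *callback_args)
            except StopIteration:          # the callback asked to interrupt the run
                break
        x = x - step_size * g
    return x


def stop_when_small(x, threshold):
    """Hypothetical callback: interrupt once the iterate is close enough to the origin."""
    if np.linalg.norm(x) < threshold:
        raise StopIteration


# Toy problem: f(x) = 0.5 * ||x||^2, whose gradient at x is x itself.
x_full = minimize_sketch(lambda x: x, [1.0, -2.0])
x_early = minimize_sketch(lambda x: x, [1.0, -2.0],
                          callback=stop_when_small, callback_args=(0.5,))
```

The first call runs until the gradient norm drops below eps, while the second stops as soon as the callback raises StopIteration.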

minimize()

callback(args=())

check_lagrangian_dual_conditions()

check_lagrangian_dual_optimality()

is_augmented_lagrangian_dual()

is_batch_end()

is_lagrangian_dual()

is_verbose()

iter_mini_batches()

Return an iterator that successively yields tuples of aligned mini batches of size batch_size from the sliceable arrays returned by f.args(), in random order (when shuffle is True) without replacement.

Returns: an infinite iterator of mini batches (one tuple of aligned slices per step).
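
The behaviour described above can be approximated by the following standalone generator. It is a sketch under assumptions, not the method's actual implementation: it takes the arrays as an explicit argument instead of reading them from f.args(), and it draws a fresh permutation at the start of every pass over the data.

```python
import numpy as np


def iter_mini_batches(arrays, batch_size, shuffle=True, random_state=None):
    """Endlessly yield tuples of aligned mini batches (hypothetical standalone version)."""
    rng = np.random.default_rng(random_state)
    n_samples = len(arrays[0])
    while True:  # infinite iterator: the caller decides when to stop
        # a permutation visits every sample once per pass, i.e. without replacement
        idx = rng.permutation(n_samples) if shuffle else np.arange(n_samples)
        for start in range(0, n_samples, batch_size):
            batch = idx[start:start + batch_size]
            # the same indices slice every array, so the batches stay aligned
            yield tuple(a[batch] for a in arrays)


X = np.arange(10).reshape(5, 2)   # row i is [2 * i, 2 * i + 1]
y = np.arange(5)
batches = iter_mini_batches((X, y), batch_size=2, random_state=0)
X_batch, y_batch = next(batches)
```

When the number of samples is not a multiple of batch_size, the last mini batch of each pass in this sketch is simply smaller.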