optiml.opti.unconstrained.stochastic.adamax module

class optiml.opti.unconstrained.stochastic.adamax.AdaMax(f, x=None, batch_size=None, eps=1e-06, tol=1e-08, epochs=1000, step_size=0.002, momentum_type='none', momentum=0.9, beta1=0.9, beta2=0.999, offset=1e-08, callback=None, callback_args=(), shuffle=True, random_state=None, verbose=False)

Bases: StochasticMomentumOptimizer

AdaMax for the minimization of the provided function f.

It is the variant of Adam based on the infinity norm: instead of the exponentially decaying average of the squared gradients, it tracks an exponentially weighted infinity norm of the gradients (a running maximum of their absolute value in which older values are discounted by beta2) and uses it to scale the bias-corrected first moment.
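The update described above can be sketched as a single NumPy step. This is a minimal sketch that follows the AdaMax algorithm from Kingma and Ba's Adam paper, with offset added to the denominator as documented below; the function adamax_step and its signature are hypothetical and are not part of optiml. Note that only the first moment is bias-corrected: the infinity-norm estimate u does not need it.

```python
import numpy as np


def adamax_step(x, grad, m, u, t, step_size=0.002, beta1=0.9, beta2=0.999, offset=1e-8):
    """Hypothetical helper: one AdaMax update, returning (new x, new m, new u).

    t is the 1-based iteration counter used for the bias correction of m.
    """
    m = beta1 * m + (1 - beta1) * grad          # exponential moving average of the gradient
    u = np.maximum(beta2 * u, np.abs(grad))     # exponentially weighted infinity norm
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    x = x - step_size * m_hat / (u + offset)    # offset guards against division by zero
    return x, m, u
```

On the very first step m_hat equals the gradient and u equals its absolute value, so every coordinate moves by roughly step_size in the direction opposite to the gradient's sign.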

Parameters:

f – the objective function.
x – ([n x 1] real column vector): the point where to start the algorithm from.
batch_size – (integer scalar or None, optional, default value None): the size of the mini batches used to estimate the gradient; if None the full sample is used.
eps – (real scalar, optional, default value 1e-6): the accuracy in the stopping criterion: the algorithm is stopped when the norm of the gradient is less than or equal to eps.
tol – (real scalar, optional, default value 1e-8): the tolerance used in the optimality conditions of the Lagrangian dual (when f is a Lagrangian dual).
epochs – (integer scalar, optional, default value 1000): the maximum number of epochs before the algorithm is stopped.
step_size – (real scalar > 0, callable or iterable, optional, default value 0.002): the learning rate, i.e., the base size of the step taken along the search direction.
momentum_type – (string in {‘none’, ‘polyak’, ‘nesterov’}, optional, default value ‘none’): the kind of (outer) momentum applied on top of the AdaMax step.
momentum – (real scalar in [0, 1) or iterable, optional, default value 0.9): the (outer) momentum factor, i.e., the fraction of the previous step retained.
beta1 – (real scalar in [0, 1), optional, default value 0.9): the exponential decay rate for the first moment (mean) estimate of the gradient.
beta2 – (real scalar in [0, 1), optional, default value 0.999): the exponential decay rate for the exponentially weighted infinity norm of the gradient.
offset – (real scalar > 0, optional, default value 1e-8): a small constant added to the denominator to avoid division by zero and improve numerical stability.
callback – (callable, optional, default value None): a function called at each iteration with the optimizer instance (and callback_args) as arguments; it can raise StopIteration to interrupt the optimization.
callback_args – (tuple, optional, default value ()): additional positional arguments passed to the callback at each call.
shuffle – (boolean, optional, default value True): whether to shuffle the order of the mini batches at the beginning of each epoch.
random_state – (integer scalar or None, optional, default value None): seed for the random number generator, for reproducibility.
verbose – (boolean or integer, optional, default value False): if True, print details about each iteration; if an integer, print them every verbose epochs; otherwise print nothing.
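To show how batch_size, epochs, eps, step_size, momentum_type, momentum, beta1, beta2, offset, shuffle and random_state might fit together, here is a hypothetical sketch on a least-squares problem. It is not optiml's implementation: the names adamax_minimize and least_squares_grad are made up for illustration, the stopping test is simplified to once per epoch on the full gradient, callback, tol and verbose are omitted, and the way the outer Polyak or Nesterov momentum wraps the AdaMax step is an assumption (optiml may combine them differently).

```python
import numpy as np


def adamax_minimize(grad, A, b, x0, batch_size=None, eps=1e-6, epochs=1000,
                    step_size=0.002, momentum_type='none', momentum=0.9,
                    beta1=0.9, beta2=0.999, offset=1e-8, shuffle=True,
                    random_state=None):
    """Hypothetical sketch of mini-batch AdaMax with optional outer momentum.

    grad(x, A_batch, b_batch) is a hypothetical function returning the gradient
    estimated on the rows (A_batch, b_batch); it is not part of optiml.
    """
    if momentum_type not in ('none', 'polyak', 'nesterov'):
        raise ValueError(f"unknown momentum_type: {momentum_type!r}")
    rng = np.random.default_rng(random_state)
    n = A.shape[0]
    batch_size = n if batch_size is None else batch_size  # None -> full sample
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)  # first moment (mean) estimate
    u = np.zeros_like(x)  # exponentially weighted infinity norm
    v = np.zeros_like(x)  # outer momentum buffer (unused when momentum_type='none')
    t = 0                 # iteration counter for the bias correction
    for _ in range(epochs):
        # Simplification: stop test once per epoch, on the full gradient.
        if np.linalg.norm(grad(x, A, b)) <= eps:
            break
        idx = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            rows = idx[start:start + batch_size]
            t += 1
            # Nesterov evaluates the gradient at the look-ahead point x + momentum * v.
            point = x + momentum * v if momentum_type == 'nesterov' else x
            g = grad(point, A[rows], b[rows])
            m = beta1 * m + (1 - beta1) * g
            u = np.maximum(beta2 * u, np.abs(g))
            step = step_size * (m / (1 - beta1 ** t)) / (u + offset)
            if momentum_type == 'none':
                x = x - step
            else:  # 'polyak' or 'nesterov': accumulate the AdaMax step in v
                v = momentum * v - step
                x = x + v
    return x


def least_squares_grad(x, A, b):
    """Gradient of 0.5 * mean((A @ x - b) ** 2) with respect to x."""
    return A.T @ (A @ x - b) / len(b)


# Usage: a noise-free problem, so every mini batch is minimized by x_true.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true
x = adamax_minimize(least_squares_grad, A, b, np.zeros(3), batch_size=20,
                    epochs=300, step_size=0.05, random_state=0)
```

Fixing random_state makes the shuffled mini-batch order, and therefore the whole trajectory, reproducible.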

minimize()

callback(args=())

check_lagrangian_dual_conditions()

check_lagrangian_dual_optimality()

is_augmented_lagrangian_dual()

is_batch_end()

is_lagrangian_dual()

is_verbose()

iter_mini_batches()

Return an iterator that successively yields tuples of aligned mini batches of size batch_size from the sliceable arrays returned by f.args(), in random order (when shuffle is True) without replacement.

Returns: an infinite iterator of mini batches (one tuple of aligned slices per step).
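The behavior described for iter_mini_batches can be sketched as a stand-alone generator. This is a minimal sketch under stated assumptions: arrays stands in for the tuple returned by f.args() and is assumed to hold NumPy arrays sharing their first dimension, the last mini batch of an epoch is assumed to be allowed to be shorter than batch_size, and the free function below (with its hypothetical signature) is not the method itself.

```python
import numpy as np


def iter_mini_batches(arrays, batch_size, shuffle=True, random_state=None):
    """Hypothetical sketch: yield tuples of aligned mini batches forever.

    Each epoch draws a fresh permutation (when shuffle is True), so every row
    is visited exactly once per epoch, i.e. sampling without replacement.
    """
    rng = np.random.default_rng(random_state)
    n = len(arrays[0])
    while True:  # infinite: a new epoch starts when the previous one ends
        idx = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            rows = idx[start:start + batch_size]
            # The same row indices are used for every array, keeping them aligned.
            yield tuple(a[rows] for a in arrays)
```

For example, with 10 rows and batch_size=4 the first epoch yields batches of 4, 4 and 2 rows, and the next call starts a new, independently shuffled epoch.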