optiml.opti.unconstrained.stochastic package

Module contents

class optiml.opti.unconstrained.stochastic.StochasticOptimizer(f, x=None, step_size=0.01, batch_size=None, eps=1e-06, tol=1e-08, epochs=1000, callback=None, callback_args=(), shuffle=True, random_state=None, verbose=False)

Bases: Optimizer, ABC

Abstract base class for stochastic (mini-batch) first-order optimizers.

At each iteration the gradient is estimated on a mini batch of the data drawn from f.args() rather than on the whole sample, and the point is moved by a step controlled by a (possibly adaptive or scheduled) learning rate. The mini batches are produced by iterating over the data, optionally shuffled once per epoch, where an epoch is a full pass over all the samples.

Parameters:
  • f – the objective function.

  • x – ([n x 1] real column vector, optional, default value None): the starting point of the algorithm; if None, it is initialized randomly (see random_state).

  • step_size – (real scalar > 0, callable or iterable, optional, default value 0.01): the learning rate, i.e., the size of the step taken along the search direction. It can be a positive scalar (kept constant for all iterations), a callable with signature step_size(X_batch, y_batch) returning an iterator (e.g., one of the schedules in schedules.py) or an iterable yielding the step size to use at each iteration.

  • batch_size – (integer scalar or None, optional, default value None): the size of the mini batches used to estimate the gradient. If None the full sample is used at every iteration (i.e., plain batch gradient descent); otherwise it is clipped to lie in [1, n_samples].

  • eps – (real scalar, optional, default value 1e-6): the accuracy in the stopping criterion: the algorithm is stopped when the norm of the gradient is less than or equal to eps.

  • tol – (real scalar, optional, default value 1e-8): the tolerance used in the optimality conditions of the Lagrangian dual (when f is a Lagrangian dual function), i.e., the algorithm is stopped when the variables or the constraints change by less than tol.

  • epochs – (integer scalar, optional, default value 1000): the maximum number of epochs, i.e., full passes over the whole sample, before the algorithm is stopped.

  • callback – (callable, optional, default value None): a function called at each iteration with the optimizer instance (and callback_args) as arguments; it can raise StopIteration to interrupt the optimization.

  • callback_args – (tuple, optional, default value ()): additional positional arguments passed to the callback at each call.

  • shuffle – (boolean, optional, default value True): whether to shuffle the order of the mini batches at the beginning of each epoch.

  • random_state – (integer scalar or None, optional, default value None): seed for the random number generator used to initialize x (when not provided) and to shuffle the mini batches, for reproducibility.

  • verbose – (boolean or integer, optional, default value False): print details about each iteration if True (or every verbose epochs if an integer), nothing otherwise.

iter_mini_batches()

Return an iterator that successively yields tuples of aligned mini batches of size batch_size from the sliceable arrays returned by f.args(), in random order (when shuffle is True) without replacement.

Returns:

an infinite iterator of mini batches (one tuple of aligned slices per step).
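The behaviour described above can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the function name iter_mini_batches_sketch and its arguments are hypothetical, and it assumes that shuffling permutes the sample indices once per epoch.

```python
import numpy as np

def iter_mini_batches_sketch(arrays, batch_size, shuffle=True, random_state=None):
    """Yield tuples of aligned slices forever (hypothetical helper, not optiml's code)."""
    rng = np.random.default_rng(random_state)
    n_samples = len(arrays[0])
    # Clip the batch size to [1, n_samples], as described for batch_size.
    batch_size = int(min(max(batch_size, 1), n_samples))
    while True:  # infinite iterator: the caller decides when to stop
        # Assumption: a fresh permutation of the sample indices at each epoch.
        idx = rng.permutation(n_samples) if shuffle else np.arange(n_samples)
        for start in range(0, n_samples, batch_size):
            batch = idx[start:start + batch_size]
            # The same indices are applied to every array, keeping rows aligned.
            yield tuple(a[batch] for a in arrays)

X = np.arange(20).reshape(10, 2)  # row i is [2 * i, 2 * i + 1]
y = np.arange(10)
batches = iter_mini_batches_sketch((X, y), batch_size=4, random_state=0)
first_epoch = [next(batches) for _ in range(3)]  # batches of 4, 4 and 2 samples
```

Within one epoch every sample appears exactly once, which is what sampling without replacement means here.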

is_batch_end()
is_verbose()
callback(args=())
check_lagrangian_dual_conditions()
check_lagrangian_dual_optimality()
is_augmented_lagrangian_dual()
is_lagrangian_dual()
minimize()
class optiml.opti.unconstrained.stochastic.StochasticMomentumOptimizer(f, x=None, step_size=0.01, momentum_type='none', momentum=0.9, batch_size=None, eps=1e-06, tol=1e-08, epochs=1000, callback=None, callback_args=(), shuffle=True, random_state=None, verbose=False)

Bases: StochasticOptimizer, ABC

Abstract base class for stochastic optimizers that support a momentum term.

In addition to the plain stochastic step it keeps a velocity that accumulates an exponentially decaying fraction of the past steps, which damps oscillations and accelerates convergence along consistent descent directions. Both the classical heavy-ball (Polyak) momentum and Nesterov’s accelerated momentum are supported, as selected by momentum_type.
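As an illustration of the two momentum variants, the sketch below applies the textbook heavy-ball and Nesterov updates to a small quadratic. The function momentum_step, the test matrix A and the chosen step size are assumptions made for this example and may differ from how the class implements the update internally.

```python
import numpy as np

def momentum_step(x, velocity, grad, step_size=0.01, momentum=0.9,
                  momentum_type='polyak'):
    """One textbook momentum update (hypothetical helper, not optiml's code)."""
    if momentum_type == 'none':
        return x - step_size * grad(x), velocity
    if momentum_type == 'polyak':
        # Heavy ball: gradient evaluated at the current point.
        velocity = momentum * velocity - step_size * grad(x)
    elif momentum_type == 'nesterov':
        # Nesterov: gradient evaluated after the momentum jump.
        velocity = momentum * velocity - step_size * grad(x + momentum * velocity)
    else:
        raise ValueError(f'unknown momentum type {momentum_type}')
    return x + velocity, velocity

# Ill-conditioned quadratic f(x) = 0.5 * x^T A x, whose gradient is A x.
A = np.diag([1.0, 10.0])
grad = lambda x: A @ x

final_norms = {}
for kind in ('none', 'polyak', 'nesterov'):
    x, velocity = np.array([1.0, 1.0]), np.zeros(2)
    for _ in range(200):
        x, velocity = momentum_step(x, velocity, grad, step_size=0.05,
                                    momentum=0.9, momentum_type=kind)
    final_norms[kind] = np.linalg.norm(x)  # distance from the minimizer at 0
```

All three variants approach the minimizer at the origin on this problem; the only difference between 'polyak' and 'nesterov' is the point at which the gradient is evaluated.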

Parameters:
  • f – the objective function.

  • x – ([n x 1] real column vector, optional, default value None): the starting point of the algorithm.

  • step_size – (real scalar > 0, callable or iterable, optional, default value 0.01): the learning rate, i.e., the size of the step taken along the search direction (see StochasticOptimizer for the accepted forms).

  • momentum_type – (string in {‘none’, ‘polyak’, ‘nesterov’}, optional, default value ‘none’): the kind of momentum to apply: ‘none’ disables momentum, ‘polyak’ uses the classical heavy-ball momentum and ‘nesterov’ uses Nesterov’s accelerated momentum (the gradient is evaluated after the momentum jump).

  • momentum – (real scalar in [0, 1) or iterable, optional, default value 0.9): the momentum factor, i.e., the fraction of the previous step retained in the current one. It can be a scalar (kept constant) or an iterable yielding the value to use at each iteration.

  • batch_size – (integer scalar or None, optional, default value None): the size of the mini batches used to estimate the gradient; if None the full sample is used.

  • eps – (real scalar, optional, default value 1e-6): the accuracy in the stopping criterion: the algorithm is stopped when the norm of the gradient is less than or equal to eps.

  • tol – (real scalar, optional, default value 1e-8): the tolerance used in the optimality conditions of the Lagrangian dual (when f is a Lagrangian dual).

  • epochs – (integer scalar, optional, default value 1000): the maximum number of epochs before the algorithm is stopped.

  • callback – (callable, optional, default value None): a function called at each iteration with the optimizer instance (and callback_args) as arguments; it can raise StopIteration to interrupt the optimization.

  • callback_args – (tuple, optional, default value ()): additional positional arguments passed to the callback at each call.

  • shuffle – (boolean, optional, default value True): whether to shuffle the order of the mini batches at the beginning of each epoch.

  • random_state – (integer scalar or None, optional, default value None): seed for the random number generator, for reproducibility.

  • verbose – (boolean or integer, optional, default value False): print details about each iteration if True (or every verbose epochs if an integer), nothing otherwise.

callback(args=())
check_lagrangian_dual_conditions()
check_lagrangian_dual_optimality()
is_augmented_lagrangian_dual()
is_batch_end()
is_lagrangian_dual()
is_verbose()
iter_mini_batches()

Return an iterator that successively yields tuples of aligned mini batches of size batch_size from the sliceable arrays returned by f.args(), in random order (when shuffle is True) without replacement.

Returns:

an infinite iterator of mini batches (one tuple of aligned slices per step).

minimize()
class optiml.opti.unconstrained.stochastic.StochasticGradientDescent(f, x=None, batch_size=None, eps=1e-06, tol=1e-08, epochs=1000, step_size=0.01, momentum_type='none', momentum=0.9, callback=None, callback_args=(), shuffle=True, random_state=None, verbose=False)

Bases: StochasticMomentumOptimizer

Stochastic Gradient Descent (SGD) for the minimization of the provided function f.

At each iteration the point is moved by a fixed (or scheduled) learning rate along the negative of the gradient estimated on a mini batch of the data, optionally accelerated by a classical heavy-ball (Polyak) or Nesterov momentum term as selected by momentum_type.
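The plain (momentum-free) iteration can be sketched in NumPy on a small least-squares problem. This is an illustration only: it does not use the optiml API, and the synthetic data, the helper batch_gradient and the hyperparameter values are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
w_true = np.array([2.0, -3.0])
y = X @ w_true  # noise-free targets, so the exact minimizer is w_true

def batch_gradient(w, X_batch, y_batch):
    """Gradient of 0.5 * mean((X w - y)^2) estimated on one mini batch."""
    residual = X_batch @ w - y_batch
    return X_batch.T @ residual / len(y_batch)

w = np.zeros(2)
step_size, batch_size, epochs = 0.1, 10, 50
for epoch in range(epochs):
    order = rng.permutation(len(y))  # reshuffle the samples once per epoch
    for start in range(0, len(y), batch_size):
        batch = order[start:start + batch_size]
        # Move along the negative of the mini-batch gradient estimate.
        w = w - step_size * batch_gradient(w, X[batch], y[batch])
```

Because the targets are noise-free, every mini-batch gradient vanishes at w_true, so the iterates settle on it even with a constant step size.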

Parameters:
  • f – the objective function.

  • x – ([n x 1] real column vector, optional, default value None): the starting point of the algorithm.

  • batch_size – (integer scalar or None, optional, default value None): the size of the mini batches used to estimate the gradient; if None the full sample is used.

  • eps – (real scalar, optional, default value 1e-6): the accuracy in the stopping criterion: the algorithm is stopped when the norm of the gradient is less than or equal to eps.

  • tol – (real scalar, optional, default value 1e-8): the tolerance used in the optimality conditions of the Lagrangian dual (when f is a Lagrangian dual).

  • epochs – (integer scalar, optional, default value 1000): the maximum number of epochs before the algorithm is stopped.

  • step_size – (real scalar > 0, callable or iterable, optional, default value 0.01): the learning rate, i.e., the size of the step taken along the negative gradient.

  • momentum_type – (string in {‘none’, ‘polyak’, ‘nesterov’}, optional, default value ‘none’): the kind of momentum to apply (‘none’, heavy-ball ‘polyak’ or ‘nesterov’).

  • momentum – (real scalar in [0, 1) or iterable, optional, default value 0.9): the momentum factor, i.e., the fraction of the previous step retained in the current one.

  • callback – (callable, optional, default value None): a function called at each iteration with the optimizer instance (and callback_args) as arguments; it can raise StopIteration to interrupt the optimization.

  • callback_args – (tuple, optional, default value ()): additional positional arguments passed to the callback at each call.

  • shuffle – (boolean, optional, default value True): whether to shuffle the order of the mini batches at the beginning of each epoch.

  • random_state – (integer scalar or None, optional, default value None): seed for the random number generator, for reproducibility.

  • verbose – (boolean or integer, optional, default value False): print details about each iteration if True (or every verbose epochs if an integer), nothing otherwise.

minimize()
callback(args=())
check_lagrangian_dual_conditions()
check_lagrangian_dual_optimality()
is_augmented_lagrangian_dual()
is_batch_end()
is_lagrangian_dual()
is_verbose()
iter_mini_batches()

Return an iterator that successively yields tuples of aligned mini batches of size batch_size from the sliceable arrays returned by f.args(), in random order (when shuffle is True) without replacement.

Returns:

an infinite iterator of mini batches (one tuple of aligned slices per step).

class optiml.opti.unconstrained.stochastic.Adam(f, x=None, batch_size=None, eps=1e-06, tol=1e-08, epochs=1000, step_size=0.001, momentum_type='none', momentum=0.9, beta1=0.9, beta2=0.999, offset=1e-08, callback=None, callback_args=(), shuffle=True, random_state=None, verbose=False)

Bases: StochasticMomentumOptimizer

Adam (Adaptive Moment Estimation) for the minimization of the provided function f.

It keeps exponentially decaying running averages of the gradient (first moment) and of the squared gradient (second raw moment), both bias-corrected, and scales the step element-wise by the inverse of the square root of the second moment, thus adapting the learning rate to each coordinate.
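The update described above corresponds to the standard Adam step, sketched here in NumPy. The helper adam_step is hypothetical and ignores the optional outer momentum; the argument names mirror the class parameters only for readability.

```python
import numpy as np

def adam_step(x, g, m, v, t, step_size=0.001, beta1=0.9, beta2=0.999, offset=1e-8):
    """One standard Adam update at iteration t >= 1 (sketch, not optiml's code)."""
    m = beta1 * m + (1 - beta1) * g        # running average of the gradient
    v = beta2 * v + (1 - beta2) * g ** 2   # running average of the squared gradient
    m_hat = m / (1 - beta1 ** t)           # bias corrections for the zero init
    v_hat = v / (1 - beta2 ** t)
    x = x - step_size * m_hat / (np.sqrt(v_hat) + offset)
    return x, m, v

x0 = np.array([1.0, -2.0])
g = np.array([0.5, -4.0])  # gradients of very different magnitude
x1, m, v = adam_step(x0, g, np.zeros(2), np.zeros(2), t=1)
```

On the first step the bias-corrected moments reduce to g and g squared, so each coordinate moves by roughly step_size in the direction opposite to the sign of its gradient, regardless of the gradient's magnitude.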

Parameters:
  • f – the objective function.

  • x – ([n x 1] real column vector, optional, default value None): the starting point of the algorithm.

  • batch_size – (integer scalar or None, optional, default value None): the size of the mini batches used to estimate the gradient; if None the full sample is used.

  • eps – (real scalar, optional, default value 1e-6): the accuracy in the stopping criterion: the algorithm is stopped when the norm of the gradient is less than or equal to eps.

  • tol – (real scalar, optional, default value 1e-8): the tolerance used in the optimality conditions of the Lagrangian dual (when f is a Lagrangian dual).

  • epochs – (integer scalar, optional, default value 1000): the maximum number of epochs before the algorithm is stopped.

  • step_size – (real scalar > 0, callable or iterable, optional, default value 0.001): the learning rate, i.e., the base size of the step taken along the search direction.

  • momentum_type – (string in {‘none’, ‘polyak’, ‘nesterov’}, optional, default value ‘none’): the kind of (outer) momentum applied on top of the Adam step.

  • momentum – (real scalar in [0, 1) or iterable, optional, default value 0.9): the (outer) momentum factor, i.e., the fraction of the previous step retained.

  • beta1 – (real scalar in [0, 1), optional, default value 0.9): the exponential decay rate for the first moment (mean) estimate of the gradient.

  • beta2 – (real scalar in [0, 1), optional, default value 0.999): the exponential decay rate for the second raw moment (uncentered variance) estimate of the gradient.

  • offset – (real scalar > 0, optional, default value 1e-8): a small constant added to the denominator to avoid division by zero and improve numerical stability.

  • callback – (callable, optional, default value None): a function called at each iteration with the optimizer instance (and callback_args) as arguments; it can raise StopIteration to interrupt the optimization.

  • callback_args – (tuple, optional, default value ()): additional positional arguments passed to the callback at each call.

  • shuffle – (boolean, optional, default value True): whether to shuffle the order of the mini batches at the beginning of each epoch.

  • random_state – (integer scalar or None, optional, default value None): seed for the random number generator, for reproducibility.

  • verbose – (boolean or integer, optional, default value False): print details about each iteration if True (or every verbose epochs if an integer), nothing otherwise.

minimize()
callback(args=())
check_lagrangian_dual_conditions()
check_lagrangian_dual_optimality()
is_augmented_lagrangian_dual()
is_batch_end()
is_lagrangian_dual()
is_verbose()
iter_mini_batches()

Return an iterator that successively yields tuples of aligned mini batches of size batch_size from the sliceable arrays returned by f.args(), in random order (when shuffle is True) without replacement.

Returns:

an infinite iterator of mini batches (one tuple of aligned slices per step).

class optiml.opti.unconstrained.stochastic.AMSGrad(f, x=None, batch_size=None, eps=1e-06, tol=1e-08, epochs=1000, step_size=0.001, momentum_type='none', momentum=0.9, beta1=0.9, beta2=0.999, offset=1e-08, callback=None, callback_args=(), shuffle=True, random_state=None, verbose=False)

Bases: StochasticMomentumOptimizer

AMSGrad for the minimization of the provided function f.

It is a variant of Adam that, instead of the bias-corrected second moment, uses the element-wise maximum of all the past second raw moment estimates to scale the step. This keeps the effective learning rate non-increasing and fixes a convergence issue of Adam.
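A minimal NumPy sketch of this idea follows. The helper amsgrad_step is hypothetical, omits bias correction and outer momentum, and is meant only to show how the running maximum behaves when a large gradient is followed by a small one.

```python
import numpy as np

def amsgrad_step(x, g, m, v, v_max, step_size=0.001, beta1=0.9, beta2=0.999,
                 offset=1e-8):
    """One AMSGrad-style update (sketch under stated assumptions, not optiml's code)."""
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    # Keep the element-wise maximum of all second moment estimates seen so far.
    v_max = np.maximum(v_max, v)
    x = x - step_size * m / (np.sqrt(v_max) + offset)
    return x, m, v, v_max

x = np.array([1.0])
m = v = v_max = np.zeros(1)
# A large gradient followed by a small one.
for g in (np.array([10.0]), np.array([0.1])):
    x, m, v, v_max = amsgrad_step(x, g, m, v, v_max)
```

After the second step the plain estimate v has started to decay, while v_max still holds the value reached after the large gradient, so the denominator of the update cannot shrink.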

Parameters:
  • f – the objective function.

  • x – ([n x 1] real column vector, optional, default value None): the starting point of the algorithm.

  • batch_size – (integer scalar or None, optional, default value None): the size of the mini batches used to estimate the gradient; if None the full sample is used.

  • eps – (real scalar, optional, default value 1e-6): the accuracy in the stopping criterion: the algorithm is stopped when the norm of the gradient is less than or equal to eps.

  • tol – (real scalar, optional, default value 1e-8): the tolerance used in the optimality conditions of the Lagrangian dual (when f is a Lagrangian dual).

  • epochs – (integer scalar, optional, default value 1000): the maximum number of epochs before the algorithm is stopped.

  • step_size – (real scalar > 0, callable or iterable, optional, default value 0.001): the learning rate, i.e., the base size of the step taken along the search direction.

  • momentum_type – (string in {‘none’, ‘polyak’, ‘nesterov’}, optional, default value ‘none’): the kind of (outer) momentum applied on top of the AMSGrad step.

  • momentum – (real scalar in [0, 1) or iterable, optional, default value 0.9): the (outer) momentum factor, i.e., the fraction of the previous step retained.

  • beta1 – (real scalar in [0, 1), optional, default value 0.9): the exponential decay rate for the first moment (mean) estimate of the gradient.

  • beta2 – (real scalar in [0, 1), optional, default value 0.999): the exponential decay rate for the second raw moment (uncentered variance) estimate of the gradient.

  • offset – (real scalar > 0, optional, default value 1e-8): a small constant added to the denominator to avoid division by zero and improve numerical stability.

  • callback – (callable, optional, default value None): a function called at each iteration with the optimizer instance (and callback_args) as arguments; it can raise StopIteration to interrupt the optimization.

  • callback_args – (tuple, optional, default value ()): additional positional arguments passed to the callback at each call.

  • shuffle – (boolean, optional, default value True): whether to shuffle the order of the mini batches at the beginning of each epoch.

  • random_state – (integer scalar or None, optional, default value None): seed for the random number generator, for reproducibility.

  • verbose – (boolean or integer, optional, default value False): print details about each iteration if True (or every verbose epochs if an integer), nothing otherwise.

minimize()
callback(args=())
check_lagrangian_dual_conditions()
check_lagrangian_dual_optimality()
is_augmented_lagrangian_dual()
is_batch_end()
is_lagrangian_dual()
is_verbose()
iter_mini_batches()

Return an iterator that successively yields tuples of aligned mini batches of size batch_size from the sliceable arrays returned by f.args(), in random order (when shuffle is True) without replacement.

Returns:

an infinite iterator of mini batches (one tuple of aligned slices per step).

class optiml.opti.unconstrained.stochastic.AdaMax(f, x=None, batch_size=None, eps=1e-06, tol=1e-08, epochs=1000, step_size=0.002, momentum_type='none', momentum=0.9, beta1=0.9, beta2=0.999, offset=1e-08, callback=None, callback_args=(), shuffle=True, random_state=None, verbose=False)

Bases: StochasticMomentumOptimizer

AdaMax for the minimization of the provided function f.

It is the variant of Adam based on the infinity norm: instead of the exponentially decaying average of the squared gradients, it tracks an exponentially weighted infinity norm of the gradients (the running maximum of their absolute value) and uses it to scale the bias-corrected first moment.
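The infinity-norm update can be sketched as follows. The helper adamax_step is hypothetical; adding offset to the denominator follows the parameter description on this page rather than the original formulation, and outer momentum is ignored.

```python
import numpy as np

def adamax_step(x, g, m, u, t, step_size=0.002, beta1=0.9, beta2=0.999,
                offset=1e-8):
    """One AdaMax-style update at iteration t >= 1 (sketch, not optiml's code)."""
    m = beta1 * m + (1 - beta1) * g
    # Exponentially weighted infinity norm: decay the old value, keep the larger.
    u = np.maximum(beta2 * u, np.abs(g))
    # Only the first moment needs a bias correction.
    x = x - (step_size / (1 - beta1 ** t)) * m / (u + offset)
    return x, m, u

x0 = np.array([1.0, -1.0])
g = np.array([3.0, -0.2])
x1, m, u = adamax_step(x0, g, np.zeros(2), np.zeros(2), t=1)
```

On the first step u equals the absolute value of the gradient, so each coordinate moves by about step_size against the sign of its gradient.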

Parameters:
  • f – the objective function.

  • x – ([n x 1] real column vector, optional, default value None): the starting point of the algorithm.

  • batch_size – (integer scalar or None, optional, default value None): the size of the mini batches used to estimate the gradient; if None the full sample is used.

  • eps – (real scalar, optional, default value 1e-6): the accuracy in the stopping criterion: the algorithm is stopped when the norm of the gradient is less than or equal to eps.

  • tol – (real scalar, optional, default value 1e-8): the tolerance used in the optimality conditions of the Lagrangian dual (when f is a Lagrangian dual).

  • epochs – (integer scalar, optional, default value 1000): the maximum number of epochs before the algorithm is stopped.

  • step_size – (real scalar > 0, callable or iterable, optional, default value 0.002): the learning rate, i.e., the base size of the step taken along the search direction.

  • momentum_type – (string in {‘none’, ‘polyak’, ‘nesterov’}, optional, default value ‘none’): the kind of (outer) momentum applied on top of the AdaMax step.

  • momentum – (real scalar in [0, 1) or iterable, optional, default value 0.9): the (outer) momentum factor, i.e., the fraction of the previous step retained.

  • beta1 – (real scalar in [0, 1), optional, default value 0.9): the exponential decay rate for the first moment (mean) estimate of the gradient.

  • beta2 – (real scalar in [0, 1), optional, default value 0.999): the exponential decay rate for the exponentially weighted infinity norm of the gradient.

  • offset – (real scalar > 0, optional, default value 1e-8): a small constant added to the denominator to avoid division by zero and improve numerical stability.

  • callback – (callable, optional, default value None): a function called at each iteration with the optimizer instance (and callback_args) as arguments; it can raise StopIteration to interrupt the optimization.

  • callback_args – (tuple, optional, default value ()): additional positional arguments passed to the callback at each call.

  • shuffle – (boolean, optional, default value True): whether to shuffle the order of the mini batches at the beginning of each epoch.

  • random_state – (integer scalar or None, optional, default value None): seed for the random number generator, for reproducibility.

  • verbose – (boolean or integer, optional, default value False): print details about each iteration if True (or every verbose epochs if an integer), nothing otherwise.

minimize()
callback(args=())
check_lagrangian_dual_conditions()
check_lagrangian_dual_optimality()
is_augmented_lagrangian_dual()
is_batch_end()
is_lagrangian_dual()
is_verbose()
iter_mini_batches()

Return an iterator that successively yields tuples of aligned mini batches of size batch_size from the sliceable arrays returned by f.args(), in random order (when shuffle is True) without replacement.

Returns:

an infinite iterator of mini batches (one tuple of aligned slices per step).

class optiml.opti.unconstrained.stochastic.AdaGrad(f, x=None, batch_size=None, eps=1e-06, tol=1e-08, epochs=1000, step_size=1.0, offset=1e-08, callback=None, callback_args=(), shuffle=True, random_state=None, verbose=False)

Bases: StochasticOptimizer

AdaGrad (Adaptive Gradient) for the minimization of the provided function f.

It adapts the learning rate to each coordinate by dividing the step by the square root of the sum of the squares of all the past gradients, so that frequently updated parameters receive smaller steps and rarely updated ones larger steps.
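The effect on frequently and rarely updated coordinates can be seen in a small NumPy sketch. The helper adagrad_step is hypothetical; placing offset inside the square root follows the description of the offset parameter on this page.

```python
import numpy as np

def adagrad_step(x, g, g_sq_sum, step_size=1.0, offset=1e-8):
    """One AdaGrad-style update (sketch, not optiml's code)."""
    g_sq_sum = g_sq_sum + g ** 2  # sum of the squares of all past gradients
    x = x - step_size * g / np.sqrt(g_sq_sum + offset)
    return x, g_sq_sum

x = np.zeros(2)
g_sq_sum = np.zeros(2)
# Coordinate 0 receives a gradient at every step, coordinate 1 only at the last.
gradients = [np.array([1.0, 0.0])] * 3 + [np.array([1.0, 1.0])]
for g in gradients:
    x_prev = x
    x, g_sq_sum = adagrad_step(x, g, g_sq_sum)
last_step = x - x_prev  # displacement produced by the final gradient
```

With the same unit gradient on the last step, the frequently updated coordinate moves by about 0.5 and the rarely updated one by about 1.0.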

Parameters:
  • f – the objective function.

  • x – ([n x 1] real column vector, optional, default value None): the starting point of the algorithm.

  • batch_size – (integer scalar or None, optional, default value None): the size of the mini batches used to estimate the gradient; if None the full sample is used.

  • eps – (real scalar, optional, default value 1e-6): the accuracy in the stopping criterion: the algorithm is stopped when the norm of the gradient is less than or equal to eps.

  • tol – (real scalar, optional, default value 1e-8): the tolerance used in the optimality conditions of the Lagrangian dual (when f is a Lagrangian dual).

  • epochs – (integer scalar, optional, default value 1000): the maximum number of epochs before the algorithm is stopped.

  • step_size – (real scalar > 0, callable or iterable, optional, default value 1.0): the learning rate, i.e., the base size of the step taken along the negative gradient.

  • offset – (real scalar > 0, optional, default value 1e-8): a small constant added to the accumulated squared gradients to avoid division by zero.

  • callback – (callable, optional, default value None): a function called at each iteration with the optimizer instance (and callback_args) as arguments; it can raise StopIteration to interrupt the optimization.

  • callback_args – (tuple, optional, default value ()): additional positional arguments passed to the callback at each call.

  • shuffle – (boolean, optional, default value True): whether to shuffle the order of the mini batches at the beginning of each epoch.

  • random_state – (integer scalar or None, optional, default value None): seed for the random number generator, for reproducibility.

  • verbose – (boolean or integer, optional, default value False): print details about each iteration if True (or every verbose epochs if an integer), nothing otherwise.

minimize()
callback(args=())
check_lagrangian_dual_conditions()
check_lagrangian_dual_optimality()
is_augmented_lagrangian_dual()
is_batch_end()
is_lagrangian_dual()
is_verbose()
iter_mini_batches()

Return an iterator that successively yields tuples of aligned mini batches of size batch_size from the sliceable arrays returned by f.args(), in random order (when shuffle is True) without replacement.

Returns:

an infinite iterator of mini batches (one tuple of aligned slices per step).

class optiml.opti.unconstrained.stochastic.AdaDelta(f, x=None, batch_size=None, eps=1e-06, tol=1e-08, epochs=1000, step_size=1.0, decay=0.9, offset=1e-06, callback=None, callback_args=(), shuffle=True, random_state=None, verbose=False)

Bases: StochasticOptimizer

AdaDelta for the minimization of the provided function f.

It is an extension of AdaGrad that replaces the ever-growing sum of squared gradients with an exponentially decaying average and, by also tracking an exponentially decaying average of the squared updates, scales the step by the ratio of these two running averages, removing the need for a manually tuned global learning rate.
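The ratio of the two running averages can be written out in a short NumPy sketch. The helper adadelta_step is hypothetical, and multiplying the update by step_size (1.0 by default, which leaves it unchanged) is an assumption about how the parameter is used.

```python
import numpy as np

def adadelta_step(x, g, avg_sq_grad, avg_sq_delta, step_size=1.0, decay=0.9,
                  offset=1e-6):
    """One AdaDelta-style update (sketch, not optiml's code)."""
    # Decaying average of the squared gradients (replaces AdaGrad's growing sum).
    avg_sq_grad = decay * avg_sq_grad + (1 - decay) * g ** 2
    # Step scaled by RMS of the past updates over RMS of the gradients.
    delta = -np.sqrt(avg_sq_delta + offset) / np.sqrt(avg_sq_grad + offset) * g
    # Decaying average of the squared updates, used at the next iteration.
    avg_sq_delta = decay * avg_sq_delta + (1 - decay) * delta ** 2
    return x + step_size * delta, avg_sq_grad, avg_sq_delta

# Ten steps on f(x) = 0.5 * x^2, whose gradient is x.
x = np.array([1.0])
avg_sq_grad = avg_sq_delta = np.zeros(1)
for _ in range(10):
    x, avg_sq_grad, avg_sq_delta = adadelta_step(x, x, avg_sq_grad, avg_sq_delta)
```

The first updates are small because the average of the squared updates starts at zero and only offset keeps the numerator positive; the steps grow as that average builds up.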

Parameters:
  • f – the objective function.

  • x – ([n x 1] real column vector, optional, default value None): the starting point of the algorithm.

  • batch_size – (integer scalar or None, optional, default value None): the size of the mini batches used to estimate the gradient; if None the full sample is used.

  • eps – (real scalar, optional, default value 1e-6): the accuracy in the stopping criterion: the algorithm is stopped when the norm of the gradient is less than or equal to eps.

  • tol – (real scalar, optional, default value 1e-8): the tolerance used in the optimality conditions of the Lagrangian dual (when f is a Lagrangian dual).

  • epochs – (integer scalar, optional, default value 1000): the maximum number of epochs before the algorithm is stopped.

  • step_size – (real scalar > 0, callable or iterable, optional, default value 1.0): the learning rate, i.e., the base size of the step taken along the negative gradient.

  • decay – (real scalar in [0, 1), optional, default value 0.9): the exponential decay rate of the running averages of the squared gradients and of the squared updates.

  • offset – (real scalar > 0, optional, default value 1e-6): a small constant added to the running averages to avoid division by zero and improve numerical stability.

  • callback – (callable, optional, default value None): a function called at each iteration with the optimizer instance (and callback_args) as arguments; it can raise StopIteration to interrupt the optimization.

  • callback_args – (tuple, optional, default value ()): additional positional arguments passed to the callback at each call.

  • shuffle – (boolean, optional, default value True): whether to shuffle the order of the mini batches at the beginning of each epoch.

  • random_state – (integer scalar or None, optional, default value None): seed for the random number generator, for reproducibility.

  • verbose – (boolean or integer, optional, default value False): print details about each iteration if True (or every verbose epochs if an integer), nothing otherwise.

minimize()
callback(args=())
check_lagrangian_dual_conditions()
check_lagrangian_dual_optimality()
is_augmented_lagrangian_dual()
is_batch_end()
is_lagrangian_dual()
is_verbose()
iter_mini_batches()

Return an iterator that successively yields tuples of aligned mini batches of size batch_size from the sliceable arrays returned by f.args(), in random order (when shuffle is True) without replacement.

Returns:

an infinite iterator of mini batches (one tuple of aligned slices per step).

class optiml.opti.unconstrained.stochastic.RMSProp(f, x=None, step_size=0.001, momentum_type='none', momentum=0.9, batch_size=None, eps=1e-06, tol=1e-08, epochs=1000, decay=0.9, offset=1e-08, callback=None, callback_args=(), shuffle=True, random_state=None, verbose=False)

Bases: StochasticMomentumOptimizer

RMSProp for the minimization of the provided function f.

It divides the learning rate of each coordinate by the square root of an exponentially decaying average of the squared gradients (the moving root mean square), so that the effective step size adapts to the recent magnitude of the gradients; an optional Polyak or Nesterov momentum can be applied on top.
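A minimal NumPy sketch of the scaling, without the optional momentum, is given below. The helper rmsprop_step is hypothetical and only illustrates the textbook update.

```python
import numpy as np

def rmsprop_step(x, g, avg_sq_grad, step_size=0.001, decay=0.9, offset=1e-8):
    """One RMSProp-style update (sketch, not optiml's code)."""
    # Exponentially decaying average of the squared gradients.
    avg_sq_grad = decay * avg_sq_grad + (1 - decay) * g ** 2
    # Divide each coordinate by the moving root mean square of its gradients.
    x = x - step_size * g / (np.sqrt(avg_sq_grad) + offset)
    return x, avg_sq_grad

x0 = np.zeros(2)
g = np.array([100.0, 0.01])  # gradients differing by four orders of magnitude
x1, avg_sq_grad = rmsprop_step(x0, g, np.zeros(2))
step = x1 - x0
```

Despite the very different gradient magnitudes, both coordinates move by nearly the same amount on the first step, about step_size divided by the square root of 1 - decay.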

Parameters:
  • f – the objective function.

  • x – ([n x 1] real column vector, optional, default value None): the starting point of the algorithm.

  • step_size – (real scalar > 0, callable or iterable, optional, default value 0.001): the learning rate, i.e., the base size of the step taken along the search direction.

  • momentum_type – (string in {‘none’, ‘polyak’, ‘nesterov’}, optional, default value ‘none’): the kind of momentum applied on top of the RMSProp step.

  • momentum – (real scalar in [0, 1) or iterable, optional, default value 0.9): the momentum factor, i.e., the fraction of the previous step retained in the current one.

  • batch_size – (integer scalar or None, optional, default value None): the size of the mini batches used to estimate the gradient; if None the full sample is used.

  • eps – (real scalar, optional, default value 1e-6): the accuracy in the stopping criterion: the algorithm is stopped when the norm of the gradient is less than or equal to eps.

  • tol – (real scalar, optional, default value 1e-8): the tolerance used in the optimality conditions of the Lagrangian dual (when f is a Lagrangian dual).

  • epochs – (integer scalar, optional, default value 1000): the maximum number of epochs before the algorithm is stopped.

  • decay – (real scalar in [0, 1), optional, default value 0.9): the exponential decay rate of the moving average of the squared gradients.

  • offset – (real scalar > 0, optional, default value 1e-8): a small constant added to the denominator to avoid division by zero and improve numerical stability.

  • callback – (callable, optional, default value None): a function called at each iteration with the optimizer instance (and callback_args) as arguments; it can raise StopIteration to interrupt the optimization.

  • callback_args – (tuple, optional, default value ()): additional positional arguments passed to the callback at each call.

  • shuffle – (boolean, optional, default value True): whether to shuffle the order of the mini batches at the beginning of each epoch.

  • random_state – (integer scalar or None, optional, default value None): seed for the random number generator, for reproducibility.

  • verbose – (boolean or integer, optional, default value False): print details about each iteration if True (or every verbose epochs if an integer), nothing otherwise.

minimize()
callback(args=())
check_lagrangian_dual_conditions()
check_lagrangian_dual_optimality()
is_augmented_lagrangian_dual()
is_batch_end()
is_lagrangian_dual()
is_verbose()
iter_mini_batches()

Return an iterator that successively yields tuples of aligned mini batches of size batch_size from the sliceable arrays returned by f.args(), in random order (when shuffle is True) without replacement.

Returns:

an infinite iterator of mini batches (one tuple of aligned slices per step).