optiml.opti.unconstrained.stochastic.gradient_descent module

class optiml.opti.unconstrained.stochastic.gradient_descent.StochasticGradientDescent(f, x=None, batch_size=None, eps=1e-06, tol=1e-08, epochs=1000, step_size=0.01, momentum_type='none', momentum=0.9, callback=None, callback_args=(), shuffle=True, random_state=None, verbose=False)

Bases: StochasticMomentumOptimizer

Stochastic Gradient Descent (SGD) for the minimization of the provided function f.

At each iteration the point is moved along the negative of the gradient, estimated on a mini batch of the data, by a step whose size is given by a fixed (or scheduled) learning rate. The step can optionally be accelerated by a classical heavy-ball (Polyak) or Nesterov momentum term, as selected by momentum_type.
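The three update rules described above can be sketched in a few lines of NumPy. This is a minimal illustration under the textbook formulations of heavy-ball and Nesterov momentum, not the optiml implementation; the function name sgd_step, the velocity argument v and the grad callable are hypothetical names introduced only for this example.

```python
import numpy as np


def sgd_step(x, v, grad, step_size=0.01, momentum=0.9, momentum_type="none"):
    """One update of a toy SGD step (hypothetical helper, not part of optiml).

    x    : current point
    v    : previous step (velocity), same shape as x
    grad : callable returning the (mini-batch) gradient at a given point
    """
    if momentum_type == "none":
        # Plain SGD: step along the negative gradient.
        v_new = -step_size * grad(x)
    elif momentum_type == "polyak":
        # Heavy-ball: keep a fraction of the previous step.
        v_new = momentum * v - step_size * grad(x)
    elif momentum_type == "nesterov":
        # Nesterov: evaluate the gradient at the look-ahead point.
        v_new = momentum * v - step_size * grad(x + momentum * v)
    else:
        raise ValueError(f"unknown momentum_type: {momentum_type!r}")
    return x + v_new, v_new


# Usage on f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x0 = np.array([1.0, -2.0])
v0 = np.array([1.0, 1.0])
x_plain, _ = sgd_step(x0, np.zeros(2), lambda z: z, step_size=0.1)
x_polyak, _ = sgd_step(x0, v0, lambda z: z, step_size=0.1, momentum_type="polyak")
x_nesterov, _ = sgd_step(x0, v0, lambda z: z, step_size=0.1, momentum_type="nesterov")
```

The only difference between the two momentum variants in this sketch is the point at which the gradient is evaluated: the current point for heavy-ball, the look-ahead point x + momentum * v for Nesterov.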

Parameters:
  • f – the objective function.

  • x – ([n x 1] real column vector, optional, default value None): the starting point of the algorithm.

  • batch_size – (integer scalar or None, optional, default value None): the size of the mini batches used to estimate the gradient; if None the full sample is used.

  • eps – (real scalar, optional, default value 1e-6): the accuracy in the stopping criterion: the algorithm is stopped when the norm of the gradient is less than or equal to eps.

  • tol – (real scalar, optional, default value 1e-8): the tolerance used in the optimality conditions of the Lagrangian dual (when f is a Lagrangian dual).

  • epochs – (integer scalar, optional, default value 1000): the maximum number of epochs before the algorithm is stopped.

  • step_size – (real scalar > 0, callable or iterable, optional, default value 0.01): the learning rate, i.e., the size of the step taken along the negative gradient.

  • momentum_type – (string in {'none', 'polyak', 'nesterov'}, optional, default value 'none'): the kind of momentum to apply: 'none' for no momentum, 'polyak' for heavy-ball momentum or 'nesterov' for Nesterov momentum.

  • momentum – (real scalar in [0, 1) or iterable, optional, default value 0.9): the momentum factor, i.e., the fraction of the previous step retained in the current one.

  • callback – (callable, optional, default value None): a function called at each iteration with the optimizer instance (and callback_args) as arguments; it can raise StopIteration to interrupt the optimization.

  • callback_args – (tuple, optional, default value ()): additional positional arguments passed to the callback at each call.

  • shuffle – (boolean, optional, default value True): whether to shuffle the order of the mini batches at the beginning of each epoch.

  • random_state – (integer scalar or None, optional, default value None): seed for the random number generator, for reproducibility.

  • verbose – (boolean or integer, optional, default value False): print details about each iteration if True (or every verbose epochs if an integer), nothing otherwise.
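How eps, epochs and callback interact can be illustrated with a toy loop. The sketch below is an assumption-laden stand-in, not the optiml code: toy_minimize and stop_after are hypothetical names, a plain dict replaces the optimizer instance passed to the callback, and the full gradient is used in place of a mini-batch estimate.

```python
import numpy as np


def toy_minimize(grad, x, eps=1e-6, epochs=1000, step_size=0.01,
                 callback=None, callback_args=()):
    """Toy gradient loop (hypothetical, not part of optiml)."""
    # A dict stands in for the optimizer instance handed to the callback.
    state = {"x": np.asarray(x, dtype=float), "epoch": 0}
    for epoch in range(epochs):
        state["epoch"] = epoch
        g = grad(state["x"])
        # Stopping criterion: gradient norm less than or equal to eps.
        if np.linalg.norm(g) <= eps:
            break
        try:
            if callback is not None:
                callback(state, *callback_args)
        except StopIteration:
            # The callback asked to interrupt the optimization.
            break
        state["x"] = state["x"] - step_size * g
    return state


def stop_after(state, limit):
    """Example callback: interrupt once `limit` epochs have been run."""
    if state["epoch"] >= limit:
        raise StopIteration


# f(x) = 0.5 * x^2 has gradient x, so each step with step_size=0.5 halves x.
by_eps = toy_minimize(lambda z: z, [1.0], eps=1e-3, step_size=0.5)
by_callback = toy_minimize(lambda z: z, [1.0], step_size=0.5,
                           callback=stop_after, callback_args=(3,))
```

In this sketch the run ends on whichever comes first: the gradient-norm test, a StopIteration raised by the callback, or the epochs limit.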

minimize()
callback(args=())
check_lagrangian_dual_conditions()
check_lagrangian_dual_optimality()
is_augmented_lagrangian_dual()
is_batch_end()
is_lagrangian_dual()
is_verbose()
iter_mini_batches()

Return an iterator that successively yields tuples of aligned mini batches of size batch_size from the sliceable arrays returned by f.args(), in random order (when shuffle is True) without replacement.

Returns:

an infinite iterator of mini batches (one tuple of aligned slices per step).
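An iterator with the behaviour described above might look like the following sketch. It is not the optiml implementation: toy_iter_mini_batches is a hypothetical standalone function, the arrays are passed in explicitly instead of being read from f.args(), and the sketch assumes that shuffling means drawing a fresh permutation of the sample indices at the start of each pass.

```python
from itertools import islice

import numpy as np


def toy_iter_mini_batches(arrays, batch_size, shuffle=True, random_state=None):
    """Endlessly yield tuples of aligned slices (hypothetical helper)."""
    rng = np.random.default_rng(random_state)
    n = len(arrays[0])
    while True:  # infinite: the caller decides when to stop
        # A fresh permutation per pass gives sampling without replacement.
        idx = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            # The same indices are applied to every array, keeping them aligned.
            yield tuple(a[batch] for a in arrays)


# Usage: 5 samples in batches of 2 give 3 batches per pass (sizes 2, 2, 1).
X = np.arange(10).reshape(5, 2)
y = np.arange(5)
one_pass = list(islice(toy_iter_mini_batches((X, y), 2, random_state=0), 3))
```

Because the iterator never ends, the consumer has to bound it, here with itertools.islice; an optimizer would typically stop on its own epoch counter instead.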