
Optimizer that implements the FTRL algorithm.

Inherits From: `Optimizer`

tf.keras.optimizers.Ftrl( learning_rate=0.001, learning_rate_power=-0.5, initial_accumulator_value=0.1, l1_regularization_strength=0.0, l2_regularization_strength=0.0, name='Ftrl', l2_shrinkage_regularization_strength=0.0, **kwargs )

See Algorithm 1 of the FTRL-Proximal paper by McMahan et al., "Ad Click Prediction: a View from the Trenches". This version supports both online L2 (the L2 penalty given in that paper) and shrinkage-type L2 (an L2 penalty added to the loss function).

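Initialization:

$$t = 0$$

$$n_{0} = 0$$

$$\sigma_{0} = 0$$

$$z_{0} = 0$$

Update ($i$ is the variable index):

$$t = t + 1$$

$$n_{t,i} = n_{t-1,i} + g_{t,i}^{2}$$

$$\sigma_{t,i} = \left(\sqrt{n_{t,i}} - \sqrt{n_{t-1,i}}\right) / \alpha$$

$$z_{t,i} = z_{t-1,i} + g_{t,i} - \sigma_{t,i} \, w_{t,i}$$

$$w_{t+1,i} = \begin{cases} -\left(\dfrac{\beta + \sqrt{n_{t,i}}}{\alpha} + \lambda_{2}\right)^{-1} \left(z_{t,i} - \operatorname{sgn}(z_{t,i}) \, \lambda_{1}\right) & \text{if } |z_{t,i}| > \lambda_{1} \\ 0 & \text{otherwise} \end{cases}$$

Here $g_{t,i}$ is the gradient, $\alpha$ is the learning rate, $\lambda_{1}$ and $\lambda_{2}$ are the L1 and L2 regularization strengths, and $\beta$ is the additive constant in the paper's per-coordinate learning-rate schedule.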

When shrinkage is enabled, the gradient $g_{t,i}$ in these updates is replaced by the shrinkage-adjusted gradient (`gradient_with_shrinkage`); see the `l2_shrinkage_regularization_strength` argument below for details.
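The update rules above can be traced for a single coordinate with a minimal pure-Python sketch. It follows the equations literally (including the paper's $\beta$), so it illustrates the math rather than reproducing TensorFlow's own kernel; the helper name `ftrl_coordinate_step` and the default `beta=1.0` are assumptions made only for this example.

```python
import math


def ftrl_coordinate_step(w, z, n, g, alpha, beta=1.0, l1=0.0, l2=0.0):
    """One FTRL-Proximal update for a single coordinate i (hypothetical helper).

    Follows the equations literally:
    w is w_{t,i}, z is z_{t-1,i}, n is n_{t-1,i}, g is g_{t,i}.
    Returns (w_{t+1,i}, z_{t,i}, n_{t,i}).
    """
    n_new = n + g * g                                  # n_{t,i}
    sigma = (math.sqrt(n_new) - math.sqrt(n)) / alpha  # sigma_{t,i}
    z_new = z + g - sigma * w                          # z_{t,i}
    if abs(z_new) <= l1:
        # L1 threshold: a small accumulated z gives an exactly-zero weight.
        return 0.0, z_new, n_new
    denom = (beta + math.sqrt(n_new)) / alpha + l2
    # sgn(z) * l1 equals copysign(l1, z) here, since |z| > l1 >= 0 means z != 0.
    w_new = -(z_new - math.copysign(l1, z_new)) / denom
    return w_new, z_new, n_new


# Usage: start every coordinate from the initialization (w, z, n all zero).
state = [(0.0, 0.0, 0.0) for _ in range(3)]
grads = [1.0, -0.5, 0.01]
new_weights = []
for (w, z, n), g in zip(state, grads):
    w_new, _, _ = ftrl_coordinate_step(w, z, n, g, alpha=0.1, beta=1.0, l1=0.1)
    new_weights.append(w_new)
print(new_weights)
```

With `l1=0.1`, the third coordinate's accumulated gradient (0.01) does not exceed $\lambda_{1}$, so its weight stays exactly zero; this thresholding is what makes FTRL weights sparse.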

Args | |
---|---|
`learning_rate` | A float value or a constant float `Tensor`. |
`learning_rate_power` | A float value; must be less than or equal to zero. Controls how the learning rate decreases during training. Use zero for a fixed learning rate. |
`initial_accumulator_value` | The starting value for accumulators. Only zero or positive values are allowed. |
`l1_regularization_strength` | A float value; must be greater than or equal to zero. |
`l2_regularization_strength` | A float value; must be greater than or equal to zero. |
`name` | Optional name prefix for the operations created when applying gradients. Defaults to `"Ftrl"`. |
`l2_shrinkage_regularization_strength` | A float value; must be greater than or equal to zero. This differs from L2 above in that the L2 above is a stabilization penalty, whereas this L2 shrinkage is a magnitude penalty. The FTRL formulation can be written as $w_{t+1} = \arg\min_{w} \left(\hat{g}_{1:t} \cdot w + L_{1} \lVert w \rVert_{1} + L_{2} \lVert w \rVert_{2}^{2}\right)$, where $\hat{g} = g + 2 L_{2,\text{shrinkage}} \, w$ and $g$ is the gradient of the loss function with respect to the weights $w$. In the absence of L1 regularization, this is equivalent to the update rule $w_{t+1} = w_{t} - \frac{lr_{t}}{1 + 2 L_{2} \, lr_{t}} g_{t} - \frac{2 L_{2,\text{shrinkage}} \, lr_{t}}{1 + 2 L_{2} \, lr_{t}} w_{t}$, where $lr_{t}$ is the learning rate at step $t$. When the input is sparse, shrinkage is applied only to the active weights. |
`**kwargs` | Keyword arguments. Allowed keys are `clipnorm`, `clipvalue`, `lr`, and `decay`. `clipnorm` clips gradients by norm; `clipvalue` clips gradients by value; `decay` is included for backward compatibility to allow time-inverse decay of the learning rate; `lr` is included for backward compatibility, and `learning_rate` is recommended instead. |
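The L1-free update rule quoted for `l2_shrinkage_regularization_strength` can be checked numerically with a short sketch. The helper names are hypothetical, `l2` and `l2_shrinkage` stand for the L2 and L2-shrinkage strengths in that rule, and the sketch evaluates only the closed-form rule, not TensorFlow's kernel. It also shows that the rule equals a single step of size $lr_t / (1 + 2 L_2 \, lr_t)$ on the shrinkage-adjusted gradient $g + 2 L_{2,\text{shrinkage}} \, w$.

```python
import math


def shrinkage_step(w, g, lr, l2=0.0, l2_shrinkage=0.0):
    """Closed-form FTRL step without L1, written term by term (hypothetical helper)."""
    scale = lr / (1.0 + 2.0 * l2 * lr)
    return w - scale * g - 2.0 * l2_shrinkage * scale * w


def shrinkage_step_via_adjusted_gradient(w, g, lr, l2=0.0, l2_shrinkage=0.0):
    """Same step, expressed as a scaled step on g_hat = g + 2 * l2_shrinkage * w."""
    g_hat = g + 2.0 * l2_shrinkage * w
    return w - lr / (1.0 + 2.0 * l2 * lr) * g_hat


a = shrinkage_step(1.0, 0.5, lr=0.1, l2=0.5, l2_shrinkage=0.25)
b = shrinkage_step_via_adjusted_gradient(1.0, 0.5, lr=0.1, l2=0.5, l2_shrinkage=0.25)
print(a, b, math.isclose(a, b))
```

With a zero gradient the shrinkage term alone still pulls $w$ toward zero: `shrinkage_step(2.0, 0.0, lr=0.1, l2_shrinkage=1.0)` gives approximately 1.6, which is the magnitude-penalty behavior described for this argument.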

Raises | |
---|---|
`ValueError` | If one of the arguments is invalid. |

Attributes | |
---|---|
`iterations` | Variable. The number of training steps this Optimizer has run. |
`weights` | Returns variables of this Optimizer based on the order created. |

`add_slot`

add_slot( var, slot_name, initializer='zeros' )

Add a new slot variable for `var`.

`add_weight`

add_weight( name, shape, dtype=None, initializer='zeros', trainable=None, synchronization=tf.VariableSynchronization.AUTO, aggregation=tf.VariableAggregation.NONE )

`apply_gradients`

apply_gradients( grads_and_vars, name=None )

Apply gradients to variables.

This is the second part of `minimize()`. It returns an `Operation` that applies gradients.

Args | |
---|---|
`grads_and_vars` | List of (gradient, variable) pairs. |
`name` | Optional name for the returned operation. Defaults to the name passed to the `Optimizer` constructor. |

Returns | |
---|---|
An `Operation` that applies the specified gradients. `iterations` is automatically incremented by 1. |

Raises | |
---|---|
`TypeError` | If `grads_and_vars` is malformed. |
`ValueError` | If none of the variables have gradients. |
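A minimal sketch of the explicit `tf.GradientTape` plus `apply_gradients()` pattern, assuming TensorFlow 2.x with eager execution (the default there); under TensorFlow 1.15, which this page documents, eager execution has to be enabled first. The toy quadratic loss and the hyperparameter values are arbitrary choices for illustration.

```python
import tensorflow as tf

opt = tf.keras.optimizers.Ftrl(learning_rate=0.1, l1_regularization_strength=0.01)
w = tf.Variable([1.0, -2.0])

# Compute gradients explicitly; this is the step minimize() would do for you.
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(tf.square(w))  # toy loss, minimized at w = 0
grads = tape.gradient(loss, [w])

# Gradients could be clipped or otherwise processed here before applying them.
opt.apply_gradients(zip(grads, [w]))

print(w.numpy())                    # inspect the updated weights
print(int(opt.iterations.numpy()))  # number of steps applied so far
```

Splitting the two steps like this is only needed when the gradients must be inspected or modified; otherwise `minimize()` performs both.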

`from_config`

@classmethod from_config( config, custom_objects=None )

Creates an optimizer from its config.

This method is the reverse of `get_config`, capable of instantiating the same optimizer from the config dictionary.

Arguments | |
---|---|
`config` | A Python dictionary, typically the output of `get_config`. |
`custom_objects` | A Python dictionary mapping names to additional Python objects used to create this optimizer, such as a function used for a hyperparameter. |

Returns | |
---|---|
An optimizer instance. |

`get_config`

get_config()

Returns the config of the optimizer.

An optimizer config is a Python dictionary (serializable) containing the configuration of an optimizer. The same optimizer can be reinstantiated later (without any saved state) from this configuration.

Returns | |
---|---|
Python dictionary. |
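A short sketch of the `get_config()` / `from_config()` round trip, assuming a TensorFlow 2.x installation; the hyperparameter values are arbitrary. Only the configuration is restored; as noted above, no saved state comes with it.

```python
import tensorflow as tf

opt = tf.keras.optimizers.Ftrl(
    learning_rate=0.05,
    l1_regularization_strength=0.001,
    l2_shrinkage_regularization_strength=0.01,
)

# get_config() returns a plain, serializable dict of hyperparameters.
config = opt.get_config()

# from_config() builds a fresh optimizer with the same hyperparameters;
# no training state (e.g. accumulators) is carried over.
restored = tf.keras.optimizers.Ftrl.from_config(config)
restored_config = restored.get_config()

for key in ("learning_rate", "l1_regularization_strength",
            "l2_shrinkage_regularization_strength"):
    print(key, config[key], restored_config[key])
```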

`get_gradients`

get_gradients( loss, params )

Returns gradients of `loss` with respect to `params`.

Arguments | |
---|---|
`loss` | Loss tensor. |
`params` | List of variables. |

Returns | |
---|---|
List of gradient tensors. |

Raises | |
---|---|
`ValueError` | If any gradient cannot be computed (e.g., if a gradient function is not implemented). |

`get_slot`

get_slot( var, slot_name )

`get_slot_names`

get_slot_names()

A list of names for this optimizer's slots.

`get_updates`

get_updates( loss, params )

`get_weights`

get_weights()

`minimize`

minimize( loss, var_list, grad_loss=None, name=None )

Minimize `loss` by updating `var_list`.

This method simply computes the gradients using `tf.GradientTape` and calls `apply_gradients()`. If you want to process the gradients before applying them, call `tf.GradientTape` and `apply_gradients()` explicitly instead of using this function.

Args | |
---|---|
`loss` | A callable taking no arguments which returns the value to minimize. |
`var_list` | A list or tuple of `Variable` objects to update to minimize `loss`, or a callable returning the list or tuple of `Variable` objects. Use a callable when the variable list would otherwise be incomplete before `minimize` is called, since the variables are created the first time `loss` is called. |
`grad_loss` | Optional. A `Tensor` holding the gradient computed for `loss`. |
`name` | Optional name for the returned operation. |

Returns | |
---|---|
An `Operation` that updates the variables in `var_list`. Because this method calls `apply_gradients()`, `iterations` is also incremented by 1. |

Raises | |
---|---|
`ValueError` | If some of the variables are not `Variable` objects. |

`set_weights`

set_weights( weights )

`variables`

variables()

Returns variables of this Optimizer based on the order created.

© 2020 The TensorFlow Authors. All rights reserved.

Licensed under the Creative Commons Attribution License 3.0.

Code samples licensed under the Apache 2.0 License.

https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/keras/optimizers/Ftrl