Optimization parameters for Ftrl with TPU embeddings.
tf.compat.v1.tpu.experimental.FtrlParameters( learning_rate: float, learning_rate_power: float = -0.5, initial_accumulator_value: float = 0.1, l1_regularization_strength: float = 0.0, l2_regularization_strength: float = 0.0, use_gradient_accumulation: bool = True, clip_weight_min: Optional[float] = None, clip_weight_max: Optional[float] = None, weight_decay_factor: Optional[float] = None, multiply_weight_decay_factor_by_learning_rate: Optional[bool] = None, multiply_linear_by_learning_rate: bool = False, beta: float = 0, allow_zero_accumulator: bool = False, clip_gradient_min: Optional[float] = None, clip_gradient_max: Optional[float] = None )
Pass this to tf.estimator.tpu.experimental.EmbeddingConfigSpec
via the optimization_parameters
argument to set the optimizer and its parameters. See the documentation for tf.estimator.tpu.experimental.EmbeddingConfigSpec
for more details.
estimator = tf.estimator.tpu.TPUEstimator( ... embedding_config_spec=tf.estimator.tpu.experimental.EmbeddingConfigSpec( ... optimization_parameters=tf.tpu.experimental.FtrlParameters(0.1), ...))
Args | |
---|---|
learning_rate | a floating point value. The learning rate. |
learning_rate_power | A float value, must be less or equal to zero. Controls how the learning rate decreases during training. Use zero for a fixed learning rate. See section 3.1 in the paper. |
initial_accumulator_value | The starting value for accumulators. Only zero or positive values are allowed. |
l1_regularization_strength | A float value, must be greater than or equal to zero. |
l2_regularization_strength | A float value, must be greater than or equal to zero. |
use_gradient_accumulation | setting this to False makes embedding gradients calculation less accurate but faster. Please see optimization_parameters.proto for details. for details. |
clip_weight_min | the minimum value to clip by; None means -infinity. |
clip_weight_max | the maximum value to clip by; None means +infinity. |
weight_decay_factor | amount of weight decay to apply; None means that the weights are not decayed. |
multiply_weight_decay_factor_by_learning_rate | if true, weight_decay_factor is multiplied by the current learning rate. |
multiply_linear_by_learning_rate | When true, multiplies the usages of the linear slot in the weight update by the learning rate. This is useful when ramping up learning rate from 0 (which would normally produce NaNs). |
beta | The beta parameter for FTRL. |
allow_zero_accumulator | Changes the implementation of the square root to allow for the case of initial_accumulator_value being zero. This will cause a slight performance drop. |
clip_gradient_min | the minimum value to clip by; None means -infinity. |
clip_gradient_max | the maximum value to clip by; None means +infinity. |
© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r2.4/api_docs/python/tf/compat/v1/tpu/experimental/FtrlParameters