Optimization parameters for Adam with TPU embeddings.
```python
tf.tpu.experimental.embedding.Adam(
    learning_rate=0.001,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-07,
    lazy_adam=True,
    sum_inside_sqrt=True,
    use_gradient_accumulation=True,
    clip_weight_min=None,
    clip_weight_max=None,
    weight_decay_factor=None,
    multiply_weight_decay_factor_by_learning_rate=None,
    slot_variable_creation_fn=None
)
```
Pass this to `tf.tpu.experimental.embedding.TPUEmbedding` via the `optimizer` argument to set the global optimizer and its parameters:
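Note: By default this optimizer is lazy, i.e. it does not apply an update to rows that were not looked up in the current step (rows whose gradient is zero), whereas standard Adam would still move those rows using their accumulated moment estimates. You can change this behavior by setting `lazy_adam` to `False`.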
```python
embedding = tf.tpu.experimental.embedding.TPUEmbedding(
    ...
    optimizer=tf.tpu.experimental.embedding.Adam(0.1))
```
This can also be used in a `tf.tpu.experimental.embedding.TableConfig` as the `optimizer` parameter to set a table-specific optimizer. This overrides the global embedding optimizer and its parameters defined above:
```python
table_one = tf.tpu.experimental.embedding.TableConfig(
    vocabulary_size=...,
    dim=...,
    optimizer=tf.tpu.experimental.embedding.Adam(0.2))
table_two = tf.tpu.experimental.embedding.TableConfig(
    vocabulary_size=...,
    dim=...)

feature_config = (
    tf.tpu.experimental.embedding.FeatureConfig(
        table=table_one),
    tf.tpu.experimental.embedding.FeatureConfig(
        table=table_two))

embedding = tf.tpu.experimental.embedding.TPUEmbedding(
    feature_config=feature_config,
    batch_size=...,  # comma added; it was missing before `optimizer=`
    optimizer=tf.tpu.experimental.embedding.Adam(0.1))
```
In the above example, the first feature will be looked up in a table that has a learning rate of 0.2 while the second feature will be looked up in a table that has a learning rate of 0.1.
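The override rule amounts to a simple fallback: a table trains with its own optimizer if its `TableConfig` sets one, and with the global optimizer otherwise. The plain-Python sketch below models only that rule and is not TensorFlow code; `AdamParams`, `TableSpec`, and `effective_learning_rates` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class AdamParams:
    # Hypothetical stand-in for an optimizer object; only the learning rate is modeled.
    learning_rate: float


@dataclass
class TableSpec:
    # Hypothetical stand-in for a TableConfig; optimizer=None means "no override".
    name: str
    optimizer: Optional[AdamParams] = None


def effective_learning_rates(tables: Sequence[TableSpec],
                             global_optimizer: AdamParams) -> Dict[str, float]:
    """Map each table name to the learning rate it would train with."""
    # A table-level optimizer wins; otherwise fall back to the global one.
    return {
        t.name: (t.optimizer or global_optimizer).learning_rate
        for t in tables
    }


# Mirrors the example above: table_one overrides, table_two inherits.
rates = effective_learning_rates(
    [TableSpec("table_one", AdamParams(0.2)), TableSpec("table_two")],
    global_optimizer=AdamParams(0.1),
)
print(rates)  # {'table_one': 0.2, 'table_two': 0.1}
```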
See 'tensorflow/core/protobuf/tpu/optimization_parameters.proto' for a complete description of these parameters and their impacts on the optimizer algorithm.
Args | Description |
---|---|
learning_rate | The learning rate. It should be a floating point value or a callable taking no arguments for a dynamic learning rate. |
beta_1 | A float value. The exponential decay rate for the 1st moment estimates. |
beta_2 | A float value. The exponential decay rate for the 2nd moment estimates. |
epsilon | A small constant for numerical stability. |
lazy_adam | Use lazy Adam instead of standard Adam. Lazy Adam updates only the rows that were looked up in the current step, which makes training faster. |
sum_inside_sqrt | When `True`, the Adam update formula is changed from `m / (sqrt(v) + epsilon)` to `m / sqrt(v + epsilon**2)`, where `m` and `v` are the first and second moment estimates. This option improves the performance of TPU training and is not expected to harm model quality. |
use_gradient_accumulation | Setting this to `False` makes the embedding gradient calculation less accurate but faster. |
clip_weight_min | The minimum value to clip by; `None` means -infinity. |
clip_weight_max | The maximum value to clip by; `None` means +infinity. |
weight_decay_factor | The amount of weight decay to apply; `None` means that the weights are not decayed. |
multiply_weight_decay_factor_by_learning_rate | If true, `weight_decay_factor` is multiplied by the current learning rate. |
slot_variable_creation_fn | A callable taking two parameters: a variable and a list of slot names to create for it. This function should return a dict with the slot names as keys and the created variables as values. When set to `None` (the default), the built-in variable creation is used. |
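To make the effect of `lazy_adam` and `sum_inside_sqrt` concrete, here is a minimal NumPy sketch of one Adam step on an embedding table. It is an illustration under stated assumptions, not the TPU implementation: `adam_step` and `two_steps` are hypothetical helpers, bias correction, gradient accumulation, clipping, and weight decay are all omitted, and gradients arrive as looked-up row IDs with one gradient row each.

```python
import numpy as np


def adam_step(table, m, v, row_ids, row_grads, learning_rate=0.001,
              beta_1=0.9, beta_2=0.999, epsilon=1e-7,
              lazy_adam=True, sum_inside_sqrt=True):
    """One illustrative Adam step; updates table, m, and v in place.

    Assumption: no bias correction, clipping, or weight decay is applied.
    """
    # Scatter the per-lookup gradients into a dense array; duplicate IDs are summed.
    grad = np.zeros_like(table)
    np.add.at(grad, row_ids, row_grads)

    # Lazy Adam touches only rows that were looked up; dense Adam touches every row.
    rows = np.unique(row_ids) if lazy_adam else np.arange(table.shape[0])

    # Exponential moving averages of the gradient and the squared gradient.
    m[rows] = beta_1 * m[rows] + (1 - beta_1) * grad[rows]
    v[rows] = beta_2 * v[rows] + (1 - beta_2) * grad[rows] ** 2

    if sum_inside_sqrt:
        denom = np.sqrt(v[rows] + epsilon ** 2)  # m / sqrt(v + epsilon**2)
    else:
        denom = np.sqrt(v[rows]) + epsilon       # m / (sqrt(v) + epsilon)
    table[rows] -= learning_rate * m[rows] / denom


def two_steps(lazy_adam):
    """Look up row 0 in step 1 and only row 1 in step 2; return row 0 after each step."""
    table = np.ones((4, 2))
    m, v = np.zeros_like(table), np.zeros_like(table)
    grad = np.array([[1.0, 1.0]])
    adam_step(table, m, v, np.array([0]), grad, learning_rate=0.1, lazy_adam=lazy_adam)
    row0_after_step1 = table[0].copy()
    adam_step(table, m, v, np.array([1]), grad, learning_rate=0.1, lazy_adam=lazy_adam)
    return row0_after_step1, table[0].copy()


print(two_steps(lazy_adam=True))   # row 0 is unchanged by step 2
print(two_steps(lazy_adam=False))  # row 0 keeps moving in step 2
```

With `lazy_adam=True`, row 0 keeps the value it had after step 1. With `lazy_adam=False`, row 0 still moves in step 2 because its first-moment estimate `m` is nonzero even though its gradient in that step is zero.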
© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r2.3/api_docs/python/tf/tpu/experimental/embedding/Adam