A dtype policy for a Keras layer.
tf.keras.mixed_precision.experimental.Policy(name, loss_scale=USE_DEFAULT)
A dtype policy determines dtype-related aspects of a layer, such as its computation and variable dtypes. Each layer has a policy. Policies can be passed to the dtype argument of layer constructors, or a global policy can be set with tf.keras.mixed_precision.experimental.set_policy. A layer will default to the global policy if no policy is passed to its constructor.
For many models, each layer's policy will have the same compute dtype and variable dtype, which will typically be float32. In this case, we refer to the singular dtype as the layer's dtype, which can be queried by the property tf.keras.layers.Layer.dtype.
When mixed precision training is used, most layers will instead have a float16 or bfloat16 compute dtype and a float32 variable dtype, and so the layer does not have a single dtype. When the variable dtype does not match the compute dtype, variables will be automatically cast to the compute dtype to avoid type errors. In this case, tf.keras.layers.Layer.dtype refers to the variable dtype, not the compute dtype. See the mixed precision guide for more information on how to use mixed precision.
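As a quick illustration of this distinction, here is a minimal sketch (not part of the original examples) of querying the dtypes of a layer built with a mixed precision policy; the values in the comments follow from the behavior described above:

policy = tf.keras.mixed_precision.experimental.Policy('mixed_float16')
layer = tf.keras.layers.Dense(10, dtype=policy)
layer.dtype          # 'float32': Layer.dtype refers to the variable dtype
y = layer(tf.ones((1, 10)))
y.dtype              # tf.float16: outputs use the compute dtype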
Certain policies also have a tf.mixed_precision.experimental.LossScale instance, which is used by tf.keras.Models to perform loss scaling. Loss scaling is a technique used with mixed precision to avoid numerical underflow in float16 gradients. Loss scaling is only done by Models in Model.fit, Model.train_on_batch, and similar methods. Layers which are not Models ignore the loss scale.
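For instance, a policy's loss scale can be inspected through its loss_scale attribute. This short sketch assumes only the defaults documented in the Args table below:

policy = tf.keras.mixed_precision.experimental.Policy('mixed_float16')
policy.loss_scale   # A DynamicLossScale by default for 'mixed_float16'
policy = tf.keras.mixed_precision.experimental.Policy('float32')
policy.loss_scale   # None: no loss scaling by default for non-mixed policies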
Policies are constructed by passing a string to the constructor, e.g. tf.keras.mixed_precision.experimental.Policy('float32'). The string determines the compute and variable dtypes. It can be one of the following:

* Any dtype name, such as 'float32' or 'float64'. Both the variable and compute dtypes will be that dtype.
* 'mixed_float16' or 'mixed_bfloat16': The compute dtype is float16 or bfloat16, while the variable dtype is float32. These policies are used for mixed precision training; with 'mixed_float16', a dynamic loss scale is used by default.
To use mixed precision in a Keras model, the 'mixed_float16' or 'mixed_bfloat16' policy can be used. tf.keras.mixed_precision.experimental.set_policy can be used to set the default policy for layers if no policy is passed to them. For example:
tf.keras.mixed_precision.experimental.set_policy('mixed_float16')
model = tf.keras.models.Sequential([
    tf.keras.layers.Input((100,)),
    # Dense layers use global policy of 'mixed_float16', which does
    # computations in float16 while keeping variables in float32.
    tf.keras.layers.Dense(10),
    tf.keras.layers.Dense(10),
    # Softmax should be done in float32 for numeric stability. We pass
    # dtype='float32' to use float32 instead of the global policy.
    tf.keras.layers.Activation('softmax', dtype='float32')
])
Alternatively, the policy can be passed to individual layers instead of setting the global policy with set_policy:
policy = tf.keras.mixed_precision.experimental.Policy('mixed_float16')
model = tf.keras.models.Sequential([
    tf.keras.layers.Input((100,)),
    tf.keras.layers.Dense(10, dtype=policy),
    tf.keras.layers.Dense(10, dtype=policy),
    # Softmax should be done in float32 for numeric stability.
    tf.keras.layers.Activation('softmax', dtype='float32')
])
Note the 'mixed_float16' policy will apply loss scaling by default in Model.fit, Model.train_on_batch, and other training methods. If no such method is used (e.g., a custom training loop is used) and 'mixed_float16' is used, the loss scale must be manually applied. See tf.keras.mixed_precision.experimental.LossScaleOptimizer for details. For 'mixed_bfloat16', no loss scaling is done and loss scaling never needs to be manually applied.
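For reference, a minimal sketch of manual loss scaling in a custom training loop with tf.keras.mixed_precision.experimental.LossScaleOptimizer might look like the following; the model, optimizer choice, and loss function here are illustrative placeholders, not part of the original example:

optimizer = tf.keras.mixed_precision.experimental.LossScaleOptimizer(
    tf.keras.optimizers.SGD(), loss_scale='dynamic')
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

@tf.function
def train_step(model, x, y):
  with tf.GradientTape() as tape:
    predictions = model(x, training=True)
    loss = loss_fn(y, predictions)
    # Scale the loss so float16 gradients do not underflow.
    scaled_loss = optimizer.get_scaled_loss(loss)
  scaled_grads = tape.gradient(scaled_loss, model.trainable_variables)
  # Unscale the gradients before applying them.
  grads = optimizer.get_unscaled_gradients(scaled_grads)
  optimizer.apply_gradients(zip(grads, model.trainable_variables))
  return loss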
See the mixed precision guide for more information on using mixed precision.
Using float64 is similar to mixed precision. Either the global policy can be set to float64, or dtype='float64' can be passed to individual layers. For example, to set the global policy:
tf.keras.mixed_precision.experimental.set_policy('float64')
model = tf.keras.models.Sequential([
    tf.keras.layers.Input((100,)),
    # All layers use global policy of 'float64', which does computations
    # and creates variables in float64.
    tf.keras.layers.Dense(10),
    tf.keras.layers.Dense(10),
    tf.keras.layers.Activation('softmax')
])
# Optionally set policy back to float32 if any other models use float32
tf.keras.mixed_precision.experimental.set_policy('float32')
A layer will cast its inputs to its compute dtype in TensorFlow 2. For example:
x = tf.ones((4, 4, 4, 4), dtype='float64')
# `layer`'s policy defaults to float32.
layer = tf.keras.layers.Conv2D(filters=4, kernel_size=2)
# `layer` casts its inputs to its compute dtype, which is float32, and
# does computations in float32.
y = layer(x)
y.dtype
tf.float32
Note that the base tf.keras.layers.Layer class inserts the casts. If subclassing your own layer, you do not have to insert any casts.

Currently, only tensors in the first argument to the layer's call method are cast. For example:
class MyLayer(tf.keras.layers.Layer):
  # Bug! `b` will not be cast.
  def call(self, a, b):
    return a + 1., b + 1.

a = tf.constant(1., dtype="float32")
b = tf.constant(1., dtype="float32")
layer = MyLayer(dtype="float64")
x, y = layer(a, b)
x.dtype
tf.float64
y.dtype
tf.float32
If writing your own layer, it is recommended to accept tensors only in the first argument. This way, all tensors are cast to the layer's compute dtype. MyLayer should therefore be written as:
class MyLayer(tf.keras.layers.Layer):
  # Now, all tensor inputs will be cast.
  def call(self, inputs):
    a, b = inputs
    return a + 1., b + 1.

a = tf.constant(1., dtype="float32")
b = tf.constant(1., dtype="float32")
layer = MyLayer(dtype="float64")
x, y = layer((a, b))
x.dtype
tf.float64
y.dtype
tf.float64
Other arguments are not automatically cast for technical reasons, but this may change in a future minor release.
The casting only occurs in TensorFlow 2. If tf.compat.v1.disable_v2_behavior() has been called, the casting can be enabled with tf.compat.v1.keras.layers.enable_v2_dtype_behavior().
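A minimal sketch of opting back into the casting behavior after disabling V2 behavior, using only the two functions named above:

tf.compat.v1.disable_v2_behavior()
# Re-enable the V2 dtype behavior (input casting) for Keras layers.
tf.compat.v1.keras.layers.enable_v2_dtype_behavior()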
A layer subclass can prevent its inputs from being automatically cast by passing autocast=False to the layer constructor. For example:
class NonAutoCastingLayer(tf.keras.layers.Layer):
  def __init__(self, **kwargs):
    kwargs['autocast'] = False
    super(NonAutoCastingLayer, self).__init__(**kwargs)

  def call(self, inp):
    return inp

x = tf.ones((4, 4, 4, 4), dtype='float32')
layer = NonAutoCastingLayer(dtype='float64')
y = layer(x)  # Will not cast inputs to its compute dtype of float64
y.dtype
tf.float32
The default dtype of variables created by tf.keras.layers.Layer.add_weight is the layer's policy's variable dtype.
If a layer's compute and variable dtypes differ, add_weight will wrap floating-point variables with a special wrapper called an AutoCastVariable. This wrapper is identical to the original variable except it casts itself to the layer's compute dtype when used within Layer.call. Outside Layer.call, the variable is not cast.
A layer author can prevent a variable from being wrapped with an AutoCastVariable by passing experimental_autocast=False to add_weight:
class MyLayer(tf.keras.layers.Layer):
  def build(self, input_shape):
    self.x = self.add_weight('x')
    self.y = self.add_weight('y', experimental_autocast=False)

policy = tf.keras.mixed_precision.experimental.Policy('mixed_float16')
layer = MyLayer(dtype=policy)
layer.build((2, 2))
layer.x
<AutoCastVariable 'x:0' shape=() dtype=float32 true_dtype=float32, numpy=...>
layer.y
<tf.Variable 'y:0' shape=() dtype=float32, numpy=...>
Passing experimental_autocast=False is useful for layers which may internally do some math in the variable dtype instead of the compute dtype. For example, you may wish to compute variable statistics, such as mean and variance, in the variable dtype.
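As a hedged sketch of that pattern (not part of the original docs), the following hypothetical layer keeps a running mean in the variable dtype by opting the statistic out of autocasting:

class RunningMean(tf.keras.layers.Layer):
  """Hypothetical layer: tracks a running mean in the variable dtype."""

  def build(self, input_shape):
    # experimental_autocast=False keeps `self.mean` as a plain float32
    # variable, so it is never read back as float16 inside call().
    self.mean = self.add_weight(
        'mean', initializer='zeros', trainable=False,
        experimental_autocast=False)

  def call(self, inputs):
    # Compute the statistic in the variable dtype (float32) for stability.
    batch_mean = tf.reduce_mean(tf.cast(inputs, self.mean.dtype))
    self.mean.assign(0.9 * self.mean + 0.1 * batch_mean)
    return inputs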
For the most part, layers will automatically support mixed precision and float64 without any additional work, because the base layer automatically casts inputs, creates variables of the correct type, and, in the case of mixed precision, wraps variables with AutoCastVariables.
For example, this simple dense layer does not require any additional work to support mixed precision or float64. Keras automatically casts the inputs and variables to the appropriate dtype.
class MyDense(tf.keras.layers.Layer):
  def build(self, input_shape):
    self.kernel = self.add_weight('kernel', (input_shape[-1], 10))

  def call(self, inputs):
    return tf.matmul(inputs, self.kernel)
import numpy as np

policy = tf.keras.mixed_precision.experimental.Policy('mixed_float16')
layer = MyDense(dtype=policy)
x = np.random.rand(10, 10)
y = layer(x)
y.dtype
tf.float16
The primary case where you need extra work to support mixed precision or float64 is when you create a new tensor, such as with tf.ones or tf.constant. In such cases, you must create the tensor of the correct dtype. For example, suppose you modify the MyDense layer to add a random number to the output using tf.random.normal. You must pass the input dtype to tf.random.normal to ensure the dtypes match.
class MyDense(tf.keras.layers.Layer):
  def build(self, input_shape):
    self.kernel = self.add_weight('kernel', (input_shape[-1], 10))

  def call(self, inputs):
    rand = tf.random.normal(shape=inputs.shape, dtype=inputs.dtype)
    return tf.matmul(inputs, self.kernel) + rand

layer = MyDense(dtype=policy)
y = layer(x)
y.dtype
tf.float16
If you did not pass dtype=inputs.dtype to tf.random.normal, a TypeError would have occurred. This is because tf.random.normal's dtype defaults to "float32", so the layer would only work if the inputs were float32.
Args | |
---|---|
name | A string. Can be one of the following values: any dtype name, such as 'float32' or 'float64' (both the variable and compute dtypes will be that dtype); or 'mixed_float16' / 'mixed_bfloat16' (the compute dtype is float16 or bfloat16 while the variable dtype is float32; these policies are used for mixed precision training). |
loss_scale | A tf.mixed_precision.experimental.LossScale, an int (which uses a FixedLossScale), or the string "dynamic" (which uses a DynamicLossScale). Defaults to using no loss scaling unless name is "mixed_float16", in which case this defaults to "dynamic". Only tf.keras.Models, not layers, use the loss scale, and it is only used during Model.fit, Model.train_on_batch, and other similar methods. |
Attributes | |
---|---|
compute_dtype | The compute dtype of this policy. This is the dtype layers will do their computations in. Note that even if the compute dtype is float16 or bfloat16, hardware devices may not do individual adds, multiplies, and other fundamental operations in [b]float16, but instead may do some of them in float32 for numeric stability. The compute dtype is the dtype of the inputs and outputs of the TensorFlow ops that the layer executes. Internally, many TensorFlow ops will do certain internal calculations in float32, or some other device-internal intermediate format with higher precision than [b]float16, to increase numeric stability. For example, a tf.keras.layers.Dense layer, when run on a GPU with a float16 compute dtype, will pass float16 inputs to tf.linalg.matmul, but tf.linalg.matmul may use float32 intermediate math; the performance benefit of float16 is still apparent due to increased memory bandwidth and specialized float16 hardware on modern GPUs. |
loss_scale | Returns the loss scale of this Policy. |
name | Returns the name of this policy. |
should_cast_variables | Returns True if variables should be cast. This is true if the variable dtype is not the same as the compute dtype. |
variable_dtype | The variable dtype of this policy. This is the dtype layers will create their variables in, unless a layer explicitly chooses a different dtype. If this is different than the compute dtype, layers will cast variables to the compute dtype to avoid type errors. |
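For example, these attributes can be queried directly on a policy instance; this short sketch shows the values one would expect for 'mixed_float16', per the descriptions above:

policy = tf.keras.mixed_precision.experimental.Policy('mixed_float16')
policy.name                   # 'mixed_float16'
policy.compute_dtype          # 'float16'
policy.variable_dtype         # 'float32'
policy.should_cast_variables  # True, since the variable and compute dtypes differ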
from_config
@classmethod from_config(config, custom_objects=None)
get_config
get_config()
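A short sketch of round-tripping a policy through its config with these two methods; the comment shows the value one would expect for this policy name:

policy = tf.keras.mixed_precision.experimental.Policy('mixed_float16')
config = policy.get_config()
restored = tf.keras.mixed_precision.experimental.Policy.from_config(config)
restored.name   # 'mixed_float16'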
© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r2.3/api_docs/python/tf/keras/mixed_precision/experimental/Policy