NCCL all-reduce implementation of CrossDeviceOps.
Inherits From: CrossDeviceOps
tf.distribute.NcclAllReduce(num_packs=1)
It uses NVIDIA NCCL for all-reduce. For the batch API, tensors will be repacked or aggregated for more efficient cross-device transportation. For reduces that are not all-reduce, it falls back to tf.distribute.ReductionToOneDevice.
Here is how you can use NcclAllReduce in tf.distribute.MirroredStrategy:

strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.NcclAllReduce())
Args | |
---|---|
num_packs | a non-negative integer. The number of packs to split values into. If zero, no packing will be done. |
Raises | |
---|---|
ValueError | if num_packs is negative. |
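For example, here is a minimal sketch that sets num_packs explicitly and runs an end-to-end sum across replicas; it assumes two visible GPUs and uses the high-level strategy.reduce API, which dispatches to the configured NCCL cross-device ops under tf.distribute.MirroredStrategy:

import tensorflow as tf

# Two visible GPUs are assumed. num_packs=2 splits the per-replica values into
# two packs before the batched NCCL all-reduce.
strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.NcclAllReduce(num_packs=2))

@tf.function
def replica_sum():
  # Each replica returns its replica id (0, 1, ...).
  per_replica = strategy.run(
      lambda: tf.distribute.get_replica_context().replica_id_in_sync_group)
  # Sum the per-replica values across devices.
  return strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica, axis=None)

print(replica_sum())  # 1 with two replicas (0 + 1)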
batch_reduce
batch_reduce(reduce_op, value_destination_pairs, options=None)
Reduce values to destinations in batches.
See tf.distribute.StrategyExtended.batch_reduce_to. This can only be called in the cross-replica context.
Args | |
---|---|
reduce_op | a tf.distribute.ReduceOp specifying how values should be combined. |
value_destination_pairs | a sequence of (value, destinations) pairs. See tf.distribute.CrossDeviceOps.reduce for descriptions. |
options | a tf.distribute.experimental.CommunicationOptions. See tf.distribute.experimental.CommunicationOptions for details. |
Returns | |
---|---|
A list of tf.Tensor or tf.distribute.DistributedValues, one per pair in value_destination_pairs. |
Raises | |
---|---|
ValueError | if value_destination_pairs is not an iterable of tuples of tf.distribute.DistributedValues and destinations. |
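As a hedged sketch, batch_reduce is normally reached through tf.distribute.StrategyExtended.batch_reduce_to, which under tf.distribute.MirroredStrategy dispatches to the strategy's configured NcclAllReduce. The example below assumes two visible GPUs and reduces two PerReplica values in a single batched call:

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.NcclAllReduce())

@tf.function
def batched_all_reduce():
  def per_replica_fn():
    rid = tf.cast(
        tf.distribute.get_replica_context().replica_id_in_sync_group,
        tf.float32)
    return rid, rid * 10.0
  a, b = strategy.run(per_replica_fn)
  # Reduce both PerReplica values in one batched call. Using each value as its
  # own destinations makes each reduction an all-reduce.
  return strategy.extended.batch_reduce_to(
      tf.distribute.ReduceOp.SUM, [(a, a), (b, b)])

print(batched_all_reduce())

Batching both tensors into one call lets the op repack or aggregate them for fewer, larger NCCL transfers.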
broadcast
broadcast(tensor, destinations)
Broadcast tensor to destinations.
This can only be called in the cross-replica context.
Args | |
---|---|
tensor | a tf.Tensor-like object. The value to broadcast. |
destinations | a tf.distribute.DistributedValues, a tf.Variable, a tf.Tensor-like object, or a device string. It specifies the devices to broadcast to. Note that if it's a tf.Variable, the value is broadcast to the devices of that variable; this method doesn't update the variable. |
Returns | |
---|---|
A tf.Tensor or tf.distribute.DistributedValues. |
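Here is a minimal sketch of calling broadcast directly on an NcclAllReduce instance; the device string and the two-GPU mirrored variable are assumptions, so adjust them to the devices on your machine:

import tensorflow as tf

nccl = tf.distribute.NcclAllReduce()

# Destinations as a device string: the result is placed on that device.
on_gpu0 = nccl.broadcast(tf.constant([1.0, 2.0]), "/gpu:0")

# Destinations as a variable: the value is copied to every device holding the
# MirroredVariable, but the variable itself is not updated.
strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.NcclAllReduce())
with strategy.scope():
  v = tf.Variable([0.0, 0.0])
mirrored = nccl.broadcast(tf.constant([1.0, 2.0]), v)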
reduce
reduce(reduce_op, per_replica_value, destinations, options=None)
Reduce per_replica_value to destinations.
See tf.distribute.StrategyExtended.reduce_to. This can only be called in the cross-replica context.
Args | |
---|---|
reduce_op | a tf.distribute.ReduceOp specifying how values should be combined. |
per_replica_value | a tf.distribute.DistributedValues, or a tf.Tensor-like object. |
destinations | a tf.distribute.DistributedValues, a tf.Variable, a tf.Tensor-like object, or a device string. It specifies the devices to reduce to. To perform an all-reduce, pass the same value as both per_replica_value and destinations. Note that if it's a tf.Variable, the value is reduced to the devices of that variable, and this method doesn't update the variable. |
options | a tf.distribute.experimental.CommunicationOptions. See tf.distribute.experimental.CommunicationOptions for details. |
Returns | |
---|---|
A tf.Tensor or tf.distribute.DistributedValues. |
Raises | |
---|---|
ValueError | if per_replica_value can't be converted to a tf.distribute.DistributedValues or if destinations is not a string, tf.Variable or tf.distribute.DistributedValues. |
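As a hedged sketch, an explicit all-reduce can be driven through tf.distribute.StrategyExtended.reduce_to, which under tf.distribute.MirroredStrategy dispatches to NcclAllReduce.reduce; two visible GPUs are assumed:

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.NcclAllReduce())

@tf.function
def all_reduce_sum():
  per_replica = strategy.run(
      lambda: tf.cast(
          tf.distribute.get_replica_context().replica_id_in_sync_group,
          tf.float32))
  # Passing the same per-replica value as destinations makes this an
  # all-reduce: every replica's device receives the summed result.
  return strategy.extended.reduce_to(
      tf.distribute.ReduceOp.SUM, per_replica, destinations=per_replica)

print(all_reduce_sum())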
© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r2.4/api_docs/python/tf/distribute/NcclAllReduce