torch.cuda.comm.reduce_add_coalesced
torch.cuda.comm.reduce_add_coalesced(inputs, destination=None, buffer_size=10485760)
Sum tensors from multiple GPUs.
Small tensors are first coalesced into a buffer to reduce the number of synchronizations.
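The coalescing idea can be illustrated with a CPU-only sketch. The function `coalesced_sum` below is hypothetical and is not PyTorch's implementation. It assumes dense tensors of a single floating dtype and uses ordinary CPU tensors to stand in for devices. It packs consecutive tensors into flat buffers of at most `buffer_size` bytes, performs one sum per buffer instead of one per tensor, and then splits each summed buffer back into the original shapes.

```python
import torch


def coalesced_sum(inputs, buffer_size=10485760):
    """Hypothetical CPU sketch of a coalesced elementwise sum.

    inputs: one list of tensors per "device"; the i-th tensors of every
    list must share shape and dtype (assumed: one dense floating dtype).
    """
    reference = inputs[0]
    outputs = [None] * len(reference)

    # Group consecutive tensor indices into chunks whose total byte size
    # stays within buffer_size (an oversized tensor gets its own chunk).
    chunks, current, current_bytes = [], [], 0
    for i, t in enumerate(reference):
        nbytes = t.numel() * t.element_size()
        if current and current_bytes + nbytes > buffer_size:
            chunks.append(current)
            current, current_bytes = [], 0
        current.append(i)
        current_bytes += nbytes
    if current:
        chunks.append(current)

    for chunk in chunks:
        # Flatten each "device's" tensors in this chunk into one 1-D buffer.
        flats = [torch.cat([dev[i].reshape(-1) for i in chunk]) for dev in inputs]
        # One accumulation per buffer; in-place adds keep the input dtype.
        total = flats[0].clone()
        for flat in flats[1:]:
            total += flat
        # Split the summed buffer back into the original shapes.
        sizes = [reference[i].numel() for i in chunk]
        for i, piece in zip(chunk, torch.split(total, sizes)):
            outputs[i] = piece.view_as(reference[i])
    return tuple(outputs)
```

For example, `coalesced_sum([[torch.ones(2, 3), torch.arange(4.0)], [torch.full((2, 3), 2.0), torch.arange(4.0)]])` returns a 2x3 tensor filled with 3.0 and `tensor([0., 2., 4., 6.])`. The result does not depend on `buffer_size`; only the number of reductions does.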
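Parameters
- inputs (Iterable[Iterable[Tensor]]): iterable of iterables, each containing tensors from a single device.
- destination (int, optional): the device on which the output will be placed (default: current device).
- buffer_size (int): maximum size, in bytes, of the buffer used for coalescing (the default 10485760 is 10 MiB).
Returns
A tuple of tensors containing an elementwise sum of each group of inputs (the i-th output is the sum of the i-th tensor from every device), placed on the destination device.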
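A minimal usage sketch, assuming a machine with at least two visible CUDA devices; the tensor values are illustrative only, and the code does nothing when fewer than two GPUs are available. Each inner list holds the tensors from one GPU, and the outputs are gathered on device 0.

```python
import torch
import torch.cuda.comm as comm

summed = None
if torch.cuda.device_count() >= 2:
    # One inner list per GPU; the i-th tensors must match in shape and dtype.
    per_gpu = [
        [
            torch.ones(3, device=f"cuda:{d}"),
            torch.full((2, 2), float(d), device=f"cuda:{d}"),
        ]
        for d in range(2)
    ]
    # Sum position-wise across GPUs and place every result on cuda:0.
    summed = comm.reduce_add_coalesced(per_gpu, destination=0)
    # summed[0]: three 2.0 values; summed[1]: a 2x2 tensor of 1.0 (0.0 + 1.0)
```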