Experimental Object Oriented Distributed API
Created On: Jul 09, 2025 | Last Updated On: Jul 30, 2025
This is an experimental new API for PyTorch Distributed. It is under active development and subject to change or to removal entirely.
It is intended as a proving ground for more flexible, object-oriented distributed APIs.
-
class torch.distributed._dist2.ProcessGroup -
Bases: pybind11_object
A ProcessGroup is a communication primitive that allows for collective operations across a group of processes.
This is a base class that provides the interface for all ProcessGroups. It is not meant to be used directly, but rather extended by subclasses.
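Every collective method below returns a c10d::Work handle that the caller blocks on with wait(). As a conceptual illustration only (this toy class is not the real c10d implementation), the completion model looks like an asynchronous handle:

```python
import threading

class Work:
    """Toy stand-in for a c10d::Work handle returned by collectives.

    Illustrative sketch only: the real Work is implemented in C++ and
    tracks an in-flight communication operation, not a Python thread.
    """

    def __init__(self, fn):
        self._result = None
        self._thread = threading.Thread(target=self._run, args=(fn,))
        self._thread.start()

    def _run(self, fn):
        self._result = fn()

    def wait(self, timeout=None):
        # Block until the asynchronous operation completes; returns True
        # on completion, mirroring Work.wait()'s blocking behavior.
        self._thread.join(timeout)
        return not self._thread.is_alive()

# Usage mirrors the pg.allreduce(tensor).wait() pattern used with a real group.
w = Work(lambda: sum(range(1000)))
assert w.wait() is True
```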
-
class BackendType -
Bases: pybind11_object
The type of the backend used for the process group.
Members:
UNDEFINED
GLOO
NCCL
XCCL
UCC
MPI
CUSTOM
-
CUSTOM = <BackendType.CUSTOM: 6>
-
GLOO = <BackendType.GLOO: 1>
-
MPI = <BackendType.MPI: 4>
-
NCCL = <BackendType.NCCL: 2>
-
UCC = <BackendType.UCC: 3>
-
UNDEFINED = <BackendType.UNDEFINED: 0>
-
XCCL = <BackendType.XCCL: 5>
-
property name
-
property value
-
-
CUSTOM = <BackendType.CUSTOM: 6>
-
GLOO = <BackendType.GLOO: 1>
-
MPI = <BackendType.MPI: 4>
-
NCCL = <BackendType.NCCL: 2>
-
UCC = <BackendType.UCC: 3>
-
UNDEFINED = <BackendType.UNDEFINED: 0>
-
XCCL = <BackendType.XCCL: 5>
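The member/value mapping above can be mirrored as a plain IntEnum for quick reference. This is a pure-Python sketch, not the pybind11 class itself:

```python
from enum import IntEnum

# Pure-Python mirror of the BackendType values documented above
# (the real class is a pybind11 enum nested in ProcessGroup).
class BackendType(IntEnum):
    UNDEFINED = 0
    GLOO = 1
    NCCL = 2
    UCC = 3
    MPI = 4
    XCCL = 5
    CUSTOM = 6

# Like the pybind11 enum, members expose .name and .value.
assert BackendType.NCCL.name == "NCCL" and BackendType.NCCL.value == 2
```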
-
abort(self: torch._C._distributed_c10d.ProcessGroup) → None -
Aborts all operations and connections, if supported by the backend.
-
allgather(*args, **kwargs) -
Overloaded function.
- allgather(self: torch._C._distributed_c10d.ProcessGroup, output_tensors: collections.abc.Sequence[collections.abc.Sequence[torch.Tensor]], input_tensors: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.AllgatherOptions = AllgatherOptions()) -> c10d::Work
Allgathers the input tensors from all processes across the process group.
See torch.distributed.all_gather() for more details.
- allgather(self: torch._C._distributed_c10d.ProcessGroup, output_tensors: collections.abc.Sequence[torch.Tensor], input_tensor: torch.Tensor, timeout: datetime.timedelta | None = None) -> c10d::Work
Allgathers the input tensor from all processes across the process group.
See torch.distributed.all_gather() for more details.
-
allgather_coalesced(self: torch._C._distributed_c10d.ProcessGroup, output_lists: collections.abc.Sequence[collections.abc.Sequence[torch.Tensor]], input_list: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.AllgatherOptions = AllgatherOptions()) → c10d::Work -
Allgathers the input tensors from all processes across the process group.
See torch.distributed.all_gather() for more details.
-
allgather_into_tensor_coalesced(self: torch._C._distributed_c10d.ProcessGroup, outputs: collections.abc.Sequence[torch.Tensor], inputs: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.AllgatherOptions = AllgatherOptions()) → c10d::Work -
Allgathers the input tensors from all processes across the process group.
See torch.distributed.all_gather() for more details.
-
allreduce(*args, **kwargs) -
Overloaded function.
- allreduce(self: torch._C._distributed_c10d.ProcessGroup, tensors: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.AllreduceOptions = AllreduceOptions()) -> c10d::Work
Allreduces the provided tensors across all processes in the process group.
See torch.distributed.all_reduce() for more details.
- allreduce(self: torch._C._distributed_c10d.ProcessGroup, tensors: collections.abc.Sequence[torch.Tensor], op: torch._C._distributed_c10d.ReduceOp = <RedOpType.SUM: 0>, timeout: datetime.timedelta | None = None) -> c10d::Work
Allreduces the provided tensors across all processes in the process group.
See torch.distributed.all_reduce() for more details.
- allreduce(self: torch._C._distributed_c10d.ProcessGroup, tensor: torch.Tensor, op: torch._C._distributed_c10d.ReduceOp = <RedOpType.SUM: 0>, timeout: datetime.timedelta | None = None) -> c10d::Work
Allreduces the provided tensor across all processes in the process group.
See torch.distributed.all_reduce() for more details.
-
allreduce_coalesced(self: torch._C._distributed_c10d.ProcessGroup, tensors: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.AllreduceCoalescedOptions = AllreduceCoalescedOptions()) → c10d::Work -
Allreduces the provided tensors across all processes in the process group.
See torch.distributed.all_reduce() for more details.
-
alltoall(self: torch._C._distributed_c10d.ProcessGroup, output_tensors: collections.abc.Sequence[torch.Tensor], input_tensors: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.AllToAllOptions = AllToAllOptions()) → c10d::Work -
Alltoalls the input tensors from all processes across the process group.
See torch.distributed.all_to_all() for more details.
-
alltoall_base(*args, **kwargs) -
Overloaded function.
- alltoall_base(self: torch._C._distributed_c10d.ProcessGroup, output: torch.Tensor, input: torch.Tensor, output_split_sizes: collections.abc.Sequence[typing.SupportsInt], input_split_sizes: collections.abc.Sequence[typing.SupportsInt], opts: torch._C._distributed_c10d.AllToAllOptions = AllToAllOptions()) -> c10d::Work
Alltoalls the input tensors from all processes across the process group.
See torch.distributed.all_to_all() for more details.
- alltoall_base(self: torch._C._distributed_c10d.ProcessGroup, output: torch.Tensor, input: torch.Tensor, output_split_sizes: collections.abc.Sequence[typing.SupportsInt], input_split_sizes: collections.abc.Sequence[typing.SupportsInt], timeout: datetime.timedelta | None = None) -> c10d::Work
Alltoalls the input tensors from all processes across the process group.
See torch.distributed.all_to_all() for more details.
-
barrier(*args, **kwargs) -
Overloaded function.
- barrier(self: torch._C._distributed_c10d.ProcessGroup, opts: torch._C._distributed_c10d.BarrierOptions = BarrierOptions()) -> c10d::Work
Blocks until all processes in the group enter the call, and then all leave the call together.
See torch.distributed.barrier() for more details.
- barrier(self: torch._C._distributed_c10d.ProcessGroup, timeout: datetime.timedelta | None = None) -> c10d::Work
Blocks until all processes in the group enter the call, and then all leave the call together.
See torch.distributed.barrier() for more details.
-
property bound_device_id
-
boxed(self: torch._C._distributed_c10d.ProcessGroup) → object
-
broadcast(*args, **kwargs) -
Overloaded function.
- broadcast(self: torch._C._distributed_c10d.ProcessGroup, tensors: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.BroadcastOptions = BroadcastOptions()) -> c10d::Work
Broadcasts the tensors to all processes in the process group.
See torch.distributed.broadcast() for more details.
- broadcast(self: torch._C._distributed_c10d.ProcessGroup, tensor: torch.Tensor, root: typing.SupportsInt, timeout: datetime.timedelta | None = None) -> c10d::Work
Broadcasts the tensor to all processes in the process group.
See torch.distributed.broadcast() for more details.
-
gather(*args, **kwargs) -
Overloaded function.
- gather(self: torch._C._distributed_c10d.ProcessGroup, output_tensors: collections.abc.Sequence[collections.abc.Sequence[torch.Tensor]], input_tensors: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.GatherOptions = GatherOptions()) -> c10d::Work
Gathers the input tensors from all processes across the process group.
See torch.distributed.gather() for more details.
- gather(self: torch._C._distributed_c10d.ProcessGroup, output_tensors: collections.abc.Sequence[torch.Tensor], input_tensor: torch.Tensor, root: typing.SupportsInt, timeout: datetime.timedelta | None = None) -> c10d::Work
Gathers the input tensor from all processes across the process group.
See torch.distributed.gather() for more details.
-
get_group_store(self: torch._C._distributed_c10d.ProcessGroup) → torch._C._distributed_c10d.Store -
Get the store of this process group.
-
property group_desc -
Gets this process group's description.
-
property group_name -
Gets this process group's name, which is unique across the cluster.
-
merge_remote_group(self: torch._C._distributed_c10d.ProcessGroup, store: torch._C._distributed_c10d.Store, size: SupportsInt, timeout: datetime.timedelta = datetime.timedelta(seconds=1800), group_name: str | None = None, group_desc: str | None = None) → torch._C._distributed_c10d.ProcessGroup
-
monitored_barrier(self: torch._C._distributed_c10d.ProcessGroup, timeout: datetime.timedelta | None = None, wait_all_ranks: bool = False) → None -
Blocks until all processes in the group enter the call, and then all leave the call together.
See torch.distributed.monitored_barrier() for more details.
-
name(self: torch._C._distributed_c10d.ProcessGroup) → str -
Get the name of this process group.
-
rank(self: torch._C._distributed_c10d.ProcessGroup) → int -
Get the rank of this process group.
-
recv(self: torch._C._distributed_c10d.ProcessGroup, tensors: collections.abc.Sequence[torch.Tensor], srcRank: SupportsInt, tag: SupportsInt) → c10d::Work -
Receives the tensor from the specified rank.
See torch.distributed.recv() for more details.
-
recv_anysource(self: torch._C._distributed_c10d.ProcessGroup, arg0: collections.abc.Sequence[torch.Tensor], arg1: SupportsInt) → c10d::Work -
Receives the tensor from any source.
See torch.distributed.recv() for more details.
-
reduce(*args, **kwargs) -
Overloaded function.
- reduce(self: torch._C._distributed_c10d.ProcessGroup, tensors: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.ReduceOptions = ReduceOptions()) -> c10d::Work
Reduces the provided tensors across all processes in the process group.
See torch.distributed.reduce() for more details.
- reduce(self: torch._C._distributed_c10d.ProcessGroup, tensor: torch.Tensor, root: typing.SupportsInt, op: torch._C._distributed_c10d.ReduceOp = <RedOpType.SUM: 0>, timeout: datetime.timedelta | None = None) -> c10d::Work
Reduces the provided tensor across all processes in the process group.
See torch.distributed.reduce() for more details.
-
reduce_scatter(*args, **kwargs) -
Overloaded function.
- reduce_scatter(self: torch._C._distributed_c10d.ProcessGroup, output_tensors: collections.abc.Sequence[torch.Tensor], input_tensors: collections.abc.Sequence[collections.abc.Sequence[torch.Tensor]], opts: torch._C._distributed_c10d.ReduceScatterOptions = ReduceScatterOptions()) -> c10d::Work
Reduces and scatters the input tensors from all processes across the process group.
See torch.distributed.reduce_scatter() for more details.
- reduce_scatter(self: torch._C._distributed_c10d.ProcessGroup, output: torch.Tensor, input: collections.abc.Sequence[torch.Tensor], op: torch._C._distributed_c10d.ReduceOp = <RedOpType.SUM: 0>, timeout: datetime.timedelta | None = None) -> c10d::Work
Reduces and scatters the input tensors from all processes across the process group.
See torch.distributed.reduce_scatter() for more details.
-
reduce_scatter_tensor_coalesced(self: torch._C._distributed_c10d.ProcessGroup, outputs: collections.abc.Sequence[torch.Tensor], inputs: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.ReduceScatterOptions = ReduceScatterOptions()) → c10d::Work -
Reduces and scatters the input tensors from all processes across the process group.
See torch.distributed.reduce_scatter() for more details.
-
scatter(*args, **kwargs) -
Overloaded function.
- scatter(self: torch._C._distributed_c10d.ProcessGroup, output_tensors: collections.abc.Sequence[torch.Tensor], input_tensors: collections.abc.Sequence[collections.abc.Sequence[torch.Tensor]], opts: torch._C._distributed_c10d.ScatterOptions = ScatterOptions()) -> c10d::Work
Scatters the input tensors from all processes across the process group.
See torch.distributed.scatter() for more details.
- scatter(self: torch._C._distributed_c10d.ProcessGroup, output_tensor: torch.Tensor, input_tensors: collections.abc.Sequence[torch.Tensor], root: typing.SupportsInt, timeout: datetime.timedelta | None = None) -> c10d::Work
Scatters the input tensors from all processes across the process group.
See torch.distributed.scatter() for more details.
-
send(self: torch._C._distributed_c10d.ProcessGroup, tensors: collections.abc.Sequence[torch.Tensor], dstRank: SupportsInt, tag: SupportsInt) → c10d::Work -
Sends the tensor to the specified rank.
See torch.distributed.send() for more details.
-
set_timeout(self: torch._C._distributed_c10d.ProcessGroup, timeout: datetime.timedelta) → None -
Sets the default timeout for all future operations.
-
shutdown(self: torch._C._distributed_c10d.ProcessGroup) → None -
Shuts down the process group.
-
size(self: torch._C._distributed_c10d.ProcessGroup) → int -
Get the size of this process group.
-
split_group(self: torch._C._distributed_c10d.ProcessGroup, ranks: collections.abc.Sequence[typing.SupportsInt], timeout: datetime.timedelta | None = None, opts: c10d::Backend::Options | None = None, group_name: str | None = None, group_desc: str | None = None) → torch._C._distributed_c10d.ProcessGroup
-
static unbox(arg0: object) → torch._C._distributed_c10d.ProcessGroup
-
-
class torch.distributed._dist2.ProcessGroupFactory(*args, **kwargs)[source] -
Bases: Protocol
Protocol for process group factories.
-
torch.distributed._dist2.current_process_group()[source] -
Get the current process group. Thread-local method.
- Returns
-
The current process group.
- Return type
-
ProcessGroup
torch.distributed._dist2.new_group(backend, timeout, device, **kwargs)[source] -
Create a new process group with the given backend and options. This group is independent and will not be globally registered and thus not usable via the standard torch.distributed.* APIs.
- Parameters
-
- backend (str) – The backend to use for the process group.
- timeout (timedelta) – The timeout for collective operations.
- device (Union[str, device]) – The device to use for the process group.
- **kwargs (object) – All remaining arguments are passed to the backend constructor. See the backend specific documentation for details.
- Returns
-
A new process group.
- Return type
-
ProcessGroup
-
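Since dist2 groups are not globally registered, a typical pattern is to create the group yourself and pass it around explicitly. A minimal sketch, assuming a torchrun launch and a CPU/Gloo backend; the exact kwargs accepted per backend are backend-specific, and main() below only runs when the usual rendezvous environment variables are set:

```python
import os

def main():
    # Only meaningful under a launcher such as torchrun, which sets
    # MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE for rendezvous.
    from datetime import timedelta

    import torch
    from torch.distributed import _dist2 as dist2

    # Sketch: create an unregistered gloo group on CPU. Backend-specific
    # kwargs (e.g. a store) may be required; see the backend docs.
    pg = dist2.new_group(backend="gloo", timeout=timedelta(minutes=5), device="cpu")
    t = torch.ones(4)
    pg.allreduce(t).wait()  # each collective returns a Work handle
    pg.shutdown()

if __name__ == "__main__" and "RANK" in os.environ:
    main()
```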
torch.distributed._dist2.process_group(pg)[source] -
Context manager for process groups. Thread-local method.
- Parameters
-
pg (ProcessGroup) – The process group to use.
- Return type
-
Generator[None, None, None]
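The process_group() context manager swaps the thread-local group that current_process_group() reads, restoring the previous one on exit. A pure-Python sketch of that pattern (not torch's implementation):

```python
import contextlib
import threading

# Sketch of the thread-local context-manager pattern that
# process_group()/current_process_group() describe.
_local = threading.local()

def current_process_group():
    return getattr(_local, "pg", None)

@contextlib.contextmanager
def process_group(pg):
    prev = current_process_group()
    _local.pg = pg
    try:
        yield
    finally:
        _local.pg = prev  # restore the previous group on exit

# Usage: code inside the block sees "my_pg" as the current group.
with process_group("my_pg"):
    assert current_process_group() == "my_pg"
assert current_process_group() is None
```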
-
torch.distributed._dist2.register_backend(name, func)[source] -
Register a new process group backend.
- Parameters
-
- name (str) – The name of the backend.
- func (ProcessGroupFactory) – The function to create the process group.