Events
Created On: May 04, 2021 | Last Updated On: Jun 10, 2024
This module contains event-processing mechanisms that are integrated with the standard Python logging module.
Example of usage:
from torch.distributed.elastic import events
event = events.Event(
name="test_event", source=events.EventSource.WORKER, metadata={...}
)
events.record(event, destination="console")
API Methods
-
torch.distributed.elastic.events.record(event, destination='null')
-
torch.distributed.elastic.events.construct_and_record_rdzv_event(run_id, message, node_state, name='', hostname='', pid=None, master_endpoint='', local_id=None, rank=None)
Initialize a rendezvous event object and record its operations.
- Parameters
-
- run_id (str) – The run id of the rendezvous.
- message (str) – The message describing the event.
- node_state (NodeState) – The state of the node (INIT, RUNNING, SUCCEEDED, FAILED).
- name (str) – Event name (e.g., the current action being performed).
- hostname (str) – Hostname of the node.
- pid (Optional[int]) – The process id of the node.
- master_endpoint (str) – The master endpoint for the rendezvous store, if known.
- local_id (Optional[int]) – The local_id of the node, if defined in dynamic_rendezvous.py.
- rank (Optional[int]) – The rank of the node, if known.
- Returns
-
None
- Return type
-
None
Example
>>> # See DynamicRendezvousHandler class
>>> def _record(
...     self,
...     message: str,
...     node_state: NodeState = NodeState.RUNNING,
...     rank: Optional[int] = None,
... ) -> None:
...     construct_and_record_rdzv_event(
...         name=f"{self.__class__.__name__}.{get_method_name()}",
...         run_id=self._settings.run_id,
...         message=message,
...         node_state=node_state,
...         hostname=self._this_node.addr,
...         pid=self._this_node.pid,
...         local_id=self._this_node.local_id,
...         rank=rank,
...     )
-
torch.distributed.elastic.events.get_logging_handler(destination='null')
- Return type
Event Objects
-
class torch.distributed.elastic.events.api.Event(name, source, timestamp=0, metadata=<factory>)
Represents a generic event that occurs during torchelastic job execution.
An event can be any kind of meaningful action.
-
class torch.distributed.elastic.events.api.EventSource(value)
Known identifiers of the event producers.
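The shape suggested by the two signatures above (a record with `name`, `source`, `timestamp=0`, and a factory-built `metadata`, plus an enumeration of producers) can be sketched with the standard library alone. `DemoEvent` and `DemoSource` are hypothetical names, the `AGENT` member and the JSON round trip are assumptions made for illustration, and none of this is the torchelastic source.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Dict, Union


class DemoSource(str, Enum):
    """Hypothetical producer identifiers; the member names are assumed."""

    AGENT = "AGENT"
    WORKER = "WORKER"


@dataclass
class DemoEvent:
    """Hypothetical event record mirroring the documented constructor shape."""

    name: str
    source: DemoSource
    timestamp: int = 0
    # default_factory gives every event its own dict instead of a shared one.
    metadata: Dict[str, Union[str, int, float, bool, None]] = field(default_factory=dict)

    def serialize(self) -> str:
        # A str-based Enum is written to JSON as its plain string value.
        return json.dumps(asdict(self))

    @staticmethod
    def deserialize(data: str) -> "DemoEvent":
        fields = json.loads(data)
        # Convert the plain string back into the enum member.
        fields["source"] = DemoSource(fields["source"])
        return DemoEvent(**fields)


original = DemoEvent(name="test_event", source=DemoSource.WORKER, metadata={"attempt": 2})
payload = original.serialize()
restored = DemoEvent.deserialize(payload)
```

Deriving the enum from `str` keeps the serialized form readable while still rejecting unknown producer names when the event is read back.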