tf.data.experimental.service.from_dataset_id

Creates a dataset which reads data from the tf.data service.

Compat aliases for migration

tf.compat.v1.data.experimental.service.from_dataset_id

tf.data.experimental.service.from_dataset_id(
    processing_mode, service, dataset_id, element_spec=None, job_name=None,
    max_outstanding_requests=None
)

This is useful when the dataset is registered by one process, then used in another process. When the same process is both registering and reading from the dataset, it is simpler to use tf.data.experimental.service.distribute instead.

Before using from_dataset_id, the dataset must have been registered with the tf.data service using tf.data.experimental.service.register_dataset. register_dataset returns a dataset id for the registered dataset; pass that id as the dataset_id argument of from_dataset_id.

The element_spec argument indicates the tf.TypeSpecs for the elements produced by the dataset. Currently element_spec must be explicitly specified, and match the dataset registered under dataset_id. element_spec defaults to None so that in the future we can support automatically discovering the element_spec by querying the tf.data service.
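A consuming process often does not hold the original dataset object, so it may need to write the element_spec by hand. The following is a minimal sketch, assuming the registered dataset is tf.data.Dataset.range(10) as in this page's example, whose elements are scalar int64 tensors. The comparison against a locally built dataset is only a sanity check and is not required by the API.

```python
import tensorflow as tf

# Hand-written spec for a dataset of scalar int64 elements, such as
# tf.data.Dataset.range(10). The consumer would pass this as element_spec.
element_spec = tf.TensorSpec(shape=(), dtype=tf.int64)

# Sanity check against a locally built copy of the dataset. This is only
# possible when the code that defines the dataset is available to the consumer.
spec_matches = tf.data.Dataset.range(10).element_spec == element_spec
print(spec_matches)
```

For datasets with nested elements (tuples or dictionaries), the spec would be a matching nested structure of tf.TypeSpec objects.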

tf.data.experimental.service.distribute is a convenience method which combines register_dataset and from_dataset_id into a dataset transformation. See the documentation for tf.data.experimental.service.distribute for more detail about how from_dataset_id works.

import tensorflow as tf

# Start an in-process dispatcher and one worker that connects to it.
dispatcher = tf.data.experimental.service.DispatchServer()
# WorkerConfig takes the bare address, so strip the "protocol://" prefix.
dispatcher_address = dispatcher.target.split("://")[1]
worker = tf.data.experimental.service.WorkerServer(
    tf.data.experimental.service.WorkerConfig(
        dispatcher_address=dispatcher_address))

# Register the dataset; the returned id identifies it to the service.
dataset = tf.data.Dataset.range(10)
dataset_id = tf.data.experimental.service.register_dataset(
    dispatcher.target, dataset)

# Read the registered dataset back through the service.
dataset = tf.data.experimental.service.from_dataset_id(
    processing_mode="parallel_epochs",
    service=dispatcher.target,
    dataset_id=dataset_id,
    element_spec=dataset.element_spec)
print(list(dataset.as_numpy_iterator()))
# Output: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
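
The example passes the full target (protocol plus address) as service, but gives WorkerConfig only the address part, obtained with dispatcher.target.split("://")[1]. Below is a small sketch of that split with basic validation. The name split_service_target is a hypothetical helper for illustration and is not part of TensorFlow.

```python
def split_service_target(target):
    """Split "protocol://address" into (protocol, address).

    Hypothetical helper for illustration; not a TensorFlow API.
    """
    protocol, separator, address = target.partition("://")
    if not separator or not protocol or not address:
        raise ValueError(f"expected 'protocol://address', got {target!r}")
    return protocol, address


protocol, address = split_service_target("grpc://localhost:5000")
print(protocol)  # grpc
print(address)   # localhost:5000
```

Using partition instead of indexing the result of split makes a malformed target fail with a clear error rather than an IndexError.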

Args
`processing_mode`	A string specifying the policy for how data should be processed by tf.data workers. Can be either "parallel_epochs" to have each tf.data worker process a copy of the dataset, or "distributed_epoch" to split a single iteration of the dataset across all the workers.
`service`	A string indicating how to connect to the tf.data service. The string should be in the format "protocol://address", e.g. "grpc://localhost:5000".
`dataset_id`	The id of the dataset to read from. This id is returned by `register_dataset` when the dataset is registered with the tf.data service.
`element_spec`	A nested structure of `tf.TypeSpec`s representing the type of elements produced by the dataset. Use `tf.data.Dataset.element_spec` to see the element spec for a given dataset.
`job_name`	(Optional.) The name of the job. This argument makes it possible for multiple datasets to share the same job. The default behavior is that the dataset creates anonymous, exclusively owned jobs.
`max_outstanding_requests`	(Optional.) A limit on how many elements may be requested at the same time. You can use this option to control the amount of memory used, since the dataset won't use more than `element_size` * `max_outstanding_requests` of memory.
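
The memory bound described for max_outstanding_requests is a simple product, which can be worked through as below. This is a sketch under stated assumptions: element_size is interpreted as bytes per element, the image-shaped workload is invented for illustration, and max_buffered_bytes is a hypothetical helper, not a TensorFlow function.

```python
def max_buffered_bytes(element_size_bytes, max_outstanding_requests):
    """Upper bound from the argument description:
    element_size * max_outstanding_requests.

    Hypothetical helper for illustration; not a TensorFlow API.
    """
    if element_size_bytes < 0 or max_outstanding_requests < 0:
        raise ValueError("sizes and request limits must be non-negative")
    return element_size_bytes * max_outstanding_requests


# Assumed workload: one float32 tensor of shape (224, 224, 3) per element.
element_size_bytes = 224 * 224 * 3 * 4  # 4 bytes per float32 value
bound = max_buffered_bytes(element_size_bytes, max_outstanding_requests=8)
print(bound)  # 4816896 bytes, about 4.6 MiB
```

Halving max_outstanding_requests halves this bound, at the likely cost of fewer requests in flight.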

Returns
A `tf.data.Dataset` which reads from the tf.data service.

© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r2.4/api_docs/python/tf/data/experimental/service/from_dataset_id