This document walks through the principal TensorFlow APIs necessary to make effective use of a Cloud TPU, and highlights the differences between regular TensorFlow usage, and usage on a TPU.
This doc is aimed at users who:
Estimator
and Dataset
APIsEstimator
model to run on Cloud TPUsEstimators are TensorFlow's model-level abstraction. Standard Estimators
can drive models on CPU and GPUs. You must use tf.contrib.tpu.TPUEstimator
to drive a model on TPUs.
Refer to TensorFlow's Getting Started section for an introduction to the basics of using a pre-made Estimator
, and custom Estimator
s.
The TPUEstimator
class differs somewhat from the Estimator
class.
The simplest way to maintain a model that can be run both on CPU/GPU or on a Cloud TPU is to define the model's inference phase (from inputs to predictions) outside of the model_fn
. Then maintain separate implementations of the Estimator
setup and model_fn
, both wrapping this inference step. For an example of this pattern compare the mnist.py
and mnist_tpu.py
implementation in tensorflow/models.
TPUEstimator
locallyTo create a standard Estimator
you call the constructor, and pass it a model_fn
, for example:
my_estimator = tf.estimator.Estimator( model_fn=my_model_fn)
The changes required to use a tf.contrib.tpu.TPUEstimator
on your local machine are relatively minor. The constructor requires two additional arguments. You should set the use_tpu
argument to False
, and pass a tf.contrib.tpu.RunConfig
as the config
argument, as shown below:
my_tpu_estimator = tf.contrib.tpu.TPUEstimator( model_fn=my_model_fn, config=tf.contrib.tpu.RunConfig() use_tpu=False)
Just this simple change will allow you to run a TPUEstimator
locally. The majority of example TPU models can be run in this local mode, by setting the command line flags as follows:
$> python mnist_tpu.py --use_tpu=false --master=''
Note: Thisuse_tpu=False
argument is useful for trying out theTPUEstimator
API. It is not meant to be a complete TPU compatibility test. Successfully running a model locally in aTPUEstimator
does not guarantee that it will work on a TPU.
tpu.RunConfig
While the default RunConfig
is sufficient for local training, these settings cannot be ignored in real usage.
A more typical setup for a RunConfig
, that can be switched to use a Cloud TPU, might be as follows:
import tempfile import subprocess class FLAGS(object): use_tpu=False tpu_name=None # Use a local temporary path for the `model_dir` model_dir = tempfile.mkdtemp() # Number of training steps to run on the Cloud TPU before returning control. iterations = 50 # A single Cloud TPU has 8 shards. num_shards = 8 if FLAGS.use_tpu: my_project_name = subprocess.check_output([ 'gcloud','config','get-value','project']) my_zone = subprocess.check_output([ 'gcloud','config','get-value','compute/zone']) cluster_resolver = tf.contrib.cluster_resolver.TPUClusterResolver( tpu_names=[FLAGS.tpu_name], zone=my_zone, project=my_project) master = tpu_cluster_resolver.get_master() else: master = '' my_tpu_run_config = tf.contrib.tpu.RunConfig( master=master, evaluation_master=master, model_dir=FLAGS.model_dir, session_config=tf.ConfigProto( allow_soft_placement=True, log_device_placement=True), tpu_config=tf.contrib.tpu.TPUConfig(FLAGS.iterations, FLAGS.num_shards), )
Then you must pass the tf.contrib.tpu.RunConfig
to the constructor:
my_tpu_estimator = tf.contrib.tpu.TPUEstimator( model_fn=my_model_fn, config = my_tpu_run_config, use_tpu=FLAGS.use_tpu)
Typically the FLAGS
would be set by command line arguments. To switch from training locally to training on a cloud TPU you would need to:
FLAGS.use_tpu
to True
FLAGS.tpu_name
so the tf.contrib.cluster_resolver.TPUClusterResolver
can find itFLAGS.model_dir
to a Google Cloud Storage bucket url (gs://
).When training on a cloud TPU you must wrap the optimizer in a tf.contrib.tpu.CrossShardOptimizer
, which uses an allreduce
to aggregate gradients and broadcast the result to each shard (each TPU core).
The CrossShardOptimizer
is not compatible with local training. So, to have the same code run both locally and on a Cloud TPU, add lines like the following:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate) if FLAGS.use_tpu: optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer)
If you prefer to avoid a global FLAGS
variable in your model code, one approach is to set the optimizer as one of the Estimator
's params, as follows:
my_tpu_estimator = tf.contrib.tpu.TPUEstimator( model_fn=my_model_fn, config = my_tpu_run_config, use_tpu=FLAGS.use_tpu, params={'optimizer':optimizer})
This section details the changes you must make to the model function (model_fn()
) to make it TPUEstimator
compatible.
During regular usage TensorFlow attempts to determine the shapes of each tf.Tensor
during graph construction. During execution any unknown shape dimensions are determined dynamically, see Tensor Shapes for more details.
To run on Cloud TPUs TensorFlow models are compiled using XLA. XLA uses a similar system for determining shapes at compile time. XLA requires that all tensor dimensions be statically defined at compile time. All shapes must evaluate to a constant, and not depend on external data, or stateful operations like variables or a random number generator.
Remove any use of tf.summary
from your model.
TensorBoard summaries are a great way see inside your model. A minimal set of basic summaries are automatically recorded by the TPUEstimator
, to event
files in the model_dir
. Custom summaries, however, are currently unsupported when training on a Cloud TPU. So while the TPUEstimator
will still run locally with summaries, it will fail if used on a TPU.
Build your evaluation metrics dictionary in a stand-alone metric_fn
.
Evaluation metrics are an essential part of training a model. These are fully supported on Cloud TPUs, but with a slightly different syntax.
A standard tf.metrics
returns two tensors. The first returns the running average of the metric value, while the second updates the running average and returns the value for this batch:
running_average, current_batch = tf.metrics.accuracy(labels, predictions)
In a standard Estimator
you create a dictionary of these pairs, and return it as part of the EstimatorSpec
.
my_metrics = {'accuracy': tf.metrics.accuracy(labels, predictions)} return tf.estimator.EstimatorSpec( ... eval_metric_ops=my_metrics )
In a TPUEstimator
you instead pass a function (which returns a metrics dictionary) and a list of argument tensors, as shown below:
def my_metric_fn(labels, predictions): return {'accuracy': tf.metrics.accuracy(labels, predictions)} return tf.contrib.tpu.TPUEstimatorSpec( ... eval_metrics=(my_metric_fn, [labels, predictions]) )
TPUEstimatorSpec
TPUEstimatorSpec
do not support hooks, and require function wrappers for some fields.
An Estimator
's model_fn
must return an EstimatorSpec
. An EstimatorSpec
is a simple structure of named fields containing all the tf.Tensors
of the model that the Estimator
may need to interact with.
TPUEstimators
use a tf.contrib.tpu.TPUEstimatorSpec
. There are a few differences between it and a standard tf.estimator.EstimatorSpec
:
eval_metric_ops
must be wrapped into a metrics_fn
, this field is renamed eval_metrics
(see above).scaffold
, if used, must also be wrapped in a function. This field is renamed to scaffold_fn
.Scaffold
and Hooks
are for advanced usage, and can typically be omitted.
Input functions work mainly unchanged as they run on the host computer, not the Cloud TPU itself. This section explains the two necessary adjustments.
The input_fn
for a standard Estimator
can include a params
argument; the input_fn
for a TPUEstimator
must include a params
argument. This is necessary to allow the estimator to set the batch size for each replica of the input stream. So the minimum signature for an input_fn
for a TPUEstimator
is:
def my_input_fn(params): pass
Where params['batch-size']
will contain the batch size.
The input pipeline generated by your input_fn
is run on CPU. So it is mostly free strict static shape requirements imposed by the XLA/TPU environment. The one requirement is that the batches of data fed from your input pipeline to the TPU have a static shape, as determined by the standard TensorFlow shape inference algorithm. Intermediate tensors are free to have a dynamic shapes. If shape inference has failed, but the shape is known it is possible to impose the correct shape using tf.set_shape()
.
In the example below the shape inference algorithm fails, but it is correctly using set_shape
:
>>> x = tf.zeros(tf.constant([1,2,3])+1) >>> x.shape TensorShape([Dimension(None), Dimension(None), Dimension(None)]) >>> x.set_shape([2,3,4])
In many cases the batch size is the only unknown dimension.
A typical input pipeline, using tf.data
, will usually produce batches of a fixed size. The last batch of a finite Dataset
, however, is typically smaller, containing just the remaining elements. Since a Dataset
does not know its own length or finiteness, the standard batch
method cannot determine if all batches will have a fixed size batch on its own:
>>> params = {'batch_size':32} >>> ds = tf.data.Dataset.from_tensors([0, 1, 2]) >>> ds = ds.repeat().batch(params['batch-size']) >>> ds <BatchDataset shapes: (?, 3), types: tf.int32>
The most straightforward fix is to apply tf.contrib.data.batch_and_drop_remainder
as follows:
>>> params = {'batch_size':32} >>> ds = tf.data.Dataset.from_tensors([0, 1, 2]) >>> ds = ds.repeat().apply( ... tf.contrib.data.batch_and_drop_remainder(params['batch-size'])) >>> ds <_RestructuredDataset shapes: (32, 3), types: tf.int32>
The one downside to this approach is that, as the name implies, this batching method throws out any fractional batch at the end of the dataset. This is fine for an infinitely repeating dataset being used for training, but could be a problem if you want to train for an exact number of epochs.
To do an exact 1-epoch of evaluation you can work around this by manually padding the length of the batches, and setting the padding entries to have zero weight when creating your tf.metrics
.
Efficient use of the tf.data.Dataset
API is critical when using a Cloud TPU, as it is impossible to use the Cloud TPU's unless you can feed it data quickly enough. See Input Pipeline Performance Guide for details on dataset performance.
For all but the simplest experimentation (using tf.data.Dataset.from_tensor_slices
or other in-graph data) you will need to store all data files read by the TPUEstimator
's Dataset
in Google Cloud Storage Buckets.
For most use-cases, we recommend converting your data into TFRecord
format and using a tf.data.TFRecordDataset
to read it. This, however, is not a hard requirement and you can use other dataset readers (FixedLengthRecordDataset
or TextLineDataset
) if you prefer.
Small datasets can be loaded entirely into memory using tf.data.Dataset.cache
.
Regardless of the data format used, it is strongly recommended that you use large files, on the order of 100MB. This is especially important in this networked setting as the overhead of opening a file is significantly higher.
It is also important, regardless of the type of reader used, to enable buffering using the buffer_size
argument to the constructor. This argument is specified in bytes. A minimum of a few MB (buffer_size=8*1024*1024
) is recommended so that data is available when needed.
The TPU-demos repo includes a script for downloading the imagenet dataset and converting it to an appropriate format. This together with the imagenet models included in the repo demonstrate all of these best-practices.
For details on how to actually set up and run a Cloud TPU see:
This document is by no means exhaustive. The best source of more detail on how to make a Cloud TPU compatible model are the example models published in:
For more information about tuning TensorFlow code for performance see:
© 2018 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/programmers_guide/using_tpu