Checkpoints input pipeline state every N steps or seconds.
Inherits From: SessionRunHook
tf.data.experimental.CheckpointInputPipelineHook( estimator )
This hook saves the state of the iterators in the Graph so that when training is resumed, the input pipeline continues from where it left off. This can help avoid overfitting in pipelines where the number of training steps per eval is small compared to the dataset size, or where the training pipeline is pre-empted.
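To make the overfitting point concrete, here is a minimal pure-Python sketch. It does not use TensorFlow; `run_training` and its parameters are hypothetical stand-ins, and `position` plays the role of the saved iterator state. It compares restarting the input pipeline at every train call with resuming from the saved position.

```python
def run_training(dataset_size, steps_per_eval, num_rounds, resume):
    """Return the set of example indices seen across all train/eval rounds."""
    seen = set()
    position = 0  # stand-in for the checkpointed iterator state
    for _ in range(num_rounds):
        if not resume:
            # Without an input pipeline checkpoint, each train call
            # starts reading the dataset from the beginning again.
            position = 0
        for _ in range(steps_per_eval):
            seen.add(position % dataset_size)
            position += 1
    return seen


restarted = run_training(dataset_size=100, steps_per_eval=10, num_rounds=5, resume=False)
resumed = run_training(dataset_size=100, steps_per_eval=10, num_rounds=5, resume=True)
print(len(restarted))  # 10: only the first ten examples are ever seen
print(len(resumed))    # 50: each round continues where the last one stopped
```

In this toy setting the restarted pipeline trains on the same ten examples in every round, which is the situation the hook is meant to avoid.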
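Differences from CheckpointSaverHook:
1. Saves only the input pipelines in the "iterators" collection and not the global variables or other saveable objects.
2. Does not write the GraphDef and MetaGraphDef to the summary.
Example of checkpointing the training pipeline: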
est = tf.estimator.Estimator(model_fn)
while True:
  est.train(
      train_input_fn,
      hooks=[tf.data.experimental.CheckpointInputPipelineHook(est)],
      steps=train_steps_per_eval)
  # Note: We do not pass the hook here.
  metrics = est.evaluate(eval_input_fn)
  if should_stop_the_training(metrics):
    break
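This hook should be used if the input pipeline state needs to be saved separately from the model checkpoint. Doing so may be useful for a few reasons:
1. The input pipeline checkpoint may be large, for instance if there are large shuffle or prefetch buffers, and may bloat the checkpoint size.
2. If the input pipeline is shared between training and validation, restoring the checkpoint during validation may override the validation input pipeline.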
To save the input pipeline checkpoint alongside the model weights, use tf.data.experimental.make_saveable_from_iterator directly to create a SaveableObject and add it to the SAVEABLE_OBJECTS collection. Note, however, that you must be careful not to restore the training iterator during eval. You can do that by not adding the iterator to the SAVEABLE_OBJECTS collection when building the eval graph.
Args | |
---|---|
estimator | Estimator. |
Raises | |
---|---|
ValueError | One of save_steps or save_secs should be set. |
ValueError | At most one of saver or scaffold should be set. |
after_create_session
after_create_session( session, coord )
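Called when a new TensorFlow session is created.
This is called to signal the hooks that a new session has been created. This differs in two essential ways from the situation in which begin is called:
* When this is called, the graph is finalized and ops can no longer be added to the graph.
* This method will also be called as a result of recovering a wrapped session, not only at the beginning of the overall session.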
Args | |
---|---|
session | A TensorFlow Session that has been created. |
coord | A Coordinator object which keeps track of all threads. |
after_run
after_run( run_context, run_values )
Called after each call to run().
The run_values argument contains the results of the ops/tensors requested by before_run().
The run_context argument is the same one sent to the before_run call. run_context.request_stop() can be called to stop the iteration.
If session.run() raises any exception, then after_run() is not called.
Args | |
---|---|
run_context | A SessionRunContext object. |
run_values | A SessionRunValues object. |
before_run
before_run( run_context )
Called before each call to run().
You can return from this call a SessionRunArgs object indicating ops or tensors to add to the upcoming run() call. These ops/tensors will be run together with the ops/tensors passed to the original run() call. The run args you return can also contain feeds to be added to the run() call.
The run_context argument is a SessionRunContext that provides information about the upcoming run() call: the originally requested ops/tensors and the TensorFlow Session.
At this point the graph is finalized and you cannot add ops.
Args | |
---|---|
run_context | A SessionRunContext object. |
Returns | |
---|---|
None or a SessionRunArgs object. |
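The interplay between before_run and after_run can be sketched in plain Python. This is an illustration only, not TensorFlow's implementation: `RunArgs`, `RunValues`, `StepCounterHook`, `hooked_run`, and `evaluate` are hypothetical stand-ins for SessionRunArgs, SessionRunValues, a hook, the wrapped run() call, and session.run() respectively.

```python
from collections import namedtuple

# Hypothetical stand-ins for SessionRunArgs / SessionRunValues.
RunArgs = namedtuple("RunArgs", ["fetches"])
RunValues = namedtuple("RunValues", ["results"])


class StepCounterHook:
    """Toy hook that asks for one extra value on every run call."""

    def __init__(self):
        self.last_step = None

    def before_run(self, run_context):
        # Request an additional fetch alongside the caller's fetches.
        return RunArgs(fetches=["global_step"])

    def after_run(self, run_context, run_values):
        # Receives only the results of the fetches this hook requested.
        self.last_step = run_values.results[0]


def hooked_run(hook, user_fetches, evaluate):
    """Evaluate user fetches together with the fetches the hook requested."""
    request = hook.before_run(run_context=None)
    hook_fetches = list(request.fetches) if request is not None else []
    # A single combined evaluation covers both sets of fetches.
    results = evaluate(list(user_fetches) + hook_fetches)
    user_results = results[:len(user_fetches)]
    hook_results = results[len(user_fetches):]
    hook.after_run(run_context=None, run_values=RunValues(results=hook_results))
    return user_results


values = {"loss": 0.25, "global_step": 7}


def evaluate(names):
    """Toy replacement for session.run: look up each name in a dict."""
    return [values[name] for name in names]


hook = StepCounterHook()
out = hooked_run(hook, ["loss"], evaluate)
print(out, hook.last_step)  # [0.25] 7
```

The caller still gets back only what it asked for, while the hook sees the extra values it requested.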
begin
begin()
Called once before using the session.
When called, the default graph is the one that will be launched in the session. The hook can modify the graph by adding new operations to it. After the begin() call the graph will be finalized, and the other callbacks cannot modify the graph anymore. A second call to begin() on the same graph should not change the graph.
end
end( session )
Called at the end of session.
The session argument can be used in case the hook wants to run final ops, such as saving a last checkpoint.
If session.run() raises an exception other than OutOfRangeError or StopIteration, then end() is not called. Note the difference between end() and after_run() behavior when session.run() raises OutOfRangeError or StopIteration: in that case end() is called but after_run() is not.
Args | |
---|---|
session | A TensorFlow Session that will soon be closed. |
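The callback ordering described in the method sections above can be summarized with a toy driver loop. This is a pure-Python sketch under stated assumptions, not TensorFlow code: `OutOfRange` stands in for OutOfRangeError, and `RecordingHook`, `drive`, and `step_fn` are hypothetical names used only for illustration.

```python
class OutOfRange(Exception):
    """Stand-in for OutOfRangeError (hypothetical, for illustration)."""


class RecordingHook:
    """Records the order in which its callbacks are invoked."""

    def __init__(self):
        self.calls = []

    def begin(self):
        self.calls.append("begin")

    def after_create_session(self, session, coord):
        self.calls.append("after_create_session")

    def before_run(self, run_context):
        self.calls.append("before_run")

    def after_run(self, run_context, run_values):
        self.calls.append("after_run")

    def end(self, session):
        self.calls.append("end")


def drive(hook, step_fn, max_steps):
    """Toy driver loop that mimics the documented callback ordering."""
    hook.begin()
    session = object()  # placeholder for a real session
    hook.after_create_session(session, coord=None)
    for step in range(max_steps):
        hook.before_run(run_context=None)
        try:
            values = step_fn(step)
        except (OutOfRange, StopIteration):
            # End of input: after_run is skipped, but end is still called.
            break
        hook.after_run(run_context=None, run_values=values)
    # Any other exception propagates before this line, so end is skipped.
    hook.end(session)


def step_fn(step):
    """Toy run call that exhausts its input on the third step."""
    if step == 2:
        raise OutOfRange()
    return step


hook = RecordingHook()
drive(hook, step_fn, max_steps=5)
print(hook.calls)
```

The final before_run has no matching after_run, yet end still appears at the end of the recorded list, which gives a hook the chance to save a last checkpoint when the input is exhausted.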
© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/data/experimental/CheckpointInputPipelineHook