MonitoredSession
Defined in tensorflow/python/training/monitored_session.py.
See the guides: Threading and Queues > Queue usage overview, Training > Distributed execution
Session-like object that handles initialization, recovery and hooks.
Example usage:
    saver_hook = CheckpointSaverHook(...)
    summary_hook = SummarySaverHook(...)
    with MonitoredSession(session_creator=ChiefSessionCreator(...),
                          hooks=[saver_hook, summary_hook]) as sess:
        while not sess.should_stop():
            sess.run(train_op)
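Initialization: At creation time, the monitored session does the following, in the given order:
* calls `hook.begin()` for each given hook
* finalizes the graph via `scaffold.finalize()`
* creates the session
* initializes the model via initialization ops provided by `Scaffold`
* restores variables if a checkpoint exists
* launches queue runners
* calls `hook.after_create_session()`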
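The creation-time ordering can be pictured with a small pure-Python model. This is only a sketch under stated assumptions: `RecordingHook`, `ToyScaffold`, and `create_toy_session` are hypothetical names invented for illustration and are not TensorFlow APIs, and the variable initialization, checkpoint restore, and queue-runner steps are reduced to a comment.

```python
# Toy model of the creation-time ordering. None of these names are TensorFlow APIs.


class RecordingHook:
    """Hypothetical hook that appends every callback it receives to a shared log."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def begin(self):
        self.log.append(f"{self.name}.begin")

    def after_create_session(self, session, coord):
        self.log.append(f"{self.name}.after_create_session")


class ToyScaffold:
    """Hypothetical stand-in for a Scaffold; it only records that finalize() ran."""

    def __init__(self, log):
        self.log = log

    def finalize(self):
        self.log.append("scaffold.finalize")


def create_toy_session(hooks, scaffold, log):
    # 1. Every hook receives begin() before the graph is finalized.
    for hook in hooks:
        hook.begin()
    # 2. The scaffold finalizes the graph.
    scaffold.finalize()
    # 3. The session is created. The real class would now initialize or
    #    restore the model and launch queue runners; that is omitted here.
    session = object()
    log.append("create_session")
    # 4. Hooks are told that the session exists (coord is a placeholder here).
    for hook in hooks:
        hook.after_create_session(session, None)
    return session


log = []
hooks = [RecordingHook("saver", log), RecordingHook("summary", log)]
create_toy_session(hooks, ToyScaffold(log), log)
print(log)
```

In this model every hook's `begin()` runs before `scaffold.finalize()`, so a hook still has a chance to add ops to the graph, while `after_create_session()` only runs once a session object exists.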
Run: When `run()` is called, the monitored session does the following:
* calls `hook.before_run()`
* calls TensorFlow `session.run()` with merged fetches and feed_dict
* calls `hook.after_run()`
* returns the result of `session.run()` asked for by the user
* if `AbortedError` or `UnavailableError` occurs, recovers or reinitializes the session before executing the `run()` call again

Exit: At `close()`, the monitored session does the following, in order:
* calls `hook.end()`
* closes the queue runners and the session
* suppresses the `OutOfRange` error, which indicates that all inputs have been processed, if the monitored session is used as a context

How to set `tf.Session` arguments:
* In most cases you can set session arguments as follows:

    MonitoredSession(
        session_creator=ChiefSessionCreator(master=..., config=...))

* In a distributed setting, for a non-chief worker, you can use the following:

    MonitoredSession(
        session_creator=WorkerSessionCreator(master=..., config=...))

See `MonitoredTrainingSession` for an example usage based on chief or worker.
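The recovery and exit behaviour described above can be modelled in plain Python. The sketch below is an illustration under assumptions, not TensorFlow's implementation: `ToyAbortedError`, `ToyOutOfRangeError`, `FlakySession`, `ToyMonitoredSession`, and `make_factory` are hypothetical stand-ins, and hooks are left out to keep the retry loop and the exception suppression visible.

```python
# Toy model of recovery on run() and error suppression on exit.
# None of these names are TensorFlow APIs.


class ToyAbortedError(Exception):
    """Stand-in for AbortedError / UnavailableError."""


class ToyOutOfRangeError(Exception):
    """Stand-in for the OutOfRange error raised when inputs are exhausted."""


class FlakySession:
    """Hypothetical raw session whose first run() call may fail."""

    def __init__(self, fail_first):
        self.fail_first = fail_first

    def run(self, fetches):
        if self.fail_first:
            self.fail_first = False
            raise ToyAbortedError("worker was preempted")
        return fetches * 2


class ToyMonitoredSession:
    def __init__(self, session_factory):
        self._session_factory = session_factory
        self._session = session_factory()
        self.sessions_created = 1
        self.closed = False

    def run(self, fetches):
        while True:
            try:
                return self._session.run(fetches)
            except ToyAbortedError:
                # Recover: build a fresh session, then retry the same call.
                self._session = self._session_factory()
                self.sessions_created += 1

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        # Returning True suppresses the exception; only the
        # "inputs exhausted" error is swallowed here.
        return exc_type is not None and issubclass(exc_type, ToyOutOfRangeError)


def make_factory():
    created = []

    def factory():
        # Only the very first session is flaky.
        session = FlakySession(fail_first=not created)
        created.append(session)
        return session

    return factory


with ToyMonitoredSession(make_factory()) as sess:
    result = sess.run(21)
    recreated = sess.sessions_created
    raise ToyOutOfRangeError("all inputs processed")

print(result, recreated, sess.closed)
```

The caller issues a single `run()` call and never sees the preemption, and the end-of-input error raised inside the `with` block does not propagate past it.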
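Note: This is not a `tf.Session`. For example, it cannot do the following:
* it cannot be set as the default session
* it cannot be sent to `saver.save`
* it cannot be sent to `tf.train.start_queue_runners`
Args:
* `session_creator`: A factory object to create the session. Typically a `ChiefSessionCreator`, which is the default.
* `hooks`: An iterable of `SessionRunHook` objects.
Returns:
A `MonitoredSession` object.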
graph
The graph that was launched in this session.
__init__
__init__( session_creator=None, hooks=None, stop_grace_period_secs=120 )
Sets up a Monitored or Hooked Session.
Args:
* `session_creator`: A factory object to create the session. Typically a `ChiefSessionCreator` or a `WorkerSessionCreator`.
* `hooks`: An iterable of `SessionRunHook` objects.
* `should_recover`: A bool. Indicates whether to recover from `AbortedError` and `UnavailableError`.
* `stop_grace_period_secs`: Number of seconds given to threads to stop after `close()` has been called.
__enter__
__enter__()
__exit__
__exit__( exception_type, exception_value, traceback )
close
close()
run
run( fetches, feed_dict=None, options=None, run_metadata=None )
Run ops in the monitored session.
This method is completely compatible with the `tf.Session.run()` method.
Args:
* `fetches`: Same as `tf.Session.run()`.
* `feed_dict`: Same as `tf.Session.run()`.
* `options`: Same as `tf.Session.run()`.
* `run_metadata`: Same as `tf.Session.run()`.
Returns:
Same as `tf.Session.run()`.
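One way to picture how a single `run()` call serves both the caller and the hooks is the plain-Python model below. It is a sketch under assumptions: `toy_session_run`, `StepLoggingHook`, and `monitored_run` are hypothetical names, and the `SessionRunArgs` and `SessionRunValues` objects that real hooks exchange with the session are reduced to plain lists and dictionaries.

```python
# Toy model of merging hook fetches with the caller's fetches.
# None of these names are TensorFlow APIs.


def toy_session_run(fetches, values):
    """Hypothetical raw session: looks each requested name up in `values`."""
    return {name: values[name] for name in fetches}


class StepLoggingHook:
    """Hypothetical hook that asks for 'global_step' on every run."""

    def __init__(self):
        self.seen_steps = []

    def before_run(self):
        # Extra fetches requested by the hook.
        return ["global_step"]

    def after_run(self, results):
        self.seen_steps.append(results["global_step"])


def monitored_run(user_fetches, hooks, values):
    # Collect what each hook wants and merge it with the caller's fetches.
    hook_fetches = [hook.before_run() for hook in hooks]
    merged = list(user_fetches)
    for wanted in hook_fetches:
        merged.extend(name for name in wanted if name not in merged)
    # One underlying run() call serves the caller and all hooks.
    outputs = toy_session_run(merged, values)
    # Each hook only sees the results it asked for.
    for hook, wanted in zip(hooks, hook_fetches):
        hook.after_run({name: outputs[name] for name in wanted})
    # The caller only gets back what it asked for.
    return {name: outputs[name] for name in user_fetches}


hook = StepLoggingHook()
graph_values = {"loss": 0.25, "global_step": 7}
result = monitored_run(["loss"], [hook], graph_values)
print(result, hook.seen_steps)
```

The caller's return value has the same shape it would have without any hooks installed, which is what makes the method a drop-in replacement from the caller's point of view.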
run_step_fn
run_step_fn(step_fn)
Run ops using a step function.
Args:
* `step_fn`: A function or a method with a single argument of type `StepContext`. The function may use methods of the argument to perform computations with access to a raw session.
The returned value of `step_fn` will be returned from `run_step_fn`, unless a stop is requested. In that case, the next `should_stop` call will return True.
Example usage:
```python
import tensorflow as tf

with tf.Graph().as_default():
    c = tf.placeholder(tf.float32)
    v = tf.add(c, 4.0)
    w = tf.add(c, 0.5)

    def step_fn(step_context):
        # Run directly on the raw session, bypassing the hooks.
        a = step_context.session.run(fetches=v, feed_dict={c: 0.5})
        if a <= 4.5:
            # Raises StopIteration, so the return below is skipped.
            step_context.request_stop()
        # Run through the hooks, as MonitoredSession.run() would.
        return step_context.run_with_hooks(fetches=w, feed_dict={c: 0.1})

    with tf.train.MonitoredSession() as session:
        while not session.should_stop():
            a = session.run_step_fn(step_fn)
```
Hooks interact with the `run_with_hooks()` call inside the `step_fn` as they do with a `MonitoredSession.run` call.
Returns:
The returned value of `step_fn`.
Raises:
* `StopIteration`: if `step_fn` has called `request_stop()`. It may be caught by `with tf.train.MonitoredSession()` to close the session.
* `ValueError`: if `step_fn` doesn't have a single argument called `step_context`. It may also optionally have `self` for cases when it belongs to an object.
should_stop
should_stop()
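The interplay between `request_stop()`, `StopIteration`, and `should_stop()` can be modelled in plain Python as follows. This is a hedged sketch: `ToyStepContext` and `ToySession` are hypothetical classes written for illustration, and the real classes do considerably more (hooks, coordination, recovery).

```python
# Toy model of stop signalling between a step function and its session.
# None of these names are TensorFlow APIs.


class ToyStepContext:
    def __init__(self, owner):
        self._owner = owner

    def request_stop(self):
        # Record the request, then abort the step function immediately.
        self._owner._stop_requested = True
        raise StopIteration("step_fn has requested the iterations to stop.")


class ToySession:
    def __init__(self):
        self._stop_requested = False
        self.closed = False

    def should_stop(self):
        return self._stop_requested

    def run_step_fn(self, step_fn):
        return step_fn(ToyStepContext(self))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.closed = True
        # A StopIteration raised by request_stop() ends the block quietly.
        return exc_type is StopIteration


steps = []


def step_fn(step_context):
    steps.append(len(steps))
    if len(steps) == 3:
        step_context.request_stop()
    return steps[-1]


with ToySession() as session:
    while not session.should_stop():
        session.run_step_fn(step_fn)

print(steps, session.should_stop(), session.closed)
```

In this model the third step requests a stop, the resulting `StopIteration` leaves the loop and is absorbed by the context manager, and a later `should_stop()` call reports that a stop was requested.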
© 2018 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/api_docs/python/tf/train/MonitoredSession