tf.contrib.data.make_batched_features_dataset(
    file_pattern,
    batch_size,
    features,
    reader=tf.data.TFRecordDataset,
    reader_args=None,
    num_epochs=None,
    shuffle=True,
    shuffle_buffer_size=10000,
    shuffle_seed=None,
    prefetch_buffer_size=1,
    reader_num_threads=1,
    parser_num_threads=2,
    sloppy_ordering=False,
    drop_final_batch=False
)
Defined in tensorflow/contrib/data/python/ops/readers.py.

Returns a Dataset of feature dictionaries from Example protos.
Example:

serialized_examples = [
  features {
    feature { key: "age" value { int64_list { value: [ 0 ] } } }
    feature { key: "gender" value { bytes_list { value: [ "f" ] } } }
    feature { key: "kws" value { bytes_list { value: [ "code", "art" ] } } }
  },
  features {
    feature { key: "age" value { int64_list { value: [] } } }
    feature { key: "gender" value { bytes_list { value: [ "f" ] } } }
    feature { key: "kws" value { bytes_list { value: [ "sports" ] } } }
  }
]
We can use arguments:

features: {
  "age": FixedLenFeature([], dtype=tf.int64, default_value=-1),
  "gender": FixedLenFeature([], dtype=tf.string),
  "kws": VarLenFeature(dtype=tf.string),
}
And the expected output is:

{
  "age": [[0], [-1]],
  "gender": [["f"], ["f"]],
  "kws": SparseTensor(
    indices=[[0, 0], [0, 1], [1, 0]],
    values=["code", "art", "sports"],
    dense_shape=[2, 2]),
}
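The expected output above can be reproduced in plain Python. The sketch below hand-rolls the two parsing behaviors at work: a FixedLenFeature([]) with a default_value fills in missing values, and a VarLenFeature collects ragged values into sparse indices/values/dense_shape. This is an illustration of the semantics, not TensorFlow's implementation; the helper names are invented.

```python
# Plain-Python stand-ins for the two serialized Example protos above.
examples = [
    {"age": [0], "gender": ["f"], "kws": ["code", "art"]},
    {"age": [],  "gender": ["f"], "kws": ["sports"]},
]

def batch_fixed_len(examples, key, default_value):
    """FixedLenFeature([]) semantics: one scalar per example,
    missing values replaced by default_value."""
    return [[ex[key][0] if ex[key] else default_value] for ex in examples]

def batch_var_len(examples, key):
    """VarLenFeature semantics: gather ragged values into
    sparse indices, values, and a dense_shape."""
    indices, values, max_len = [], [], 0
    for row, ex in enumerate(examples):
        for col, v in enumerate(ex[key]):
            indices.append([row, col])
            values.append(v)
        max_len = max(max_len, len(ex[key]))
    return {"indices": indices, "values": values,
            "dense_shape": [len(examples), max_len]}

batch = {
    "age": batch_fixed_len(examples, "age", default_value=-1),
    "gender": batch_fixed_len(examples, "gender", default_value=""),
    "kws": batch_var_len(examples, "kws"),
}
# batch["age"]  → [[0], [-1]]
# batch["kws"]  → indices [[0, 0], [0, 1], [1, 0]],
#                 values ["code", "art", "sports"], dense_shape [2, 2]
```

Note how the second example's empty age list becomes -1 via default_value, while kws needs no padding because it is returned as a SparseTensor.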
Args:

file_pattern: List of files or patterns of file paths containing Example records. See tf.gfile.Glob for pattern rules.
batch_size: An int representing the number of consecutive elements of this dataset to combine in a single batch.
features: A dict mapping feature keys to FixedLenFeature or VarLenFeature values. See tf.parse_example.
reader: A function or class that can be called with a filenames tensor and (optional) reader_args, and returns a Dataset of Example tensors. Defaults to tf.data.TFRecordDataset.
reader_args: Additional arguments to pass to the reader class.
num_epochs: Integer specifying the number of times to read through the dataset. If None, cycles through the dataset forever. Defaults to None.
shuffle: A boolean indicating whether the input should be shuffled. Defaults to True.
shuffle_buffer_size: Buffer size of the ShuffleDataset. A large capacity ensures better shuffling but increases memory usage and startup time.
shuffle_seed: Randomization seed to use for shuffling.
prefetch_buffer_size: Number of feature batches to prefetch in order to improve performance. Recommended value is the number of batches consumed per training step. Defaults to 1.
reader_num_threads: Number of threads used to read Example records. If >1, the results will be interleaved.
parser_num_threads: Number of threads to use for parsing Example tensors into a dictionary of Feature tensors.
sloppy_ordering: If True, reading performance will be improved at the cost of non-deterministic ordering. If False, the order of elements produced is deterministic prior to shuffling (elements are still randomized if shuffle=True; note that if the seed is set, the order of elements after shuffling is deterministic). Defaults to False.
drop_final_batch: If True, and the batch size does not evenly divide the input dataset size, the final smaller batch will be dropped. Defaults to False.

Returns:

A dataset of dict elements. Each dict maps feature keys to Tensor or SparseTensor objects.
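The interaction of num_epochs, batch_size, and drop_final_batch determines how many batches the dataset yields. A quick arithmetic sketch, using hypothetical sizes and assuming the epochs are concatenated before batching:

```python
import math

# Hypothetical sizes, chosen only for illustration.
num_examples = 10   # Example records produced per epoch
num_epochs = 3
batch_size = 4

total = num_examples * num_epochs            # 30 elements overall

# drop_final_batch=True: only full batches of exactly batch_size are emitted.
full_batches = total // batch_size           # 7 batches of 4 (2 elements dropped)

# drop_final_batch=False (the default): a final partial batch is kept.
all_batches = math.ceil(total / batch_size)  # 8 batches; the last one has 2 elements
```

Dropping the final batch is useful when downstream code assumes a static batch dimension of exactly batch_size.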
© 2018 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/api_docs/python/tf/contrib/data/make_batched_features_dataset