String to Id table that assigns out-of-vocabulary keys to hash buckets.
tf.lookup.StaticVocabularyTable( initializer, num_oov_buckets, lookup_key_dtype=None, name=None )
For example, if an instance of StaticVocabularyTable
is initialized with a string-to-id initializer that maps:
init = tf.lookup.KeyValueTensorInitializer(
    keys=tf.constant(['emerson', 'lake', 'palmer']),
    values=tf.constant([0, 1, 2], dtype=tf.int64))
table = tf.lookup.StaticVocabularyTable(init, num_oov_buckets=5)
The StaticVocabularyTable object performs the following mapping:
emerson -> 0
lake -> 1
palmer -> 2
<other term> -> bucket_id
where bucket_id is between 3 and 3 + num_oov_buckets - 1 = 7, calculated as hash(<term>) % num_oov_buckets + vocab_size.
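The mapping rule above can be modeled in plain Python. This is a minimal sketch, not TensorFlow's implementation: `toy_hash` is a hypothetical stand-in built on `hashlib.md5`, so the bucket a given out-of-vocabulary term lands in will generally differ from what the real table returns. Only the range arithmetic is meant to carry over.

```python
import hashlib

vocab = {"emerson": 0, "lake": 1, "palmer": 2}
num_oov_buckets = 5


def toy_hash(term: str) -> int:
    """Hypothetical stand-in hash; NOT the hash TensorFlow uses."""
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def lookup(term: str) -> int:
    """Return the vocabulary id, or an OOV bucket id for unknown terms."""
    if term in vocab:
        return vocab[term]
    # OOV ids start right after the last vocabulary id.
    return toy_hash(term) % num_oov_buckets + len(vocab)


ids = [lookup(t) for t in ["emerson", "lake", "palmer", "king", "crimson"]]
print(ids[:3])  # [0, 1, 2]
print(all(3 <= i <= 7 for i in ids[3:]))  # True
```

Because the bucket id depends only on the term's hash, the same unknown term always maps to the same id, while two different unknown terms may collide in one bucket.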
input_tensor = tf.constant(["emerson", "lake", "palmer", "king", "crimson"])
table[input_tensor].numpy()  # array([0, 1, 2, 6, 7])
If initializer is None, only out-of-vocabulary buckets are used.
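As a small illustration of the initializer-free case, the sketch below builds a table from `None` and looks up a few strings. The bucket count of 4 and the sample keys are arbitrary choices for this example; the exact ids depend on the hash, so only their range is checked.

```python
import tensorflow as tf

# No initializer: every key is treated as out-of-vocabulary.
num_oov_buckets = 4
table = tf.lookup.StaticVocabularyTable(None, num_oov_buckets=num_oov_buckets)

keys = tf.constant(["emerson", "lake", "palmer"])
ids = table.lookup(keys).numpy()

# With no vocabulary, each id is a bucket index in [0, num_oov_buckets).
in_range = all(0 <= int(i) < num_oov_buckets for i in ids)
print(ids, in_range)
```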
import tempfile
num_oov_buckets = 3
vocab = ["emerson", "lake", "palmer", "crimnson"]
f = tempfile.NamedTemporaryFile(delete=False)
f.write('\n'.join(vocab).encode('utf-8'))
f.close()
init = tf.lookup.TextFileInitializer(
    f.name,
    key_dtype=tf.string,
    key_index=tf.lookup.TextFileIndex.WHOLE_LINE,
    value_dtype=tf.int64,
    value_index=tf.lookup.TextFileIndex.LINE_NUMBER)
table = tf.lookup.StaticVocabularyTable(init, num_oov_buckets)
table.lookup(tf.constant(["palmer", "crimnson", "king", "tarkus", "black", "moon"])).numpy()
# array([2, 3, 5, 6, 6, 4])
The hash function used to generate out-of-vocabulary bucket IDs is Fingerprint64.
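One way to check the bucket arithmetic by hand is sketched below. It assumes that `tf.strings.to_hash_bucket_fast` applies the same fingerprint as the table; this page does not state that equivalence, so treat it as an assumption to verify against your TensorFlow version.

```python
import tensorflow as tf

init = tf.lookup.KeyValueTensorInitializer(
    keys=tf.constant(["emerson", "lake", "palmer"]),
    values=tf.constant([0, 1, 2], dtype=tf.int64))
num_oov_buckets = 5
vocab_size = 3
table = tf.lookup.StaticVocabularyTable(init, num_oov_buckets)

oov_keys = tf.constant(["king", "crimson"])

# Assumption: to_hash_bucket_fast uses the same fingerprint as the table.
# Shift the bucket index past the in-vocabulary ids.
manual_ids = tf.strings.to_hash_bucket_fast(oov_keys, num_oov_buckets) + vocab_size

table_ids = table.lookup(oov_keys)
print(manual_ids.numpy(), table_ids.numpy())
```

If the assumption holds, both printed arrays are identical and every id falls between 3 and 7.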
Args | |
---|---|
initializer | A TableInitializerBase object that contains the data used to initialize the table. If None, then we only use out-of-vocab buckets. |
num_oov_buckets | Number of buckets to use for out-of-vocabulary keys. Must be greater than zero. |
lookup_key_dtype | Data type of keys passed to lookup. Defaults to initializer.key_dtype if initializer is specified, otherwise tf.string. Must be string or integer, and must be castable to initializer.key_dtype. |
name | A name for the operation (optional). |
Raises | |
---|---|
ValueError | when num_oov_buckets is not positive. |
TypeError | when lookup_key_dtype or initializer.key_dtype are not integer or string. Also when initializer.value_dtype != int64. |
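The two error conditions in the table above can be triggered as in the following sketch. The flag names `raised_value_error` and `raised_type_error` are introduced here purely for illustration.

```python
import tensorflow as tf

# num_oov_buckets must be positive, so zero is rejected.
try:
    tf.lookup.StaticVocabularyTable(None, num_oov_buckets=0)
    raised_value_error = False
except ValueError:
    raised_value_error = True

# Values must be int64; an initializer with int32 values is rejected.
bad_init = tf.lookup.KeyValueTensorInitializer(
    keys=tf.constant(["emerson", "lake"]),
    values=tf.constant([0, 1], dtype=tf.int32))
try:
    tf.lookup.StaticVocabularyTable(bad_init, num_oov_buckets=1)
    raised_type_error = False
except TypeError:
    raised_type_error = True

print(raised_value_error, raised_type_error)
```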
Attributes | |
---|---|
key_dtype | The table key dtype. |
name | The name of the table. |
resource_handle | Returns the resource handle associated with this Resource. |
value_dtype | The table value dtype. |
lookup
lookup( keys, name=None )
Looks up keys in the table and outputs the corresponding values.
It assigns out-of-vocabulary keys to buckets based on their hashes.
Args | |
---|---|
keys | Keys to look up. May be a SparseTensor, a RaggedTensor, or a dense Tensor. |
name | Optional name for the op. |
Returns | |
---|---|
A SparseTensor if keys are sparse, a RaggedTensor if keys are ragged, otherwise a dense Tensor. |
Raises | |
---|---|
TypeError | when keys doesn't match the table key data type. |
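The sketch below exercises `lookup` with ragged and sparse keys, and with keys of the wrong dtype. The sample keys and shapes are arbitrary; "king" is out of vocabulary, so only the range of its id is checked.

```python
import tensorflow as tf

init = tf.lookup.KeyValueTensorInitializer(
    keys=tf.constant(["emerson", "lake", "palmer"]),
    values=tf.constant([0, 1, 2], dtype=tf.int64))
table = tf.lookup.StaticVocabularyTable(init, num_oov_buckets=5)

# Ragged keys produce a RaggedTensor with the same row structure.
ragged_keys = tf.ragged.constant([["emerson", "king"], ["lake"]])
ragged_ids = table.lookup(ragged_keys)

# Sparse keys produce a SparseTensor with the same indices and shape.
sparse_keys = tf.sparse.SparseTensor(
    indices=[[0, 0], [1, 2]],
    values=tf.constant(["palmer", "lake"]),
    dense_shape=[2, 3])
sparse_ids = table.lookup(sparse_keys)

# Keys whose dtype differs from the table's key dtype raise TypeError.
try:
    table.lookup(tf.constant([1, 2], dtype=tf.int64))
    raised_type_error = False
except TypeError:
    raised_type_error = True

print(ragged_ids.to_list(), sparse_ids.values.numpy(), raised_type_error)
```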
size
size( name=None )
Compute the number of elements in this table.
__getitem__
__getitem__( keys )
Looks up keys in the table and outputs the corresponding values.
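A short sketch of `size` and `__getitem__` together follows. The size is expected to count the out-of-vocabulary buckets as well as the vocabulary entries; this page does not say so explicitly, so treat that as an assumption to confirm in your TensorFlow version.

```python
import tensorflow as tf

init = tf.lookup.KeyValueTensorInitializer(
    keys=tf.constant(["emerson", "lake", "palmer"]),
    values=tf.constant([0, 1, 2], dtype=tf.int64))
table = tf.lookup.StaticVocabularyTable(init, num_oov_buckets=5)

keys = tf.constant(["emerson", "king"])

# Indexing the table is shorthand for calling lookup().
by_index = table[keys].numpy()
by_lookup = table.lookup(keys).numpy()

# size() returns a scalar tensor.
# Assumption: 3 vocabulary entries + 5 OOV buckets are both counted.
n = int(table.size().numpy())
print(by_index, by_lookup, n)
```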
© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r2.4/api_docs/python/tf/lookup/StaticVocabularyTable