One-hot encodes a text into a list of word indexes, drawn from a vocabulary of size n.
tf.keras.preprocessing.text.one_hot( text, n, filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n', lower=True, split=' ' )
This is a wrapper around the hashing_trick function, using hash as the hashing function; uniqueness of the word-to-index mapping is not guaranteed.
text: Input text (string).
n: int. Size of vocabulary.
filters: list (or concatenation) of characters to filter out, such as punctuation. Default: ``!"#$%&()*+,-./:;<=>?@[\]^_`{|}~\t\n``, includes basic punctuation, tabs, and newlines.
lower: boolean. Whether to set the text to lowercase.
split: str. Separator for word splitting.
Returns a list of integers in [1, n]. Each integer encodes a word (uniqueness is not guaranteed, so two different words may share an index).
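The hashing approach described above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: sketch_one_hot and stable_hash are hypothetical names, md5 is swapped in for Python's built-in hash so that results repeat across runs, and the choice to map words into 1..n-1 (leaving 0 unused) is an assumption that stays within the documented [1, n] range.

```python
import hashlib

# Same default filter characters as in the signature above.
DEFAULT_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def sketch_one_hot(text, n, filters=DEFAULT_FILTERS, lower=True, split=' '):
    """Hypothetical stand-in for one_hot, for illustration only."""
    if lower:
        text = text.lower()
    # Replace every filtered character with the separator, then tokenize.
    table = str.maketrans({ch: split for ch in filters})
    words = [w for w in text.translate(table).split(split) if w]

    def stable_hash(word):
        # Assumption: md5 replaces the built-in hash for reproducibility.
        return int(hashlib.md5(word.encode('utf-8')).hexdigest(), 16)

    # Assumption: index 0 is left unused, so words map into 1..n-1.
    return [stable_hash(w) % (n - 1) + 1 for w in words]


# Repeated words always receive the same index.
encoded = sketch_one_hot("The cat sat on the mat.", n=50)

# With n=3 only two indexes are available, so three distinct words
# must share at least one index (a collision).
collided = sketch_one_hot("alpha beta gamma", n=3)
```

Here encoded has six entries, and the two occurrences of "the" share one index. The n=3 call shows why uniqueness cannot be guaranteed: a vocabulary size that is small relative to the number of distinct words forces collisions, so n is usually chosen well above the expected vocabulary.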
© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/keras/preprocessing/text/one_hot