Utilities for text input preprocessing.
class Tokenizer
: Text tokenization utility class.
hashing_trick(...)
: Converts a text to a sequence of indexes in a fixed-size hashing space.
one_hot(...)
: One-hot encodes a text into a list of word indexes of size n
.
text_to_word_sequence(...)
: Converts a text to a sequence of words (or tokens).
tokenizer_from_json(...)
: Parses a JSON tokenizer configuration file and returns a
© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r2.3/api_docs/python/tf/keras/preprocessing/text