View source on GitHub |
Pads sequences to the same length.
tf.keras.preprocessing.sequence.pad_sequences( sequences, maxlen=None, dtype='int32', padding='pre', truncating='pre', value=0.0 )
This function transforms a list (of length num_samples
) of sequences (lists of integers) into a 2D Numpy array of shape (num_samples, num_timesteps)
. num_timesteps
is either the maxlen
argument if provided, or the length of the longest sequence in the list.
Sequences that are shorter than num_timesteps
are padded with value
until they are num_timesteps
long.
Sequences longer than num_timesteps
are truncated so that they fit the desired length.
The position where padding or truncation happens is determined by the arguments padding
and truncating
, respectively. Pre-padding or removing values from the beginning of the sequence is the default.
sequence = [[1], [2, 3], [4, 5, 6]] tf.keras.preprocessing.sequence.pad_sequences(sequence) array([[0, 0, 1], [0, 2, 3], [4, 5, 6]], dtype=int32)
tf.keras.preprocessing.sequence.pad_sequences(sequence, value=-1) array([[-1, -1, 1], [-1, 2, 3], [ 4, 5, 6]], dtype=int32)
tf.keras.preprocessing.sequence.pad_sequences(sequence, padding='post') array([[1, 0, 0], [2, 3, 0], [4, 5, 6]], dtype=int32)
tf.keras.preprocessing.sequence.pad_sequences(sequence, maxlen=2) array([[0, 1], [2, 3], [5, 6]], dtype=int32)
Arguments | |
---|---|
sequences | List of sequences (each sequence is a list of integers). |
maxlen | Optional Int, maximum length of all sequences. If not provided, sequences will be padded to the length of the longest individual sequence. |
dtype | (Optional, defaults to int32). Type of the output sequences. To pad sequences with variable length strings, you can use object . |
padding | String, 'pre' or 'post' (optional, defaults to 'pre'): pad either before or after each sequence. |
truncating | String, 'pre' or 'post' (optional, defaults to 'pre'): remove values from sequences larger than maxlen , either at the beginning or at the end of the sequences. |
value | Float or String, padding value. (Optional, defaults to 0.) |
Returns | |
---|---|
Numpy array with shape (len(sequences), maxlen) |
Raises | |
---|---|
ValueError | In case of invalid values for truncating or padding , or in case of invalid shape for a sequences entry. |
© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r2.3/api_docs/python/tf/keras/preprocessing/sequence/pad_sequences