tf.keras.preprocessing.sequence.skipgrams( sequence, vocabulary_size, window_size=4, negative_samples=1.0, shuffle=True, categorical=False, sampling_table=None, seed=None )
Defined in tensorflow/python/keras/_impl/keras/preprocessing/sequence.py
.
Generates skipgram word pairs.
This function transforms a sequence of word indexes (list of integers) into tuples of words of the form:
Read more about Skipgram in this gnomic paper by Mikolov et al.: Efficient Estimation of Word Representations in Vector Space
sequence
: A word sequence (sentence), encoded as a list of word indices (integers). If using a sampling_table
, word indices are expected to match the rank of the words in a reference dataset (e.g. 10 would encode the 10-th most frequently occurring token). Note that index 0 is expected to be a non-word and will be skipped.vocabulary_size
: Int, maximum possible word index + 1window_size
: Int, size of sampling windows (technically half-window). The window of a word w_i
will be [i - window_size, i + window_size+1]
.negative_samples
: Float >= 0. 0 for no negative (i.e. random) samples. 1 for same number as positive samples.shuffle
: Whether to shuffle the word couples before returning them.categorical
: bool. if False, labels will be integers (eg. [0, 1, 1 .. ]
), if True
, labels will be categorical, e.g. [[1,0],[0,1],[0,1] .. ]
.sampling_table
: 1D array of size vocabulary_size
where the entry i encodes the probability to sample a word of rank i.seed
: Random seed.couples, labels: where `couples` are int pairs and `labels` are either 0 or 1.
By convention, index 0 in the vocabulary is a non-word and will be skipped.
© 2018 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/sequence/skipgrams