Transforms a spectrogram into a form that's useful for speech recognition.
tf.raw_ops.Mfcc( spectrogram, sample_rate, upper_frequency_limit=4000, lower_frequency_limit=20, filterbank_channel_count=40, dct_coefficient_count=13, name=None )
Mel Frequency Cepstral Coefficients are a way of representing audio data that's been effective as an input feature for machine learning. They are created by taking the spectrum of a spectrogram (a 'cepstrum'), and discarding some of the higher frequencies that are less significant to the human ear. They have a long history in the speech recognition world, and https://en.wikipedia.org/wiki/Mel-frequency_cepstrum is a good resource to learn more.
Args | |
---|---|
spectrogram | A Tensor of type float32 . Typically produced by the Spectrogram op, with magnitude_squared set to true. |
sample_rate | A Tensor of type int32 . How many samples per second the source audio used. |
upper_frequency_limit | An optional float . Defaults to 4000 . The highest frequency to use when calculating the ceptstrum. |
lower_frequency_limit | An optional float . Defaults to 20 . The lowest frequency to use when calculating the ceptstrum. |
filterbank_channel_count | An optional int . Defaults to 40 . Resolution of the Mel bank used internally. |
dct_coefficient_count | An optional int . Defaults to 13 . How many output channels to produce per time slice. |
name | A name for the operation (optional). |
Returns | |
---|---|
A Tensor of type float32 . |
© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r2.3/api_docs/python/tf/raw_ops/Mfcc