Perform a quantized matrix multiplication of `a` by the matrix `b` with bias add, ReLU, and requantize fusion.

```
tf.raw_ops.QuantizedMatMulWithBiasAndReluAndRequantize(
    a, b, bias, min_a, max_a, min_b, max_b, min_freezed_output,
    max_freezed_output, Toutput=tf.dtypes.quint8, transpose_a=False,
    transpose_b=False, input_quant_mode='MIN_FIRST', name=None
)
```

The inputs must be two-dimensional matrices and a 1D bias vector, and the inner dimension of `a` (after being transposed if `transpose_a` is non-zero) must match the outer dimension of `b` (after being transposed if `transpose_b` is non-zero). A broadcast add of the bias values is then applied to the matrix multiplication result; the bias size must match the inner dimension of `b`. A ReLU activation then clamps the result to be non-negative, and a final requantize operation produces the uint8 result.

Args:

- `a`: A `Tensor`. Must be one of the following types: `qint8`, `quint8`, `qint32`, `qint16`, `quint16`. A matrix to be multiplied. Must be a two-dimensional tensor of type `quint8`.
- `b`: A `Tensor`. Must be one of the following types: `qint8`, `quint8`, `qint32`, `qint16`, `quint16`. A matrix to be multiplied; must be a two-dimensional tensor of type `qint8`.
- `bias`: A `Tensor`. Must be one of the following types: `float32`, `qint32`. A 1D bias tensor whose size matches the inner dimension of `b` (after being transposed if `transpose_b` is non-zero).
- `min_a`: A `Tensor` of type `float32`. The float value that the lowest quantized `a` value represents.
- `max_a`: A `Tensor` of type `float32`. The float value that the highest quantized `a` value represents.
- `min_b`: A `Tensor` of type `float32`. The float value that the lowest quantized `b` value represents.
- `max_b`: A `Tensor` of type `float32`. The float value that the highest quantized `b` value represents.
- `min_freezed_output`: A `Tensor` of type `float32`. The float value that the lowest quantized output value after requantize represents.
- `max_freezed_output`: A `Tensor` of type `float32`. The float value that the highest quantized output value after requantize represents.
- `Toutput`: An optional `tf.DType` from: `tf.qint8, tf.quint8, tf.qint32, tf.qint16, tf.quint16`. Defaults to `tf.quint8`.
- `transpose_a`: An optional `bool`. Defaults to `False`. If true, `a` is transposed before multiplication.
- `transpose_b`: An optional `bool`. Defaults to `False`. If true, `b` is transposed before multiplication.
- `input_quant_mode`: An optional `string` from: `"MIN_FIRST", "SCALED"`. Defaults to `"MIN_FIRST"`. Input data quantization mode, either `MIN_FIRST` (the default) or `SCALED`.
- `name`: A name for the operation (optional).
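As a rough illustration of what `MIN_FIRST` quantization means for a `quint8` input, the sketch below assumes the common convention that `min_a` maps to quantized 0 and `max_a` maps to 255; the helper name is ours for illustration, not part of the TensorFlow API:

```python
import numpy as np

def quantize_min_first(x, min_x, max_x):
    """Sketch of MIN_FIRST quantization to uint8: min_x -> 0, max_x -> 255."""
    scale = (max_x - min_x) / 255.0          # float value of one quantized step
    q = np.round((x - min_x) / scale)        # affine map onto [0, 255]
    return np.clip(q, 0, 255).astype(np.uint8)

x = np.array([0.0, 100.25, 255.0])
q = quantize_min_first(x, 0.0, 255.0)        # scale is exactly 1.0 here
```

`SCALED` mode instead uses a symmetric scale without this min-anchored offset; consult the `tf.quantization.quantize` documentation for the exact formulas.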

Returns: A tuple of `Tensor` objects `(out, min_out, max_out)`.

- `out`: A `Tensor` of type `Toutput`.
- `min_out`: A `Tensor` of type `float32`.
- `max_out`: A `Tensor` of type `float32`.
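The fused computation can be sketched as a plain float/NumPy reference model. This shows the math only; the raw op executes it as a single fused quantized kernel, and the helper name below is hypothetical, not part of TensorFlow:

```python
import numpy as np

def matmul_bias_relu_requantize(a, b, bias, min_out, max_out):
    """Float reference for the fused op: requantize(relu(a @ b + bias))."""
    acc = a @ b + bias                   # matmul, then broadcast bias add
    acc = np.maximum(acc, 0.0)           # ReLU: clamp negatives to zero
    # Requantize: map [min_out, max_out] onto the uint8 range [0, 255].
    scale = (max_out - min_out) / 255.0
    q = np.clip(np.round((acc - min_out) / scale), 0, 255)
    return q.astype(np.uint8)

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[1.0, 0.0], [0.0, 1.0]])
bias = np.array([0.25, -10.0])
out = matmul_bias_relu_requantize(a, b, bias, 0.0, 255.0)
```

Because `min_freezed_output` and `max_freezed_output` are fixed ("frozen") ahead of time, the requantize step needs no runtime range computation, which is what makes the fusion cheap.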

© 2020 The TensorFlow Authors. All rights reserved.

Licensed under the Creative Commons Attribution License 3.0.

Code samples licensed under the Apache 2.0 License.

https://www.tensorflow.org/versions/r2.4/api_docs/python/tf/raw_ops/QuantizedMatMulWithBiasAndReluAndRequantize