Perform a quantized matrix multiplication of a by the matrix b with bias add and relu and requantize fusion.

tf.raw_ops.QuantizedMatMulWithBiasAndReluAndRequantize(
    a, b, bias, min_a, max_a, min_b, max_b,
    min_freezed_output, max_freezed_output,
    Toutput=tf.dtypes.quint8, transpose_a=False, transpose_b=False,
    input_quant_mode='MIN_FIRST', name=None
)
The inputs must be two-dimensional matrices and a 1D bias vector. The inner dimension of a (after being transposed if transpose_a is non-zero) must match the outer dimension of b (after being transposed if transpose_b is non-zero). The bias values are then added to the matrix multiplication result with broadcasting; the bias size must match the inner dimension of b. A relu activation is applied to produce a non-negative result, and a final requantize operation yields the uint8 result.
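For orientation, here is a minimal end-to-end sketch. The matrix values, quantization ranges, and the frozen output window below are made-up illustration values, and this fused kernel is typically only registered in TensorFlow builds with oneDNN/MKL support, so treat it as a sketch whose availability depends on your build.

```python
import tensorflow as tf

# Example shapes (made up for illustration): a is [2, 3], b is [3, 4], bias is [4].
a_float = tf.constant([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0]])
b_float = tf.constant([[1.0, 0.0, 1.0, 0.0],
                       [0.0, 1.0, 0.0, 1.0],
                       [1.0, 1.0, 0.0, 0.0]])
bias = tf.constant([0.5, -0.5, 1.0, 0.0])  # float32, size matches inner dimension of b

# Quantize a to quint8 (MIN_FIRST) and b to qint8 (SCALED), the dtypes this op expects.
a_q, min_a, max_a = tf.quantization.quantize(a_float, 0.0, 6.0, tf.quint8, mode="MIN_FIRST")
b_q, min_b, max_b = tf.quantization.quantize(b_float, -1.0, 1.0, tf.qint8, mode="SCALED")

# The frozen output window for requantization is hand-picked here;
# in practice it normally comes from calibration of the float model.
out, min_out, max_out = tf.raw_ops.QuantizedMatMulWithBiasAndReluAndRequantize(
    a=a_q, b=b_q, bias=bias,
    min_a=min_a, max_a=max_a,
    min_b=min_b, max_b=max_b,
    min_freezed_output=tf.constant(0.0),
    max_freezed_output=tf.constant(20.0),
    Toutput=tf.quint8)

print(out)                # quint8 result after matmul + bias add + relu + requantize
print(min_out, max_out)   # float range represented by the quint8 output
```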
Args:
  a: A Tensor. Must be one of the following types: qint8, quint8, qint32, qint16, quint16. A matrix to be multiplied. Must be a two-dimensional tensor of type quint8.
  b: A Tensor. Must be one of the following types: qint8, quint8, qint32, qint16, quint16. A matrix to be multiplied; must be a two-dimensional tensor of type qint8.
  bias: A Tensor. Must be one of the following types: float32, qint32. A 1D bias tensor whose size matches the inner dimension of b (after being transposed if transpose_b is non-zero).
  min_a: A Tensor of type float32. The float value that the lowest quantized a value represents.
  max_a: A Tensor of type float32. The float value that the highest quantized a value represents.
  min_b: A Tensor of type float32. The float value that the lowest quantized b value represents.
  max_b: A Tensor of type float32. The float value that the highest quantized b value represents.
  min_freezed_output: A Tensor of type float32. The float value that the lowest quantized output value after requantize represents.
  max_freezed_output: A Tensor of type float32. The float value that the highest quantized output value after requantize represents.
  Toutput: An optional tf.DType from: tf.qint8, tf.quint8, tf.qint32, tf.qint16, tf.quint16. Defaults to tf.quint8.
  transpose_a: An optional bool. Defaults to False. If true, a is transposed before multiplication.
  transpose_b: An optional bool. Defaults to False. If true, b is transposed before multiplication.
  input_quant_mode: An optional string from: "MIN_FIRST", "SCALED". Defaults to "MIN_FIRST". Input data quantization mode, either MIN_FIRST (default) or SCALED; see the sketch after this argument list.
  name: A name for the operation (optional).
Returns: A tuple of Tensor objects (out, min_out, max_out).
  out: A Tensor of type Toutput.
  min_out: A Tensor of type float32.
  max_out: A Tensor of type float32.
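When a float view of the result is needed downstream, min_out and max_out can be passed to tf.quantization.dequantize. A minimal sketch, using a stand-in quantized tensor in place of a real op output:

```python
import tensorflow as tf

# Stand-in for the (out, min_out, max_out) tuple this op returns.
out, min_out, max_out = tf.quantization.quantize(
    tf.constant([[0.0, 5.0, 10.0, 20.0]]), 0.0, 20.0, tf.quint8, mode="MIN_FIRST")

# min_out/max_out state which float values the lowest and highest quint8 codes
# represent, so dequantize can map the integers back to approximate floats.
approx = tf.quantization.dequantize(out, min_out, max_out, mode="MIN_FIRST")
print(approx)
```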
© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r2.3/api_docs/python/tf/raw_ops/QuantizedMatMulWithBiasAndReluAndRequantize