CategoricalLogitsNegativeLogProbLoss
Inherits From: DistributionNegativeLogProbLoss, NaturalParamsNegativeLogProbLoss
Defined in tensorflow/contrib/kfac/python/ops/loss_functions.py.
Negative log-probability loss for a categorical distribution parameterized by logits.
Note that the Fisher (for a single case) of a categorical distribution, with respect to the natural parameters (i.e. the logits), is given by:
F = diag(p) - p*p^T
where p = softmax(logits). F can be factorized as F = B * B^T where
B = diag(q) - p*q^T
where q is the entry-wise square root of p. This is easy to verify using the facts that diag(q)*diag(q) = diag(p), diag(q)*q = p, and q^T*q = 1.
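The factorization can be checked numerically. The following is a minimal plain-Python sketch, not the library's implementation; the helper names (softmax, fisher, fisher_factor, times_transpose) and the example logits are illustrative assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax of a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fisher(p):
    """F = diag(p) - p p^T for a single case."""
    n = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(n)]
            for i in range(n)]

def fisher_factor(p):
    """B = diag(q) - p q^T, with q the entry-wise square root of p."""
    q = [math.sqrt(pi) for pi in p]
    n = len(p)
    return [[(q[i] if i == j else 0.0) - p[i] * q[j] for j in range(n)]
            for i in range(n)]

def times_transpose(b):
    """Return B B^T for a square matrix stored as nested lists."""
    n = len(b)
    return [[sum(b[i][k] * b[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

p = softmax([2.0, -1.0, 0.5])  # arbitrary example logits (an assumption)
F = fisher(p)
BBt = times_transpose(fisher_factor(p))

# Largest entry-wise gap between F and B B^T; should be at rounding level.
max_abs_diff = max(abs(F[i][j] - BBt[i][j])
                   for i in range(len(p)) for j in range(len(p)))
print(max_abs_diff)
```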
dist
The underlying tf.distributions.Distribution.
fisher_factor_inner_shape
The shape of the tensor returned by multiply_fisher_factor.
fisher_factor_inner_static_shape
Static version of fisher_factor_inner_shape.
hessian_factor_inner_shape
The shape of the tensor returned by multiply_hessian_factor.
hessian_factor_inner_static_shape
Static version of hessian_factor_inner_shape.
inputs
The inputs to the loss function (excluding the targets).
params
Parameters to the underlying distribution.
targets
The targets being predicted by the model.
Either None or a Tensor of the appropriate shape for calling self._evaluate() on.
__init__
__init__( logits, targets=None, seed=None )
Instantiates a CategoricalLogitsNegativeLogProbLoss.
logits
: Tensor of shape [batch_size, output_size]. Parameters for underlying distribution.
targets
: None or Tensor of shape [output_size]. Each element contains an index in [0, output_size).
seed
: int or None. Default random seed when sampling.
evaluate
evaluate()
Evaluate the loss function on the targets.
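As a rough illustration of what is being evaluated, the sketch below computes a categorical negative log probability in plain Python. It assumes the loss is summed over the batch, with one integer target per example; the function names and example values are hypothetical and this is not the library's code.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax of a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]

def categorical_neg_log_prob(logits_batch, targets):
    """Sum over the batch of -log p[target], where p = softmax(logits).

    Assumption: one integer target per example, and summation (not
    averaging) across examples.
    """
    return -sum(log_softmax(row)[t] for row, t in zip(logits_batch, targets))

# Example batch: probabilities are [1/2, 1/2] and [3/4, 1/4].
logits_batch = [[0.0, 0.0], [math.log(3.0), 0.0]]
targets = [0, 0]
loss = categorical_neg_log_prob(logits_batch, targets)
print(loss)  # equals log(2) + log(4/3) up to rounding
```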
evaluate_on_sample
evaluate_on_sample(seed=None)
Evaluates the log probability on a random sample.
seed
: int or None. Random seed for this draw from the distribution.
Returns the log probability of sampled targets, summed across examples.
multiply_fisher
multiply_fisher(vector)
Right-multiply a vector by the Fisher.
vector
: The vector to multiply. Must be the same shape(s) as the 'inputs' property.
Returns the vector right-multiplied by the Fisher. Will be of the same shape(s) as the 'inputs' property.
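For this class the per-example Fisher is F = diag(p) - p*p^T, so the product never needs the full matrix. A minimal plain-Python sketch of that product, applied independently to each example in the batch, might look as follows; the free function and its arguments are illustrative assumptions rather than the method's actual implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax of a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def multiply_fisher(logits_batch, vector_batch):
    """Apply F = diag(p) - p p^T to each row of vector_batch."""
    result = []
    for logits, v in zip(logits_batch, vector_batch):
        p = softmax(logits)
        p_dot_v = sum(pi * vi for pi, vi in zip(p, v))
        # (diag(p) - p p^T) v = p * v - p * (p . v), entry-wise.
        result.append([pi * (vi - p_dot_v) for pi, vi in zip(p, v)])
    return result

logits_batch = [[2.0, -1.0, 0.5], [0.0, 0.0, 0.0]]
vector_batch = [[1.0, 2.0, 3.0], [1.0, 1.0, 1.0]]
fisher_vector = multiply_fisher(logits_batch, vector_batch)
print(fisher_vector)
```

Two properties make handy sanity checks: a constant vector is mapped to zero, and the entries of each output row sum to zero.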
multiply_fisher_factor
multiply_fisher_factor(vector)
Right-multiply a vector by a factor B of the Fisher.
Here the 'Fisher' is the Fisher information matrix (i.e. expected outer product of gradients) with respect to the parameters of the underlying probability distribution (whose log-prob defines the loss). Typically this will be block-diagonal across different cases in the batch, since the distribution is usually (but not always) conditionally iid across different cases.
Note that B can be any matrix satisfying B * B^T = F where F is the Fisher, but will agree with the one used in the other methods of this class.
vector
: The vector to multiply. Must be of the shape given by the 'fisher_factor_inner_shape' property.
Returns the vector right-multiplied by B. Will be of the same shape(s) as the 'inputs' property.
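With the factor B = diag(q) - p*q^T given in the class description, the product B*v reduces to entry-wise operations. The sketch below is a plain-Python illustration under that assumption; the free function and example values are hypothetical and do not reproduce the library's code.

```python
import math

def softmax(logits):
    """Numerically stable softmax of a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def multiply_fisher_factor(logits_batch, vector_batch):
    """Apply B = diag(q) - p q^T to each row, with q = sqrt(p) entry-wise."""
    result = []
    for logits, v in zip(logits_batch, vector_batch):
        p = softmax(logits)
        q = [math.sqrt(pi) for pi in p]
        q_dot_v = sum(qi * vi for qi, vi in zip(q, v))
        # (diag(q) - p q^T) v = q * v - p * (q . v), entry-wise.
        result.append([qi * vi - pi * q_dot_v for pi, qi, vi in zip(p, q, v)])
    return result

logits_batch = [[2.0, -1.0, 0.5]]
vector_batch = [[1.0, 2.0, 3.0]]
factor_vector = multiply_fisher_factor(logits_batch, vector_batch)
print(factor_vector)
```

Because the entries of p sum to one, each output row sums to zero, and the vector q itself is mapped to zero.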
multiply_fisher_factor_replicated_one_hot
multiply_fisher_factor_replicated_one_hot(index)
Right-multiply a replicated-one-hot vector by a factor B of the Fisher.
Here the 'Fisher' is the Fisher information matrix (i.e. expected outer product of gradients) with respect to the parameters of the underlying probability distribution (whose log-prob defines the loss). Typically this will be block-diagonal across different cases in the batch, since the distribution is usually (but not always) conditionally iid across different cases.
A 'replicated-one-hot' vector means a tensor which, for each slice along the batch dimension (assumed to be dimension 0), is 1.0 in the entry corresponding to the given index and 0 elsewhere.
Note that B can be any matrix satisfying B * B^T = F where F is the Fisher, but will agree with the one used in the other methods of this class.
index
: A tuple representing the index of the entry in each slice that is 1.0. Note that len(index) must be equal to the number of elements of the 'fisher_factor_inner_shape' tensor minus one.
Returns the vector right-multiplied by B. Will be of the same shape(s) as the 'inputs' property.
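Multiplying B by a one-hot vector simply selects one column of B. For B = diag(q) - p*q^T, column i is q_i*e_i - q_i*p. The plain-Python sketch below illustrates this, assuming each per-example slice is a vector so that index is a 1-tuple; the free function is hypothetical and not the library's implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax of a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def multiply_fisher_factor_replicated_one_hot(logits_batch, index):
    """Return column `index` of B = diag(q) - p q^T for every example."""
    (i,) = index  # assumption: a single index, since each slice is a vector
    result = []
    for logits in logits_batch:
        p = softmax(logits)
        q_i = math.sqrt(p[i])
        # Column i of B is q_i * e_i - q_i * p.
        result.append([(q_i if j == i else 0.0) - q_i * pj
                       for j, pj in enumerate(p)])
    return result

# Uniform two-class example: p = [0.5, 0.5], q_0 = sqrt(0.5).
columns = multiply_fisher_factor_replicated_one_hot([[0.0, 0.0]], (0,))
print(columns)
```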
multiply_fisher_factor_transpose
multiply_fisher_factor_transpose(vector)
Right-multiply a vector by the transpose of a factor B of the Fisher.
Here the 'Fisher' is the Fisher information matrix (i.e. expected outer product of gradients) with respect to the parameters of the underlying probability distribution (whose log-prob defines the loss). Typically this will be block-diagonal across different cases in the batch, since the distribution is usually (but not always) conditionally iid across different cases.
Note that B can be any matrix satisfying B * B^T = F where F is the Fisher, but will agree with the one used in the other methods of this class.
vector
: The vector to multiply. Must be the same shape(s) as the 'inputs' property.
Returns the vector right-multiplied by B^T. Will be of the shape given by the 'fisher_factor_inner_shape' property.
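For B = diag(q) - p*q^T the transpose is B^T = diag(q) - q*p^T. The sketch below shows that product for a single example in plain Python, then applies B afterwards so the result can be compared with the Fisher-vector product F*v. The helper names and example values are assumptions for illustration only.

```python
import math

def softmax(logits):
    """Numerically stable softmax of a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def factor_transpose_times(p, v):
    """B^T v = q * v - q * (p . v), with q = sqrt(p) entry-wise."""
    q = [math.sqrt(pi) for pi in p]
    p_dot_v = sum(pi * vi for pi, vi in zip(p, v))
    return [qi * (vi - p_dot_v) for qi, vi in zip(q, v)]

def factor_times(p, v):
    """B v = q * v - p * (q . v), with q = sqrt(p) entry-wise."""
    q = [math.sqrt(pi) for pi in p]
    q_dot_v = sum(qi * vi for qi, vi in zip(q, v))
    return [qi * vi - pi * q_dot_v for pi, qi, vi in zip(p, q, v)]

def fisher_times(p, v):
    """F v = p * v - p * (p . v)."""
    p_dot_v = sum(pi * vi for pi, vi in zip(p, v))
    return [pi * (vi - p_dot_v) for pi, vi in zip(p, v)]

p = softmax([2.0, -1.0, 0.5])
v = [1.0, 2.0, 3.0]
via_factors = factor_times(p, factor_transpose_times(p, v))  # B (B^T v)
direct = fisher_times(p, v)                                   # F v
max_abs_diff = max(abs(a - b) for a, b in zip(via_factors, direct))
print(max_abs_diff)  # should be at floating-point rounding level
```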
multiply_hessian
multiply_hessian(vector)
Right-multiply a vector by the Hessian.
Here the 'Hessian' is the Hessian matrix (i.e. matrix of 2nd-derivatives) of the loss function with respect to its inputs.
vector
: The vector to multiply. Must be the same shape(s) as the 'inputs' property.
Returns the vector right-multiplied by the Hessian. Will be of the same shape(s) as the 'inputs' property.
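For a categorical distribution parameterized by logits, the Hessian of the negative log probability with respect to the logits works out to diag(p) - p*p^T, the same matrix as the Fisher, regardless of the target. The plain-Python sketch below checks this with a central finite difference of the analytic gradient p - onehot(target); all function names, the step size, and the example values are assumptions, not library code.

```python
import math

def softmax(logits):
    """Numerically stable softmax of a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def grad_neg_log_prob(logits, target):
    """Gradient of -log softmax(logits)[target]: p - onehot(target)."""
    p = softmax(logits)
    return [pi - (1.0 if i == target else 0.0) for i, pi in enumerate(p)]

def hessian_vector_fd(logits, target, v, eps=1e-5):
    """Central finite-difference estimate of the Hessian-vector product."""
    plus = grad_neg_log_prob([z + eps * vi for z, vi in zip(logits, v)], target)
    minus = grad_neg_log_prob([z - eps * vi for z, vi in zip(logits, v)], target)
    return [(a - b) / (2.0 * eps) for a, b in zip(plus, minus)]

def fisher_vector(logits, v):
    """(diag(p) - p p^T) v computed entry-wise."""
    p = softmax(logits)
    p_dot_v = sum(pi * vi for pi, vi in zip(p, v))
    return [pi * (vi - p_dot_v) for pi, vi in zip(p, v)]

logits, target, v = [2.0, -1.0, 0.5], 1, [1.0, 2.0, 3.0]
hv = hessian_vector_fd(logits, target, v)
fv = fisher_vector(logits, v)
max_abs_diff = max(abs(a - b) for a, b in zip(hv, fv))
print(max_abs_diff)  # small, limited by the finite-difference step
```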
multiply_hessian_factor
multiply_hessian_factor(vector)
Right-multiply a vector by a factor B of the Hessian.
Here the 'Hessian' is the Hessian matrix (i.e. matrix of 2nd-derivatives) of the loss function with respect to its inputs. Typically this will be block-diagonal across different cases in the batch, since the loss function is typically summed across cases.
Note that B can be any matrix satisfying B * B^T = H where H is the Hessian, but will agree with the one used in the other methods of this class.
vector
: The vector to multiply. Must be of the shape given by the 'hessian_factor_inner_shape' property.
Returns the vector right-multiplied by B. Will be of the same shape(s) as the 'inputs' property.
multiply_hessian_factor_replicated_one_hot
multiply_hessian_factor_replicated_one_hot(index)
Right-multiply a replicated-one-hot vector by a factor B of the Hessian.
Here the 'Hessian' is the Hessian matrix (i.e. matrix of 2nd-derivatives) of the loss function with respect to its inputs. Typically this will be block-diagonal across different cases in the batch, since the loss function is typically summed across cases.
A 'replicated-one-hot' vector means a tensor which, for each slice along the batch dimension (assumed to be dimension 0), is 1.0 in the entry corresponding to the given index and 0 elsewhere.
Note that B can be any matrix satisfying B * B^T = H where H is the Hessian, but will agree with the one used in the other methods of this class.
index
: A tuple representing the index of the entry in each slice that is 1.0. Note that len(index) must be equal to the number of elements of the 'hessian_factor_inner_shape' tensor minus one.
Returns the vector right-multiplied by B. Will be of the same shape(s) as the 'inputs' property.
multiply_hessian_factor_transpose
multiply_hessian_factor_transpose(vector)
Right-multiply a vector by the transpose of a factor B of the Hessian.
Here the 'Hessian' is the Hessian matrix (i.e. matrix of 2nd-derivatives) of the loss function with respect to its inputs. Typically this will be block-diagonal across different cases in the batch, since the loss function is typically summed across cases.
Note that B can be any matrix satisfying B * B^T = H where H is the Hessian, but will agree with the one used in the other methods of this class.
vector
: The vector to multiply. Must be the same shape(s) as the 'inputs' property.
Returns the vector right-multiplied by B^T. Will be of the shape given by the 'hessian_factor_inner_shape' property.
sample
sample(seed)
Sample 'targets' from the underlying distribution.
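To make the idea concrete, the sketch below draws one class index per example from softmax(logits) by inverse-CDF sampling with Python's standard random module. This is only an illustration of sampling targets from a categorical distribution; it is not how the underlying tf.distributions object samples, and sample_targets is a hypothetical helper.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax of a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_targets(logits_batch, seed=None):
    """Draw one class index per example by inverse-CDF sampling."""
    rng = random.Random(seed)
    targets = []
    for logits in logits_batch:
        p = softmax(logits)
        u = rng.random()  # uniform in [0, 1)
        cumulative = 0.0
        chosen = len(p) - 1  # fallback guards against rounding in the sum
        for k, pk in enumerate(p):
            cumulative += pk
            if u < cumulative:
                chosen = k
                break
        targets.append(chosen)
    return targets

# The first example is so sharply peaked that class 0 is always drawn.
logits_batch = [[100.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
targets = sample_targets(logits_batch, seed=0)
print(targets)
```

Passing the same seed reproduces the same draw, which mirrors the role of the seed argument described above.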
© 2018 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/api_docs/python/tf/contrib/kfac/loss_functions/CategoricalLogitsNegativeLogProbLoss