torch.nn.init
Created On: Jun 11, 2019 | Last Updated On: Jul 07, 2022
Warning
All the functions in this module are intended to be used to initialize neural network parameters, so they all run in torch.no_grad() mode and will not be taken into account by autograd.
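A common way to apply these functions (a minimal sketch, not part of the reference below) is via Module.apply, which visits every submodule:
>>> import torch
>>> import torch.nn as nn
>>> def init_weights(m):
...     # initialize only the Linear layers; other module types are left as-is
...     if isinstance(m, nn.Linear):
...         nn.init.xavier_uniform_(m.weight)
...         nn.init.zeros_(m.bias)
>>> net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
>>> net = net.apply(init_weights)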
torch.nn.init.calculate_gain(nonlinearity, param=None)[source]
Return the recommended gain value for the given nonlinearity function.
The values are as follows:
nonlinearity           gain
Linear / Identity      1
Conv{1,2,3}D           1
Sigmoid                1
Tanh                   5/3
ReLU                   sqrt(2)
Leaky ReLU             sqrt(2 / (1 + negative_slope^2))
SELU                   3/4
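These values can be checked against the function directly (a small verification sketch):
>>> import math
>>> nn.init.calculate_gain("tanh")  # 5/3
1.6666666666666667
>>> nn.init.calculate_gain("relu") == math.sqrt(2)
True
>>> nn.init.calculate_gain("selu")  # 3/4
0.75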
Warning
In order to implement Self-Normalizing Neural Networks, you should use nonlinearity='linear' instead of nonlinearity='selu'. This gives the initial weights a variance of 1 / N, which is necessary to induce a stable fixed point in the forward pass. In contrast, the default gain for SELU sacrifices the normalization effect for more stable gradient flow in rectangular layers.
Parameters
- nonlinearity (Literal['linear', 'conv1d', 'conv2d', 'conv3d', 'conv_transpose1d', 'conv_transpose2d', 'conv_transpose3d', 'sigmoid', 'tanh', 'relu', 'leaky_relu', 'selu']) – the non-linear function (nn.functional name)
- param (Optional[Union[int, float]]) – optional parameter for the non-linear function
Return type
float
Examples
>>> gain = nn.init.calculate_gain(
...     "leaky_relu", 0.2
... )  # leaky_relu with negative_slope=0.2
torch.nn.init.uniform_(tensor, a=0.0, b=1.0, generator=None)[source]
Fill the input Tensor with values drawn from the uniform distribution U(a, b).
Parameters
- tensor (Tensor) – an n-dimensional torch.Tensor
- a (float) – the lower bound of the uniform distribution
- b (float) – the upper bound of the uniform distribution
- generator (Optional[Generator]) – the torch Generator to sample from (default: None)
Return type
Tensor
Examples
>>> w = torch.empty(3, 5)
>>> nn.init.uniform_(w)
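Passing an explicit generator makes the draw reproducible (a minimal sketch):
>>> g = torch.Generator().manual_seed(0)
>>> w1 = nn.init.uniform_(torch.empty(3, 5), generator=g)
>>> g = torch.Generator().manual_seed(0)
>>> w2 = nn.init.uniform_(torch.empty(3, 5), generator=g)
>>> torch.equal(w1, w2)
True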
torch.nn.init.normal_(tensor, mean=0.0, std=1.0, generator=None)[source]
Fill the input Tensor with values drawn from the normal distribution N(mean, std^2).
Parameters
- tensor (Tensor) – an n-dimensional torch.Tensor
- mean (float) – the mean of the normal distribution
- std (float) – the standard deviation of the normal distribution
- generator (Optional[Generator]) – the torch Generator to sample from (default: None)
Return type
Tensor
Examples
>>> w = torch.empty(3, 5)
>>> nn.init.normal_(w)
torch.nn.init.constant_(tensor, val)[source]
Fill the input Tensor with the value val.
Parameters
- tensor (Tensor) – an n-dimensional torch.Tensor
- val (float) – the value to fill the tensor with
Return type
Tensor
Examples
>>> w = torch.empty(3, 5)
>>> nn.init.constant_(w, 0.3)
torch.nn.init.ones_(tensor)[source]
Fill the input Tensor with the scalar value 1.
Parameters
- tensor (Tensor) – an n-dimensional torch.Tensor
Return type
Tensor
Examples
>>> w = torch.empty(3, 5)
>>> nn.init.ones_(w)
torch.nn.init.zeros_(tensor)[source]
Fill the input Tensor with the scalar value 0.
Parameters
- tensor (Tensor) – an n-dimensional torch.Tensor
Return type
Tensor
Examples
>>> w = torch.empty(3, 5)
>>> nn.init.zeros_(w)
torch.nn.init.eye_(tensor)[source]
Fill the 2-dimensional input Tensor with the identity matrix.
Preserves the identity of the inputs in Linear layers, where as many inputs are preserved as possible.
Parameters
- tensor (Tensor) – a 2-dimensional torch.Tensor
Return type
Tensor
Examples
>>> w = torch.empty(3, 5)
>>> nn.init.eye_(w)
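The identity-preserving behavior can be seen on a bias-free Linear layer (a minimal sketch): the first min(in_features, out_features) input features pass through unchanged.
>>> layer = nn.Linear(5, 3, bias=False)
>>> nn.init.eye_(layer.weight)  # weight shape is [3, 5]
>>> x = torch.randn(1, 5)
>>> torch.allclose(layer(x), x[:, :3])
True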
torch.nn.init.dirac_(tensor, groups=1)[source]
Fill the {3, 4, 5}-dimensional input Tensor with the Dirac delta function.
Preserves the identity of the inputs in Convolutional layers, where as many input channels are preserved as possible. In case of groups > 1, each group of channels preserves identity.
Parameters
- tensor (Tensor) – a {3, 4, 5}-dimensional torch.Tensor
- groups (int, optional) – number of groups in the conv layer (default: 1)
Return type
Tensor
Examples
>>> w = torch.empty(3, 16, 5, 5)
>>> nn.init.dirac_(w)
>>> w = torch.empty(3, 24, 5, 5)
>>> nn.init.dirac_(w, 3)
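A same-padding convolution initialized this way copies its input channels into the first output channels (a minimal sketch, assuming out_channels >= in_channels and no bias):
>>> conv = nn.Conv2d(3, 16, kernel_size=5, padding=2, bias=False)
>>> nn.init.dirac_(conv.weight)
>>> x = torch.randn(1, 3, 8, 8)
>>> torch.allclose(conv(x)[:, :3], x)
True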
torch.nn.init.xavier_uniform_(tensor, gain=1.0, generator=None)[source]
Fill the input Tensor with values using a Xavier uniform distribution.
The method is described in Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010). The resulting tensor will have values sampled from U(-a, a) where
    a = gain * sqrt(6 / (fan_in + fan_out))
Also known as Glorot initialization.
Parameters
- tensor (Tensor) – an n-dimensional torch.Tensor
- gain (float) – an optional scaling factor
- generator (Optional[Generator]) – the torch Generator to sample from (default: None)
Return type
Tensor
Examples
>>> w = torch.empty(3, 5)
>>> nn.init.xavier_uniform_(w, gain=nn.init.calculate_gain("relu"))
torch.nn.init.xavier_normal_(tensor, gain=1.0, generator=None)[source]
Fill the input Tensor with values using a Xavier normal distribution.
The method is described in Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010). The resulting tensor will have values sampled from N(0, std^2) where
    std = gain * sqrt(2 / (fan_in + fan_out))
Also known as Glorot initialization.
Parameters
- tensor (Tensor) – an n-dimensional torch.Tensor
- gain (float) – an optional scaling factor
- generator (Optional[Generator]) – the torch Generator to sample from (default: None)
Return type
Tensor
Examples
>>> w = torch.empty(3, 5)
>>> nn.init.xavier_normal_(w)
torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu', generator=None)[source]
Fill the input Tensor with values using a Kaiming uniform distribution.
The method is described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015). The resulting tensor will have values sampled from U(-bound, bound) where
    bound = gain * sqrt(3 / fan_mode)
Also known as He initialization.
Parameters
- tensor (Tensor) – an n-dimensional torch.Tensor
- a (float) – the negative slope of the rectifier used after this layer (only used with 'leaky_relu')
- mode (Literal['fan_in', 'fan_out']) – either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass.
- nonlinearity (Literal['linear', 'conv1d', 'conv2d', 'conv3d', 'conv_transpose1d', 'conv_transpose2d', 'conv_transpose3d', 'sigmoid', 'tanh', 'relu', 'leaky_relu', 'selu']) – the non-linear function (nn.functional name), recommended to use only with 'relu' or 'leaky_relu' (default).
- generator (Optional[Generator]) – the torch Generator to sample from (default: None)
Return type
Tensor
Examples
>>> w = torch.empty(3, 5)
>>> nn.init.kaiming_uniform_(w, mode="fan_in", nonlinearity="relu")
Note
Be aware that fan_in and fan_out are calculated assuming that the weight matrix is used in a transposed manner (i.e., x @ w.T in Linear layers, where w.shape = [fan_out, fan_in]). This is important for correct initialization. If you plan to use x @ w, where w.shape = [fan_in, fan_out], pass in a transposed weight matrix, i.e. nn.init.kaiming_uniform_(w.T, ...).
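For example, for weights stored as [fan_in, fan_out] and used as x @ w, the note above suggests passing the transpose (a minimal sketch):
>>> w = torch.empty(5, 3)  # [fan_in, fan_out]
>>> nn.init.kaiming_uniform_(w.T, mode="fan_in", nonlinearity="relu")  # fills w in place through the view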
torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu', generator=None)[source]
Fill the input Tensor with values using a Kaiming normal distribution.
The method is described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015). The resulting tensor will have values sampled from N(0, std^2) where
    std = gain / sqrt(fan_mode)
Also known as He initialization.
Parameters
- tensor (Tensor) – an n-dimensional torch.Tensor
- a (float) – the negative slope of the rectifier used after this layer (only used with 'leaky_relu')
- mode (Literal['fan_in', 'fan_out']) – either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass.
- nonlinearity (Literal['linear', 'conv1d', 'conv2d', 'conv3d', 'conv_transpose1d', 'conv_transpose2d', 'conv_transpose3d', 'sigmoid', 'tanh', 'relu', 'leaky_relu', 'selu']) – the non-linear function (nn.functional name), recommended to use only with 'relu' or 'leaky_relu' (default).
- generator (Optional[Generator]) – the torch Generator to sample from (default: None)
Return type
Tensor
Examples
>>> w = torch.empty(3, 5)
>>> nn.init.kaiming_normal_(w, mode="fan_out", nonlinearity="relu")
Note
Be aware that fan_in and fan_out are calculated assuming that the weight matrix is used in a transposed manner (i.e., x @ w.T in Linear layers, where w.shape = [fan_out, fan_in]). This is important for correct initialization. If you plan to use x @ w, where w.shape = [fan_in, fan_out], pass in a transposed weight matrix, i.e. nn.init.kaiming_normal_(w.T, ...).
torch.nn.init.trunc_normal_(tensor, mean=0.0, std=1.0, a=-2.0, b=2.0, generator=None)[source]
Fill the input Tensor with values drawn from a truncated normal distribution.
The values are effectively drawn from the normal distribution N(mean, std^2) with values outside [a, b] redrawn until they are within the bounds. The method used for generating the random values works best when a <= mean <= b.
Parameters
- tensor (Tensor) – an n-dimensional torch.Tensor
- mean (float) – the mean of the normal distribution
- std (float) – the standard deviation of the normal distribution
- a (float) – the minimum cutoff value
- b (float) – the maximum cutoff value
- generator (Optional[Generator]) – the torch Generator to sample from (default: None)
Return type
Tensor
Examples
>>> w = torch.empty(3, 5)
>>> nn.init.trunc_normal_(w)
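The cutoffs a and b are hard bounds on the result (a small verification sketch):
>>> w = nn.init.trunc_normal_(torch.empty(1000), a=-2.0, b=2.0)
>>> bool(((w >= -2.0) & (w <= 2.0)).all())
True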
torch.nn.init.orthogonal_(tensor, gain=1, generator=None)[source]
Fill the input Tensor with a (semi) orthogonal matrix.
Described in Exact solutions to the nonlinear dynamics of learning in deep linear neural networks - Saxe, A. et al. (2013). The input tensor must have at least 2 dimensions, and for tensors with more than 2 dimensions the trailing dimensions are flattened.
Parameters
- tensor (Tensor) – an n-dimensional torch.Tensor, where n >= 2
- gain (float) – an optional scaling factor
- generator (Optional[Generator]) – the torch Generator to sample from (default: None)
Return type
Tensor
Examples
>>> w = torch.empty(3, 5)
>>> nn.init.orthogonal_(w)
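For a 2D tensor with rows <= columns, the rows of the result are orthonormal (a small verification sketch):
>>> w = nn.init.orthogonal_(torch.empty(3, 5))
>>> torch.allclose(w @ w.T, torch.eye(3), atol=1e-5)
True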
torch.nn.init.sparse_(tensor, sparsity, std=0.01, generator=None)[source]
Fill the 2D input Tensor as a sparse matrix.
The non-zero elements will be drawn from the normal distribution N(0, 0.01), as described in Deep learning via Hessian-free optimization - Martens, J. (2010).
Parameters
- tensor (Tensor) – a 2-dimensional torch.Tensor
- sparsity (float) – the fraction of elements in each column to be set to zero
- std (float) – the standard deviation of the normal distribution used to generate the non-zero values
- generator (Optional[Generator]) – the torch Generator to sample from (default: None)
Return type
Tensor
Examples
>>> w = torch.empty(3, 5)
>>> nn.init.sparse_(w, sparsity=0.1)
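The sparsity fraction is applied per column (a small verification sketch; with 10 rows and sparsity=0.1, each column gets ceil(0.1 * 10) = 1 zero):
>>> w = nn.init.sparse_(torch.empty(10, 4), sparsity=0.1)
>>> [(col == 0).sum().item() for col in w.T]
[1, 1, 1, 1]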