bayesian_torch.layers module
January 26, 2022
A set of Bayesian neural network layers for performing stochastic variational inference:
- Variational layers with reparameterized Monte Carlo estimators [Blundell et al. 2015]
- Variational layers with Flipout Monte Carlo estimators [Wen et al. 2018]
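Both estimators serve the same purpose: a Monte Carlo estimate of the negative evidence lower bound (ELBO), which adds an expected data-fit term to the KL complexity cost. The following is a minimal plain-Python sketch for a hypothetical one-weight model y = w · x. Several pieces are assumptions made for illustration and are not part of this module: the unit-variance Gaussian likelihood, the 1/N scaling of the KL term, and the helper names.

```python
import math
import random


def softplus(rho: float) -> float:
    """Map an unconstrained rho to a positive sigma: log(1 + exp(rho))."""
    return math.log1p(math.exp(rho))


def gaussian_kl(mu_q: float, sigma_q: float, mu_p: float, sigma_p: float) -> float:
    """Closed-form KL(N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2))."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2) - 0.5)


def negative_elbo(mu, rho, xs, ys, num_samples=8, prior_mu=0.0, prior_sigma=1.0, seed=0):
    """Monte Carlo estimate of the per-datapoint negative ELBO for y = w * x.

    Hypothetical toy model: a single weight w ~ N(mu, softplus(rho)^2) and a
    unit-variance Gaussian likelihood, so the NLL is 0.5 * squared error with
    constants dropped.
    """
    rng = random.Random(seed)
    sigma = softplus(rho)
    n = len(xs)
    nll = 0.0
    for _ in range(num_samples):
        # Reparameterized sample: the randomness lives in eps, not in (mu, rho).
        w = mu + sigma * rng.gauss(0.0, 1.0)
        nll += sum(0.5 * (y - w * x) ** 2 for x, y in zip(xs, ys)) / n
    nll /= num_samples
    # Complexity cost (KL to the prior), spread over the N data points.
    return nll + gaussian_kl(mu, sigma, prior_mu, prior_sigma) / n
```

Minimizing this quantity over (mu, rho) with an autodiff framework is stochastic variational inference. With the layers documented here, the KL value returned by each layer's forward pass would play the role of gaussian_kl.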
Layers
class BaseVariationalLayer_(torch.nn.Module)
Abstract class which inherits from torch.nn.Module
kl_div(mu_q, sigma_q, mu_p, sigma_p)
Computes the Kullback-Leibler divergence KL(Q || P) from the normal distribution Q (parameterized by mu_q, sigma_q) to the normal distribution P (parameterized by mu_p, sigma_p).
Parameters:
- mu_q: torch.Tensor -> mean (mu) of distribution Q
- sigma_q: torch.Tensor -> standard deviation (sigma) of distribution Q
- mu_p: float -> mean (mu) of distribution P
- sigma_p: float -> standard deviation (sigma) of distribution P
Returns:
- torch.Tensor of shape () (a 0-dimensional scalar tensor)
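For two normal distributions the divergence has a closed form, applied elementwise:

KL = log(σ_p/σ_q) + (σ_q² + (μ_q − μ_p)²)/(2σ_p²) − 1/2

The following is a plain-Python sketch of that formula. It reduces the elementwise terms to one scalar by averaging, which is an assumption: this reference only says the result is 0-dimensional, so the library may sum instead. The function name kl_normal is hypothetical.

```python
import math


def kl_normal(mu_q, sigma_q, mu_p, sigma_p):
    """KL(Q || P) for Q = N(mu_q, sigma_q^2) elementwise and a scalar prior P = N(mu_p, sigma_p^2).

    mu_q and sigma_q are equal-length lists; the elementwise terms are averaged
    (assumed reduction) to give a single scalar.
    """
    terms = [
        math.log(sigma_p / sq) + (sq ** 2 + (mq - mu_p) ** 2) / (2 * sigma_p ** 2) - 0.5
        for mq, sq in zip(mu_q, sigma_q)
    ]
    return sum(terms) / len(terms)
```

When Q equals P the divergence is 0. Shifting the mean by one prior standard deviation adds 0.5.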
class LinearReparameterization
bayesian_torch.layers.LinearReparameterization(in_features, out_features, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)
Parameters:
- in_features: int -> size of each input sample,
- out_features: int -> size of each output sample,
- prior_mean: float -> mean of the prior distribution used in the complexity cost (KL term),
- prior_variance: float -> variance of the prior distribution used in the complexity cost (KL term),
- posterior_mu_init: float -> used to initialize the trainable mu parameter, the mean of the approximate posterior,
- posterior_rho_init: float -> used to initialize the trainable rho parameter, which sets the sigma of the approximate posterior through the softplus σ = log(1 + exp(ρ)),
- bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,
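The rho parameterization keeps sigma positive while leaving rho itself unconstrained. The following short sketch shows the mapping and its inverse, which helps when choosing posterior_rho_init from a target initial sigma. inverse_softplus is a hypothetical helper and is not part of bayesian_torch.

```python
import math


def softplus(rho: float) -> float:
    """sigma = log(1 + exp(rho)); log1p keeps precision for very negative rho."""
    return math.log1p(math.exp(rho))


def inverse_softplus(sigma: float) -> float:
    """rho such that softplus(rho) == sigma, for sigma > 0: rho = log(exp(sigma) - 1)."""
    return math.log(math.expm1(sigma))
```

For example, rho = -3.0 gives sigma ≈ 0.0486, and a target sigma of 0.1 needs rho ≈ -2.252.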
forward(X)
Samples the weights with the reparameterization trick and performs torch.nn.functional.linear.
Parameters:
- X: torch.Tensor with shape (batch_size, in_features)
Returns:
- tuple: (torch.Tensor with shape (X.shape[0], out_features), KL divergence from the sampled-weight distribution to the prior, as a scalar)
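The forward pass draws a weight sample w = μ + softplus(ρ) · ε with ε ~ N(0, 1) and then applies an ordinary linear map. The following plain-Python sketch is a hypothetical illustration rather than the library's implementation. It assumes mean-field Gaussian posteriors over every weight and bias element, and one weight sample per forward pass shared by the whole batch.

```python
import math
import random


def softplus(rho):
    return math.log1p(math.exp(rho))


def linear_reparameterization(x, mu_w, rho_w, mu_b, rho_b, rng=None):
    """One stochastic forward pass of a mean-field Gaussian linear layer.

    x: (batch_size, in_features); mu_w, rho_w: (out_features, in_features);
    mu_b, rho_b: (out_features,). All nested lists of floats.
    """
    if rng is None:
        rng = random.Random()
    # w = mu + softplus(rho) * eps with eps ~ N(0, 1): one sample, shared by the batch.
    w = [[m + softplus(r) * rng.gauss(0.0, 1.0) for m, r in zip(m_row, r_row)]
         for m_row, r_row in zip(mu_w, rho_w)]
    b = [m + softplus(r) * rng.gauss(0.0, 1.0) for m, r in zip(mu_b, rho_b)]
    # Same arithmetic as torch.nn.functional.linear: y = x W^T + b.
    return [[sum(xi * wi for xi, wi in zip(row, w_row)) + b_j
             for w_row, b_j in zip(w, b)]
            for row in x]
```

With very negative rho, sigma is nearly zero and the output collapses to the mean-weight prediction. With larger rho, repeated calls on the same input differ, and averaging several passes gives a Monte Carlo prediction.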
class Conv1dReparameterization
bayesian_torch.layers.Conv1dReparameterization(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)
Parameters:
- in_channels: int -> number of channels in the input image,
- out_channels: int -> number of channels produced by the convolution,
- kernel_size: int -> size of the convolving kernel,
- stride: int -> stride of the convolution. Default: 1,
- padding: int -> zero-padding added to both sides of the input. Default: 0,
- dilation: int -> spacing between kernel elements. Default: 1,
- groups: int -> number of blocked connections from input channels to output channels. Default: 1,
- prior_mean: float -> mean of the prior distribution used in the complexity cost (KL term),
- prior_variance: float -> variance of the prior distribution used in the complexity cost (KL term),
- posterior_mu_init: float -> used to initialize the trainable mu parameter, the mean of the approximate posterior,
- posterior_rho_init: float -> used to initialize the trainable rho parameter, which sets the sigma of the approximate posterior through the softplus σ = log(1 + exp(ρ)),
- bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,
forward(X)
Samples the weights with the reparameterization trick and performs torch.nn.functional.conv1d. See the PyTorch documentation of torch.nn.functional.conv1d for the output shape.
Parameters:
- X: torch.Tensor with shape (batch_size, C, H)
Returns:
- tuple: (torch.Tensor, KL divergence from the sampled-weight distribution to the prior, as a scalar)
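Since forward(X) defers to PyTorch for the output shape, the following sketch spells out that arithmetic. It uses the output-length formula from the PyTorch Conv1d documentation, L_out = ⌊(L_in + 2·padding − dilation·(kernel_size − 1) − 1)/stride⌋ + 1. The helper names are hypothetical.

```python
def conv1d_output_length(length_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of torch.nn.functional.conv1d (formula from the PyTorch Conv1d docs)."""
    return (length_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv1d_output_shape(input_shape, out_channels, kernel_size, stride=1, padding=0, dilation=1):
    """Map an input shape (batch_size, C, H) to (batch_size, out_channels, H_out)."""
    batch_size, _, length = input_shape
    return (batch_size, out_channels,
            conv1d_output_length(length, kernel_size, stride, padding, dilation))
```

For an input of shape (4, 3, 10) with out_channels=8 and kernel_size=3, the output shape is (4, 8, 8). Adding padding=1 keeps the length at 10.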
class Conv2dReparameterization
bayesian_torch.layers.Conv2dReparameterization(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)
Parameters:
- in_channels: int -> number of channels in the input image,
- out_channels: int -> number of channels produced by the convolution,
- kernel_size: int -> size of the convolving kernel,
- stride: int -> stride of the convolution. Default: 1,
- padding: int -> zero-padding added to both sides of the input. Default: 0,
- dilation: int -> spacing between kernel elements. Default: 1,
- groups: int -> number of blocked connections from input channels to output channels. Default: 1,
- prior_mean: float -> mean of the prior distribution used in the complexity cost (KL term),
- prior_variance: float -> variance of the prior distribution used in the complexity cost (KL term),
- posterior_mu_init: float -> used to initialize the trainable mu parameter, the mean of the approximate posterior,
- posterior_rho_init: float -> used to initialize the trainable rho parameter, which sets the sigma of the approximate posterior through the softplus σ = log(1 + exp(ρ)),
- bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,
forward(X)
Samples the weights with the reparameterization trick and performs torch.nn.functional.conv2d. See the PyTorch documentation of torch.nn.functional.conv2d for the output shape.
Parameters:
- X: torch.Tensor with shape (batch_size, C, H, W)
Returns:
- tuple: (torch.Tensor, KL divergence from the sampled-weight distribution to the prior, as a scalar)
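Because the layer learns a distribution over each weight, it carries more trainable state than torch.nn.Conv2d. The following sketch computes the parameter shapes and counts under three assumptions:

- Each weight and bias element has its own (mu, rho) pair, as in the mean-field scheme of Blundell et al.
- The weight uses the torch.nn.Conv2d layout (out_channels, in_channels // groups, kH, kW).
- The kernel is square, since kernel_size is documented as an int.

Both function names are hypothetical.

```python
import math


def conv2d_variational_shapes(in_channels, out_channels, kernel_size, groups=1, bias=True):
    """Shapes of the mu (and, identically, rho) tensors of a Bayesian Conv2d layer.

    Assumes a square kernel and the torch.nn.Conv2d weight layout
    (out_channels, in_channels // groups, kernel_size, kernel_size).
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("in_channels and out_channels must be divisible by groups")
    weight_shape = (out_channels, in_channels // groups, kernel_size, kernel_size)
    bias_shape = (out_channels,) if bias else None
    return weight_shape, bias_shape


def count_variational_parameters(weight_shape, bias_shape):
    """Trainable scalars when every weight and bias element has its own mu and rho."""
    n = math.prod(weight_shape)
    if bias_shape is not None:
        n += math.prod(bias_shape)
    return 2 * n
```

For in_channels=3, out_channels=16 and kernel_size=3, this gives 896 trainable scalars, against 448 for a deterministic torch.nn.Conv2d with bias. Setting groups=g divides the weight count by g.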
class Conv3dReparameterization
bayesian_torch.layers.Conv3dReparameterization(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)
Parameters:
- in_channels: int -> number of channels in the input image,
- out_channels: int -> number of channels produced by the convolution,
- kernel_size: int -> size of the convolving kernel,
- stride: int -> stride of the convolution. Default: 1,
- padding: int -> zero-padding added to both sides of the input. Default: 0,
- dilation: int -> spacing between kernel elements. Default: 1,
- groups: int -> number of blocked connections from input channels to output channels. Default: 1,
- prior_mean: float -> mean of the prior distribution used in the complexity cost (KL term),
- prior_variance: float -> variance of the prior distribution used in the complexity cost (KL term),
- posterior_mu_init: float -> used to initialize the trainable mu parameter, the mean of the approximate posterior,
- posterior_rho_init: float -> used to initialize the trainable rho parameter, which sets the sigma of the approximate posterior through the softplus σ = log(1 + exp(ρ)),
- bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,
forward(X)
Samples the weights with the reparameterization trick and performs torch.nn.functional.conv3d. See the PyTorch documentation of torch.nn.functional.conv3d for the output shape.
Parameters:
- X: torch.Tensor with shape (batch_size, C, H, W, L)
Returns:
- tuple: (torch.Tensor, KL divergence from the sampled-weight distribution to the prior, as a scalar)
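PyTorch applies the same output-size formula independently to each spatial dimension of conv1d, conv2d and conv3d. The following sketch generalizes it to any number of spatial dimensions. It assumes the same integer stride, padding and dilation on every dimension, as the int-typed parameters above suggest. conv_output_shape is a hypothetical helper.

```python
def conv_output_shape(input_shape, out_channels, kernel_size, stride=1, padding=0, dilation=1):
    """Output shape of torch.nn.functional.conv1d/2d/3d for input (batch_size, C, *spatial).

    Applies the PyTorch Conv formula to every spatial dimension with shared int settings.
    """
    batch_size, _, *spatial = input_shape
    out_spatial = tuple(
        (n + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1 for n in spatial
    )
    return (batch_size, out_channels) + out_spatial
```

A 5-D input of shape (2, 1, 16, 32, 32) with kernel_size=3, stride=2 and padding=1 maps to (2, out_channels, 8, 16, 16).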
class ConvTranspose1dReparameterization
bayesian_torch.layers.ConvTranspose1dReparameterization(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)
Parameters:
- in_channels: int -> number of channels in the input image,
- out_channels: int -> number of channels produced by the convolution,
- kernel_size: int -> size of the convolving kernel,
- stride: int -> stride of the convolution. Default: 1,
- padding: int -> zero-padding added to both sides of the input. Default: 0,
- dilation: int -> spacing between kernel elements. Default: 1,
- groups: int -> number of blocked connections from input channels to output channels. Default: 1,
- prior_mean: float -> mean of the prior distribution used in the complexity cost (KL term),
- prior_variance: float -> variance of the prior distribution used in the complexity cost (KL term),
- posterior_mu_init: float -> used to initialize the trainable mu parameter, the mean of the approximate posterior,
- posterior_rho_init: float -> used to initialize the trainable rho parameter, which sets the sigma of the approximate posterior through the softplus σ = log(1 + exp(ρ)),
- bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,
forward(X)
Samples the weights with the reparameterization trick and performs torch.nn.functional.conv_transpose1d. See the PyTorch documentation of torch.nn.functional.conv_transpose1d for the output shape.
Parameters:
- X: torch.Tensor with shape (batch_size, C, H)
Returns:
- tuple: (torch.Tensor, KL divergence from the sampled-weight distribution to the prior, as a scalar)
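A transposed convolution grows the spatial size instead of shrinking it. The following sketch uses the output-length formula from the PyTorch ConvTranspose1d documentation:

L_out = (L_in − 1)·stride − 2·padding + dilation·(kernel_size − 1) + output_padding + 1

The sketch includes output_padding only because the PyTorch formula has it. This reference does not list output_padding as a layer argument, so it defaults to 0 here. The function name is hypothetical.

```python
def conv_transpose1d_output_length(length_in, kernel_size, stride=1, padding=0,
                                   dilation=1, output_padding=0):
    """Output length of torch.nn.functional.conv_transpose1d (PyTorch ConvTranspose1d docs)."""
    return ((length_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)
```

With matching settings it undoes the length change of a convolution. For example, conv1d with kernel_size=3 maps length 10 to 8, and this maps 8 back to 10.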
class ConvTranspose2dReparameterization
bayesian_torch.layers.ConvTranspose2dReparameterization(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)
Parameters:
- in_channels: int -> number of channels in the input image,
- out_channels: int -> number of channels produced by the convolution,
- kernel_size: int -> size of the convolving kernel,
- stride: int -> stride of the convolution. Default: 1,
- padding: int -> zero-padding added to both sides of the input. Default: 0,
- dilation: int -> spacing between kernel elements. Default: 1,
- groups: int -> number of blocked connections from input channels to output channels. Default: 1,
- prior_mean: float -> mean of the prior distribution used in the complexity cost (KL term),
- prior_variance: float -> variance of the prior distribution used in the complexity cost (KL term),
- posterior_mu_init: float -> used to initialize the trainable mu parameter, the mean of the approximate posterior,
- posterior_rho_init: float -> used to initialize the trainable rho parameter, which sets the sigma of the approximate posterior through the softplus σ = log(1 + exp(ρ)),
- bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,
forward(X)
Samples the weights with the reparameterization trick and performs torch.nn.functional.conv_transpose2d. See the PyTorch documentation of torch.nn.functional.conv_transpose2d for the output shape.
Parameters:
- X: torch.Tensor with shape (batch_size, C, H, W)
Returns:
- tuple: (torch.Tensor, KL divergence from the sampled-weight distribution to the prior, as a scalar)
class ConvTranspose3dReparameterization
bayesian_torch.layers.ConvTranspose3dReparameterization(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)
Parameters:
- in_channels: int -> number of channels in the input image,
- out_channels: int -> number of channels produced by the convolution,
- kernel_size: int -> size of the convolving kernel,
- stride: int -> stride of the convolution. Default: 1,
- padding: int -> zero-padding added to both sides of the input. Default: 0,
- dilation: int -> spacing between kernel elements. Default: 1,
- groups: int -> number of blocked connections from input channels to output channels. Default: 1,
- prior_mean: float -> mean of the prior distribution used in the complexity cost (KL term),
- prior_variance: float -> variance of the prior distribution used in the complexity cost (KL term),
- posterior_mu_init: float -> used to initialize the trainable mu parameter, the mean of the approximate posterior,
- posterior_rho_init: float -> used to initialize the trainable rho parameter, which sets the sigma of the approximate posterior through the softplus σ = log(1 + exp(ρ)),
- bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,
forward(X)
Samples the weights with the reparameterization trick and performs torch.nn.functional.conv_transpose3d. See the PyTorch documentation of torch.nn.functional.conv_transpose3d for the output shape.
Parameters:
- X: torch.Tensor with shape (batch_size, C, H, W, L)
Returns:
- tuple: (torch.Tensor, KL divergence from the sampled-weight distribution to the prior, as a scalar)
class LSTMReparameterization
bayesian_torch.layers.LSTMReparameterization(in_features, out_features, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)
Parameters:
- in_features: int -> size of each input sample,
- out_features: int -> size of each output sample,
- prior_mean: float -> mean of the prior distribution used in the complexity cost (KL term),
- prior_variance: float -> variance of the prior distribution used in the complexity cost (KL term),
- posterior_mu_init: float -> used to initialize the trainable mu parameter, the mean of the approximate posterior,
- posterior_rho_init: float -> used to initialize the trainable rho parameter, which sets the sigma of the approximate posterior through the softplus σ = log(1 + exp(ρ)),
- bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,
forward(X, hidden_states=None)
Samples the weights with the reparameterization trick and performs the LSTM feedforward operation.
Parameters:
- X: torch.Tensor with shape (batch_size, seq_len, in_features)
- hidden_states: None or tuple (h, c) of torch.Tensor, each with shape (X.shape[0], seq_len, out_features)
Returns:
- tuple: (output, (h, c), kl), where output, h and c are torch.Tensor with shape (X.shape[0], seq_len, out_features), and kl is the KL divergence from the sampled-weight distribution to the prior, as a scalar
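Each weight sample is used in an LSTM feedforward operation that follows the standard gate equations. The following plain-Python sketch processes one example. It assumes two things: the gate rows are stacked in input, forget, cell-candidate, output order, and the initial states are zero when hidden_states is None. Both the weight layout and the helper names are hypothetical, not the layer's internals.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lstm_step(x_t, h, c, w_ih, w_hh, b):
    """One LSTM time step for a single example.

    w_ih: (4 * hidden, in_features), w_hh: (4 * hidden, hidden), b: (4 * hidden,).
    In a Bayesian LSTM these would be one sample from the approximate posterior.
    """
    hs = len(h)
    z = [a + bb + cc for a, bb, cc in zip(matvec(w_ih, x_t), matvec(w_hh, h), b)]
    i = [sigmoid(v) for v in z[0:hs]]            # input gate
    f = [sigmoid(v) for v in z[hs:2 * hs]]       # forget gate
    g = [math.tanh(v) for v in z[2 * hs:3 * hs]]  # cell candidate
    o = [sigmoid(v) for v in z[3 * hs:4 * hs]]   # output gate
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def lstm_sequence(xs, w_ih, w_hh, b, hidden_size):
    """Run lstm_step over a sequence from zero initial states; return per-step h and c."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    hs_seq, cs_seq = [], []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, w_ih, w_hh, b)
        hs_seq.append(h)
        cs_seq.append(c)
    return hs_seq, cs_seq
```

The per-step lists returned by lstm_sequence correspond to the seq_len dimension of the output and state tensors.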
class LinearFlipout
bayesian_torch.layers.LinearFlipout(in_features, out_features, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)
Parameters:
- in_features: int -> size of each input sample,
- out_features: int -> size of each output sample,
- prior_mean: float -> mean of the prior distribution used in the complexity cost (KL term),
- prior_variance: float -> variance of the prior distribution used in the complexity cost (KL term),
- posterior_mu_init: float -> used to initialize the trainable mu parameter, the mean of the approximate posterior,
- posterior_rho_init: float -> used to initialize the trainable rho parameter, which sets the sigma of the approximate posterior through the softplus σ = log(1 + exp(ρ)),
- bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,
forward(X)
Samples the weights with Flipout and performs torch.nn.functional.linear.
Parameters:
- X: torch.Tensor with shape (batch_size, in_features)
Returns:
- tuple: (torch.Tensor with shape (X.shape[0], out_features), KL divergence from the sampled-weight distribution to the prior, as a scalar)
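Flipout (Wen et al. 2018) draws one Gaussian perturbation ΔW for the whole batch. It then multiplies that perturbation by random ±1 sign vectors that are fresh for each example, s_n on the inputs and r_n on the outputs. Example n therefore sees the effective weight μ + ΔW ∘ (r_n s_nᵀ), so perturbations are decorrelated across the batch without sampling a separate weight matrix per example. The following plain-Python sketch is bias-free for brevity, and its helper names are hypothetical.

```python
import math
import random


def softplus(rho):
    return math.log1p(math.exp(rho))


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def flipout_example(x_n, mu_w, d_w, s, r):
    """Output for one example: mu x + r * (dW (s * x)), with * elementwise.

    Equivalent to using the per-example weight mu + dW * (r s^T).
    """
    mean_out = matvec(mu_w, x_n)
    pert_out = matvec(d_w, [xi * si for xi, si in zip(x_n, s)])
    return [m + ri * p for m, p, ri in zip(mean_out, pert_out, r)]


def linear_flipout(x, mu_w, rho_w, rng=None):
    """Bias-free Flipout linear layer over a batch x of shape (batch_size, in_features)."""
    if rng is None:
        rng = random.Random()
    out_features, in_features = len(mu_w), len(mu_w[0])
    # One reparameterized perturbation dW = softplus(rho) * eps, shared by the batch.
    d_w = [[softplus(rho) * rng.gauss(0.0, 1.0) for rho in row] for row in rho_w]
    outputs = []
    for x_n in x:
        # Fresh +/-1 sign vectors per example decorrelate the shared perturbation.
        s = [rng.choice((-1.0, 1.0)) for _ in range(in_features)]
        r = [rng.choice((-1.0, 1.0)) for _ in range(out_features)]
        outputs.append(flipout_example(x_n, mu_w, d_w, s, r))
    return outputs
```

The design keeps the extra cost to two elementwise sign multiplications per example. In exchange, the gradient estimate is less noisy than the one obtained when every example shares the same perturbation.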
class Conv1dFlipout
bayesian_torch.layers.Conv1dFlipout(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)
Parameters:
- in_channels: int -> number of channels in the input image,
- out_channels: int -> number of channels produced by the convolution,
- kernel_size: int -> size of the convolving kernel,
- stride: int -> stride of the convolution. Default: 1,
- padding: int -> zero-padding added to both sides of the input. Default: 0,
- dilation: int -> spacing between kernel elements. Default: 1,
- groups: int -> number of blocked connections from input channels to output channels. Default: 1,
- prior_mean: float -> mean of the prior distribution used in the complexity cost (KL term),
- prior_variance: float -> variance of the prior distribution used in the complexity cost (KL term),
- posterior_mu_init: float -> used to initialize the trainable mu parameter, the mean of the approximate posterior,
- posterior_rho_init: float -> used to initialize the trainable rho parameter, which sets the sigma of the approximate posterior through the softplus σ = log(1 + exp(ρ)),
- bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,
forward(X)
Samples the weights with Flipout and performs torch.nn.functional.conv1d. See the PyTorch documentation of torch.nn.functional.conv1d for the output shape.
Parameters:
- X: torch.Tensor with shape (batch_size, C, H)
Returns:
- tuple: (torch.Tensor, KL divergence from the sampled-weight distribution to the prior, as a scalar)
class Conv2dFlipout
bayesian_torch.layers.Conv2dFlipout(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)
Parameters:
- in_channels: int -> number of channels in the input image,
- out_channels: int -> number of channels produced by the convolution,
- kernel_size: int -> size of the convolving kernel,
- stride: int -> stride of the convolution. Default: 1,
- padding: int -> zero-padding added to both sides of the input. Default: 0,
- dilation: int -> spacing between kernel elements. Default: 1,
- groups: int -> number of blocked connections from input channels to output channels. Default: 1,
- prior_mean: float -> mean of the prior distribution used in the complexity cost (KL term),
- prior_variance: float -> variance of the prior distribution used in the complexity cost (KL term),
- posterior_mu_init: float -> used to initialize the trainable mu parameter, the mean of the approximate posterior,
- posterior_rho_init: float -> used to initialize the trainable rho parameter, which sets the sigma of the approximate posterior through the softplus σ = log(1 + exp(ρ)),
- bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,
forward(X)
Samples the weights with Flipout and performs torch.nn.functional.conv2d. See the PyTorch documentation of torch.nn.functional.conv2d for the output shape.
Parameters:
- X: torch.Tensor with shape (batch_size, C, H, W)
Returns:
- tuple: (torch.Tensor, KL divergence from the sampled-weight distribution to the prior, as a scalar)
class Conv3dFlipout
bayesian_torch.layers.Conv3dFlipout(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)
Parameters:
- in_channels: int -> number of channels in the input image,
- out_channels: int -> number of channels produced by the convolution,
- kernel_size: int -> size of the convolving kernel,
- stride: int -> stride of the convolution. Default: 1,
- padding: int -> zero-padding added to both sides of the input. Default: 0,
- dilation: int -> spacing between kernel elements. Default: 1,
- groups: int -> number of blocked connections from input channels to output channels. Default: 1,
- prior_mean: float -> mean of the prior distribution used in the complexity cost (KL term),
- prior_variance: float -> variance of the prior distribution used in the complexity cost (KL term),
- posterior_mu_init: float -> used to initialize the trainable mu parameter, the mean of the approximate posterior,
- posterior_rho_init: float -> used to initialize the trainable rho parameter, which sets the sigma of the approximate posterior through the softplus σ = log(1 + exp(ρ)),
- bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,
forward(X)
Samples the weights with Flipout and performs torch.nn.functional.conv3d. See the PyTorch documentation of torch.nn.functional.conv3d for the output shape.
Parameters:
- X: torch.Tensor with shape (batch_size, C, H, W, L)
Returns:
- tuple: (torch.Tensor, KL divergence from the sampled-weight distribution to the prior, as a scalar)
class ConvTranspose1dFlipout
bayesian_torch.layers.ConvTranspose1dFlipout(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)
Parameters:
- in_channels: int -> number of channels in the input image,
- out_channels: int -> number of channels produced by the convolution,
- kernel_size: int -> size of the convolving kernel,
- stride: int -> stride of the convolution. Default: 1,
- padding: int -> zero-padding added to both sides of the input. Default: 0,
- dilation: int -> spacing between kernel elements. Default: 1,
- groups: int -> number of blocked connections from input channels to output channels. Default: 1,
- prior_mean: float -> mean of the prior distribution used in the complexity cost (KL term),
- prior_variance: float -> variance of the prior distribution used in the complexity cost (KL term),
- posterior_mu_init: float -> used to initialize the trainable mu parameter, the mean of the approximate posterior,
- posterior_rho_init: float -> used to initialize the trainable rho parameter, which sets the sigma of the approximate posterior through the softplus σ = log(1 + exp(ρ)),
- bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,
forward(X)
Samples the weights with Flipout and performs torch.nn.functional.conv_transpose1d. See the PyTorch documentation of torch.nn.functional.conv_transpose1d for the output shape.
Parameters:
- X: torch.Tensor with shape (batch_size, C, H)
Returns:
- tuple: (torch.Tensor, KL divergence from the sampled-weight distribution to the prior, as a scalar)
class ConvTranspose2dFlipout
bayesian_torch.layers.ConvTranspose2dFlipout(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)
Parameters:
- in_channels: int -> number of channels in the input image,
- out_channels: int -> number of channels produced by the convolution,
- kernel_size: int -> size of the convolving kernel,
- stride: int -> stride of the convolution. Default: 1,
- padding: int -> zero-padding added to both sides of the input. Default: 0,
- dilation: int -> spacing between kernel elements. Default: 1,
- groups: int -> number of blocked connections from input channels to output channels. Default: 1,
- prior_mean: float -> mean of the prior distribution used in the complexity cost (KL term),
- prior_variance: float -> variance of the prior distribution used in the complexity cost (KL term),
- posterior_mu_init: float -> used to initialize the trainable mu parameter, the mean of the approximate posterior,
- posterior_rho_init: float -> used to initialize the trainable rho parameter, which sets the sigma of the approximate posterior through the softplus σ = log(1 + exp(ρ)),
- bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,
forward(X)
Samples the weights with Flipout and performs torch.nn.functional.conv_transpose2d. See the PyTorch documentation of torch.nn.functional.conv_transpose2d for the output shape.
Parameters:
- X: torch.Tensor with shape (batch_size, C, H, W)
Returns:
- tuple: (torch.Tensor, KL divergence from the sampled-weight distribution to the prior, as a scalar)
class ConvTranspose3dFlipout
bayesian_torch.layers.ConvTranspose3dFlipout(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)
Parameters:
- in_channels: int -> number of channels in the input image,
- out_channels: int -> number of channels produced by the convolution,
- kernel_size: int -> size of the convolving kernel,
- stride: int -> stride of the convolution. Default: 1,
- padding: int -> zero-padding added to both sides of the input. Default: 0,
- dilation: int -> spacing between kernel elements. Default: 1,
- groups: int -> number of blocked connections from input channels to output channels. Default: 1,
- prior_mean: float -> mean of the prior distribution used in the complexity cost (KL term),
- prior_variance: float -> variance of the prior distribution used in the complexity cost (KL term),
- posterior_mu_init: float -> used to initialize the trainable mu parameter, the mean of the approximate posterior,
- posterior_rho_init: float -> used to initialize the trainable rho parameter, which sets the sigma of the approximate posterior through the softplus σ = log(1 + exp(ρ)),
- bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,
forward(X)
Samples the weights with Flipout and performs torch.nn.functional.conv_transpose3d. See the PyTorch documentation of torch.nn.functional.conv_transpose3d for the output shape.
Parameters:
- X: torch.Tensor with shape (batch_size, C, H, W, L)
Returns:
- tuple: (torch.Tensor, KL divergence from the sampled-weight distribution to the prior, as a scalar)
class LSTMFlipout
bayesian_torch.layers.LSTMFlipout(in_features, out_features, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)
Parameters:
- in_features: int -> size of each input sample,
- out_features: int -> size of each output sample,
- prior_mean: float -> mean of the prior distribution used in the complexity cost (KL term),
- prior_variance: float -> variance of the prior distribution used in the complexity cost (KL term),
- posterior_mu_init: float -> used to initialize the trainable mu parameter, the mean of the approximate posterior,
- posterior_rho_init: float -> used to initialize the trainable rho parameter, which sets the sigma of the approximate posterior through the softplus σ = log(1 + exp(ρ)),
- bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,
forward(X, hidden_states=None)
Samples the weights with Flipout and performs the LSTM feedforward operation.
Parameters:
- X: torch.Tensor with shape (batch_size, seq_len, in_features)
- hidden_states: None or tuple (h, c) of torch.Tensor, each with shape (X.shape[0], seq_len, out_features)
Returns:
- tuple: (output, (h, c), kl), where output, h and c are torch.Tensor with shape (X.shape[0], seq_len, out_features), and kl is the KL divergence from the sampled-weight distribution to the prior, as a scalar