bayesian_torch.layers module

January 26, 2022

A set of Bayesian neural network layers for stochastic variational inference:

Variational layers with reparameterized Monte Carlo estimators [Blundell et al. 2015]
Variational layers with Flipout Monte Carlo estimators [Wen et al. 2018]

Layers

BaseVariationalLayer_
LinearReparameterization
Conv1dReparameterization
Conv2dReparameterization
Conv3dReparameterization
ConvTranspose1dReparameterization
ConvTranspose2dReparameterization
ConvTranspose3dReparameterization
LSTMReparameterization
LinearFlipout
Conv1dFlipout
Conv2dFlipout
Conv3dFlipout
ConvTranspose1dFlipout
ConvTranspose2dFlipout
ConvTranspose3dFlipout
LSTMFlipout

class BaseVariationalLayer_(torch.nn.Module)

Abstract class which inherits from torch.nn.Module

kl_div(mu_q, sigma_q, mu_p, sigma_p)

Calculates the Kullback-Leibler divergence from the normal distribution Q (parameterized by mu_q, sigma_q) to the normal distribution P (parameterized by mu_p, sigma_p).

Parameters:

mu_q: torch.Tensor -> mu parameter of distribution Q
sigma_q: torch.Tensor -> sigma parameter of distribution Q
mu_p: float -> mu parameter of distribution P
sigma_p: float -> sigma parameter of distribution P

Returns:

torch.Tensor with zero dimensions (a scalar)
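For reference, the closed-form KL divergence between two univariate normal distributions can be sketched in plain Python. This is a minimal illustration, assuming the conventional direction KL(Q || P) and scalar inputs. The helper name `gaussian_kl` is hypothetical and is not part of bayesian_torch. The layer method operates on tensors, and how it reduces per-element values to a scalar is not covered here.

```python
import math


def gaussian_kl(mu_q: float, sigma_q: float, mu_p: float, sigma_p: float) -> float:
    """KL(Q || P) for Q = N(mu_q, sigma_q^2) and P = N(mu_p, sigma_p^2).

    Hypothetical helper for illustration only (not the library's code).
    """
    return (
        math.log(sigma_p / sigma_q)
        + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
        - 0.5
    )


kl_same = gaussian_kl(0.0, 1.0, 0.0, 1.0)     # identical distributions -> 0.0
kl_shifted = gaussian_kl(1.0, 1.0, 0.0, 1.0)  # unit mean shift, equal scale -> 0.5
```

The divergence is zero only when Q and P coincide, and it grows as the posterior mean moves away from the prior mean or as the two scales diverge.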

class LinearReparameterization

bayesian_torch.layers.LinearReparameterization(in_features, out_features, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)

Parameters:

in_features: int -> size of each input sample,
out_features: int -> size of each output sample,
prior_mean: float -> mean of the prior distribution used in the complexity cost,
prior_variance: float -> variance of the prior distribution used in the complexity cost,
posterior_mu_init: float -> initial value of the trainable mu parameter, the mean of the approximate posterior,
posterior_rho_init: float -> initial value of the trainable rho parameter, which gives the sigma of the approximate posterior through the softplus function σ = log(1 + exp(ρ)),
bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,
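To make the role of posterior_rho_init concrete, the following sketch evaluates the softplus mapping σ = log(1 + exp(ρ)) and its inverse in plain Python. The function names and the value ρ = -3.0 are illustrative assumptions, not taken from the library.

```python
import math


def softplus(rho: float) -> float:
    """Numerically stable log(1 + exp(rho)); always positive."""
    return max(rho, 0.0) + math.log1p(math.exp(-abs(rho)))


def inverse_softplus(sigma: float) -> float:
    """Return rho such that softplus(rho) == sigma (requires sigma > 0)."""
    return math.log(math.expm1(sigma))


sigma_init = softplus(-3.0)           # about 0.0486: a small initial posterior sigma
rho_for_sigma = inverse_softplus(0.05)  # the rho that yields sigma = 0.05
```

Because softplus is positive for every real ρ, the optimizer can update ρ without constraints while σ stays a valid standard deviation.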

forward(X)

Samples the weights with reparameterization and performs torch.nn.functional.linear.

Parameters:

X: torch.Tensor with shape (batch_size, in_features)

Returns:

torch.Tensor with shape = (X.shape[0], out_features), float corresponding to the KL divergence from the distribution of the sampled weights to the prior
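The sampling step described above can be sketched without any dependencies. This is a simplified illustration of the reparameterization trick, w = μ + σ · ε with ε ~ N(0, 1) and σ = softplus(ρ), followed by an affine map laid out like torch.nn.functional.linear. All names below are hypothetical, nested lists stand in for tensors, and this is not the library's implementation.

```python
import math
import random


def softplus(rho):
    """Numerically stable log(1 + exp(rho))."""
    return max(rho, 0.0) + math.log1p(math.exp(-abs(rho)))


def sample_weights(mu, rho, rng):
    """Draw w = mu + softplus(rho) * eps elementwise, with eps ~ N(0, 1)."""
    return [
        [m + softplus(r) * rng.gauss(0.0, 1.0) for m, r in zip(mu_row, rho_row)]
        for mu_row, rho_row in zip(mu, rho)
    ]


def linear(x, weight, bias):
    """y[b][o] = sum_i x[b][i] * weight[o][i] + bias[o]; weight is (out, in)."""
    return [
        [sum(xi * wi for xi, wi in zip(row, w_row)) + b for w_row, b in zip(weight, bias)]
        for row in x
    ]


rng = random.Random(0)
in_features, out_features = 3, 2
mu = [[0.1] * in_features for _ in range(out_features)]    # posterior means
rho = [[-3.0] * in_features for _ in range(out_features)]  # posterior rho values
x = [[1.0, 2.0, 3.0], [0.0, -1.0, 0.5]]                    # (batch_size, in_features)

# Each forward pass draws fresh weights, so repeated calls give different outputs.
y1 = linear(x, sample_weights(mu, rho, rng), bias=[0.0, 0.0])
y2 = linear(x, sample_weights(mu, rho, rng), bias=[0.0, 0.0])
```

Since the noise ε is drawn independently of μ and ρ, gradients can flow through the sampled weights to both trainable parameters.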

class Conv1dReparameterization

bayesian_torch.layers.Conv1dReparameterization(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)

Parameters:

in_channels: int -> number of channels in the input image,
out_channels: int -> number of channels produced by the convolution,
kernel_size: int -> size of the convolving kernel,
stride: int -> stride of the convolution. Default: 1,
padding: int -> zero-padding added to both sides of the input. Default: 0,
dilation: int -> spacing between kernel elements. Default: 1,
groups: int -> number of blocked connections from input channels to output channels. Default: 1,
prior_mean: float -> mean of the prior distribution used in the complexity cost,
prior_variance: float -> variance of the prior distribution used in the complexity cost,
posterior_mu_init: float -> initial value of the trainable mu parameter, the mean of the approximate posterior,
posterior_rho_init: float -> initial value of the trainable rho parameter, which gives the sigma of the approximate posterior through the softplus function σ = log(1 + exp(ρ)),
bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,

forward(X)

Samples the weights with reparameterization and performs torch.nn.functional.conv1d. See the official PyTorch documentation for the output tensor shape.

Parameters:

X: torch.Tensor with shape (batch_size, C, H)

Returns:

torch.Tensor, float corresponding to the KL divergence from the distribution of the sampled weights to the prior
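As a quick reference for the output shape, the spatial size produced by a PyTorch convolution follows the formula given in the torch.nn.Conv1d documentation, applied independently to each spatial dimension. The helper below is a hypothetical convenience function, not part of bayesian_torch, and it assumes integer hyperparameters.

```python
def conv_output_size(size_in: int, kernel_size: int, stride: int = 1,
                     padding: int = 0, dilation: int = 1) -> int:
    """Spatial output size of a convolution along one dimension.

    Implements floor((size_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1).
    """
    return (size_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


same_stride = conv_output_size(32, kernel_size=3)                       # 30
downsampled = conv_output_size(32, kernel_size=3, stride=2, padding=1)  # 16
```

For an input of shape (batch_size, C, H), the output tensor would then have shape (batch_size, out_channels, conv_output_size(H, ...)).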

class Conv2dReparameterization

bayesian_torch.layers.Conv2dReparameterization(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)

Parameters:

in_channels: int -> number of channels in the input image,
out_channels: int -> number of channels produced by the convolution,
kernel_size: int -> size of the convolving kernel,
stride: int -> stride of the convolution. Default: 1,
padding: int -> zero-padding added to both sides of the input. Default: 0,
dilation: int -> spacing between kernel elements. Default: 1,
groups: int -> number of blocked connections from input channels to output channels. Default: 1,
prior_mean: float -> mean of the prior distribution used in the complexity cost,
prior_variance: float -> variance of the prior distribution used in the complexity cost,
posterior_mu_init: float -> initial value of the trainable mu parameter, the mean of the approximate posterior,
posterior_rho_init: float -> initial value of the trainable rho parameter, which gives the sigma of the approximate posterior through the softplus function σ = log(1 + exp(ρ)),
bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,

forward(X)

Samples the weights with reparameterization and performs torch.nn.functional.conv2d. See the official PyTorch documentation for the output tensor shape.

Parameters:

X: torch.Tensor with shape (batch_size, C, H, W)

Returns:

torch.Tensor, float corresponding to the KL divergence from the distribution of the sampled weights to the prior

class Conv3dReparameterization

bayesian_torch.layers.Conv3dReparameterization(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)

Parameters:

in_channels: int -> number of channels in the input image,
out_channels: int -> number of channels produced by the convolution,
kernel_size: int -> size of the convolving kernel,
stride: int -> stride of the convolution. Default: 1,
padding: int -> zero-padding added to both sides of the input. Default: 0,
dilation: int -> spacing between kernel elements. Default: 1,
groups: int -> number of blocked connections from input channels to output channels. Default: 1,
prior_mean: float -> mean of the prior distribution used in the complexity cost,
prior_variance: float -> variance of the prior distribution used in the complexity cost,
posterior_mu_init: float -> initial value of the trainable mu parameter, the mean of the approximate posterior,
posterior_rho_init: float -> initial value of the trainable rho parameter, which gives the sigma of the approximate posterior through the softplus function σ = log(1 + exp(ρ)),
bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,

forward(X)

Samples the weights with reparameterization and performs torch.nn.functional.conv3d. See the official PyTorch documentation for the output tensor shape.

Parameters:

X: torch.Tensor with shape (batch_size, C, H, W, L)

Returns:

torch.Tensor, float corresponding to the KL divergence from the distribution of the sampled weights to the prior

class ConvTranspose1dReparameterization

bayesian_torch.layers.ConvTranspose1dReparameterization(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)

Parameters:

in_channels: int -> number of channels in the input image,
out_channels: int -> number of channels produced by the convolution,
kernel_size: int -> size of the convolving kernel,
stride: int -> stride of the convolution. Default: 1,
padding: int -> zero-padding added to both sides of the input. Default: 0,
dilation: int -> spacing between kernel elements. Default: 1,
groups: int -> number of blocked connections from input channels to output channels. Default: 1,
prior_mean: float -> mean of the prior distribution used in the complexity cost,
prior_variance: float -> variance of the prior distribution used in the complexity cost,
posterior_mu_init: float -> initial value of the trainable mu parameter, the mean of the approximate posterior,
posterior_rho_init: float -> initial value of the trainable rho parameter, which gives the sigma of the approximate posterior through the softplus function σ = log(1 + exp(ρ)),
bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,

forward(X)

Samples the weights with reparameterization and performs torch.nn.functional.conv_transpose1d. See the official PyTorch documentation for the output tensor shape.

Parameters:

X: torch.Tensor with shape (batch_size, C, H)

Returns:

torch.Tensor, float corresponding to the KL divergence from the distribution of the sampled weights to the prior
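For transposed convolutions the output size grows rather than shrinks. The sketch below follows the formula from the torch.nn.ConvTranspose1d documentation, applied per spatial dimension. The helper name is hypothetical, and `output_padding` is a PyTorch argument that does not appear in the signature documented here, so it is kept at its PyTorch default of 0 as an assumption.

```python
def conv_transpose_output_size(size_in: int, kernel_size: int, stride: int = 1,
                               padding: int = 0, dilation: int = 1,
                               output_padding: int = 0) -> int:
    """Spatial output size of a transposed convolution along one dimension.

    Implements (size_in - 1)*stride - 2*padding + dilation*(kernel_size - 1)
    + output_padding + 1.
    """
    return ((size_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


odd_size = conv_transpose_output_size(16, kernel_size=3, stride=2, padding=1)  # 31
doubled = conv_transpose_output_size(16, kernel_size=4, stride=2, padding=1)   # 32
```

With stride 2, a kernel size of 4 and padding 1 doubles the spatial size exactly, whereas a kernel size of 3 yields one element fewer.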

class ConvTranspose2dReparameterization

bayesian_torch.layers.ConvTranspose2dReparameterization(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)

Parameters:

in_channels: int -> number of channels in the input image,
out_channels: int -> number of channels produced by the convolution,
kernel_size: int -> size of the convolving kernel,
stride: int -> stride of the convolution. Default: 1,
padding: int -> zero-padding added to both sides of the input. Default: 0,
dilation: int -> spacing between kernel elements. Default: 1,
groups: int -> number of blocked connections from input channels to output channels. Default: 1,
prior_mean: float -> mean of the prior distribution used in the complexity cost,
prior_variance: float -> variance of the prior distribution used in the complexity cost,
posterior_mu_init: float -> initial value of the trainable mu parameter, the mean of the approximate posterior,
posterior_rho_init: float -> initial value of the trainable rho parameter, which gives the sigma of the approximate posterior through the softplus function σ = log(1 + exp(ρ)),
bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,

forward(X)

Samples the weights with reparameterization and performs torch.nn.functional.conv_transpose2d. See the official PyTorch documentation for the output tensor shape.

Parameters:

X: torch.Tensor with shape (batch_size, C, H, W)

Returns:

torch.Tensor, float corresponding to the KL divergence from the distribution of the sampled weights to the prior

class ConvTranspose3dReparameterization

bayesian_torch.layers.ConvTranspose3dReparameterization(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)

Parameters:

in_channels: int -> number of channels in the input image,
out_channels: int -> number of channels produced by the convolution,
kernel_size: int -> size of the convolving kernel,
stride: int -> stride of the convolution. Default: 1,
padding: int -> zero-padding added to both sides of the input. Default: 0,
dilation: int -> spacing between kernel elements. Default: 1,
groups: int -> number of blocked connections from input channels to output channels. Default: 1,
prior_mean: float -> mean of the prior distribution used in the complexity cost,
prior_variance: float -> variance of the prior distribution used in the complexity cost,
posterior_mu_init: float -> initial value of the trainable mu parameter, the mean of the approximate posterior,
posterior_rho_init: float -> initial value of the trainable rho parameter, which gives the sigma of the approximate posterior through the softplus function σ = log(1 + exp(ρ)),
bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,

forward(X)

Samples the weights with reparameterization and performs torch.nn.functional.conv_transpose3d. See the official PyTorch documentation for the output tensor shape.

Parameters:

X: torch.Tensor with shape (batch_size, C, H, W, L)

Returns:

torch.Tensor, float corresponding to the KL divergence from the distribution of the sampled weights to the prior

class LSTMReparameterization

bayesian_torch.layers.LSTMReparameterization(in_features, out_features, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)

Parameters:

in_features: int -> size of each input sample,
out_features: int -> size of each output sample,
prior_mean: float -> mean of the prior distribution used in the complexity cost,
prior_variance: float -> variance of the prior distribution used in the complexity cost,
posterior_mu_init: float -> initial value of the trainable mu parameter, the mean of the approximate posterior,
posterior_rho_init: float -> initial value of the trainable rho parameter, which gives the sigma of the approximate posterior through the softplus function σ = log(1 + exp(ρ)),
bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,

forward(X, hidden_states=None)

Samples the weights with reparameterization and performs the LSTM feedforward operation.

Parameters:

X: torch.Tensor with shape (batch_size, seq_len, in_features)
hidden_states: None or tuple (torch.Tensor with shape = (X.shape[0], seq_len, out_features), torch.Tensor with shape = (X.shape[0], seq_len, out_features))

Returns:

tuple: (torch.Tensor with shape = (X.shape[0], seq_len, out_features), tuple (torch.Tensor with shape = (X.shape[0], seq_len, out_features), torch.Tensor with shape = (X.shape[0], seq_len, out_features))), float corresponding to the KL divergence from the distribution of the sampled weights to the prior

class LinearFlipout

bayesian_torch.layers.LinearFlipout(in_features, out_features, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)

Parameters:

in_features: int -> size of each input sample,
out_features: int -> size of each output sample,
prior_mean: float -> mean of the prior distribution used in the complexity cost,
prior_variance: float -> variance of the prior distribution used in the complexity cost,
posterior_mu_init: float -> initial value of the trainable mu parameter, the mean of the approximate posterior,
posterior_rho_init: float -> initial value of the trainable rho parameter, which gives the sigma of the approximate posterior through the softplus function σ = log(1 + exp(ρ)),
bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,

forward(X)

Samples the weights with Flipout and performs torch.nn.functional.linear.

Parameters:

X: torch.Tensor with shape (batch_size, in_features)

Returns:

torch.Tensor with shape = (X.shape[0], out_features), float corresponding to the KL divergence from the distribution of the sampled weights to the prior
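The idea behind Flipout [Wen et al. 2018] is that one shared weight perturbation ΔW is decorrelated across the batch by random sign vectors, so that example b effectively sees W_b = W_mean + ΔW ∘ (r_b s_bᵀ) without W_b ever being built. The dependency-free sketch below checks that identity on nested lists. All function and variable names are hypothetical, and this is an illustration of the estimator, not the library's code.

```python
import random


def flipout_linear(x, w_mean, delta_w, sign_in, sign_out):
    """y_b = W_mean x_b + r_b * (delta_W (s_b * x_b)); weights are (out, in)."""
    out = []
    for x_row, s_row, r_row in zip(x, sign_in, sign_out):
        y_row = []
        for o in range(len(w_mean)):
            mean_term = sum(w * xi for w, xi in zip(w_mean[o], x_row))
            pert_term = sum(d * s * xi for d, s, xi in zip(delta_w[o], s_row, x_row))
            y_row.append(mean_term + r_row[o] * pert_term)
        out.append(y_row)
    return out


def per_example_linear(x, w_mean, delta_w, sign_in, sign_out):
    """Reference: build W_b = W_mean + delta_W * outer(r_b, s_b) for each example."""
    out = []
    for x_row, s_row, r_row in zip(x, sign_in, sign_out):
        y_row = []
        for o in range(len(w_mean)):
            w_row = [w_mean[o][i] + delta_w[o][i] * r_row[o] * s_row[i]
                     for i in range(len(x_row))]
            y_row.append(sum(w * xi for w, xi in zip(w_row, x_row)))
        out.append(y_row)
    return out


rng = random.Random(0)
batch, n_in, n_out = 4, 3, 2
x = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(batch)]
w_mean = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
delta_w = [[0.05 * rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
sign_in = [[rng.choice((-1.0, 1.0)) for _ in range(n_in)] for _ in range(batch)]
sign_out = [[rng.choice((-1.0, 1.0)) for _ in range(n_out)] for _ in range(batch)]

fast = flipout_linear(x, w_mean, delta_w, sign_in, sign_out)
reference = per_example_linear(x, w_mean, delta_w, sign_in, sign_out)
max_gap = max(abs(a - b) for fr, rr in zip(fast, reference) for a, b in zip(fr, rr))
```

Here `max_gap` stays at floating-point rounding level, which shows that the cheap form (two matrix products plus elementwise sign flips) matches the explicit per-example weights.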

class Conv1dFlipout

bayesian_torch.layers.Conv1dFlipout(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)

Parameters:

in_channels: int -> number of channels in the input image,
out_channels: int -> number of channels produced by the convolution,
kernel_size: int -> size of the convolving kernel,
stride: int -> stride of the convolution. Default: 1,
padding: int -> zero-padding added to both sides of the input. Default: 0,
dilation: int -> spacing between kernel elements. Default: 1,
groups: int -> number of blocked connections from input channels to output channels. Default: 1,
prior_mean: float -> mean of the prior distribution used in the complexity cost,
prior_variance: float -> variance of the prior distribution used in the complexity cost,
posterior_mu_init: float -> initial value of the trainable mu parameter, the mean of the approximate posterior,
posterior_rho_init: float -> initial value of the trainable rho parameter, which gives the sigma of the approximate posterior through the softplus function σ = log(1 + exp(ρ)),
bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,

forward(X)

Samples the weights with Flipout and performs torch.nn.functional.conv1d. See the official PyTorch documentation for the output tensor shape.

Parameters:

X: torch.Tensor with shape (batch_size, C, H)

Returns:

torch.Tensor, float corresponding to the KL divergence from the distribution of the sampled weights to the prior

class Conv2dFlipout

bayesian_torch.layers.Conv2dFlipout(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)

Parameters:

in_channels: int -> number of channels in the input image,
out_channels: int -> number of channels produced by the convolution,
kernel_size: int -> size of the convolving kernel,
stride: int -> stride of the convolution. Default: 1,
padding: int -> zero-padding added to both sides of the input. Default: 0,
dilation: int -> spacing between kernel elements. Default: 1,
groups: int -> number of blocked connections from input channels to output channels. Default: 1,
prior_mean: float -> mean of the prior distribution used in the complexity cost,
prior_variance: float -> variance of the prior distribution used in the complexity cost,
posterior_mu_init: float -> initial value of the trainable mu parameter, the mean of the approximate posterior,
posterior_rho_init: float -> initial value of the trainable rho parameter, which gives the sigma of the approximate posterior through the softplus function σ = log(1 + exp(ρ)),
bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,

forward(X)

Samples the weights with Flipout and performs torch.nn.functional.conv2d. See the official PyTorch documentation for the output tensor shape.

Parameters:

X: torch.Tensor with shape (batch_size, C, H, W)

Returns:

torch.Tensor, float corresponding to the KL divergence from the distribution of the sampled weights to the prior

class Conv3dFlipout

bayesian_torch.layers.Conv3dFlipout(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)

Parameters:

in_channels: int -> number of channels in the input image,
out_channels: int -> number of channels produced by the convolution,
kernel_size: int -> size of the convolving kernel,
stride: int -> stride of the convolution. Default: 1,
padding: int -> zero-padding added to both sides of the input. Default: 0,
dilation: int -> spacing between kernel elements. Default: 1,
groups: int -> number of blocked connections from input channels to output channels. Default: 1,
prior_mean: float -> mean of the prior distribution used in the complexity cost,
prior_variance: float -> variance of the prior distribution used in the complexity cost,
posterior_mu_init: float -> initial value of the trainable mu parameter, the mean of the approximate posterior,
posterior_rho_init: float -> initial value of the trainable rho parameter, which gives the sigma of the approximate posterior through the softplus function σ = log(1 + exp(ρ)),
bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,

forward(X)

Samples the weights with Flipout and performs torch.nn.functional.conv3d. See the official PyTorch documentation for the output tensor shape.

Parameters:

X: torch.Tensor with shape (batch_size, C, H, W, L)

Returns:

torch.Tensor, float corresponding to the KL divergence from the distribution of the sampled weights to the prior

class ConvTranspose1dFlipout

bayesian_torch.layers.ConvTranspose1dFlipout(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)

Parameters:

in_channels: int -> number of channels in the input image,
out_channels: int -> number of channels produced by the convolution,
kernel_size: int -> size of the convolving kernel,
stride: int -> stride of the convolution. Default: 1,
padding: int -> zero-padding added to both sides of the input. Default: 0,
dilation: int -> spacing between kernel elements. Default: 1,
groups: int -> number of blocked connections from input channels to output channels. Default: 1,
prior_mean: float -> mean of the prior distribution used in the complexity cost,
prior_variance: float -> variance of the prior distribution used in the complexity cost,
posterior_mu_init: float -> initial value of the trainable mu parameter, the mean of the approximate posterior,
posterior_rho_init: float -> initial value of the trainable rho parameter, which gives the sigma of the approximate posterior through the softplus function σ = log(1 + exp(ρ)),
bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,

forward(X)

Samples the weights with Flipout and performs torch.nn.functional.conv_transpose1d. See the official PyTorch documentation for the output tensor shape.

Parameters:

X: torch.Tensor with shape (batch_size, C, H)

Returns:

torch.Tensor, float corresponding to the KL divergence from the distribution of the sampled weights to the prior

class ConvTranspose2dFlipout

bayesian_torch.layers.ConvTranspose2dFlipout(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)

Parameters:

in_channels: int -> number of channels in the input image,
out_channels: int -> number of channels produced by the convolution,
kernel_size: int -> size of the convolving kernel,
stride: int -> stride of the convolution. Default: 1,
padding: int -> zero-padding added to both sides of the input. Default: 0,
dilation: int -> spacing between kernel elements. Default: 1,
groups: int -> number of blocked connections from input channels to output channels. Default: 1,
prior_mean: float -> mean of the prior distribution used in the complexity cost,
prior_variance: float -> variance of the prior distribution used in the complexity cost,
posterior_mu_init: float -> initial value of the trainable mu parameter, the mean of the approximate posterior,
posterior_rho_init: float -> initial value of the trainable rho parameter, which gives the sigma of the approximate posterior through the softplus function σ = log(1 + exp(ρ)),
bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,

forward(X)

Samples the weights with Flipout and performs torch.nn.functional.conv_transpose2d. See the official PyTorch documentation for the output tensor shape.

Parameters:

X: torch.Tensor with shape (batch_size, C, H, W)

Returns:

torch.Tensor, float corresponding to the KL divergence from the distribution of the sampled weights to the prior

class ConvTranspose3dFlipout

bayesian_torch.layers.ConvTranspose3dFlipout(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)

Parameters:

in_channels: int -> number of channels in the input image,
out_channels: int -> number of channels produced by the convolution,
kernel_size: int -> size of the convolving kernel,
stride: int -> stride of the convolution. Default: 1,
padding: int -> zero-padding added to both sides of the input. Default: 0,
dilation: int -> spacing between kernel elements. Default: 1,
groups: int -> number of blocked connections from input channels to output channels. Default: 1,
prior_mean: float -> mean of the prior distribution used in the complexity cost,
prior_variance: float -> variance of the prior distribution used in the complexity cost,
posterior_mu_init: float -> initial value of the trainable mu parameter, the mean of the approximate posterior,
posterior_rho_init: float -> initial value of the trainable rho parameter, which gives the sigma of the approximate posterior through the softplus function σ = log(1 + exp(ρ)),
bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,

forward(X)

Samples the weights with Flipout and performs torch.nn.functional.conv_transpose3d. See the official PyTorch documentation for the output tensor shape.

Parameters:

X: torch.Tensor with shape (batch_size, C, H, W, L)

Returns:

torch.Tensor, float corresponding to the KL divergence from the distribution of the sampled weights to the prior

class LSTMFlipout

bayesian_torch.layers.LSTMFlipout(in_features, out_features, prior_mean, prior_variance, posterior_mu_init, posterior_rho_init, bias=True)

Parameters:

in_features: int -> size of each input sample,
out_features: int -> size of each output sample,
prior_mean: float -> mean of the prior distribution used in the complexity cost,
prior_variance: float -> variance of the prior distribution used in the complexity cost,
posterior_mu_init: float -> initial value of the trainable mu parameter, the mean of the approximate posterior,
posterior_rho_init: float -> initial value of the trainable rho parameter, which gives the sigma of the approximate posterior through the softplus function σ = log(1 + exp(ρ)),
bias: bool -> if set to False, the layer will not learn an additive bias. Default: True,

forward(X, hidden_states=None)

Samples the weights with Flipout and performs the LSTM feedforward operation.

Parameters:

X: torch.Tensor with shape (batch_size, seq_len, in_features)
hidden_states: None or tuple (torch.Tensor with shape = (X.shape[0], seq_len, out_features), torch.Tensor with shape = (X.shape[0], seq_len, out_features))

Returns:

tuple: (torch.Tensor with shape = (X.shape[0], seq_len, out_features), tuple (torch.Tensor with shape = (X.shape[0], seq_len, out_features), torch.Tensor with shape = (X.shape[0], seq_len, out_features))), float corresponding to the KL divergence from the distribution of the sampled weights to the prior