Tutorial: Neural Networks
May 8, 2026 · View on GitHub
VTL provides a high-level Sequential model API in vtl.nn.models that
builds on the autograd engine. This tutorial walks through building,
training, and evaluating a neural network from scratch.
Overview
The training workflow has four steps:
- Build — define the model architecture with
Sequential - Forward — run input through
model.forward(x) - Loss — compute
model.loss(y_pred, y_target) - Update — call
loss.backprop()thenoptimizer.update()
Building a model
import vtl.autograd
import vtl.nn.models
ctx := autograd.ctx[f64]()
mut model := models.sequential_from_ctx[f64](ctx)
model.input([2]) // 2 input features
model.linear(8) // hidden layer: 2 -> 8
model.relu() // non-linearity
model.linear(1) // output layer: 8 -> 1
model.mse_loss() // loss function
All available layers
| Layer method | Description |
|---|---|
linear(n) | Linear transformation y = x·Wᵀ + b |
relu() | Rectified Linear Unit |
leaky_relu() | Leaky ReLU |
elu() | Exponential Linear Unit |
sigmoid() | Logistic sigmoid |
tanh() | Hyperbolic tangent |
softmax() | Softmax over the last dimension |
gelu() | Gaussian Error Linear Unit |
swish() | Swish activation |
mish() | Mish activation |
flatten() | Flatten all non-batch dimensions |
maxpool2d(kernel, padding, stride) | 2D max pooling |
avgpool2d(kernel, padding, stride) | 2D average pooling |
global_avgpool2d() | Global average pooling (per channel) |
conv2d(in, out, kernel_size, config) | 2D convolution |
batchnorm1d(num_features, config) | 1D batch normalisation |
layer_norm(normalized_shape, config) | Layer normalisation |
embedding(vocab_size, embed_dim) | Token embedding (integer indices → vectors) |
lstm(input_size, hidden_size, num_layers) | Long Short-Term Memory layer |
multihead_attention(embed_dim, num_heads) | Multi-head self-attention |
positional_encoding(embed_dim, max_len) | Sinusoidal positional encoding |
dropout() | Dropout (eval mode: no-op) |
All available losses
| Loss method | Class | Description |
|---|---|---|
mse_loss() | MSELoss | Mean Squared Error |
bce_loss() | BCELoss | Binary Cross-Entropy (per-element) |
sigmoid_cross_entropy_loss() | SigmoidCrossEntropyLoss | Sigmoid CE, numerically stable |
softmax_cross_entropy_loss() | SoftmaxCrossEntropyLoss | Softmax CE for multi-class |
cross_entropy_loss() | CrossEntropyLoss | Cross-Entropy |
huber_loss(delta: 1.0) | HuberLoss | Huber (smooth L1) loss |
nll_loss(weight) | NLLLoss | Negative Log Likelihood |
kl_div_loss() | KLDivLoss | KL Divergence D_KL(P‖Q) |
All available optimizers
| Optimizer | Module | Key feature |
|---|---|---|
adam_optimizer(lr, ...) | optimizers | Adaptive moment estimation |
adamw(...) | optimizers | Adam + decoupled weight decay |
rmsprop(...) | optimizers | Per-parameter learning rates |
adagrad(...) | optimizers | Accumulates squared grads |
sgd(...) | optimizers | Vanilla stochastic gradient descent |
See TUTORIAL_OPTIMIZERS.md for full optimizer details and scheduler usage.
Training loop
// assumes model, x, y_target defined above
mut optimizer := optimizers.adam_optimizer[f64](learning_rate: 0.001)
optimizer.build_params(model.info.layers)
for epoch in 0 .. 100 {
y_pred := model.forward(x)!
mut loss := model.loss(y_pred, y_target)!
println('Epoch ${epoch}: loss = ${loss.value.get([0]):.4f}')
loss.backprop()!
optimizer.update()!
}
Mini-batch training
For large datasets, split the data into mini-batches and loop over them
inside each epoch. Use .slice() to extract a batch:
// assumes model, x_all, y_tensor, n_batches, batch_size defined above
for b in 0 .. n_batches {
offset := b * batch_size
mut x_batch := x_all.slice([offset, offset + batch_size])!
y_batch := y_tensor.slice([offset, offset + batch_size])!
y_pred := model.forward(x_batch)!
mut loss := model.loss(y_pred, y_batch)!
loss.backprop()!
optimizer.update()!
}
Reproducibility
VTL initialises weights using the global random seed. Call rand.seed before
creating the autograd context to get reproducible results:
import rand
import vtl.autograd
rand.seed([u32(42), u32(0)])
ctx := autograd.ctx[f64]()
_ = ctx
Practical tips
Softmax cross-entropy (multi-class classification)
- Use interleaved class ordering (
class_id = i % n_classes) so every mini-batch sees all classes. - Keep
batch_size >= n_classes. - Use
learning_rate = 0.01andbatch_size = 6+for stable training.
MSE regression
- Full-batch gradient descent (all samples at once) works well for small datasets.
- Use
learning_rate = 0.001and at least 60 epochs.
Sigmoid cross-entropy (binary classification)
- Large batch sizes (32+) help — see the XOR example.
learning_rate = 0.01is a reliable starting point.
Examples
| Example | Task | Loss |
|---|---|---|
nn_xor | XOR binary classification | Sigmoid CE |
nn_regression_sine | sin(x) regression | MSE |
nn_simple_two_layer | Random target fitting | MSE |
nn_multiclass_iris | 3-class classification | Softmax CE |
nn_autoencoder_simple | Reconstruction | MSE |
See also
- Autograd Tutorial — how gradients are computed
- Optimizers Tutorial — Adam/AdamW/RMSProp/AdaGrad/SGD + schedulers
- First Steps — tensor creation and properties