vek

September 6, 2025

vek is a collection of SIMD accelerated vector functions for Go.

Most modern CPUs have special SIMD instructions (Single Instruction, Multiple Data) to process data in parallel, but there is currently no way to use them in a pure Go program. vek implements a large number of common vector operations in SIMD accelerated assembly code and wraps them in a simple Go API. vek supports most modern x86 CPUs and falls back to a pure Go implementation on unsupported platforms.

Features

  • Fast, average speedups of 10x for float32 vectors
  • Fallback to pure Go on unsupported platforms
  • Support for float64, float32 and bool vectors
  • Zero allocation variations of each function

Installation

go get -u github.com/viterin/vek

Getting Started

Simple Arithmetic Example

Vectors are represented as plain old floating point slices; there are no special data types in vek. All operations on float64 vectors reside in the vek package, which contains all the basic arithmetic operations:

package main

import (
	"fmt"
	"github.com/viterin/vek"
)

func main() {
	x := []float64{0, 1, 2, 3, 4}

	// Multiply a vector by itself element-wise
	y := vek.Mul(x, x)
	fmt.Println(x, y) // [0 1 2 3 4] [0 1 4 9 16]

	// Multiply each element by a number
	y = vek.MulNumber(x, 2)
	fmt.Println(x, y) // [0 1 2 3 4] [0 2 4 6 8]
}

Working With 32-Bit Vectors

The vek32 package contains float32 versions of each operation:

package main

import (
	"fmt"
	"github.com/viterin/vek/vek32"
)

func main() {
	// Add a float32 number to each element
	x := []float32{0, 1, 2, 3, 4}
	y := vek32.AddNumber(x, 2)

	fmt.Println(x, y) // [0 1 2 3 4] [2 3 4 5 6]
}

Comparisons and Selections

Floating point vectors can be compared to other vectors or numbers. The result is a bool vector indicating where the comparison holds true. bool vectors can be used to select matching elements, count matches and more:

package main

import (
	"fmt"
	"github.com/viterin/vek"
)

func main() {
	x := []float64{0, 1, 2, 3, 4, 5}
	y := []float64{5, 4, 3, 2, 1, 0}

	// []bool indicating where x < y (less than)
	m := vek.Lt(x, y)
	fmt.Println(m)            // [true true true false false false]
	fmt.Println(vek.Count(m)) // 3

	// []bool indicating where x >= 2 (greater than or equal)
	m = vek.GteNumber(x, 2)
	fmt.Println(m)          // [false false true true true true]
	fmt.Println(vek.Any(m)) // true

	// Selection of non-zero elements less than y
	z := vek.Select(x,
		vek.And(
			vek.Lt(x, y),
			vek.NeqNumber(x, 0),
		),
	)
	fmt.Println(z) // [1 2]
}

Creating and Converting Vectors

vek has a number of functions to construct new vectors and convert between vector types efficiently:

package main

import (
	"fmt"
	"github.com/viterin/vek"
	"github.com/viterin/vek/vek32"
)

func main() {
	// Vector with number repeated n times
	x := vek.Repeat(2, 5)
	fmt.Println(x) // [2 2 2 2 2]

	// Vector ranging from a to b (excl.) in steps of 1
	x = vek.Range(-2, 3)
	fmt.Println(x) // [-2 -1 0 1 2]

	// Conversion from float64 to int32
	xi32 := vek.ToInt32(x)
	fmt.Println(xi32) // [-2 -1 0 1 2]

	// Conversion from int32 to float32
	x32 := vek32.FromInt32(xi32)
	fmt.Println(x32) // [-2 -1 0 1 2]
}

Avoiding Allocations

By default, functions allocate a new slice to store the result. Append _Inplace to a function name to perform the operation in place, overwriting the data of the first argument slice with the result. Append _Into to write the result into a target slice.

package main

import (
	"fmt"
	"github.com/viterin/vek"
)

func main() {
	x := []float64{0, 1, 2, 3, 4}
	vek.AddNumber_Inplace(x, 2)

	y := make([]float64, len(x))
	vek.AddNumber_Into(y, x, 2)

	fmt.Println(x, y) // [2 3 4 5 6] [4 5 6 7 8]
}

SIMD Acceleration

SIMD acceleration is enabled by default on supported platforms, which means any x86/amd64 CPU with the AVX2 and FMA extensions. Use vek.Info() to see if hardware acceleration is enabled. Turn it off or on with vek.SetAcceleration(). Acceleration is currently disabled by default on macOS as I have no machine to test it on.

package main

import (
	"fmt"
	"github.com/viterin/vek"
)

func main() {
	fmt.Printf("%+v", vek.Info())
	// {CPUArchitecture:amd64 CPUFeatures:[AVX2 FMA ..] Acceleration:true}
}

API

| Function | Description |
|---|---|
| Arithmetic | |
| vek.Add(x, y) | element-wise addition |
| vek.AddNumber(x, a) | add number to each element |
| vek.Sub(x, y) | element-wise subtraction |
| vek.SubNumber(x, a) | subtract number from each element |
| vek.Mul(x, y) | element-wise multiplication |
| vek.MulNumber(x, a) | multiply each element by number |
| vek.Div(x, y) | element-wise division |
| vek.DivNumber(x, a) | divide each element by number |
| vek.Abs(x) | absolute values |
| vek.Neg(x) | additive inverses |
| vek.Inv(x) | multiplicative inverses |
| Aggregates | |
| vek.Sum(x) | sum of elements |
| vek.CumSum(x) | cumulative sum |
| vek.Prod(x) | product of elements |
| vek.CumProd(x) | cumulative product |
| vek.Mean(x) | mean |
| vek.Median(x) | median |
| vek.Quantile(x, q) | q-th quantile, 0 <= q <= 1 |
| Distance | |
| vek.Dot(x, y) | dot product |
| vek.Norm(x) | euclidean norm (length) |
| vek.Distance(x, y) | euclidean distance |
| vek.ManhattanNorm(x) | sum of absolute values |
| vek.ManhattanDistance(x, y) | sum of absolute differences |
| vek.CosineSimilarity(x, y) | cosine similarity |
| Matrices | |
| vek.MatMul(x, y, n) | multiply m-by-n and n-by-p matrix (row-major) |
| vek.Mat4Mul(x, y) | specialization for 4 by 4 matrices |
| Special | |
| vek.Sqrt(x) | square root of each element |
| vek.Pow(x, y) | element-wise power |
| vek.Round(x), Floor(x), Ceil(x) | round to nearest, lesser or greater integer |
| Special (32-bit only) | |
| vek32.Sin(x) | sine of each element |
| vek32.Cos(x) | cosine of each element |
| vek32.Exp(x) | exponential function |
| vek32.Log(x), Log2(x), Log10(x) | natural, base 2 and base 10 logarithms |
| Comparison | |
| vek.Min(x) | minimum value |
| vek.ArgMin(x) | first index of the minimum value |
| vek.Minimum(x, y) | element-wise minimum values |
| vek.MinimumNumber(x, a) | minimum of each element and number |
| vek.Max(x) | maximum value |
| vek.ArgMax(x) | first index of the maximum value |
| vek.Maximum(x, y) | element-wise maximum values |
| vek.MaximumNumber(x, a) | maximum of each element and number |
| vek.Find(x, a) | first index of number, -1 if not found |
| vek.Lt(x, y) | element-wise less than |
| vek.LtNumber(x, a) | less than number |
| vek.Lte(x, y) | element-wise less than or equal |
| vek.LteNumber(x, a) | less than or equal to number |
| vek.Gt(x, y) | element-wise greater than |
| vek.GtNumber(x, a) | greater than number |
| vek.Gte(x, y) | element-wise greater than or equal |
| vek.GteNumber(x, a) | greater than or equal to number |
| vek.Eq(x, y) | element-wise equality |
| vek.EqNumber(x, a) | equal to number |
| vek.Neq(x, y) | element-wise non-equality |
| vek.NeqNumber(x, a) | not equal to number |
| Boolean | |
| vek.Not(x) | element-wise not |
| vek.And(x, y) | element-wise and |
| vek.Or(x, y) | element-wise or |
| vek.Xor(x, y) | element-wise exclusive or |
| vek.Select(x, y) | select elements using boolean vector |
| vek.All(x) | all bools are true |
| vek.Any(x) | at least one bool is true |
| vek.None(x) | none of the bools are true |
| vek.Count(x) | number of true bools |
| Construction | |
| vek.Zeros(n) | vector of zeros |
| vek.Ones(n) | vector of ones |
| vek.Repeat(a, n) | vector with number repeated |
| vek.Range(a, b) | vector from a to b (excl.) in steps of 1 |
| vek.Gather(x, idx) | select elements at given indices |
| vek.Scatter(x, idx, size) | create vector with indices set to values |
| vek.FromBool(x), FromInt64, .. | convert slice to floats |
| vek.ToBool(x), ToInt64, .. | convert floats to other type |
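
The MatMul entry above takes flat slices rather than nested ones, so it may help to see what "row-major" means for the arguments. The sketch below is a plain Go reference loop, not vek's implementation. The name matMulRef is hypothetical, and it assumes that n is the shared inner dimension, as the description "m-by-n and n-by-p" suggests, with m and p derived from the slice lengths.

```go
package main

import "fmt"

// matMulRef is a hypothetical pure Go reference, not vek's implementation.
// x holds an m-by-n matrix and y an n-by-p matrix, both flattened row-major,
// so element (i, k) of x lives at x[i*n+k]. The sizes m and p are derived
// from the slice lengths and the shared dimension n (an assumption).
func matMulRef(x, y []float64, n int) []float64 {
	m := len(x) / n
	p := len(y) / n
	dst := make([]float64, m*p)
	for i := 0; i < m; i++ {
		for k := 0; k < n; k++ {
			a := x[i*n+k]
			for j := 0; j < p; j++ {
				dst[i*p+j] += a * y[k*p+j]
			}
		}
	}
	return dst
}

func main() {
	// 2-by-3 matrix: [[1 2 3], [4 5 6]]
	x := []float64{1, 2, 3, 4, 5, 6}
	// 3-by-2 matrix: [[7 8], [9 10], [11 12]]
	y := []float64{7, 8, 9, 10, 11, 12}

	// The result is the 2-by-2 matrix [[58 64], [139 154]], again row-major.
	fmt.Println(matMulRef(x, y, 3)) // [58 64 139 154]
}
```

A loop like this is also handy as a baseline when checking results from the accelerated version, keeping in mind the small precision differences described in the Notes section.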

API Variations

vek32.xxx( .. )

The vek32 package contains identical functions for float32 vectors, e.g. vek32.Add(x, y).

vek.xxx_Inplace( .. )

Append _Inplace to the function name to mutate the argument vector inplace, e.g. vek.Add_Inplace(x, y). The first argument is the destination. It should not overlap other argument slices.

vek.xxx_Into( dst, .. )

Append _Into to the function name to write the result into a target slice, e.g. vek.Add_Into(dst, x, y). The destination should have sufficient capacity to hold the result; its length can be anything. It should not overlap other argument slices. The return value is the destination slice resized to the length of the result.
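
To make the capacity rule concrete, here is a minimal pure Go stand-in that mimics the _Into contract described above. The function addInto is hypothetical and is not vek's code; in particular, panicking on a too-small destination is an assumption made for this sketch.

```go
package main

import "fmt"

// addInto is a hypothetical stand-in that mimics the documented _Into
// contract. It is not vek's implementation.
func addInto(dst, x, y []float64) []float64 {
	if len(x) != len(y) {
		panic("slices must be of equal length")
	}
	// Only the capacity of dst matters; its current length is ignored.
	// Panicking here is an assumption of this sketch.
	if cap(dst) < len(x) {
		panic("destination slice not large enough to hold result")
	}
	dst = dst[:len(x)] // resize to the result length, reusing the backing array
	for i := range x {
		dst[i] = x[i] + y[i]
	}
	return dst
}

func main() {
	x := []float64{1, 2, 3}
	y := []float64{10, 20, 30}

	// Length 0 but capacity 8: enough room, so nothing new is allocated.
	buf := make([]float64, 0, 8)
	z := addInto(buf, x, y)

	fmt.Println(z, len(z), cap(z)) // [11 22 33] 3 8
}
```

Because the returned slice shares its backing array with the destination, one buffer can be reused across many calls as long as the caller keeps using the returned slice rather than the original header.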

Notes

For maximum performance, most functions in this library were compiled from C++ using -ffast-math optimizations. This trades strict IEEE 754 floating-point compliance for speed, but assumes the floating point inputs are never NaN or Inf.

The behavior of these functions is undefined if you do generate NaN or Inf values. Furthermore, there can be minor differences in precision compared to standard Go math as a result of these optimizations.
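
If the data comes from an untrusted source, one option is to validate it before handing it to the accelerated functions. The helper below is a hypothetical sketch using only the standard library; allFinite is not part of vek. It checks inputs only, so it cannot catch NaN or Inf values that an operation itself produces, for example through overflow or division by zero.

```go
package main

import (
	"fmt"
	"math"
)

// allFinite is a hypothetical helper (not part of vek) that reports
// whether every element is neither NaN nor +/-Inf.
func allFinite(x []float64) bool {
	for _, v := range x {
		if math.IsNaN(v) || math.IsInf(v, 0) {
			return false
		}
	}
	return true
}

func main() {
	ok := []float64{0, 1.5, -2}
	bad := []float64{1, math.NaN(), math.Inf(1)}

	fmt.Println(allFinite(ok))  // true
	fmt.Println(allFinite(bad)) // false
}
```

The check is a full pass over the slice, so it is probably best kept at the boundary where data enters the program rather than in front of every call.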

Benchmarks

Comparison of the SIMD accelerated functions to the pure Go fallback version for slices of different sizes. Times are in nanoseconds. The in-place variants of the functions are benchmarked.

go test -benchmem -timeout 0 -run=^# -bench=. ./internal/...

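| Function | 1k, Go | 1k, SIMD | 100k, Go | 100k, SIMD | speedup |
|---|---|---|---|---|---|
| vek.Add | 484 | 192 | 57,544 | 26,431 | 2x |
| vek32.Add | 610 | 116 | 84,870 | 13,164 | 6x |
| vek.Mul | 499 | 186 | 58,154 | 26,955 | 2x |
| vek32.Mul | 607 | 126 | 83,486 | 13,056 | 6x |
| vek.Abs | 794 | 123 | 120,018 | 19,680 | 6x |
| vek32.Abs | 736 | 82 | 113,446 | 7,990 | 14x |
| vek.Sum | 633 | 39 | 64,824 | 6,859 | 9x |
| vek32.Sum | 631 | 20 | 65,007 | 3,191 | 20x |
| vek.Quantile | 3,375 | 3,075 | 860,382 | 816,831 | 1x |
| vek32.Quantile | 3,367 | 3,040 | 751,790 | 698,111 | 1x |
| vek.Round | 1,485 | 161 | 250,316 | 21,622 | 11x |
| vek32.Round | 1,812 | 102 | 250,035 | 9,722 | 25x |
| vek.Sqrt | 1,900 | 614 | 326,998 | 85,986 | 4x |
| vek32.Sqrt | 1,704 | 148 | 247,944 | 15,571 | 15x |
| vek.Pow | 39,833 | 6,137 | 4,155,465 | 776,556 | 5x |
| vek32.Pow | 30,386 | 2,091 | 4,070,793 | 292,980 | 14x |
| vek32.Exp | 7,177 | 375 | 1,120,300 | 49,694 | 22x |
| vek32.Log | 4,663 | 453 | 1,017,240 | 65,042 | 16x |
| vek.Max | 734 | 62 | 43,412 | 7,568 | 6x |
| vek32.Max | 731 | 27 | 44,349 | 3,484 | 13x |
| vek.Maximum | 1,000 | 517 | 542,944 | 66,423 | 8x |
| vek32.Maximum | 873 | 499 | 556,103 | 66,786 | 8x |
| vek.Find | 294 | 77 | 21,989 | 7,256 | 3x |
| vek32.Find | 223 | 35 | 21,813 | 3,010 | 7x |
| vek.Lt | 543 | 195 | 64,136 | 23,548 | 3x |
| vek32.Lt | 539 | 130 | 62,449 | 13,188 | 5x |
| vek.And | 1,172 | 60 | 373,077 | 2,683 | 139x |
| vek.All | 237 | 11 | 21,696 | 738 | 29x |
| vek.Range | 647 | 59 | 65,403 | 7,889 | 8x |
| vek32.Range | 633 | 32 | 65,155 | 3,252 | 20x |
| vek.FromInt32 | 335 | 56 | 33,410 | 11,428 | 3x |
| vek32.FromInt32 | 439 | 29 | 44,372 | 7,423 | 6x |

| Function | m=1k,n=1k,p=1, Go | m=1k,n=1k,p=1, SIMD | p=1k, Go | p=1k, SIMD | speedup |
|---|---|---|---|---|---|
| vek.MatMul | 258,418 | 38,835 | 152,726,512 | 20,823,962 | 7x |
| vek32.MatMul | 256,453 | 28,403 | 147,474,083 | 10,479,834 | 14x |

| Function | m=4,n=4,p=4, Go | m=4,n=4,p=4, SIMD | speedup |
|---|---|---|---|
| vek.Mat4Mul | 26 | 5 | 5x |
| vek32.Mat4Mul | 26 | 5 | 5x |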