Datatypes Guide

May 14, 2026

Complete guide to HDF5 datatype mapping and Go type conversion




🎯 Overview

HDF5 uses its own type system that maps to native types in different programming languages. This library provides automatic conversion between HDF5 types and Go types.

Type Categories

| Category | HDF5 Class | Go Representation | Read | Write |
|---|---|---|---|---|
| Fixed-point | H5T_INTEGER | int8/16/32/64, uint8/16/32/64 | ✅ | ✅ |
| Floating-point | H5T_FLOAT | float32, float64 | ✅ | ✅ |
| String | H5T_STRING | string, []string | ✅ | ✅ |
| Compound | H5T_COMPOUND | map[string]interface{} | ✅ | ✅ |
| Array | H5T_ARRAY | [N]T (fixed arrays) | ✅ | ✅ |
| Enum | H5T_ENUM | Named integer constants | ✅ | ✅ |
| Reference | H5T_REFERENCE | uint64, [12]byte | ✅ | ✅ |
| Opaque | H5T_OPAQUE | []byte with tag | ✅ | ✅ |
| Time | H5T_TIME | - | ❌ | ❌ Deprecated |

🔢 Numeric Types

Integer Types

32-bit Signed Integer

HDF5 Types:

  • H5T_STD_I32LE (little-endian)
  • H5T_STD_I32BE (big-endian)
  • H5T_NATIVE_INT (platform-native, 32-bit)

Go Type: int32

Example:

// HDF5 file contains int32 dataset
data, err := ds.Read()  // Returns []float64

// Or inspect the original on-disk type
info, _ := ds.Info()
// info shows: "Datatype: int32"

// Value conversion: int32 → float64
// Example: 42 (int32) becomes 42.0 (float64)

Range: -2,147,483,648 to 2,147,483,647

64-bit Signed Integer

HDF5 Types:

  • H5T_STD_I64LE (little-endian)
  • H5T_STD_I64BE (big-endian)
  • H5T_NATIVE_LLONG (platform-native, 64-bit)

Go Type: int64

Example:

data, err := ds.Read()  // Returns []float64

// Value conversion: int64 → float64
// Example: 9223372036854775807 (int64) becomes 9.223372036854776e+18 (float64)

Range: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

Precision Note: float64 has a 53-bit significand, so int64 values whose magnitude exceeds 2^53 (9,007,199,254,740,992) may lose precision when converted to float64.
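The following pure-Go sketch shows where that limit bites. 2^53 + 1 has no float64 representation, so it rounds to 2^53. The inSafeRange helper is a hypothetical name used only for illustration. Values outside its range can still convert exactly (2^60, for example), but exact conversion is not guaranteed there.

```go
package main

import "fmt"

// maxSafeInt is 2^53: every integer in [-2^53, 2^53] converts to float64 exactly.
const maxSafeInt = 1 << 53

// inSafeRange reports whether v is guaranteed to survive the
// int64 → float64 conversion unchanged.
func inSafeRange(v int64) bool {
	return v >= -maxSafeInt && v <= maxSafeInt
}

func main() {
	a := int64(maxSafeInt)     // 9007199254740992
	b := int64(maxSafeInt + 1) // 9007199254740993

	// 2^53 + 1 rounds to 2^53, so the two values become indistinguishable.
	fmt.Println(float64(a) == float64(b))       // true
	fmt.Println(int64(float64(b)) == b)         // false
	fmt.Println(inSafeRange(a), inSafeRange(b)) // true false
}
```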

8-bit Signed Integer

HDF5 Types:

  • H5T_STD_I8LE (little-endian)
  • H5T_STD_I8BE (big-endian)
  • H5T_NATIVE_INT8 (platform-native)

Go Type: int8

Range: -128 to 127

16-bit Signed Integer

HDF5 Types:

  • H5T_STD_I16LE (little-endian)
  • H5T_STD_I16BE (big-endian)
  • H5T_NATIVE_INT16 (platform-native)

Go Type: int16

Range: -32,768 to 32,767

Unsigned Integers

Status: Fully supported (all widths)

HDF5 Types:

  • H5T_STD_U8LE/BE (uint8)
  • H5T_STD_U16LE/BE (uint16)
  • H5T_STD_U32LE/BE (uint32)
  • H5T_STD_U64LE/BE (uint64)

Go Conversion: Read as native unsigned integers, converted to float64 by Dataset.Read().

Precision Note: uint64 values above 2^53 (9,007,199,254,740,992) may lose precision when converted to float64.

Floating-Point Types

32-bit Float (Single Precision)

HDF5 Types:

  • H5T_IEEE_F32LE (little-endian)
  • H5T_IEEE_F32BE (big-endian)
  • H5T_NATIVE_FLOAT (platform-native)

Go Type: float32

Precision: ~7 decimal digits

Example:

data, err := ds.Read()  // Returns []float64

// Value conversion: float32 → float64
// Example: 3.14159265f (float32) becomes 3.1415927410125732 (float64)

Range: ±1.18e-38 to ±3.40e+38

64-bit Float (Double Precision)

HDF5 Types:

  • H5T_IEEE_F64LE (little-endian)
  • H5T_IEEE_F64BE (big-endian)
  • H5T_NATIVE_DOUBLE (platform-native)

Go Type: float64

Precision: ~15 decimal digits

Example:

data, err := ds.Read()  // Returns []float64 (native)

// No conversion needed
// Example: 3.141592653589793 (float64) stays exact

Range: ±2.23e-308 to ±1.80e+308

Numeric Type Conversion Summary

| HDF5 Type | Size | Go Read Type | Conversion |
|---|---|---|---|
| H5T_STD_I8LE/BE | 1 byte | float64 | int8 → float64 |
| H5T_STD_I16LE/BE | 2 bytes | float64 | int16 → float64 |
| H5T_STD_I32LE/BE | 4 bytes | float64 | int32 → float64 |
| H5T_STD_I64LE/BE | 8 bytes | float64 | int64 → float64 * |
| H5T_STD_U8LE/BE | 1 byte | float64 | uint8 → float64 |
| H5T_STD_U16LE/BE | 2 bytes | float64 | uint16 → float64 |
| H5T_STD_U32LE/BE | 4 bytes | float64 | uint32 → float64 |
| H5T_STD_U64LE/BE | 8 bytes | float64 | uint64 → float64 * |
| H5T_IEEE_F32LE/BE | 4 bytes | float64 | float32 → float64 |
| H5T_IEEE_F64LE/BE | 8 bytes | float64 | No conversion |

* int64 and uint64 values with magnitude above 2^53 may lose precision in float64.


📝 String Types

Fixed-Length Strings

HDF5 Type: H5T_STRING with fixed size

Padding Strategies:

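  1. Null-terminated (H5T_STR_NULLTERM, C-style): "hello\0\0\0". The string ends at the first NUL, which is always present.
  2. Null-padded (H5T_STR_NULLPAD): "hello\0\0\0". Unused bytes are NULs, and a string that fills the whole field has no NUL at all.
  3. Space-padded (H5T_STR_SPACEPAD, Fortran-style): "hello   "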

Go Type: string

Automatic Handling: The library automatically strips padding.

Example:

// HDF5 file has fixed-length string dataset
strings, err := ds.ReadStrings()  // Returns []string

// Padding is automatically removed:
// HDF5 bytes: "hello\0\0\0" → Go string: "hello"
// HDF5 bytes: "world   "   → Go string: "world"

Python h5py equivalent:

# Creating fixed-length strings in Python
import h5py
import numpy as np

with h5py.File('strings.h5', 'w') as f:
    # Fixed-length (20-byte) ASCII strings
    dt = h5py.string_dtype(encoding='ascii', length=20)
    f.create_dataset('names', data=[b'Alice', b'Bob'], dtype=dt)
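The sketch below shows how padding removal might work for the three padding types. It is an illustration under our own assumptions and not the library's actual implementation; the Padding type and the trimFixed helper are hypothetical. Null-terminated and null-padded fields are cut at the first NUL byte, and space-padded fields have their trailing spaces trimmed.

```go
package main

import (
	"bytes"
	"fmt"
)

// Padding mirrors HDF5's H5T_STR_NULLTERM, H5T_STR_NULLPAD and H5T_STR_SPACEPAD.
type Padding int

const (
	NullTerm Padding = iota
	NullPad
	SpacePad
)

// trimFixed is a hypothetical helper that turns one fixed-length string
// element into a Go string by removing its padding.
func trimFixed(raw []byte, pad Padding) string {
	switch pad {
	case NullTerm, NullPad:
		// Everything from the first NUL onward is terminator or padding.
		if i := bytes.IndexByte(raw, 0); i >= 0 {
			raw = raw[:i]
		}
	case SpacePad:
		raw = bytes.TrimRight(raw, " ")
	}
	return string(raw)
}

func main() {
	fmt.Printf("%q\n", trimFixed([]byte("hello\x00\x00\x00"), NullPad)) // "hello"
	fmt.Printf("%q\n", trimFixed([]byte("world   "), SpacePad))        // "world"
	fmt.Printf("%q\n", trimFixed([]byte("abc\x00xyz"), NullTerm))      // "abc"
}
```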

Variable-Length Strings

HDF5 Type: H5T_STRING with variable size

Storage: Global heap (a separate area of the HDF5 file). Each dataset element stores only the string's length and a reference to its heap object.

Go Type: string

Example:

// HDF5 file has variable-length string dataset
strings, err := ds.ReadStrings()  // Returns []string

// Strings can have different lengths:
// ["short", "a much longer string", "x"]

Python h5py equivalent:

import h5py

with h5py.File('vlen_strings.h5', 'w') as f:
    # Variable-length strings
    dt = h5py.string_dtype(encoding='utf-8')
    f.create_dataset('messages', data=["Hello", "World!"], dtype=dt)
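The sketch below shows what the global-heap storage means at the byte level. It decodes one variable-length element as we understand the HDF5 file format: a 4-byte length followed by a global heap ID, which is the collection address plus a 4-byte object index. It assumes 8-byte file offsets and little-endian fields. The VLenRef type and the parseVLenRef function are hypothetical, and the example bytes are made up. Resolving the reference against the heap itself is not shown.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// VLenRef is a hypothetical in-memory form of one variable-length element.
type VLenRef struct {
	Length      uint32 // number of bytes in the string
	HeapAddress uint64 // file address of the global heap collection
	ObjectIndex uint32 // index of the object inside that collection
}

// parseVLenRef decodes a 16-byte element, assuming 8-byte offsets
// and little-endian fields.
func parseVLenRef(raw []byte) (VLenRef, error) {
	if len(raw) < 16 {
		return VLenRef{}, errors.New("vlen element too short")
	}
	return VLenRef{
		Length:      binary.LittleEndian.Uint32(raw[0:4]),
		HeapAddress: binary.LittleEndian.Uint64(raw[4:12]),
		ObjectIndex: binary.LittleEndian.Uint32(raw[12:16]),
	}, nil
}

func main() {
	raw := []byte{
		5, 0, 0, 0, // length 5, e.g. "Hello"
		0x00, 0x08, 0, 0, 0, 0, 0, 0, // collection at address 0x800
		1, 0, 0, 0, // object index 1
	}
	ref, err := parseVLenRef(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", ref) // {Length:5 HeapAddress:2048 ObjectIndex:1}
}
```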

Character Sets

| Encoding | Status | Notes |
|---|---|---|
| ASCII | ✅ Full | Standard ASCII (0-127) |
| UTF-8 | ✅ Full | Unicode support |
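When writing strings, one reasonable rule is to use ASCII when every byte is below 128 and UTF-8 otherwise. The sketch below implements that rule with the standard unicode/utf8 package. charsetFor is a hypothetical helper and not part of the library's API. Whether the library chooses the character set this way is an assumption.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// charsetFor is a hypothetical helper that picks the narrowest character
// set able to hold s: ASCII if every byte is below 0x80, UTF-8 if s is
// valid UTF-8, and an error otherwise.
func charsetFor(s string) (string, error) {
	ascii := true
	for i := 0; i < len(s); i++ {
		if s[i] >= utf8.RuneSelf { // utf8.RuneSelf == 0x80
			ascii = false
			break
		}
	}
	switch {
	case ascii:
		return "ASCII", nil
	case utf8.ValidString(s):
		return "UTF-8", nil
	default:
		return "", fmt.Errorf("%q is neither ASCII nor valid UTF-8", s)
	}
}

func main() {
	for _, s := range []string{"Lab A", "25.3 °C", "\xff"} {
		cs, err := charsetFor(s)
		fmt.Println(cs, err)
	}
}
```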

🏗️ Compound Types

Compound types are struct-like data with named fields (similar to C structs or Go structs).

Basic Compound Type

HDF5 Type: H5T_COMPOUND

Go Type: map[string]interface{}

Example HDF5 Structure:

Compound Type:
  - "temperature" : float64
  - "humidity"    : float64
  - "location"    : string (fixed-length, 20 bytes)

Reading Compound Data:

compounds, err := ds.ReadCompound()  // Returns []map[string]interface{}

for i, record := range compounds {
    fmt.Printf("Record %d:\n", i)

    // Access fields by name
    temp := record["temperature"].(float64)
    humid := record["humidity"].(float64)
    loc := record["location"].(string)

    fmt.Printf("  Temperature: %.1f°C\n", temp)
    fmt.Printf("  Humidity: %.1f%%\n", humid)
    fmt.Printf("  Location: %s\n", loc)
}

Output:

Record 0:
  Temperature: 25.3°C
  Humidity: 65.2%
  Location: Lab A
Record 1:
  Temperature: 26.1°C
  Humidity: 63.8%
  Location: Lab B
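The loop above uses unchecked type assertions, which panic if a field is missing or has a different type. A more defensive pattern uses comma-ok assertions and returns an error instead. The sketch below shows it with a hypothetical Reading struct and toReading helper, neither of which is part of the library. It assumes the fields come back as float64 and string, as in the example.

```go
package main

import "fmt"

// Reading is a hypothetical typed view of one compound record.
type Reading struct {
	Temperature float64
	Humidity    float64
	Location    string
}

// toReading converts a record, reporting a bad or missing field as an error.
func toReading(rec map[string]interface{}) (Reading, error) {
	var r Reading
	var ok bool
	if r.Temperature, ok = rec["temperature"].(float64); !ok {
		return r, fmt.Errorf("temperature: unexpected type %T", rec["temperature"])
	}
	if r.Humidity, ok = rec["humidity"].(float64); !ok {
		return r, fmt.Errorf("humidity: unexpected type %T", rec["humidity"])
	}
	if r.Location, ok = rec["location"].(string); !ok {
		return r, fmt.Errorf("location: unexpected type %T", rec["location"])
	}
	return r, nil
}

func main() {
	rec := map[string]interface{}{"temperature": 25.3, "humidity": 65.2, "location": "Lab A"}
	r, err := toReading(rec)
	fmt.Println(r, err) // {25.3 65.2 Lab A} <nil>

	_, err = toReading(map[string]interface{}{"temperature": "hot"})
	fmt.Println(err) // temperature: unexpected type string
}
```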

Nested Compound Types

Compound types can contain other compound types:

HDF5 Structure:

Compound Type "Measurement":
  - "timestamp" : int64
  - "sensor" : Compound {
      - "id" : int32
      - "name" : string
    }
  - "value" : float64

Reading Nested Compounds:

compounds, err := ds.ReadCompound()

for _, record := range compounds {
    timestamp := record["timestamp"].(int64)
    value := record["value"].(float64)

    // Nested compound
    sensor := record["sensor"].(map[string]interface{})
    sensorID := sensor["id"].(int32)
    sensorName := sensor["name"].(string)

    fmt.Printf("Sensor %d (%s) at %d: %.2f\n",
        sensorID, sensorName, timestamp, value)
}
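For deeper nesting, a small path-lookup helper avoids chains of assertions. The sketch below is hypothetical: lookup is not a library function. It assumes only that nested compounds appear as map[string]interface{}, as shown above.

```go
package main

import "fmt"

// lookup follows a path of field names through nested compound records.
func lookup(rec map[string]interface{}, path ...string) (interface{}, error) {
	var cur interface{} = rec
	for _, key := range path {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return nil, fmt.Errorf("%s: parent is %T, not a compound", key, cur)
		}
		if cur, ok = m[key]; !ok {
			return nil, fmt.Errorf("%s: no such field", key)
		}
	}
	return cur, nil
}

func main() {
	rec := map[string]interface{}{
		"timestamp": int64(1700000000),
		"sensor": map[string]interface{}{
			"id":   int32(7),
			"name": "probe-1",
		},
		"value": 3.5,
	}
	id, err := lookup(rec, "sensor", "id")
	fmt.Println(id, err) // 7 <nil>

	_, err = lookup(rec, "value", "id")
	fmt.Println(err) // id: parent is float64, not a compound
}
```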

Compound Type with Arrays

HDF5 Structure:

Compound Type:
  - "name"   : string
  - "scores" : array of 5 × float64

Status: ✅ Fully supported (including array fields within compounds).

Creating Compounds in Python

For testing or reference:

import h5py
import numpy as np

# Define compound datatype
dt = np.dtype([
    ('temperature', 'f8'),      # float64
    ('humidity', 'f8'),         # float64
    ('location', 'S20')         # fixed-length string (20 bytes)
])

# Create data
data = np.array([
    (25.3, 65.2, b'Lab A'),
    (26.1, 63.8, b'Lab B'),
    (24.8, 67.5, b'Lab C')
], dtype=dt)

# Write to HDF5
with h5py.File('compounds.h5', 'w') as f:
    f.create_dataset('measurements', data=data)

🔄 Type Conversion Rules

Automatic Conversions

The library performs these conversions automatically:

| From (HDF5) | To (Go) | Information Loss? |
|---|---|---|
| int32 | float64 | ✅ No (exact) |
| int64 | float64 | ⚠️ Yes (beyond ±2^53) |
| float32 | float64 | ✅ No (promoted) |
| float64 | float64 | ✅ No (exact) |
| fixed string | string | ✅ No (padding removed) |
| variable string | string | ✅ No (exact) |

Precision Considerations

Integer to Float Conversion

Safe Range (no precision loss):

  • int32: all values (|v| ≤ 2^31, far below 2^53)
  • int64: -2^53 to 2^53 (±9,007,199,254,740,992)

Example of Precision Loss:

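// int64 value in HDF5: 9223372036854775807 (2^63 - 1)
// Converted to float64: 9.223372036854776e+18, which is exactly 2^63 = 9223372036854775808
// Error for this value: 1. Between 2^62 and 2^63, adjacent float64 values are
// 1024 apart, so int64 values in that range can be off by up to 512.

// For most scientific data, this is acceptable.
// If you need exact large integers, wait for v1.0.0 (direct int64 support).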

Float32 to Float64 Conversion

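Float32 values are promoted to float64 without precision loss: every float32 value is exactly representable as a float64. Only the printed representation changes, because a float64 has enough digits to show the float32's exact binary value in decimal: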

// float32 in HDF5: 3.14159265f (stored as 0x40490FDB)
// Converted to float64: 3.1415927410125732 (0x400921FB60000000)
//                                ^^^^^^^^^ beyond float32 precision, not real data

// For display, round appropriately:
fmt.Printf("%.6f\n", value)  // 3.141593 (6 digits after the decimal point)

✅ Best Practices

1. Check Dataset Type Before Reading

info, err := ds.Info()
if err == nil {
    fmt.Println(info)  // Shows datatype

    // Choose appropriate read method.
    // Name the result strs, not strings, so it doesn't shadow the strings package.
    if strings.Contains(info, "string") {
        strs, _ := ds.ReadStrings()
        // ... use strs
    } else if strings.Contains(info, "compound") {
        compounds, _ := ds.ReadCompound()
        // ... use compounds
    } else {
        data, _ := ds.Read()  // Numeric
        // ... use data
    }
}

2. Handle Type Assertions Safely

for _, attr := range attrs {
    value, err := attr.ReadValue()
    if err != nil {
        fmt.Printf("error reading %s: %v\n", attr.Name, err)
        continue
    }
    switch v := value.(type) {
    case int32:
        fmt.Printf("int32: %d\n", v)
    case int64:
        fmt.Printf("int64: %d\n", v)
    case float64:
        fmt.Printf("float64: %.6f\n", v)
    case string:
        fmt.Printf("string: %q\n", v)
    default:
        fmt.Printf("unknown type: %T\n", v)
    }
}

3. Document Precision Requirements

If your application requires exact integer values beyond 2^53:

// Check if dataset contains large integers
// ("int64" also matches "uint64", which has the same 2^53 limit)
info, _ := ds.Info()
if strings.Contains(info, "int64") {
    log.Println("Warning: 64-bit integer dataset may lose precision when converted to float64")
    log.Println("Safe range: -2^53 to 2^53 (±9,007,199,254,740,992)")
    log.Println("For exact int64 values, wait for v1.0.0")
}

4. Use Compound Types for Structured Data

Instead of separate datasets:

/measurement_temperature
/measurement_humidity
/measurement_location

Use compound types:

/measurements (compound with temperature, humidity, location fields)

Benefits:

  • Keeps related data together
  • More efficient storage
  • Easier to maintain consistency

5. Create Test Files in Pure Go

Generate test files using this library (no Python needed!):

package main

import (
    "log"
    "github.com/scigolib/hdf5"
)

func main() {
    // Create test file
    fw, err := hdf5.CreateForWrite("test_types.h5", hdf5.CreateTruncate)
    if err != nil {
        log.Fatalf("create file: %v", err)
    }
    defer fw.Close()

    // Test all supported types
    fw.CreateDataset("/int32", hdf5.Int32, []uint64{3})
    fw.CreateDataset("/int64", hdf5.Int64, []uint64{3})
    fw.CreateDataset("/float32", hdf5.Float32, []uint64{3})
    fw.CreateDataset("/float64", hdf5.Float64, []uint64{3})

    // Fixed-length strings
    fw.CreateDataset("/strings_fixed", hdf5.StringFixed(10), []uint64{2})

    // Arrays
    fw.CreateDataset("/arrays", hdf5.ArrayFloat32, []uint64{10},
        hdf5.WithArrayDims([]uint64{3, 3}))

    // Enums
    fw.CreateDataset("/status", hdf5.EnumInt8, []uint64{5},
        hdf5.WithEnumValues([]string{"OK", "ERROR"}, []int64{0, 1}))

    // Write data, verify with h5dump!
}

Pure Go workflow - Write with this library, validate with h5dump or Python h5py!




Last Updated: 2025-11-13