Adding New GPU Architecture Support

April 9, 2026

Guide for adding support for a new AMD GPU architecture to the CK Tile Dispatcher.

See also: Main Dispatcher README | Codegen README

Overview

The dispatcher uses arch_specs.json as the single source of truth for GPU specifications:

arch_specs.json -> generate_arch_specs.py -> arch_specs_generated.py (Python)
                                        -> arch_specs_generated.hpp (C++)
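
To illustrate the single-source-of-truth flow, here is a minimal sketch of how one JSON specification could be rendered into both a Python module and a C++ header. It is not the real generate_arch_specs.py: the function names (render_python, render_cpp) and the emitted identifiers (ARCH_SPECS, kWarpSize_*, kLdsCapacityKb_*) are hypothetical, and the actual generated files may look quite different.

```python
import json
import pprint

# Inline sample that follows the arch_specs.json layout described in this guide.
SPEC_JSON = """
{
  "architectures": {
    "gfx1100": {"family": "rdna3", "warp_size": 32, "lds_capacity_kb": 64}
  }
}
"""


def render_python(specs):
    """Render the specs as Python source text (hypothetical output format)."""
    body = pprint.pformat(specs["architectures"])
    return "# AUTO-GENERATED, do not edit\nARCH_SPECS = " + body + "\n"


def render_cpp(specs):
    """Render two fields per architecture as C++ constants (hypothetical output format)."""
    lines = ["// AUTO-GENERATED, do not edit", "#pragma once"]
    for name, arch in sorted(specs["architectures"].items()):
        lines.append(f"constexpr int kWarpSize_{name} = {arch['warp_size']};")
        lines.append(f"constexpr int kLdsCapacityKb_{name} = {arch['lds_capacity_kb']};")
    return "\n".join(lines) + "\n"


specs = json.loads(SPEC_JSON)
python_source = render_python(specs)
cpp_source = render_cpp(specs)
print(python_source)
print(cpp_source)
```

Because both outputs are derived from the same parsed dictionary, the Python and C++ sides cannot drift apart as long as the generator is rerun after every edit.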

Quick Start

# 1. Edit arch_specs.json
# 2. Run generator (from dispatcher/codegen)
python generate_arch_specs.py
# 3. Rebuild
cd ../build && cmake --build . -j8
# 4. Test
ctest

Step-by-Step Guide

Step 1: Edit arch_specs.json

Add the new architecture as an entry under "architectures", keyed by its gfx target name:

{
  "architectures": {
    "gfx1100": {
      "family": "rdna3",
      "description": "AMD Radeon RX 7000 series (RDNA3)",
      "warp_size": 32,
      "lds_capacity_kb": 64,
      "warp_configs": [
        [2, 4, 1],
        [4, 2, 1]
      ],
      "warp_tile_combos": {
        "fp16_fp16_fp16": [[16, 16, 16], [32, 32, 16]],
        "bf16_bf16_bf16": [[16, 16, 16], [32, 32, 16]]
      }
    }
  }
}

Step 2: Configuration Fields

| Field | Description | Example |
| --- | --- | --- |
| family | GPU family | "cdna3", "rdna4" |
| description | Human-readable name | "AMD Instinct MI300" |
| warp_size | Wave/warp size | 64 (CDNA), 32 (RDNA) |
| lds_capacity_kb | LDS memory in KB | 64 |
| warp_configs | Valid [warp_m, warp_n, warp_k] | [[2,2,1], [4,4,1]] |
| warp_tile_combos | Warp tiles per dtype | See below |
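
Before running the generator, it can help to sanity-check a new entry against the fields listed above. The following is a minimal sketch, assuming all six fields are required and that every warp config and warp tile is a list of three integers. The helper names (is_int_triple, check_arch_entry) are hypothetical and not part of the dispatcher.

```python
# Field names come from the table above; treating all six as required is an assumption.
REQUIRED_FIELDS = {
    "family": str,
    "description": str,
    "warp_size": int,
    "lds_capacity_kb": int,
    "warp_configs": list,
    "warp_tile_combos": dict,
}


def is_int_triple(value):
    """True for a list of exactly three integers, e.g. [2, 4, 1]."""
    return (
        isinstance(value, list)
        and len(value) == 3
        and all(isinstance(v, int) and not isinstance(v, bool) for v in value)
    )


def check_arch_entry(name, entry):
    """Return a list of problems found in one architecture entry (empty if none)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in entry:
            problems.append(f"{name}: missing field '{field}'")
        elif not isinstance(entry[field], expected_type):
            problems.append(f"{name}: '{field}' should be {expected_type.__name__}")

    warp_configs = entry.get("warp_configs")
    if isinstance(warp_configs, list):
        for cfg in warp_configs:
            if not is_int_triple(cfg):
                problems.append(f"{name}: bad warp config {cfg!r}")

    combos = entry.get("warp_tile_combos")
    if isinstance(combos, dict):
        for key, tiles in combos.items():
            if not isinstance(tiles, list):
                problems.append(f"{name}: tiles for '{key}' should be a list")
                continue
            for tile in tiles:
                if not is_int_triple(tile):
                    problems.append(f"{name}: bad warp tile {tile!r} for '{key}'")
    return problems


# The gfx1100 entry from Step 1.
example = {
    "family": "rdna3",
    "description": "AMD Radeon RX 7000 series (RDNA3)",
    "warp_size": 32,
    "lds_capacity_kb": 64,
    "warp_configs": [[2, 4, 1], [4, 2, 1]],
    "warp_tile_combos": {
        "fp16_fp16_fp16": [[16, 16, 16], [32, 32, 16]],
        "bf16_bf16_bf16": [[16, 16, 16], [32, 32, 16]],
    },
}
print(check_arch_entry("gfx1100", example))
```

Returning a list of messages rather than raising on the first problem makes it easier to fix several mistakes in one pass.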

Step 3: Warp Tile Combinations

Map each data type combination to its valid warp tile sizes, each given as [warp_tile_m, warp_tile_n, warp_tile_k]:

"warp_tile_combos": {
  "fp16_fp16_fp16": [[32, 32, 8], [16, 16, 16], [32, 32, 16]],
  "bf16_bf16_bf16": [[32, 32, 8], [16, 16, 16]],
  "fp8_fp8_fp16": [[32, 32, 16], [32, 32, 32]],
  "int8_int8_int32": [[16, 16, 32], [32, 32, 16]]
}

Key format: {A_dtype}_{B_dtype}_{C_dtype}
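
The key format can be exercised with a short lookup sketch. The helper names (dtype_key, is_warp_tile_listed) are hypothetical, and the real ArchFilter may perform this check differently. The data is the example mapping from this step.

```python
# Example mapping copied from this step.
WARP_TILE_COMBOS = {
    "fp16_fp16_fp16": [[32, 32, 8], [16, 16, 16], [32, 32, 16]],
    "bf16_bf16_bf16": [[32, 32, 8], [16, 16, 16]],
    "fp8_fp8_fp16": [[32, 32, 16], [32, 32, 32]],
    "int8_int8_int32": [[16, 16, 32], [32, 32, 16]],
}


def dtype_key(dtype_a, dtype_b, dtype_c):
    """Build the {A_dtype}_{B_dtype}_{C_dtype} lookup key."""
    return f"{dtype_a}_{dtype_b}_{dtype_c}"


def is_warp_tile_listed(combos, dtype_a, dtype_b, dtype_c, warp_tile):
    """True if warp_tile appears under the key for this dtype combination."""
    return list(warp_tile) in combos.get(dtype_key(dtype_a, dtype_b, dtype_c), [])


print(dtype_key("fp8", "fp8", "fp16"))
print(is_warp_tile_listed(WARP_TILE_COMBOS, "fp16", "fp16", "fp16", (32, 32, 8)))
print(is_warp_tile_listed(WARP_TILE_COMBOS, "bf16", "bf16", "bf16", (32, 32, 16)))
```

A combination whose key is absent is treated as having no valid warp tiles, so a misspelled key shows up as a rejected kernel rather than an exception.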

Step 4: Run Generator

cd dispatcher/codegen
python generate_arch_specs.py

This generates:

  • arch_specs_generated.py (Python module)
  • ../include/ck_tile/dispatcher/arch_specs_generated.hpp (C++ header)
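
A common mistake is editing arch_specs.json and forgetting to rerun the generator. The sketch below flags generated files that are missing or older than the JSON source. The stale_outputs helper is hypothetical, and comparing modification times is only a heuristic (a fresh checkout can reset timestamps). The paths are the ones listed in this guide, relative to dispatcher/codegen.

```python
from pathlib import Path


def stale_outputs(source, outputs):
    """Return the outputs that are missing or were modified before `source`."""
    source_mtime = source.stat().st_mtime
    return [
        path
        for path in outputs
        if not path.exists() or path.stat().st_mtime < source_mtime
    ]


if __name__ == "__main__":
    source = Path("arch_specs.json")
    outputs = [
        Path("arch_specs_generated.py"),
        Path("../include/ck_tile/dispatcher/arch_specs_generated.hpp"),
    ]
    if not source.exists():
        print("arch_specs.json not found; run this from dispatcher/codegen")
    else:
        for path in stale_outputs(source, outputs):
            print(f"stale or missing: {path}")
```

If the script prints nothing, both generated files are at least as new as arch_specs.json.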

Step 5: Rebuild and Test

cd ../build
cmake --build . -j8
ctest --output-on-failure

Step 6: Verify

from arch_filter import ArchFilter

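# Named gpu_filter rather than filter to avoid shadowing the Python builtin.
gpu_filter = ArchFilter("gfx1100")
is_valid = gpu_filter.is_kernel_valid(
    datatype_a="fp16", datatype_b="fp16", datatype_c="fp16",
    tile_m=128, tile_n=128, tile_k=32,
    # [2, 4, 1] is one of the warp_configs added for gfx1100 in Step 1
    warp_m=2, warp_n=4, warp_k=1,
    warp_tile_m=16, warp_tile_n=16, warp_tile_k=16
)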
print(f"Valid: {is_valid}")

Reference

Supported Data Types

| Key | Description |
| --- | --- |
| fp16 | Half precision (16-bit) |
| bf16 | Brain float 16 |
| fp32 | Single precision (32-bit) |
| fp64 | Double precision (64-bit) |
| fp8 | 8-bit float (E4M3) |
| bf8 | 8-bit brain float (E5M2) |
| int8 | 8-bit integer |
| int4 | 4-bit integer |

GPU Families

| Family | Description |
| --- | --- |
| cdna2 | MI200 series (gfx90a) |
| cdna3 | MI300 series (gfx942) |
| cdna4 | MI350 series (gfx950) |
| rdna3 | RX 7000 series (gfx1100) |
| rdna4 | RX 9000 series (gfx1201) |

Pipeline LDS Limits

| Pipeline | LDS Limit |
| --- | --- |
| compv4 | 32 KB |
| preshufflev2 | 32 KB |
| default | 64 KB |
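
As a rough illustration of how these limits constrain tile sizes, the sketch below estimates LDS usage for a tile and compares it with the pipeline limit. The usage formula is an assumption made for this example only (one A tile of tile_m x tile_k plus one B tile of tile_n x tile_k resident in LDS); the dispatcher's real accounting may differ, for instance through padding or multiple buffers. The helper names are hypothetical, and int4 is omitted because it is smaller than one byte.

```python
# Limits from the table above, in KB.
PIPELINE_LDS_LIMIT_KB = {"compv4": 32, "preshufflev2": 32}
DEFAULT_LDS_LIMIT_KB = 64

# Element sizes in bytes for the byte-sized data types listed in this guide.
BYTES_PER_ELEMENT = {
    "fp16": 2, "bf16": 2, "fp32": 4, "fp64": 8,
    "fp8": 1, "bf8": 1, "int8": 1,
}


def lds_limit_bytes(pipeline):
    """Look up the pipeline's LDS limit, falling back to the default row."""
    return PIPELINE_LDS_LIMIT_KB.get(pipeline, DEFAULT_LDS_LIMIT_KB) * 1024


def estimated_lds_bytes(tile_m, tile_n, tile_k, dtype_a, dtype_b):
    """ASSUMPTION: LDS holds one A tile (m x k) and one B tile (n x k)."""
    a_bytes = tile_m * tile_k * BYTES_PER_ELEMENT[dtype_a]
    b_bytes = tile_n * tile_k * BYTES_PER_ELEMENT[dtype_b]
    return a_bytes + b_bytes


def fits_in_lds(pipeline, tile_m, tile_n, tile_k, dtype_a, dtype_b):
    """True if the estimated usage does not exceed the pipeline limit."""
    used = estimated_lds_bytes(tile_m, tile_n, tile_k, dtype_a, dtype_b)
    return used <= lds_limit_bytes(pipeline)


print(estimated_lds_bytes(128, 128, 32, "fp16", "fp16"))
print(fits_in_lds("compv4", 128, 128, 32, "fp16", "fp16"))
print(fits_in_lds("compv4", 256, 256, 64, "fp16", "fp16"))
```

Under this assumed formula, a 128x128x32 fp16 tile needs 16 KB and fits every pipeline, while a 256x256x64 fp16 tile needs 64 KB and only fits the default limit.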

Troubleshooting

"Unknown GPU architecture"

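  1. Check that the architecture key matches exactly, including case (e.g., "gfx942" not "GFX942")
  2. Verify that you ran generate_arch_specs.py after editing arch_specs.json
  3. Rebuild the C++ code so it picks up the regenerated header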
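
The first check in the list above can be automated with a short sketch. The diagnose_arch_key helper is hypothetical and not part of the dispatcher; the known keys are the gfx targets from the GPU Families table. It distinguishes a case mismatch from a likely typo by using the standard-library difflib module.

```python
import difflib

# gfx targets from the GPU Families table in this guide.
KNOWN_ARCHS = ["gfx90a", "gfx942", "gfx950", "gfx1100", "gfx1201"]


def diagnose_arch_key(requested, known):
    """Explain why `requested` is not an exact match for any known key."""
    if requested in known:
        return "ok"
    by_lower = {key.lower(): key for key in known}
    if requested.lower() in by_lower:
        return f"'{requested}' differs only in case; use '{by_lower[requested.lower()]}'"
    close = difflib.get_close_matches(requested, list(known), n=3)
    if close:
        return f"'{requested}' is unknown; did you mean one of {close}?"
    return f"'{requested}' is unknown; add it to arch_specs.json and rerun the generator"


for key in ("gfx942", "GFX942", "gfx941", "sm_90"):
    print(diagnose_arch_key(key, KNOWN_ARCHS))
```

In practice the known keys would be read from the "architectures" object in arch_specs.json rather than hard-coded.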

Kernels being rejected

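Use validate_kernel to list the reasons a configuration is rejected:

from arch_filter import ArchFilter, KernelConfig

# `config` must be the KernelConfig instance describing the rejected kernel;
# building it is not shown here.
gpu_filter = ArchFilter("gfx942")
result = gpu_filter.validate_kernel(config)
print(f"Valid: {result.valid}")
for error in result.errors:
    print(f"  Error: {error}")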

Missing warp tile combination

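  1. Check warp_tile_combos for the target architecture in arch_specs.json
  2. Ensure [warp_tile_m, warp_tile_n, warp_tile_k] is in the list for that data type key
  3. Verify that the data type key follows the {A_dtype}_{B_dtype}_{C_dtype} format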

File Structure

codegen/
|---- arch_specs.json              # Single source of truth (EDIT THIS)
|---- generate_arch_specs.py       # Generator script
|---- arch_specs_generated.py      # Generated Python module
+---- ADDING_NEW_GPU.md            # This file

include/ck_tile/dispatcher/
|---- arch_specs_generated.hpp     # Generated C++ header
+---- arch_filter.hpp              # C++ filter

Best Practices

  1. Test thoroughly - Run all tests after adding a new GPU
  2. Start minimal - Add only validated configurations
  3. Document sources - Note where warp tile combinations came from
  4. Keep in sync - If you also use tile_engine, update it together with the dispatcher so the two stay consistent

More info: See ../README.md for full documentation.