Schema Configuration Guide

February 5, 2026

Gemini can either generate a random schema automatically or use a custom schema defined in a JSON file. This guide covers both approaches.

Important Limitations

UPDATE Statements Temporarily Disabled

UPDATE statements are currently disabled in Gemini. When the statement ratio includes updates, they are internally converted to INSERT statements. Full UPDATE support will return in v2.1.0 once Gemini v2 is fully stable.

To work around this, focus on INSERT and DELETE ratios:

--statement-ratios='{"mutation":{"insert":0.95,"update":0.0,"delete":0.05}}'

Automatic Schema Generation

By default, Gemini generates a random schema based on CLI parameters:

./gemini \
  --max-tables=2 \
  --max-partition-keys=4 \
  --min-partition-keys=1 \
  --max-clustering-keys=3 \
  --min-clustering-keys=0 \
  --max-columns=10 \
  --min-columns=3 \
  --test-cluster=... \
  --oracle-cluster=...

Schema Generation Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| --max-tables | 1 | Maximum number of tables to generate |
| --max-partition-keys | 8 | Maximum partition key columns |
| --min-partition-keys | 1 | Minimum partition key columns |
| --max-clustering-keys | 5 | Maximum clustering key columns |
| --min-clustering-keys | 0 | Minimum clustering key columns |
| --max-columns | 12 | Maximum regular columns |
| --min-columns | 5 | Minimum regular columns |
| --dataset-size | large | Size preset: small or large |
| --cql-features | normal | Feature set: basic, normal, or all |

CQL Feature Levels

  • basic - Simple types only (int, text, boolean, etc.)
  • normal - Adds collections (list, set, map) and tuples
  • all - Adds UDTs (User-Defined Types) and complex nested types

Reproducible Schemas

Use --schema-seed to generate the same schema across runs:

# First run - note the schema seed from output
./gemini --test-cluster=... --oracle-cluster=...

# Later - reproduce exact schema
./gemini --schema-seed=12345 --test-cluster=... --oracle-cluster=...
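
Reproducibility of this kind comes from seeding the random source: the same seed drives the same sequence of choices. The following is a minimal Go sketch of that idea, not Gemini's actual generator. The `shape` type and the `pickCounts` and `between` helpers are hypothetical names, and the bounds are the defaults from the parameter table above.

```go
package main

import (
	"fmt"
	"math/rand"
)

// shape holds the randomly chosen key and column counts for one table.
// Hypothetical type for illustration; not part of Gemini.
type shape struct {
	PartitionKeys, ClusteringKeys, Columns int
}

// between returns a random integer in [lo, hi] drawn from r.
func between(r *rand.Rand, lo, hi int) int {
	return lo + r.Intn(hi-lo+1)
}

// pickCounts derives table dimensions from a seed, using the default
// min/max bounds listed in the parameter table.
func pickCounts(seed int64) shape {
	r := rand.New(rand.NewSource(seed))
	return shape{
		PartitionKeys:  between(r, 1, 8),
		ClusteringKeys: between(r, 0, 5),
		Columns:        between(r, 5, 12),
	}
}

func main() {
	a, b := pickCounts(12345), pickCounts(12345)
	fmt.Println(a == b) // true: same seed, same shape
}
```

A different seed may or may not produce a different shape; only equality for equal seeds is guaranteed.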

Custom Schema File

For precise control, provide a JSON schema file with --schema:

./gemini --schema=my_schema.json --test-cluster=... --oracle-cluster=...

Schema JSON Structure

{
  "keyspace": {
    "name": "my_keyspace",
    "replication": {
      "class": "NetworkTopologyStrategy",
      "replication_factor": 3
    },
    "oracle_replication": {
      "class": "NetworkTopologyStrategy",
      "replication_factor": 1
    }
  },
  "tables": [
    {
      "name": "users",
      "partition_keys": [
        {"name": "user_id", "type": "uuid"}
      ],
      "clustering_keys": [
        {"name": "created_at", "type": "timestamp"}
      ],
      "columns": [
        {"name": "name", "type": "text"},
        {"name": "age", "type": "int"}
      ]
    }
  ]
}
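
To check a schema file before handing it to Gemini, the structure above can be mirrored in Go and decoded with `encoding/json`. This is a sketch under assumptions: the struct names and the `parseSchema` helper are hypothetical, and only the JSON keys are taken from the example. The `type` field is kept as raw JSON because it is a string for simple types and an object for complex ones.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// column mirrors one entry of partition_keys, clustering_keys, or columns.
type column struct {
	Name string          `json:"name"`
	Type json.RawMessage `json:"type"` // string for simple types, object for complex ones
}

// table mirrors one entry of the "tables" array.
type table struct {
	Name           string   `json:"name"`
	PartitionKeys  []column `json:"partition_keys"`
	ClusteringKeys []column `json:"clustering_keys"`
	Columns        []column `json:"columns"`
}

// schema mirrors the top-level document.
type schema struct {
	Keyspace struct {
		Name              string         `json:"name"`
		Replication       map[string]any `json:"replication"`
		OracleReplication map[string]any `json:"oracle_replication"`
	} `json:"keyspace"`
	Tables []table `json:"tables"`
}

// parseSchema decodes a schema document and rejects tables that have no
// partition key, since every CQL table needs at least one.
func parseSchema(data []byte) (schema, error) {
	var s schema
	if err := json.Unmarshal(data, &s); err != nil {
		return s, err
	}
	for _, t := range s.Tables {
		if len(t.PartitionKeys) == 0 {
			return s, fmt.Errorf("table %q has no partition keys", t.Name)
		}
	}
	return s, nil
}

func main() {
	doc := []byte(`{
	  "keyspace": {"name": "my_keyspace",
	    "replication": {"class": "NetworkTopologyStrategy", "replication_factor": 3}},
	  "tables": [{"name": "users",
	    "partition_keys": [{"name": "user_id", "type": "uuid"}]}]
	}`)
	s, err := parseSchema(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(s.Keyspace.Name, s.Tables[0].Name, len(s.Tables[0].PartitionKeys))
}
```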

Column Definition Examples

Simple types:

{"name": "user_id", "type": "int"}

Set collection:

{
  "name": "tags",
  "type": {
    "complex_type": "set",
    "value_type": "text",
    "frozen": false
  }
}

Map type:

{
  "name": "metadata",
  "type": {
    "complex_type": "map",
    "key_type": "text",
    "value_type": "int",
    "frozen": false
  }
}

List type:

{
  "name": "scores",
  "type": {
    "complex_type": "list",
    "value_type": "double",
    "frozen": true
  }
}

Tuple type:

{
  "name": "coordinates",
  "type": {
    "complex_type": "tuple",
    "value_types": ["double", "double"],
    "frozen": false
  }
}

UDT (User-Defined Type):

{
  "name": "address",
  "type": {
    "complex_type": "udt",
    "type_name": "address_type",
    "frozen": true,
    "value_types": {
      "street": "text",
      "city": "text",
      "zip": "int"
    }
  }
}

Supported Data Types

Gemini supports CQL data types that are compatible with the Go driver (gocql). The following tables describe what's supported and what isn't.

Simple Types (Fully Supported)

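| Type | Description | Partition Key | Clustering Key | Map Key |
| --- | --- | --- | --- | --- |
| ascii | ASCII string | Yes | Yes | Yes |
| bigint | 64-bit signed integer | Yes | Yes | Yes |
| boolean | True/false | Yes | Yes | Yes |
| date | Date without time | Yes | Yes | Yes |
| double | 64-bit floating point | Yes | Yes | Yes |
| float | 32-bit floating point | Yes | Yes | Yes |
| inet | IP address | Yes | Yes | Yes |
| int | 32-bit signed integer | Yes | Yes | Yes |
| smallint | 16-bit signed integer | Yes | Yes | Yes |
| text | UTF-8 string | Yes | Yes | Yes |
| time | Time without date | Yes | Yes | Yes |
| timestamp | Date and time | Yes | Yes | Yes |
| timeuuid | Type 1 UUID (time-based) | Yes | Yes | Yes |
| tinyint | 8-bit signed integer | Yes | Yes | Yes |
| uuid | UUID | Yes | Yes | Yes |
| varchar | UTF-8 string (alias for text) | Yes | Yes | Yes |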

Types with Restrictions

| Type | Description | Partition Key | Clustering Key | Map Key | Reason |
| --- | --- | --- | --- | --- | --- |
| blob | Binary data | | | No | Go maps cannot use byte slices as keys (slices are not comparable) |
| decimal | Variable-precision decimal | | | No | Uses the *inf.Dec pointer type; Go compares pointers by address, not by value |
| duration | Time duration | No | No | No | CQL restriction: duration cannot be used in primary keys or as map keys |
| varint | Arbitrary-precision integer | | | No | Uses the *big.Int pointer type; Go compares pointers by address, not by value |

Complex Types (Fully Supported)

| Type | Description | As Column | As Partition Key |
| --- | --- | --- | --- |
| list<T> | Ordered collection | Yes | Only if frozen |
| set<T> | Unique unordered collection | Yes | Only if frozen |
| map<K,V> | Key-value pairs | Yes | Only if frozen |
| tuple<T1,T2,...> | Fixed-length typed sequence | Yes | Only if frozen |
| udt | User-defined type | Yes | Only if frozen |
| counter | Distributed counter | Yes | No |

Unsupported ScyllaDB Types

The following ScyllaDB 2025.1 types are NOT supported by Gemini:

| Type | Reason |
| --- | --- |
| vector<T, N> | Vector type for ML/AI workloads - not implemented in gocql driver |
| frozen<T> (standalone) | Frozen is a modifier, not a standalone type |

Why Certain Types Can't Be Map Keys

Gemini uses Go maps internally to track partition keys and compare results between oracle and test clusters. Go maps require keys to be "comparable" types. The following types cannot be used as map keys:

  1. blob - Represented as []byte (byte slice) in Go. Go does not define equality for slices, so a slice type cannot be used as a map key at all.

  2. decimal - Represented as *inf.Dec (pointer to Decimal). Pointers compare by address, not value, making them unsuitable for map keys.

  3. varint - Represented as *big.Int (pointer to arbitrary-precision integer). Same pointer comparison issue as decimal.

  4. duration - CQL itself prohibits duration in primary keys and map keys due to its complex internal representation (months, days, nanoseconds).

Primary Key Restrictions

Partition Keys

Gemini can use these types for partition keys:

  • All simple types except duration
  • Frozen complex types (frozen list, frozen set, frozen map, frozen tuple, frozen UDT)

Clustering Keys

Gemini can use these types for clustering keys:

  • All simple types except duration
  • blob is allowed (unlike partition keys in some configurations)

Why These Restrictions Exist

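  1. Go Language Limitations: Go's map type requires comparable keys. Types backed by slices ([]byte) are not comparable at all, and types backed by pointers (*big.Int, *inf.Dec) compare by address rather than by value, so two equal values would be treated as different keys.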

  2. CQL Restrictions: Some restrictions come from CQL itself - duration cannot be part of a primary key because it lacks a natural total ordering.

  3. Driver Limitations: The gocql driver maps CQL types to Go types, inheriting Go's type system constraints.

Replication Strategies

NetworkTopologyStrategy

For multiple datacenters:

{
  "class": "NetworkTopologyStrategy",
  "datacenter1": 3,
  "datacenter2": 2
}

CLI Shorthand

# NetworkTopologyStrategy
--replication-strategy=network

# Custom (JSON inline)
--replication-strategy="{'class':'NetworkTopologyStrategy','dc1':3, 'replication_factor': 3}"
--oracle-replication-strategy="{'class': 'NetworkTopologyStrategy', 'dc1': 1, 'replication_factor': 1}"
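
When the inline value is produced by a script, quoting mistakes are easy to make. The Go sketch below builds the single-quoted form used in the examples above from a map of datacenter names to replication factors. `replicationFlag` is a hypothetical helper, and the datacenter names are placeholders.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// replicationFlag renders a replication map in the single-quoted form used
// by the inline CLI examples. The "class" entry is written first and the
// remaining keys are sorted so the output is deterministic.
func replicationFlag(class string, factors map[string]int) string {
	keys := make([]string, 0, len(factors))
	for k := range factors {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	parts := []string{fmt.Sprintf("'class': '%s'", class)}
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("'%s': %d", k, factors[k]))
	}
	return "{" + strings.Join(parts, ", ") + "}"
}

func main() {
	v := replicationFlag("NetworkTopologyStrategy", map[string]int{"dc1": 3, "dc2": 2})
	fmt.Printf("--replication-strategy=\"%s\"\n", v)
}
```

Sorting the keys keeps the generated flag stable across runs, which makes command lines easier to diff in logs.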

Complete Example Schema

{
  "keyspace": {
    "name": "ecommerce",
    "replication": {
      "class": "NetworkTopologyStrategy",
      "datacenter1": 3
    },
    "oracle_replication": {
      "class": "SimpleStrategy",
      "replication_factor": 1
    }
  },
  "tables": [
    {
      "name": "orders",
      "partition_keys": [
        {"name": "customer_id", "type": "uuid"}
      ],
      "clustering_keys": [
        {"name": "order_date", "type": "timestamp"},
        {"name": "order_id", "type": "timeuuid"}
      ],
      "columns": [
        {"name": "total", "type": "decimal"},
        {"name": "status", "type": "text"},
        {
          "name": "items",
          "type": {
            "complex_type": "list",
            "value_type": "text",
            "frozen": false
          }
        }
      ]
    }
  ]
}