karpenter-oci

September 15, 2025

Description

karpenter-oci is the Oracle Cloud implementation of Karpenter and depends on Karpenter core. It supports OKE clusters and self-managed clusters on Oracle Cloud. If you are interested in contributing, you can find the project at karpenter-oci.

Arch Overview

Arch

Feature

  1. Automatically scale up node capacity when available resources are insufficient
  2. Decommission idle nodes when no workload is present
  3. Support multiple authentication methods: resource principal, instance principal, API key, session
  4. Image is configurable
  5. Subnet is configurable
  6. Support configuration of zero or more security groups
  7. Support VM and Bare Metal
  8. Support attaching additional disks
  9. Support specifying the kubelet configuration

Installation

prepare

  • create a compartment; karpenter-oci will launch instances in this compartment
  • create an OKE cluster in the created compartment
  • create a policy in the Oracle console for the Karpenter service account; the name could be something like karpenter-oke-policy, with the statements below:
Allow any-user to manage instance-family in tenancy where all {request.principal.type = 'workload',request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
Allow any-user to manage instances in tenancy where all {request.principal.type = 'workload',request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
Allow any-user to read instance-images in tenancy where all {request.principal.type = 'workload',request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
Allow any-user to read app-catalog-listing in tenancy where all {request.principal.type = 'workload',request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
Allow any-user to manage volume-family in tenancy where all {request.principal.type = 'workload',request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
Allow any-user to manage volume-attachments in tenancy where all {request.principal.type = 'workload',request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
Allow any-user to use volumes in tenancy where all {request.principal.type = 'workload',request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
Allow any-user to use virtual-network-family in tenancy where all {request.principal.type = 'workload',request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
Allow any-user to inspect vcns in tenancy where all {request.principal.type = 'workload',request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
Allow any-user to use subnets in tenancy where all {request.principal.type = 'workload',request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
Allow any-user to use network-security-groups in tenancy where all {request.principal.type = 'workload',request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
Allow any-user to use vnics in tenancy where all {request.principal.type = 'workload',request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
Allow any-user to use tag-namespaces in tenancy where all {request.principal.type = 'workload',request.principal.namespace = 'karpenter',request.principal.service_account = 'karpenter'}
Allow dynamic-group <dynamic-group-name> to {CLUSTER_JOIN} in compartment <compartment-name>
  • create a tag namespace inside the created compartment; the namespace name could be something like oke-karpenter-ns. The required keys are shown in the table below. If you want to attach more custom tags, you can also add them to the namespace.
| key | description |
| --- | --- |
| karpenter_k8s_oracle/ocinodeclass | the name of the nodeclass used to create the instance |
| karpenter_sh/managed-by | the OKE cluster name |
| karpenter_sh/nodepool | the name of the nodepool used to create the instance |
| karpenter_sh/nodeclaim | the name of the nodeclaim used to create the instance |
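
The workload policy statements above differ only in their verb and resource type, so they can be generated rather than typed by hand. The following is a minimal sketch, not part of karpenter-oci: the function name `karpenter_policy_statements` is hypothetical, and it assumes you only need to vary the namespace and service account. The dynamic-group statement for `{CLUSTER_JOIN}` is not covered and still has to be added separately.

```shell
# Sketch: print the workload policy statements for a namespace and service account.
# Both arguments default to "karpenter", matching the statements listed above.
karpenter_policy_statements() {
  local ns="${1:-karpenter}" sa="${2:-karpenter}"
  local cond="{request.principal.type = 'workload',request.principal.namespace = '${ns}',request.principal.service_account = '${sa}'}"
  local grant
  # Each entry is "<verb> <resource-type>", copied from the list above.
  for grant in \
    "manage instance-family" "manage instances" "read instance-images" \
    "read app-catalog-listing" "manage volume-family" "manage volume-attachments" \
    "use volumes" "use virtual-network-family" "inspect vcns" "use subnets" \
    "use network-security-groups" "use vnics" "use tag-namespaces"; do
    printf 'Allow any-user to %s in tenancy where all %s\n' "$grant" "$cond"
  done
}
```

Calling `karpenter_policy_statements` with no arguments reproduces the thirteen statements above; passing a different namespace and service account adjusts the `where` clause accordingly.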

install

Replace clusterName, clusterEndpoint, clusterDns, compartmentId, and ociResourcePrincipalRegion with your own values.

kubectl apply -f ./pkg/apis/crds/
helm upgrade --install karpenter ./charts/karpenter --namespace "karpenter" --create-namespace --set "settings.clusterName=karpenter-oci-test" --set "settings.clusterEndpoint=https://10.0.0.8:6443" --set "settings.clusterDns=10.96.5.5" --set "settings.compartmentId=ocid1.compartment.oc1..aaaaaaaa" --set "settings.ociResourcePrincipalRegion=us-ashburn-1"

Alternatively, you can install from the Helm repository:

helm repo add karpenter-oci https://zoom.github.io/karpenter-oci

If you had already added this repo earlier, run helm repo update to retrieve the latest versions of the packages. You can then run helm search repo karpenter-oci to see the charts.

To install the karpenter chart, again replace clusterName, clusterEndpoint, clusterDns, compartmentId, and ociResourcePrincipalRegion with your own values:

helm install karpenter karpenter-oci/karpenter --version 1.4.1 --namespace "karpenter" --create-namespace --set "settings.clusterName=karpenter-oci-test" --set "settings.clusterEndpoint=https://10.0.0.8:6443" --set "settings.clusterDns=10.96.5.5" --set "settings.compartmentId=ocid1.compartment.oc1..aaaaaaaa" --set "settings.ociResourcePrincipalRegion=us-ashburn-1"
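
Because the same five settings appear in every install command, it can help to keep them in environment variables and fail early when one is missing. This is only a sketch: the function `install_karpenter` and the variable names (`CLUSTER_NAME`, `CLUSTER_ENDPOINT`, `CLUSTER_DNS`, `COMPARTMENT_ID`, `OCI_REGION`, `CHART_VERSION`) are hypothetical and not defined by the chart.

```shell
# Sketch: install or upgrade the chart from environment variables.
# All variable names here are assumptions chosen for illustration.
install_karpenter() {
  # Abort with an error if any required value is unset or empty.
  : "${CLUSTER_NAME:?set CLUSTER_NAME}" "${CLUSTER_ENDPOINT:?set CLUSTER_ENDPOINT}"
  : "${CLUSTER_DNS:?set CLUSTER_DNS}" "${COMPARTMENT_ID:?set COMPARTMENT_ID}"
  : "${OCI_REGION:?set OCI_REGION}"

  helm upgrade --install karpenter karpenter-oci/karpenter \
    --version "${CHART_VERSION:-1.4.1}" \
    --namespace karpenter --create-namespace \
    --set "settings.clusterName=${CLUSTER_NAME}" \
    --set "settings.clusterEndpoint=${CLUSTER_ENDPOINT}" \
    --set "settings.clusterDns=${CLUSTER_DNS}" \
    --set "settings.compartmentId=${COMPARTMENT_ID}" \
    --set "settings.ociResourcePrincipalRegion=${OCI_REGION}"
}
```

Using `helm upgrade --install` keeps the command idempotent, so the same call works for a first install and for later changes to the settings.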

To uninstall the chart:

helm uninstall karpenter --namespace karpenter

setting details

| setting | description | default |
| --- | --- | --- |
| clusterName | cluster name | |
| clusterEndpoint | API server private endpoint | |
| clusterDns | IP address of the cluster DNS server, generally the CoreDNS IP | |
| compartmentId | the compartment id of your worker nodes | |
| ociResourcePrincipalRegion | the region your cluster belongs to, refer to the issue | |
| ociAuthMethods | API_KEY, OKE, SESSION, INSTANCE_PRINCIPAL | OKE |
| flexCpuConstrainList | constrains the OCPU cores of flex instances; instances are created only with CPU sizes from this list (1 OCPU corresponds to 2 vCPUs) | "1,2,4,8,16,32,48,64,96,128" |
| flexCpuMemRatios | the ratios of vCPU to memory, e.g. with FLEX_CPU_MEM_RATIOS=2,4 a flex instance with 2 cores (1 OCPU) gets 4Gi or 8Gi of memory | "2,4,8" |
| tagNamespace | the tag namespace karpenter-oci uses to create and list instances; karpenter-oci attaches the nodepool and nodeclass tags to each instance | oke-karpenter-ns |
| vmMemoryOverheadPercent | the VM memory overhead, as a fraction, that is subtracted from the total memory for all instance types | 0.075 |
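
The arithmetic behind flexCpuMemRatios and the memory overhead setting can be checked with a few lines of shell. This is a sketch under stated assumptions, not the controller's code: it assumes memory equals the vCPU count multiplied by each ratio (as in the example in the table), and it rounds the post-overhead value to the nearest integer, which may differ from how karpenter-oci rounds. Both function names are hypothetical.

```shell
# Sketch: memory sizes (in Gi) allowed for a vCPU count and a ratio list.
# Assumes memory = vCPUs x ratio, following the flexCpuMemRatios example.
flex_memory_options() {
  local vcpus="$1" ratios="$2" out="" r
  local IFS=','               # split the ratio list on commas
  for r in $ratios; do
    out="${out:+$out }$((vcpus * r))Gi"
  done
  printf '%s\n' "$out"
}

# Sketch: memory left after subtracting the overhead fraction (default 0.075).
# Rounding to the nearest integer is an assumption made for this example.
memory_after_overhead() {
  awk -v total="$1" -v pct="${2:-0.075}" \
    'BEGIN { printf "%.0f\n", total * (1 - pct) }'
}
```

For instance, `flex_memory_options 2 "2,4"` prints `4Gi 8Gi`, matching the example in the table, and `memory_after_overhead 16000` prints `14800`.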

Usage

nodepool

A NodePool specifies the disruption strategy, the CPU and memory limits, and the scheduling requirements. The Oracle-specific requirements use the labels below:

| label | description | example |
| --- | --- | --- |
| karpenter.k8s.oracle/instance-shape-name | the shape name | VM.Standard.E4.Flex |
| karpenter.k8s.oracle/instance-cpu | the vCPU count of the instance shape; for flex shapes, karpenter-oci creates instances strictly in these vCPU sizes | 4,8 |
| karpenter.k8s.oracle/instance-memory | the memory size of the instance shape, in MB | 2048,4096 |
| karpenter.k8s.oracle/instance-gpu | the GPU card count of the instance shape | 1 |
| karpenter.k8s.oracle/is-flexible | whether the instance shape is flexible | "true" |

example

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: karpenter-test
spec:
  disruption:
    budgets:
      - nodes: 10%
    consolidateAfter: 30m0s
    consolidationPolicy: WhenEmpty
  limits:
    cpu: 64
    memory: 300Gi
  template:
    metadata:
      labels:
        servicegroup: karpenter-test
    spec:
      expireAfter: Never
      nodeClassRef:
        group: karpenter.k8s.oracle
        kind: OciNodeClass
        name: karpenter-test
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - on-demand
            # To enable preemptible instance creation for lower cost, add the line below.
            - preemptible
        - key: karpenter.k8s.oracle/instance-shape-name
          operator: In
          values:
            - VM.Standard.E4.Flex
        - key: karpenter.k8s.oracle/instance-cpu
          operator: In
          values:
            - '4'
            - '8'
            - '16'
        - key: kubernetes.io/os
          operator: In
          values:
            - linux
      terminationGracePeriod: 30m
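
After saving a NodePool like the one above, you can apply it and inspect what Karpenter creates. The helper below is a sketch: the function name `apply_and_check_nodepool` and the default file name `nodepool.yaml` are hypothetical, and it assumes `kubectl` is already pointed at the target cluster.

```shell
# Sketch: apply a NodePool manifest and list the resulting Karpenter objects.
# The manifest path is an assumption; pass your own file as the first argument.
apply_and_check_nodepool() {
  local manifest="${1:-nodepool.yaml}"
  kubectl apply -f "$manifest" || return 1
  # NodePools and NodeClaims are the Karpenter core resources.
  kubectl get nodepools.karpenter.sh
  kubectl get nodeclaims.karpenter.sh -o wide
  # Show which nodepool and shape each node belongs to as extra columns.
  kubectl get nodes -L karpenter.sh/nodepool -L karpenter.k8s.oracle/instance-shape-name
}
```

The `-L` flag adds the given labels as columns, which makes it easy to confirm that new nodes use one of the shapes allowed by the requirements.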

ocinodeclass

The OciNodeClass configures the Oracle Cloud resources for a node, such as the OS image, subnet, and security groups, as well as the kubelet configuration.

| spec | description | required | example |
| --- | --- | --- | --- |
| bootConfig.bootVolumeSizeInGBs | the size of the boot volume in GBs; minimum 50 GB, maximum 32,768 GB (32 TB) | yes | 100 |
| bootConfig.bootVolumeVpusPerGB | the number of volume performance units (VPUs) applied to this volume per GB | yes | 10 |
| imageSelector[i].compartmentId | the compartment id of the image | yes | ocid1.compartment.oc1..aaaaaaaab4u67dhgtj5gpdpp3z42xqqsdnufxkatoild46u3hb67vzojfmzq |
| imageSelector[i].name | the image name | yes | Oracle-Linux-8.10-2025.02.28-0-OKE-1.30.1-760 |
| launchOptions | LaunchOptions: options for tuning the compatibility and performance of VM shapes | no | detail |
| blockDevices | the details of the volume to create for the CreateVolume operation | no | sizeInGBs: 100, vpusPerGB: 10 |
| imageFamily | supports OracleOKELinux and Ubuntu2204; use OracleOKELinux for OKE clusters and Ubuntu2204 for self-managed clusters | yes | OracleOKELinux |
| vcnId | the VCN id of the cluster | yes | |
| subnetSelector | the name of the subnet in which to create the worker node instances | yes | oke-nodesubnet-quick-test |
| securityGroupSelector | the security groups to attach to the instance | no | |
| tags | the tags to attach to the instance | no | |
| metaData | instance metadata, used for native CNI clusters or an SSH key | no | oke-native-pod-networking: true, ssh_authorized_keys: <your_ssh_pub_key> |
| agentList | a list of OCI agents to enable | no | - Bastion |
| userData | custom user data to run in the cloud-init script; it executes before the kubelet starts | no | |
| kubelet | custom kubelet configuration | no | KubeletConfiguration |

  • if your cluster uses flannel as the CNI, you can refer to this example:
apiVersion: karpenter.k8s.oracle/v1alpha1
kind: OciNodeClass
metadata:
  name: karpenter-test
spec:
  bootConfig:
    bootVolumeSizeInGBs: 100
    bootVolumeVpusPerGB: 10
  imageSelector:
    - name: Oracle-Linux-8.10-2025.02.28-0-OKE-1.30.1-760
      compartmentId: ocid1.compartment.oc1..aaaaaaaab4u67dhgtj5gpdpp3z42xqqsdnufxkatoild46u3hb67vzojfmzq
  imageFamily: OracleOKELinux
  kubelet:
    evictionHard:
      imagefs.available: 15%
      imagefs.inodesFree: 10%
      memory.available: 750Mi
      nodefs.available: 10%
      nodefs.inodesFree: 5%
    systemReserved:
      memory: 100Mi
  subnetSelector: 
    - name: {{ .subnetName }}
  vcnId: {{ .vcnId }}
  • if your cluster uses the native CNI, set oke-native-pod-networking to "true" in metaData; you can refer to this example:
apiVersion: karpenter.k8s.oracle/v1alpha1
kind: OciNodeClass
metadata:
  name: karpenter-test
spec:
  bootConfig:
    bootVolumeSizeInGBs: 100
    bootVolumeVpusPerGB: 10
  imageSelector:
    - name: Oracle-Linux-8.10-2025.02.28-0-OKE-1.30.1-760
      compartmentId: ocid1.compartment.oc1..aaaaaaaab4u67dhgtj5gpdpp3z42xqqsdnufxkatoild46u3hb67vzojfmzq
  imageFamily: OracleOKELinux
  metaData:
    oke-native-pod-networking: "true"
  kubelet:
    evictionHard:
      imagefs.available: 15%
      imagefs.inodesFree: 10%
      memory.available: 750Mi
      nodefs.available: 10%
      nodefs.inodesFree: 5%
    systemReserved:
      memory: 100Mi
  subnetSelector: 
    - name: {{ .subnetName }}
  vcnId: {{ .vcnId }}
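
The examples above contain the placeholders `{{ .subnetName }}` and `{{ .vcnId }}`, which must be replaced with real values before the manifest is applied. One simple way to do that is with `sed`. This is a sketch only: the function name `render_nodeclass` is hypothetical, and it assumes the values contain no `|` or `&` characters, which would interfere with the substitution.

```shell
# Sketch: fill the {{ .subnetName }} and {{ .vcnId }} placeholders from stdin.
# Usage: render_nodeclass <subnet-name> <vcn-ocid> < template > manifest
render_nodeclass() {
  local subnet_name="$1" vcn_id="$2"
  sed -e "s|{{ *\.subnetName *}}|${subnet_name}|g" \
      -e "s|{{ *\.vcnId *}}|${vcn_id}|g"
}
```

With the template saved to a file of your choosing, the rendered output can be piped straight to `kubectl apply -f -`.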

Debugging

To aid debugging, add the metaData.ssh_authorized_keys and agentList parameters to your OciNodeClass.

apiVersion: karpenter.k8s.oracle/v1alpha1
kind: OciNodeClass
metadata:
  name: karpenter-arm64-class
spec:
  ... <config> ...
  agentList:
    - Bastion
  metaData:
    ssh_authorized_keys: <your_ssh_pub_key>

You will then be able to use the OCI Bastion service or SSH directly into the instances. For example, SSH in and then dump the logs from the services:

ssh opc@my.instance.ip.addr
journalctl -xefu kubelet
journalctl -xefu oke    
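
The commands above follow the logs interactively. To capture a fixed number of recent log lines without an interactive session, something like the following sketch may be useful. The function name `dump_node_logs` is hypothetical; it assumes the `opc` user and the `kubelet` and `oke` units shown above, and that your SSH key is accepted by the instance.

```shell
# Sketch: print the last N log lines of the kubelet and oke units from a node.
# Usage: dump_node_logs <instance-ip> [lines]
dump_node_logs() {
  local host="$1" lines="${2:-200}" unit
  for unit in kubelet oke; do
    echo "=== ${unit} ==="
    # --no-pager and -n make journalctl print and exit instead of following.
    ssh "opc@${host}" "journalctl --no-pager -n ${lines} -u ${unit}"
  done
}
```

Redirecting the output to a file, for example `dump_node_logs my.instance.ip.addr 500 > node.log`, gives you something to attach when raising an issue.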

Support

If you run into any problems, you are welcome to raise an issue.

Roadmap

| item | date |
| --- | --- |
| update karpenter core to v1.4 | June 2025 |

Contributing

Contributions are welcome; you can raise a PR to add new features or fix bugs. We use envtest to run the test suite, so please add related test cases to your commit.

License

http://www.apache.org/licenses/LICENSE-2.0