Puppet Magic Castle

This repo contains the Puppet environment and the classes that are used to define the roles of the instances in a Magic Castle cluster.

Roles are attributed to instances based on their tags. For each tag, a list of classes to include is defined. This mechanism is explained in the magic_castle::site section.

The parameters of the classes can be customized by defining values in the hieradata. The profile:: sections list the available classes, their roles, and their parameters.

For classes with parameters, a folded default values subsection provides the default value of each parameter as it would be defined in hieradata. For some parameters, the value is displayed as ENC[PKCS7,...]. This corresponds to an encrypted random value generated by bootstrap.sh on the Puppet server initial boot. These values are stored in /etc/puppetlabs/code/environment/data/bootstrap.yaml - a file also created on Puppet server initial boot.

magic_castle::site

parameters

| Variable | Description | Type |
| --- | --- | --- |
| all | List of classes that are included by all instances | Array[String] |
| tags | Mapping of tags to classes - instances that have a tag include its classes | Hash[Array[String]] |
| enable_chaos | Shuffle class inclusion order - used for debugging purposes | Boolean |
default values
magic_castle::site::all:
  - profile::base
  - profile::consul
  - profile::users::local
  - profile::sssd::client
  - profile::mail
  - profile::prometheus::node_exporter
  - swap_file
magic_castle::site::tags:
  dtn:
    - profile::globus
    - profile::nfs::client
    - profile::freeipa::client
    - profile::rsyslog::client
  login:
    - profile::fail2ban
    - profile::cvmfs::client
    - profile::slurm::submitter
    - profile::ssh::hostbased_auth::client
    - profile::nfs::client
    - profile::freeipa::client
    - profile::rsyslog::client
  mgmt:
    - mysql::server
    - profile::freeipa::server
    - profile::prometheus::server
    - profile::prometheus::slurm_exporter
    - profile::rsyslog::server
    - profile::squid::server
    - profile::slurm::controller
    - profile::freeipa::mokey
    - profile::slurm::accounting
    - profile::accounts
  node:
    - profile::cvmfs::client
    - profile::gpu
    - profile::jupyterhub::node
    - profile::slurm::node
    - profile::ssh::hostbased_auth::client
    - profile::ssh::hostbased_auth::server
    - profile::prometheus::slurm_job_exporter
    - profile::nfs::client
    - profile::freeipa::client
    - profile::rsyslog::client
  nfs:
    - profile::nfs::server
    - profile::cvmfs::alien_cache
  proxy:
    - profile::jupyterhub::hub
    - profile::reverse_proxy
    - profile::freeipa::client
    - profile::rsyslog::client
  efa:
    - profile::efa
example 1: enabling CephFS client in a complete Magic Castle cluster
magic_castle::site::tags:
  cephfs:
    - profile::ceph::client

This requires adding the cephfs tag in main.tf to all instances that should mount the Ceph filesystem.

example 2: barebone Slurm cluster with external LDAP authentication
magic_castle::site::all:
  - profile::base
  - profile::consul
  - profile::sssd::client
  - profile::users::local
  - swap_file

magic_castle::site::tags:
  mgmt:
    - profile::slurm::controller
    - profile::nfs::server
  login:
    - profile::slurm::submitter
    - profile::nfs::client
  node:
    - profile::slurm::node
    - profile::nfs::client
    - profile::gpu

profile::accounts

This class configures two services to bridge LDAP users, Slurm accounts and users' folders in filesystems. The services are:

  • mkhome: monitors new uid entries in the slapd access logs and creates the corresponding /home and, optionally, /scratch folders.
  • mkproject: monitors new gid entries in the slapd access logs and creates the corresponding /project folders and Slurm accounts when the group name matches the project regex.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| project_regex | Regex identifying FreeIPA groups that require a corresponding Slurm account | String |
| skel_archives | Archives extracted in each FreeIPA user's home when created | Array[Struct[{filename => String[1], source => String[1]}]] |
| manage_home | When true, mkhome creates a home folder for new FreeIPA users | Boolean |
| manage_scratch | When true, mkhome creates a scratch folder for new FreeIPA users | Boolean |
| manage_project | When true, mkproject creates a project folder for new FreeIPA groups | Boolean |
default values
profile::accounts::project_regex: '(ctb|def|rpp|rrg)-[a-z0-9_-]*'
profile::accounts::skel_archives: []
profile::accounts::manage_home: true
profile::accounts::manage_scratch: true
profile::accounts::manage_project: true
example
profile::accounts::project_regex: '(slurm)-[a-z0-9_-]*'
profile::accounts::skel_archives:
  - filename: hss-programing-lab-2022.zip
    source: https://github.com/ComputeCanada/hss-programing-lab-2022/archive/refs/heads/main.zip
  - filename: hss-training-topic-modeling.tar.gz
    source: https://github.com/ComputeCanada/hss-training-topic-modeling/archive/refs/heads/main.tar.gz

optional dependencies

This class works at its full potential if these classes are also included:

profile::base

This class installs packages, creates files, and installs services that have not yet justified the creation of a class of their own but are very useful to Magic Castle cluster operations.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| version | Current version number of Magic Castle | String |
| admin_email | Email of the cluster administrator, used to send logs and report cluster-related issues | String |
| packages | List of additional OS packages that should be installed | Array[String] |
default values
profile::base::version: '13.0.0'
profile::base::admin_email: ~ #undef
profile::base::packages: []
example
profile::base::version: '13.0.0-rc.2'
profile::base::admin_email: "you@email.com"
profile::base::packages:
  - gcc-c++
  - make

dependencies

When profile::base is included, these classes are included too:

profile::base::azure

This class ensures the Microsoft Azure Linux Guest Agent is not installed, as it tends to interfere with Magic Castle configuration. The class also installs the Azure udev storage rules that would normally be provided by the Linux Guest Agent.

profile::base::etc_hosts

This class ensures that each instance declared in Magic Castle's main.tf has an entry in /etc/hosts. The IP addresses, FQDNs, and short hostnames are taken from the terraform.instances datastructure provided by /etc/puppetlabs/data/terraform_data.yaml.

profile::base::powertools

This class ensures the DNF Powertools repo is enabled when using EL8. For all other EL versions, this class does nothing.

profile::ceph::client

Ceph is a free and open-source software-defined storage platform that provides object storage, block storage, and file storage built on a common distributed cluster foundation. reference

This class installs the Ceph packages, and configures and mounts CephFS shares.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| mon_host | List of Ceph monitor hostnames | Array[String] |
| shares | List of Ceph share structures | Hash[String, CephFS] |
example
profile::ceph::client::mon_host:
  - 192.168.1.3:6789
  - 192.168.2.3:6789
  - 192.168.3.3:6789
profile::ceph::client::shares:
  home:
    # see Type CephFS below for the expected structure
  project:
    # see Type CephFS below for the expected structure
Type CephFS

Defines a CephFS share configuration used by profile::ceph::client::shares.

| Field | Description | Type | Required |
| --- | --- | --- | --- |
| share_name | Ceph share name | String | Yes |
| access_key | Ceph key for the user | String | Yes |
| export_path | CephFS export path to mount | Stdlib::Unixpath | Yes |
| bind_mounts | Optional list of bind mounts created from the mounted share | Array[BindMount] | No |
| binds_fcontext_equivalence | Optional SELinux fcontext equivalence target for bind mounts | Stdlib::Unixpath | No |
| mon_host | Optional list of monitor hosts for this mount (overrides default) | Array[String] | No |
profile::ceph::client::shares:
  home:
    share_name: "home"
    access_key: "AQB...=="
    export_path: "/volumes/home"
    bind_mounts:
      - src: "/projects"
        dst: "/srv/projects"
        type: "directory"

Type BindMount

Defines a bind mount created from a subpath of the mounted CephFS share.

| Field | Description | Type | Required |
| --- | --- | --- | --- |
| src | Source path within the mounted share | Stdlib::Unixpath | Yes |
| dst | Destination path on the host | Stdlib::Unixpath | Yes |
| type | Optional target type (file or directory) | Enum['file', 'directory'] | No |

profile::ceph::client::install

This class configures the upstream Ceph package repository and installs the Ceph client packages.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| release | Ceph release used to configure the upstream package repository when version is not set | String |
| version | Optional Ceph version used to configure the upstream package repository instead of release | Optional[String] |
default values
profile::ceph::client::install::release: reef
profile::ceph::client::install::version: ~
example
profile::ceph::client::install::release: squid
profile::ceph::client::install::version: ~

profile::consul

Consul is a service networking platform developed by HashiCorp. reference

This class installs Consul and configures the service. An instance becomes a Consul server agent if its local IP address is declared in profile::consul::servers. Otherwise, it becomes a Consul client agent.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| servers | IP addresses of the Consul servers | Array[String] |
| acl_api_token | Secret in UUID form allowing agents to interact with the Consul API | String |
default values
profile::consul::servers: "%{alias('terraform.tag_ip.puppet')}"
profile::consul::acl_api_token: ENC[PKCS7,...]
example
profile::consul::servers:
  - 10.0.1.2
  - 10.0.1.3
  - 10.0.1.4

dependencies

When profile::consul is included, these classes are included too:

profile::consul::puppet_watch

This class configures a Consul watch that restarts the Puppet agent when the event is triggered. It is mainly used by Terraform to restart all Puppet agents across the cluster when the hieradata source files uploaded by Terraform are updated.

dependencies

When profile::consul::puppet_watch is included, this class is included too:

profile::cvmfs::client

The CernVM File System (CVMFS) provides a scalable, reliable and low-maintenance software distribution service. It was developed to assist High Energy Physics (HEP) collaborations to deploy software on the worldwide-distributed computing infrastructure used to run data processing applications. CernVM-FS is implemented as a POSIX read-only file system in user space (a FUSE module). Files and directories are hosted on standard web servers and mounted in the universal namespace /cvmfs. reference

This class installs the CVMFS client and configures the repositories.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| quota_limit | Instance local cache directory soft quota (MB) | Integer |
| disable_autofs | If true, mount repositories directly instead of using autofs | Variant[Boolean, String] |
| strict_mount | If true, mount only repositories that are listed in repositories | Boolean |
| repositories | Fully qualified repository names to include when using utilities such as cvmfs_config | Array[String] |
| alien_cache_repositories | List of repositories that require an alien cache | Array[String] |
| cvmfs_root | Mount root for CVMFS repositories | String |
default values
profile::cvmfs::client::quota_limit: 4096
profile::cvmfs::client::disable_autofs: false
profile::cvmfs::client::strict_mount: false
profile::cvmfs::client::repositories:
  - software.eessi.io
  - cvmfs-config.computecanada.ca
  - soft.computecanada.ca
profile::cvmfs::client::alien_cache_repositories: [ ]
profile::cvmfs::client::cvmfs_root: "/cvmfs"
example
profile::cvmfs::client::quota_limit: 8192
profile::cvmfs::client::repositories:
  - atlas.cern.ch
profile::cvmfs::client::alien_cache_repositories:
  - grid.cern.ch

dependencies

When profile::cvmfs::client is included, these classes are included too:

profile::cvmfs::local_user

This class configures a cvmfs local user. This guarantees a consistent UID and GID for user cvmfs across the cluster when using CVMFS Alien Cache.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| uname | cvmfs user name | String |
| group | cvmfs group name | String |
| uid | cvmfs user id | Integer |
| gid | cvmfs group id | Integer |
| selinux_user | SELinux user for cvmfs files | String |
| mls_range | SELinux MLS range for cvmfs files | String |
default values
profile::cvmfs::local_user::uname: "cvmfs"
profile::cvmfs::local_user::group: "cvmfs-reserved"
profile::cvmfs::local_user::uid: 13000004
profile::cvmfs::local_user::gid: 8000131
profile::cvmfs::local_user::selinux_user: "unconfined_u"
profile::cvmfs::local_user::mls_range: "s0-s0:c0.c1023"
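example

If the default cvmfs UID or GID collides with identifiers already in use at your site, both can be overridden in hieradata. A minimal sketch with illustrative values:

```yaml
profile::cvmfs::local_user::uid: 13000010
profile::cvmfs::local_user::gid: 8000200
```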

profile::cvmfs::alien_cache

This class determines the location of the CVMFS alien cache.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| alien_fs_root_raw | Shared file system where the alien cache will be created (raw path) | String |
| alien_folder_name_raw | Alien cache folder name (raw value) | String |
default values
profile::cvmfs::alien_cache::alien_fs_root_raw: "scratch"
profile::cvmfs::alien_cache::alien_folder_name_raw: "cvmfs_alien_cache"
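example

The alien cache can be relocated to another shared filesystem by overriding the raw values. A minimal sketch, assuming a shared filesystem named project is available:

```yaml
profile::cvmfs::alien_cache::alien_fs_root_raw: "project"
profile::cvmfs::alien_cache::alien_folder_name_raw: "cvmfs_shared_cache"
```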

profile::efa

This class installs the Elastic Fabric Adapter drivers on an AWS instance with an EFA network interface. reference

parameters

| Variable | Description | Type |
| --- | --- | --- |
| version | EFA driver version | String |
default values
profile::efa::version: 'latest'
example
profile::efa::version: '1.30.0'

profile::fail2ban

Fail2ban is an intrusion prevention software framework. Written in the Python programming language, it is designed to prevent brute-force attacks. reference

This class installs and configures fail2ban.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| ignoreip | List of IP addresses that can never be banned (compatible with CIDR notation) | Array[String] |

Refer to puppet-fail2ban for more parameters to configure.

default values
profile::fail2ban::ignoreip: []
example
profile::fail2ban::ignoreip:
  - 132.203.0.0/16
  - 10.0.0.0/8

dependencies

When profile::fail2ban is included, these classes are included too:

profile::freeipa

FreeIPA is a free and open source identity management system. FreeIPA is the upstream open-source project for Red Hat Identity Management. reference

This class configures the instance as either a FreeIPA client or a FreeIPA server based on the value of profile::freeipa::client::server_ip. If this value matches the instance's local IP address, the server class is included - profile::freeipa::server; otherwise, the client class is included - profile::freeipa::client.

dependencies

When profile::freeipa is included, these classes can be included too:

profile::freeipa::base

This class configures files and services that are common to FreeIPA client and FreeIPA server.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| ipa_domain | FreeIPA primary domain | String |
default values
profile::freeipa::base::ipa_domain: "int.%{lookup('terraform.data.domain_name')}"

profile::freeipa::client

This class installs packages and configures the files and services of a FreeIPA client.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| server_ip | FreeIPA server IP address | String |
default values

By default, the FreeIPA server IP address corresponds to the local IP address of the first instance with the tag mgmt.

profile::freeipa::client::server_ip: "%{alias('terraform.tag_ip.mgmt.0')}"

profile::freeipa::server

This class configures files and services of a FreeIPA server.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| id_start | Starting user and group id number | Integer |
| admin_password | Password of the FreeIPA admin account | String |
| ds_password | Password of the directory server | String |
| hbac_services | Names of services to control with HBAC rules | Array[String] |
| enable_mokey | Enable the mokey service | Boolean |
default values
profile::freeipa::server::id_start: 60001
profile::freeipa::server::admin_password: ENC[PKCS7,...]
profile::freeipa::server::ds_password: ENC[PKCS7,...]
profile::freeipa::server::hbac_services: ["sshd", "jupyterhub-login"]
profile::freeipa::server::enable_mokey: true
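example

The list of HBAC-controlled services can be extended or narrowed through hbac_services. A minimal sketch adding the sudo service to the defaults (illustrative; pick the services relevant to your site):

```yaml
profile::freeipa::server::hbac_services: ["sshd", "jupyterhub-login", "sudo"]
```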

profile::freeipa::mokey

mokey is a web application that provides self-service user account management tools for FreeIPA. reference

This class installs mokey, configures its files, and manages its service.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| password | Password of the Mokey table in MariaDB | String |
| port | Mokey internal web server port | Integer |
| enable_user_signup | Allow users to create an account on the cluster | Boolean |
| require_verify_admin | Require a FreeIPA admin to enable Mokey-created accounts before usage | Boolean |
default values
profile::freeipa::mokey::password: ENC[PKCS7,...]
profile::freeipa::mokey::port: 12345
profile::freeipa::mokey::enable_user_signup: true
profile::freeipa::mokey::require_verify_admin: true
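example

On clusters where self-service account creation is not desired, the signup form can be turned off:

```yaml
profile::freeipa::mokey::enable_user_signup: false
```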

profile::gpu

This class installs and configures the NVIDIA GPU drivers if an NVIDIA GPU is detected. It supports PCI passthrough and VGPU, and automatically selects the correct configuration scenario.

For PCI passthrough, the class includes profile::gpu::install::passthrough, which installs the latest CUDA drivers available in NVIDIA's yum repos. For VGPU, the class includes profile::gpu::install::vgpu, which installs the driver from a cloud-provider-specific source.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| restrict_profiling | Restrict access to NVIDIA GPU Performance Counters to root | Boolean |
default values
profile::gpu::restrict_profiling: false
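example

On multi-tenant compute nodes, access to the GPU performance counters can be limited to root:

```yaml
profile::gpu::restrict_profiling: true
```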

profile::gpu::config::mig

This class configures MIG profiles using NVIDIA MIG Manager.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| mig_profile | Hash where keys are NVIDIA MIG profile names and values are their counts | Variant[Undef, Hash] |
| mig_manager_version | Version of NVIDIA MIG Manager to install | String |
default values
```yaml
profile::gpu::config::mig::mig_profile: ~
profile::gpu::config::mig::mig_manager_version: '0.5.5'
```
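example

The mig_profile hash maps MIG profile names to the number of GPU instances to create for each. A minimal sketch (the profile names are illustrative and depend on the GPU model):

```yaml
profile::gpu::config::mig::mig_profile:
  1g.10gb: 2
  2g.20gb: 1
```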

profile::gpu::install

This class contains the common installation steps shared by the NVIDIA GPU driver installation profiles. It can also create symlinks to the installed driver libraries when applications expect them in a specific path.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| dcgm_packages | DCGM packages installed from the NVIDIA CUDA repository. These packages provide the NVIDIA Data Center GPU Manager service used for GPU metrics collection, for example by slurm-job-exporter. Set to an empty list to skip DCGM package installation. | Array[String] |
| lib_symlink_path | Path where symlinks to installed NVIDIA shared libraries are created. Useful when applications expect the driver libraries in a non-standard location. | Optional[String] |
default values
```yaml
profile::gpu::install::dcgm_packages:
  - datacenter-gpu-manager-4-proprietary
  - datacenter-gpu-manager-4-core
  - datacenter-gpu-manager-4-cuda12
profile::gpu::install::lib_symlink_path: ~
```
example
profile::gpu::install::dcgm_packages:
  - datacenter-gpu-manager-4-proprietary
  - datacenter-gpu-manager-4-core
profile::gpu::install::lib_symlink_path: '/usr/lib64/nvidia'

profile::gpu::install::passthrough

This class installs the NVIDIA driver stack for instances where the physical GPU is passed through directly to the virtual machine. It relies on the NVIDIA yum repositories and installs the packages required for CUDA workloads.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| packages | NVIDIA-related packages installed for passthrough nodes | Array[String] |
| nvidia_driver_stream | NVIDIA driver module stream enabled for passthrough installations | String |
default values
```yaml
profile::gpu::install::passthrough::packages:
  - nvidia-driver-cuda-libs
  - nvidia-driver
  - nvidia-driver-devel
  - nvidia-driver-libs
  - nvidia-driver-NVML
  - nvidia-modprobe
profile::gpu::install::passthrough::nvidia_driver_stream: '550-dkms'
```
example
profile::gpu::install::passthrough::packages:
  - nvidia-driver-cuda-libs
  - nvidia-driver
  - nvidia-modprobe
profile::gpu::install::passthrough::nvidia_driver_stream: '575-dkms'

profile::gpu::install::vgpu

This class installs and configures the NVIDIA vGPU driver stack for instances that use mediated or vendor-provided virtual GPUs. It selects the appropriate installation backend and can also manage the licensing files required by NVIDIA vGPU deployments.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| installer | Installation method used for NVIDIA vGPU drivers | Enum['rpm', 'bin', 'none'] |
| grid_vgpu_types | List of regexes matched against terraform.self.specs.type to identify instances that should use the GRID vGPU installation path | Array[String] |
| gridd_content | Content written to /etc/nvidia/gridd.conf for NVIDIA vGPU licensing configuration | Optional[String] |
| gridd_source | Source used to populate /etc/nvidia/gridd.conf for NVIDIA vGPU licensing configuration | Optional[String] |
| token_content | Content written to /etc/nvidia/ClientConfigToken/client_config.tok for NVIDIA License System client configuration | Optional[String] |
| token_source | Source used to populate /etc/nvidia/ClientConfigToken/client_config.tok for NVIDIA License System client configuration | Optional[String] |
default values
```yaml
profile::gpu::install::vgpu::installer: none
profile::gpu::install::vgpu::grid_vgpu_types: []
profile::gpu::install::vgpu::gridd_content: ~
profile::gpu::install::vgpu::gridd_source: ~
profile::gpu::install::vgpu::token_content: ~
profile::gpu::install::vgpu::token_source: ~
```
example
profile::gpu::install::vgpu::installer: bin
profile::gpu::install::vgpu::grid_vgpu_types:
  - "^Standard_NV(6|12|18|36|72)ad[m]*s_A10_v5$"
  - "^Standard_NV(12|24|48)s_v3$"
  - "^Standard_NC(4|8|16|64)as_T4_v3$"
profile::gpu::install::vgpu::gridd_content: "FeatureType=4"
profile::gpu::install::vgpu::gridd_source: https://hpsrepo.fz-juelich.de/jusuf/nvidia/gridd.conf
profile::gpu::install::vgpu::token_content: "LICENSE_SYSTEM_TOKEN_CONTENT"
profile::gpu::install::vgpu::token_source: https://object-arbutus.alliancecan.ca/swift/v1/6c87c15eb7d2468daf3d2bd0c58bbfce/vgpu/kalpa-prod.tok

profile::gpu::install::vgpu::bin

This class installs the NVIDIA vGPU driver from the vendor-provided .run installer. It is intended for environments where the vGPU driver is distributed as a standalone binary rather than as OS packages.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| source | Source URL for the NVIDIA vGPU .run installer downloaded and executed by /usr/bin/mc-nvidia-installer | String |
| installer_flags | Additional flags passed to /usr/bin/mc-nvidia-installer when installing the NVIDIA vGPU driver from the .run installer | String |
default values
```yaml
profile::gpu::install::vgpu::bin::installer_flags: '--kernel-module-type=proprietary --disable-nouveau --no-install-compat32-libs --no-wine-files --dkms'
```

profile::gpu::install::vgpu::rpm

This class installs the NVIDIA vGPU driver from RPM packages provided by a repository package. It is intended for environments where the cloud provider or site distributes vGPU drivers through a dedicated RPM repository.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| source | Source URL for the RPM repository package that provides the NVIDIA vGPU RPM packages | String |
| packages | List of NVIDIA vGPU RPM packages to install from the configured repository package | Array[String] |
default values
```yaml
profile::gpu::install::vgpu::rpm::source: http://repo.arbutus.cloud.computecanada.ca/pulp/repos/alma%{facts.os.release.major}/Packages/a/arbutus-cloud-vgpu-repo-1.0-1.el%{facts.os.release.major}.noarch.rpm
profile::gpu::install::vgpu::rpm::packages:
  - nvidia-vgpu-kmod
  - nvidia-vgpu-gridd
  - nvidia-vgpu-tools
```

profile::gpu::services

This class manages the NVIDIA GPU services required by the installed driver stack. For VGPU instances, nvidia-gridd is added automatically.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| names | NVIDIA GPU services to ensure running and enabled. For VGPU instances, nvidia-gridd is added automatically. | Array[String] |
default values
```yaml
profile::gpu::services::names:
  - nvidia-persistenced
  - nvidia-dcgm
```
example
profile::gpu::services::names:
  - nvidia-persistenced
  - nvidia-gridd

profile::jupyterhub::hub

JupyterHub is a multi-user server for Jupyter Notebooks. It is designed to support many users by spawning, managing, and proxying many singular Jupyter Notebook servers. reference

This class installs and configures the hub part of JupyterHub.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| register_url | URL that links to the register page. Empty string means no visible link. | String |
| reset_pw_url | URL that links to the reset password page. Empty string means no visible link. | String |
default values
profile::jupyterhub::hub::register_url: "https://mokey.%{lookup('terraform.data.domain_name')}/auth/signup"
profile::jupyterhub::hub::reset_pw_url: "https://mokey.%{lookup('terraform.data.domain_name')}/auth/forgotpw"

dependency

When profile::jupyterhub::hub is included, this class is included too:

profile::jupyterhub::node

This class installs and configures the single-user notebook part of JupyterHub.

dependency

When profile::jupyterhub::node is included, these classes are included too:

profile::mail

Postfix is a free and open-source mail transfer agent that routes and delivers electronic mail.

This class instantiates Postfix either as a mail client or as a relayhost. If the instance's local IP address is included in profile::mail::sender::relayhosts, the relayhost class is included - profile::mail::relayhost; otherwise, the sender class is included - profile::mail::sender.

profile::mail::base

This class gathers the configurations that are common to senders and relayhosts.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| origin | Origin domain of outgoing emails | String |
| authorized_submit_users | List of users authorized to send emails | Array[String] |
default values
profile::mail::base::origin: "%{alias('terraform.data.domain_name')}"
profile::mail::base::authorized_submit_users: ["root", "slurm"]

profile::mail::dkim

DomainKeys Identified Mail (DKIM) is an email authentication method that permits a person, role, or organization that owns the signing domain to claim some responsibility for a message by associating the domain with the message.

This class installs and configures the OpenDKIM daemon.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| private_key | RSA private key in PEM format that will be used to sign outgoing emails | String |
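example

The private key is provided as a PEM-encoded string in hieradata, typically encrypted like the other secrets shown as ENC[PKCS7,...]. A minimal sketch with the key material elided:

```yaml
profile::mail::dkim::private_key: |
  -----BEGIN RSA PRIVATE KEY-----
  ...
  -----END RSA PRIVATE KEY-----
```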

profile::mail::relayhost

This class configures Postfix as a relayhost for other instances inside the cluster to send emails outside of the internal domain. If an RSA private key is provided via profile::mail::dkim::private_key, the class also includes profile::mail::dkim.

dependency

When profile::mail::relayhost is included, this class may also be included:

profile::mail::sender

This class configures Postfix as a client that sends outgoing emails to one of the relayhosts.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| relayhosts | List of internal instances that will forward outgoing emails | Array[String] |
default values
profile::mail::sender::relayhosts: "%{alias('terraform.tag_ip.public')}"

profile::metrix

This class installs and configures the metrix portal.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| login_tags | List of tags used to identify metrix login nodes | Array[String] |
default values
profile::metrix::login_tags:
  - login
example
profile::metrix::login_tags:
  - login
  - public

profile::prometheus::caddy_exporter

This class configures a local Caddy metrics endpoint and registers it in Consul for Prometheus scraping.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| port | Port used by the Caddy metrics endpoint | Integer |
default values
profile::prometheus::caddy_exporter::port: 2020

dependency

When profile::prometheus::caddy_exporter is included, this class is included too:

profile::prometheus::node_exporter

Prometheus is a free software application used for event monitoring and alerting. It records metrics in a time series database built using an HTTP pull model, with flexible queries and real-time alerting. reference

This class configures a Prometheus exporter that exports server usage metrics, for example CPU and memory usage. It should be included on every instance of the cluster.

dependencies

When profile::prometheus::node_exporter is included, these classes are included too:

profile::prometheus::slurm_job_exporter

This class configures a Prometheus exporter that exports the Slurm compute node metrics, for example:

  • job memory usage
  • job memory max
  • job memory limit
  • job core usage total
  • job process count
  • job threads count
  • job power gpu

This exporter needs to run on compute nodes.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| version | Version of the slurm job exporter to install | String |
| nvidia_ml_py_version | Version of nvidia-ml-py for GPU metrics | String |
default values
profile::prometheus::slurm_job_exporter::version: '0.4.9'
profile::prometheus::slurm_job_exporter::nvidia_ml_py_version: '11.515.75'

dependency

When profile::prometheus::slurm_job_exporter is included, this class is included too:

  • [profile::consul](#profileconsul)

profile::prometheus::slurm_exporter

This class configures a Prometheus exporter that exports the Slurm scheduling metrics, for example:

  • allocated nodes
  • allocated gpus
  • pending jobs
  • completed jobs

This exporter typically runs on the Slurm controller server, but it can run on any server with a functional Slurm command-line installation.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| port | Port used by the exporter | Integer |
| collectors | List of collectors to enable | Array[String] |
default values
profile::prometheus::slurm_exporter::port: 8081
profile::prometheus::slurm_exporter::collectors: ['partition']

profile::puppetserver

This class configures Puppet Server runtime settings, creates the Puppet Prometheus textfile exporter configuration, and ensures the Puppet Server service is running.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| jruby_max_active_instances | Maximum number of active JRuby instances used by Puppet Server | Integer |
| java_heap_size | Puppet Server JVM heap size in MiB | Integer |
default values
profile::puppetserver::jruby_max_active_instances: 1
profile::puppetserver::java_heap_size: 1024
example
profile::puppetserver::jruby_max_active_instances: 2
profile::puppetserver::java_heap_size: 2048

profile::nfs

Network File System (NFS) is a distributed file system protocol [...] allowing a user on a client computer to access files over a computer network much like local storage is accessed. reference

This class instantiates either an NFS client or an NFS server. If profile::nfs::client::server matches the instance's local IP address, FQDN or hostname, the server class is included - profile::nfs::server, otherwise the client class is included - profile::nfs::client.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| domain | NFSv4 ID mapping domain | String |
default values
profile::nfs::domain: "%{lookup('profile::freeipa::base::ipa_domain')}"

profile::nfs::client

This class installs NFS and configures the client to mount shares exported by a single NFS server identified by its IP address or FQDN. The shares to mount are inferred from the list of volumes with an nfs tag in the terraform.instances datastructure. Additional shares can be mounted by providing a list of names with the share_names variable.

share_names can also be used to specify which shares to mount when the terraform.instances datastructure does not include any volume with the nfs tag.

This class is compatible with Amazon Elastic File System (EFS). The server variable can be set to an EFS filesystem DNS name or IP address.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| server | IP address or FQDN of the NFS server | String |
| share_names | Names of the exported shares to mount | Array[String] |
default values
profile::nfs::client::server: "%{alias('terraform.tag_ip.nfs.0')}"
profile::nfs::client::share_names: []
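example

As noted above, the class is compatible with Amazon EFS. A minimal sketch, assuming a hypothetical EFS filesystem DNS name and shares named home, project, and scratch:

```yaml
profile::nfs::client::server: "fs-0123456789abcdef0.efs.us-east-1.amazonaws.com"
profile::nfs::client::share_names:
  - home
  - project
  - scratch
```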

dependency

When profile::nfs::client is included, these classes are included too:

  • nfs (client_enabled => true)

profile::nfs::server

This class installs NFS and configures an NFS server that exports all volumes tagged nfs. It can also export additional paths specified by the export_paths variable.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| no_root_squash_tags | Tags identifying instances that can mount NFS exports without root squash | Array[String] |
| enable_client_quotas | Enable querying of quotas on NFS clients | Boolean |
| export_paths | List of paths to export in addition to volumes with the nfs tag | Array[String] |
default values
profile::nfs::server::no_root_squash_tags: ['mgmt']
profile::nfs::server::enable_client_quotas: false
profile::nfs::server::export_paths: []
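example

Additional paths can be exported alongside the nfs-tagged volumes, and quota queries from clients can be enabled. A minimal sketch with an illustrative extra path:

```yaml
profile::nfs::server::export_paths:
  - /mnt/shared_datasets
profile::nfs::server::enable_client_quotas: true
```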

dependency

When profile::nfs::server is included, these classes are included too:

  • nfs (server_enabled => true)

profile::reverse_proxy

Caddy is an extensible, cross-platform, open-source web server written in Go. [...] It is best known for its automatic HTTPS features. reference

This class installs and configures Caddy as a reverse proxy to expose Magic Castle cluster internal services to the Internet.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| domain_name | Domain name corresponding to the registered main DNS A record | String |
| main2sub_redir | Subdomain to redirect to when hitting the domain name directly. Empty means no redirect. | String |
| subdomains | Subdomain names used to create vhosts to internal HTTP endpoints | Hash[String, String] |
| remote_ips | List of allowed IP addresses per subdomain. Undef means no restriction. | Hash[String, Array[String]] |
| robots_txt | Content of a robots.txt file served for all hosts | String |
default values
profile::reverse_proxy::domain_name: "%{alias('terraform.data.domain_name')}"
profile::reverse_proxy::subdomains:
  ipa: "ipa.%{lookup('profile::freeipa::base::ipa_domain')}"
  mokey: "%{lookup('terraform.tag_ip.mgmt.0')}:%{lookup('profile::freeipa::mokey::port')}"
  jupyter: "https://127.0.0.1:8000"
profile::reverse_proxy::main2sub_redir: "jupyter"
profile::reverse_proxy::remote_ips: {}
profile::reverse_proxy::robots_txt: "User-agent: *\nDisallow: /"
example
profile::reverse_proxy::remote_ips:
  ipa:
    - 132.203.0.0/16

profile::rsyslog::base

Rsyslog is an open-source software utility used on UNIX and Unix-like computer systems for forwarding log messages in an IP network. reference

This class installs rsyslog and launches the service.

profile::rsyslog::client

This class installs and configures the rsyslog service to forward the instance's logs to the rsyslog servers. The rsyslog servers are discovered by the instance via Consul.

dependencies

When profile::rsyslog::client is included, these classes are included too:

profile::rsyslog::server

This class installs and configures the rsyslog service to receive forwarded logs from all rsyslog clients in the cluster.

dependencies

When profile::rsyslog::server is included, these classes are included too:

profile::vector

This class installs and configures the vector.dev service to manage logs. Refer to the documentation for configuration.

parameters

| Variable | Description | Type | Optional ? |
| --- | --- | --- | --- |
| config | Content of the YAML configuration file | String | Yes |
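example

The config parameter is the raw content of a vector.dev YAML configuration. A minimal sketch that reads the systemd journal and writes JSON to the console (assumes the standard vector.dev journald source and console sink):

```yaml
profile::vector::config: |
  sources:
    journal:
      type: journald
  sinks:
    console:
      type: console
      inputs: ['journal']
      encoding:
        codec: json
```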

profile::slurm::base

The Slurm Workload Manager, formerly known as Simple Linux Utility for Resource Management, or simply Slurm, is a free and open-source job scheduler for Linux and Unix-like kernels, used by many of the world's supercomputers and computer clusters. reference

MUNGE (MUNGE Uid 'N' Gid Emporium) is an authentication service for creating and validating credentials. It is designed to be highly scalable for use in an HPC cluster environment. reference

This class installs the base packages and config files that are essential to all Slurm roles. It also installs and configures the Munge service.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| cluster_name | Name of the cluster | String |
| munge_key | Base64-encoded Munge key | String |
| slurm_version | Slurm version to install | Enum['24.05', '24.11', '25.05', '25.11'] |
| os_reserved_memory | Memory (MB) reserved for the operating system on the compute nodes | Integer |
| suspend_time | Idle time (seconds) after which nodes become eligible for suspension | Integer |
| suspend_rate | Rate (nodes per minute) at which nodes are placed into power save mode | Integer |
| resume_timeout | Maximum time permitted (seconds) between a node resume request and its availability | Integer |
| resume_rate | Rate (nodes per minute) at which nodes in power save mode are returned to normal operation | Integer |
| force_slurm_in_path | Add Slurm's bin path to the PATH environment variable of all users (local and LDAP) | Boolean |
| enable_scrontab | Enable users' Slurm-managed crontab | Boolean |
| enable_x11_forwarding | Enable Slurm's built-in X11 forwarding capabilities | Boolean |
| config_addendum | Additional parameters included at the end of slurm.conf | String |
| log_level | Log level of all Slurm daemons | Enum['quiet', 'fatal', 'error', 'info', 'verbose', 'debug', 'debug[2-5]'] |
default values
profile::slurm::base::cluster_name: "%{alias('terraform.data.cluster_name')}"
profile::slurm::base::munge_key: ENC[PKCS7, ...]
profile::slurm::base::slurm_version: '23.11'
profile::slurm::base::os_reserved_memory: 512
profile::slurm::base::suspend_time: 3600
profile::slurm::base::suspend_rate: 20
profile::slurm::base::resume_timeout: 3600
profile::slurm::base::resume_rate: 20
profile::slurm::base::force_slurm_in_path: false
profile::slurm::base::enable_x11_forwarding: true
profile::slurm::base::config_addendum: ''
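example

Site-specific slurm.conf settings that have no dedicated parameter can be appended through config_addendum. A minimal sketch (the Slurm options shown are illustrative):

```yaml
profile::slurm::base::suspend_time: 7200
profile::slurm::base::config_addendum: |
  PreemptType=preempt/partition_prio
  PreemptMode=REQUEUE
```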

dependencies

When profile::slurm::base is included, these classes are included too:

profile::slurm::accounting

This class installs and configures the Slurm database daemon - slurmdbd. It also installs and configures MariaDB for slurmdbd to store its tables.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| password | Password used by slurmdbd to connect to MariaDB | String |
| admins | List of Slurm administrator usernames | Array[String] |
| accounts | Slurm account names and their specifications | Hash[String, Hash] |
| users | Associations between usernames and accounts | Hash[String, Array[String]] |
| options | Additional cluster-wide Slurm accounting options | Hash[String, Any] |
| dbd_port | slurmdbd service listening port | Integer |
default values
profile::slurm::accounting::password: ENC[PKCS7, ...]
profile::slurm::accounting::admins: ["centos"]
profile::slurm::accounting::accounts: {}
profile::slurm::accounting::users: {}
profile::slurm::accounting::options: {}
profile::slurm::accounting::dbd_port: 6869
example

Example of the definition of Slurm accounts and their association with users:

profile::slurm::accounting::admins: ['oppenheimer']

profile::slurm::accounting::accounts:
  physics:
    Fairshare: 1
    MaxJobs: 100
  engineering:
    Fairshare: 2
    MaxJobs: 200
  humanities:
    Fairshare: 1
    MaxJobs: 300

profile::slurm::accounting::users:
  oppenheimer: ['physics']
  rutherford: ['physics', 'engineering']
  sartre: ['humanities']

Each username in profile::slurm::accounting::users and profile::slurm::accounting::admins has to correspond to an LDAP or a local user. Refer to profile::users::ldap::users and profile::users::local::users for more information.

dependencies

When profile::slurm::accounting is included, these classes are included too:

profile::slurm::controller

This class installs and configures the Slurm controller daemon - slurmctld.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| autoscale_version | Version of the Slurm Terraform cloud autoscale software to install | String |
| tfe_token | Terraform Cloud API token. Required to enable autoscaling. | String |
| tfe_workspace | Terraform Cloud workspace id. Required to enable autoscaling. | String |
| tfe_var_pool | Variable name in the Terraform Cloud workspace controlling the autoscaling pool | String |
| tfe_proxy_url | Terraform Cloud proxy URL. Normally used with MCHub as proxy. | Optional[String] |
| selinux_context | SELinux context for jobs (Slurm > 20.11) | String |
default values
profile::slurm::controller::autoscale_version: "0.4.0"
profile::slurm::controller::selinux_context: "user_u:user_r:user_t:s0"
profile::slurm::controller::tfe_token: ""
profile::slurm::controller::tfe_workspace: ""
profile::slurm::controller::tfe_var_pool: "pool"
example
profile::slurm::controller::tfe_token: "7bf4bd10-1b62-4389-8cf0-28321fcb9df8"
profile::slurm::controller::tfe_workspace: "ws-jE6Lq2hggNPyRJcJ"

For more information on how to configure Slurm autoscaling with Terraform Cloud, refer to the Terraform Cloud section of the Magic Castle manual.

dependencies

When profile::slurm::controller is included, these classes are included too:

profile::slurm::node

This class installs and configures the Slurm node daemon - slurmd.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| enable_tmpfs_mounts | Enable the spank-cc-tmpfs_mounts plugin | Boolean |
| pam_access_groups | Groups that can access the node regardless of Slurm jobs | Array[String] |
default values
profile::slurm::node::enable_tmpfs_mounts: true
profile::slurm::node::pam_access_groups: ['wheel']
example
profile::slurm::node::enable_tmpfs_mounts: false

dependency

When profile::slurm::node is included, this class is included too:

profile::software_stack

This class configures the initial shell profile that users load on login and the default set of Lmod modules that are loaded. The software stack selected depends on the Puppet fact software_stack, which is set by the Magic Castle Terraform variable software_stack.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| min_uid | Minimum UID value required to load the software environment init script on login | Integer |
| initial_profile | Path to the shell script initializing the software environment variables | String |
| extra_site_env_vars | Map of environment variables exported before sourcing the profile shell scripts | Hash[String, String] |
| lmod_default_modules | List of default Lmod modules | Array[String] |
default values
profile::software_stack::min_uid: "%{alias('profile::freeipa::server::id_start')}"

computecanada software stack

profile::software_stack::initial_profile: "/cvmfs/soft.computecanada.ca/config/profile/bash.sh"
profile::software_stack::extra_site_env_vars: {}
profile::software_stack::lmod_default_modules:
    - gentoo/2020
    - imkl/2020.1.217
    - gcc/9.3.0
    - openmpi/4.0.3

eessi software stack

profile::software_stack::initial_profile: "/cvmfs/software.eessi.io/versions/2023.06/init/Magic_Castle/bash"
profile::software_stack::extra_site_env_vars: {}
profile::software_stack::lmod_default_modules:
  - GCC

dependencies

When profile::software_stack is included, these classes are included too:

profile::squid::server

Squid is a caching and forwarding HTTP web proxy. It has a wide variety of uses, including speeding up a web server by caching repeated requests. reference

This class configures and installs the Squid service. Its main usage is to act as an HTTP cache for CVMFS clients in the cluster.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| port | Squid service listening port | Integer |
| cache_size | Amount of disk space (MB) | Integer |
| cvmfs_acl_regex | List of allowed CVMFS stratums as regexes | Array[String] |
default values
profile::squid::server::port: 3128
profile::squid::server::cache_size: 4096
profile::squid::server::cvmfs_acl_regex:
  - '^(cvmfs-.*\.computecanada\.ca)$'
  - '^(cvmfs-.*\.computecanada\.net)$'
  - '^(.*-cvmfs\.openhtc\.io)$'
  - '^(cvmfs-.*\.genap\.ca)$'
  - '^(.*\.cvmfs\.eessi-infra\.org)$'
  - '^(.*s1\.eessi\.science)$'

dependencies

When profile::squid::server is included, these classes are included too:

profile::sssd::client

The System Security Services Daemon is software originally developed for the Linux operating system that provides a set of daemons to manage access to remote directory services and authentication mechanisms. reference

This class configures external authentication domains.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| domains | Config dictionary of domains that can authenticate | Hash[String, Any] |
| access_tags | List of host tags that domain users can connect to | Array[String] |
| deny_access | Deny access to the domains on the host including this class. If undef, access is defined by tags. | Optional[Boolean] |
| ldapclient_domain | Domain (i.e., a key from domains) used by LDAP clients. If FreeIPA is installed and this parameter is left undefined, the LDAP client defaults to the FreeIPA domain. | Optional[String] |
default values
profile::sssd::client::domains: { }
profile::sssd::client::access_tags: ['login', 'node']
profile::sssd::client::deny_access: ~
example
profile::sssd::client::domains:
  MyOrgLDAP:
    id_provider: ldap
    auth_provider: ldap
    ldap_schema: rfc2307
    ldap_uri:
      - ldaps://server01.ldap.myorg.net
      - ldaps://server02.ldap.myorg.net
      - ldaps://server03.ldap.myorg.net
    ldap_search_base: ou=People,dc=myorg,dc=net
    ldap_group_search_base: ou=Group,dc=myorg,dc=net
    ldap_id_use_start_tls: False
    cache_credentials: true
    ldap_tls_reqcert: never
    access_provider: ldap
    filter_groups: 'cvmfs-reserved'

The domain keys in this example are given on an indicative basis and may not be mandatory. Some SSSD domain keys might also be missing. Refer to the domain sections of the sssd.conf manual for more information.

profile::swap

This class creates a swap file on non-container instances and configures the kernel swappiness setting. The swap file is created on /mnt/ephemeral0 when that mountpoint exists, otherwise it is created on /mnt.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| size | Size of the swap file | String |
| swappiness | Value assigned to the vm.swappiness sysctl | Integer |
default values
profile::swap::size: '1 GB'
profile::swap::swappiness: 10
example
profile::swap::size: '4 GB'
profile::swap::swappiness: 20

profile::ssh::base

This class optimizes the SSH server daemon (sshd) configuration to achieve an A+ audit score on https://www.sshaudit.com/.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| disable_passwd_auth | If true, disable password authentication | Boolean |
default values
profile::ssh::base::disable_passwd_auth: false
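example

Password authentication can be disabled so that only public-key logins are accepted:

```yaml
profile::ssh::base::disable_passwd_auth: true
```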

profile::ssh::known_hosts

This class populates the /etc/ssh/ssh_known_hosts file with the ed25519 host keys of the cluster's instances, using data provided by Terraform.

profile::ssh::hostbased_auth::client

This class allows the instance to connect via SSH hostbased authentication to instances that include profile::ssh::hostbased_auth::server.

profile::ssh::hostbased_auth::server

This class enables SSH hostbased authentication on the instance including it.

parameter

| Variable | Description | Type |
| --- | --- | --- |
| shosts_tags | Tags of instances that can connect to this server using hostbased authentication | Array[String] |
default values
profile::ssh::hostbased_auth::server::shosts_tags: ['login', 'node']

dependency

When profile::ssh::hostbased_auth::server is included, this class is included too:

profile::users::ldap

This class allows the definition of FreeIPA users directly in YAML. The alternatives are to use the FreeIPA command line, the FreeIPA web interface, or Mokey.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| users | Dictionary of users to be created in LDAP | Hash[profile::users::ldap_user] |
| groups | Dictionary of groups to be created in LDAP | Hash[profile::users::ldap_group] |

A profile::users::ldap_user is defined as a dictionary with the following keys:

| Variable | Description | Type | Optional ? |
| --- | --- | --- | --- |
| groups | List of groups the user has to be part of | Array[String] | Yes |
| public_keys | List of SSH authorized keys for the user | Array[String] | Yes |
| passwd | User's password | String | Yes |
| manage_password | If enabled, agents verify the password hashes match | Boolean | Yes |

A profile::users::ldap_group is defined as a dictionary with the following keys:

| Variable | Description | Type | Optional ? |
| --- | --- | --- | --- |
| posix | Whether this is a POSIX group or not | Boolean | Yes |
| automember | Whether users are automatically members of that group | Boolean | Yes |
| hbac_rules | List of HBAC rule names ("tag:service") that apply to this group | Array[String] | Yes |

By default, Puppet will manage the LDAP users' passwords and change them in LDAP if their hashes no longer match what is prescribed in YAML. To disable this feature, add manage_password: false to the user(s) definition.

default values
profile::users::ldap::users:
  'user':
    count: "%{alias('terraform.data.nb_users')}"
    passwd: "%{alias('terraform.data.guest_passwd')}"
    manage_password: true

profile::users::ldap::groups:
  'def-sponsor00':
    automember: true
    hbac_rules: ['login:sshd', 'node:sshd', 'proxy:jupyterhub-login']

If profile::users::ldap::users is present in more than one YAML file in the hierarchy, all hashes for that parameter will be combined using Puppet's deep merge strategy.

examples

A batch of 10 users, user01 to user10, can be defined as:

profile::users::ldap::users:
  user:
    count: 10
    passwd: user.password.is.easy.to.remember
    groups: ['def-sponsor00']

A single user alice, who can authenticate only with an SSH public key, can be defined as:

profile::users::ldap::users:
  alice:
    groups: ['def-sponsor00']
    public_keys: ['ssh-rsa ... user@local', 'ssh-ed25519 ...']

Allowing LDAP users to connect to the cluster only via JupyterHub:

profile::users::ldap::groups:
  'def-sponsor00':
    hbac_rules: ['proxy:jupyterhub-login']

profile::users::local

This class allows the definition of local users outside of the FreeIPA realm.

A local user's home directory is local to the machine where it is created and can be found at the root of the filesystem, i.e. /username. Local users are the only type of users in Magic Castle allowed to be sudoers.

parameters

| Variable | Description | Type |
| --- | --- | --- |
| users | Dictionary of users to be created locally | Hash[profile::users::local_user] |

A profile::users::local_user is defined as a dictionary with the following keys:

| Variable | Description | Type | Optional ? (default) |
| --- | --- | --- | --- |
| groups | List of groups the user has to be part of | Array[String] | No |
| public_keys | List of SSH authorized keys for the user | Array[String] | No |
| sudoer | If enabled, the user can sudo without password | Boolean | Yes (false) |
| selinux_user | SELinux context for the user | String | Yes (unconfined_u) |
| mls_range | MLS range for the user | String | Yes (s0-s0:c0.c1023) |
| authenticationmethods | AuthenticationMethods value for this user in sshd_config | String | Yes |
| manage_home | Whether we manage the home folder | Boolean | Yes (true) |
| purge_ssh_keys | Whether we purge SSH keys | Boolean | Yes (true) |
| shell | Default shell of the user | String | Yes (/bin/bash) |
| uid | UID of the user | Integer | Yes (undef) |
| gid | GID of the user | Integer | Yes (undef) |
| group | Primary group name of the user | String | No (username) |
| home | Home directory of the user | String | Yes (/username) |
default values
profile::users::local::users:
  "%{alias('terraform.data.sudoer_username')}":
    public_keys: "%{alias('terraform.data.public_keys')}"
    groups: ['adm', 'wheel', 'systemd-journal']
    sudoer: true
    authenticationmethods: 'publickey'

If profile::users::local::users is present in more than one YAML file in the hierarchy, all hashes for that parameter will be combined using Puppet's deep merge strategy.

examples

A local user bob can be defined in hieradata as:

profile::users::local::users:
  bob:
    groups: ['group1', 'group2']
    public_keys: ['ssh-rsa...', 'ssh-dsa']
    # sudoer: false
    # selinux_user: 'unconfined_u'
    # mls_range: 's0-s0:c0.c1023'
    # authenticationmethods: 'publickey,password publickey,keyboard-interactive'

profile::volumes

This class creates and mounts LVM volume groups. Each volume is formatted as XFS.

If a volume is expanded after the initial configuration, the class will not expand the LVM volume automatically. These operations currently have to be accomplished manually.

parameters

VariableDescriptionType
devicesHash of devicesHash[String, Hash[String, Hash]]
default values
profile::volumes::devices: "%{lookup('terraform.self.volumes')}"
examples
profile::volumes::devices:
  "local":
    "tmp":
      "glob": "/dev/vdc"
      "size": 100
      # "bind_mount": true
      # "bind_target": "/tmp"
      # "owner": "root"
      # "group": "root"
      # "mode": "0755"
      # "seltype": "home_root_t"
      # "enable_resize": false
      # "filesystem": "xfs"
      # "quota": ~

Default quotas on a filesystem can be defined as follows:

profile::volumes::devices:
  "nfs":
    "home":
      "quota":
        #"bsoft": "1g"
        "bhard": "1g"
        #"isoft": "500000"
        "ihard": "500000"