Get Apache Airflow on IBM Cloud

October 13, 2020

We will deploy Apache Airflow on an IBM Cloud Kubernetes cluster.

  • Prerequisites:
    • You need an IBM Cloud account. If you do not have one, you can register here.
  1. Provision a new Kubernetes cluster. If you already have one, skip to step 2.
  2. Deploy the IBM Cloud Block Storage plug-in. If you already have it, skip to step 3.
  3. Deploy Apache Airflow.

Step 1 Provision a new Kubernetes Cluster

  • Click the Catalog button on the top

  • Select Service from the left in the catalog

  • Search for Kubernetes Service and click on it

  • At the Kubernetes deployment page, we will specify our deployment details

  • Choose a plan, standard or free. The free plan has only one worker node and no subnet. To provision a standard cluster, you will need to upgrade your account to Pay-As-You-Go

    • To upgrade to a Pay-As-You-Go account, complete the following steps:

    • In the console, go to Manage > Account.

    • Select Account settings, and click Add credit card.

    • Enter your payment information, click Next, and submit your information

  • Choose classic or VPC. Read the docs and choose the infrastructure type that suits you best

  • Decide on your deployment's location parameters. For more information, visit Locations

    • Choose a Geography (continent)
    • Choose Single zone or Multizone. In a single zone your data is kept in only one datacenter. With Multizone your data is kept on multiple sites, which protects it against the failure of a single site
    • Choose a Worker Zone if using Single zone, or a Metro if using Multizone
      • If you wish to use Multizone, set up your account with VRF or enable VLAN spanning
      • If no Virtual LAN (VLAN) is available at your selected location, a new VLAN will be created for you
  • Choose a Worker node setup or use the preselected one, and set the number of Worker nodes per zone

  • Choose a Master Service Endpoint. In VRF-enabled accounts, you can choose private-only to make your master accessible on the private network or via VPN tunnel. Choose public-only to make your master publicly accessible. When you have a VRF-enabled account, your cluster is set up by default to use both private and public endpoints. For more information, visit endpoints.

  • Give your cluster a name

  • Add any desired tags to your cluster. For more information, visit tags

  • Click Create

  • Wait for your cluster to be provisioned

  • Your cluster is ready for use
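Once the cluster is ready, worker health can also be checked from a terminal, for example the Web terminal used in the verification section. The snippet below is a minimal sketch and not part of the original walkthrough: the helper name not_ready_nodes is hypothetical, and it assumes kubectl is already configured for the new cluster.

```shell
# Hypothetical helper: print how many worker nodes are not in the plain "Ready" state.
# Assumes kubectl is already pointed at the cluster created in Step 1.
not_ready_nodes() {
  # Column 2 of `kubectl get nodes` is STATUS; anything other than exactly
  # "Ready" (for example "NotReady" or "Ready,SchedulingDisabled") is counted.
  kubectl get nodes --no-headers | awk '$2 != "Ready" {n++} END {print n+0}'
}

# Example call:
#   not_ready_nodes
```

A result of 0 suggests that every worker node is able to schedule pods.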

Step 2 Deploy the IBM Cloud Block Storage plug-in

The Block Storage plug-in provides persistent, high-performance iSCSI storage that you can add to your apps by using Kubernetes Persistent Volumes (PVs).

  • Click the Catalog button on the top

  • Select Software from the catalog

  • Search for IBM Cloud Block Storage plug-in and click on it

  • On the application page, click the dot next to the cluster you wish to use

  • Click on Enter or Select Namespace and choose the default Namespace or use a custom one (if you get an error, please wait 30 minutes for the cluster to finalize)

  • Give a name to this workspace

  • Click Install and wait for the deployment to finish
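
One way to confirm from a terminal that the plug-in is in place is to look for the storage classes it registers. The sketch below assumes those classes have names beginning with ibmc-block (for example ibmc-block-gold), which matches IBM's documentation as far as we know but may vary by plug-in version. The helper name block_storage_classes is hypothetical.

```shell
# Hypothetical helper: list storage classes that appear to come from the
# IBM Cloud Block Storage plug-in.
# Assumption: the plug-in's classes are named with the prefix "ibmc-block".
block_storage_classes() {
  # Column 1 of `kubectl get storageclass` is the class name.
  kubectl get storageclass --no-headers | awk '$1 ~ /^ibmc-block/ {print $1}'
}

# Example call:
#   block_storage_classes
```

If the function prints nothing, the plug-in may still be installing, or its storage classes may use a different naming scheme.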

Step 3 Deploy Apache Airflow

In this step we will deploy Apache Airflow on our cluster

  • Click the Catalog button on the top

  • Select Software from the left in the catalog

  • Search for Apache Airflow and click on it

  • On the application page, click the dot next to the cluster we just created, or use an existing one

  • Click on Enter or Select Namespace and choose one of the default Namespaces or use a custom one

  • Give a unique name to your workspace

  • Select which resource group you want to use. Resource groups are used for access control and billing purposes. For more information, visit resource groups

  • Here you can give tags to your Apache Airflow workspace, which will affect your deployment. For more information, visit tags

  • Click on Parameters with default values. You can set deployment values or use the default ones

  • Tick the box next to the agreements and click Install

  • Your Apache Airflow workspace will start installing. Please wait a couple of minutes for the deployment to finish

  • Your Apache Airflow workspace has been successfully deployed

Verify Apache Airflow installation

  • Go to Resources in your browser

  • Click on Clusters

  • Click on your Cluster

  • You are now at your cluster's overview. Click on Actions in the top right and select Web terminal from the dropdown menu

  • Click Install, then wait a couple of minutes

  • Click on Actions

  • Click Web terminal and a terminal will open

  • Type the following commands in the terminal, replacing NAMESPACE with the namespace you chose during the deployment setup:

$ kubectl get ns

$ kubectl get pod -n NAMESPACE -o wide

$ kubectl get service -n NAMESPACE

  • Your running Apache Airflow services will be visible
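
The manual check above can be wrapped in a small helper that reports any pods that are not yet up. This is a minimal sketch under stated assumptions: the function name pods_not_running is hypothetical, and it treats the Running and Completed statuses as healthy, which may not fit every deployment.

```shell
# Hypothetical helper: print the names of pods in a namespace whose STATUS
# is neither "Running" nor "Completed".
# Usage: pods_not_running NAMESPACE
pods_not_running() {
  ns="$1"
  # Column 3 of `kubectl get pod` is STATUS; column 1 is the pod name.
  kubectl get pod -n "$ns" --no-headers |
    awk '$3 != "Running" && $3 != "Completed" {print $1}'
}

# Example call, replacing NAMESPACE as in the commands above:
#   pods_not_running NAMESPACE
```

Empty output suggests that all Airflow pods in the namespace have started. Any names printed are pods worth inspecting further.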

You have successfully deployed Apache Airflow on IBM Cloud!