submit_job.md
March 2, 2023
There are two ways to submit PPML jobs:
- use the PPML CLI to submit jobs manually
- use a Helm chart to submit jobs automatically
## PPML CLI

### Description

The PPML Command Line Interface is a unified tool for submitting PPML Spark jobs to a cluster.

### Synopsis

Once a user application is bundled, it can be launched with the `bigdl-ppml-submit.sh` script. This script takes care of setting up the SGX configuration and the cluster/deploy mode, and runs the PPML job in a secured environment:

```bash
./bigdl-ppml-submit.sh [options] <application-jar> [application-arguments]
```
### Options

The following options enable the Spark driver and executors to run in SGX. Check the recommended configuration of SGX.

| Option | Description |
| --- | --- |
| `--sgx-enabled` | `true` enables running Spark executors in SGX; `false` runs natively on k8s without SGX. The default value is `false`. Once `--sgx-enabled` is set to `true`, you should also set the other SGX-related options (`--sgx-log-level`, `--sgx-driver-memory`, `--sgx-driver-jvm-memory`, `--sgx-executor-memory`, `--sgx-executor-jvm-memory`), otherwise the PPML CLI will throw an error. |
| `--sgx-driver-jvm-memory` | Set the SGX driver JVM memory. The recommended setting is less than half of the driver SGX EPC memory, in the same format as JVM memory strings with a size-unit suffix (`k`, `m`, `g`, or `t`), e.g. `512m`, `2g`. |
| `--sgx-executor-jvm-memory` | Set the SGX executor JVM memory. The recommended setting is less than half of the executor EPC memory, in the same format as JVM memory strings with a size-unit suffix (`k`, `m`, `g`, or `t`), e.g. `512m`, `2g`. |
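As a rough illustration of the sizing rule above, the `12g` used in the usage examples can be derived from a 32 GB enclave. Note this is only a sketch: the 4 GB of headroom is an assumption chosen to stay strictly below the half-way mark, not a documented constant.

```shell
#!/bin/bash
# Sketch of the sizing rule: SGX JVM memory should stay below half of
# the enclave EPC memory. The 4 GB headroom here is an assumption.
epc_mem_gb=32                          # executor SGX EPC memory (example value)
jvm_mem_gb=$(( epc_mem_gb / 2 - 4 ))   # strictly below the half-way mark
echo "--sgx-executor-jvm-memory ${jvm_mem_gb}g"
# prints: --sgx-executor-jvm-memory 12g
```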
Except for the above SGX options, all other options are exactly the same as standard Spark properties:

| Option | Description |
| --- | --- |
| `--master` | The master URL for the cluster (e.g. `spark://23.195.26.187:7077`). |
| `--deploy-mode` | Whether to deploy your driver on the worker nodes (`cluster`) or locally as an external client (`client`). Default: `client`. |
| `--driver-memory` | Amount of memory to use for the driver process, in the same format as JVM memory strings with a size-unit suffix (`k`, `m`, `g`, or `t`), e.g. `512m`, `2g`. |
| `--driver-cores` | Number of cores to use for the driver process, only in cluster mode. |
| `--executor-memory` | Amount of memory to use per executor process, in the same format as JVM memory strings with a size-unit suffix (`k`, `m`, `g`, or `t`), e.g. `512m`, `2g`. |
| `--executor-cores` | The number of cores to use on each executor. |
| `--num-executors` | The initial number of executors. |
| `--name` | The Spark application name, used by default to name the Kubernetes resources created, like drivers and executors. |
| `--verbose` | Print out fine-grained debugging information. |
| `--class` | The entry point for your application (e.g. `org.apache.spark.examples.SparkPi`). |
| `application-jar` | Path to a bundled jar including your application and all dependencies. |
| `application-arguments` | Arguments passed to the main method of your main class, if any. |
### Usage Examples
#### Submit Spark-Pi job (Spark native mode)

```bash
#!/bin/bash
bash bigdl-ppml-submit.sh \
  --sgx-enabled false \
  --master local[2] \
  --driver-memory 32g \
  --driver-cores 8 \
  --executor-memory 32g \
  --executor-cores 8 \
  --num-executors 2 \
  --class org.apache.spark.examples.SparkPi \
  --name spark-pi \
  --verbose \
  local:///ppml/trusted-big-data-ml/work/spark-3.1.2/examples/jars/spark-examples_2.12-3.1.2.jar 3000
```
#### Submit Spark-Pi job (Spark native mode, SGX enabled)

```bash
#!/bin/bash
bash bigdl-ppml-submit.sh \
  --master local[2] \
  --sgx-enabled true \
  --sgx-driver-jvm-memory 12g \
  --sgx-executor-jvm-memory 12g \
  --driver-memory 32g \
  --driver-cores 8 \
  --executor-memory 32g \
  --executor-cores 8 \
  --num-executors 2 \
  --class org.apache.spark.examples.SparkPi \
  --name spark-pi \
  --verbose \
  local:///ppml/trusted-big-data-ml/work/spark-${SPARK_VERSION}/examples/jars/spark-examples_2.12-${SPARK_VERSION}.jar 3000
```
#### Submit Spark-Pi job (k8s client mode, SGX enabled)

```bash
#!/bin/bash
export secure_password=`openssl rsautl -inkey /ppml/trusted-big-data-ml/work/password/key.txt -decrypt </ppml/trusted-big-data-ml/work/password/output.bin`
bash bigdl-ppml-submit.sh \
  --master $RUNTIME_SPARK_MASTER \
  --deploy-mode client \
  --sgx-enabled true \
  --sgx-driver-jvm-memory 12g \
  --sgx-executor-jvm-memory 12g \
  --driver-memory 32g \
  --driver-cores 8 \
  --executor-memory 32g \
  --executor-cores 8 \
  --num-executors 2 \
  --conf spark.kubernetes.container.image=$RUNTIME_K8S_SPARK_IMAGE \
  --class org.apache.spark.examples.SparkPi \
  --name spark-pi \
  --verbose \
  local:///ppml/trusted-big-data-ml/work/spark-${SPARK_VERSION}/examples/jars/spark-examples_2.12-${SPARK_VERSION}.jar 3000
```
If you want to enable the Spark security configurations, as described in Spark security configurations, export `secure_password` before invoking the PPML CLI:

```bash
export secure_password=`openssl rsautl -inkey /ppml/trusted-big-data-ml/work/password/key.txt -decrypt </ppml/trusted-big-data-ml/work/password/output.bin`
```
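For context, an encrypted password file like `output.bin` can be produced with the matching `openssl rsautl -encrypt` call. The following is only a sketch: the key file, the file names, and the password value here are local placeholders, not the paths from the deployment above.

```shell
#!/bin/bash
# Sketch: producing an encrypted password file that the PPML examples
# can decrypt. key.txt, output.bin, and the password are placeholders.
openssl genrsa -out key.txt 2048 2>/dev/null   # stand-in RSA private key
printf 'my-secure-password' | \
  openssl rsautl -encrypt -inkey key.txt -out output.bin 2>/dev/null
# Round-trip: decrypt exactly as the PPML CLI examples do.
secure_password=$(openssl rsautl -inkey key.txt -decrypt < output.bin 2>/dev/null)
echo "$secure_password"
```

Note that recent OpenSSL releases deprecate `rsautl` in favor of `pkeyutl`; the command still works but prints a deprecation warning on stderr.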
#### Submit Spark-Pi job (k8s cluster mode, SGX enabled)

```bash
#!/bin/bash
export secure_password=`openssl rsautl -inkey /ppml/trusted-big-data-ml/work/password/key.txt -decrypt </ppml/trusted-big-data-ml/work/password/output.bin`
bash bigdl-ppml-submit.sh \
  --master $RUNTIME_SPARK_MASTER \
  --deploy-mode cluster \
  --sgx-enabled true \
  --sgx-driver-jvm-memory 12g \
  --sgx-executor-jvm-memory 12g \
  --driver-memory 32g \
  --driver-cores 8 \
  --executor-memory 32g \
  --executor-cores 8 \
  --conf spark.kubernetes.container.image=$RUNTIME_K8S_SPARK_IMAGE \
  --num-executors 2 \
  --class org.apache.spark.examples.SparkPi \
  --name spark-pi \
  --verbose \
  local:///ppml/trusted-big-data-ml/work/spark-${SPARK_VERSION}/examples/jars/spark-examples_2.12-${SPARK_VERSION}.jar 3000
```
If you want to enable the Spark security configurations, as described in Spark security configurations, export `secure_password` before invoking the PPML CLI:

```bash
export secure_password=`openssl rsautl -inkey /ppml/trusted-big-data-ml/work/password/key.txt -decrypt </ppml/trusted-big-data-ml/work/password/output.bin`
```