Data Product Analytics - Azure DevOps Deployment

April 1, 2022

In the previous step, you generated a JSON output similar to the following. It is required in the next steps:

{
  "clientId": "<GUID>",
  "clientSecret": "<GUID>",
  "subscriptionId": "<GUID>",
  "tenantId": "<GUID>",
  (...)
}

Create Service Connection

First, you need to create an Azure Resource Manager service connection. To do so, execute the following steps:

  1. Create an Azure DevOps project. Instructions can be found here.

  2. In Azure DevOps, open the Project settings.

  3. Now, select the Service connections page from the project settings page.

  4. Choose New service connection and select Azure Resource Manager.

    ARM Connection

  5. On the next page select Service principal (manual).

  6. Select the appropriate environment to which you would like to deploy the templates. Only the default option Azure Cloud is currently supported.

  7. For the Scope Level, select Subscription and enter your subscription ID and name.

  8. Enter the details of the service principal that was generated in step 3 (Service Principal ID = clientId, Service Principal Key = clientSecret, Tenant ID = tenantId), then click on Verify to make sure that the connection works.

  9. Enter a user-friendly Connection name to use when referring to this service connection. Take note of the name because this will be required in the parameter update process.

  10. Optionally, enter a Description.

  11. Click on Verify and save.

    Connection DevOps

More information can be found here.
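
The field mapping from step 8 can be sketched in code. The following Python snippet is only an illustration and not part of the repository: the function name service_connection_fields, the FIELD_MAP table, and the placeholder GUIDs in SAMPLE are assumptions made for this example. It reads JSON shaped like the service principal output shown earlier and returns the values keyed by the labels used in the service connection form.

```python
import json

# Hypothetical sample that mirrors the JSON shape shown at the top of this page.
# All values are placeholders, not real credentials.
SAMPLE = """
{
  "clientId": "00000000-0000-0000-0000-000000000001",
  "clientSecret": "00000000-0000-0000-0000-000000000002",
  "subscriptionId": "00000000-0000-0000-0000-000000000003",
  "tenantId": "00000000-0000-0000-0000-000000000004"
}
"""

# Form label in the service connection dialog -> key in the JSON output.
FIELD_MAP = {
    "Subscription Id": "subscriptionId",
    "Service Principal ID": "clientId",
    "Service Principal Key": "clientSecret",
    "Tenant ID": "tenantId",
}


def service_connection_fields(raw_json: str) -> dict:
    """Return the form values keyed by their label in the dialog."""
    creds = json.loads(raw_json)
    missing = [key for key in FIELD_MAP.values() if key not in creds]
    if missing:
        raise KeyError("missing keys: " + ", ".join(missing))
    return {label: creds[key] for label, key in FIELD_MAP.items()}


if __name__ == "__main__":
    fields = service_connection_fields(SAMPLE)
    for label in fields:
        # Print only the labels so that the secret never ends up in a log.
        print(f"{label}: <value loaded>")
```

Treat the Service Principal Key like a password: paste it into the dialog directly and avoid writing it to files or logs.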

Update Parameters

To deploy the Infrastructure as Code (IaC) templates to the desired Azure subscription, you need to modify some parameters in the forked repository. This step must not be skipped, regardless of whether you deploy through Azure DevOps or GitHub. There are two files that require updates:

  • .ado/workflows/dataProductDeployment.yml and
  • infra/params.dev.json.

Update these files in a separate branch and then merge via Pull Request to trigger the initial deployment.

Configure dataProductDeployment.yml

To begin, open .ado/workflows/dataProductDeployment.yml. In this file, you need to update the variables section shown below:

variables:
  AZURE_RESOURCE_MANAGER_CONNECTION_NAME: "integration-product-service-connection" # Update to '{resourceManagerConnectionName}'
  AZURE_SUBSCRIPTION_ID: "2150d511-458f-43b9-8691-6819ba2e6c7b"                    # Update to '{dataLandingZoneSubscriptionId}'
  AZURE_RESOURCE_GROUP_NAME: "dlz01-dev-dp001"                                     # Update to '{dataLandingZoneName}-rg'
  AZURE_LOCATION: "North Europe"                                                   # Update to '{regionName}'

The following table explains each of the parameters:

| Parameter | Description | Sample value |
| --- | --- | --- |
| AZURE_SUBSCRIPTION_ID | Specifies the subscription ID of the Data Landing Zone where all the resources will be deployed. | xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx |
| AZURE_LOCATION | Specifies the region where you want the resources to be deployed. Please check Supported Regions. | northeurope |
| AZURE_RESOURCE_GROUP_NAME | Specifies the name of an existing resource group in your data landing zone, where the resources will be deployed. | my-rg-name |
| AZURE_RESOURCE_MANAGER_CONNECTION_NAME | Specifies the resource manager connection name in Azure DevOps. You can leave the default value if you want to use GitHub Actions for your deployment. More details on how to create the resource manager connection in Azure DevOps can be found further above or here. | my-connection-name |
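
Before committing the file, it can help to check the four variables for leftover placeholders. The sketch below is a hedged illustration and not part of the repository: check_variables and REQUIRED are names invented for this example, and the function expects the variables as a plain Python dictionary rather than parsing the YAML file itself. It reports empty values, values that still contain curly-brace placeholders, and a subscription ID that does not parse as a GUID.

```python
import uuid

# The four variables from the variables section of the pipeline definition.
REQUIRED = (
    "AZURE_RESOURCE_MANAGER_CONNECTION_NAME",
    "AZURE_SUBSCRIPTION_ID",
    "AZURE_RESOURCE_GROUP_NAME",
    "AZURE_LOCATION",
)


def check_variables(variables: dict) -> list:
    """Return a list of human-readable problems; an empty list means no findings."""
    problems = []
    for name in REQUIRED:
        value = str(variables.get(name, "")).strip()
        if not value:
            problems.append(f"{name} is missing or empty")
        elif "{" in value or "}" in value:
            problems.append(f"{name} still contains a placeholder: {value}")

    subscription = str(variables.get("AZURE_SUBSCRIPTION_ID", "")).strip()
    if subscription and "{" not in subscription:
        try:
            uuid.UUID(subscription)  # raises ValueError for malformed GUIDs
        except ValueError:
            problems.append("AZURE_SUBSCRIPTION_ID is not a valid GUID")
    return problems


if __name__ == "__main__":
    example = {
        "AZURE_RESOURCE_MANAGER_CONNECTION_NAME": "my-connection-name",
        "AZURE_SUBSCRIPTION_ID": "{dataLandingZoneSubscriptionId}",
        "AZURE_RESOURCE_GROUP_NAME": "my-rg-name",
        "AZURE_LOCATION": "northeurope",
    }
    for problem in check_variables(example):
        print(problem)
```

The check is purely syntactic. It cannot tell whether the subscription, resource group, or service connection actually exists.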

Configure params.dev.json

To begin, open infra/params.dev.json. In this file, you need to update the parameter values. An explanation of the values is given in the table below:

| Parameter | Description | Sample value |
| --- | --- | --- |
| location | Specifies the location for all resources. | northeurope |
| environment | Specifies the environment of the deployment. | dev, tst or prd |
| prefix | Specifies the prefix for all resources created in this deployment. | prefi |
| tags | Specifies the tags that you want to apply to all resources. | {key: value} |
| processingService | Specifies the data engineering service that will be deployed (Data Factory, Synapse). | dataFactory or synapse |
| datalakeFileSystemIds | Specifies the list of Resource IDs of Data Lake Gen2 containers which will be connected as datastores in the Machine Learning workspace. If you do not want to connect any datastores, provide an empty list. | [/subscriptions/{subscription-id}/resourceGroups/{rg-name}/providers/Microsoft.Storage/storageAccounts/{storage-name}/blobServices/default/containers/{container-name}] |
| aksId | Specifies the resource ID of an Azure Kubernetes Service (AKS) cluster. | /subscriptions/{subscription-id}/resourceGroups/{rg-name}/providers/Microsoft.ContainerService/managedClusters/{aks-name} |
| externalContainerRegistryId | Specifies the resource ID of a Container Registry to which the Machine Learning MSI can be assigned. If you do not want to connect an external Container Registry, leave this value empty as is. Also set enableRoleAssignments to true to enable this. | /subscriptions/{subscription-id}/resourceGroups/{rg-name}/providers/Microsoft.ContainerRegistry/registries/{containerRegistry-name} |
| machineLearningComputeInstance001AdministratorObjectId | Specifies the object ID of the user who gets assigned to compute instance 001 in the Machine Learning Workspace. | my-aad-user-object-id |
| machineLearningComputeInstance001AdministratorPublicSshKey | Specifies the public SSH key for compute instance 001 in the Machine Learning Workspace. Use a secret for this parameter and overwrite as part of the deployment pipelines. | <your-public-ssh-key> |
| administratorPassword | Specifies the administrator password of the SQL servers. Will be automatically set in the workflow. Leave this value as is. | <your-secure-password> |
| synapseDefaultStorageAccountFileSystemId | Specifies the Resource ID of the default Storage Account file system for Synapse. | /subscriptions/{subscription-id}/resourceGroups/{rg-name}/providers/Microsoft.Storage/storageAccounts/{storage-name}/blobServices/default/containers/{container-name} |
| purviewId | Specifies the Resource ID of the central Purview instance. | /subscriptions/{subscription-id}/resourceGroups/{rg-name}/providers/Microsoft.Purview/accounts/{purview-name} |
| purviewManagedStorageId | Specifies the Resource ID of the managed storage of the central Purview instance. | /subscriptions/{subscription-id}/resourceGroups/{rg-name}/providers/Microsoft.Storage/storageAccounts/{storage-account-name} |
| purviewManagedEventHubId | Specifies the Resource ID of the managed event hub of the central Purview instance. | /subscriptions/{subscription-id}/resourceGroups/{rg-name}/providers/Microsoft.EventHub/namespaces/{eventhub-namespace-name} |
| databricksWorkspaceId | Specifies the Resource ID of the Databricks workspace that will be connected to the Machine Learning Workspace. | /subscriptions/{subscription-id}/resourceGroups/{rg-name}/providers/Microsoft.Databricks/workspaces/{databricks-name} |
| databricksWorkspaceUrl | Specifies the workspace URL of the Databricks workspace that will be connected to the Machine Learning Workspace. | adb-{databricks-workspace-id}.azuredatabricks.net |
| databricksAccessToken | Specifies the access token of the Databricks workspace that will be connected to the Machine Learning Workspace. Use a secret for this parameter and overwrite as part of the deployment pipelines. | <your-databricks-access-token> |
| enableRoleAssignments | Specifies whether role assignments should be enabled. | true or false |
| cognitiveServiceKinds | Specifies the cognitive service kinds that will be deployed. | [FormRecognizer, LUIS] |
| enableSearch | Specifies whether Azure Search should be deployed as part of the template. | true or false |
| enableMonitoring | Specifies whether key monitoring components like Azure Dashboard, metrics and alerts are enabled. | true or false |
| dataProductTeamEmail | Specifies the email address of the group that receives monitoring alerts. | email@domain.com |
| subnetId | Specifies the resource ID of the subnet to which all services will connect. | /subscriptions/{subscription-id}/resourceGroups/{rg-name}/providers/Microsoft.Network/virtualNetworks/{vnet-name}/subnets/{subnet-name} |
| privateDnsZoneIdKeyVault | Specifies the Resource ID of the private DNS zone for KeyVault. | /subscriptions/{subscription-id}/resourceGroups/{rg-name}/providers/Microsoft.Network/privateDnsZones/privatelink.vaultcore.azure.net |
| privateDnsZoneIdSynapseDev | Specifies the Resource ID of the private DNS zone for Synapse Dev. | /subscriptions/{subscription-id}/resourceGroups/{rg-name}/providers/Microsoft.Network/privateDnsZones/privatelink.dev.azuresynapse.net |
| privateDnsZoneIdSynapseSql | Specifies the Resource ID of the private DNS zone for Synapse SQL. | /subscriptions/{subscription-id}/resourceGroups/{rg-name}/providers/Microsoft.Network/privateDnsZones/privatelink.sql.azuresynapse.net |
| privateDnsZoneIdDataFactory | Specifies the Resource ID of the private DNS zone for Data Factory. | /subscriptions/{subscription-id}/resourceGroups/{rg-name}/providers/Microsoft.Network/privateDnsZones/privatelink.datafactory.azure.net |
| privateDnsZoneIdDataFactoryPortal | Specifies the Resource ID of the private DNS zone for Data Factory Portal. | /subscriptions/{subscription-id}/resourceGroups/{rg-name}/providers/Microsoft.Network/privateDnsZones/privatelink.adf.azure.com |
| privateDnsZoneIdCognitiveService | Specifies the Resource ID of the private DNS zone for Cognitive Services. | /subscriptions/{subscription-id}/resourceGroups/{rg-name}/providers/Microsoft.Network/privateDnsZones/privatelink.cognitiveservices.azure.com |
| privateDnsZoneIdContainerRegistry | Specifies the Resource ID of the private DNS zone for Container Registry. | /subscriptions/{subscription-id}/resourceGroups/{rg-name}/providers/Microsoft.Network/privateDnsZones/privatelink.azurecr.io |
| privateDnsZoneIdSearch | Specifies the Resource ID of the private DNS zone for Azure Search. | /subscriptions/{subscription-id}/resourceGroups/{rg-name}/providers/Microsoft.Network/privateDnsZones/privatelink.search.windows.net |
| privateDnsZoneIdBlob | Specifies the Resource ID of the private DNS zone for Blob Storage. | /subscriptions/{subscription-id}/resourceGroups/{rg-name}/providers/Microsoft.Network/privateDnsZones/privatelink.blob.core.windows.net |
| privateDnsZoneIdFile | Specifies the Resource ID of the private DNS zone for File Storage. | /subscriptions/{subscription-id}/resourceGroups/{rg-name}/providers/Microsoft.Network/privateDnsZones/privatelink.file.core.windows.net |
| privateDnsZoneIdMachineLearningApi | Specifies the Resource ID of the private DNS zone for Machine Learning API. | /subscriptions/{subscription-id}/resourceGroups/{rg-name}/providers/Microsoft.Network/privateDnsZones/privatelink.api.azureml.ms |
| privateDnsZoneIdMachineLearningNotebooks | Specifies the Resource ID of the private DNS zone for Machine Learning Notebooks. | /subscriptions/{subscription-id}/resourceGroups/{rg-name}/providers/Microsoft.Network/privateDnsZones/privatelink.notebooks.azure.net |
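
Many of the parameters above are Azure resource IDs that share one shape: /subscriptions/{subscription-id}/resourceGroups/{rg-name}/providers/{namespace}/ followed by alternating type and name segments. The following Python sketch splits such an ID into its parts so that a mistyped value is caught before the deployment runs. It is an illustration only: parse_resource_id and the RESOURCE_ID pattern are names invented for this example, and the check covers only the structure of the string, not whether the resource exists.

```python
import re

# Structural pattern for a resource-group-scoped Azure resource ID.
# Matching is case-insensitive on the fixed keywords.
RESOURCE_ID = re.compile(
    r"^/subscriptions/(?P<subscription>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/(?P<namespace>[^/]+)"
    r"/(?P<path>.+)$",
    re.IGNORECASE,
)


def parse_resource_id(value: str) -> dict:
    """Split a resource ID into subscription, resource group, namespace, types and names."""
    match = RESOURCE_ID.match(value)
    if match is None:
        raise ValueError(f"not a resource ID: {value!r}")
    segments = match["path"].split("/")
    # After the namespace, segments must alternate between type and name.
    if len(segments) % 2 != 0 or "" in segments:
        raise ValueError(f"unbalanced type/name segments: {value!r}")
    return {
        "subscription": match["subscription"],
        "resource_group": match["resource_group"],
        "namespace": match["namespace"],
        "types": segments[0::2],
        "names": segments[1::2],
    }


if __name__ == "__main__":
    # Placeholder subnet ID following the sample value for subnetId.
    subnet = (
        "/subscriptions/00000000-0000-0000-0000-000000000000"
        "/resourceGroups/my-rg/providers/Microsoft.Network"
        "/virtualNetworks/my-vnet/subnets/my-subnet"
    )
    print(parse_resource_id(subnet))
```

Because datalakeFileSystemIds is a list, each entry would be checked separately. Values such as databricksWorkspaceUrl or dataProductTeamEmail are not resource IDs and would be rejected by this function.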

Install Azure DevOps Pipelines GitHub Application

First you need to add and install the Azure Pipelines GitHub App to your GitHub account. To do so, execute the following steps:

  1. Click on Marketplace in the top navigation bar on GitHub.

  2. In the Marketplace, search for Azure Pipelines. The Azure Pipelines offering is free for anyone to use for public repositories and free for a single build queue if you're using a private repository.

    Install Azure Pipelines on GitHub

  3. Select it and click on Install it for free.

    GitHub Template repository

  4. If you are part of multiple GitHub organizations, you may need to use the Switch billing account dropdown to select the one into which you forked this repository.

  5. You may be prompted to confirm your GitHub password to continue.

  6. You may be prompted to log in to your Microsoft account. Make sure you log in with the one that is associated with your Azure DevOps account.

Configuring the Azure Pipelines project

As a last step, you need to create an Azure DevOps pipeline in your project based on the pipeline definition YAML file that is stored in your GitHub repository. To do so, execute the following steps:

  1. Select the Azure DevOps project where you have set up your Resource Manager Connection.

  2. Select Pipelines and then New Pipeline in order to create a new pipeline.

    Create Pipeline in DevOps

  3. Choose GitHub YAML and search for your repository (e.g. "GitHubUserName/RepositoryName").

    Choose code source in DevOps Pipeline

  4. Select your repository.

  5. Click on Existing Azure Pipelines in YAML file.

  6. Select main as branch and /.ado/workflows/dataProductDeployment.yml as path.

    Configure Pipeline in DevOps

  7. Click on Continue and then on Run.

Merge these changes back to the main branch of your repository

After you have updated the parameters and variables in a separate branch and opened the pull request, merge it back into the main branch of your repository by clicking on Merge pull request. Merging triggers the deployment workflow. Afterwards, you can click on Delete branch to clean up your repository.

Follow the workflow deployment

Congratulations! You have successfully executed all steps to deploy the template into your environment through Azure DevOps.

Now, you can navigate to the pipeline that you have created as part of step 5 and monitor it as each service is deployed. If you run into any issues, please check the Known Issues first and open an issue if you come across a potential bug in the repository.
