Skip to the content.

Automated deployment of CycleCloud and SLURM using CLI

GitHub Pages: Here

Azure CycleCloud allows the creation of resources to run High Performance Computing (HPC) applications based on widely used job schedulers such as PBS, SLURM, and LSF. Once CycleCloud is installed, it requires a few steps in the browser including setup of site name, accept the license agreement, setup the administrator account, and add subscription access permissions of the VM principal ID. After that, if one wants to create a SLURM cluster using existing built-in template, another set of steps is required; via either browser or command line.

What if a single script could handle all these steps in a single execution? Here we describe the key steps to achieve this automation and the script itself, which can be used as is—for quick provisioning of CycleCloud service—or serve as a building block to create more sophisticated clusters/automation. The script relies only on Command Line Interface (CLI) from both Azure and CycleCloud.

TL;TR

Here is the git repository that contains the script:

Example of execution from a user laptop:

Assumptions

1. How to run the script

Before running the automation script we need to setup a few environment variables; really a few ;-).

Modify setvars.sh to customize deployment variables, which are related to names of resource group, storage account, keyvault, among others. Type the following command to setup the variables before deployment.

source setvars.sh

In the automation process we also need two variables: CCPASSWORD and CCPUBKEY, which are not in setvars.sh. They will be automatically stored in keyvault and collected by cloud-init when provisioning the cyclecloud VM.

During the execution of the automation script, you will be asked about these two variables if they are not set. For CCPASSWORD you will be asked for two passwords that match and for CCPUBKEY, it will try to get the key from $HOME/.ssh/id_rsa.pub or you can provide an alternative file.

If you don’t want any interaction when executing the automation you can simply run:

export CCPASSWORD=<mygreatpassword>
export CCPUBKEY=$(cat ~/.ssh/id_rsa.pub)

You have two options to run the script:

Option 1: Provision and setup only CycleCloud without any cluster:

./cyclecloud_cli.sh

Option 2: Provision and setup of CycleCloud with a SLURM cluster ready for job submission (just pass a clustername):

./cyclecloud_cli.sh <clustername>

Once it is done you will receive three access points: (i) CycleCloud VM URL to access it via browser (ii) CycleCloud VM IP to access it via SSH; and (iii) cluster scheduler IP to access it via SSH for job submission in case you specified the clustername parameter.

No VPN setup

If you don’t have VPN setup, no problem, just make sure you comment these lines in setvars.sh:

# export VPNRG=myvpnrg
# export VPNVNET=myvpnvnet

Resources will be provisioned without VPN, but you won’t be able to check when the cyclecloud VM is provisioned nor the cluster is ready.

By doing so your output would be something like

Create resource group: mydemo                                         [done]
Create VNET/VSUBNET: mydemoVNET/mydemoSUBNET                          [done]
VPNRG/VPNVNET required for testing cyclecloud access                  [warning]
Create storage account: mydemosa                                      [done]
Create keyvault: mydemokv                                             [done]
Set keyvault secrets                                                  [done]
Start CycleCloud VM provisioning request                              [done]
Add VM principal ID access to subscription                            [done]
Add VM principal ID permission to keyvault: mydemokv                  [done]
CycleCloud access when cloud-init is done                             [info]
CycleCloud via SSH: ssh [-i <privsshkey>] azureuser@10.49.0.4         [info]
CycleCloud via WEB: http://10.49.0.4:8080                             [info]
Cannot test cyclecloud/cluster access as no VPN peer was established  [info]

2. Behind the scenes: provisioning and setup of CycleCloud

The script starts by making sure the required environment variables are setup, including the CycleCloud admin password and ssh public key. Next performs deployment of basic building blocks, including creation of resource group, provisioning of VNET and SUBNET, peering of VNET between the new VNET and your VPN VNET, provisioning of storage account for CycleCloud, and the keyvault.

Then it provisions the CycleCloud VM with the following Azure CLI command:

az vm create -n $VMNAME \
    -g $RG \
    --image $VMIMAGE \
    --size $SKU \
    --vnet-name $VMVNETNAME \
    --subnet $VMSUBNETNAME \
    --public-ip-address "" \
    --admin-username $ADMINUSER \
    --assign-identity \
    --generate-ssh-keys \
    --custom-data $CLOUDINITFILE

So the key component here is the --custom-data argument, which defines the cloud-init file. Cloud-init is a utility for automating the initialization of cloud instances at VM boot time. The automation script generates the cloud-init transparently. Here are few highlights of this auto-generated file. Its full content can be seen in the automation script.

One part of this cloud-init file contains a setup of commands to be executed:

runcmd:
    # Install CycleCloud
    - apt-get -y install gnupg2
    - wget -qO - https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
    - echo 'deb https://packages.microsoft.com/repos/cyclecloud bionic main' > /etc/apt/sources.list.d/cyclecloud.list
    - apt-get update
    - apt-get install -yq cyclecloud8=8.4.0-3122
    - /opt/cycle_server/cycle_server await_startup

    # Collect and process admin password and ssh public key
    - bash /tmp/azcliinstaller.sh
    - az login --identity --allow-no-subscriptions
    - CCPASSWORD=$(az keyvault secret show --name ccpassword --vault-name $KEYVAULT --query 'value' -o tsv)
    - CCPUBKEY=$(az keyvault secret show --name ccpubkey --vault-name $KEYVAULT --query 'value' -o tsv)
    - escaped_CCPASSWORD=\$(printf '%s\n' "$CCPASSWORD" | sed -e 's/[]\/\$*.^[]/\\\&/g')
    - escaped_CCPUBKEY=\$(printf '%s\n' "$CCPUBKEY" | sed -e 's/[]\/\$*.^[]/\\\&/g')
    - sed -i "s/CCPASSWORD/$escaped_CCPASSWORD/g" /tmp/${CYCLECLOUDACCOUNTFILE}
    - sed -i "s/CCPUBKEY/$escaped_CCPUBKEY/g" /tmp/${CYCLECLOUDACCOUNTFILE}

    # Setup CycleCloud
    - mv /tmp/$CYCLECLOUDACCOUNTFILE /opt/cycle_server/config/data/
    - apt-get install -yq unzip python3-venv
    - unzip /opt/cycle_server/tools/cyclecloud-cli.zip -d /tmp
    - python3 /tmp/cyclecloud-cli-installer/install.py -y --installdir /home/${cyclecloud_admin_name}/.cycle --system
    - runuser -l ${cyclecloud_admin_name} -c "/usr/local/bin/cyclecloud initialize --loglevel=debug --batch --url=http://localhost:8080 --verify-ssl=false --username=\"$cyclecloud_admin_name\" --password=\"$CCPASSWORD\""
    - mv /tmp/$AZURESUBSCRIPTIONFILE /opt/cycle_server/
    - runuser -l ${cyclecloud_admin_name} -c '/usr/local/bin/cyclecloud account create -f /opt/cycle_server/$AZURESUBSCRIPTIONFILE'
    - rm -f /opt/cycle_server/config/data/${CYCLECLOUDACCOUNTFILE}.imported

The script sets up the password and public key in the CycleCloud account json file, installs CycleCloud package, and waits until the CycleCloud server is up and running. After that it installs the CycleCloud CLI, and completes the CycleCloud account setup. Some of these steps require two files that are automatically created by cloud-init:

write_files:

 - path: /tmp/$CYCLECLOUDACCOUNTFILE
   content: |
     [
       {
         "AdType": "Application.Setting",
         "Name": "cycleserver.installation.initial_user",
         "Value": "${cyclecloud_admin_name}"
       },
       {
         "AdType": "AuthenticatedUser",
         "Name": "${cyclecloud_admin_name}",
         "RawPassword": "CCPASSWORD",  <<--- from Azure Keyvault at VM first boot time
         "Superuser": true
       },
       {
         "AdType": "Credential",
         "CredentialType": "PublicKey",
         "Name": "${cyclecloud_admin_name}/public",
         "PublicKey": "CCPUBKEY"       <<--- from Azure Keyvault at VM first boot time
       },
       {
         "AdType": "Application.Setting",
         "Name": "cycleserver.installation.complete",
         "Value": true
       }
     ]

 - path: /tmp/$AZURESUBSCRIPTIONFILE
   content: |
     {
       "Environment": "public",
       "AzureRMUseManagedIdentity": true,
       "AzureResourceGroup": "${cyclecloud_rg}",
       "AzureRMApplicationId": " ",
       "AzureRMApplicationSecret": " ",
       "AzureRMSubscriptionId": "${azure_subscription_id}",
       "AzureRMTenantId": " ${azure_tenant_id}",
       "DefaultAccount": true,
       "Location": "${cyclecloud_location}",
       "Name": "${cyclecloud_subscription_name}",
       "Provider": "azure",
       "ProviderId": "${azure_subscription_id}",
       "RMStorageAccount": "${cyclecloud_storage_account}",
       "RMStorageContainer": "${cyclecloud_storage_container}",
       "AcceptMarketplaceTerms": true
     }

3. Behind the scenes: provisioning and setup of SLURM cluster

If one wants the SLURM cluster as well, then an additional line is appended to the runcmd cloud-init part: - bash $CREATECLUSTERFILE—here $CREATECLUSTERFILE is an auto-generated script to handle the SLURM provisioning:

#!/bin/bash

echo "Setting up the slurm cluster!"
SLURMTEMPLATE=$(runuser -l $ADMINUSER -c 'cyclecloud show_cluster  -t' | grep  'slurm.*template' | awk '{print $1}' )
echo "SLURMTEMPLATE=$SLURMTEMPLATE"

runuser -l $ADMINUSER -c 'cyclecloud show_cluster  -t' | grep  'slurm.*template'  | awk '{print $1}'
SLURMTEMPLATE=$(runuser -l $ADMINUSER -c 'cyclecloud show_cluster  -t' | grep  "slurm.*template" | cut -d':' -f1)
runuser -l $ADMINUSER -c "cyclecloud create_cluster $SLURMTEMPLATE $CLUSTERNAME -p /tmp/$CLUSTERPARAMETERFILE"
runuser -l $ADMINUSER -c "cyclecloud start_cluster $CLUSTERNAME"

echo "Waiting for scheduler to be up-and-running..."
...
get_state(){ runuser -l $ADMINUSER -c "cyclecloud show_nodes scheduler -c $CLUSTERNAME --states='Started' --output='%(Status)s'" ; }

 for (( r=1; r<=max_retries; r++ )); do
    schedulerstate=$(get_state)
    ...
    # test states until scheduler is Ready

    if [ "$schedulerstate" == "Ready" ]; then
           echo "Scheduler provisioned"
           break
    fi
done

The $CLUSTERPARAMETERFILE file is also auto-generated by the cloud-init process:

- path: /tmp/$CLUSTERPARAMETERFILE
  content: |
    {
        "Credentials": "${cyclecloud_subscription_name}",
        "SubnetId": "${RG}/${VMVNETNAME}/${VMSUBNETNAME}",
        "ReturnProxy": false,
        "UsePublicNetwork": false,
        "ExecuteNodesPublic": false,
        "Region": "${REGION}",
        "AdditionalNFSExportPath": null,
        "AdditionalNFSMountPoint": null,
        "DynamicSpotMaxPrice": null,
        "HTCImageName" : "$CLUSTERIMAGE",
        "HPCImageName" : "$CLUSTERIMAGE",
        "SchedulerImageName" : "$CLUSTERIMAGE",
        "DynamicImageName" : "$CLUSTERIMAGE"
    }

4. Final remarks

One may want to quickly provision CycleCloud (with SLURM cluster) for testing, but does not want rely on a browser to go through some steps. Alternatives with terraform or bicep are also available (see References). With CLI, one can see the step-by-step procedures to have such a deployment which come in handy under certain circumstances.

The script is publicly available, and further improvements are expected. One can modify/expand it to handle different use cases. This initial version focuses on having a simple deployment with an interesting user experience.


References

  1. azure cyclecloud: https://learn.microsoft.com/en-us/azure/cyclecloud/overview
  2. azure cyclecloud marketplace install: https://learn.microsoft.com/en-us/azure/cyclecloud/qs-install-marketplace?view=cyclecloud-8
  3. azure cyclecloud manual install: https://learn.microsoft.com/en-us/azure/cyclecloud/how-to/install-manual?view=cyclecloud-8
  4. managed identities overview: https://learn.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/overview
  5. managed identity in cyclecloud: https://learn.microsoft.com/en-us/azure/cyclecloud/how-to/managed-identities?view=cyclecloud-8
  6. cyclecloud terraform automation: https://github.com/yosoyjay/cyclecloud-llm/tree/main/cyclecloud
  7. cyclecloud bicep automation: https://techcommunity.microsoft.com/t5/azure-high-performance-computing/automate-the-deployment-of-your-cyclecloud-server-with-bicep/ba-p/3668769
  8. cyclecloud bicep automation: https://github.com/edwardsp/cyclecloud-bicep/tree/main
  9. cyclecloud azhop automation: https://github.com/Azure/az-hop/blob/main/playbooks/roles/cyclecloud/files/configure.py
  10. lockdown network: https://learn.microsoft.com/en-us/azure/cyclecloud/how-to/running-in-locked-down-network?view=cyclecloud-8
  11. cyclecloud cluster templates: https://learn.microsoft.com/en-us/training/modules/customize-clusters-azure-cyclecloud/2-describe-templates
  12. cyclecloud projects: https://learn.microsoft.com/en-us/training/modules/customize-clusters-azure-cyclecloud/5-customize-software-installations