Start Azure Pipeline from another pipeline with ADO CLI & PowerShell

Are you looking for the one simple command that can kick off a pipeline from another pipeline in Azure DevOps? You may have run into a lot of annoying restrictions using straight YAML, because so many things have to be known ahead of time (i.e., set using compile-time variables). This prevents you from doing things such as conditionally running pipelines, using arrays defined at runtime as parameters to ADO tasks and commands, or manipulating runtime data for use in subsequent commands. You can't even kick off a pipeline without the previous one finishing (and invoking a pipeline trigger). Follow along as I explore a scripted solution to this problem.


My Use Case


I want to provide fresh Databricks images on a private Azure Container Registry (ACR). This way, data scientists are not confounded by changes to Databricks runtimes when their clusters start and stop, since changes can prevent their package & library installation scripts from running successfully. The pipeline is configured to look for updates to the Databricks image in Docker Hub, and then will pull it in and run a script that installs the desired data science libraries into the image stored on our ACR. This helps the Databricks runtimes start faster because when launching the normal way (not using a Docker image), libraries are loaded at cluster start time, and some R packages can take several minutes or even hours to install. At this time, the pipeline is only kicked off for Databricks updates, not for dependency updates.

A flowchart of what the master ADO pipeline is doing.

To Script, or Not To Script?


Azure DevOps offers a rich set of configurations you can specify in YAML. In many cases, people can simply use YAML in their pipelines without any scripts because of the plethora of predefined "tasks" that templatize common activities you may wish to run in your pipeline. You can simply pass the desired configuration to your task, rather than having to specify a bunch of boilerplate code such as logins or connections to servers and services. Some of the coolest ones facilitate Kubernetes and Helm installs, and push Docker images to an ACR (the purpose of my secondary pipeline) using a service connection rather than explicit login credentials.
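For example, pushing an image to ACR with the built-in Docker task needs little more than the name of a service connection; no registry credentials appear in the pipeline at all. The connection and repository names below are placeholders:

```yaml
- task: Docker@2
  inputs:
    # ADO service connection to the container registry (no explicit login needed)
    containerRegistry: MyACRServiceConnection
    repository: databricks/standard
    command: buildAndPush
    Dockerfile: ./Dockerfile
    tags: latest
```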

However, there may still be instances where you need to automate something by running scripts in Bash or PowerShell. This is particularly useful for calling APIs and parsing output. Scripting is also helpful for performing actions on entities known only at runtime rather than compile time, such as iterating over an array of files in a directory or using a runtime variable in a conditional statement. Since these things are, to my knowledge, impossible to perform in YAML, I must turn to scripting for this important chunk of my automation.
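As a hypothetical illustration of the kind of runtime logic YAML alone cannot express, a script step can branch on values it discovers only while the job is running (the file filter and message here are made up for the sketch):

```powershell
# Runtime decision: YAML conditions are resolved at compile time,
# but a script can inspect the workspace while the job runs.
$projects = Get-ChildItem -Path .\ -Filter *.csproj -Recurse -File
if ($projects.Count -gt 0) {
  Write-Output "Found $($projects.Count) project files; running extra build steps"
} else {
  Write-Output "Nothing to build"
}
```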

Two Ways to Invoke Azure DevOps CLI


There are two Azure DevOps tasks available for automating Azure DevOps CLI commands within a pipeline: PowerShell@2 and AzureCLI@2. The former represents the standard method of engagement that one would get with the execution agent in a pipeline runner such as CircleCI or GitHub Actions. The latter offers a sliver of convenience: you also specify an Azure DevOps service connection that links to an Azure Active Directory service principal, allowing the Azure CLI to access permissioned resources without you having to dedicate script steps to logging in and then managing the permissions/credentials therein. In my case, no commands require permissioned resources, so I can get away with the simpler PowerShell@2 task.

No matter which task you select, remember that the only way to gain access to Azure DevOps is with a Personal Access Token (PAT). PATs have the disadvantage that they expire after a while (so you must remember to renew them), and they are tied to a specific user account rather than an Active Directory service principal that you could create using infrastructure as code. (That said, you may be able to create the notion of a service user, grant it ADO access, and make a PAT for it.) Nevertheless, the Azure DevOps CLI will pick up a PAT set in an environment variable called AZURE_DEVOPS_EXT_PAT so you do not have to run az login. You can conveniently specify the environment variable by adding it to the env parameter of your task (a sibling of the inputs parameter). The PAT itself can be stored in a variable group, found under the Library section of ADO Pipelines; in my case, $(adoPAT) is defined in a couple of variable groups named after the possible deployment environments.
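For reference, pulling a secret like $(adoPAT) out of a variable group is just a matter of naming the group in the pipeline's variables section; the group name below is a placeholder for whatever you created under Library:

```yaml
variables:
  # Variable group created under Pipelines > Library;
  # it contains adoPAT stored as a secret variable
  - group: my-dev-environment
```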

Here is a simple example illustrating both task types that will accomplish the same thing:

stages:
  - stage: myStage
    pool: windows
    jobs:
      - job: AzCLI
        steps:
          - task: AzureCLI@2
            inputs:
              azureSubscription: MyServiceConnection
              scriptType: ps
              inlineScript: |
                az config set extension.use_dynamic_install=yes_without_prompt
                az devops configure --defaults organization=https://dev.azure.com/Myorganization project=MyADOProject
                az pipelines runs list
            env:
              AZURE_DEVOPS_EXT_PAT: $(adoPAT)
      - job: Psh
        steps:
          - task: PowerShell@2
            inputs:
              targetType: inline
              script: |
                az config set extension.use_dynamic_install=yes_without_prompt
                az devops configure --defaults organization=https://dev.azure.com/Myorganization project=MyADOProject
                az pipelines runs list
            env:
              AZURE_DEVOPS_EXT_PAT: $(adoPAT)

These two jobs accomplish the same thing, so use whichever one gives you the functionality you need; remember, the difference is that AzureCLI@2 references an ADO service connection. Also, note that the Azure DevOps CLI comes in as an extension to the Azure CLI. As such, we need to configure the agent to install extensions without user intervention; after that, the first call to az devops automatically installs the extension, so we never have to install it explicitly. Simply configure it with your organization and project defaults, and you will be ready to run the advanced commands.

Starting Pipelines From Other Pipelines


There are YAML specifications that can trigger pipelines from other pipelines, but they come with a hefty set of limitations that make them too rigid for my use case. I want to have one pipeline that knows about all the Docker images that need to be rebuilt; not one pipeline for each image, as that requires much more boilerplate every time a new Docker image repository is initialized, plus a hefty maintenance cost and the likelihood of copy/paste errors. This is why it was so important to get the Azure DevOps CLI working, as we explored above. Now, let's dig into the meat of the script.

Get-ChildItem -Path .\ -Filter Dockerfile -Recurse -File -Name | ForEach-Object {
  # The first line of each Dockerfile is "FROM <image>:<tag>"
  $dockerbase = Get-Content -Path $_ -TotalCount 1
  # Turn the FROM line into a Docker Hub API URL (note the trailing slash on /repositories/)
  $dockerurl = $dockerbase.Replace(":", "/tags/").Replace("FROM ", "https://hub.docker.com/v2/repositories/")
  $request = Invoke-WebRequest -UseBasicParsing -URI $dockerurl
  $data = ConvertFrom-Json $request.Content
  $lastUpdated = $data | Select -ExpandProperty last_updated
  $span = New-TimeSpan -Start $lastUpdated
  $imagePath = $_.Replace("\Dockerfile", "")
  # If it's been 7 days or less since the base image was updated, rebuild our image
  if ($span.Days -le 7) {
    echo "Rebuilding image $imagePath"
    az pipelines run --id 1580 --branch refs/heads/master --parameters image="$imagePath" environment="dev"
  } else {
    echo "Image $imagePath does not need update"
  }
}

My real pipeline does not run az pipelines runs list as shown in the first example; instead, its script closely resembles this latest example. First, we need to discover every Dockerfile anywhere within our repository's directory structure and fetch its first line. This is where the PowerShell command Get-ChildItem with the -Filter and -Recurse options comes in particularly handy. Pipe the output into the next command, ForEach-Object, and then specify the commands to run on each object within curly braces. Since each object is the path to a Dockerfile, we can use Get-Content with -TotalCount 1 to return the first line of the Dockerfile, which will always consist of FROM <image-tag>:latest. We store this in the script variable $dockerbase.

Once we have the image tag, we need to manipulate it into a Docker Hub URL so that we can find the status of the tag; in particular, its last_updated information. String manipulation is performed in PowerShell with $string.Replace("subject", "replacement"). Each invocation returns a new string rather than modifying the original, so replacements can be chained, but the output must be stored in a variable to be used later. Now that we have the Docker Hub URL, let's call it with the Invoke-WebRequest command, which achieves the same goal as curl in Bash. If a browser has never been launched on the pipeline's execution agent, you will need to include -UseBasicParsing so the request is not blocked by Internet Explorer's first-launch configuration, which the full parsing mode depends on. Finally, to arrive at the last updated time of the Docker image, use ConvertFrom-Json on the request content to make $data, and then pipe $data through Select -ExpandProperty last_updated to obtain the last updated time.
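To make the transformation concrete, here is the replace chain applied to a hypothetical Databricks base image line (with a trailing slash on the repositories path so the URL comes out valid):

```powershell
$dockerbase = "FROM databricksruntime/standard:latest"
$dockerurl = $dockerbase.Replace(":", "/tags/").Replace("FROM ", "https://hub.docker.com/v2/repositories/")
# $dockerurl is now:
# https://hub.docker.com/v2/repositories/databricksruntime/standard/tags/latest
```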

Now that we have the last updated time of the image, we can use some date functions to compare it to the current date and see whether the image has been updated within the last 7 days. By making a New-TimeSpan called $span and specifying only -Start as $lastUpdated, the ending time defaults to now (the current time), and you can easily find the number of elapsed days with $span.Days. Then, it is easy to compare this value using comparison operators such as -le, -ge, or -eq in an if() statement.
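A minimal sketch of the date comparison, with a hard-coded timestamp standing in for the value parsed from Docker Hub:

```powershell
# Stand-in for the API's last_updated field; PowerShell coerces the string to a DateTime
$lastUpdated = "2021-06-01T12:00:00.000000Z"
$span = New-TimeSpan -Start $lastUpdated   # -End defaults to the current time
if ($span.Days -le 7) {
  Write-Output "Updated within the last week; rebuild"
} else {
  Write-Output "No rebuild needed"
}
```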

At last, if the comparison proves true, we can kick off the pipeline with the Azure DevOps CLI command az pipelines run. In this command, we can specify the pipeline by name or by --id (as shown), plus provide parameters just as if we were selecting them from the Azure DevOps Console when starting a pipeline run. This is how you can conditionally start numerous pipelines from another pipeline, without the original pipeline having to be finished to invoke a trigger into the next pipeline.

Other Concerns


Waiting on each pipeline in the loop to finish before kicking off subsequent ones could be something you would like to do in your execution environment. In my case, it is not something I wish to do. However, if I wanted to, it would be easy enough to use various Azure DevOps CLI az pipelines runs commands to identify the current run, grab its status, and hold the parent pipeline in a spin lock with a PowerShell loop and timer before querying for the status again and making the decision to proceed or keep waiting.
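I have not needed this myself, but such a polling loop might look roughly like the following. It assumes az pipelines run is called with --query id -o tsv to capture the new run's ID, that az pipelines runs show reports a status field that eventually becomes completed, and that the pipeline ID is a placeholder:

```powershell
# Kick off the child pipeline and capture its run ID (pipeline ID is hypothetical)
$runId = az pipelines run --id 1580 --query id -o tsv

# Poll every 30 seconds until the run reports completion
do {
  Start-Sleep -Seconds 30
  $status = az pipelines runs show --id $runId --query status -o tsv
  Write-Output "Run $runId status: $status"
} while ($status -ne "completed")
```

A production version would also want a timeout so a hung child run cannot hold the parent agent forever.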

It is also currently a bit out of my scope to look for updates in the other packages we are trying to install on top of the base Databricks image. For the image, we can fetch the information from Docker Hub easily. We could potentially use a similar process to identify all the packages in a Dockerfile and check any particular repository's API for "last updated" information. This way, we could also trigger the update if the Databricks base image or any supplementary packages have been updated in their respective repositories.

Illustration of checking the status of packages added to a Docker base image. Using API calls, we can determine if there has been an update to the base image or any packages within the desired time, and rerun the image build pipeline if needed.

If "last updated" information cannot be easily fetched from a package repository, then we would need to keep a record of the currently installed version and compare it with the version coming from the repository. One might think this requires spinning up a CosmosDB instance and writing tools to keep it up to date. But keep in mind this is Azure DevOps we're talking about! Why not simply keep the record within the Git repository itself, right next to the Dockerfile? For regulatory and auditing purposes, it might be useful to keep such a file alongside the Dockerfile anyway. You could continuously append to the file, writing the pipeline build number, the latest Git SHA, and the version numbers of all the packages. Or, you could simply overwrite the file and make a new commit to the repository each time, thus allowing you to use the Git history to examine the contents of past images in your ACR. (Of course, you should include the pipeline build number in the commit message.)
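A sketch of the overwrite-and-commit approach; the file name, directory, and package versions below are all placeholders:

```powershell
# Overwrite a manifest next to the Dockerfile recording what went into this build.
# BUILD_BUILDNUMBER is set automatically on Azure DevOps agents.
@"
build: $env:BUILD_BUILDNUMBER
databricks_base: databricksruntime/standard:latest
r_packages: tidyverse=1.3.1
"@ | Set-Content -Path .\myimage\versions.txt

# Commit so the Git history records the contents of each image;
# the build number in the message ties the commit back to the pipeline run
git add .\myimage\versions.txt
git commit -m "Image build $env:BUILD_BUILDNUMBER: update versions.txt"
```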

Illustration of using a file within a Git repository to keep track of package versions. By comparing the contents of this file with the latest versions stored in package repositories, the pipeline can decide whether or not to build the Docker image for ACR again. It is also helpful from an auditing perspective to know what the contents of images are, especially when these images are used for training ML models, so it can be known exactly what versions of dependencies were used.

Trials and Tribulations


This wasn't just a walk in the park. I went through many designs of YAML and scripting before I landed on this approach. My initial strategies were all hamstrung by limitations in YAML's ability to process runtime variables in the ways I expected. I even had a scare when I couldn't find how to pass parameters into the Azure DevOps CLI pipeline command. I had tried to develop a means of reconciling/coalescing values that had come in as parameters or variables, only to realize that the resulting runtime variable could not be used to name the variable group I wanted to use to differentiate between my environments. A helpful GitHub issue detailed the reconciliation process, and I even filed a documentation bug on Azure DevOps to call out a page that had omitted details on the --parameters argument, since this was paramount for my desired concept to work correctly. In hindsight, it's hilarious to see how Microsoft produced two completely different views of documentation for the same command, with different choices of context between them (Azure DevOps Services vs. Server or TFS), and this information failed to translate between them. Hopefully, when you are searching for the arguments to a command, you're led to a page with the complete set of arguments and not on a wild goose chase for workarounds like I was!

Further Reading




Introduction to the Azure DevOps CLI Extension: https://learn.microsoft.com/en-us/azure/devops/cli/?view=azure-devops

The correct documentation page that actually shows you how to pass parameters to a pipeline from the CLI: https://learn.microsoft.com/en-us/cli/azure/pipelines?view=azure-cli-latest#az-pipelines-run

Also look for hyperlinks embedded throughout the body of the post.
