I use Azure DevOps for work and we run our own privately hosted agents. These are deployed using ARM and configuration drift is kept to a minimum using Azure Desired State Configuration or DSC for short. The actual deploy is handled by an Azure DevOps YAML pipeline, which is triggered by a git commit. In terms of managing their availability we have an Azure automation runbook which starts and stops them based on a tag value (The runbook is amazing and when I remember where I got the code from I’ll post a link!).
It certainly sounds like I’m living the pain-free infrastructure as code dream. However, as you’ve probably guessed, it’s not as fantastic as it sounds. VMs are always a pain to look after over time. The amount of pain they cause decreases as you polish your automation, but it will still always be there.
Some of the issues I’ve encountered include:
I know, looking at the above list, there don’t seem to be many issues (I’ve had more, but they are environment-specific) and some could be fixed with a bit of effort, but where does that end?
I’d realised the 15 minutes here and there to fix minor bugs had become a substantial part of my week, especially if you include the time it takes for the deployment pipeline to run and then any DSC config wait time.
When setting up the self-hosted agents, I had investigated using containers instead. I liked the fact they were easy to spin up, which gave the possibility of tearing them down after every build. This would have solved at least two of the issues I highlighted above.
However, the orchestration of starting, stopping and scaling the containers using ACI and the agent registration in DevOps itself would have been substantial. It was time that I didn’t have as the build agents were a tool to be used by a project, not the project themselves.
The announcement of VM scale set agents going GA was the push I had needed. By using a VM scale set I could spin up instances reasonably quickly and set the instance to tear down once it had run a job. I could also use the DevOps agent pool settings to control the behaviour of the scale set.
Briefly scanning the documentation, it looked as if after a few Azure CLI commands I would have a working scale set configured perfectly for DevOps. However, it wasn’t until I saw the words “Create a scale set with custom image…” that I started to get a sinking feeling. As a rule, I avoid OS images, unless someone else produces them (too many years battling with Symantec Ghost!).
With the rise of Windows Virtual Desktop, I knew I would be involved with automating the creation of OS images, but I had hoped that I would play a minor part in any such automation and that maybe I could convince everybody to just layer DSC over an Azure Marketplace image.
Deep down, I guess I knew this wouldn’t work for build agents. You wouldn’t want to wait the 30 to 45 minutes our current new DSC deployments take. So I started to look at what technologies existed for creating custom images in Azure.
It turns out there are lots of ways to create an OS image in Azure and they are well documented in the Azure “How to” guides. The obvious one is to deploy a VM in Azure, sysprep it and then capture an image from that. I tried this approach when I was first developing my script to install our build software, but quickly abandoned it. It’s a reasonably old school way of doing things and has the normal old school problems.
I then looked at the new Azure Image Builder. I’m a sucker for Microsoft’s tools and this looked good. However, it’s Linux only at the moment. As much as I’d like to be young and hip and run my builds on Linux, it’s just not practical. For a start, I’m not a Linux admin, so troubleshooting would be a pain.
The next option in the Azure docs was creating an image with Packer. I’d heard of Packer before, not only because one of my friends is a massive fan of HashiCorp, but also because it seems to have come up in a lot of blogs I have read (I’d normally tune out at that point for reasons I’ve stated before). Reading through the Azure docs, it seemed pretty cool: the template is JSON, which I’m familiar with because of ARM, and I’m pretty sure I could fit this in a DevOps pipeline.
When I start using a new technology I like to try and find a blog on the subject by someone I know to be competent. I found an excellent blog by Sam Cogan. If you’re not familiar with Sam’s work, check out https://samcogan.com. I regularly use his articles to solve work issues (Get IDs of installed MSIs etc.). His post details how to create a pipeline to deploy your Packer template and how to structure the git repo. There is also a link to his GitHub repo so you can use his code, which is a massive help when you get started.
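To give a flavour of what such a template looks like, here is a minimal sketch rather than anything I have run in anger. It uses Python purely to build and print the JSON, so the structure can be checked before handing it to Packer. The field names are those of Packer's `azure-arm` builder in the legacy JSON format; the image name, resource group, region, VM size, Marketplace SKU and environment variable names are all placeholders I have assumed for illustration.

```python
import json


def build_packer_template(image_name: str, resource_group: str, location: str) -> dict:
    """Return a minimal Packer template (legacy JSON format) for a Windows managed image."""
    return {
        # Credentials are read from environment variables so secrets stay out of git.
        # The variable names here are assumptions, not a convention Packer enforces.
        "variables": {
            "client_id": "{{env `ARM_CLIENT_ID`}}",
            "client_secret": "{{env `ARM_CLIENT_SECRET`}}",
            "tenant_id": "{{env `ARM_TENANT_ID`}}",
            "subscription_id": "{{env `ARM_SUBSCRIPTION_ID`}}",
        },
        "builders": [
            {
                "type": "azure-arm",
                "client_id": "{{user `client_id`}}",
                "client_secret": "{{user `client_secret`}}",
                "tenant_id": "{{user `tenant_id`}}",
                "subscription_id": "{{user `subscription_id`}}",
                # Where the finished image is stored (placeholder values).
                "managed_image_resource_group_name": resource_group,
                "managed_image_name": image_name,
                # Marketplace image to start from (assumed SKU).
                "os_type": "Windows",
                "image_publisher": "MicrosoftWindowsServer",
                "image_offer": "WindowsServer",
                "image_sku": "2019-Datacenter",
                # Packer talks to Windows VMs over WinRM.
                "communicator": "winrm",
                "winrm_use_ssl": True,
                "winrm_insecure": True,
                "winrm_timeout": "5m",
                "winrm_username": "packer",
                "location": location,
                "vm_size": "Standard_D2_v2",
            }
        ],
        "provisioners": [
            # Placeholder step: the real build software install would go here.
            {"type": "powershell", "inline": ["Write-Host 'Install build tools here'"]},
        ],
    }


if __name__ == "__main__":
    template = build_packer_template("build-agent-image", "rg-build-images", "uksouth")
    print(json.dumps(template, indent=2))
```

Saving the printed output to a file gives you something you could pass to `packer build`. A real Windows template would also need a sysprep generalise step as its final provisioner, which I have left out of this sketch.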
I originally planned to cover creating the image and scale sets in one post with links to the relevant documentation, but this has turned into a bit of an essay, so I’m going to split this over a few. The next instalment will be creating the image and installing custom software and then a post on the creation of the scale set and integration with DevOps. Finally, a follow-up post with my experience of running the feature. If there is anything I miss in any of these posts or something you would like more detail on, let me know in the comments.