When you use Git as the source repository in an Azure pipeline, you may notice the checkout time creeping up as the repository ages. When you first start using the repo it might take 30 seconds; after a while it can take 45 seconds, a minute, or several minutes to check it out. Sometimes it's because the repo has grown quite large or someone has committed large binary files; other times the cause is less obvious.
I have been looking at improving the speed of working with Git repos in pipelines for a while, and I'd almost given up. That was until a colleague spotted that the number of Git tags a repository had seemed to directly affect how quickly it could be cloned.
Looking at the Microsoft documentation for the steps.checkout step, you'll see there are quite a few options:
steps:
- checkout: string # Required as first property. Alias of the repository resource to check out or 'none'.
  clean: string # If true, run git clean -ffdx followed by git reset --hard HEAD before fetching. (true, false)
  fetchDepth: string # Depth of Git graph to fetch.
  fetchTags: string # Set to 'true' to sync tags when fetching the repo, or 'false' to not sync tags. See remarks for the default behavior.
  lfs: string # Set to 'true' to download Git-LFS files. Default is not to download them.
  persistCredentials: string # Set to 'true' to leave the OAuth token in the Git config after the initial fetch. The default is not to leave it.
  submodules: true | recursive # Set to 'true' for a single level of submodules or 'recursive' to get submodules of submodules. Default is not to fetch submodules.
  path: string # Where to put the repository. The root directory is $(Pipeline.Workspace).
  condition: string # Evaluate this condition expression to determine whether to run this task.
  continueOnError: boolean # Continue running even on failure? (false, n, no, off, on, true, y, yes)
  displayName: string # Human-readable name for the task.
  target: stepTarget # Environment in which to run this task.
  enabled: boolean # Run this task when the job runs? (false, n, no, off, on, true, y, yes)
  env: # Variables to map into the process's environment.
    string: string # Name/value pairs.
  name: string # ID of the step. ([-_A-Za-z0-9]*)
  timeoutInMinutes: string # Time to wait for this task to complete before the server kills it.
  retryCountOnTaskFailure: string # Number of retries if the task fails.
I had already looked at fetchDepth: 1, thinking that if I only fetched the latest version of the code, the checkout would be faster. That sounds sensible, but it does nothing if the repo is full of Git tags, so I had to set fetchTags: false as well. I've tested this on a few pipelines and the difference has been over four minutes in some cases, which is by far the best performance improvement I have made to a pipeline. I've also not noticed any ill effects from these changes.
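To see why both settings matter, it helps to know what they translate to on the build agent. Roughly speaking (this is a simplified sketch, not the agent's exact command sequence), the combination amounts to fetching only the tip commit and skipping tag refs:

# Approximation of what checkout does with fetchDepth: 1 and fetchTags: false.
# '<repo-url>' and the branch name 'main' are placeholders.
git init .
git remote add origin <repo-url>
git fetch --depth=1 --no-tags origin main   # only the tip commit, no tag refs
git checkout --force FETCH_HEAD

Without --depth the fetch pulls the whole commit history, and without --no-tags it also pulls every tag and the objects those tags point at, which is where the extra minutes go.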
You might be thinking: if this is so good, why doesn't Microsoft set it as the default? They have. Any pipeline created after September 2022 will have fetchDepth set to 1 and fetchTags set to false. They don't, however, apply this to existing pipelines.
To use these settings, you need to explicitly check out your repo(s) in your steps. As you can see below, we are checking out the repo 'self', which means the repo that contains the YAML file of the pipeline you are working on. If you are using additional repos, you will need to check them out by name, paired with a resources block; there is a sketch of that pattern after the example below.
trigger:
- main

pool:
  vmImage: ubuntu-latest

steps:
- checkout: self
  fetchDepth: 1
  fetchTags: false

- script: echo Hello, world!
  displayName: 'Run a one-line script'

- script: |
    echo Add other tasks to build, test, and deploy your project.
    echo See https://aka.ms/yaml
  displayName: 'Run a multi-line script'
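And here is a rough sketch of the multi-repo variant mentioned above. The alias 'tools' and the name 'MyProject/tools-repo' are made-up placeholders for an Azure Repos Git repository, so substitute your own (and adjust the type if the extra repo lives elsewhere, such as GitHub):

resources:
  repositories:
  - repository: tools            # alias used by the checkout step below (placeholder)
    type: git                    # 'git' means an Azure Repos Git repository
    name: MyProject/tools-repo   # placeholder project/repository name

steps:
- checkout: self
  fetchDepth: 1
  fetchTags: false
- checkout: tools                # the named repo gets the same shallow, tag-free fetch
  fetchDepth: 1
  fetchTags: false

Each checkout step takes its own fetchDepth and fetchTags values, so remember to set them on every repo you check out, not just 'self'.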
I'd originally planned to write a post on general pipeline performance once I had tested a few more tweaks, but this was too good to sit on. If you know of any other changes that improve checkout performance or general Git operations in a pipeline, please add them to the comments.