Khachatur Ashotyan

Posted on May 22 • Edited on May 23

macOS Workers, or how I built my own Mac cloud

#macos #cicd #devops #jenkins

In Part 1 I laid out the Jenkins-as-a-Code setup (JCasC, Job DSL, ephemeral workers, Packer images), and said macOS workers deserved a separate post. This is that post.

For anyone who's never run macOS builds in CI: most things that are easy on Linux turn out to be hard on macOS, often for reasons that don't apply anywhere else. Apple's licensing rules mean you can't just spin up a Mac in AWS the way you do an Ubuntu box. Then there's the keychain, the signing tooling, and the Xcode versioning. The typical answer at most companies is a few Mac minis under someone's desk that everybody SSHes into, and that works for a single team right up until the company depends on it.

I wanted the same setup on macOS that I had for Linux and Windows: a fresh worker per build, destroyed when the build finishes. Getting there took a while.

Why macOS is hard in the first place

A few things to keep in mind first, because they explain why the architecture below looks the way it does.

1. The cloud Mac story is awkward. EC2 Mac instances exist - real Mac hardware in AWS data centers, you can rent one. But they're dedicated hosts with a 24-hour minimum allocation (Apple's licensing, not AWS being weird) and per-hour pricing is brutal next to Linux. If your worker lives for 30 minutes and you pay for 24 hours, the per-build math is rough.

2. Apple's EULA only allows macOS to run on Apple hardware, which means you can't legally virtualize macOS on a generic Linux server. Real Mac hardware has to be in the loop somewhere - yours, rented, or in someone else's rack.

3. macOS virtualization is its own ecosystem. On Intel Macs the answer used to be VMware (vSphere or Fusion) or VirtualBox. On Apple Silicon, neither works the same way. Everything goes through Apple's Virtualization.framework now, and the tooling around it is still young.

4. Signing and notarization credentials fight you on ephemeral VMs. Developer ID certificates, app-specific passwords, the keychain - none of it was designed for "fresh VM every build". It assumes a developer's laptop. Making it work in CI is its own rabbit hole.

5. macOS images are huge. A baked Packer image with Xcode is 60-80 GB. Pulling that from cold storage is slow, so caching matters a lot more here than on Linux.

All five show up repeatedly in the rest of this post.
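
A back-of-the-envelope sketch for points 1 and 5. The hourly rate and link speed are placeholders chosen for round arithmetic, not real prices or measurements, and the function names are made up for illustration.

```shell
#!/usr/bin/env bash
set -eu

# cost_per_build HOURLY_RATE BUILDS
# Spreads one 24-hour minimum allocation across the builds that ran on it.
cost_per_build() {
  awk -v rate="$1" -v builds="$2" 'BEGIN { printf "%.2f\n", (rate * 24) / builds }'
}

# pull_minutes SIZE_GB LINK_GBPS
# Best-case transfer time for an image, ignoring registry and disk overhead.
pull_minutes() {
  awk -v gb="$1" -v gbps="$2" 'BEGIN { printf "%.1f\n", (gb * 8) / gbps / 60 }'
}

# Placeholder inputs: 1.00/hour is NOT a real EC2 Mac price.
# cost_per_build 1.00 1    # one 30-minute build on its own allocation -> 24.00
# cost_per_build 1.00 48   # host kept busy with back-to-back builds   -> 0.50
# pull_minutes 70 1        # 70 GB image over an ideal 1 Gbit/s link   -> 9.3
```

The gap between the first two numbers is the whole argument: the 24-hour minimum only hurts when the host sits idle for most of the allocation.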

What I evaluated and didn't pick

There aren't many serious players in macOS CI. I went through the commercial options properly before landing on what's below. None of them were bad; the economics just didn't line up for us.

Veertu Anka - the most mature paid platform for macOS CI virtualization. Roughly what Tart does, plus a polished UI, enterprise support, more features. Licensing is per-host or per-VM, which adds up fast once you have a real fleet. Credible if you've got budget and want a vendor to call.
MacStadium - managed-Mac hosting. You rent physical Macs in their DC, optionally with their orchestration layer (Orka). Good if you don't want to rack Macs yourself. Per-host per-month pricing fits a steady-state fleet; spiky volume or existing hardware makes it worse.
AWS EC2 Mac instances - see above. Worth it for very low-volume work, where avoiding ops outweighs the per-hour bill. The 24-hour minimum kills it for high-volume ephemeral CI.
GitHub Actions managed macOS runners - fine for OSS projects and small teams. Per-minute pricing gets painful at real volume. And the image is fixed - the moment you need anything past stock Xcode, you're stuck.

What sold me on Tart was the licensing more than the technology. The commercial license is free for personal and small-scale use, and the paid tier doesn't scale linearly with fleet size the way Anka's per-host model does. It's affordable at our build volume, and at one or two Macs you pay nothing at all.

As of April 2026 - when Cirrus Labs joined OpenAI - the licensing got better still: Tart, Vetu and Orchard have been relicensed under a more permissive license and the commercial fees dropped entirely.

The rest of the Cirrus Labs toolchain holds up too. Orchard sits on top of Tart for fleet orchestration, and Cirrus CLI lets you run CI tasks locally against a Tart VM. Being able to reproduce a Jenkins job on my laptop has saved hours of debugging CI-only failures.

Three ways I ended up provisioning macOS workers

No single tool covered everything I needed, so I ended up with three provisioners for three shapes of Mac fleet. All three follow the same pattern as in Part 1 (a Jenkins job invokes the provisioner, the worker comes up from a Packer image, runs the build, and gets destroyed), but the layer underneath each one is different.

Option A - Tart, for Apple Silicon

Tart is a small open-source CLI from Cirrus Labs that wraps Apple's Virtualization.framework. Hand it an OCI-compatible image (basically a tarball with the macOS VM disk) and it boots a VM on Apple Silicon in seconds. Images are reusable and layerable - it's the closest thing to "docker but for macOS VMs" I've come across.

How it fits:

Hardware: a fleet of Mac minis (or Studios) we own or rent, sitting in a rack.
Each Mac runs Tart on the host.
A Jenkins job grabs an available host, runs tart clone against a known image tag, boots the clone with tart run, registers the VM as a Jenkins agent, and calls tart delete when the build finishes.
Packer's tart-cli source builds the images. Xcode, Homebrew, signing tools, language runtimes - all baked in at image-build time.
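
The clone / run / delete cycle above can be sketched roughly like this. It's a minimal illustration, not the actual Jenkins job: the function name, the TART override (there so the flow can be dry-run against a stub), and the 120-second wait are all assumptions.

```shell
#!/usr/bin/env bash
set -eu

# TART can point at a stub for dry runs; by default it is the real CLI.
TART="${TART:-tart}"

# run_in_fresh_vm IMAGE VM_NAME BUILD_CMD...
# BUILD_CMD is called with the VM's IP address appended as its last argument.
run_in_fresh_vm() {
  local image="$1" vm="$2" status=0 run_pid ip
  shift 2

  "$TART" clone "$image" "$vm" || return 1

  # tart run blocks while the VM is up, so it goes to the background.
  "$TART" run --no-graphics "$vm" &
  run_pid=$!

  # Wait for the guest to get an address, then hand it to the build step.
  if ip="$("$TART" ip --wait 120 "$vm")"; then
    "$@" "$ip" || status=$?
  else
    status=1
  fi

  # Tear down regardless of how the build went.
  "$TART" stop "$vm" || true
  wait "$run_pid" || true
  "$TART" delete "$vm"
  return "$status"
}
```

BUILD_CMD is where the Jenkins agent would be started against the VM's IP. That part depends on how your agents connect, so it's left to the caller here.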

The good: spin-up is fast - tart clone to "agent connected" is under a minute. The image is a snapshot, so every build starts from byte-identical state.

The not-so-good: you still need to own or rent the Macs. The "real Apple machines in a rack" problem doesn't go away, you just orchestrate around it. And the Tart ecosystem is young - expect to write glue.

Option B - vSphere / VCSA, for the older Intel fleet

Before Apple Silicon, the Mac fleet was a stack of Intel Mac minis hooked into a vSphere cluster. macOS VMs were managed as ESXi guests, the same way any other VMware VM would be.

How it fits:

ESXi on each Mac mini, so the macOS guests still run on Apple hardware, which is what Apple's licensing requires.
A golden macOS VM template lives in vSphere, baked by Packer's vSphere ISO builder.
Jenkins runs Terraform with the vSphere provider to clone the template (linked clones are faster), bring up the VM, register it as an agent, tear it down after.

This setup predates Apple Silicon and it still works, but it's the heaviest of the three. Linked clones help with spawn time, but it's still slower than Tart, and vSphere itself is a chunky thing to operate on top of that.

It's the responsible-enterprise path. If you already run VMware in your org it slots in fine, but nobody starting fresh in 2026 would pick it.

Option C - Orchard, for pooled / remote Macs

Orchard is also from Cirrus Labs, in the same family as Tart. Instead of orchestrating individual Mac hosts yourself, Orchard sits as a controller in front of a pool of workers and you request a VM through its API. It handles scheduling, queuing, and lifecycle for you.

How it fits:

A pool of Macs (yours, or from a managed provider like MacStadium), and you don't want individual Jenkins jobs picking physical machines.
Jenkins calls Orchard's API for a VM with a given image and resource profile, runs the build, releases the VM.
Capacity is the real constraint - 20 builds queued, 5 Macs free, Orchard handles the rest.

The good: Jenkins doesn't need to know where the Mac physically lives, which is a clean separation between provisioning and scheduling.

The not-so-good: it's yet another piece of infrastructure to run, which only pays off past a certain fleet size. With two or three Macs, raw Tart is simpler.

What gets baked into the Packer image

The principle stays the same: bake everything we can into the image so the build itself doesn't pay any setup time. The macOS image ends up being the heaviest in our fleet by a wide margin.

What goes into a typical macOS worker image:

OS: pinned macOS point release. Xcode compatibility is brittle - chasing "latest" is a bad idea.
Xcode: pinned version + command line tools. Xcode alone is 30+ GB.
Homebrew + packages: every brewed tool the build needs, pre-installed and pre-warmed.
Language runtimes: Node, Python, Ruby - pinned to match production.
Build tools: CMake, Ninja, Conan, whatever the project actually uses.
Signing tools: codesign ships with macOS; notarytool and stapler ship with Xcode and run through xcrun. Worth confirming they all resolve on the image.
Pre-warmed caches: Conan, npm, brew - anything that would otherwise download on the first build.
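
A small check along these lines could run as the last provisioning step, so a broken image fails at bake time rather than in someone's build. The function name and the tool list in the usage comment are illustrative, not taken from the real scripts.

```shell
#!/usr/bin/env bash
set -eu

# missing_tools NAME...
# Prints each name that does not resolve on PATH; prints nothing if all resolve.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example use at the end of image provisioning (tool list is an assumption):
# missing="$(missing_tools codesign xcrun cmake ninja)"
# [ -z "$missing" ] || { echo "missing from image: $missing" >&2; exit 1; }
```

Tools that live inside Xcode, such as notarytool and stapler, aren't on PATH directly. For those, xcrun --find notarytool is the equivalent check.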

The Packer template itself is short. Most of the work is in a chain of shell scripts that run after the base macOS install:

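packer {
  required_plugins {
    tart = {
      version = ">= 1.2.0"
      source  = "github.com/cirruslabs/tart"
    }
  }
}

# Passed in by the job that triggers the image build.
variable "image_version" {
  type = string
}

source "tart-cli" "macos" {
  # :latest floats; pin a tag or digest for reproducible rebuilds.
  vm_base_name = "ghcr.io/cirruslabs/macos-monterey-base:latest"
  vm_name      = "macos-ci-${var.image_version}"
  cpu_count    = 4
  memory_gb    = 8
  disk_size_gb = 120
  ssh_username = "admin"
  ssh_password = "admin"
}

build {
  sources = ["source.tart-cli.macos"]

  # Scripts run in order inside the VM over SSH.
  provisioner "shell" {
    scripts = [
      "scripts/post-install.sh",
      "scripts/brew-setup.sh",
      "scripts/xcode.sh",
      "scripts/nodejs-setup.sh",
      "scripts/deps.sh",
      "scripts/prewarm-caches.sh",
    ]
  }
}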

The scripts live in the same repo as the template and go through the same PR review as the rest of the infra.

A baked image is around 60-80 GB. Storage matters, but cache locality matters more. Pulling a fresh 70 GB image from the registry on every first boot would crater throughput across the fleet, so we pre-cache base images on each host out of band.
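
The out-of-band pre-cache can be as simple as looping tart pull over a list of images on each host. A sketch, assuming a plain text file with one image reference per line; the file format, function name, and TART override are assumptions.

```shell
#!/usr/bin/env bash
set -eu

# TART can point at a stub for dry runs; by default it is the real CLI.
TART="${TART:-tart}"

# precache_images LIST_FILE
# LIST_FILE holds one image reference per line; blank lines and # comments are skipped.
precache_images() {
  local list="$1" image failed=0
  while IFS= read -r image || [ -n "$image" ]; do
    case "$image" in
      ''|'#'*) continue ;;
    esac
    # </dev/null keeps the pull from eating the rest of the list on stdin.
    if ! "$TART" pull "$image" </dev/null; then
      echo "pull failed: $image" >&2
      failed=1
    fi
  done < "$list"
  return "$failed"
}
```

Run it from whatever scheduler the hosts already have, after each image rebuild, so the first job pinned to a new version finds the image already local.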

The signing-on-ephemeral-VMs problem

Signing eats more first-setup time than anything else on this list, which is why it gets its own section.

Apple's signing pipeline assumes a developer machine with a persistent keychain - you unlock it once and sign apps for the rest of the day. With ephemeral CI VMs that breaks: every VM is brand new, no keychain, no saved password.

What we landed on:

Developer ID cert + private key live in a secrets manager (AWS Secrets Manager, Vault, whatever). Never in the image, never in git.
At job start, the pipeline pulls the cert + key and imports them into a temporary keychain it creates on the VM.
That keychain has a random password just for this build. It dies with the VM.
Notarization credentials (app-specific password or notarytool API key) come from the same secrets manager. Used directly - no keychain needed.
Build ends, VM is destroyed, keychain goes with it. Same lifecycle as the worker.
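
For the "pull the cert + key at job start" step, here is a sketch using AWS Secrets Manager as one possible backend. It assumes the certificate bundle is stored as a binary secret; the secret name in the usage comment, the function name, and the AWS override are hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# AWS can point at a stub for dry runs; by default it is the real CLI.
AWS="${AWS:-aws}"

# fetch_signing_cert SECRET_ID OUT_FILE
# Writes the decoded certificate bundle to OUT_FILE.
fetch_signing_cert() {
  local secret_id="$1" out="$2" blob
  # Binary secrets come back base64-encoded when printed as text.
  blob="$("$AWS" secretsmanager get-secret-value \
      --secret-id "$secret_id" \
      --query SecretBinary --output text)" || return 1
  printf '%s' "$blob" | base64 --decode > "$out" || return 1
}

# Example use (secret name is a placeholder):
# DEVELOPER_ID_CERT="$(mktemp)"   # mktemp creates the file readable by the owner only
# fetch_signing_cert ci/developer-id-cert "$DEVELOPER_ID_CERT"
```

The bundle's password would come from the same place as a string secret, and both stay on the VM's disk only for the life of the build.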

Trimmed-down version of the keychain-bootstrap script:

#!/usr/bin/env bash
set -euo pipefail

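KEYCHAIN="ci-build.keychain"
KEYCHAIN_PASSWORD="$(openssl rand -base64 24)"

# Inputs provided by the pipeline; fail early with a clear message if either is missing.
: "${DEVELOPER_ID_CERT:?path to the certificate bundle pulled from the secrets manager}"
: "${CERT_PASSWORD:?password protecting the certificate bundle}"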

# Create a brand-new keychain just for this build.
security create-keychain -p "$KEYCHAIN_PASSWORD" "$KEYCHAIN"
security set-keychain-settings -lut 21600 "$KEYCHAIN"
security unlock-keychain -p "$KEYCHAIN_PASSWORD" "$KEYCHAIN"

# Add it to the search list so codesign can find it.
security list-keychains -d user -s "$KEYCHAIN" $(security list-keychains -d user | tr -d '"')

# Import the cert + key from the secrets-manager-provided files.
security import "$DEVELOPER_ID_CERT" -k "$KEYCHAIN" -P "$CERT_PASSWORD" -T /usr/bin/codesign

# Grant codesign permission to use the key without prompting.
security set-key-partition-list -S apple-tool:,apple: -s -k "$KEYCHAIN_PASSWORD" "$KEYCHAIN" >/dev/null

Things that bit us:

set-key-partition-list is mandatory on modern macOS. Without it, codesign pops a UI password prompt that nothing will ever answer on a headless VM, and the build hangs indefinitely.
The keychain must be in the search list. A keychain that exists but isn't searched is invisible to codesign.
Notarization is asynchronous. notarytool submit --wait does block until it's done, but "done" can be several minutes away, so make sure your build timeouts account for it.
Skipped stapling fails silently. Notarization succeeds and the artifact ships, but without a stapled ticket Gatekeeper has to look the ticket up online, so users who are offline at first launch still get a warning. Run xcrun stapler staple <artifact> after notarization.
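
The last two points fit in one small function. This sketch uses the API-key flavour of notarytool credentials; the NOTARY_* variable names, the function name, and the XCRUN override are assumptions, and matching on the word "Accepted" in the JSON output is a deliberately loose check rather than proper parsing.

```shell
#!/usr/bin/env bash
set -eu

# XCRUN can point at a stub for dry runs; by default it is the real tool.
XCRUN="${XCRUN:-xcrun}"

# notarize_and_staple ARTIFACT
# Expects NOTARY_KEY_FILE, NOTARY_KEY_ID and NOTARY_ISSUER_ID in the environment.
notarize_and_staple() {
  local artifact="$1" result

  # --wait blocks until Apple finishes; size the job timeout for slow days.
  result="$("$XCRUN" notarytool submit "$artifact" \
      --key "${NOTARY_KEY_FILE:?}" \
      --key-id "${NOTARY_KEY_ID:?}" \
      --issuer "${NOTARY_ISSUER_ID:?}" \
      --wait --output-format json)" || return 1

  # Don't rely on the exit code alone; look at the reported status too.
  case "$result" in
    *Accepted*) ;;
    *) echo "notarization not accepted: $result" >&2; return 1 ;;
  esac

  # Attach the ticket so Gatekeeper can verify the artifact offline.
  "$XCRUN" stapler staple "$artifact"
}
```

Stapling applies to artifacts like .app bundles, .dmg and .pkg files. If you submit a zip, staple the app inside it and re-zip afterwards.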

None of this is deep magic, but first-time setup tends to eat a week of debugging on most teams. Budget for that, and get the keychain bootstrap script working before you write the rest of the pipeline.

Trade-offs - which one should you pick?

Probably more than one, depending on what fleet you've inherited. But if I were starting from scratch in 2026:

If you have...	Pick
A small pool of Apple Silicon Macs you own	Tart, directly. Free at this scale, nothing extra to run.
A larger fleet of Apple Silicon Macs, mixed ownership / remote	Tart + Orchard - same licensing, proper scheduling on top.
An existing vSphere installation and Intel Macs	vSphere / VCSA. Don't rebuild what works.
Need enterprise support, budget isn't tight	Veertu Anka.
Don't want to rack Macs, want a managed fleet	MacStadium (with their orchestration, or your own).
No physical Macs, very low volume	EC2 Mac. The 24-hour minimum stings, but sometimes the operational simplicity wins.
Open-source project, low volume	GitHub-hosted macOS runners. Free for OSS, nothing to host.
No physical Macs, high volume	MacStadium or similar. EC2 Mac economics break at this scale.

The approach I'd push back on is "let's just use Mac minis under someone's desk". It works for a single team, but the moment every iOS release across the company depends on it, you've got a bottleneck nobody owns.

What I'm still figuring out

A few open problems I haven't fully solved:

Image freshness. Xcode updates land every few weeks, and keeping the Packer image current without breaking everyone's build is constant work. We rebuild on a schedule and pin each job to a specific image version. The rebuild itself is a 90-minute job.
Cost. Mac hardware is expensive whether you own it or rent it. Above a certain build volume the math works; below that, per-build cost stings.
Apple Silicon transition for older code. Some of our C++ code still has Intel-only deps that haven't been ported. Those builds run on the vSphere/Intel fleet, which is shrinking. "Rewrite all the legacy build deps for arm64" is its own multi-quarter project.
Notarization queue times. Apple's notarization service has bad days where submissions take 20+ minutes. Nothing to do from our side - macOS builds just have a longer tail than everything else.

Closing thought

macOS CI doesn't get clean. There's no "just run a pod in EKS" equivalent: you'll have physical hardware in the loop, probably more than one hypervisor, and a signing problem that doesn't exist on any other platform. What's worked for us is treating macOS the way we treat everything else: ephemeral workers from a baked image, triggered by a job in git, with secrets pulled from a vault at runtime. Once the contract matches what Linux and Windows do, macOS stops being the part of CI that nobody wants to own.

Appendix - tools mentioned in this post

Cirrus Labs toolchain (the one I ended up on)

Tart - macOS VMs on Apple Silicon via Virtualization.framework. Free for small-scale use; see its licensing terms.
Orchard - controller for pooled Tart hosts.
Cirrus CLI - run CI tasks locally against a Tart VM using a .cirrus.yml config.
Packer Tart plugin - Packer builder for Tart images.

Commercial alternatives I evaluated

Veertu Anka - paid platform for macOS CI virtualization, polished, enterprise support.
MacStadium - managed Mac hosting + optional Orka orchestration.
AWS EC2 Mac instances - real Apple hardware in AWS, 24-hour minimum allocation.
GitHub-hosted macOS runners - fine for OSS / small scale.

Other

vSphere and the Terraform vSphere provider - for the older Intel fleet.
HashiCorp Packer - bakes all the worker images.
Apple's notarytool - the modern notarization CLI.

This is Part 2 of My CI/CD Odyssey. Follow me here on dev.to if you want to get pinged when Part 3 drops. And if you're doing macOS CI differently, I'd love to hear about it in the comments.
