2026-04-01 16:09:51 +02:00

Mandos as a wake-on-demand build host

Goal

Mandos is primarily an interactive living-room machine, but it is also a strong candidate for handling remote Nix builds when it is idle. The goal is to make that dual use practical without keeping the machine powered on all the time.

Current context

On the main branch, Mandos is configured as an interactive gaming machine:

  • systems/x86_64-linux/mandos/default.nix
    • sneeuwvlok.hardware.has.gpu.nvidia = true
    • sneeuwvlok.hardware.has.audio = true
    • sneeuwvlok.desktop.use = "gamescope"
    • sneeuwvlok.application.steam.enable = true
  • homes/x86_64-linux/chris@mandos/default.nix
    • user-facing application set for an interactive machine

This makes Mandos a poor fit for "always running random infrastructure", but a reasonable fit for "available for work when needed".

Desired behavior

  • Mandos remains an interactive machine first.
  • Mandos can be used as a remote build worker when no one is actively using it.
  • Mandos should not need to stay fully on all day just to be eligible for builds.
  • Waking and idling down should be automatic enough that the machine can participate in builds without turning into a maintenance burden.

1. Use wake-on-LAN as the activation mechanism

Mandos should support being awakened by another machine on the same LAN.

Requirements:

  • BIOS or UEFI wake-on-LAN support enabled
  • NixOS interface configuration enabling wake-on-LAN (networking.interfaces.<name>.wakeOnLan.enable)
  • one low-power machine that is effectively always available to send wake requests

In this repo, ulmo is the obvious candidate to act as the coordinator, but the pattern should stay generic: one machine is always reachable, and one or more stronger machines can be woken on demand.
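As a sketch of what the coordinator actually sends: a Wake-on-LAN magic packet is 6 bytes of 0xFF followed by the target's MAC address repeated 16 times, usually sent as a UDP broadcast. The Python below assumes IPv4 broadcast on UDP port 9; in practice an existing WoL utility on ulmo would do the same job.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return a Wake-on-LAN magic packet: 6 x 0xFF, then the 6-byte MAC 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 48-bit MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet on the local LAN (assumption: UDP port 9)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Sending to a broadcast address must be explicitly allowed on the socket.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

Usage would look like `send_magic_packet("aa:bb:cc:dd:ee:ff")`, where the MAC address is a placeholder for Mandos' wired NIC. Because the packet is a LAN broadcast, routers do not normally forward it, which is why the always-available machine needs to sit on the same LAN as Mandos.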

2. Prefer suspend-first over shutdown-first

There are two main power states worth considering:

  • Suspend on idle
    • faster resume
    • generally better user experience
    • often easier to make reliable for wake-on-LAN
  • Shutdown on idle
    • lowest power draw
    • more fragile in practice because firmware support for wake from soft-off varies
    • longer time to become available again

Recommended rollout order:

  1. Prove the concept with suspend on idle.
  2. Only consider full power-off later if the hardware and firmware behave reliably.
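Whichever power state is used, the NIC has to be armed for magic-packet wake before the machine sleeps, and the suspend-first rollout can be encoded as a default. The sketch below parses the output of `ethtool <interface>`, whose `Wake-on:` line reports the active wake modes (`g` is magic packet, `d` is disabled), and picks a `systemctl` action. The `soft_off_validated` flag is a hypothetical setting meaning "wake from full power-off has been tested on this hardware".

```python
def wol_armed(ethtool_output: str) -> bool:
    """True if the active 'Wake-on:' setting in `ethtool <iface>` output includes 'g'."""
    for line in ethtool_output.splitlines():
        stripped = line.strip()
        # 'Supports Wake-on:' lists capabilities, not the active setting, so it does not match here.
        if stripped.startswith("Wake-on:"):
            return "g" in stripped.split(":", 1)[1]
    return False  # no Wake-on line at all: treat the NIC as not armed


def idle_power_command(soft_off_validated: bool = False) -> list[str]:
    """Suspend by default; power off only once wake from soft-off is proven reliable."""
    return ["systemctl", "poweroff" if soft_off_validated else "suspend"]
```

The ethtool output could come from something like `subprocess.run(["ethtool", "enp5s0"], capture_output=True, text=True).stdout`, where the interface name is a placeholder and reading wake-on-LAN settings may require root. Refusing to sleep when `wol_armed` returns False avoids a machine that goes down and can no longer be woken remotely.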

3. Add an explicit availability policy

The lesson for tagging is not that Mandos should have a build tag. It is that some machines have a deliberate availability policy, and that policy determines how safely they can participate in automation.

A future host-level setting could encode this policy directly, for example:

  • always-on
  • wake-on-demand
  • manual

That setting would be a better source for any computed operational tag than the machine's current workload or ad hoc tags.

4. Idle detection should be policy-driven

If Mandos becomes a build worker, idle shutdown or suspend should depend on signals such as:

  • no local interactive session activity
  • no active build job
  • no long-running system task that should keep the machine awake

This should not be a blind timer that powers the machine down after X minutes regardless of context.
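A minimal sketch of such a policy, assuming the three signals above are collected elsewhere (for example, last input time from the local session, the number of running remote builds, and active systemd inhibitor locks as listed by `systemd-inhibit --list`). The `IdleSignals` type and the 30-minute grace period are illustrative, not existing settings. The timer is only one gate among several, and any active signal vetoes sleep.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdleSignals:
    """Hypothetical snapshot of the signals the idle policy looks at."""
    seconds_since_input: float        # local interactive activity (keyboard, controller, etc.)
    active_build_jobs: int            # remote builds currently running on this machine
    inhibitors: tuple[str, ...] = ()  # long-running tasks that asked to keep the machine awake


def idle_decision(signals: IdleSignals, grace_seconds: float = 30 * 60) -> tuple[bool, str]:
    """Return (may_sleep, reason); sleep is allowed only when every signal is quiet."""
    if signals.active_build_jobs > 0:
        return False, f"{signals.active_build_jobs} build job(s) running"
    if signals.inhibitors:
        return False, "inhibited by " + ", ".join(signals.inhibitors)
    if signals.seconds_since_input < grace_seconds:
        return False, "recent interactive use"
    return True, "idle"
```

Returning a reason alongside the decision makes it easy to log why Mandos stayed awake, which helps when tuning the policy.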

5. Build orchestration needs a coordinator

Wake-on-demand only works well if something else can wake the machine and wait for it to become reachable. In practice, this means:

  • a coordinator sends the wake signal
  • the build client retries until the machine is reachable
  • the remote builder participates only after it is actually ready

The exact implementation can vary, but the architectural point is the same: a wakeable build worker is not self-sufficient.
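One way to sketch the coordinator side, assuming readiness is approximated by the SSH port accepting TCP connections (a stricter check would confirm that the Nix daemon actually answers over SSH). `wake` and `is_ready` are passed in so the loop works with any wake mechanism; the 120-second timeout and 5-second interval are placeholder values.

```python
import socket
import time
from typing import Callable


def tcp_reachable(host: str, port: int = 22, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds (a cheap readiness proxy)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wake_and_wait(
    wake: Callable[[], None],
    is_ready: Callable[[], bool],
    timeout: float = 120.0,
    interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Send wake requests until is_ready() succeeds or the timeout expires."""
    start = clock()
    while clock() - start < timeout:
        # Resend every round: magic packets are fire-and-forget and can be lost.
        wake()
        if is_ready():
            return True
        sleep(interval)
    return False
```

A call might look like `wake_and_wait(send_wake_to_mandos, lambda: tcp_reachable("mandos"))`, where `send_wake_to_mandos` is a hypothetical function that sends the magic packet. If it returns False, a sensible choice is for the build to proceed elsewhere rather than block on Mandos.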

Risks and caveats

  • Firmware wake support may be unreliable, especially from full shutdown.
  • Build latency increases because wake and readiness checks take time.
  • A machine that users expect to be immediately available should not surprise them with power-state transitions at awkward moments.
  • Interactive workload detection matters; otherwise the machine will feel hostile as a living-room device.

Recommendation

Treat the Mandos idea as a good pattern, but generalize it:

  • some machines are interactive
  • some machines are wakeable on demand
  • some machines are suitable for interruptible background work

Those are more reusable concepts than "Mandos is the build server".

Implications for the tag strategy

This investigation strengthens the case for a small part of the operational:* tag namespace:

  • operational:availability:always-on
  • operational:availability:wake-on-demand
  • operational:workload:interruptible

These should not be assigned by hand if they can instead be computed from explicit machine settings that describe availability policy.
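To illustrate "computed, not hand-assigned", here is a sketch that derives the three tags above from explicit settings. `Availability` mirrors the policy values from section 3, while `HostSettings` and its `interruptible_work` flag are hypothetical; in this repo the setting would more likely be a Nix option feeding Clan inventory tags, which is not shown here. Since the plan defines no tag for `manual`, this sketch assumes that policy produces no availability tag.

```python
from dataclasses import dataclass
from enum import Enum


class Availability(Enum):
    """Availability policy values proposed in section 3."""
    ALWAYS_ON = "always-on"
    WAKE_ON_DEMAND = "wake-on-demand"
    MANUAL = "manual"


@dataclass(frozen=True)
class HostSettings:
    """Hypothetical per-host settings that tags are computed from."""
    availability: Availability
    interruptible_work: bool = False


def operational_tags(settings: HostSettings) -> list[str]:
    """Compute operational:* tags from explicit settings instead of assigning them by hand."""
    tags: list[str] = []
    # Assumption: 'manual' maps to no tag, since the plan lists only always-on and wake-on-demand.
    if settings.availability is not Availability.MANUAL:
        tags.append(f"operational:availability:{settings.availability.value}")
    if settings.interruptible_work:
        tags.append("operational:workload:interruptible")
    return tags
```

With `HostSettings(Availability.WAKE_ON_DEMAND, interruptible_work=True)` for Mandos, the function returns `operational:availability:wake-on-demand` and `operational:workload:interruptible`. Changing the setting changes the tags, so there is no separate hand-maintained tag list to drift.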

References

  • Clan inventory tags and dynamic tags docs: https://clan.lol/docs/25.11/reference/options/clan_inventory
  • NixOS Wake-on-LAN wiki: https://wiki.nixos.org/wiki/Wake_on_LAN
  • Home-lab wake-on-demand discussion and patterns:
    • https://dgross.ca/blog/linux-home-server-auto-sleep
    • https://danielpgross.github.io/friendly_neighbor/howto-sleep-wake-on-demand.html