# Mandos as a wake-on-demand build host

## Goal

Mandos is primarily an interactive living-room machine, but it is also a strong candidate for handling remote Nix builds when it is idle. The goal is to make that dual use practical without keeping the machine powered all the time.

## Current context

On `main`, Mandos is configured as an interactive gaming machine:

- `systems/x86_64-linux/mandos/default.nix`
  - `sneeuwvlok.hardware.has.gpu.nvidia = true`
  - `sneeuwvlok.hardware.has.audio = true`
  - `sneeuwvlok.desktop.use = "gamescope"`
  - `sneeuwvlok.application.steam.enable = true`
- `homes/x86_64-linux/chris@mandos/default.nix`
  - user-facing application set for an interactive machine

This makes Mandos a poor fit for "always running random infrastructure", but a reasonable fit for "available for work when needed".

## Desired behavior

- Mandos remains an interactive machine first.
- Mandos can be used as a remote build worker when no one is actively using it.
- Mandos should not need to stay fully on all day just to be eligible for builds.
- Waking and idling down should be automatic enough that the machine can participate in builds without turning into a maintenance burden.

## Recommended model

### 1. Use wake-on-LAN as the activation mechanism

Mandos should support being awakened by another machine on the same LAN.

Requirements:

- BIOS or UEFI wake-on-LAN support enabled
- NixOS interface configuration enabling wake-on-LAN
- one low-power machine that is effectively always available to send wake requests

In this repo, `ulmo` is the obvious candidate to act as the coordinator, but the pattern should stay generic: one machine is always reachable, and one or more stronger machines can be woken on demand.
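
To make the wake request concrete, here is a minimal sketch of what the coordinator would send. A wake-on-LAN "magic packet" is six `0xFF` bytes followed by the target MAC address repeated sixteen times, usually broadcast over UDP. The function names, the broadcast address, and port 9 (a common convention, not a requirement) are assumptions for illustration; nothing here is taken from the repo.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return the 102-byte wake-on-LAN magic packet for a MAC address."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 48-bit MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # Six 0xFF bytes, then the target MAC repeated sixteen times.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet on the local network (hypothetical defaults)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

Because the packet is a link-local broadcast, the sender normally has to sit on the same LAN segment as the sleeping machine, which is why an always-reachable coordinator on that LAN is part of the requirements.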

### 2. Prefer suspend-first over shutdown-first

There are two main power states worth considering:

- **Suspend on idle**
  - faster resume
  - generally better user experience
  - often easier to make reliable for wake-on-LAN
- **Shutdown on idle**
  - lowest power draw
  - more fragile in practice because firmware support for wake from soft-off varies
  - longer time to become available again

Recommended rollout order:

1. Prove the concept with suspend on idle.
2. Only consider full power-off later if the hardware and firmware behave reliably.

### 3. Add an explicit availability policy

The interesting lesson for tagging is not "Mandos should have a build tag". The interesting lesson is that some machines have a deliberate availability policy that affects how safely they can participate in automation.

A future host-level setting could encode this policy directly, for example:

- `always-on`
- `wake-on-demand`
- `manual`

That setting would be a better source for any computed operational tag than current workload or ad hoc tags.
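
As a rough illustration of how automation might consume such a setting, the sketch below models the three values as an enum and derives two scheduling decisions from it. The class and function names are hypothetical, and so is the reading of `manual` as "automation must not wake or schedule this host"; the document does not define those semantics.

```python
from enum import Enum


class AvailabilityPolicy(Enum):
    """The three example policy values listed above."""

    ALWAYS_ON = "always-on"
    WAKE_ON_DEMAND = "wake-on-demand"
    MANUAL = "manual"


def may_schedule_automatically(policy: AvailabilityPolicy) -> bool:
    """Assumption: only a human decides when a `manual` host does work."""
    return policy is not AvailabilityPolicy.MANUAL


def needs_wake_before_use(policy: AvailabilityPolicy) -> bool:
    """Only wake-on-demand hosts require a wake request before scheduling."""
    return policy is AvailabilityPolicy.WAKE_ON_DEMAND
```

Parsing the configured string with `AvailabilityPolicy("wake-on-demand")` rejects unknown values with a `ValueError`, which is one reason to prefer an explicit setting over free-form tags.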

### 4. Idle detection should be policy-driven

If Mandos becomes a build worker, idle shutdown or suspend should depend on signals such as:

- no local interactive session activity
- no active build job
- no long-running system task that should keep the machine awake

This should not be a blind timer that powers the machine down every X minutes regardless of context.
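
The difference between a policy and a blind timer can be sketched as a pure decision function over the signals listed above. How each signal is collected is deliberately left out; the `IdleSignals` fields and the 15-minute default are hypothetical placeholders, not values from the repo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdleSignals:
    """Snapshot of the inputs to the idle decision (all fields hypothetical)."""

    seconds_since_user_input: float
    active_build_jobs: int
    inhibiting_tasks: int  # long-running tasks that should keep the host awake


def should_suspend(signals: IdleSignals, idle_threshold_seconds: float = 900.0) -> bool:
    """Suspend only when every signal agrees the machine is idle."""
    if signals.active_build_jobs > 0:
        return False
    if signals.inhibiting_tasks > 0:
        return False
    # The timer is only one input, and it measures user inactivity, not uptime.
    return signals.seconds_since_user_input >= idle_threshold_seconds
```

Keeping the decision separate from signal collection means the policy can be tested without touching real power states.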

### 5. Build orchestration needs a coordinator

Wake-on-demand only works well if something else can wake the machine and wait for it to become reachable. In practice, this means:

- a coordinator sends the wake signal
- the build client retries until the machine is reachable
- the remote builder participates only after it is actually ready

The exact implementation can vary, but the architectural point is the same: a wakeable build worker is not self-sufficient.
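
One possible shape for the wake-then-wait sequence is sketched below. It assumes readiness can be approximated by a successful TCP connection to the builder's SSH port (22 here), and it takes the wake and readiness actions as callables so the retry logic stays independent of how the wake signal is sent. All names, the retry count, and the delay are illustrative assumptions.

```python
import socket
import time
from typing import Callable


def is_reachable(host: str, port: int = 22, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wake_and_wait(
    send_wake: Callable[[], None],
    check_ready: Callable[[], bool],
    attempts: int = 30,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Wake a host if needed, then poll until it is ready or attempts run out."""
    if check_ready():
        return True  # already awake, no wake signal needed
    send_wake()
    for _ in range(attempts):
        sleep(delay_seconds)
        if check_ready():
            return True
    return False  # caller should fall back to another builder or a local build
```

A caller might pass `lambda: is_reachable("mandos")` as `check_ready`, where the hostname is whatever resolves to the builder on the LAN. Returning `False` instead of raising leaves the fallback decision to the build client.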

## Risks and caveats

- Firmware wake support may be unreliable, especially from full shutdown.
- Build latency increases because wake and readiness checks take time.
- A machine that users expect to be immediately available should not surprise them with power-state transitions at awkward moments.
- Interactive workload detection matters; otherwise the machine will feel hostile as a living-room device.

## Recommendation

Treat the Mandos idea as a good pattern, but generalize it:

- some machines are **interactive**
- some machines are **wakeable on demand**
- some machines are suitable for **interruptible background work**

Those are more reusable concepts than "Mandos is the build server".

## Implications for the tag strategy

This investigation strengthens a small part of the `operational:*` space:

- `operational:availability:always-on`
- `operational:availability:wake-on-demand`
- `operational:workload:interruptible`

These should not be assigned by hand if they can instead be computed from explicit machine settings that describe availability policy.
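
A minimal sketch of that computation follows, assuming two hypothetical per-host settings: an availability string and a flag for interruptible workloads. Emitting no availability tag for `manual` is also an assumption, made only because the list above names no such tag.

```python
def operational_tags(availability: str, interruptible_workloads: bool) -> list[str]:
    """Derive `operational:*` tags from explicit host settings (hypothetical names)."""
    known_policies = {"always-on", "wake-on-demand", "manual"}
    if availability not in known_policies:
        raise ValueError(f"unknown availability policy: {availability!r}")

    tags: list[str] = []
    # Only the two availability tags named in this document are emitted.
    if availability in {"always-on", "wake-on-demand"}:
        tags.append(f"operational:availability:{availability}")
    if interruptible_workloads:
        tags.append("operational:workload:interruptible")
    return tags
```

With settings like these, a host such as Mandos would get its tags from `operational_tags("wake-on-demand", True)` rather than from a hand-maintained list.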

## References

- Clan inventory tags and dynamic tags docs: `https://clan.lol/docs/25.11/reference/options/clan_inventory`
- NixOS Wake-on-LAN wiki: `https://wiki.nixos.org/wiki/Wake_on_LAN`
- Home-lab wake-on-demand discussion and patterns:
  - `https://dgross.ca/blog/linux-home-server-auto-sleep`
  - `https://danielpgross.github.io/friendly_neighbor/howto-sleep-wake-on-demand.html`