feat: add option to wait on apt locks#47
Conversation
Signed-off-by: L0RD-ZER0 <68327382+L0RD-ZER0@users.noreply.github.com>
mrrobot47
left a comment
There was a problem hiding this comment.
Took a closer look at the apt lock-wait handling — nice idea and the install-side tiers work as intended. A few gaps stand out, most importantly that the update path does not actually wait (the apt lists lock is not covered by DPkg::Lock::Timeout, verified on apt 2.8.3). Details and suggestions inline.
| # setup.sh so this file is self-contained when sourced by migrate.sh or | ||
| # remote-migrate. | ||
| if [ -n "${EE_APT_LOCK_WAIT:-}" ]; then | ||
| _EE_APT_LOCK_UPDATE="-o DPkg::Lock::Timeout=900" |
There was a problem hiding this comment.
DPkg::Lock::Timeout does not cover apt-get update. It waits on the dpkg frontend lock (/var/lib/dpkg/lock-frontend) used by install, but not the apt lists lock (/var/lib/apt/lists/lock) that update takes. Verified on apt 2.8.3 (Ubuntu 24.04) with the relevant lock held and Timeout=6:
apt-get update→ fails in <1s (E: Could not get lock /var/lib/apt/lists/lock)apt-get install→ waits the full 6s (Waiting for cache lock…) before giving up
So _EE_APT_LOCK_UPDATE is effectively a no-op on every apt-get update (third-party repo add/drop, setup_php, check_dependencies, and setup.sh bootstrap). That's this PR's own target case: on a freshly provisioned VM the apt-daily/unattended-upgrades timer holds the lists lock while running apt-get update, so the first apt-get update here can still abort with "Could not get lock" even with EE_APT_LOCK_WAIT=1.
The install tiers are fine. For update, the timeout option can't help — wrap it in a retry/wait loop instead, e.g.:
ee_apt_update() {
local i
for i in $(seq 1 30); do
apt-get update && return 0
ee_log_info2 "apt lists lock busy; retrying in 10s"
sleep 10
done
apt-get update
}and call ee_apt_update wherever apt-get $_EE_APT_LOCK_UPDATE update runs today.
| # DPkg lock-wait tiers, activated by EE_APT_LOCK_WAIT. Duplicated from | ||
| # setup.sh so this file is self-contained when sourced by migrate.sh or | ||
| # remote-migrate. |
There was a problem hiding this comment.
This comment reads as "migrations honor EE_APT_LOCK_WAIT," but they mostly don't:
remote-migrateruns apt viarun_remote_command, i.e.ssh host "source …; $COMMAND"—EE_APT_LOCK_WAITis never forwarded, so the remote shell has it unset and every remoteapt-get(remote-migratelines 52/55/71, plus the sourcedsetup_php/check_dependencies) takes the empty-tier no-op path.migrate.sh's own bootstrapapt-get update/install(lines 20, 23) run beforesource helper-functions, so the tiers are undefined there.
The duplication does keep the file from erroring under standalone sourcing — it just doesn't deliver the wait to those paths. Either narrow the comment, or actually cover them (forward the opt-in into the run_remote_command ssh string, and apply the same handling to the migration bootstraps). Minimal comment fix:
| # DPkg lock-wait tiers, activated by EE_APT_LOCK_WAIT. Duplicated from | |
| # setup.sh so this file is self-contained when sourced by migrate.sh or | |
| # remote-migrate. | |
| # DPkg lock-wait tiers, activated by EE_APT_LOCK_WAIT. Duplicated from | |
| # setup.sh so the tier vars stay defined (no-op) when this file is sourced | |
| # standalone by migrate.sh / remote-migrate — note those scripts' own | |
| # apt-get calls and remote (ssh) installs do not inherit the wait. |
There was a problem hiding this comment.
Is updating the old migration scripts from v3 to v4 required? From my view, those scripts would mostly be run manually, and the env var can be exported if waiting is required for functions in functions. Updating the comment.
| # before touching the system. | ||
| ee_php_cli_installable() { | ||
| apt-get install -s "php${1}-cli" >/dev/null 2>&1 | ||
| apt-get $_EE_APT_LOCK_LIGHT_INSTALL install -s "php${1}-cli" >/dev/null 2>&1 |
There was a problem hiding this comment.
apt-get -s (simulate) disables locking entirely (Debug::NoLocking, per apt-get(8)), so $_EE_APT_LOCK_LIGHT_INSTALL is never consulted on this dry-run probe — it's dead here. Drop it:
| apt-get $_EE_APT_LOCK_LIGHT_INSTALL install -s "php${1}-cli" >/dev/null 2>&1 | |
| apt-get install -s "php${1}-cli" >/dev/null 2>&1 |
There was a problem hiding this comment.
https://linux.die.net/man/8/apt-get#:~:text=%2Ds%2C%20%2D%2Dsimulate,no%20consequence%20(rare).
Did not find any reference for this, so keeping it in for now. If it works, its ok, and if its a nop, its even better.
| if [ "$EE_LINUX_DISTRO" == "Ubuntu" ]; then | ||
| apt-get install -y software-properties-common curl ca-certificates | ||
| apt-get $_EE_APT_LOCK_INSTALL install -y software-properties-common curl ca-certificates | ||
| add-apt-repository -y ppa:ondrej/php || true |
There was a problem hiding this comment.
add-apt-repository runs its own internal apt-get update, which won't see the -o DPkg::Lock::Timeout token on the surrounding apt-get lines — so this step (among the most lock-prone on a fresh VM) isn't covered by EE_APT_LOCK_WAIT. The per-call-site -o injection can't reach sub-tool apt calls.
For the install-side locks, setting the timeout once via apt config is both simpler than threading a token through ~14 sites and reaches add-apt-repository too — write it before bootstrap:
# when EE_APT_LOCK_WAIT is set
printf 'DPkg::Lock::Timeout "1800";\n' > /etc/apt/apt.conf.d/99ee-lock-wait(and remove it on exit). Note this still won't help apt-get update — that lock isn't covered by DPkg::Lock::Timeout at all (see the note on the _EE_APT_LOCK_UPDATE definition).
There was a problem hiding this comment.
Instead decided to use the -n flag with add-apt-repository and do a manual update with flock gating. You can refer to the option here: https://manpages.ubuntu.com/manpages/jammy/man1/add-apt-repository.1.html#n,
| # When EE_APT_LOCK_WAIT is set, wait for the dpkg/apt lock instead of | ||
| # failing immediately. Tier variables are left empty (no-op) when unset. |
There was a problem hiding this comment.
EE_APT_LOCK_WAIT is tested with [ -n … ], so it's presence-only: EE_APT_LOCK_WAIT=0 / false / no all enable waiting, and a numeric value like EE_APT_LOCK_WAIT=300 is silently ignored (the hardcoded 900/1800 win). The name suggests a tunable duration, which invites both mistakes. Either make the value the timeout, or document it as an on/off flag:
| # When EE_APT_LOCK_WAIT is set, wait for the dpkg/apt lock instead of | |
| # failing immediately. Tier variables are left empty (no-op) when unset. | |
| # Any non-empty EE_APT_LOCK_WAIT (e.g. =1) enables the dpkg lock-wait; it is an | |
| # on/off flag, not a duration (the timeouts below are fixed). Tier variables | |
| # are left empty (no-op) when unset. |
mrrobot47
left a comment
There was a problem hiding this comment.
Re-reviewed the "fix: address review changes" commit (8baec11). The substantive items from the last pass are resolved:
apt-get updatelock-wait no-op →_EE_APT_LOCK_UPDATEremoved and replaced with theee_apt_updateretry loop (the right mechanism, sinceDPkg::Lock::Timeoutcan't cover the lists lock).-ssimulate probe → lock option dropped.add-apt-repository→-ny+ explicitee_apt_update(confirmed-ny/--no-updateis valid and skips the internal update on Ubuntu 24.04).functionscomment reworded to be honest about the migration/remote caveat and the on/off-flag semantics.
On the migration-scripts question (your reply): agreed — fine to leave the v3→v4 paths for the separate ssh-proxy / migrations follow-up, not blocking here.
A few smaller things on the new retry, inline — none are correctness blockers. The main one: the retry fires (with a "lock busy" message) on any hard apt error, not just a busy lock. One nit elsewhere: the setup.sh comment (lines 12-13) still says the tiers make apt "wait for the dpkg/apt lock … no-op when unset", but update no longer uses the timeout tier (it's the retry loop now) — worth reconciling with the updated functions comment.
| ee_apt_update() { | ||
| local i | ||
| for i in $(seq 1 30); do | ||
| apt-get update && return 0 | ||
| ee_log_info2 "apt lists lock busy; retrying in 10s" | ||
| sleep 10 | ||
| done | ||
| apt-get update | ||
| } |
There was a problem hiding this comment.
ee_apt_update retries on any non-zero apt-get update, not just a busy lock, and always prints "apt lists lock busy". Two consequences:
- A genuinely broken repo (404 / nonexistent suite / GPG error — I confirmed these exit 100 on apt 2.8.3) gets retried 30×10s ≈ 5 min before the real error surfaces, with a misleading "lock busy" line hiding the actual cause. (A transient/unreachable mirror is not affected —
apt-get updatereturns 0 there, so no retry.) - It's unconditional, so it runs even when
EE_APT_LOCK_WAITis unset — whereas theinstalltiers stay flag-gated. The two halves of the feature now disagree on whether lock-waiting is opt-in.
Consider retrying only when the failure is actually a lock, and surfacing other errors immediately. e.g. (the if-condition keeps it set -e-safe; mirror the same change in setup.sh's copy):
ee_apt_update() {
local i err
for i in $(seq 1 30); do
if err="$(apt-get update 2>&1)"; then printf '%s\n' "$err"; return 0; fi
printf '%s\n' "$err" >&2
printf '%s' "$err" | grep -qiE "Could not get lock|Unable to lock" || return 1
ee_log_info2 "apt lists lock busy; retrying in 10s"
sleep 10
done
apt-get update
}| apt-get install -y software-properties-common curl ca-certificates | ||
| add-apt-repository -y ppa:ondrej/php || true | ||
| apt-get $_EE_APT_LOCK_INSTALL install -y software-properties-common curl ca-certificates | ||
| (add-apt-repository -ny ppa:ondrej/php && ee_apt_update) || true |
There was a problem hiding this comment.
The && ee_apt_update here is redundant: nothing between this line and the unconditional ee_apt_update at line 287 reads the apt lists (the codename probe via get_ondrej_php_ppa_release_status is an HTTP check), so line 287 already performs the one refresh needed before ee_php_cli_installable. On a persistent failure this also doubles the retry wait (~10 min) for a single repo-add. Drop it and let line 287 update once:
| (add-apt-repository -ny ppa:ondrej/php && ee_apt_update) || true | |
| add-apt-repository -ny ppa:ondrej/php || true |
| # apt-get install -s uses Debug::NoLocking | ||
| # |
There was a problem hiding this comment.
Tighten this — line 249 is a whitespace-only # (no information), and the first line states a fact without the point. Per the concise-comments convention, say the why in one line:
| # apt-get install -s uses Debug::NoLocking | |
| # | |
| # -s (simulate) takes no dpkg lock, so a lock-wait option here would be a no-op |
Summary
Adds an opt-in
EE_APT_LOCK_WAITenvironment variable that makes allapt-getanddpkgoperations wait for existing locks to be released instead of failing immediately. This is useful on freshly provisioned servers where unattended-upgrades or other background package managers may hold the dpkg lock during EasyEngine installation.Changes
setup.sh: DefineEE_APT_LOCK_WAITand three tiered lock-timeout variables (_EE_APT_LOCK_UPDATE,_EE_APT_LOCK_LIGHT_INSTALL,_EE_APT_LOCK_INSTALL) that expand to-o DPkg::Lock::Timeout=<seconds>when enabled, or remain empty (no-op) when unset.functions: Duplicate the tier definitions so the file remains self-contained when sourced independently bymigrate.shorremote-migrate. Inject the lock-wait variables into everyapt-get update,apt-get install, and simulation (install -s) call.Lock timeout tiers
_EE_APT_LOCK_UPDATEapt-get update_EE_APT_LOCK_LIGHT_INSTALLgawk,sqlite3,iproute2, dry-run checks)_EE_APT_LOCK_INSTALLsoftware-properties-common)Usage
When
EE_APT_LOCK_WAITis unset or empty, behavior is unchanged —apt-getcommands run without any lock timeout option.