Skip to content

feat: add option to wait on apt locks#47

Open
L0RD-ZER0 wants to merge 2 commits into
masterfrom
feat/apt-lock-waiting
Open

feat: add option to wait on apt locks#47
L0RD-ZER0 wants to merge 2 commits into
masterfrom
feat/apt-lock-waiting

Conversation

@L0RD-ZER0

Copy link
Copy Markdown

Summary

Adds an opt-in EE_APT_LOCK_WAIT environment variable that makes all apt-get and dpkg operations wait for existing locks to be released instead of failing immediately. This is useful on freshly provisioned servers where unattended-upgrades or other background package managers may hold the dpkg lock during EasyEngine installation.

Changes

  • setup.sh: Define EE_APT_LOCK_WAIT and three tiered lock-timeout variables (_EE_APT_LOCK_UPDATE, _EE_APT_LOCK_LIGHT_INSTALL, _EE_APT_LOCK_INSTALL) that expand to -o DPkg::Lock::Timeout=<seconds> when enabled, or remain empty (no-op) when unset.
  • functions: Duplicate the tier definitions so the file remains self-contained when sourced independently by migrate.sh or remote-migrate. Inject the lock-wait variables into every apt-get update, apt-get install, and simulation (install -s) call.

Lock timeout tiers

Variable Timeout Used for
_EE_APT_LOCK_UPDATE 900s (15 min) apt-get update
_EE_APT_LOCK_LIGHT_INSTALL 900s (15 min) Lightweight installs (gawk, sqlite3, iproute2, dry-run checks)
_EE_APT_LOCK_INSTALL 1800s (30 min) Heavier installs (PHP, extensions, software-properties-common)

Usage

EE_APT_LOCK_WAIT=1 bash setup.sh

When EE_APT_LOCK_WAIT is unset or empty, behavior is unchanged — apt-get commands run without any lock timeout option.

Signed-off-by: L0RD-ZER0 <68327382+L0RD-ZER0@users.noreply.github.com>
@L0RD-ZER0 L0RD-ZER0 requested a review from mrrobot47 June 24, 2026 06:01

@mrrobot47 mrrobot47 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a closer look at the apt lock-wait handling — nice idea and the install-side tiers work as intended. A few gaps stand out, most importantly that the update path does not actually wait (the apt lists lock is not covered by DPkg::Lock::Timeout, verified on apt 2.8.3). Details and suggestions inline.

Comment thread functions Outdated
# setup.sh so this file is self-contained when sourced by migrate.sh or
# remote-migrate.
if [ -n "${EE_APT_LOCK_WAIT:-}" ]; then
_EE_APT_LOCK_UPDATE="-o DPkg::Lock::Timeout=900"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DPkg::Lock::Timeout does not cover apt-get update. It waits on the dpkg frontend lock (/var/lib/dpkg/lock-frontend) used by install, but not the apt lists lock (/var/lib/apt/lists/lock) that update takes. Verified on apt 2.8.3 (Ubuntu 24.04) with the relevant lock held and Timeout=6:

  • apt-get update → fails in <1s (E: Could not get lock /var/lib/apt/lists/lock)
  • apt-get install → waits the full 6s (Waiting for cache lock…) before giving up

So _EE_APT_LOCK_UPDATE is effectively a no-op on every apt-get update (third-party repo add/drop, setup_php, check_dependencies, and setup.sh bootstrap). That's this PR's own target case: on a freshly provisioned VM the apt-daily/unattended-upgrades timer holds the lists lock while running apt-get update, so the first apt-get update here can still abort with "Could not get lock" even with EE_APT_LOCK_WAIT=1.

The install tiers are fine. For update, the timeout option can't help — wrap it in a retry/wait loop instead, e.g.:

ee_apt_update() {
  local i
  for i in $(seq 1 30); do
    apt-get update && return 0
    ee_log_info2 "apt lists lock busy; retrying in 10s"
    sleep 10
  done
  apt-get update
}

and call ee_apt_update wherever apt-get $_EE_APT_LOCK_UPDATE update runs today.

Comment thread functions Outdated
Comment on lines +6 to +8
# DPkg lock-wait tiers, activated by EE_APT_LOCK_WAIT. Duplicated from
# setup.sh so this file is self-contained when sourced by migrate.sh or
# remote-migrate.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment reads as "migrations honor EE_APT_LOCK_WAIT," but they mostly don't:

  • remote-migrate runs apt via run_remote_command, i.e. ssh host "source …; $COMMAND"EE_APT_LOCK_WAIT is never forwarded, so the remote shell has it unset and every remote apt-get (remote-migrate lines 52/55/71, plus the sourced setup_php/check_dependencies) takes the empty-tier no-op path.
  • migrate.sh's own bootstrap apt-get update/install (lines 20, 23) run before source helper-functions, so the tiers are undefined there.

The duplication does keep the file from erroring under standalone sourcing — it just doesn't deliver the wait to those paths. Either narrow the comment, or actually cover them (forward the opt-in into the run_remote_command ssh string, and apply the same handling to the migration bootstraps). Minimal comment fix:

Suggested change
# DPkg lock-wait tiers, activated by EE_APT_LOCK_WAIT. Duplicated from
# setup.sh so this file is self-contained when sourced by migrate.sh or
# remote-migrate.
# DPkg lock-wait tiers, activated by EE_APT_LOCK_WAIT. Duplicated from
# setup.sh so the tier vars stay defined (no-op) when this file is sourced
# standalone by migrate.sh / remote-migrate — note those scripts' own
# apt-get calls and remote (ssh) installs do not inherit the wait.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is updating the old migration scripts from v3 to v4 required? From my view, those scripts would mostly be run manually, and the env var can be exported if waiting is required for functions in functions. Updating the comment.

Comment thread functions Outdated
# before touching the system.
ee_php_cli_installable() {
apt-get install -s "php${1}-cli" >/dev/null 2>&1
apt-get $_EE_APT_LOCK_LIGHT_INSTALL install -s "php${1}-cli" >/dev/null 2>&1

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apt-get -s (simulate) disables locking entirely (Debug::NoLocking, per apt-get(8)), so $_EE_APT_LOCK_LIGHT_INSTALL is never consulted on this dry-run probe — it's dead here. Drop it:

Suggested change
apt-get $_EE_APT_LOCK_LIGHT_INSTALL install -s "php${1}-cli" >/dev/null 2>&1
apt-get install -s "php${1}-cli" >/dev/null 2>&1

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://linux.die.net/man/8/apt-get#:~:text=%2Ds%2C%20%2D%2Dsimulate,no%20consequence%20(rare).

Did not find any reference for this, so keeping it in for now. If it works, its ok, and if its a nop, its even better.

Comment thread functions Outdated
if [ "$EE_LINUX_DISTRO" == "Ubuntu" ]; then
apt-get install -y software-properties-common curl ca-certificates
apt-get $_EE_APT_LOCK_INSTALL install -y software-properties-common curl ca-certificates
add-apt-repository -y ppa:ondrej/php || true

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add-apt-repository runs its own internal apt-get update, which won't see the -o DPkg::Lock::Timeout token on the surrounding apt-get lines — so this step (among the most lock-prone on a fresh VM) isn't covered by EE_APT_LOCK_WAIT. The per-call-site -o injection can't reach sub-tool apt calls.

For the install-side locks, setting the timeout once via apt config is both simpler than threading a token through ~14 sites and reaches add-apt-repository too — write it before bootstrap:

# when EE_APT_LOCK_WAIT is set
printf 'DPkg::Lock::Timeout "1800";\n' > /etc/apt/apt.conf.d/99ee-lock-wait

(and remove it on exit). Note this still won't help apt-get update — that lock isn't covered by DPkg::Lock::Timeout at all (see the note on the _EE_APT_LOCK_UPDATE definition).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead decided to use the -n flag with add-apt-repository and do a manual update with flock gating. You can refer to the option here: https://manpages.ubuntu.com/manpages/jammy/man1/add-apt-repository.1.html#n,

Comment thread setup.sh
Comment on lines +12 to +13
# When EE_APT_LOCK_WAIT is set, wait for the dpkg/apt lock instead of
# failing immediately. Tier variables are left empty (no-op) when unset.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EE_APT_LOCK_WAIT is tested with [ -n … ], so it's presence-only: EE_APT_LOCK_WAIT=0 / false / no all enable waiting, and a numeric value like EE_APT_LOCK_WAIT=300 is silently ignored (the hardcoded 900/1800 win). The name suggests a tunable duration, which invites both mistakes. Either make the value the timeout, or document it as an on/off flag:

Suggested change
# When EE_APT_LOCK_WAIT is set, wait for the dpkg/apt lock instead of
# failing immediately. Tier variables are left empty (no-op) when unset.
# Any non-empty EE_APT_LOCK_WAIT (e.g. =1) enables the dpkg lock-wait; it is an
# on/off flag, not a duration (the timeouts below are fixed). Tier variables
# are left empty (no-op) when unset.

@mrrobot47 mrrobot47 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed the "fix: address review changes" commit (8baec11). The substantive items from the last pass are resolved:

  • apt-get update lock-wait no-op → _EE_APT_LOCK_UPDATE removed and replaced with the ee_apt_update retry loop (the right mechanism, since DPkg::Lock::Timeout can't cover the lists lock).
  • -s simulate probe → lock option dropped.
  • add-apt-repository-ny + explicit ee_apt_update (confirmed -ny/--no-update is valid and skips the internal update on Ubuntu 24.04).
  • functions comment reworded to be honest about the migration/remote caveat and the on/off-flag semantics.

On the migration-scripts question (your reply): agreed — fine to leave the v3→v4 paths for the separate ssh-proxy / migrations follow-up, not blocking here.

A few smaller things on the new retry, inline — none are correctness blockers. The main one: the retry fires (with a "lock busy" message) on any hard apt error, not just a busy lock. One nit elsewhere: the setup.sh comment (lines 12-13) still says the tiers make apt "wait for the dpkg/apt lock … no-op when unset", but update no longer uses the timeout tier (it's the retry loop now) — worth reconciling with the updated functions comment.

Comment thread functions
Comment on lines +21 to +29
ee_apt_update() {
local i
for i in $(seq 1 30); do
apt-get update && return 0
ee_log_info2 "apt lists lock busy; retrying in 10s"
sleep 10
done
apt-get update
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ee_apt_update retries on any non-zero apt-get update, not just a busy lock, and always prints "apt lists lock busy". Two consequences:

  • A genuinely broken repo (404 / nonexistent suite / GPG error — I confirmed these exit 100 on apt 2.8.3) gets retried 30×10s ≈ 5 min before the real error surfaces, with a misleading "lock busy" line hiding the actual cause. (A transient/unreachable mirror is not affected — apt-get update returns 0 there, so no retry.)
  • It's unconditional, so it runs even when EE_APT_LOCK_WAIT is unset — whereas the install tiers stay flag-gated. The two halves of the feature now disagree on whether lock-waiting is opt-in.

Consider retrying only when the failure is actually a lock, and surfacing other errors immediately. e.g. (the if-condition keeps it set -e-safe; mirror the same change in setup.sh's copy):

ee_apt_update() {
  local i err
  for i in $(seq 1 30); do
    if err="$(apt-get update 2>&1)"; then printf '%s\n' "$err"; return 0; fi
    printf '%s\n' "$err" >&2
    printf '%s' "$err" | grep -qiE "Could not get lock|Unable to lock" || return 1
    ee_log_info2 "apt lists lock busy; retrying in 10s"
    sleep 10
  done
  apt-get update
}

Comment thread functions
apt-get install -y software-properties-common curl ca-certificates
add-apt-repository -y ppa:ondrej/php || true
apt-get $_EE_APT_LOCK_INSTALL install -y software-properties-common curl ca-certificates
(add-apt-repository -ny ppa:ondrej/php && ee_apt_update) || true

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The && ee_apt_update here is redundant: nothing between this line and the unconditional ee_apt_update at line 287 reads the apt lists (the codename probe via get_ondrej_php_ppa_release_status is an HTTP check), so line 287 already performs the one refresh needed before ee_php_cli_installable. On a persistent failure this also doubles the retry wait (~10 min) for a single repo-add. Drop it and let line 287 update once:

Suggested change
(add-apt-repository -ny ppa:ondrej/php && ee_apt_update) || true
add-apt-repository -ny ppa:ondrej/php || true

Comment thread functions
Comment on lines +248 to +249
# apt-get install -s uses Debug::NoLocking
#

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tighten this — line 249 is a whitespace-only # (no information), and the first line states a fact without the point. Per the concise-comments convention, say the why in one line:

Suggested change
# apt-get install -s uses Debug::NoLocking
#
# -s (simulate) takes no dpkg lock, so a lock-wait option here would be a no-op

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants