
feat: email retry queue for upstream provider failures #323

Open
hhvrc wants to merge 1 commit into develop from feature/email-retry-queue

hhvrc commented Jun 24, 2026

Contributor

What

Adds a retry queue so emails that fail to reach the upstream provider (Mailjet 5xx/429, SMTP transient errors, network blips) are persisted and retried instead of silently lost.

How it works

Send path (API). IEmailService is now a thin decorator (QueueingEmailService) over the raw IEmailSender:

  • Send succeeds → done.
  • Transient failure (EmailDeliveryException.IsTransient, i.e. provider 5xx or 429, or an unexpected error) → enqueue a QueuedEmail row and return normally (the user's signup / email-change never fails because of mail).
  • Permanent failure (other 4xx) → logged and dropped.
  • Password reset is intentionally not queued — a failed reset can just be re-requested, and we don't want a background job minting reset tokens.
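
The send-path rules above can be sketched as a small decision function. This is a minimal illustration, not the PR's QueueingEmailService: the `SendOutcome` enum, the `SendOrQueueAsync` helper, the `queueable` flag, and the stand-in `EmailDeliveryException` constructor are all hypothetical; only the `IsTransient` property and the ordering of the outcomes come from the description.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for the PR's exception type; only IsTransient is taken from the description.
public sealed class EmailDeliveryException : Exception
{
    public bool IsTransient { get; }

    public EmailDeliveryException(string message, bool isTransient) : base(message)
        => IsTransient = isTransient;
}

// Hypothetical result type, used here only to make the three outcomes observable.
public enum SendOutcome { Sent, Queued, Dropped }

public static class SendPathSketch
{
    // send: the raw provider call. enqueue: persists a token-free row for the retry worker.
    // queueable: false for mail that is never queued (password reset in the PR).
    public static async Task<SendOutcome> SendOrQueueAsync(
        Func<Task> send, Func<Task> enqueue, bool queueable)
    {
        try
        {
            await send();
            return SendOutcome.Sent;
        }
        catch (EmailDeliveryException ex) when (!ex.IsTransient)
        {
            // Permanent provider rejection: retrying would fail again.
            return SendOutcome.Dropped;
        }
        catch (Exception)
        {
            // Transient EmailDeliveryException or any unexpected error.
            if (!queueable) return SendOutcome.Dropped;

            try
            {
                await enqueue();
                return SendOutcome.Queued;
            }
            catch (Exception)
            {
                // The queue write failed too; the caller still must not see an error.
                return SendOutcome.Dropped;
            }
        }
    }
}
```

In every branch the method returns normally, which mirrors the guarantee that signup and email-change requests never fail because of mail.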

Storage. A queued_emails table with a queued_email_type discriminator and a jsonb payload whose shape depends on the type. Crucially, no token or link is ever stored — only { userId, email[, newEmail] }.
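
To make the token-free payload concrete, here is a rough sketch of what such a row's JSON could look like. The `QueuedEmailPayload` record, the enum member names, the `Guid` user ID type, and the serializer options are assumptions for illustration; the PR only states that the payload holds `{ userId, email[, newEmail] }`.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical member names; the PR only names the three email kinds in scope.
public enum QueuedEmailType { AccountActivation, EmailVerification, EmailChangeNotice }

// Hypothetical payload shape: identifiers only, never a token or link.
public sealed record QueuedEmailPayload(Guid UserId, string Email, string? NewEmail = null);

public static class PayloadSketch
{
    private static readonly JsonSerializerOptions Options = new()
    {
        // camelCase keys to match the { userId, email, newEmail } shape in the description.
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        // Omit newEmail entirely for email types that do not carry it.
        DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull,
    };

    // Produces the JSON that would be stored in the jsonb column.
    public static string Serialize(QueuedEmailPayload payload)
        => JsonSerializer.Serialize(payload, Options);
}
```

Because the record has no token field at all, a leaked or dumped queue table would expose addresses and user IDs but nothing that can activate an account or confirm an email change.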

Retry worker (Cron). A Hangfire [CronJob("* * * * *")] (ProcessEmailQueueJob) drains due rows via EmailQueueProcessor. For each row it looks the account up and regenerates a fresh token right before sending (rotating the activation request / pending email-change token), then sends through the raw sender. Each row is rescheduled with exponential backoff (NextAttemptAt) after a transient failure, given up on after a maximum number of attempts, and dropped immediately on permanent failure.
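
The backoff and give-up rule could look roughly like the following. The PR does not state the base delay, the cap, or the attempt limit, so `BaseDelay`, `MaxDelay`, and `MaxAttempts` below are placeholder values, and `BackoffSketch.NextAttemptAt` is a hypothetical helper rather than the code in EmailQueueProcessor.

```csharp
using System;

public static class BackoffSketch
{
    // Assumed values; the PR description does not give the real ones.
    public const int MaxAttempts = 8;
    private static readonly TimeSpan BaseDelay = TimeSpan.FromMinutes(1);
    private static readonly TimeSpan MaxDelay = TimeSpan.FromHours(6);

    // failedAttempts: how many sends have failed for this row so far (1 after the first failure).
    // Returns the next due time, or null when the row should be given up on.
    public static DateTimeOffset? NextAttemptAt(int failedAttempts, DateTimeOffset now)
    {
        if (failedAttempts >= MaxAttempts) return null;

        // Delay doubles per failure: 1, 2, 4, ... minutes. The shift is clamped so it cannot overflow.
        int shift = Math.Clamp(failedAttempts - 1, 0, 20);
        long ticks = Math.Min(BaseDelay.Ticks << shift, MaxDelay.Ticks);

        return now + TimeSpan.FromTicks(ticks);
    }
}
```

With a one-minute cron schedule, a row only becomes eligible again once `now` passes the stored value, so the job can run frequently without hammering a provider that is still failing.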

Scope: account activation, email verification, email-change notice.

Notable structural change

The email send infrastructure (IEmailSender + Mailjet/SMTP/None implementations, Fluid templates, MailKit, mail options, .liquid files) moved from API into Common, because the Cron worker references only Common and must be able to send. The application-facing IEmailService + decorator stay in API. IEmailService's queueable methods now take the target userId so the worker can regenerate links.

Migration

Autogenerated AddEmailQueue (against MigrationOpenShockContext, matching the existing migrations) — adds the queued_email_type enum, the queued_emails table, and an index on next_attempt_at. No data migration.

Tests

EmailQueueTests (9 tests) exercise the decorator and processor directly against the real test DB with a controllable fake sender:

  • decorator enqueues a token-free row on transient failure; does not enqueue on permanent failure or for password reset;
  • processor regenerates a working activation token (verified end-to-end via /account/activate), drops for already-activated accounts, reschedules with backoff on transient failure, gives up after max attempts;
  • email-change-notice resend and email-verification token regeneration (verified via /account/email-change/verify).

All 9 pass; the existing 15 MailTests (real flows via Mailpit) still pass through the new decorator.

🤖 Generated with Claude Code



When the upstream email provider fails transiently (5xx / 429 / network),
the email is now queued and retried instead of being lost.

- Move the email send infrastructure (IEmailSender + Mailjet/SMTP/None,
  Fluid templates, MailKit, mail options, .liquid files) from API into
  Common so both the API send path and the Cron worker can use it.
- Add IEmailService.QueueingEmailService decorator (API): send now, and on
  a transient/unexpected failure enqueue a token-free QueuedEmail row;
  permanent failures and password resets are logged and dropped.
- QueuedEmail entity: a queued_email_type discriminator + a jsonb payload
  whose shape depends on the type. Tokens are never persisted; the worker
  regenerates a fresh token right before each resend.
- Drain the queue from a Hangfire [CronJob] (Cron/ProcessEmailQueueJob)
  every minute via EmailQueueProcessor, with per-row exponential backoff,
  a max-attempts give-up, and drop-on-permanent-failure.
- Scope: account activation, email verification, email-change notice.
  Password reset is intentionally excluded.
- Autogenerated EF migration AddEmailQueue (MigrationOpenShockContext).
- Integration tests covering the decorator and the processor.
@hhvrc hhvrc force-pushed the feature/email-retry-queue branch from a4eb7de to 3fdf6a8 Compare June 24, 2026 11:51

stage-review Bot commented Jun 24, 2026


Ready to review this PR? Stage has broken it down into 6 individual chapters for you:

Title
1 Move email infrastructure from API to Common
2 Define email queue schema and models
3 Implement core retry logic and transport
4 Wire retry queue into API and AccountService
5 Register background job and tests
6 Other changes

Chapters generated by Stage for commit 3fdf6a8 on Jun 24, 2026 11:51am UTC.

try
{
    // ... send through the raw sender (elided in this excerpt) ...
}
catch (EmailDeliveryException ex) when (!ex.IsTransient)
{
    // Permanent provider rejection: retrying would just fail again.
    _logger.LogError(ex, "Permanent failure sending {EmailType} email; dropping", type);
}
catch (Exception ex)
{
    // Transient EmailDeliveryException or any unexpected error: queue for retry.
    _logger.LogWarning(ex, "Failed to send {EmailType} email; queueing for retry", type);
    try
    {
        // ... enqueue the QueuedEmail row (elided in this excerpt) ...
    }
    catch (Exception enqueueEx)
    {
        // Last resort: the email is lost, but a queue write failure must not crash the caller.
        _logger.LogError(enqueueEx, "Failed to queue {EmailType} email for retry", type);
    }
}