feat: email retry queue for upstream provider failures #323
Open
hhvrc wants to merge 1 commit into
When the upstream email provider fails transiently (5xx / 429 / network), the email is now queued and retried instead of being lost.

- Move the email send infrastructure (IEmailSender + Mailjet/SMTP/None, Fluid templates, MailKit, mail options, .liquid files) from API into Common so both the API send path and the Cron worker can use it.
- Add a QueueingEmailService decorator for IEmailService (API): send immediately, and on a transient or unexpected failure enqueue a token-free QueuedEmail row; permanent failures and password resets are logged and dropped.
- QueuedEmail entity: a queued_email_type discriminator plus a jsonb payload whose shape depends on the type. Tokens are never persisted; the worker regenerates a fresh token right before each resend.
- Drain the queue every minute from a Hangfire [CronJob] (Cron/ProcessEmailQueueJob) via EmailQueueProcessor, with per-row exponential backoff, a max-attempts give-up, and drop-on-permanent-failure.
- Scope: account activation, email verification, email-change notice. Password reset is intentionally excluded.
- Autogenerated EF migration AddEmailQueue (MigrationOpenShockContext).
- Integration tests covering the decorator and the processor.
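To illustrate what "token-free" means for the jsonb payload, here is a minimal C# sketch assuming System.Text.Json serialization and the { userId, email[, newEmail] } shape the description gives for queued rows. BuildQueuedEmailPayload is a hypothetical helper, not the PR's entity mapping, and which email types carry newEmail is an assumption.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: builds the jsonb payload for one queued row.
// Only identifiers and addresses are persisted; no token or link is written,
// because the worker mints a fresh token right before each resend.
static string BuildQueuedEmailPayload(Guid userId, string email, string? newEmail = null) =>
    newEmail is null
        ? JsonSerializer.Serialize(new { userId, email })            // e.g. activation / verification (assumed)
        : JsonSerializer.Serialize(new { userId, email, newEmail }); // e.g. email-change notice (assumed)

// Usage with placeholder values.
var sampleUserId = Guid.Parse("0b8f3c1e-6a2d-4f4e-9c1a-2f5d7e8a9b10");
Console.WriteLine(BuildQueuedEmailPayload(sampleUserId, "user@example.com"));
Console.WriteLine(BuildQueuedEmailPayload(sampleUserId, "old@example.com", "new@example.com"));
```

Because nothing secret is stored, a queued row that sits in the table for a while cannot leak a still-valid activation or email-change link.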
a4eb7de to 3fdf6a8
Chapters generated by Stage for commit 3fdf6a8 on Jun 24, 2026 11:51am UTC.
Review excerpt (decorator error handling):

    catch (EmailDeliveryException ex) when (!ex.IsTransient)
    {
        // Permanent provider rejection: retrying would just fail again.
        _logger.LogError(ex, "Permanent failure sending {EmailType} email; dropping", type);
        // ...
    }
    catch (Exception ex)
    {
        // Transient EmailDeliveryException or any unexpected error: queue for retry.
        _logger.LogWarning(ex, "Failed to send {EmailType} email; queueing for retry", type);
        // ...
        catch (Exception enqueueEx)
        {
            // Last resort: the email is lost, but a queue write failure must not crash the caller.
            _logger.LogError(enqueueEx, "Failed to queue {EmailType} email for retry", type);
            // ...
        }
    }
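The catch blocks above boil down to a small decision table. This sketch writes it as two pure C# functions. IsTransientStatus and OnSendFailure are hypothetical names. The 5xx/429 rule comes from the PR description, and the password-reset exclusion comes from the commit message. The real decorator works on exceptions, not status codes.

```csharp
using System;

// Hypothetical helper: the PR treats provider 5xx and 429 responses as transient.
// (Network failures would surface as exceptions rather than a status code.)
static bool IsTransientStatus(int httpStatus) =>
    httpStatus == 429 || (httpStatus >= 500 && httpStatus <= 599);

// Hypothetical decision mirroring the decorator's catch blocks:
//   permanent provider rejection          -> "drop"    (retrying would fail again)
//   password reset (never queued)         -> "drop"
//   transient failure or unexpected error -> "enqueue" (caller still returns normally)
static string OnSendFailure(bool isPermanentProviderError, bool isPasswordReset) =>
    (isPermanentProviderError || isPasswordReset) ? "drop" : "enqueue";

// Usage: a 503 from the provider is transient, so an activation email is queued.
bool permanent = !IsTransientStatus(503);
Console.WriteLine(OnSendFailure(permanent, isPasswordReset: false)); // prints "enqueue"
```

Treating "unexpected error" the same as "transient" errs on the side of retrying. A permanent failure is the only case where the provider has explicitly said that retrying is pointless.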
What
Adds a retry queue so emails that fail to reach the upstream provider (Mailjet 5xx/429, SMTP transient errors, network blips) are persisted and retried instead of silently lost.
How it works
Send path (API). IEmailService is now a thin decorator (QueueingEmailService) over the raw IEmailSender:
- it sends the email immediately;
- on a transient failure (EmailDeliveryException.IsTransient, i.e. provider 5xx or 429) or an unexpected error → enqueue a QueuedEmail row and return normally (the user's signup or email change never fails because of mail);
- on a permanent failure, or any failure of a password-reset email → log and drop.

Storage. A queued_emails table with a queued_email_type discriminator and a jsonb payload whose shape depends on the type. Crucially, no token or link is ever stored, only { userId, email[, newEmail] }.

Retry worker (Cron). A Hangfire [CronJob("* * * * *")] job (ProcessEmailQueueJob) drains due rows via EmailQueueProcessor. For each row it looks up the account and regenerates a fresh token right before sending (rotating the activation request or pending email-change token), then sends through the raw sender. Failures are handled per row: exponential backoff (NextAttemptAt), a give-up after the maximum number of attempts, and an immediate drop on permanent failure.

Scope: account activation, email verification, email-change notice.
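The retry worker's per-row handling can be sketched as two pure C# functions. This is a hedged illustration, not EmailQueueProcessor itself. ComputeNextAttemptAt and ProcessRow are hypothetical names, and the string outcomes stand in for whatever the real code does to the row. The 1-minute base delay, 1-hour cap and max-attempts values are placeholders, because the PR does not state them.

```csharp
using System;

// Hypothetical backoff: baseDelay * 2^(failedAttempts - 1), capped at maxDelay.
// Returns null once failedAttempts reaches maxAttempts, meaning "give up".
static DateTimeOffset? ComputeNextAttemptAt(
    DateTimeOffset now, int failedAttempts, int maxAttempts, TimeSpan baseDelay, TimeSpan maxDelay)
{
    if (failedAttempts >= maxAttempts)
        return null;

    double ticks = baseDelay.Ticks * Math.Pow(2, failedAttempts - 1);
    return now + TimeSpan.FromTicks((long)Math.Min(ticks, maxDelay.Ticks));
}

// Hypothetical handling of one due row. sendToken stands in for the raw sender and
// returns "ok", "transient" or "permanent". Placeholder policy: 1 min base, 1 h cap.
static string ProcessRow(
    bool accountStillNeedsEmail, Func<string> regenerateToken, Func<string, string> sendToken,
    DateTimeOffset now, int failedAttemptsSoFar, int maxAttempts)
{
    // e.g. the account was activated in the meantime: nothing left to send.
    if (!accountStillNeedsEmail)
        return "drop: no longer needed";

    // A fresh token is minted right before each attempt; the row never stores one.
    string result = sendToken(regenerateToken());

    if (result == "ok") return "sent";
    if (result == "permanent") return "drop: permanent failure";

    DateTimeOffset? next = ComputeNextAttemptAt(
        now, failedAttemptsSoFar + 1, maxAttempts, TimeSpan.FromMinutes(1), TimeSpan.FromHours(1));
    return next is DateTimeOffset nextAt ? $"reschedule at {nextAt:O}" : "give up: max attempts reached";
}
```

Minting the token inside the per-row step, just before the send, is what lets the stored row stay token-free. Each retry goes out with a link that was generated moments earlier, rather than one created when the first attempt failed.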
Notable structural change
The email send infrastructure (IEmailSender + Mailjet/SMTP/None implementations, Fluid templates, MailKit, mail options, .liquid files) moved from API into Common, because the Cron worker references only Common and must be able to send. The application-facing IEmailService and its decorator stay in API. IEmailService's queueable methods now take the target userId so the worker can regenerate links.
Migration
Autogenerated AddEmailQueue migration (against MigrationOpenShockContext, matching the existing migrations): adds the queued_email_type enum, the queued_emails table, and an index on next_attempt_at. No data migration.
Tests
EmailQueueTests (9 tests) exercise the decorator and processor directly against the real test DB with a controllable fake sender:
- … /account/activate), drops for already-activated accounts, reschedules with backoff on transient failure, gives up after max attempts;
- … /account/email-change/verify).

All 9 pass; the existing 15 MailTests (real flows via Mailpit) still pass through the new decorator.

🤖 Generated with Claude Code