Retry storm · double charge · 2% of checkout · support flood
Payment Ops · Incident brief
The Payment Service That Double-Charged Customers
Live evidence
- Stripe webhook (T+5m): Elevated capture timeouts (18%); clients reporting 504 on /v1/capture
- Finance Slack (T+45m): Reconciliation batch: $240k duplicate captures across 2% of checkouts
- Support queue (T+1h): Ticket volume +400%: 'charged twice for same order'
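A minimal sketch of how a reconciliation batch might flag duplicate captures like the ones Finance reported. The `Capture` record and its field names are hypothetical, not the team's actual schema; the rule assumed here is that any successful capture beyond the first for the same order is a refund candidate.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Capture:
    capture_id: str    # hypothetical PSP capture identifier
    order_id: str      # hypothetical merchant order identifier
    amount_cents: int
    status: str        # "succeeded" or "failed"


def find_duplicate_captures(captures):
    """Return {order_id: [extra captures]} for orders captured more than once."""
    by_order = defaultdict(list)
    for capture in captures:
        if capture.status == "succeeded":
            by_order[capture.order_id].append(capture)
    # Treat the first successful capture as legitimate; the rest are refund candidates.
    return {oid: caps[1:] for oid, caps in by_order.items() if len(caps) > 1}


def duplicate_total_cents(duplicates):
    """Sum the amounts of all extra captures."""
    return sum(c.amount_cents for extras in duplicates.values() for c in extras)


# Illustrative data only, not figures from the incident.
captures = [
    Capture("cap_1", "order_A", 5000, "succeeded"),
    Capture("cap_2", "order_A", 5000, "succeeded"),  # client retry after a timeout
    Capture("cap_3", "order_B", 1200, "succeeded"),
    Capture("cap_4", "order_C", 800, "failed"),
    Capture("cap_5", "order_C", 800, "succeeded"),   # retry after a real failure: not a duplicate
]
dupes = find_duplicate_captures(captures)
```

Failed attempts are ignored on purpose: a retry that follows a genuine failure is a correct charge, not a duplicate.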
Problem statement
When capture requests to the PSP timed out, clients retried them. Because the first capture had often succeeded at the PSP despite the timeout, the retry created a second charge: 2% of checkouts were double-charged, and reconciliation found $240k in duplicate captures.
The architecture sketch shows synchronous PSP calls, no idempotency store, and retries delegated to clients.
- 2% of checkouts double-charged after PSP timeout + client retry.
- Support tickets spiked 400% in 2 hours.
- Retry storm increased PSP error rate to 18%.
- No idempotency keys on capture endpoint.
- Reconciliation batch found $240k in duplicate captures.
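The missing idempotency keys are the core gap. The sketch below shows one way a capture endpoint could deduplicate retries, assuming the client sends the same key on every retry of a given checkout. `FakePSP`, `IdempotentCaptureService`, and the in-memory dictionary are all hypothetical stand-ins; a real service would likely need a durable store with a unique constraint on the key.

```python
import threading


class FakePSP:
    """Stand-in for the external PSP: records every charge it performs."""

    def __init__(self):
        self.charges = []

    def capture(self, order_id, amount_cents):
        self.charges.append((order_id, amount_cents))
        return {"order_id": order_id, "amount_cents": amount_cents, "status": "captured"}


class IdempotentCaptureService:
    """Capture wrapper that calls the PSP at most once per idempotency key."""

    def __init__(self, psp):
        self._psp = psp
        self._results = {}              # idempotency key -> stored response
        self._lock = threading.Lock()   # simplification: serializes all captures

    def capture(self, idempotency_key, order_id, amount_cents):
        with self._lock:
            if idempotency_key in self._results:
                # Retry of a request already processed: replay the stored response.
                return self._results[idempotency_key]
            result = self._psp.capture(order_id, amount_cents)
            self._results[idempotency_key] = result
            return result


psp = FakePSP()
service = IdempotentCaptureService(psp)
first = service.capture("checkout-123", "order_A", 5000)  # imagine the client sees a 504 here
retry = service.capture("checkout-123", "order_A", 5000)  # same key, so no second charge
```

With this shape, a client that times out and retries gets the original result back, and the PSP records a single charge. The sketch does not cover a key reused with different parameters, which a fuller version would probably reject.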
Architecture
The whiteboard sketch is the team's incomplete draft from a design review, not a correct or complete architecture. It omits major runtime paths and components implied by the incident.
Impacted services
- Payment API (critical): Duplicate captures; retry amplification
- External PSP (degraded): 18% error rate under retry storm
- PostgreSQL (degraded): Lock contention on payment rows
- Support / Finance (critical): 400% ticket spike; manual refunds
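Retry amplification is usually damped on the client side with capped exponential backoff and jitter. The sketch below is one possible shape, not the team's client code: `RetryableError`, `call_with_retries`, and the default delays are assumptions for illustration. It only limits load on the PSP; retries remain unsafe unless the capture itself is idempotent.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical error type for timeouts and 5xx responses."""


def backoff_delays(max_attempts, base=0.5, cap=8.0, rng=random.random):
    """Yield one full-jitter delay (in seconds) before each retry."""
    for attempt in range(max_attempts - 1):
        # Exponential growth, capped, then scaled by a random factor in [0, 1).
        yield rng() * min(cap, base * 2 ** attempt)


def call_with_retries(fn, max_attempts=4, sleep=time.sleep, rng=random.random):
    """Call fn, retrying on RetryableError up to max_attempts total attempts."""
    delays = backoff_delays(max_attempts, rng=rng)
    while True:
        try:
            return fn()
        except RetryableError:
            delay = next(delays, None)
            if delay is None:
                raise  # attempts exhausted; surface the failure to the caller
            sleep(delay)
```

The bounded attempt count keeps a struggling PSP from being hit indefinitely, and the jitter spreads retries out so that clients that timed out together do not retry together.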