mirror of
https://github.com/chatwoot/chatwoot.git
synced 2026-06-04 21:02:35 +08:00
Outgoing email on a Microsoft email inbox was failing with `535 5.7.3 Authentication unsuccessful` immediately after the user re-authenticated, while incoming (IMAP) continued to work. The root cause is in our OAuth callback: we persist the id_token's `email` claim into `channel.imap_login`, but Microsoft's SMTP AUTH (XOAUTH2) validates the username against the access token's **UPN**, not the mailbox's primary SMTP address or aliases. When those diverge — common in tenants that use one domain for sign-in identities and another for mailbox addresses — SMTP rejects every send. This PR makes `OauthCallbackController#update_channel` prefer `preferred_username` (v2.0) / `upn` (v1.0) from the id_token over `email`, with `email` as the fallback so Google flows are unchanged. ## What changed - `app/controllers/oauth_callback_controller.rb` — extract `imap_login_identity` (default: `users_data['email']`, same as before). `update_channel` now calls it instead of inlining the email claim. No behavioural change in the base controller. - `app/controllers/microsoft/callbacks_controller.rb` — override `imap_login_identity` to return `preferred_username || upn || super`. Provider-specific knowledge stays in the provider subclass; Google's flow is literally untouched. - `spec/controllers/microsoft/callbacks_controller_spec.rb` — adds one example that reproduces the divergent shape (`email: <alias>`, `preferred_username: <upn>`) and asserts `imap_login` lands on the UPN while `channel.email` stays on the mailbox alias. `channel.email` and `find_channel_by_email` still key on the id_token's `email` claim everywhere, so customer-facing From identity and reconnect-by-email matching are unchanged. Behavioural matrix: | Provider / shape | `imap_login` before | `imap_login` after | |-----------------------------------------------------------------|---------------------|--------------------| | Microsoft, `upn == email` (common case) | email | UPN (= email, same string) | | Microsoft, `upn != email` (e.g. UPN on one domain, mailbox alias on another) | alias (broken) | UPN (works) | | Google | email | email (no change, base impl used) | ## Why this happens (Microsoft side, for context) Two facts that together produce the bug — one is documented, the other we verified empirically because Microsoft's docs don't address it: 1. **`email` and `upn` are different claims and can legitimately diverge.** In Entra, the UPN is the sign-in identity; the id_token's `email` claim is the user's mailbox property (which can be a proxy address). For v2.0 tokens (which is what we use — `/common/oauth2/v2.0/token` in `MicrosoftConcern`), the documented "username to sign in as" claim is `preferred_username`. See [ID token claims reference](https://learn.microsoft.com/en-us/entra/identity-platform/id-token-claims-reference). So `users_data['email']` was the wrong source for `imap_login`; tenants where UPN == primary SMTP happened to mask the bug. 2. **Exchange's XOAUTH2 SMTP rejects aliases in the `user=` field, even though IMAP accepts them.** This asymmetry is **not** in [Microsoft's canonical XOAUTH2 doc](https://learn.microsoft.com/en-us/exchange/client-developer/legacy-protocols/how-to-authenticate-an-imap-pop-smtp-application-by-using-oauth) — the doc treats the SASL `user=` field uniformly across IMAP, POP, and SMTP, with only a shared-mailbox carve-out spelled out. We confirmed the SMTP-strict behaviour empirically by running an XOAUTH2 AUTH probe with the same access token: `user=<alias>` returned `535 5.7.3`, `user=<UPN>` returned `235`. Same token, only the `user=` field differed. That's what tied the symptom to claim-shape. Hence the patch: read the documented "sign-in name" claim and persist that, not the customer-facing mailbox address. ## How to reproduce (before the fix) 1. Set up a Microsoft email inbox in a tenant where the user's UPN domain differs from the user's primary SMTP / mailbox domain (e.g. UPN `user@tenant-a.example`, mailbox `User@tenant-b.example` where `tenant-b.example` is a proxy address on the same mailbox). 2. Connect or re-authenticate the inbox through the standard flow. 3. Inspect the channel: ```ruby ch.imap_login # => "User@tenant-b.example" (alias — wrong) token = ch.provider_config['access_token'] JSON.parse(Base64.urlsafe_decode64(token.split('.')[1].then { |s| s + '=' * (-s.length % 4) }))['upn'] # => "user@tenant-a.example" (UPN — what SMTP actually needs) ``` 4. Send a reply. SMTP returns `535 5.7.3 Authentication unsuccessful`. Incoming IMAP continues to work fine. After the fix, `imap_login` is set to the UPN at callback time and SMTP succeeds. ## Notes for review - The id_token decoding path (`users_data`) already exists and is trusted by the rest of the callback — no new attack surface. - Scoped to Microsoft on purpose. A provider-agnostic fallback chain in the base controller would also have worked (Google id_tokens don't carry `preferred_username` / `upn`, so it'd land on email anyway), but keeping Google's code path identical to today removes any risk of an unintended interaction with whatever a future Google update might add to its id_token. Pattern matches the existing `find_channel_by_email` override Google already has. - The [id_token claims reference](https://learn.microsoft.com/en-us/entra/identity-platform/id-token-claims-reference) warns that `preferred_username` is mutable and "can't be used to make authorization decisions." That warning targets apps using the claim as a stable cross-session identifier for app-level authz — we're not. We use it as the SASL `user=` string for SMTP AUTH, validated by Microsoft against the same token it just issued. The claim's mutability matters only if a tenant admin renames a UPN between re-auths, in which case the stored `imap_login` goes stale and SMTP 535s — but the pre-patch code has the identical mutability characteristic on the `email` claim (rename a mailbox's primary SMTP and you get the same stale-then-535). The patch doesn't enlarge that failure surface; it shrinks it, because UPN-vs-alias divergence is now handled rather than always-broken. - Related but out of scope: SMTP failures don't trigger `channel.authorization_error!` today. `ExceptionList::SMTP_EXCEPTIONS` (`lib/exception_list.rb`) only contains `Net::SMTPSyntaxError`; `Net::SMTPAuthenticationError` bubbles unhandled, and `ApplicationMailer#handle_smtp_exceptions` only logs. So a 535 — whether from this bug or any other identity drift — never prompts re-auth in the UI, unlike the IMAP path (`fetch_imap_emails_job` catches `OAuth2::Error` and calls `authorization_error!`). Worth a separate PR; flagging here so we don't pretend stale-identity SMTP errors are self-healing. - The alternative I considered — decoding the access token's `upn` directly — is more correct in principle but relies on Microsoft access tokens being JWTs, which is undocumented behaviour for the `outlook.office.com` audience and contradicts the [docs guidance](https://learn.microsoft.com/en-us/entra/identity-platform/access-tokens) that access tokens are opaque to clients. - **Not addressed here, flagging for follow-up:** existing channels that were re-authenticated before this fix still hold the alias in `imap_login` and will keep 535ing until the user re-auths again. A one-shot rake task could decode the stored access tokens and reconcile, but fixing forward via natural re-auth is lower-risk; let me know if you want the backfill. - Also out of scope: `app/mailers/conversation_reply_mailer_helper.rb` reads `provider_config['access_token']` raw without going through `Microsoft::RefreshOauthTokenService`, while the IMAP path refreshes. That's a real asymmetry but unrelated to this bug (the token here was minutes-old) and worth its own PR. |
||
|---|---|---|
| .. | ||
| callbacks_controller.rb | ||