custom domains: middleware and verification#2897
custom domains: middleware and verification#2897Soxasora wants to merge 157 commits intostackernews:masterfrom
Conversation
- ACM support - custom domains crud, resolvers, fragments - custom domains form, guidelines - custom domains context - domain verification every 5 minutes via pgboss - domain validation schema - basic custom domains middleware, to be completed - TODOs tracings
- CustomDomain -> Domain - DomainVerification table - CNAME, TXT, SSL verification types - WIP DomainVerification upsert
…ange status of a Record from its Attempt, multi-purpose dns verification
…tch as they can fail
- use DomainVerificationStatus enum for domains and records - adapt Territory Form UI to new schema - return 'records' as an object with its types - wip: prepare for attempts and certificate usage for prisma
fix: - fix setDomain mutation transaction - fix schema typedefs enhance: - DNS records guidelines with flex-wrap for longer records cleanup: - add comments to worker - remove console.log on validation values
… HOLD handle territory changes via triggers - on territory stop, HOLD the domain - on territory takeover from another user, delete the domain and its associated records handle ACM certificates via trigger - on domain/domainCertificate deletion, ask ACM to delete the certificate via a pgboss job; removes the need to ask ACM in multiple places clear domains that have been on HOLD for more than 30 days, check every midnight via pgboss schedule use 'domains' profile for worker jobs
…not valid or we're in production
…on pgboss retry jobs
…l error if it fails
…igration: DomainStatus and RecordStatus enums; cleanup and docs consistency aws: - `requestCertificate` now calls `certDetails` once and returns the certificate other than its certificateArn - DNS validation values for SSL are now obtained by the certificate we got during request ^ schema: - remove `DomainVerificationStatus` - `DomainStatus` enum for domain lifecycle - `RecordStatus` enum for domain verification - adapt domainVerification worker to use the new enums
… same certificate if requested multiple times in 1 hour
…sed for personalization
…y're already unique with their own indexes
| // main domain | ||
| const SN_MAIN_DOMAIN = new URL(process.env.NEXT_PUBLIC_URL) | ||
| // territory paths that needs to be rewritten to ~subname | ||
| const SN_TERRITORY_PATHS = ['/new', '/top', '/post', '/edit', '/rss'] |
There was a problem hiding this comment.
Missing /search from territory paths causes broken scoping
High Severity
SN_TERRITORY_PATHS is missing /search. The search component uses usePrefix(sub) which returns '' on custom domains, producing a /search URL. Since /search isn't in the territory paths list, the middleware won't rewrite it to /~subName/search, so search on custom domains loses territory scoping and becomes a global search instead.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit c80b68e. Configure here.
There was a problem hiding this comment.
Search is postponed to the custom domains UI/UX PR which will focus on paths, custom domain <-> SN relationship and more.
This PR addresses custom domain access and verification.
| {append} | ||
| </span> | ||
| ) | ||
| } |
There was a problem hiding this comment.
CopyButton append mode never shows copied visual feedback
Low Severity
The new append rendering path in CopyButton renders a static {append} element and never reflects the copied state visually. The default button mode swaps its content to a Thumb icon on success, but the append mode always renders the same ClipboardLine icon regardless of whether the copy succeeded. Users relying on the clipboard icon (used in DomainGuidelines) get no inline feedback beyond the toast notification.
Reviewed by Cursor Bugbot for commit 616ce1f. Configure here.
There was a problem hiding this comment.
This all looks sane and well organized to me. Hard to tell exactly how it'll interact with all the reverse proxies, but that's what the admin gating is for.
Not blocking, but I had a bit of trouble navigating the state machine.
IMO there a few areas where my confidence is not super high:
- state machine related issues
- interactions with aws (out of our control mostly until we're in prod)
- header handling in the proxy (you made it dead simple but I need to walk through it another time or two)
I'll continue reviewing tonight to gain more confidence in those areas, but with admin gating, this is good to go afaict.
You mentioned you had some minor fixes you wanted to make before merging so I'll wait for your green light.
| # reachable by containers on 172.30.0.2(:53), outside of docker with 0.0.0.0:5353 | ||
| DOMAINS_DNS_SERVER=172.30.0.2 | ||
| # custom domains debugging | ||
| NEXT_PUBLIC_CUSTOM_DOMAINS_DEBUG=1 |
There was a problem hiding this comment.
Do we need to add this to the service worker?
huumn
left a comment
There was a problem hiding this comment.
found a few non-blocking things in the proxy
| // NextJS recommends to not add the CSP header to prefetches and static assets | ||
| // See https://nextjs.org/docs/app/building-your-application/configuring/content-security-policy | ||
| { | ||
| source: '/((?!api|_next/static|_error|404|500|offline|_next/image|_next/webpack-hmr|favicon.ico).*)', |
There was a problem hiding this comment.
Because we exclude /api, calls to /api/graphql do not strip x-stackernews-subname. This isn't a problem now because we don't read the header outside of getserversideprops, but if that changes we'll be reading a spoofable x-stackernews-subname. I'm not sure why that might change though.
It's at least worth commenting on. We could also go further running a little header stripping middleware when source: '/((?api|_next/static|_error|404|500|offline|_next/image|_next/webpack-hmr|favicon.ico).*)'.
| // TODO: handle auth sync | ||
|
|
||
| // clean up the pathname from any subname | ||
| if (pathname.startsWith('/~')) { |
There was a problem hiding this comment.
https://www.pizza.com/~bitcoin is redirected to https://www.pizza.com/bitcoin. It's kind of low severity given it'd have to be manually input, but it might be worth redirecting to https://stacker.news/~bitcoin or 404ing.
If we redirect https://www.pizza.com/~bitcoin to https://stacker.news/~bitcoin in the proxy, then we can probably avoid doing badge rewrites in components.
Not blocking either. Just a suggestion.
There was a problem hiding this comment.
https://www.pizza.com/~bitcoin redirects to https://www.pizza.com since that part removes the territory name.
It would be much easier to redirect to stacker.news via proxy indeed. I'll implement this!
There was a problem hiding this comment.
I've been trying to implement this but in dev this can't work due to a Next.js bug: vercel/next.js#44482
It seems like redirecting from www.pizza.com/~bitcoin to localhost:3000/~bitcoin won't work because localhost will be ignored and it will be treated as a relative redirect (www.pizza.com/~bitcoin) causing a redirect loop.
If I instead redirect to, idk, stacker.news it will work.
In production it should work, but the fact that it's untestable and will actually break the middleware in dev... makes this a bit weird to work with.
I think we can postpone this to the UI/UX PR, I like the idea of being able to redirect to stacker news in one place, probably we need to be creative.
|
I found these mermaid diagrams helpful for understanding how things progress through domain verification, aws calls, and the proxy. stateDiagram-v2
[*] --> PENDING: setDomain creates row
PENDING --> ACTIVE: verifyDomain returns VERIFIED
PENDING --> HOLD: 48h elapsed OR 3 AWS retries exhausted OR Sub STOPPED
ACTIVE --> HOLD: Sub STOPPED trigger
HOLD --> PENDING: setDomain re-verify on same domain
HOLD --> [*]: clearLongHeldDomains after 30 days
ACTIVE --> [*]: Sub owner changes
HOLD --> [*]: Sub owner changes
sequenceDiagram
participant W as worker
participant A as ACM
participant E as ELBv2
W->>A: "requestCertificate(domain, IdempotencyToken=domainId)"
A-->>W: certificateArn
W->>A: describeCertificate(arn)
A-->>W: "{ Status: PENDING_VALIDATION, ResourceRecord }"
Note right of W: store arn and SSL record in DB
loop every 30s to 5m while PENDING
W->>A: describeCertificate(arn)
A-->>W: Status
end
W->>E: "addListenerCertificates(listener, arn)"
E-->>W: "ok (no-op if already attached)"
Note over W,E: Domain ACTIVE
Note over W: Later, domain deleted or HOLD fires cleanup
W->>E: "removeListenerCertificates(listener, arn)"
E-->>W: "ok (no-op if not attached)"
W->>A: deleteCertificate(arn)
A-->>W: "ok (errors if cert in use)"
flowchart TB
Req[Incoming request] --> Strip[proxy.js strips x-stacker-news-subname]
Strip --> Ref[referrerMiddleware]
Ref --> HostCheck{host equals NEXT_PUBLIC_URL.host?}
HostCheck -- yes --> Sec[applySecurityHeaders]
HostCheck -- no --> Map{getDomainMapping returns ACTIVE row?}
Map -- no --> Sec
Map -- yes --> Custom[customDomainMiddleware]
Custom --> Rewrite{path starts with slash-tilde?}
Rewrite -- yes --> Redirect[redirect to cleaned path]
Rewrite -- no --> SubParam{has ?sub and mismatches?}
SubParam -- yes --> RedirectFix[redirect with sub coerced]
SubParam -- no --> IsTerr{path is / or territory path?}
IsTerr -- yes --> RW[rewrite to /tilde-sub/path and set header]
IsTerr -- no --> Pass[NextResponse.next and set header]
Redirect --> Cookies[applyReferrerCookies]
RedirectFix --> Cookies
RW --> Cookies
Pass --> Cookies
Cookies --> Sec
|
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 3 total unresolved issues (including 2 from previous reviews).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 832b462. Configure here.
| } | ||
| } | ||
|
|
||
| return applySecurityHeaders(referrerResp) |
There was a problem hiding this comment.
Referrer header lost for custom domain requests
Medium Severity
The x-stacker-news-referrer request header set by referrerMiddleware is silently dropped for custom domain requests. referrerMiddleware builds new request headers (including the referrer) inside its returned NextResponse, but customDomainMiddleware independently creates its own headers from the original request.headers, which never contain the referrer. applyReferrerCookies only transfers cookies, not the referrer request header. This breaks implicit content-referrer attribution on custom domains, since ssrApollo.js reads x-stacker-news-referrer for one-day referral rewards.
Additional Locations (2)
Reviewed by Cursor Bugbot for commit 832b462. Configure here.


Description
Based on #2904, updated to Next.js 16
Part of #1942, revives #2145.
Introduces custom domains via Next.js proxy. This PR focuses on domain creation and verification.
The proxy detects custom domains and applies redirects and rewrites to generate a territory-centric SN version. Territory paths lose the
~subNameprefix, and attempts to go outside of the domain's territory are corrected via redirect.1Verification is a multi-step repeating process that involves DNS verification, AWS certificates creation/verification and attachment to the AWS load balancer listener.
AWS is simulated with Localstack on sndev.
More infos are available inside
docs/dev/custom-domains.md(todo: not finished)Screenshots
Domain verification starts right when a custom domain is saved
sndev: DNS is simulated with dnsmasq
sndev: AWS certificates (ACM) are simulated with Localstack and are automatically verified on the next call
custom domain is verified
custom domain overview
Concerns and known issues
SN_TERRITORY_PATHSis hardcoded and may drift@@index([domainName])domainName(line 918: @unique), which implicitly creates a B-tree index. The explicit@@index([domainName])is redundant.STOPPEDstatus can be handled via JS, but at the same time these triggers help us handle a custom domain lifecycle in the safest way possible.Checklist
Are your changes backwards compatible? Please answer below:
Yes, everything is an addition. An exception is the way we handle S3 via localstack, now the container is called AWS and also allows ACM simulations other than S3.
On a scale of 1-10 how well and how have you QA'd this change and any features it might affect? Please answer below:
7, QA OK, domain verification handles errors correctly and follows its step, middleware correctly detects a custom domain and obtains its connected sub.
For frontend changes: Tested on mobile, light and dark mode? Please answer below:
Yes, it uses standard SN colors and styles that fits with theme changes.
Did you introduce any new environment variables? If so, call them out explicitly here:
An endpoint for localstack that can be contacted by our worker container
LOCALSTACK_ENDPOINT=http://aws:4566local DNS server via dnsmasq, with fallback to 1.1.1.1 in both dnsmasq and code.
DOMAINS_DNS_SERVER=172.20.0.2:53tells Next.js what domains (sox.pizza) or wildcards (**.pizza) can be trusted for HMR in dev.
ALLOWED_DEV_ORIGINS=**.sndevcustom domains debug logs are gated by env var, by default they're disabled (0)
NEXT_PUBLIC_CUSTOM_DOMAINS_DEBUG=0Did you use AI?
review
Note
High Risk
Touches request middleware/rewrites, introduces new background jobs, and adds DB triggers plus ACM/ELB certificate management, which can affect routing and production availability if misconfigured.
Overview
Adds custom domains for territories end-to-end: a new GraphQL
domain/setDomainAPI plus UI in territory settings to set/reset a domain and display DNS (CNAME + SSL) verification instructions with polling.Implements a background
domainVerificationworker that repeatedly verifies DNS vianode:dns, provisions/monitors ACM certificates, and attaches/detaches certificates on an ALB listener (mocked ELBv2 in dev); Prisma addsDomain*tables/enums and several triggers/scheduled cleanup jobs to manage lifecycle (HOLD on territory stop/takeover, certificate cleanup, purge long-held domains).Updates the Next.js
proxymiddleware to detect mapped custom domains, injectx-stacker-news-subname, and rewrite/redirect territory routes to hide~subName; the frontend is adjusted to be domain-aware (prefix/nav/link behavior, external sub links) and dev infra/envs are updated (Localstackawscontainer + newdomainscompose profile, new debug/dev origin envs).Reviewed by Cursor Bugbot for commit 832b462. Bugbot is set up for automated code reviews on this repo. Configure here.
Footnotes
I'm not sure if this should be the behavior, maybe it's better to redirect/open a new tab to stacker.news ↩