close
Skip to content

custom domains: middleware and verification#2897

Open
Soxasora wants to merge 157 commits intostackernews:masterfrom
Soxasora:feat/custom-domains-base
Open

custom domains: middleware and verification#2897
Soxasora wants to merge 157 commits intostackernews:masterfrom
Soxasora:feat/custom-domains-base

Conversation

@Soxasora
Copy link
Copy Markdown
Member

@Soxasora Soxasora commented Mar 31, 2026

Description

Based on #2904, updated to Next.js 16

Part of #1942, revives #2145.
Introduces custom domains via Next.js proxy. This PR focuses on domain creation and verification.

The proxy detects custom domains and applies redirects and rewrites to generate a territory-centric SN version. Territory paths lose the ~subName prefix, and attempts to go outside of the domain's territory are corrected via redirect.1

Verification is a multi-step repeating process that involves DNS verification, AWS certificates creation/verification and attachment to the AWS load balancer listener.
AWS is simulated with Localstack on sndev.

More infos are available inside docs/dev/custom-domains.md (todo: not finished)

Screenshots

Screenshot 2026-04-08 at 17 49 36

Domain verification starts right when a custom domain is saved

Screenshot 2026-04-08 at 17 33 21

sndev: DNS is simulated with dnsmasq

Screenshot 2026-04-08 at 17 34 01

sndev: AWS certificates (ACM) are simulated with Localstack and are automatically verified on the next call

Screenshot 2026-04-08 at 17 37 53

custom domain is verified

Screenshot 2026-04-08 at 17 38 30

custom domain overview

Screenshot 2026-04-08 at 17 40 13

Concerns and known issues

  • s3 cors is not dynamic
  • cannot signup or login from scratch
  • polling time of domain verification (territory domain form) can be faster than 30 seconds
  • SN_TERRITORY_PATHS is hardcoded and may drift
    • If new territory-scoped routes are added, this list must be manually updated.
  • Redundant index on domainName
    • *claude thought
    • @@index([domainName])
    • There's already a unique constraint on domainName (line 918: @unique), which implicitly creates a B-tree index. The explicit @@index([domainName]) is redundant.
  • seo thumbnails not working for custom domains but we could open capture.stacker.news to other custom domains.
    • dynamic?
  • there are 5 db triggers and probably we don't need that much obscure behavior
    • triggers like the deletion of domain certificates on territory STOPPED status can be handled via JS, but at the same time these triggers help us handle a custom domain lifecycle in the safest way possible.
    • needs to be checked.

Checklist

Are your changes backwards compatible? Please answer below:
Yes, everything is an addition. An exception is the way we handle S3 via localstack, now the container is called AWS and also allows ACM simulations other than S3.

On a scale of 1-10 how well and how have you QA'd this change and any features it might affect? Please answer below:
7, QA OK, domain verification handles errors correctly and follows its step, middleware correctly detects a custom domain and obtains its connected sub.

For frontend changes: Tested on mobile, light and dark mode? Please answer below:
Yes, it uses standard SN colors and styles that fits with theme changes.

Did you introduce any new environment variables? If so, call them out explicitly here:
An endpoint for localstack that can be contacted by our worker container
LOCALSTACK_ENDPOINT=http://aws:4566

local DNS server via dnsmasq, with fallback to 1.1.1.1 in both dnsmasq and code.
DOMAINS_DNS_SERVER=172.20.0.2:53

tells Next.js what domains (sox.pizza) or wildcards (**.pizza) can be trusted for HMR in dev.
ALLOWED_DEV_ORIGINS=**.sndev

custom domains debug logs are gated by env var, by default they're disabled (0)
NEXT_PUBLIC_CUSTOM_DOMAINS_DEBUG=0

Did you use AI?
review


Note

High Risk
Touches request middleware/rewrites, introduces new background jobs, and adds DB triggers plus ACM/ELB certificate management, which can affect routing and production availability if misconfigured.

Overview
Adds custom domains for territories end-to-end: a new GraphQL domain/setDomain API plus UI in territory settings to set/reset a domain and display DNS (CNAME + SSL) verification instructions with polling.

Implements a background domainVerification worker that repeatedly verifies DNS via node:dns, provisions/monitors ACM certificates, and attaches/detaches certificates on an ALB listener (mocked ELBv2 in dev); Prisma adds Domain* tables/enums and several triggers/scheduled cleanup jobs to manage lifecycle (HOLD on territory stop/takeover, certificate cleanup, purge long-held domains).

Updates the Next.js proxy middleware to detect mapped custom domains, inject x-stacker-news-subname, and rewrite/redirect territory routes to hide ~subName; the frontend is adjusted to be domain-aware (prefix/nav/link behavior, external sub links) and dev infra/envs are updated (Localstack aws container + new domains compose profile, new debug/dev origin envs).

Reviewed by Cursor Bugbot for commit 832b462. Bugbot is set up for automated code reviews on this repo. Configure here.

Footnotes

  1. I'm not sure if this should be the behavior, maybe it's better to redirect/open a new tab to stacker.news

Soxasora and others added 30 commits May 1, 2025 18:38
- ACM support
- custom domains crud, resolvers, fragments
- custom domains form, guidelines
- custom domains context
- domain verification every 5 minutes via pgboss
- domain validation schema
- basic custom domains middleware, to be completed
- TODOs tracings
- CustomDomain -> Domain
- DomainVerification table
- CNAME, TXT, SSL verification types
- WIP DomainVerification upsert
…ange status of a Record from its Attempt, multi-purpose dns verification
- use DomainVerificationStatus enum for domains and records
- adapt Territory Form UI to new schema
- return 'records' as an object with its types
- wip: prepare for attempts and certificate usage for prisma
fix:
- fix setDomain mutation transaction
- fix schema typedefs

enhance:
- DNS records guidelines with flex-wrap for longer records

cleanup:
- add comments to worker
- remove console.log on validation values
… HOLD

handle territory changes via triggers
- on territory stop, HOLD the domain
- on territory takeover from another user, delete the domain and its associated records

handle ACM certificates via trigger
- on domain/domainCertificate deletion, ask ACM to delete the certificate via a pgboss job; removes the need to ask ACM in multiple places

clear domains that have been on HOLD for more than 30 days, check every midnight via pgboss schedule

use 'domains' profile for worker jobs
…igration: DomainStatus and RecordStatus enums; cleanup and docs consistency

aws:
- `requestCertificate` now calls `certDetails` once and returns the certificate other than its certificateArn
- DNS validation values for SSL are now obtained by the certificate we got during request ^

schema:
- remove `DomainVerificationStatus`
- `DomainStatus` enum for domain lifecycle
- `RecordStatus` enum for domain verification
- adapt domainVerification worker to use the new enums
Comment thread lib/validate.js Outdated
Comment thread worker/domainVerification.js Outdated
Comment thread lib/validate.js
Comment thread components/territory-domains.js Outdated
Comment thread proxy.js
Comment thread api/resolvers/domain.js
Comment thread proxy.js
// main domain
const SN_MAIN_DOMAIN = new URL(process.env.NEXT_PUBLIC_URL)
// territory paths that needs to be rewritten to ~subname
const SN_TERRITORY_PATHS = ['/new', '/top', '/post', '/edit', '/rss']
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing /search from territory paths causes broken scoping

High Severity

SN_TERRITORY_PATHS is missing /search. The search component uses usePrefix(sub) which returns '' on custom domains, producing a /search URL. Since /search isn't in the territory paths list, the middleware won't rewrite it to /~subName/search, so search on custom domains loses territory scoping and becomes a global search instead.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit c80b68e. Configure here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Search is postponed to the custom domains UI/UX PR which will focus on paths, custom domain <-> SN relationship and more.
This PR addresses custom domain access and verification.

Comment thread worker/domainVerification.js Outdated
Comment thread worker/domainVerification.js Outdated
Comment thread api/resolvers/domain.js
Comment thread components/form.js
{append}
</span>
)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CopyButton append mode never shows copied visual feedback

Low Severity

The new append rendering path in CopyButton renders a static {append} element and never reflects the copied state visually. The default button mode swaps its content to a Thumb icon on success, but the append mode always renders the same ClipboardLine icon regardless of whether the copy succeeded. Users relying on the clipboard icon (used in DomainGuidelines) get no inline feedback beyond the toast notification.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 616ce1f. Configure here.

Copy link
Copy Markdown
Member

@huumn huumn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks sane and well organized to me. Hard to tell exactly how it'll interact with all the reverse proxies, but that's what the admin gating is for.

Not blocking, but I had a bit of trouble navigating the state machine.

IMO there a few areas where my confidence is not super high:

  1. state machine related issues
  2. interactions with aws (out of our control mostly until we're in prod)
  3. header handling in the proxy (you made it dead simple but I need to walk through it another time or two)

I'll continue reviewing tonight to gain more confidence in those areas, but with admin gating, this is good to go afaict.

You mentioned you had some minor fixes you wanted to make before merging so I'll wait for your green light.

Comment thread .env.development
# reachable by containers on 172.30.0.2(:53), outside of docker with 0.0.0.0:5353
DOMAINS_DNS_SERVER=172.30.0.2
# custom domains debugging
NEXT_PUBLIC_CUSTOM_DOMAINS_DEBUG=1
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add this to the service worker?

Copy link
Copy Markdown
Member

@huumn huumn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

found a few non-blocking things in the proxy

Comment thread proxy.js
// NextJS recommends to not add the CSP header to prefetches and static assets
// See https://nextjs.org/docs/app/building-your-application/configuring/content-security-policy
{
source: '/((?!api|_next/static|_error|404|500|offline|_next/image|_next/webpack-hmr|favicon.ico).*)',
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we exclude /api, calls to /api/graphql do not strip x-stackernews-subname. This isn't a problem now because we don't read the header outside of getserversideprops, but if that changes we'll be reading a spoofable x-stackernews-subname. I'm not sure why that might change though.

It's at least worth commenting on. We could also go further running a little header stripping middleware when source: '/((?api|_next/static|_error|404|500|offline|_next/image|_next/webpack-hmr|favicon.ico).*)'.

Comment thread proxy.js
// TODO: handle auth sync

// clean up the pathname from any subname
if (pathname.startsWith('/~')) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://www.pizza.com/~bitcoin is redirected to https://www.pizza.com/bitcoin. It's kind of low severity given it'd have to be manually input, but it might be worth redirecting to https://stacker.news/~bitcoin or 404ing.

If we redirect https://www.pizza.com/~bitcoin to https://stacker.news/~bitcoin in the proxy, then we can probably avoid doing badge rewrites in components.

Not blocking either. Just a suggestion.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://www.pizza.com/~bitcoin redirects to https://www.pizza.com since that part removes the territory name.

It would be much easier to redirect to stacker.news via proxy indeed. I'll implement this!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been trying to implement this but in dev this can't work due to a Next.js bug: vercel/next.js#44482

It seems like redirecting from www.pizza.com/~bitcoin to localhost:3000/~bitcoin won't work because localhost will be ignored and it will be treated as a relative redirect (www.pizza.com/~bitcoin) causing a redirect loop.
If I instead redirect to, idk, stacker.news it will work.

In production it should work, but the fact that it's untestable and will actually break the middleware in dev... makes this a bit weird to work with.

I think we can postpone this to the UI/UX PR, I like the idea of being able to redirect to stacker news in one place, probably we need to be creative.

@huumn
Copy link
Copy Markdown
Member

huumn commented Apr 20, 2026

I found these mermaid diagrams helpful for understanding how things progress through domain verification, aws calls, and the proxy.

stateDiagram-v2
  [*] --> PENDING: setDomain creates row
  PENDING --> ACTIVE: verifyDomain returns VERIFIED
  PENDING --> HOLD: 48h elapsed OR 3 AWS retries exhausted OR Sub STOPPED
  ACTIVE --> HOLD: Sub STOPPED trigger
  HOLD --> PENDING: setDomain re-verify on same domain
  HOLD --> [*]: clearLongHeldDomains after 30 days
  ACTIVE --> [*]: Sub owner changes
  HOLD --> [*]: Sub owner changes
Loading
sequenceDiagram
  participant W as worker
  participant A as ACM
  participant E as ELBv2
  W->>A: "requestCertificate(domain, IdempotencyToken=domainId)"
  A-->>W: certificateArn
  W->>A: describeCertificate(arn)
  A-->>W: "{ Status: PENDING_VALIDATION, ResourceRecord }"
  Note right of W: store arn and SSL record in DB
  loop every 30s to 5m while PENDING
    W->>A: describeCertificate(arn)
    A-->>W: Status
  end
  W->>E: "addListenerCertificates(listener, arn)"
  E-->>W: "ok (no-op if already attached)"
  Note over W,E: Domain ACTIVE
  Note over W: Later, domain deleted or HOLD fires cleanup
  W->>E: "removeListenerCertificates(listener, arn)"
  E-->>W: "ok (no-op if not attached)"
  W->>A: deleteCertificate(arn)
  A-->>W: "ok (errors if cert in use)"
Loading
flowchart TB
  Req[Incoming request] --> Strip[proxy.js strips x-stacker-news-subname]
  Strip --> Ref[referrerMiddleware]
  Ref --> HostCheck{host equals NEXT_PUBLIC_URL.host?}
  HostCheck -- yes --> Sec[applySecurityHeaders]
  HostCheck -- no --> Map{getDomainMapping returns ACTIVE row?}
  Map -- no --> Sec
  Map -- yes --> Custom[customDomainMiddleware]
  Custom --> Rewrite{path starts with slash-tilde?}
  Rewrite -- yes --> Redirect[redirect to cleaned path]
  Rewrite -- no --> SubParam{has ?sub and mismatches?}
  SubParam -- yes --> RedirectFix[redirect with sub coerced]
  SubParam -- no --> IsTerr{path is / or territory path?}
  IsTerr -- yes --> RW[rewrite to /tilde-sub/path and set header]
  IsTerr -- no --> Pass[NextResponse.next and set header]
  Redirect --> Cookies[applyReferrerCookies]
  RedirectFix --> Cookies
  RW --> Cookies
  Pass --> Cookies
  Cookies --> Sec
Loading

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 832b462. Configure here.

Comment thread proxy.js
}
}

return applySecurityHeaders(referrerResp)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Referrer header lost for custom domain requests

Medium Severity

The x-stacker-news-referrer request header set by referrerMiddleware is silently dropped for custom domain requests. referrerMiddleware builds new request headers (including the referrer) inside its returned NextResponse, but customDomainMiddleware independently creates its own headers from the original request.headers, which never contain the referrer. applyReferrerCookies only transfers cookies, not the referrer request header. This breaks implicit content-referrer attribution on custom domains, since ssrApollo.js reads x-stacker-news-referrer for one-day referral rewards.

Additional Locations (2)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 832b462. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature new product features that weren't there before territories

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants