<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Technical Content Feed]]></title><description><![CDATA[Product, Engineering, and Marketing updates from the developers of Sentry.]]></description><link>https://blog.sentry.io</link><generator>GatsbyJS</generator><lastBuildDate>Tue, 21 Apr 2026 18:57:03 GMT</lastBuildDate><item><title><![CDATA[No more monkey-patching: Better observability with tracing channels]]></title><description><![CDATA[Almost every production application uses a number of different tools and libraries, whether that’s a library to communicate with a database, a cache, or framewor...]]></description><link>https://blog.sentry.io/observability-with-tracing-channels/</link><guid isPermaLink="false">https://blog.sentry.io/observability-with-tracing-channels/</guid><pubDate>Tue, 21 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Almost every production application uses a number of different tools and libraries, whether that’s a library to communicate with a database, a cache, or frameworks like Nest.js or Nitro. To be able to observe what’s going on in production, application developers reach for &lt;a href=&quot;https://sentry.io/solutions/application-performance-monitoring/&quot;&gt;Application Performance Monitoring&lt;/a&gt; (APM) tools like Sentry. &lt;/p&gt;&lt;p&gt;But there’s an inherent problem: the performance data that APM tools need most often doesn’t come natively from the libraries themselves. 
The task of getting this data is delegated to APM tools like Sentry or &lt;a href=&quot;https://blog.sentry.io/send-your-existing-opentelemetry-traces/&quot;&gt;OpenTelemetry&lt;/a&gt;, which instrument crucial functionality of a library on its behalf.&lt;/p&gt;&lt;h2&gt;What is instrumentation?&lt;/h2&gt;&lt;p&gt;The most fundamental requirement to make an application observable is the ability to instrument each of its components and the libraries it uses. &lt;b&gt;Instrumentation&lt;/b&gt; is the process of adding code to a program to monitor and analyze its internal operations and generate diagnostic data. It’s exactly what the Sentry SDKs and OpenTelemetry instrumentation are doing under the hood.&lt;/p&gt;&lt;p&gt;Consider a typical HTTP client library. Application developers want to know when a request starts and completes, along with some metadata like URL, status code, and headers. Today, libraries handle this inconsistently: some provide custom hooks like &lt;code&gt;emitter.on(&amp;#39;request&amp;#39;, ...)&lt;/code&gt;, while others offer vendor-specific middleware to intercept requests. In these cases, Sentry and OpenTelemetry can write plugins that emit observability data.&lt;/p&gt;&lt;p&gt;This works, but it puts the burden on the library or framework (e.g. Nuxt) to consciously design an instrumentation API and identify the right places to expose it. Hooks and interceptors allow injecting observability code at the correct spots, but APM maintainers are entirely dependent on library authors to keep those APIs stable over time. On top of that, there is no shared convention (each library exposes different hook shapes and different metadata), so APM maintainers must write and maintain very different plugins for each library.&lt;/p&gt;&lt;h2&gt;How server-side JavaScript is instrumented&lt;/h2&gt;&lt;p&gt;The traditional approach to JavaScript instrumentation is “monkey-patching”. 
That’s modifying library code at runtime so that library functions not only do their original job, but also emit observability data. This is only possible in CommonJS (CJS), where modules are mutable and synchronously loaded.&lt;/p&gt;&lt;p&gt;However, the ecosystem is shifting. As server-side JavaScript moves further toward ES Modules (ESM), this approach breaks down. ES modules are immutable and loaded asynchronously, which means you simply can&amp;#39;t patch imports at runtime the same way anymore. For further information, the &lt;a href=&quot;https://github.com/getsentry/esm-observability-guide&quot;&gt;ESM Observability Instrumentation Guide&lt;/a&gt; covers this topic in greater detail.&lt;/p&gt;&lt;p&gt;The current workaround (and a way to “patch” imports) is using Module Customization Hooks paired with the &lt;code&gt;--import&lt;/code&gt; flag. A popular hook is &lt;code&gt;import-in-the-middle/hook.mjs&lt;/code&gt;. It works, but it&amp;#39;s brittle, complex, and feels like what it is: a workaround.&lt;/p&gt;&lt;p&gt;Both monkey-patching in CJS and Module Customization Hooks in ESM share the same fundamental flaw: they apply instrumentation “from the outside”. The library itself is passive. The question worth asking is: &lt;b&gt;what if libraries were active participants in their own observability and emitted telemetry data themselves? &lt;/b&gt;&lt;/p&gt;&lt;p&gt;This would be possible through diagnostics APIs like Tracing Channels.&lt;/p&gt;&lt;h2&gt;Libraries should emit their own telemetry&lt;/h2&gt;&lt;p&gt;Rather than waiting for APM tools to reach in and grab data, libraries can proactively expose their internal operations using tools built directly into the runtime. The right tool for this is &lt;b&gt;Diagnostics Channels&lt;/b&gt;, and more specifically, &lt;b&gt;Tracing Channels&lt;/b&gt;. 
These features are being developed by the &lt;a href=&quot;https://github.com/nodejs/diagnostics&quot;&gt;Node.js Diagnostics Working Group&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;A huge shoutout to &lt;a href=&quot;https://github.com/qard&quot;&gt;Stephen Belanger&lt;/a&gt;, the creator of the &lt;code&gt;diagnostics_channel&lt;/code&gt; API in Node.js, who founded the working group and has been instrumental in pushing this topic forward. He&amp;#39;s been providing feedback on proposals and acting as a voice of authority, which is sometimes exactly what&amp;#39;s needed to convince library maintainers to get on board.&lt;/p&gt;&lt;h3&gt;Diagnostics Channels&lt;/h3&gt;&lt;p&gt;&lt;a href=&quot;https://nodejs.org/api/diagnostics_channel.html&quot;&gt;Diagnostics Channels&lt;/a&gt; are a high-performance, synchronous event system built directly into Node.js. They’re also supported in Bun, Deno, and Cloudflare Workers (via the Node.js compatibility flag), making them a cross-runtime primitive.&lt;/p&gt;&lt;p&gt;Their primary use case is one-off events. For example, “a connection was opened” (like &lt;code&gt;node-redis&lt;/code&gt; &lt;a href=&quot;https://github.com/redis/node-redis/blob/41c908e6d65419fed6d985a9664427df1f48fb98/docs/diagnostics-channel.md?plain=1#L45-L48&quot;&gt;does this here&lt;/a&gt;). The limitation is that they don’t inherently represent a full lifecycle. You have to manually link &lt;code&gt;start&lt;/code&gt; and &lt;code&gt;stop&lt;/code&gt; events to measure duration.&lt;/p&gt;&lt;h3&gt;Tracing Channels&lt;/h3&gt;&lt;p&gt;&lt;a href=&quot;https://nodejs.org/api/diagnostics_channel.html#class-tracingchannel&quot;&gt;Tracing Channels&lt;/a&gt; solve exactly that limitation. A Tracing Channel is a bundle of related Diagnostics Channels that automatically creates sub-channels for a complete operation lifecycle: &lt;code&gt;start&lt;/code&gt;, &lt;code&gt;end&lt;/code&gt;, &lt;code&gt;asyncStart&lt;/code&gt;, &lt;code&gt;asyncEnd&lt;/code&gt;, and &lt;code&gt;error&lt;/code&gt;. 
More importantly, a &lt;code&gt;TracingChannel&lt;/code&gt; automatically propagates context across async boundaries. This means APM tools can correlate a database query back to the incoming HTTP request that caused it, without any manual bookkeeping.&lt;/p&gt;&lt;p&gt;Together, they give library and framework authors a standardized way to expose internal operations without coupling to any specific logging or tracing vendor. The library emits structured events and observability tools decide what to do with them.&lt;/p&gt;&lt;h2&gt;How libraries can implement Tracing Channels&lt;/h2&gt;&lt;p&gt;Tracing Channels have essentially zero cost when unused. If no subscriber is listening, emitting data costs almost nothing. This means library authors can add tracing channels without worrying about penalizing users who don’t need observability. The benefits: no monkey-patching is needed anymore, and users no longer have to pass &lt;code&gt;--import&lt;/code&gt; flags for preloading in ESM.&lt;/p&gt;&lt;h3&gt;Naming and consistency: The channel is the contract&lt;/h3&gt;&lt;p&gt;Tracing Channels should always be scoped to the library that emits them, using the npm package name as the namespace. Since package names are globally unique, this keeps channel names collision-free. For example, &lt;code&gt;mysql2&lt;/code&gt; ships &lt;code&gt;mysql2:query&lt;/code&gt;, which emits &lt;code&gt;tracing:mysql2:query:start&lt;/code&gt; and the other lifecycle channels. The &lt;code&gt;unstorage&lt;/code&gt; library ships &lt;code&gt;unstorage.get&lt;/code&gt;, which emits &lt;code&gt;tracing:unstorage.get:start&lt;/code&gt; and so on. The &lt;a href=&quot;https://github.com/unjs/untracing&quot;&gt;&lt;code&gt;untracing&lt;/code&gt;&lt;/a&gt; package is working to establish broader naming standards across the ecosystem.&lt;/p&gt;&lt;p&gt;Equally important: always emit a consistent data structure. 
Sentry and other APM tools can only provide automatic instrumentation if they know what shape your payload will have.&lt;/p&gt;&lt;p&gt;The pattern itself is straightforward. The library wraps its operation in a &lt;code&gt;tracePromise&lt;/code&gt; call:&lt;/p&gt;&lt;p&gt;And on the consumer side, an SDK like Sentry subscribes to those events:&lt;/p&gt;&lt;p&gt;The library and the observability tool never need to know about each other. The channel is the contract.&lt;/p&gt;&lt;h2&gt;The ecosystem is already moving&lt;/h2&gt;&lt;p&gt;In early February 2026, we (&lt;a href=&quot;https://github.com/andreiborza&quot;&gt;Andrei&lt;/a&gt;, &lt;a href=&quot;https://github.com/JPeer264&quot;&gt;Jan&lt;/a&gt; and &lt;a href=&quot;https://github.com/s1gr1d&quot;&gt;Sigrid&lt;/a&gt;) from Sentry attended &lt;a href=&quot;https://opentelemetry.io/blog/2025/otel-unplugged-fosdem/&quot;&gt;OTel Unplugged EU&lt;/a&gt; and brought up the topic “Prepare for better JS ESM Support”, which was voted onto the list of top priorities for the OpenTelemetry ecosystem.&lt;/p&gt;&lt;p&gt;So this isn’t a theoretical proposal. A growing number of well-known libraries have already shipped or merged PRs for Diagnostics Channel and Tracing Channel support.&lt;/p&gt;&lt;p&gt;On the framework and HTTP side, &lt;code&gt;undici&lt;/code&gt; (Node.js’s built-in HTTP client) has &lt;a href=&quot;https://undici-docs.vramana.dev/docs/api/DiagnosticsChannel&quot;&gt;shipped Diagnostics Channels&lt;/a&gt; since Node 20.12; &lt;code&gt;fastify&lt;/code&gt; (&lt;a href=&quot;https://fastify.dev/docs/latest/Reference/Hooks/#diagnostics-channel-hooks&quot;&gt;docs&lt;/a&gt;), &lt;code&gt;nitro&lt;/code&gt; (&lt;a href=&quot;https://github.com/nitrojs/nitro/pull/4001&quot;&gt;PR&lt;/a&gt;), and &lt;code&gt;h3&lt;/code&gt; (&lt;a href=&quot;https://github.com/h3js/h3/pull/1251&quot;&gt;PR&lt;/a&gt;) have native support as well. 
On the database side, &lt;code&gt;unstorage&lt;/code&gt; (&lt;a href=&quot;https://github.com/unjs/unstorage/pull/707&quot;&gt;PR&lt;/a&gt;) and &lt;code&gt;mysql2&lt;/code&gt; (&lt;a href=&quot;https://sidorares.github.io/node-mysql2/docs/documentation/tracing-channels&quot;&gt;Docs&lt;/a&gt;) already use Tracing Channels, and &lt;code&gt;pg&lt;/code&gt; / &lt;code&gt;pg-pool&lt;/code&gt; are actively working on it. Redis clients aren’t far behind either: &lt;code&gt;ioredis&lt;/code&gt; (&lt;a href=&quot;https://github.com/redis/ioredis/pull/2089&quot;&gt;PR&lt;/a&gt;) and &lt;code&gt;node-redis&lt;/code&gt; (&lt;a href=&quot;https://github.com/redis/node-redis/pull/3195&quot;&gt;PR&lt;/a&gt;) already support Tracing Channels.&lt;/p&gt;&lt;p&gt;None of this happens without the people willing to do the work. A massive shoutout to Sentry engineer &lt;b&gt;Abdelrahman Awad&lt;/b&gt; (&lt;a href=&quot;https://github.com/logaretm&quot;&gt;@logaretm&lt;/a&gt;) for driving Tracing Channel implementations across multiple libraries. And a special thanks to &lt;b&gt;Pooya Parsa&lt;/b&gt; (&lt;a href=&quot;https://github.com/pi0&quot;&gt;@pi0&lt;/a&gt;), whose openness to collaborating on &lt;code&gt;h3&lt;/code&gt; and &lt;code&gt;nitro&lt;/code&gt; was instrumental in formalizing this approach and showing the ecosystem what it could look like.&lt;/p&gt;&lt;h2&gt;The vision ahead&lt;/h2&gt;&lt;p&gt;We’re still in a “chicken and egg” phase. Libraries need to add channels before APM tools have strong reasons to listen to them, and APM tools need to start listening before authors feel the pressure to add them.&lt;/p&gt;&lt;p&gt;The goal is &lt;b&gt;universal JS observability&lt;/b&gt;: a world where Node.js, Bun, and Deno share the same diagnostic patterns, and instrumentation just works without monkey-patching in CJS, without &lt;code&gt;--import&lt;/code&gt; flags in ESM, and without fragile workarounds. 
Libraries become active drivers of observability, emitting the data they consider most relevant to their users.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Sample AI traces at 100% without sampling everything]]></title><description><![CDATA[A little while ago, when agents were telling me “You’re absolutely right!”, I was building webvitals.com. You put in a URL, it kicks off an API request to a Nex...]]></description><link>https://blog.sentry.io/sample-ai-traces-at-100-percent-without-sampling-everything/</link><guid isPermaLink="false">https://blog.sentry.io/sample-ai-traces-at-100-percent-without-sampling-everything/</guid><pubDate>Thu, 09 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A little while ago, when agents were telling me “You’re absolutely right!”, I was building &lt;a href=&quot;https://webvitals.com&quot;&gt;webvitals.com&lt;/a&gt;. You put in a URL, it kicks off an API request to a Next.js API route that invokes an agent with a few tools to scan it and provide AI-generated suggestions to improve your… you guessed it… Web Vitals. Do we even care about these anymore?&lt;/p&gt;&lt;p&gt;I had the &lt;code&gt;tracesSampleRate&lt;/code&gt; set to 100% in development, but in production, I sampled it down to 10% because… well that’s what our instrumentation recommends. Kyle wrote a great blog post explaining that “&lt;a href=&quot;https://blog.sentry.io/sampling-strategy-sentry/&quot;&gt;Watching everything is watching nothing&lt;/a&gt;”. But AI is non-deterministic. And when I was debugging an error from a tool call, I realized I was missing very important spans emitted from the Vercel AI SDK because of that sampling strategy.&lt;/p&gt;&lt;p&gt;An agent run with 7 tool calls doesn&amp;#39;t get partially sampled. You either capture the whole span tree or you lose it entirely. 
This is how head-based sampling works.&lt;/p&gt;&lt;p&gt;I was chasing ghosts.&lt;/p&gt;&lt;h2&gt;Agent runs are span trees, and sampling is all-or-nothing&lt;/h2&gt;&lt;p&gt;A typical agent execution looks like this in Sentry&amp;#39;s trace view:&lt;/p&gt;&lt;p&gt;That&amp;#39;s 11 spans in a single run. The sampling decision happens once, at the root: the &lt;code&gt;POST /api/chat&lt;/code&gt; HTTP transaction. Every child span inherits that decision. If the root is dropped, all 11 spans disappear.&lt;/p&gt;&lt;p&gt;This is fundamentally different from sampling HTTP requests, where dropping one &lt;code&gt;GET /api/users&lt;/code&gt; is no big deal because the next one is basically identical.&lt;/p&gt;&lt;p&gt;Agent runs are not identical. Each one makes different decisions, calls different tools, processes different data. An agent that hallucinated on run 67 might work perfectly on run 420. If your sample rate dropped run 67, you&amp;#39;ll never know what went wrong.&lt;/p&gt;&lt;h2&gt;How head-based sampling actually works (and why it matters here)&lt;/h2&gt;&lt;p&gt;Both the Sentry JavaScript and Python SDKs use head-based sampling: the decision is made at the start of the trace, before any child spans exist.&lt;/p&gt;&lt;p&gt;In the JavaScript SDK, &lt;a href=&quot;https://github.com/getsentry/sentry-javascript/blob/develop/packages/opentelemetry/src/sampler.ts#L79&quot;&gt;&lt;code&gt;SentrySampler.shouldSample()&lt;/code&gt;&lt;/a&gt; is explicit about this:&lt;/p&gt;&lt;p&gt;Non-root spans don&amp;#39;t get a vote. If the root span was dropped, &lt;code&gt;tracesSampler&lt;/code&gt; is never called for any child, including your &lt;code&gt;gen_ai.request&lt;/code&gt; and &lt;code&gt;gen_ai.execute_tool&lt;/code&gt; spans. 
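&lt;/p&gt;&lt;p&gt;In practice, that means all the routing logic has to live in a root-level &lt;code&gt;tracesSampler&lt;/code&gt;. Here is a sketch of one that keeps agent traces; the &lt;code&gt;sentry.op&lt;/code&gt; attribute key and the route string are assumptions to adapt, not a drop-in config:&lt;/p&gt;

```javascript
// Sketch of a root-only tracesSampler. Field names mirror the Sentry JS SDK's
// documented samplingContext; the attribute key and route are assumptions.
function tracesSampler({ name, parentSampled, attributes }) {
  // A propagated upstream decision wins: children and continued traces inherit.
  if (parentSampled !== undefined) return parentSampled;

  // Case 1: the agent run itself is the root (cron job, queue consumer, CLI).
  const op = (attributes ?? {})['sentry.op'] ?? '';
  if (op.startsWith('gen_ai')) return 1.0; // keep every agent trace

  // Case 2: an HTTP root whose handler will run the agent.
  if (name === 'POST /api/chat') return 1.0;

  return 0.1; // baseline for everything else
}
```

&lt;p&gt;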
They inherit the parent&amp;#39;s fate.&lt;/p&gt;&lt;p&gt;In Python, the same logic lives in &lt;a href=&quot;https://github.com/getsentry/sentry-python/blob/master/sentry_sdk/tracing.py#L1150&quot;&gt;&lt;code&gt;Transaction._set_initial_sampling_decision()&lt;/code&gt;&lt;/a&gt;. The &lt;code&gt;traces_sampler&lt;/code&gt; callback receives a &lt;code&gt;sampling_context&lt;/code&gt; dict with &lt;code&gt;transaction_context&lt;/code&gt; (containing &lt;code&gt;op&lt;/code&gt; and &lt;code&gt;name&lt;/code&gt;) and &lt;code&gt;parent_sampled&lt;/code&gt;. It only fires for root transactions.&lt;/p&gt;&lt;p&gt;This means head-based sampling doesn&amp;#39;t support &lt;b&gt;independently sampling gen_ai child spans at a different rate than their parent transaction.&lt;/b&gt; There&amp;#39;s no &amp;quot;sample 100% of LLM calls but 10% of HTTP requests.&amp;quot; If the HTTP request is dropped, the LLM calls inside it are dropped too.&lt;/p&gt;&lt;p&gt;I’d love to walk through a few different scenarios to show the difference in filtering approaches based on whether the root span comes from an agent or from the application.&lt;/p&gt;&lt;h2&gt;Scenario 1: The &lt;code&gt;gen_ai&lt;/code&gt; span IS the root&lt;/h2&gt;&lt;p&gt;Sometimes your agent run &lt;i&gt;is&lt;/i&gt; the root span. Maybe it’s a cron job that’s running an agent, a queue consumer processing an AI task, or a CLI script. In these cases, &lt;code&gt;tracesSampler&lt;/code&gt; sees the &lt;code&gt;gen_ai.*&lt;/code&gt; operation directly and you can match on it:&lt;/p&gt;&lt;p&gt;&lt;b&gt;JavaScript:&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Python:&lt;/b&gt;&lt;/p&gt;&lt;p&gt;This is the easy case. The hard case is next.&lt;/p&gt;&lt;h2&gt;Scenario 2: The &lt;code&gt;gen_ai&lt;/code&gt; spans are children of an HTTP transaction&lt;/h2&gt;&lt;p&gt;This is the common case in web applications. 
A user hits &lt;code&gt;POST /api/chat&lt;/code&gt;, your framework creates an &lt;code&gt;http.server&lt;/code&gt; root span, and somewhere inside that request handler your agent runs. By the time the first &lt;code&gt;gen_ai.request&lt;/code&gt; span is created, the sampling decision was already made for the HTTP transaction.&lt;/p&gt;&lt;p&gt;The fix: &lt;b&gt;identify which routes trigger AI calls and sample those routes at 100%.&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;JavaScript:&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Python:&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Replace the route strings with whatever paths your AI features live on. If your entire app is AI-powered, skip the &lt;code&gt;tracesSampler&lt;/code&gt; and just set &lt;code&gt;tracesSampleRate: 1.0&lt;/code&gt;.&lt;/p&gt;&lt;h2&gt;The cost math: AI API bills dwarf observability costs&lt;/h2&gt;&lt;p&gt;The instinct to sample AI traces at a lower rate usually comes from cost concerns. Let&amp;#39;s look at the actual numbers.&lt;/p&gt;&lt;table&gt;&lt;tr&gt;&lt;th&gt;&lt;p&gt;&lt;b&gt;What&lt;/b&gt;&lt;/p&gt;&lt;/th&gt;&lt;th&gt;&lt;p&gt;&lt;b&gt;Cost per event&lt;/b&gt;&lt;/p&gt;&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;Claude Sonnet 4 input (1K tokens)&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;~$0.003&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;Claude Sonnet 4 output (1K tokens)&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;~$0.015&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;Gemini 2.5 Flash input (1K tokens)&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;~$0.00015&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;Gemini 2.5 Flash output (1K tokens)&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;~$0.0006&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;A typical agent run (3 LLM calls, 2 tool calls)&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;$0.02-$0.15&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;b&gt;Sentry span events for that agent run (~9 
spans)&lt;/b&gt;&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;&lt;b&gt;Fraction of a cent&lt;/b&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;p&gt;The LLM calls themselves are 10-100x more expensive than the monitoring. You&amp;#39;re already paying for the AI call; dropping the observability span to save a fraction of a cent per call is like skipping the dashcam to save on gas.&lt;/p&gt;&lt;h2&gt;When 100% tracing isn&amp;#39;t feasible: Metrics and Logs as a safety net&lt;/h2&gt;&lt;p&gt;If you genuinely can&amp;#39;t sample AI routes at 100%, because of, say, massive scale or strict budget constraints, you can still capture the important signals from every AI call using Sentry &lt;a href=&quot;https://docs.sentry.io/platforms/python/metrics/&quot;&gt;Metrics&lt;/a&gt; and &lt;a href=&quot;https://docs.sentry.io/platforms/javascript/guides/node/logs/&quot;&gt;Logs&lt;/a&gt;. Both are independent of trace sampling.&lt;/p&gt;&lt;p&gt;&lt;b&gt;JavaScript - emit metrics on every LLM call:&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Python - emit metrics on every LLM call:&lt;/b&gt;&lt;/p&gt;&lt;p&gt;You can also log every call with structured attributes for searchability:&lt;/p&gt;&lt;p&gt;&lt;b&gt;JavaScript:&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Python:&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Here&amp;#39;s what each telemetry layer gives you:&lt;/p&gt;&lt;table&gt;&lt;tr&gt;&lt;th&gt;&lt;p&gt;&lt;b&gt;Signal&lt;/b&gt;&lt;/p&gt;&lt;/th&gt;&lt;th&gt;&lt;p&gt;&lt;b&gt;Traces (sampled)&lt;/b&gt;&lt;/p&gt;&lt;/th&gt;&lt;th&gt;&lt;p&gt;&lt;b&gt;Metrics (100%)&lt;/b&gt;&lt;/p&gt;&lt;/th&gt;&lt;th&gt;&lt;p&gt;&lt;b&gt;Logs (100%)&lt;/b&gt;&lt;/p&gt;&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;Full span tree with prompts/responses&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;No&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;No&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;Token usage distributions (p50, 
p99)&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Partial&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;No&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;Cost attribution by model/user&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Partial&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;Error rates by model/endpoint&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Partial&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;Latency distributions&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Partial&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;No&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;Searchable per-call records&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;No&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;p&gt;&lt;b&gt;The recommended approach:&lt;/b&gt; Use &lt;code&gt;tracesSampler&lt;/code&gt; to capture 100% of AI-related routes. If that&amp;#39;s not possible, combine a lower trace rate with metrics and logs emitted on every call. Traces give you the debugging depth; metrics and logs give you the aggregate picture.&lt;/p&gt;&lt;p&gt;Once you&amp;#39;re emitting these metrics, you can build custom dashboards that go beyond what the &lt;a href=&quot;https://docs.sentry.io/ai/monitoring/agents/dashboards/&quot;&gt;pre-built AI Agents dashboard&lt;/a&gt; shows. The &lt;a href=&quot;https://cli.sentry.dev/&quot;&gt;Sentry CLI&lt;/a&gt; makes this scriptable:&lt;/p&gt;&lt;p&gt;The pre-built dashboard gives you per-model and per-tool aggregates. 
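&lt;/p&gt;&lt;p&gt;As a sketch of the per-call records that feed those views, here is one shape the “log every call” approach might take. The &lt;code&gt;emitLLMCallLog&lt;/code&gt; helper and its attribute names are illustrative assumptions, and the stub stands in for a structured logger such as &lt;code&gt;Sentry.logger&lt;/code&gt;:&lt;/p&gt;

```javascript
// Sketch: one structured log record per LLM call, independent of trace
// sampling. Attribute names loosely follow the OpenTelemetry gen_ai
// conventions but are assumptions here, not an official schema.
function emitLLMCallLog(logger, call) {
  logger.info('llm.call', {
    'gen_ai.request.model': call.model,
    'gen_ai.usage.input_tokens': call.inputTokens,
    'gen_ai.usage.output_tokens': call.outputTokens,
    'user.id': call.userId,
    'route': call.route,
  });
}

// Stub logger so the sketch is self-contained; swap in the real one in app code.
const records = [];
const logger = { info(message, attributes) { records.push({ message, attributes }); } };

emitLLMCallLog(logger, {
  model: 'claude-sonnet-4',
  inputTokens: 1200,
  outputTokens: 450,
  userId: 'user_123',
  route: '/api/chat',
});
```

&lt;p&gt;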
Custom dashboards answer the business questions: &lt;i&gt;who&amp;#39;s driving cost, which features justify their AI spend, and which conversations are spiraling.&lt;/i&gt;&lt;/p&gt;&lt;h2&gt;The full production config&lt;/h2&gt;&lt;p&gt;Here&amp;#39;s a complete setup that samples AI routes at 100%, everything else at your baseline, and emits metrics as a safety net:&lt;/p&gt;&lt;p&gt;&lt;b&gt;JavaScript:&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Python:&lt;/b&gt;&lt;/p&gt;&lt;h2&gt;Quick reference&lt;/h2&gt;&lt;table&gt;&lt;tr&gt;&lt;th&gt;&lt;p&gt;&lt;b&gt;Situation&lt;/b&gt;&lt;/p&gt;&lt;/th&gt;&lt;th&gt;&lt;p&gt;&lt;b&gt;What to do&lt;/b&gt;&lt;/p&gt;&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;AI is the core product&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;&lt;code&gt;tracesSampleRate: 1.0 &lt;/code&gt;- sample everything&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;AI is one feature in a larger app&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;&lt;code&gt;tracesSampler&lt;/code&gt;&lt;/p&gt;&lt;p&gt; with AI routes at 1.0, baseline for the rest&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;Can&amp;#39;t afford 100% on AI routes&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Lower trace rate + metrics/logs on every call&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;Already using &lt;code&gt;tracesSampler&lt;/code&gt;&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Add AI route matching to your existing logic&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;Sample rate is already 1.0&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;No change needed&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;p&gt;The underlying principle: agent runs are high-value, low-volume (relative to HTTP traffic), and expensive to reproduce. 
Sample them accordingly.&lt;/p&gt;&lt;p&gt;If you&amp;#39;re just getting started with AI monitoring, check out our companion post on &lt;a href=&quot;https://blog.sentry.io/ai-agent-observability-developers-guide-to-agent-monitoring/&quot;&gt;the developer&amp;#39;s guide to AI agent monitoring&lt;/a&gt;, which covers the full setup across 10+ frameworks, the pre-built dashboards, and a real debugging walkthrough.&lt;/p&gt;&lt;hr/&gt;&lt;p&gt;&lt;i&gt;For framework-specific setup, see our &lt;/i&gt;&lt;a href=&quot;https://docs.sentry.io/ai/monitoring/agents/&quot;&gt;&lt;i&gt;AI monitoring docs&lt;/i&gt;&lt;/a&gt;&lt;i&gt;. If you&amp;#39;re using an AI coding assistant, install the &lt;/i&gt;&lt;a href=&quot;https://cli.sentry.dev/agentic-usage/&quot;&gt;&lt;i&gt;Sentry CLI skill&lt;/i&gt;&lt;/a&gt;&lt;i&gt; (&lt;/i&gt;&lt;i&gt;&lt;code&gt;npx skills add https://cli.sentry.dev&lt;/code&gt;&lt;/i&gt;&lt;i&gt;) to configure your sampling, build custom dashboards, and investigate issues directly from your editor.&lt;/i&gt;&lt;/p&gt;</content:encoded></item><item><title><![CDATA[AI-driven caching strategies and instrumentation]]></title><description><![CDATA[The things that separate a minimum viable product (MVP) from a production-ready app are polish, final touches, and the Pareto 'last 20%' of work. Most bugs, edg...]]></description><link>https://blog.sentry.io/ai-driven-caching-strategies-instrumentation/</link><guid isPermaLink="false">https://blog.sentry.io/ai-driven-caching-strategies-instrumentation/</guid><pubDate>Fri, 13 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;The things that separate a minimum viable product (MVP) from a production-ready app are polish, final touches, and the Pareto &amp;#39;last 20%&amp;#39; of work. 
Most bugs, edge cases, and &lt;a href=&quot;https://sentry.io/solutions/application-performance-monitoring/&quot;&gt;performance issues&lt;/a&gt; won&amp;#39;t show up until after launch, when real users start hammering your application. If you&amp;#39;re reading this, you&amp;#39;re probably at the 80% mark, ready to tackle the rest.&lt;/p&gt;&lt;p&gt;This article covers application caching: how to use it for cutting tail latency, protecting databases, and handling traffic spikes, plus how to monitor it once it&amp;#39;s running in production.&lt;/p&gt;&lt;p&gt;This article is part of a series of common pain points when bringing an MVP to production:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;a href=&quot;https://blog.sentry.io/paginating-large-datasets-in-production-why-offset-fails-and-cursors-win/&quot;&gt;Paginating Large Datasets in Production: Why OFFSET Fails and Cursors Win&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;AI-driven caching strategies and instrumentation (this one)&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h2&gt;Building a mental model for caching&lt;/h2&gt;&lt;p&gt;Good caching multiplies your performance, scalability, and cost efficiency. Done right, it gives you sub-millisecond responses and absorbs traffic spikes without crushing your origin servers. Done wrong (aggressive caching, bad invalidation, wrong strategies) it creates subtle bugs, stale data, and degraded user experience (UX) that&amp;#39;s hard to debug and usually only shows up after it&amp;#39;s already affected a lot of users.&lt;/p&gt;&lt;p&gt;Before looking for caching opportunities, you need a mental model for what should and shouldn&amp;#39;t be cached. 
Here&amp;#39;s a checklist:&lt;/p&gt;&lt;h3&gt;✅ Cache if most are true:&lt;/h3&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Expensive&lt;/b&gt;: slow CPU, slow input/output (IO), heavy DB, big joins/aggregates, external application programming interface (API)&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Frequent&lt;/b&gt;: called a lot (high requests per minute (RPM)) or sits on hot paths (page load, core API)&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Reusable&lt;/b&gt;: same inputs repeat (low key cardinality)&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Stable-ish&lt;/b&gt;: data doesn&amp;#39;t change every second (or can tolerate staleness)&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Spiky load&lt;/b&gt;: bursty traffic where cache absorbs thundering herds&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Tail hurts&lt;/b&gt;: P95/P99 is bad, and misses correlate with slow requests&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Safe to serve stale&lt;/b&gt;: user impact low, or can use stale-while-revalidate (SWR)&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Invalidation is easy&lt;/b&gt;: time to live (TTL) works, or updates have clear triggers&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Small-ish payload&lt;/b&gt;: memory cost reasonable, serialization cheap&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h3&gt;❌ Don&amp;#39;t cache (or be very careful) if any are true:&lt;/h3&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;High cardinality keys&lt;/b&gt;: per-user / per-page / per-filter explosion → mostly misses (pagination is a special case - see note below)&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Highly mutable&lt;/b&gt;: correctness demands freshness&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Personalized / permissioned&lt;/b&gt;: easy to leak data via key mistakes&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Hard invalidation&lt;/b&gt;: no clear TTL, updates unpredictable&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Already 
fast&lt;/b&gt;: saving 5ms isn&amp;#39;t worth complexity&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Cache stampede risk&lt;/b&gt;: expensive recompute + synchronized expiry (needs locking / jitter)&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;There&amp;#39;s a special rule for caching paginated endpoints - &lt;b&gt;cache page 1 + common filters first&lt;/b&gt;. Page 1 and a small set of common filters are usually hot and reused, so caching pays off. As page numbers increase, key cardinality explodes and reuse collapses, so deep pages will naturally miss and that&amp;#39;s fine. Optimize for protecting the backend and reducing tail latency on the entry points, not for achieving uniform hit rates across all pages.&lt;/p&gt;&lt;h2&gt;Finding caching opportunities in production&lt;/h2&gt;&lt;p&gt;Once you know what &lt;i&gt;should&lt;/i&gt; be cached, the next question is where caching will actually matter. In production systems, good caching candidates show up through pain, usually in three forms.&lt;/p&gt;&lt;h3&gt;Backend pain (start here)&lt;/h3&gt;&lt;p&gt;For &lt;a href=&quot;https://docs.sentry.io/product/insights/backend/&quot;&gt;backend and full-stack systems,&lt;/a&gt; this is the most actionable signal.&lt;/p&gt;&lt;p&gt;Look for:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;Transactions with bad P95/P99&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Endpoints with heavy database (DB) time&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Repeated queries, joins, aggregates&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Fan-out (one request triggering many downstream calls)&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Lock contention or connection pool pressure&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;These are places where caching immediately reduces real work.&lt;/p&gt;&lt;h3&gt;User pain (confirmation)&lt;/h3&gt;&lt;p&gt;Slow page loads, janky interactions, timeouts. 
&lt;a href=&quot;https://sentry.io/for/web-vitals/&quot;&gt;Web Vitals&lt;/a&gt; like Time to First Byte (&lt;a href=&quot;https://webvitals.com/ttfb&quot;&gt;TTFB&lt;/a&gt;), Largest Contentful Paint (&lt;a href=&quot;https://webvitals.com/lcp&quot;&gt;LCP&lt;/a&gt;), and Interaction to Next Paint (&lt;a href=&quot;https://webvitals.com/inp&quot;&gt;INP&lt;/a&gt;) help confirm that backend slowness is actually affecting users. They&amp;#39;re most useful once you already suspect a backend bottleneck.&lt;/p&gt;&lt;h3&gt;Cost pain (the long-term signal)&lt;/h3&gt;&lt;p&gt;Even if your users aren&amp;#39;t complaining yet, repetition is expensive:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;High DB read volume&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Paid external API calls&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Recomputed rollups and counts&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Cost often lags behind performance problems, but it&amp;#39;s a strong motivator once traffic grows.&lt;/p&gt;&lt;p&gt;A simple prioritization heuristic is &lt;b&gt;cost density&lt;/b&gt;:&lt;/p&gt;&lt;p&gt;requests per minute * time saved per request&lt;/p&gt;&lt;p&gt;An endpoint that&amp;#39;s moderately slow but hit consistently is usually a better caching target than a pathological endpoint nobody touches.&lt;/p&gt;&lt;h2&gt;Example: a slow paginated endpoint&lt;/h2&gt;&lt;p&gt;Consider a paginated endpoint performing a heavy database query with no caching.&lt;/p&gt;&lt;p&gt;In &lt;b&gt;Sentry &amp;gt; Insights &amp;gt; Backend&lt;/b&gt;, filtering by API transactions (above the table) surfaces this:&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;The &lt;code&gt;GET /admin/order-items&lt;/code&gt; endpoint has potential for caching. Let&amp;#39;s dive into it. 
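&lt;/p&gt;&lt;p&gt;To make the cost density heuristic concrete, here&amp;#39;s a toy sketch that ranks candidate endpoints. The endpoint names and numbers are made up for illustration, not real measurements:&lt;/p&gt;

```javascript
// Toy ranking of caching candidates by "cost density":
// requests per minute * time saved per request.
// Endpoints and numbers are illustrative, not real measurements.
const candidates = [
  { endpoint: "GET /admin/order-items", rpm: 400, savedMs: 700 },
  { endpoint: "GET /reports/yearly", rpm: 2, savedMs: 5000 },
  { endpoint: "GET /health", rpm: 1000, savedMs: 2 },
];

const ranked = candidates
  .map((c) => ({ ...c, costDensity: c.rpm * c.savedMs }))
  .sort((a, b) => b.costDensity - a.costDensity);

// The moderately slow but consistently hit endpoint ranks first.
console.log(ranked.map((c) => `${c.endpoint}: ${c.costDensity}`).join("\n"));
```

&lt;p&gt;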
I&amp;#39;ll pick a slower event and inspect the &lt;a href=&quot;https://docs.sentry.io/concepts/key-terms/tracing/trace-view/&quot;&gt;trace view&lt;/a&gt;:&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;From the screenshot, we can see:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;776ms total duration&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;731ms spent in a single DB span&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Multiple joins&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;LIMIT + OFFSET pagination&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Poor TTFB in Web Vitals&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Against the checklist:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;✅ Expensive (heavy DB query, joins)&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;✅ Frequent (high throughput)&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;✅ Stable-ish (can tolerate brief staleness)&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;✅ Tail hurts (bad P95)&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;✅ Invalidation is easy (writes are controlled)&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;⚠️ High cardinality key (pagination)&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;This is a strong candidate for &lt;b&gt;selective caching&lt;/b&gt;, not blanket caching.&lt;/p&gt;&lt;h2&gt;Applying and instrumenting caching&lt;/h2&gt;&lt;p&gt;Sentry comes with &lt;a href=&quot;https://docs.sentry.io/product/insights/backend/caches/&quot;&gt;Cache Monitoring&lt;/a&gt; too. It helps you see your cache hit/miss rates across your application, and inspect specific events captured in production when the cache was either hit or missed.&lt;/p&gt;&lt;p&gt;Instrumenting caches can be done both automatically and manually. If you&amp;#39;re using Redis, you can leverage the automatic instrumentation. If not, manual instrumentation is just as easy.&lt;/p&gt;&lt;p&gt;The most straightforward approach is to just ask Seer to do it for you. 
At the time of publishing this article, Seer&amp;#39;s &amp;quot;open-ended questions&amp;quot; feature is private access only, but I&amp;#39;ll give you a little sneak peek. You can access it with &lt;code&gt;Cmd + /&lt;/code&gt; and straight up ask it to instrument caches for you:&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;Seer will then open a PR on your repo, so you can merge it and be done with it.&lt;/p&gt;&lt;p&gt;In case you don&amp;#39;t have access to this Seer feature yet, instrumenting caches by hand only takes a few lines. All we need to do is wrap the &lt;code&gt;redis.get&lt;/code&gt; and &lt;code&gt;redis.setex&lt;/code&gt; calls with &lt;code&gt;Sentry.startSpan&lt;/code&gt; and provide caching-specific span attributes. If you&amp;#39;re not using JavaScript on your backend, you can simply rewrite these functions in your language of choice. As long as you&amp;#39;re sending spans with the correct &lt;code&gt;op&lt;/code&gt; and &lt;code&gt;attributes&lt;/code&gt;, cache instrumentation will work.&lt;/p&gt;&lt;p&gt;Now we can use these two wrapper functions wherever the endpoint reads or writes the cache.&lt;/p&gt;&lt;h2&gt;Monitoring and optimizing caches&lt;/h2&gt;&lt;p&gt;Once we deploy this, cache spans start coming in:&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;The data shows a 75% Miss Rate on that endpoint. That&amp;#39;s neither good nor bad by itself: there is no &amp;quot;goal value&amp;quot; to aim for, and reaching a 0% miss rate would probably mean you&amp;#39;re hiding bugs. The miss rate should simply align with your expectations. 75% on this endpoint might make sense, but there might also be room for optimization. 
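&lt;/p&gt;&lt;p&gt;Before clicking into individual events, here&amp;#39;s a sketch of what the two wrapper functions described above could look like. The span &lt;code&gt;op&lt;/code&gt; and attribute names follow Sentry&amp;#39;s cache span conventions (&lt;code&gt;cache.get&lt;/code&gt;/&lt;code&gt;cache.put&lt;/code&gt;, &lt;code&gt;cache.key&lt;/code&gt;, &lt;code&gt;cache.hit&lt;/code&gt;); the tiny &lt;code&gt;Sentry&lt;/code&gt; and &lt;code&gt;redis&lt;/code&gt; objects are stand-ins for the real modules so the sketch runs on its own, and the function names are illustrative:&lt;/p&gt;

```javascript
// Sketch: Sentry-instrumented cache wrappers around redis.get / redis.setex.
// The Sentry and redis objects below are minimal stand-ins for @sentry/node
// and a Redis client so this file runs standalone; in a real app you would
// import both instead.
const Sentry = {
  startSpan({ name, op, attributes = {} }, callback) {
    // The real startSpan records and sends the span; this stub just runs the callback.
    const span = { setAttribute: (key, value) => { attributes[key] = value; } };
    return callback(span);
  },
};
const memory = new Map();
const redis = {
  async get(key) { return memory.has(key) ? memory.get(key) : null; },
  // TTL is ignored in this in-memory stand-in.
  async setex(key, ttlSeconds, value) { memory.set(key, value); },
};

// Read from the cache, recording a cache.get span with hit/miss info.
async function getCached(key) {
  return Sentry.startSpan(
    { name: `cache.get ${key}`, op: "cache.get", attributes: { "cache.key": [key] } },
    async (span) => {
      const raw = await redis.get(key);
      span.setAttribute("cache.hit", raw !== null);
      return raw === null ? null : JSON.parse(raw);
    },
  );
}

// Write to the cache with a TTL, recording a cache.put span.
async function setCached(key, value, ttlSeconds = 60) {
  return Sentry.startSpan(
    { name: `cache.put ${key}`, op: "cache.put", attributes: { "cache.key": [key] } },
    () => redis.setex(key, ttlSeconds, JSON.stringify(value)),
  );
}
```

&lt;p&gt;In the route handler you&amp;#39;d call the getter first and fall back to the database on a miss, writing back to the cache only for page 1 and the common filters.&lt;/p&gt;&lt;p&gt;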
Let&amp;#39;s click into the transaction to see actual events:&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;From the screenshot above we can see that cache hits happen only on page 1, and misses on the other pages. That&amp;#39;s because we followed the advice for caching paginated endpoints: only cache page 1 and common filters. Users were visiting multiple pages, but Page 1 accounted for 25% of the visits, hence the 75% Miss Rate. From the Transaction Duration column we can see that Page 1 loaded in under 40ms, while for other pages users had to wait &amp;gt;700ms.&lt;/p&gt;&lt;p&gt;So our caching implementation is working, and users are experiencing faster page loads. From this point on we&amp;#39;ll know that for our &lt;code&gt;/admin/order-items&lt;/code&gt; endpoint the normal miss rate sits around 75%. If we later introduce a bug, for example buggy cache keys (missing params, extra params), new filters or sorting, per-user or per-flag keys creeping in, volatile data accidentally included in keys (timestamps, request IDs, locale), or a messed-up TTL, this number is going to shoot up, and we&amp;#39;ll see it in the chart. A spike will tell us that we broke caching and users are experiencing slowdowns.&lt;/p&gt;&lt;h2&gt;AI-assisted cache expansion in production&lt;/h2&gt;&lt;p&gt;Remember the &amp;quot;cache only page 1 + common filters&amp;quot; rule? We&amp;#39;re going to bend it a little bit. 
If we want to bring down the 75% Miss Rate above, we&amp;#39;ll need to expand caching to cover more pages than just page 1, but we have to be careful not to over-expand because we&amp;#39;ll bloat our Redis instance.&lt;/p&gt;&lt;p&gt;Here&amp;#39;s a practical AI-assisted approach to help make a good cache expansion decision:&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;You can use &lt;a href=&quot;https://mcp.sentry.dev/&quot;&gt;Sentry Model Context Protocol (MCP)&lt;/a&gt; to pull all the &lt;code&gt;cache.get&lt;/code&gt; spans from your project and group them by the &lt;code&gt;cache.key&lt;/code&gt; property, and then ask the agent to suggest how to expand the caching. Looking at the screenshot, we can see that Page 1 still has the most hits, but Pages 2-6 have significant traffic too. Long-tail pages like 7 and 10 have minimal traffic, so there&amp;#39;s no need to cache them, and the agent also discarded some test data. It suggested expanding the cache through Page 3. Let&amp;#39;s see how that affects the Miss Rate:&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;Would you look at that! We&amp;#39;re now at a 30% Miss Rate, down from 75%. This means roughly only 1 in 3 requests will hit the database. But it&amp;#39;s important to keep an eye on Redis memory as well. Pushing caching from Page 1 to Page 3 might bloat our Redis instance, and in that case caching won&amp;#39;t be worth it. Redis bloat means hot keys get evicted, which would undo the performance gains we got from caching in the first place.&lt;/p&gt;&lt;h2&gt;Alerting on miss rate deviations&lt;/h2&gt;&lt;p&gt;The last step is to set up an alert that notifies you (via email or Slack) when there&amp;#39;s an anomaly in cache misses. 
Head to &lt;b&gt;Sentry &amp;gt; Issues &amp;gt; Alerts&lt;/b&gt;, pick &lt;b&gt;Performance Throughput&lt;/b&gt;, and &lt;a href=&quot;https://docs.sentry.io/product/alerts/&quot;&gt;create an alert&lt;/a&gt; with the following options:&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;Make sure you pick your project and environment correctly&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;You&amp;#39;d want to filter on &lt;code&gt;cache.hit&lt;/code&gt; being &lt;code&gt;False&lt;/code&gt;, and on your &lt;code&gt;cache.key&lt;/code&gt; as well&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Set the thresholds to &lt;b&gt;Anomaly&lt;/b&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Start with a &lt;b&gt;High&lt;/b&gt; level of responsiveness, and tune later&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;For &lt;b&gt;Direction of anomaly movement&lt;/b&gt; you&amp;#39;d want &lt;b&gt;Above bounds only&lt;/b&gt;, so you only get notified on cache miss increases&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Lastly, define your action: an email to yourself or your team, or a Slack message in a specific channel&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Name it and hit &amp;quot;Save Rule&amp;quot;&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;That&amp;#39;s it. Now if you accidentally break the caching mechanism, it&amp;#39;ll result in a flood of cache misses, and Sentry will pick it up and notify you. You&amp;#39;re free to filter however you like and create as many alerts as you need. &lt;code&gt;cache.hit&lt;/code&gt; and &lt;code&gt;cache.key&lt;/code&gt; are not the only attributes you can filter on; play with the filter bar to discover everything you can filter by.&lt;/p&gt;&lt;h2&gt;Where to go from here&lt;/h2&gt;&lt;p&gt;At this point, caching is working. The endpoint is faster, the database is protected, and you have a baseline Miss Rate that reflects normal behaviour. 
From here on, the work is less about adding caching, and more about making sure it keeps doing what it&amp;#39;s supposed to do.&lt;/p&gt;&lt;p&gt;The first thing to watch is &lt;b&gt;Miss Rate deviations&lt;/b&gt;, not the absolute number. A stable line that suddenly jumps usually means something changed: a cache key bug, new filters or sorting, increased cardinality, or a TTL or invalidation mistake introduced during a deploy. Those changes tend to show up in cache metrics before users start complaining.&lt;/p&gt;&lt;p&gt;Next, always &lt;b&gt;read Miss Rate together with latency&lt;/b&gt;. A higher Miss Rate that doesn&amp;#39;t affect P95/P99 is usually harmless. A higher Miss Rate that brings the database spans back into the critical path is a regression worth acting on.&lt;/p&gt;&lt;p&gt;As you expand caching, &lt;b&gt;keep an eye on Redis memory and evictions&lt;/b&gt;. Improving hit rates by caching more pages only helps if hot keys stay resident. Memory pressure that causes frequent evictions can quietly undo your gains and make cache behaviour unpredictable.&lt;/p&gt;&lt;p&gt;Finally, &lt;b&gt;revisit cache boundaries as traffic evolves&lt;/b&gt;. Usage patterns change. What was a long-tail page last month may become hot after a product change or a new workflow. Cache strategies should evolve with real traffic, not stay frozen around initial assumptions.&lt;/p&gt;&lt;p&gt;If you treat cache metrics as guardrails (baseline Miss Rates, latency correlations, and post-deploy checks) caching becomes a stable part of your system instead of a fragile optimization you&amp;#39;re afraid to touch.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Not everything that breaks is an error: a Logs and Next.js story]]></title><description><![CDATA[Stack traces are great, but they only tell you what broke. They rarely tell you why. 
When an exception fires, you get a snapshot of the moment things went sidew...]]></description><link>https://blog.sentry.io/not-everything-that-breaks-is-an-error-a-logs-and-next-js-story/</link><guid isPermaLink="false">https://blog.sentry.io/not-everything-that-breaks-is-an-error-a-logs-and-next-js-story/</guid><pubDate>Tue, 13 Jan 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Stack traces are great, but they only tell you &lt;i&gt;what&lt;/i&gt; broke. They rarely tell you &lt;i&gt;why&lt;/i&gt;. When an exception fires, you get a snapshot of the moment things went sideways, but the context leading up to that moment? Gone.&lt;/p&gt;&lt;p&gt;That&amp;#39;s where &lt;a href=&quot;https://sentry.io/product/logs/&quot;&gt;logs&lt;/a&gt; come in. A well-placed log can be the difference between hours of head-scratching and a five-minute fix. Let me show you what I mean with a real bug I encountered recently.&lt;/p&gt;&lt;h2&gt;Protecting an AI-powered Next.js endpoint from bots&lt;/h2&gt;&lt;p&gt;I&amp;#39;ve been working on &lt;a href=&quot;https://webvitals.com/&quot;&gt;WebVitals&lt;/a&gt;, a Next.js application powered by AI. You enter a domain, and it runs a series of tool calls to fetch performance data, then uses an AI agent to parse the results and give you actionable suggestions for improving your web vitals.&lt;/p&gt;&lt;p&gt;On the frontend, I&amp;#39;m using the AI SDK&amp;#39;s &lt;code&gt;useChat&lt;/code&gt; hook to handle the conversation.&lt;/p&gt;&lt;p&gt;The &lt;code&gt;/api/chat&lt;/code&gt; endpoint is a standard Next.js API route, which means anyone can hit it from anywhere. Since each request costs money (OpenAI isn&amp;#39;t free), I needed some protection against bots and malicious actors trying to spike my bill.&lt;/p&gt;&lt;p&gt;Vercel has a neat solution for this: bot protection via their &lt;code&gt;checkBotId&lt;/code&gt; function. It looks at the incoming request and determines if it&amp;#39;s coming from a bot. 
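&lt;/p&gt;&lt;p&gt;In a route handler this typically becomes an early guard. Here&amp;#39;s a sketch: the &lt;code&gt;checkBotId&lt;/code&gt; stub below only stands in for the real function from Vercel&amp;#39;s &lt;code&gt;botid&lt;/code&gt; package (which inspects the request&amp;#39;s signals itself) so the example runs standalone:&lt;/p&gt;

```javascript
// Sketch of a bot-guarded /api/chat route handler.
// checkBotId here is a stand-in for the real one from Vercel's "botid"
// package; the real function inspects the incoming request itself.
async function checkBotId(request) {
  const userAgent = request.headers.get("user-agent") ?? "";
  // Crude placeholder heuristic, purely so the sketch is executable.
  return { isBot: !userAgent.includes("Mozilla") };
}

async function POST(request) {
  const verification = await checkBotId(request);
  if (verification.isBot) {
    return new Response("Access denied", { status: 403 });
  }
  // ...run the AI agent and stream the answer back (omitted)...
  return new Response("ok", { status: 200 });
}
```

&lt;p&gt;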
Simple, effective, and no CAPTCHAs asking users to identify crosswalks.&lt;/p&gt;&lt;h2&gt;A production bug that only affected Firefox and Safari&lt;/h2&gt;&lt;p&gt;Everything worked perfectly in local development. Deployed to production, tested in Chrome. Still perfect. Then I opened Firefox.&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;&amp;quot;Access denied.&amp;quot; The same request that worked in Chrome was getting blocked in Firefox. Safari had the same issue.&lt;/p&gt;&lt;p&gt;I checked Sentry. The error was showing up repeatedly, but only Firefox and Safari were affected. Chrome users were fine.&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;I tried fixing it. Multiple releases, multiple attempts. The error kept coming back. The stack trace wasn&amp;#39;t helpful; it just showed me that the bot check was returning &lt;code&gt;true&lt;/code&gt; for these browsers. But &lt;i&gt;why&lt;/i&gt; would Firefox and Safari be flagged as bots when Chrome wasn&amp;#39;t?&lt;/p&gt;&lt;p&gt;The stack trace couldn&amp;#39;t answer that question.&lt;/p&gt;&lt;h2&gt;Adding logs to capture the missing context&lt;/h2&gt;&lt;p&gt;This is the kind of problem where you need more context than an error alone can provide. I needed to see what data the &lt;code&gt;checkBotId&lt;/code&gt; function was working with when it made its decision.&lt;/p&gt;&lt;p&gt;So I added a log.&lt;/p&gt;&lt;p&gt;Nothing fancy. Just log the bot check result along with the user agent string that was passed to the function. Bot protection typically works by examining the user agent, so this seemed like the right data to capture.&lt;/p&gt;&lt;p&gt;The key here is that Sentry logs support high-cardinality attributes. You can pass any attributes you want, and you&amp;#39;ll be able to search and filter by them later. No need to decide upfront which attributes are &amp;quot;important&amp;quot;. 
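&lt;/p&gt;&lt;p&gt;The call itself was roughly this. The message and attribute names (&lt;code&gt;isBot&lt;/code&gt;, &lt;code&gt;userAgent&lt;/code&gt;) match the ones used in this article; the &lt;code&gt;Sentry&lt;/code&gt; object is a minimal stand-in for the SDK&amp;#39;s &lt;code&gt;Sentry.logger&lt;/code&gt; so the sketch runs on its own:&lt;/p&gt;

```javascript
// Sketch of the log call: record the bot check verdict together with the
// data it was based on. The Sentry object below is a minimal stand-in for
// @sentry/nextjs's Sentry.logger so this file runs standalone; in the app
// you'd import the SDK instead.
const captured = [];
const Sentry = {
  logger: {
    info(message, attributes = {}) {
      captured.push({ message, attributes });
    },
  },
};

function logBotCheck(verification, request) {
  Sentry.logger.info("Bot ID check result", {
    isBot: verification.isBot,
    userAgent: request.headers.get("user-agent") ?? "unknown",
  });
}
```

&lt;p&gt;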
Just log what might be useful and let Sentry handle the rest.&lt;/p&gt;&lt;h2&gt;Using Sentry Logs to identify the root cause&lt;/h2&gt;&lt;p&gt;With logs in place, I headed over to Sentry&amp;#39;s Logs view and searched for my &amp;quot;Bot ID check result&amp;quot; messages. I added the &lt;code&gt;isBot&lt;/code&gt; attribute as a column so I could quickly scan the results. (In Sentry, boolean values show as 0 for false and 1 for true.)&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;I found a request that passed the bot check: &lt;code&gt;isBot: 0&lt;/code&gt;. Looking at the details, the user agent was exactly what you&amp;#39;d expect: a standard Chrome user agent string.&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;Then I looked at a request that failed: &lt;code&gt;isBot: 1&lt;/code&gt;. The user agent was... not what I expected.&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;Instead of the browser&amp;#39;s user agent, I was seeing &lt;code&gt;ai-sdk&lt;/code&gt;. The AI SDK was sending its own user agent string instead of the browser&amp;#39;s.&lt;/p&gt;&lt;p&gt;This explained everything. When the AI SDK makes requests to the backend, it uses its own user agent. Vercel&amp;#39;s bot protection sees &lt;code&gt;ai-sdk&lt;/code&gt; and thinks, reasonably, that it&amp;#39;s not a real browser. Bot detected. Access denied.&lt;/p&gt;&lt;p&gt;But why only Firefox and Safari? Because something about how those browsers (or my setup in them) handled the request caused the AI SDK&amp;#39;s user agent to be sent instead of the browser&amp;#39;s. Chrome happened to pass through the correct user agent.&lt;/p&gt;&lt;p&gt;To confirm my hunch, I used Sentry&amp;#39;s trace connection feature. 
Everything in Sentry is linked by trace, so I could navigate from the log entry back to the full trace view and see the broader context of the request.&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;Sure enough, the trace confirmed this was coming from Firefox. Mystery solved.&lt;/p&gt;&lt;h2&gt;Fixing the issue once the data told the story&lt;/h2&gt;&lt;p&gt;The solution was straightforward. In Vercel&amp;#39;s firewall settings, I added a rule to bypass bot protection for requests where the user agent contains &lt;code&gt;ai-sdk&lt;/code&gt;.&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;Saved the rule, published the changes, and tried again in Firefox.&lt;/p&gt;&lt;p&gt;It worked. No more access denied errors. It&amp;#39;s also being tracked in a &lt;a href=&quot;https://github.com/vercel/ai/issues/9256&quot;&gt;GitHub issue&lt;/a&gt; on the AI SDK for those who are curious.&lt;/p&gt;&lt;h2&gt;What this bug clarified about logging and debugging&lt;/h2&gt;&lt;p&gt;This bug would have taken much longer to diagnose without logs. The error itself, &amp;quot;Access denied&amp;quot;, told me nothing about &lt;i&gt;why&lt;/i&gt; the request was being denied. The stack trace showed me &lt;i&gt;where&lt;/i&gt; it happened, but not the data that caused it.&lt;/p&gt;&lt;p&gt;A few takeaways:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Logs provide context that stack traces can&amp;#39;t.&lt;/b&gt; When you&amp;#39;re debugging, you often need to know what the data looked like at a specific point in time. Errors capture the moment of failure; logs capture the journey.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;High-cardinality attributes are powerful.&lt;/b&gt; Being able to search logs by any attribute (&lt;code&gt;isBot&lt;/code&gt;, &lt;code&gt;userAgent&lt;/code&gt;) makes it trivial to slice and dice your data. 
You don&amp;#39;t have to predict which attributes will be useful ahead of time.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Trace connection ties everything together.&lt;/b&gt; Seeing a log in isolation is useful, but being able to jump from a log to the full trace (and vice versa) gives you the complete picture. In this case, it let me confirm that the AI SDK user agent was indeed coming from Firefox requests.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;If you&amp;#39;re already using Sentry for &lt;a href=&quot;https://sentry.io/product/error-monitoring/&quot;&gt;error tracking&lt;/a&gt;, adding logs is a natural next step. For new projects, you can use the &lt;code&gt;Sentry.logger&lt;/code&gt; API directly. If you have existing logging with something like Pino, check out the &lt;a href=&quot;https://docs.sentry.io/platforms/javascript/guides/nextjs/logs/#integrations&quot;&gt;logging integrations&lt;/a&gt; to pipe those logs into Sentry automatically.&lt;/p&gt;&lt;p&gt;Head on over to our &lt;a href=&quot;https://docs.sentry.io/platforms/javascript/guides/nextjs/logs/&quot;&gt;Next.js Logs docs&lt;/a&gt; to learn more about how to send structured logs from your application to Sentry for debugging and observability. Or just check out our &lt;a href=&quot;https://sentry.io/quickstart/logs/?sdk=nextjs&quot;&gt;Logs quickstart guide&lt;/a&gt; and get up and running in no time.&lt;/p&gt;&lt;p&gt;Not everything that breaks throws an error. Sometimes you just need to see what was happening.&lt;/p&gt;</content:encoded></item></channel></rss>