<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Technical Content Feed]]></title><description><![CDATA[Product, Engineering, and Marketing updates from the developers of Sentry.]]></description><link>https://blog.sentry.io</link><generator>GatsbyJS</generator><lastBuildDate>Tue, 21 Apr 2026 18:57:03 GMT</lastBuildDate><item><title><![CDATA[No more monkey-patching: Better observability with tracing channels]]></title><description><![CDATA[Almost every production application uses a number of different tools and libraries, whether that’s a library to communicate with a database, a cache, or framewor...]]></description><link>https://blog.sentry.io/observability-with-tracing-channels/</link><guid isPermaLink="false">https://blog.sentry.io/observability-with-tracing-channels/</guid><pubDate>Tue, 21 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Almost every production application uses a number of different tools and libraries, whether that’s a library to communicate with a database, a cache, or frameworks like Nest.js or Nitro. To be able to observe what’s going on in production, application developers reach for &lt;a href=&quot;https://sentry.io/solutions/application-performance-monitoring/&quot;&gt;Application Performance Monitoring&lt;/a&gt; (APM) tools like Sentry. &lt;/p&gt;&lt;p&gt;But there’s an inherent problem: the performance data that APM tools need most often doesn’t come natively from the libraries themselves. 
The task of getting this data is delegated to APM tools like Sentry or &lt;a href=&quot;https://blog.sentry.io/send-your-existing-opentelemetry-traces/&quot;&gt;OpenTelemetry&lt;/a&gt;, which instrument crucial functionality of a library on its behalf.&lt;/p&gt;&lt;h2&gt;What is instrumentation?&lt;/h2&gt;&lt;p&gt;The most fundamental requirement to make an application observable is the ability to instrument each of its components and the libraries it uses. &lt;b&gt;Instrumentation&lt;/b&gt; is the process of adding code to a program to monitor and analyze its internal operations and generate diagnostic data. It’s exactly what the Sentry SDKs and OpenTelemetry instrumentation are doing under the hood.&lt;/p&gt;&lt;p&gt;Consider a typical HTTP client library. Application developers want to know when a request starts and completes, along with some metadata like URL, status code, and headers. Today, libraries handle this inconsistently: some provide custom hooks like &lt;code&gt;emitter.on(&amp;#39;request&amp;#39;, ...)&lt;/code&gt;, while others offer vendor-specific middleware to intercept requests. In these cases, Sentry and OpenTelemetry can write plugins that emit observability data.&lt;/p&gt;&lt;p&gt;This works, but it puts the burden on the library or framework (e.g. Nuxt) to consciously design an instrumentation API and identify the right places to expose it. Hooks and interceptors allow injecting observability code at the correct spots, but APM maintainers are entirely dependent on library authors to keep those APIs stable over time. On top of that, there is no shared convention (each library exposes different hook shapes and different metadata), so APM maintainers must write and maintain very different plugins for each library.&lt;/p&gt;&lt;h2&gt;How server-side JavaScript is instrumented&lt;/h2&gt;&lt;p&gt;The traditional approach to JavaScript instrumentation is “monkey-patching”. 
That’s modifying library code at runtime so that library functions not only do their original job, but also emit observability data. This is only possible in CommonJS (CJS), where modules are mutable and synchronously loaded.&lt;/p&gt;&lt;p&gt;However, the ecosystem is shifting. As server-side JavaScript moves further toward ES Modules (ESM), this approach breaks down. ES modules are immutable and loaded asynchronously, which means you simply can&amp;#39;t patch imports at runtime the same way anymore. For further information, the &lt;a href=&quot;https://github.com/getsentry/esm-observability-guide&quot;&gt;ESM Observability Instrumentation Guide&lt;/a&gt; covers this topic in greater detail.&lt;/p&gt;&lt;p&gt;The current workaround (and a way to “patch” imports) is using Module Customization Hooks paired with the &lt;code&gt;--import&lt;/code&gt; flag. A popular hook is &lt;code&gt;import-in-the-middle/hook.mjs&lt;/code&gt;. It works, but it&amp;#39;s brittle, complex, and feels like what it is: a workaround.&lt;/p&gt;&lt;p&gt;Both monkey-patching in CJS and Module Customization Hooks in ESM share the same fundamental flaw: they apply instrumentation “from the outside”. The library itself is passive. The question worth asking is: &lt;b&gt;what if libraries were active participants in their own observability and emitted telemetry data themselves? &lt;/b&gt;&lt;/p&gt;&lt;p&gt;This would be possible through diagnostics APIs like Tracing Channels.&lt;/p&gt;&lt;h2&gt;Libraries should emit their own telemetry&lt;/h2&gt;&lt;p&gt;Rather than waiting for APM tools to reach in and grab data, libraries can proactively expose their internal operations using tools built directly into the runtime. The right tool for this is &lt;b&gt;Diagnostics Channels&lt;/b&gt;, and more specifically, &lt;b&gt;Tracing Channels&lt;/b&gt;. 
These features are being developed by the &lt;a href=&quot;https://github.com/nodejs/diagnostics&quot;&gt;Node.js Diagnostics Working Group&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;A huge shoutout to &lt;a href=&quot;https://github.com/qard&quot;&gt;Stephen Belanger&lt;/a&gt;, the creator of the &lt;code&gt;diagnostics_channel&lt;/code&gt; API in Node.js, who founded the working group and has been instrumental in pushing this topic forward. He&amp;#39;s been providing feedback on proposals and acting as a voice of authority, which is sometimes exactly what&amp;#39;s needed to convince library maintainers to get on board.&lt;/p&gt;&lt;h3&gt;Diagnostics Channels&lt;/h3&gt;&lt;p&gt;&lt;a href=&quot;https://nodejs.org/api/diagnostics_channel.html&quot;&gt;Diagnostics Channels&lt;/a&gt; are a high-performance, synchronous event system built directly into Node.js. They’re also supported in Bun, Deno, and Cloudflare Workers (via the Node.js compatibility flag), making them a cross-runtime primitive.&lt;/p&gt;&lt;p&gt;Their primary use case is one-off events. For example, “a connection was opened” (like &lt;code&gt;node-redis&lt;/code&gt; &lt;a href=&quot;https://github.com/redis/node-redis/blob/41c908e6d65419fed6d985a9664427df1f48fb98/docs/diagnostics-channel.md?plain=1#L45-L48&quot;&gt;does this here&lt;/a&gt;). The limitation is that they don’t inherently represent a full lifecycle. You have to manually link &lt;code&gt;start&lt;/code&gt; and &lt;code&gt;stop&lt;/code&gt; events to measure duration.&lt;/p&gt;&lt;h3&gt;Tracing Channels&lt;/h3&gt;&lt;p&gt;&lt;a href=&quot;https://nodejs.org/api/diagnostics_channel.html#class-tracingchannel&quot;&gt;Tracing Channels&lt;/a&gt; solve exactly that limitation. A Tracing Channel is a bundle of related Diagnostics Channels that automatically creates sub-channels for a complete operation lifecycle: &lt;code&gt;start&lt;/code&gt;, &lt;code&gt;end&lt;/code&gt;, &lt;code&gt;asyncStart&lt;/code&gt;, &lt;code&gt;asyncEnd&lt;/code&gt;, and &lt;code&gt;error&lt;/code&gt;. 
More importantly, a &lt;code&gt;TracingChannel&lt;/code&gt; automatically propagates context across async boundaries. This means APM tools can correlate a database query back to the incoming HTTP request that caused it, without any manual bookkeeping.&lt;/p&gt;&lt;p&gt;Together, they give library and framework authors a standardized way to expose internal operations without coupling to any specific logging or tracing vendor. The library emits structured events and observability tools decide what to do with them.&lt;/p&gt;&lt;h2&gt;How libraries can implement Tracing Channels&lt;/h2&gt;&lt;p&gt;Tracing Channels have essentially zero cost when unused. If no subscriber is listening, emitting data costs almost nothing. This means library authors can add tracing channels without worrying about penalizing users who don’t need observability. The benefits: no monkey-patching is needed anymore, and users no longer have to pass &lt;code&gt;--import&lt;/code&gt; flags for preloading in ESM.&lt;/p&gt;&lt;h3&gt;Naming and consistency: The channel is the contract&lt;/h3&gt;&lt;p&gt;Tracing Channels should always be scoped to the library that emits them, using the npm package name as the namespace. Since package names are globally unique, this keeps channel names collision-free. For example, &lt;code&gt;mysql2&lt;/code&gt; ships &lt;code&gt;mysql2:query&lt;/code&gt;, which emits &lt;code&gt;tracing:mysql2:query:start&lt;/code&gt; and the other lifecycle channels. The &lt;code&gt;unstorage&lt;/code&gt; library ships &lt;code&gt;unstorage.get&lt;/code&gt;, which emits &lt;code&gt;tracing:unstorage.get:start&lt;/code&gt; and so on. The &lt;a href=&quot;https://github.com/unjs/untracing&quot;&gt;&lt;code&gt;untracing&lt;/code&gt;&lt;/a&gt; package is working to establish broader naming standards across the ecosystem.&lt;/p&gt;&lt;p&gt;Equally important: always emit a consistent data structure. 
Sentry and other APM tools can only provide automatic instrumentation if they know what shape your payload will have.&lt;/p&gt;&lt;p&gt;The pattern itself is straightforward. The library wraps its operation in a &lt;code&gt;tracePromise&lt;/code&gt; call:&lt;/p&gt;&lt;p&gt;And on the consumer side, an SDK like Sentry subscribes to those events:&lt;/p&gt;&lt;p&gt;The library and the observability tool never need to know about each other. The channel is the contract.&lt;/p&gt;&lt;h2&gt;The ecosystem is already moving&lt;/h2&gt;&lt;p&gt;In early February 2026, we (&lt;a href=&quot;https://github.com/andreiborza&quot;&gt;Andrei&lt;/a&gt;, &lt;a href=&quot;https://github.com/JPeer264&quot;&gt;Jan&lt;/a&gt; and &lt;a href=&quot;https://github.com/s1gr1d&quot;&gt;Sigrid&lt;/a&gt;) from Sentry attended &lt;a href=&quot;https://opentelemetry.io/blog/2025/otel-unplugged-fosdem/&quot;&gt;OTel Unplugged EU&lt;/a&gt; and brought up the topic “Prepare for better JS ESM Support”, which was voted onto the list of top priorities for the OpenTelemetry ecosystem.&lt;/p&gt;&lt;p&gt;So this isn’t a theoretical proposal. A growing number of well-known libraries have already shipped or merged PRs for Diagnostics Channel and Tracing Channel support.&lt;/p&gt;&lt;p&gt;On the framework and HTTP side, &lt;code&gt;undici&lt;/code&gt; (Node.js’s built-in HTTP client) has &lt;a href=&quot;https://undici-docs.vramana.dev/docs/api/DiagnosticsChannel&quot;&gt;shipped Diagnostics Channels&lt;/a&gt; since Node 20.12; &lt;code&gt;fastify&lt;/code&gt; (&lt;a href=&quot;https://fastify.dev/docs/latest/Reference/Hooks/#diagnostics-channel-hooks&quot;&gt;docs&lt;/a&gt;), &lt;code&gt;nitro&lt;/code&gt; (&lt;a href=&quot;https://github.com/nitrojs/nitro/pull/4001&quot;&gt;PR&lt;/a&gt;), and &lt;code&gt;h3&lt;/code&gt; (&lt;a href=&quot;https://github.com/h3js/h3/pull/1251&quot;&gt;PR&lt;/a&gt;) have native support as well. 
On the database side, &lt;code&gt;unstorage&lt;/code&gt; (&lt;a href=&quot;https://github.com/unjs/unstorage/pull/707&quot;&gt;PR&lt;/a&gt;) and &lt;code&gt;mysql2&lt;/code&gt; (&lt;a href=&quot;https://sidorares.github.io/node-mysql2/docs/documentation/tracing-channels&quot;&gt;Docs&lt;/a&gt;) already use Tracing Channels, and &lt;code&gt;pg&lt;/code&gt; / &lt;code&gt;pg-pool&lt;/code&gt; are actively working on it. Redis clients aren’t far behind either: &lt;code&gt;ioredis&lt;/code&gt; (&lt;a href=&quot;https://github.com/redis/ioredis/pull/2089&quot;&gt;PR&lt;/a&gt;) and &lt;code&gt;node-redis&lt;/code&gt; (&lt;a href=&quot;https://github.com/redis/node-redis/pull/3195&quot;&gt;PR&lt;/a&gt;) already support Tracing Channels.&lt;/p&gt;&lt;p&gt;None of this happens without the people willing to do the work. A massive shoutout to Sentry engineer &lt;b&gt;Abdelrahman Awad&lt;/b&gt; (&lt;a href=&quot;https://github.com/logaretm&quot;&gt;@logaretm&lt;/a&gt;) for driving Tracing Channel implementations across multiple libraries. And a special thanks to &lt;b&gt;Pooya Parsa&lt;/b&gt; (&lt;a href=&quot;https://github.com/pi0&quot;&gt;@pi0&lt;/a&gt;), whose openness to collaborating on &lt;code&gt;h3&lt;/code&gt; and &lt;code&gt;nitro&lt;/code&gt; was instrumental in formalizing this approach and showing the ecosystem what it could look like.&lt;/p&gt;&lt;h2&gt;The vision ahead&lt;/h2&gt;&lt;p&gt;We’re still in a “chicken and egg” phase. Libraries need to add channels before APM tools have strong reasons to listen to them, and APM tools need to start listening before authors feel the pressure to add them.&lt;/p&gt;&lt;p&gt;The goal is &lt;b&gt;universal JS observability&lt;/b&gt;: a world where Node.js, Bun, and Deno share the same diagnostic patterns, and instrumentation just works without monkey-patching in CJS, without &lt;code&gt;--import&lt;/code&gt; flags in ESM, and without fragile workarounds. 
Libraries become active drivers of observability, emitting the data they consider most relevant to their users.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Sample AI traces at 100% without sampling everything]]></title><description><![CDATA[A little while ago, when agents were telling me “You’re absolutely right!”, I was building webvitals.com. You put in a URL, it kicks off an API request to a Nex...]]></description><link>https://blog.sentry.io/sample-ai-traces-at-100-percent-without-sampling-everything/</link><guid isPermaLink="false">https://blog.sentry.io/sample-ai-traces-at-100-percent-without-sampling-everything/</guid><pubDate>Thu, 09 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A little while ago, when agents were telling me “You’re absolutely right!”, I was building &lt;a href=&quot;https://webvitals.com&quot;&gt;webvitals.com&lt;/a&gt;. You put in a URL, it kicks off an API request to a Next.js API route that invokes an agent with a few tools to scan it and provide AI-generated suggestions to improve your… you guessed it… Web Vitals. Do we even care about these anymore?&lt;/p&gt;&lt;p&gt;I had the &lt;code&gt;tracesSampleRate&lt;/code&gt; set to 100% in development, but in production, I sampled it down to 10% because… well that’s what our instrumentation recommends. Kyle wrote a great blog post explaining that “&lt;a href=&quot;https://blog.sentry.io/sampling-strategy-sentry/&quot;&gt;Watching everything is watching nothing&lt;/a&gt;”. But AI is non-deterministic. And when I was debugging an error from a tool call, I realized I was missing very important spans emitted from the Vercel AI SDK because of that sampling strategy.&lt;/p&gt;&lt;p&gt;An agent run with 7 tool calls doesn&amp;#39;t get partially sampled. You either capture the whole span tree or you lose it entirely. 
This is how head-based sampling works.&lt;/p&gt;&lt;p&gt;I was chasing ghosts.&lt;/p&gt;&lt;h2&gt;Agent runs are span trees, and sampling is all-or-nothing&lt;/h2&gt;&lt;p&gt;A typical agent execution looks like this in Sentry&amp;#39;s trace view:&lt;/p&gt;&lt;p&gt;That&amp;#39;s 11 spans in a single run. The sampling decision happens once, at the root: the &lt;code&gt;POST /api/chat&lt;/code&gt; HTTP transaction. Every child span inherits that decision. If the root is dropped, all 11 spans disappear.&lt;/p&gt;&lt;p&gt;This is fundamentally different from sampling HTTP requests, where dropping one &lt;code&gt;GET /api/users&lt;/code&gt; is no big deal because the next one is basically identical.&lt;/p&gt;&lt;p&gt;Agent runs are not identical. Each one makes different decisions, calls different tools, processes different data. An agent that hallucinated on run 67 might work perfectly on run 420. If your sample rate dropped run 67, you&amp;#39;ll never know what went wrong.&lt;/p&gt;&lt;h2&gt;How head-based sampling actually works (and why it matters here)&lt;/h2&gt;&lt;p&gt;Both the Sentry JavaScript and Python SDKs use head-based sampling: the decision is made at the start of the trace, before any child spans exist.&lt;/p&gt;&lt;p&gt;In the JavaScript SDK, &lt;a href=&quot;https://github.com/getsentry/sentry-javascript/blob/develop/packages/opentelemetry/src/sampler.ts#L79&quot;&gt;&lt;code&gt;SentrySampler.shouldSample()&lt;/code&gt;&lt;/a&gt; is explicit about this:&lt;/p&gt;&lt;p&gt;Non-root spans don&amp;#39;t get a vote. If the root span was dropped, &lt;code&gt;tracesSampler&lt;/code&gt; is never called for any child, including your &lt;code&gt;gen_ai.request&lt;/code&gt; and &lt;code&gt;gen_ai.execute_tool&lt;/code&gt; spans. 
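&lt;/p&gt;&lt;p&gt;In practice, that means all the routing logic has to live in a root-level &lt;code&gt;tracesSampler&lt;/code&gt;. Here is a sketch of one that keeps agent traces; the &lt;code&gt;sentry.op&lt;/code&gt; attribute key and the route string are assumptions to adapt, not a drop-in config:&lt;/p&gt;

```javascript
// Sketch of a root-only tracesSampler. Field names mirror the Sentry JS SDK's
// documented samplingContext; the attribute key and route are assumptions.
function tracesSampler({ name, parentSampled, attributes }) {
  // A propagated upstream decision wins: children and continued traces inherit.
  if (parentSampled !== undefined) return parentSampled;

  // Case 1: the agent run itself is the root (cron job, queue consumer, CLI).
  const op = (attributes ?? {})['sentry.op'] ?? '';
  if (op.startsWith('gen_ai')) return 1.0; // keep every agent trace

  // Case 2: an HTTP root whose handler will run the agent.
  if (name === 'POST /api/chat') return 1.0;

  return 0.1; // baseline for everything else
}
```

&lt;p&gt;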
They inherit the parent&amp;#39;s fate.&lt;/p&gt;&lt;p&gt;In Python, the same logic lives in &lt;a href=&quot;https://github.com/getsentry/sentry-python/blob/master/sentry_sdk/tracing.py#L1150&quot;&gt;&lt;code&gt;Transaction._set_initial_sampling_decision()&lt;/code&gt;&lt;/a&gt;. The &lt;code&gt;traces_sampler&lt;/code&gt; callback receives a &lt;code&gt;sampling_context&lt;/code&gt; dict with &lt;code&gt;transaction_context&lt;/code&gt; (containing &lt;code&gt;op&lt;/code&gt; and &lt;code&gt;name&lt;/code&gt;) and &lt;code&gt;parent_sampled&lt;/code&gt;. It only fires for root transactions.&lt;/p&gt;&lt;p&gt;This means head-based sampling doesn&amp;#39;t support &lt;b&gt;independently sampling gen_ai child spans at a different rate than their parent transaction.&lt;/b&gt; There&amp;#39;s no &amp;quot;sample 100% of LLM calls but 10% of HTTP requests.&amp;quot; If the HTTP request is dropped, the LLM calls inside it are dropped too.&lt;/p&gt;&lt;p&gt;I’d love to walk through a few different scenarios to show the difference in filtering approaches based on whether the root span comes from an agent or from the application.&lt;/p&gt;&lt;h2&gt;Scenario 1: The &lt;code&gt;gen_ai&lt;/code&gt; span IS the root&lt;/h2&gt;&lt;p&gt;Sometimes your agent run &lt;i&gt;is&lt;/i&gt; the root span. Maybe it’s a cron job that’s running an agent, a queue consumer processing an AI task, or a CLI script. In these cases, &lt;code&gt;tracesSampler&lt;/code&gt; sees the &lt;code&gt;gen_ai.*&lt;/code&gt; operation directly and you can match on it:&lt;/p&gt;&lt;p&gt;&lt;b&gt;JavaScript:&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Python:&lt;/b&gt;&lt;/p&gt;&lt;p&gt;This is the easy case. The hard case is next.&lt;/p&gt;&lt;h2&gt;Scenario 2: The &lt;code&gt;gen_ai&lt;/code&gt; spans are children of an HTTP transaction&lt;/h2&gt;&lt;p&gt;This is the common case in web applications. 
A user hits &lt;code&gt;POST /api/chat&lt;/code&gt;, your framework creates an &lt;code&gt;http.server&lt;/code&gt; root span, and somewhere inside that request handler your agent runs. By the time the first &lt;code&gt;gen_ai.request&lt;/code&gt; span is created, the sampling decision was already made for the HTTP transaction.&lt;/p&gt;&lt;p&gt;The fix: &lt;b&gt;identify which routes trigger AI calls and sample those routes at 100%.&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;JavaScript:&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Python:&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Replace the route strings with whatever paths your AI features live on. If your entire app is AI-powered, skip the &lt;code&gt;tracesSampler&lt;/code&gt; and just set &lt;code&gt;tracesSampleRate: 1.0&lt;/code&gt;.&lt;/p&gt;&lt;h2&gt;The cost math: AI API bills dwarf observability costs&lt;/h2&gt;&lt;p&gt;The instinct to sample AI traces at a lower rate usually comes from cost concerns. Let&amp;#39;s look at the actual numbers.&lt;/p&gt;&lt;table&gt;&lt;tr&gt;&lt;th&gt;&lt;p&gt;&lt;b&gt;What&lt;/b&gt;&lt;/p&gt;&lt;/th&gt;&lt;th&gt;&lt;p&gt;&lt;b&gt;Cost per event&lt;/b&gt;&lt;/p&gt;&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;Claude Sonnet 4 input (1K tokens)&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;~$0.003&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;Claude Sonnet 4 output (1K tokens)&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;~$0.015&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;Gemini 2.5 Flash input (1K tokens)&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;~$0.00015&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;Gemini 2.5 Flash output (1K tokens)&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;~$0.0006&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;A typical agent run (3 LLM calls, 2 tool calls)&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;$0.02-$0.15&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;b&gt;Sentry span events for that agent run (~9 
spans)&lt;/b&gt;&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;&lt;b&gt;Fraction of a cent&lt;/b&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;p&gt;The LLM calls themselves are 10-100x more expensive than the monitoring. You&amp;#39;re already paying for the AI call; dropping the observability span to save a fraction of a cent per call is like skipping the dashcam to save on gas.&lt;/p&gt;&lt;h2&gt;When 100% tracing isn&amp;#39;t feasible: Metrics and Logs as a safety net&lt;/h2&gt;&lt;p&gt;If you genuinely can&amp;#39;t sample AI routes at 100%, because of, say, massive scale or strict budget constraints, you can still capture the important signals from every AI call using Sentry &lt;a href=&quot;https://docs.sentry.io/platforms/python/metrics/&quot;&gt;Metrics&lt;/a&gt; and &lt;a href=&quot;https://docs.sentry.io/platforms/javascript/guides/node/logs/&quot;&gt;Logs&lt;/a&gt;. Both are independent of trace sampling.&lt;/p&gt;&lt;p&gt;&lt;b&gt;JavaScript - emit metrics on every LLM call:&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Python - emit metrics on every LLM call:&lt;/b&gt;&lt;/p&gt;&lt;p&gt;You can also log every call with structured attributes for searchability:&lt;/p&gt;&lt;p&gt;&lt;b&gt;JavaScript:&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Python:&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Here&amp;#39;s what each telemetry layer gives you:&lt;/p&gt;&lt;table&gt;&lt;tr&gt;&lt;th&gt;&lt;p&gt;&lt;b&gt;Signal&lt;/b&gt;&lt;/p&gt;&lt;/th&gt;&lt;th&gt;&lt;p&gt;&lt;b&gt;Traces (sampled)&lt;/b&gt;&lt;/p&gt;&lt;/th&gt;&lt;th&gt;&lt;p&gt;&lt;b&gt;Metrics (100%)&lt;/b&gt;&lt;/p&gt;&lt;/th&gt;&lt;th&gt;&lt;p&gt;&lt;b&gt;Logs (100%)&lt;/b&gt;&lt;/p&gt;&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;Full span tree with prompts/responses&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;No&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;No&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;Token usage distributions (p50, 
p99)&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Partial&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;No&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;Cost attribution by model/user&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Partial&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;Error rates by model/endpoint&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Partial&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;Latency distributions&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Partial&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;No&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;Searchable per-call records&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;No&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Yes&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;p&gt;&lt;b&gt;The recommended approach:&lt;/b&gt; Use &lt;code&gt;tracesSampler&lt;/code&gt; to capture 100% of AI-related routes. If that&amp;#39;s not possible, combine a lower trace rate with metrics and logs emitted on every call. Traces give you the debugging depth; metrics and logs give you the aggregate picture.&lt;/p&gt;&lt;p&gt;Once you&amp;#39;re emitting these metrics, you can build custom dashboards that go beyond what the &lt;a href=&quot;https://docs.sentry.io/ai/monitoring/agents/dashboards/&quot;&gt;pre-built AI Agents dashboard&lt;/a&gt; shows. The &lt;a href=&quot;https://cli.sentry.dev/&quot;&gt;Sentry CLI&lt;/a&gt; makes this scriptable:&lt;/p&gt;&lt;p&gt;The pre-built dashboard gives you per-model and per-tool aggregates. 
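&lt;/p&gt;&lt;p&gt;As a sketch of the per-call records that feed those views, here is one shape the “log every call” approach might take. The &lt;code&gt;emitLLMCallLog&lt;/code&gt; helper and its attribute names are illustrative assumptions, and the stub stands in for a structured logger such as &lt;code&gt;Sentry.logger&lt;/code&gt;:&lt;/p&gt;

```javascript
// Sketch: one structured log record per LLM call, independent of trace
// sampling. Attribute names loosely follow the OpenTelemetry gen_ai
// conventions but are assumptions here, not an official schema.
function emitLLMCallLog(logger, call) {
  logger.info('llm.call', {
    'gen_ai.request.model': call.model,
    'gen_ai.usage.input_tokens': call.inputTokens,
    'gen_ai.usage.output_tokens': call.outputTokens,
    'user.id': call.userId,
    'route': call.route,
  });
}

// Stub logger so the sketch is self-contained; swap in the real one in app code.
const records = [];
const logger = { info(message, attributes) { records.push({ message, attributes }); } };

emitLLMCallLog(logger, {
  model: 'claude-sonnet-4',
  inputTokens: 1200,
  outputTokens: 450,
  userId: 'user_123',
  route: '/api/chat',
});
```

&lt;p&gt;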
Custom dashboards answer the business questions: &lt;i&gt;who&amp;#39;s driving cost, which features justify their AI spend, and which conversations are spiraling.&lt;/i&gt;&lt;/p&gt;&lt;h2&gt;The full production config&lt;/h2&gt;&lt;p&gt;Here&amp;#39;s a complete setup that samples AI routes at 100%, everything else at your baseline, and emits metrics as a safety net:&lt;/p&gt;&lt;p&gt;&lt;b&gt;JavaScript:&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Python:&lt;/b&gt;&lt;/p&gt;&lt;h2&gt;Quick reference&lt;/h2&gt;&lt;table&gt;&lt;tr&gt;&lt;th&gt;&lt;p&gt;&lt;b&gt;Situation&lt;/b&gt;&lt;/p&gt;&lt;/th&gt;&lt;th&gt;&lt;p&gt;&lt;b&gt;What to do&lt;/b&gt;&lt;/p&gt;&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;AI is the core product&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;&lt;code&gt;tracesSampleRate: 1.0 &lt;/code&gt;- sample everything&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;AI is one feature in a larger app&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;&lt;code&gt;tracesSampler&lt;/code&gt;&lt;/p&gt;&lt;p&gt; with AI routes at 1.0, baseline for the rest&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;Can&amp;#39;t afford 100% on AI routes&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Lower trace rate + metrics/logs on every call&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;Already using &lt;code&gt;tracesSampler&lt;/code&gt;&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;Add AI route matching to your existing logic&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;Sample rate is already 1.0&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;No change needed&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;p&gt;The underlying principle: agent runs are high-value, low-volume (relative to HTTP traffic), and expensive to reproduce. 
Sample them accordingly.&lt;/p&gt;&lt;p&gt;If you&amp;#39;re just getting started with AI monitoring, check out our companion post on &lt;a href=&quot;https://blog.sentry.io/ai-agent-observability-developers-guide-to-agent-monitoring/&quot;&gt;the developer&amp;#39;s guide to AI agent monitoring&lt;/a&gt;, which covers the full setup across 10+ frameworks, the pre-built dashboards, and a real debugging walkthrough.&lt;/p&gt;&lt;hr/&gt;&lt;p&gt;&lt;i&gt;For framework-specific setup, see our &lt;/i&gt;&lt;a href=&quot;https://docs.sentry.io/ai/monitoring/agents/&quot;&gt;&lt;i&gt;AI monitoring docs&lt;/i&gt;&lt;/a&gt;&lt;i&gt;. If you&amp;#39;re using an AI coding assistant, install the &lt;/i&gt;&lt;a href=&quot;https://cli.sentry.dev/agentic-usage/&quot;&gt;&lt;i&gt;Sentry CLI skill&lt;/i&gt;&lt;/a&gt;&lt;i&gt; (&lt;/i&gt;&lt;i&gt;&lt;code&gt;npx skills add https://cli.sentry.dev&lt;/code&gt;&lt;/i&gt;&lt;i&gt;) to configure your sampling, build custom dashboards, and investigate issues directly from your editor.&lt;/i&gt;&lt;/p&gt;</content:encoded></item><item><title><![CDATA[AI-driven caching strategies and instrumentation]]></title><description><![CDATA[The things that separate a minimum viable product (MVP) from a production-ready app are polish, final touches, and the Pareto 'last 20%' of work. Most bugs, edg...]]></description><link>https://blog.sentry.io/ai-driven-caching-strategies-instrumentation/</link><guid isPermaLink="false">https://blog.sentry.io/ai-driven-caching-strategies-instrumentation/</guid><pubDate>Fri, 13 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;The things that separate a minimum viable product (MVP) from a production-ready app are polish, final touches, and the Pareto &amp;#39;last 20%&amp;#39; of work. 
Most bugs, edge cases, and &lt;a href=&quot;https://sentry.io/solutions/application-performance-monitoring/&quot;&gt;performance issues&lt;/a&gt; won&amp;#39;t show up until after launch, when real users start hammering your application. If you&amp;#39;re reading this, you&amp;#39;re probably at the 80% mark, ready to tackle the rest.&lt;/p&gt;&lt;p&gt;This article covers application caching: how to use it for cutting tail latency, protecting databases, and handling traffic spikes, plus how to monitor it once it&amp;#39;s running in production.&lt;/p&gt;&lt;p&gt;This article is part of a series of common pain points when bringing an MVP to production:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;a href=&quot;https://blog.sentry.io/paginating-large-datasets-in-production-why-offset-fails-and-cursors-win/&quot;&gt;Paginating Large Datasets in Production: Why OFFSET Fails and Cursors Win&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;AI-driven caching strategies and instrumentation (this one)&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h2&gt;Building a mental model for caching&lt;/h2&gt;&lt;p&gt;Good caching multiplies your performance, scalability, and cost efficiency. Done right, it gives you sub-millisecond responses and absorbs traffic spikes without crushing your origin servers. Done wrong (aggressive caching, bad invalidation, wrong strategies) it creates subtle bugs, stale data, and degraded user experience (UX) that&amp;#39;s hard to debug and usually only shows up after it&amp;#39;s already affected a lot of users.&lt;/p&gt;&lt;p&gt;Before looking for caching opportunities, you need a mental model for what should and shouldn&amp;#39;t be cached. 
Here&amp;#39;s a checklist:&lt;/p&gt;&lt;h3&gt;✅ Cache if most are true:&lt;/h3&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Expensive&lt;/b&gt;: slow CPU, slow input/output (IO), heavy DB, big joins/aggregates, external application programming interface (API)&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Frequent&lt;/b&gt;: called a lot (high requests per minute (RPM)) or sits on hot paths (page load, core API)&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Reusable&lt;/b&gt;: same inputs repeat (low key cardinality)&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Stable-ish&lt;/b&gt;: data doesn&amp;#39;t change every second (or can tolerate staleness)&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Spiky load&lt;/b&gt;: bursty traffic where cache absorbs thundering herds&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Tail hurts&lt;/b&gt;: P95/P99 is bad, and misses correlate with slow requests&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Safe to serve stale&lt;/b&gt;: user impact low, or can use stale-while-revalidate (SWR)&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Invalidation is easy&lt;/b&gt;: time to live (TTL) works, or updates have clear triggers&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Small-ish payload&lt;/b&gt;: memory cost reasonable, serialization cheap&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h3&gt;❌ Don&amp;#39;t cache (or be very careful) if any are true:&lt;/h3&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;High cardinality keys&lt;/b&gt;: per-user / per-page / per-filter explosion → mostly misses (pagination is a special case - see note below)&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Highly mutable&lt;/b&gt;: correctness demands freshness&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Personalized / permissioned&lt;/b&gt;: easy to leak data via key mistakes&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Hard invalidation&lt;/b&gt;: no clear TTL, updates unpredictable&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Already 
fast&lt;/b&gt;: saving 5ms isn&amp;#39;t worth complexity&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Cache stampede risk&lt;/b&gt;: expensive recompute + synchronized expiry (needs locking / jitter)&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;There&amp;#39;s a special rule for caching paginated endpoints - &lt;b&gt;cache page 1 + common filters first&lt;/b&gt;. Page 1 and a small set of common filters are usually hot and reused, so caching pays off. As page numbers increase, key cardinality explodes and reuse collapses, so deep pages will naturally miss and that&amp;#39;s fine. Optimize for protecting the backend and reducing tail latency on the entry points, not for achieving uniform hit rates across all pages.&lt;/p&gt;&lt;h2&gt;Finding caching opportunities in production&lt;/h2&gt;&lt;p&gt;Once you know what &lt;i&gt;should&lt;/i&gt; be cached, the next question is where caching will actually matter. In production systems, good caching candidates show up through pain, usually in three forms.&lt;/p&gt;&lt;h3&gt;Backend pain (start here)&lt;/h3&gt;&lt;p&gt;For &lt;a href=&quot;https://docs.sentry.io/product/insights/backend/&quot;&gt;backend and full-stack systems,&lt;/a&gt; this is the most actionable signal.&lt;/p&gt;&lt;p&gt;Look for:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;Transactions with bad P95/P99&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Endpoints with heavy database (DB) time&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Repeated queries, joins, aggregates&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Fan-out (one request triggering many downstream calls)&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Lock contention or connection pool pressure&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;These are places where caching immediately reduces real work.&lt;/p&gt;&lt;h3&gt;User pain (confirmation)&lt;/h3&gt;&lt;p&gt;Slow page loads, janky interactions, timeouts. 
&lt;a href=&quot;https://sentry.io/for/web-vitals/&quot;&gt;Web Vitals&lt;/a&gt; like Time to First Byte (&lt;a href=&quot;https://webvitals.com/ttfb&quot;&gt;TTFB&lt;/a&gt;), Largest Contentful Paint (&lt;a href=&quot;https://webvitals.com/lcp&quot;&gt;LCP&lt;/a&gt;), and Interaction to Next Paint (&lt;a href=&quot;https://webvitals.com/inp&quot;&gt;INP&lt;/a&gt;) help confirm that backend slowness is actually affecting users. They&amp;#39;re most useful once you already suspect a backend bottleneck.&lt;/p&gt;&lt;h3&gt;Cost pain (the long-term signal)&lt;/h3&gt;&lt;p&gt;Even if your users aren&amp;#39;t complaining yet, repetition is expensive:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;High DB read volume&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Paid external API calls&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Recomputed rollups and counts&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Cost often lags behind performance problems, but it&amp;#39;s a strong motivator once traffic grows.&lt;/p&gt;&lt;p&gt;A simple prioritization heuristic is &lt;b&gt;cost density&lt;/b&gt;:&lt;/p&gt;&lt;p&gt;requests per minute * time saved per request&lt;/p&gt;&lt;p&gt;An endpoint that&amp;#39;s moderately slow but hit consistently is usually a better caching target than a pathological endpoint nobody touches.&lt;/p&gt;&lt;h2&gt;Example: a slow paginated endpoint&lt;/h2&gt;&lt;p&gt;Consider a paginated endpoint performing a heavy database query with no caching.&lt;/p&gt;&lt;p&gt;In &lt;b&gt;Sentry &amp;gt; Insights &amp;gt; Backend&lt;/b&gt;, filtering by API transactions (above the table) surfaces this:&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;The &lt;code&gt;GET /admin/order-items&lt;/code&gt; endpoint has potential for caching. Let&amp;#39;s dive into it. 
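&lt;/p&gt;&lt;p&gt;To make the cost density heuristic concrete, here&amp;#39;s a toy sketch that ranks candidate endpoints. The endpoint names and numbers are made up for illustration, not real measurements:&lt;/p&gt;

```javascript
// Toy ranking of caching candidates by "cost density":
// requests per minute * time saved per request.
// Endpoints and numbers are illustrative, not real measurements.
const candidates = [
  { endpoint: "GET /admin/order-items", rpm: 400, savedMs: 700 },
  { endpoint: "GET /reports/yearly", rpm: 2, savedMs: 5000 },
  { endpoint: "GET /health", rpm: 1000, savedMs: 2 },
];

const ranked = candidates
  .map((c) => ({ ...c, costDensity: c.rpm * c.savedMs }))
  .sort((a, b) => b.costDensity - a.costDensity);

// The moderately slow but consistently hit endpoint ranks first.
console.log(ranked.map((c) => `${c.endpoint}: ${c.costDensity}`).join("\n"));
```

&lt;p&gt;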
I&amp;#39;ll pick a slower event and inspect the &lt;a href=&quot;https://docs.sentry.io/concepts/key-terms/tracing/trace-view/&quot;&gt;trace view&lt;/a&gt;:&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;From the screenshot, we can see:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;776ms total duration&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;731ms spent in a single DB span&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Multiple joins&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;LIMIT + OFFSET pagination&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Poor TTFB in Web Vitals&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Against the checklist:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;✅ Expensive (heavy DB query, joins)&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;✅ Frequent (high throughput)&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;✅ Stable-ish (can tolerate brief staleness)&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;✅ Tail hurts (bad P95)&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;✅ Invalidation is easy (writes are controlled)&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;⚠️ High cardinality key (pagination)&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;This is a strong candidate for &lt;b&gt;selective caching&lt;/b&gt;, not blanket caching.&lt;/p&gt;&lt;h2&gt;Applying and instrumenting caching&lt;/h2&gt;&lt;p&gt;Sentry comes with &lt;a href=&quot;https://docs.sentry.io/product/insights/backend/caches/&quot;&gt;Cache Monitoring&lt;/a&gt; too. It helps you see your cache hit/miss rates across your application, and inspect specific events captured in production when the cache was either hit or missed.&lt;/p&gt;&lt;p&gt;Instrumenting caches can be done both automatically and manually. If you&amp;#39;re using Redis, you can leverage the automatic instrumentation. If not, manual instrumentation is just as easy.&lt;/p&gt;&lt;p&gt;The most straightforward approach is to just ask Seer to do it for you. 
At the time of publishing this article, Seer&amp;#39;s &amp;quot;open-ended questions&amp;quot; feature is private access only, but I&amp;#39;ll give you a little sneak peek. You can access it with &lt;code&gt;Cmd + /&lt;/code&gt; and straight up ask it to instrument caches for you:&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;Seer will then open a PR on your repo, so you can merge it and be done with it.&lt;/p&gt;&lt;p&gt;In case you don&amp;#39;t have access to this Seer feature yet, instrumenting caches by hand only takes a few lines. All we need to do is wrap the &lt;code&gt;redis.get&lt;/code&gt; and &lt;code&gt;redis.setex&lt;/code&gt; calls with &lt;code&gt;Sentry.startSpan&lt;/code&gt; and provide caching-specific span attributes. If you&amp;#39;re not using JavaScript on your backend, you can simply rewrite these functions in your language of choice. As long as you&amp;#39;re sending spans with the correct &lt;code&gt;op&lt;/code&gt; and &lt;code&gt;attributes&lt;/code&gt;, cache instrumentation will work.&lt;/p&gt;&lt;p&gt;Now we can use these two wrapper functions wherever the endpoint reads or writes the cache.&lt;/p&gt;&lt;h2&gt;Monitoring and optimizing caches&lt;/h2&gt;&lt;p&gt;Once we deploy this, cache spans start coming in:&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;The data shows a 75% Miss Rate on that endpoint. That&amp;#39;s neither good nor bad by itself: there is no &amp;quot;goal value&amp;quot; to aim for, and reaching a 0% miss rate would probably mean you&amp;#39;re hiding bugs. The miss rate should simply align with your expectations. 75% on this endpoint might make sense, but there might also be room for optimization. 
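&lt;/p&gt;&lt;p&gt;Before clicking into individual events, here&amp;#39;s a sketch of what the two wrapper functions described above could look like. The span &lt;code&gt;op&lt;/code&gt; and attribute names follow Sentry&amp;#39;s cache span conventions (&lt;code&gt;cache.get&lt;/code&gt;/&lt;code&gt;cache.put&lt;/code&gt;, &lt;code&gt;cache.key&lt;/code&gt;, &lt;code&gt;cache.hit&lt;/code&gt;); the tiny &lt;code&gt;Sentry&lt;/code&gt; and &lt;code&gt;redis&lt;/code&gt; objects are stand-ins for the real modules so the sketch runs on its own, and the function names are illustrative:&lt;/p&gt;

```javascript
// Sketch: Sentry-instrumented cache wrappers around redis.get / redis.setex.
// The Sentry and redis objects below are minimal stand-ins for @sentry/node
// and a Redis client so this file runs standalone; in a real app you would
// import both instead.
const Sentry = {
  startSpan({ name, op, attributes = {} }, callback) {
    // The real startSpan records and sends the span; this stub just runs the callback.
    const span = { setAttribute: (key, value) => { attributes[key] = value; } };
    return callback(span);
  },
};
const memory = new Map();
const redis = {
  async get(key) { return memory.has(key) ? memory.get(key) : null; },
  // TTL is ignored in this in-memory stand-in.
  async setex(key, ttlSeconds, value) { memory.set(key, value); },
};

// Read from the cache, recording a cache.get span with hit/miss info.
async function getCached(key) {
  return Sentry.startSpan(
    { name: `cache.get ${key}`, op: "cache.get", attributes: { "cache.key": [key] } },
    async (span) => {
      const raw = await redis.get(key);
      span.setAttribute("cache.hit", raw !== null);
      return raw === null ? null : JSON.parse(raw);
    },
  );
}

// Write to the cache with a TTL, recording a cache.put span.
async function setCached(key, value, ttlSeconds = 60) {
  return Sentry.startSpan(
    { name: `cache.put ${key}`, op: "cache.put", attributes: { "cache.key": [key] } },
    () => redis.setex(key, ttlSeconds, JSON.stringify(value)),
  );
}
```

&lt;p&gt;In the route handler you&amp;#39;d call the getter first and fall back to the database on a miss, writing back to the cache only for page 1 and the common filters.&lt;/p&gt;&lt;p&gt;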
Let&amp;#39;s click into the transaction to see actual events:&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;From the screenshot above we can see that cache hits happen only on page 1, and misses on the other pages. That&amp;#39;s because we followed the advice for caching paginated endpoints: only cache page 1 and common filters. Users were visiting multiple pages, but Page 1 accounted for 25% of the visits, hence the 75% Miss Rate. From the Transaction Duration column we can see that Page 1 loaded in under 40ms, while for other pages users had to wait &amp;gt;700ms.&lt;/p&gt;&lt;p&gt;So our caching implementation is working, and users are experiencing faster page loads. From this point on we&amp;#39;ll know that for our &lt;code&gt;/admin/order-items&lt;/code&gt; endpoint the normal miss rate sits around 75%. If we later introduce a bug, for example buggy cache keys (missing params, extra params), new filters or sorting, per-user or per-flag keys creeping in, volatile data accidentally included in keys (timestamps, request IDs, locale), or a messed-up TTL, this number is going to shoot up, and we&amp;#39;ll see it in the chart. A spike will tell us that we broke caching and users are experiencing slowdowns.&lt;/p&gt;&lt;h2&gt;AI-assisted cache expansion in production&lt;/h2&gt;&lt;p&gt;Remember the &amp;quot;cache only page 1 + common filters&amp;quot; rule? We&amp;#39;re going to bend it a little bit. 
If we want to bring down the 75% Miss Rate above, we&amp;#39;ll need to expand caching to cover more pages than just page 1, but we have to be careful not to over-expand because we&amp;#39;ll bloat our Redis instance.&lt;/p&gt;&lt;p&gt;Here&amp;#39;s a practical AI-assisted approach to help make a good cache expansion decision:&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;You can use &lt;a href=&quot;https://mcp.sentry.dev/&quot;&gt;Sentry Model Context Protocol (MCP)&lt;/a&gt; to pull all the &lt;code&gt;cache.get&lt;/code&gt; spans from your project and group them by the &lt;code&gt;cache.key&lt;/code&gt; property, and then ask the agent to suggest how to expand the caching. Looking at the screenshot, we can see that Page 1 still has the most hits, but Pages 2-6 have significant traffic too. Long-tail pages like 7 and 10 have minimal traffic, so there&amp;#39;s no need to cache them, and the agent also discarded some test data. It suggested expanding the cache through Page 3. Let&amp;#39;s see how that affects the Miss Rate:&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;Would you look at that! We&amp;#39;re now at a 30% Miss Rate, down from 75%. This means roughly only 1 in 3 requests will hit the database. But it&amp;#39;s important to keep an eye on Redis memory as well. Pushing caching from Page 1 to Page 3 might bloat our Redis instance, and in that case caching won&amp;#39;t be worth it. Redis bloat means hot keys get evicted, which would undo the performance gains we got from caching in the first place.&lt;/p&gt;&lt;h2&gt;Alerting on miss rate deviations&lt;/h2&gt;&lt;p&gt;The last step is to set up an alert that notifies you (via email or Slack) when there&amp;#39;s an anomaly in cache misses. 
Head to &lt;b&gt;Sentry &amp;gt; Issues &amp;gt; Alerts&lt;/b&gt;, pick &lt;b&gt;Performance Throughput&lt;/b&gt;, and &lt;a href=&quot;https://docs.sentry.io/product/alerts/&quot;&gt;create an alert&lt;/a&gt; with the following options:&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;Make sure you pick your project and environment correctly&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;You&amp;#39;d want to filter on &lt;code&gt;cache.hit&lt;/code&gt; being &lt;code&gt;False&lt;/code&gt;, and on your &lt;code&gt;cache.key&lt;/code&gt; as well&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Set the thresholds to &lt;b&gt;Anomaly&lt;/b&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Start with a &lt;b&gt;High&lt;/b&gt; level of responsiveness, and tune later&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;For &lt;b&gt;Direction of anomaly movement&lt;/b&gt; you&amp;#39;d want &lt;b&gt;Above bounds only&lt;/b&gt;, so you only get notified on cache miss increases&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Lastly, define your action: an email to yourself or your team, or a Slack message in a specific channel&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Name it and hit &amp;quot;Save Rule&amp;quot;&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;That&amp;#39;s it. Now if you accidentally break the caching mechanism, it&amp;#39;ll result in a flood of cache misses, and Sentry will pick it up and notify you. You&amp;#39;re free to filter however you like and create as many alerts as you need. &lt;code&gt;cache.hit&lt;/code&gt; and &lt;code&gt;cache.key&lt;/code&gt; are not the only attributes you can filter on; play with the filter bar to discover everything you can filter by.&lt;/p&gt;&lt;h2&gt;Where to go from here&lt;/h2&gt;&lt;p&gt;At this point, caching is working. The endpoint is faster, the database is protected, and you have a baseline Miss Rate that reflects normal behaviour. 
From here on, the work is less about adding caching, and more about making sure it keeps doing what it&amp;#39;s supposed to do.&lt;/p&gt;&lt;p&gt;The first thing to watch is &lt;b&gt;Miss Rate deviations&lt;/b&gt;, not the absolute number. A stable line that suddenly jumps usually means something changed: a cache key bug, new filters or sorting, increased cardinality, or a TTL or invalidation mistake introduced during a deploy. Those changes tend to show up in cache metrics before users start complaining.&lt;/p&gt;&lt;p&gt;Next, always &lt;b&gt;read Miss Rate together with latency&lt;/b&gt;. A higher Miss Rate that doesn&amp;#39;t affect P95/P99 is usually harmless. A higher Miss Rate that brings the database spans back into the critical path is a regression worth acting on.&lt;/p&gt;&lt;p&gt;As you expand caching, &lt;b&gt;keep an eye on Redis memory and evictions&lt;/b&gt;. Improving hit rates by caching more pages only helps if hot keys stay resident. Memory pressure that causes frequent evictions can quietly undo your gains and make cache behaviour unpredictable.&lt;/p&gt;&lt;p&gt;Finally, &lt;b&gt;revisit cache boundaries as traffic evolves&lt;/b&gt;. Usage patterns change. What was a long-tail page last month may become hot after a product change or a new workflow. Cache strategies should evolve with real traffic, not stay frozen around initial assumptions.&lt;/p&gt;&lt;p&gt;If you treat cache metrics as guardrails (baseline Miss Rates, latency correlations, and post-deploy checks) caching becomes a stable part of your system instead of a fragile optimization you&amp;#39;re afraid to touch.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Not everything that breaks is an error: a Logs and Next.js story]]></title><description><![CDATA[Stack traces are great, but they only tell you what broke. They rarely tell you why. 
When an exception fires, you get a snapshot of the moment things went sidew...]]></description><link>https://blog.sentry.io/not-everything-that-breaks-is-an-error-a-logs-and-next-js-story/</link><guid isPermaLink="false">https://blog.sentry.io/not-everything-that-breaks-is-an-error-a-logs-and-next-js-story/</guid><pubDate>Tue, 13 Jan 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Stack traces are great, but they only tell you &lt;i&gt;what&lt;/i&gt; broke. They rarely tell you &lt;i&gt;why&lt;/i&gt;. When an exception fires, you get a snapshot of the moment things went sideways, but the context leading up to that moment? Gone.&lt;/p&gt;&lt;p&gt;That&amp;#39;s where &lt;a href=&quot;https://sentry.io/product/logs/&quot;&gt;logs&lt;/a&gt; come in. A well-placed log can be the difference between hours of head-scratching and a five-minute fix. Let me show you what I mean with a real bug I encountered recently.&lt;/p&gt;&lt;h2&gt;Protecting an AI-powered Next.js endpoint from bots&lt;/h2&gt;&lt;p&gt;I&amp;#39;ve been working on &lt;a href=&quot;https://webvitals.com/&quot;&gt;WebVitals&lt;/a&gt;, a Next.js application powered by AI. You enter a domain, and it runs a series of tool calls to fetch performance data, then uses an AI agent to parse the results and give you actionable suggestions for improving your web vitals.&lt;/p&gt;&lt;p&gt;On the frontend, I&amp;#39;m using the AI SDK&amp;#39;s &lt;code&gt;useChat&lt;/code&gt; hook to handle the conversation.&lt;/p&gt;&lt;p&gt;The &lt;code&gt;/api/chat&lt;/code&gt; endpoint is a standard Next.js API route, which means anyone can hit it from anywhere. Since each request costs money (OpenAI isn&amp;#39;t free), I needed some protection against bots and malicious actors trying to spike my bill.&lt;/p&gt;&lt;p&gt;Vercel has a neat solution for this: bot protection via their &lt;code&gt;checkBotId&lt;/code&gt; function. It looks at the incoming request and determines if it&amp;#39;s coming from a bot. 
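&lt;/p&gt;&lt;p&gt;In a route handler this typically becomes an early guard. Here&amp;#39;s a sketch: the &lt;code&gt;checkBotId&lt;/code&gt; stub below only stands in for the real function from Vercel&amp;#39;s &lt;code&gt;botid&lt;/code&gt; package (which inspects the request&amp;#39;s signals itself) so the example runs standalone:&lt;/p&gt;

```javascript
// Sketch of a bot-guarded /api/chat route handler.
// checkBotId here is a stand-in for the real one from Vercel's "botid"
// package; the real function inspects the incoming request itself.
async function checkBotId(request) {
  const userAgent = request.headers.get("user-agent") ?? "";
  // Crude placeholder heuristic, purely so the sketch is executable.
  return { isBot: !userAgent.includes("Mozilla") };
}

async function POST(request) {
  const verification = await checkBotId(request);
  if (verification.isBot) {
    return new Response("Access denied", { status: 403 });
  }
  // ...run the AI agent and stream the answer back (omitted)...
  return new Response("ok", { status: 200 });
}
```

&lt;p&gt;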
Simple, effective, and no CAPTCHAs asking users to identify crosswalks.&lt;/p&gt;&lt;h2&gt;A production bug that only affected Firefox and Safari&lt;/h2&gt;&lt;p&gt;Everything worked perfectly in local development. Deployed to production, tested in Chrome. Still perfect. Then I opened Firefox.&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;&amp;quot;Access denied.&amp;quot; The same request that worked in Chrome was getting blocked in Firefox. Safari had the same issue.&lt;/p&gt;&lt;p&gt;I checked Sentry. The error was showing up repeatedly, but only Firefox and Safari were affected. Chrome users were fine.&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;I tried fixing it. Multiple releases, multiple attempts. The error kept coming back. The stack trace wasn&amp;#39;t helpful; it just showed me that the bot check was returning &lt;code&gt;true&lt;/code&gt; for these browsers. But &lt;i&gt;why&lt;/i&gt; would Firefox and Safari be flagged as bots when Chrome wasn&amp;#39;t?&lt;/p&gt;&lt;p&gt;The stack trace couldn&amp;#39;t answer that question.&lt;/p&gt;&lt;h2&gt;Adding logs to capture the missing context&lt;/h2&gt;&lt;p&gt;This is the kind of problem where you need more context than an error alone can provide. I needed to see what data the &lt;code&gt;checkBotId&lt;/code&gt; function was working with when it made its decision.&lt;/p&gt;&lt;p&gt;So I added a log.&lt;/p&gt;&lt;p&gt;Nothing fancy. Just log the bot check result along with the user agent string that was passed to the function. Bot protection typically works by examining the user agent, so this seemed like the right data to capture.&lt;/p&gt;&lt;p&gt;The key here is that Sentry logs support high-cardinality attributes. You can pass any attributes you want, and you&amp;#39;ll be able to search and filter by them later. No need to decide upfront which attributes are &amp;quot;important&amp;quot;. 
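&lt;/p&gt;&lt;p&gt;The call itself was roughly this. The message and attribute names (&lt;code&gt;isBot&lt;/code&gt;, &lt;code&gt;userAgent&lt;/code&gt;) match the ones used in this article; the &lt;code&gt;Sentry&lt;/code&gt; object is a minimal stand-in for the SDK&amp;#39;s &lt;code&gt;Sentry.logger&lt;/code&gt; so the sketch runs on its own:&lt;/p&gt;

```javascript
// Sketch of the log call: record the bot check verdict together with the
// data it was based on. The Sentry object below is a minimal stand-in for
// @sentry/nextjs's Sentry.logger so this file runs standalone; in the app
// you'd import the SDK instead.
const captured = [];
const Sentry = {
  logger: {
    info(message, attributes = {}) {
      captured.push({ message, attributes });
    },
  },
};

function logBotCheck(verification, request) {
  Sentry.logger.info("Bot ID check result", {
    isBot: verification.isBot,
    userAgent: request.headers.get("user-agent") ?? "unknown",
  });
}
```

&lt;p&gt;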
Just log what might be useful and let Sentry handle the rest.&lt;/p&gt;&lt;h2&gt;Using Sentry Logs to identify the root cause&lt;/h2&gt;&lt;p&gt;With logs in place, I headed over to Sentry&amp;#39;s Logs view and searched for my &amp;quot;Bot ID check result&amp;quot; messages. I added the &lt;code&gt;isBot&lt;/code&gt; attribute as a column so I could quickly scan the results. (In Sentry, boolean values show as 0 for false and 1 for true.)&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;I found a request that passed the bot check: &lt;code&gt;isBot: 0&lt;/code&gt;. Looking at the details, the user agent was exactly what you&amp;#39;d expect: a standard Chrome user agent string.&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;Then I looked at a request that failed: &lt;code&gt;isBot: 1&lt;/code&gt;. The user agent was... not what I expected.&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;Instead of the browser&amp;#39;s user agent, I was seeing &lt;code&gt;ai-sdk&lt;/code&gt;. The AI SDK was sending its own user agent string instead of the browser&amp;#39;s.&lt;/p&gt;&lt;p&gt;This explained everything. When the AI SDK makes requests to the backend, it uses its own user agent. Vercel&amp;#39;s bot protection sees &lt;code&gt;ai-sdk&lt;/code&gt; and thinks, reasonably, that it&amp;#39;s not a real browser. Bot detected. Access denied.&lt;/p&gt;&lt;p&gt;But why only Firefox and Safari? Because something about how those browsers (or my setup in them) handled the request caused the AI SDK&amp;#39;s user agent to be sent instead of the browser&amp;#39;s. Chrome happened to pass through the correct user agent.&lt;/p&gt;&lt;p&gt;To confirm my hunch, I used Sentry&amp;#39;s trace connection feature. 
Everything in Sentry is linked by trace, so I could navigate from the log entry back to the full trace view and see the broader context of the request.&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;Sure enough, the trace confirmed this was coming from Firefox. Mystery solved.&lt;/p&gt;&lt;h2&gt;Fixing the issue once the data told the story&lt;/h2&gt;&lt;p&gt;The solution was straightforward. In Vercel&amp;#39;s firewall settings, I added a rule to bypass bot protection for requests where the user agent contains &lt;code&gt;ai-sdk&lt;/code&gt;.&lt;/p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; loading=&quot;lazy&quot; /&gt;&lt;p&gt;Saved the rule, published the changes, and tried again in Firefox.&lt;/p&gt;&lt;p&gt;It worked. No more access denied errors. It&amp;#39;s also being tracked in a &lt;a href=&quot;https://github.com/vercel/ai/issues/9256&quot;&gt;GitHub issue&lt;/a&gt; on the AI SDK for those who are curious.&lt;/p&gt;&lt;h2&gt;What this bug clarified about logging and debugging&lt;/h2&gt;&lt;p&gt;This bug would have taken much longer to diagnose without logs. The error itself, &amp;quot;Access denied&amp;quot;, told me nothing about &lt;i&gt;why&lt;/i&gt; the request was being denied. The stack trace showed me &lt;i&gt;where&lt;/i&gt; it happened, but not the data that caused it.&lt;/p&gt;&lt;p&gt;A few takeaways:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Logs provide context that stack traces can&amp;#39;t.&lt;/b&gt; When you&amp;#39;re debugging, you often need to know what the data looked like at a specific point in time. Errors capture the moment of failure; logs capture the journey.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;High-cardinality attributes are powerful.&lt;/b&gt; Being able to search logs by any attribute (&lt;code&gt;isBot&lt;/code&gt;, &lt;code&gt;userAgent&lt;/code&gt;) makes it trivial to slice and dice your data. 
You don&amp;#39;t have to predict which attributes will be useful ahead of time.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Trace connection ties everything together.&lt;/b&gt; Seeing a log in isolation is useful, but being able to jump from a log to the full trace (and vice versa) gives you the complete picture. In this case, it let me confirm that the AI SDK user agent was indeed coming from Firefox requests.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;If you&amp;#39;re already using Sentry for &lt;a href=&quot;https://sentry.io/product/error-monitoring/&quot;&gt;error tracking&lt;/a&gt;, adding logs is a natural next step. For new projects, you can use the &lt;code&gt;Sentry.logger&lt;/code&gt; API directly. If you have existing logging with something like Pino, check out the &lt;a href=&quot;https://docs.sentry.io/platforms/javascript/guides/nextjs/logs/#integrations&quot;&gt;logging integrations&lt;/a&gt; to pipe those logs into Sentry automatically.&lt;/p&gt;&lt;p&gt;Head on over to our &lt;a href=&quot;https://docs.sentry.io/platforms/javascript/guides/nextjs/logs/&quot;&gt;Next.js Logs docs&lt;/a&gt; to learn more about how to send structured logs from your application to Sentry for debugging and observability. Or just check out our &lt;a href=&quot;https://sentry.io/quickstart/logs/?sdk=nextjs&quot;&gt;Logs quickstart guide&lt;/a&gt; and get up and running in no time.&lt;/p&gt;&lt;p&gt;Not everything that breaks throws an error. Sometimes you just need to see what was happening.&lt;/p&gt;</content:encoded></item></channel></rss>