<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://bytecodealliance.org/feed.xml" rel="self" type="application/atom+xml" /><link href="https://bytecodealliance.org/" rel="alternate" type="text/html" /><updated>2026-04-02T19:11:46+00:00</updated><id>https://bytecodealliance.org/feed.xml</id><title type="html">Bytecode Alliance</title><subtitle>Welcome to the Bytecode Alliance</subtitle><entry><title type="html">Our Next Plumbers Summit event - February 25 &amp;amp; 26, 2026</title><link href="https://bytecodealliance.org/articles/plumbers-summit-feb2026" rel="alternate" type="text/html" title="Our Next Plumbers Summit event - February 25 &amp;amp; 26, 2026" /><published>2026-02-11T00:00:00+00:00</published><updated>2026-02-11T00:00:00+00:00</updated><id>https://bytecodealliance.org/articles/plumbers-summit-feb2026</id><content type="html" xml:base="https://bytecodealliance.org/articles/plumbers-summit-feb2026"><![CDATA[<p>The Bytecode Alliance is pleased to invite you to the next installment in our ongoing Plumbers 
Summits event series, each designed to bring our members and community together to collectively contribute 
to the strategic planning for the upcoming year. Our next event will be held Wednesday and Thursday, 
February 25 and 26, 2026. This will be an all-online event, supporting full remote participation for
anyone, anywhere, with sessions recorded so they can be watched anytime afterward via our
<a href="https://www.youtube.com/@bytecodealliance">YouTube channel</a>.
<!--end_excerpt--></p>

<h2 id="agenda">Agenda</h2>
<p>Participants will include the maintainers of Bytecode Alliance hosted projects as well as the teams working on Component and WASI support in many popular programming languages including Rust, JavaScript, Go, C#, and C++. The event will feature a detailed overview of current plans and the technical roadmap for WebAssembly, talks and demos on the latest developments in hosted projects, an update on Wasm tooling, as well as interactive sessions where both platform developers and users can collaborate on the future of these projects. There’ll also be an opportunity for sharing brief lightning talks in areas of special interest to attendees.</p>

<h2 id="event-details">Event Details</h2>
<p><strong>Date:</strong> Wednesday, February 25 and Thursday, February 26, 2026<br />
<strong>Time:</strong> 8am to 11am US Pacific Time each day<br />
<strong>Location:</strong> Online via live video stream</p>

<p>More details on the event agenda are forthcoming. Please let us know if you have any questions about attending, and look for further news via our event <a href="https://bytecodealliance.zulipchat.com/#narrow/channel/422786-Events/topic/Plumbers.20Summit.20-.20February.202026/with/573349202">stream</a> on the Bytecode Alliance <a href="https://bytecodealliance.zulipchat.com">Zulip server</a>.</p>]]></content><author><name>David Bryant</name></author><summary type="html"><![CDATA[The Bytecode Alliance is pleased to invite you to the next installment in our ongoing Plumbers Summits event series, each designed to bring our members and community together to collectively contribute to the strategic planning for the upcoming year. Our next event will be held Wednesday and Thursday, February 25 and 26, 2026. This will be an all-online event, supporting full remote participation for anyone, anywhere, with sessions recorded so they can be watched anytime afterward via our YouTube channel]]></summary></entry><entry><title type="html">10 Years of Wasm: A Retrospective</title><link href="https://bytecodealliance.org/articles/ten-years-of-webassembly-a-retrospective" rel="alternate" type="text/html" title="10 Years of Wasm: A Retrospective" /><published>2026-01-22T00:00:00+00:00</published><updated>2026-01-22T00:00:00+00:00</updated><id>https://bytecodealliance.org/articles/ten-years-of-webassembly-a-retrospective</id><content type="html" xml:base="https://bytecodealliance.org/articles/ten-years-of-webassembly-a-retrospective"><![CDATA[<p>In April of 2015, Luke Wagner made the first commits to a new repository called <code class="language-plaintext highlighter-rouge">WebAssembly/design</code>, adding a <a href="https://github.com/WebAssembly/design/commit/12ee148fb5cfa33331dbffadae06752b1759a7bf">high-level design document</a> for a “binary format to serve as a web compilation target.”</p>

<p>Four years later, in December 2019, the World Wide Web Consortium (W3C) <a href="https://www.w3.org/press-releases/2019/wasm/">officially embraced WebAssembly (Wasm) as the “fourth language of the web”</a>. Today, Wasm is used in web applications like <a href="https://web.dev/case-studies/earth-webassembly">Google Earth</a> and <a href="https://web.dev/articles/ps-on-the-web">Adobe Photoshop</a>, streaming video services like <a href="https://www.amazon.science/blog/how-prime-video-updates-its-app-for-more-than-8-000-device-types">Amazon Prime Video</a> and <a href="https://medium.com/disney-streaming/introducing-the-disney-application-development-kit-adk-ad85ca139073">Disney Plus</a>, and game engines like <a href="https://docs.unity3d.com/6000.0/Documentation/Manual/webassembly-2023.html">Unity</a>. Wasm runs on countless embedded devices, and cloud computing providers use Wasm to provide Functions-as-a-Service (FaaS) and serverless functionality.</p>

<p>The road to standardization (and understated ubiquity) would require the teams behind the major browsers to unite behind an approach to solve a problem that many had tried, and failed, to solve before.</p>

<p>For the ten-year anniversary of the project (well, ten and change), I spoke to many of the co-designers who were there at the beginning. They were generous enough to share their memories of the project’s origins, and what they imagine for the next ten years of WebAssembly.</p>

<h2 id="the-trusted-call-stack">The trusted call stack</h2>

<p>In March of 2013, a group of Mozilla engineers including Wagner, Alon Zakai, and Dave Herman released <a href="http://asmjs.org/">asm.js</a>. asm.js defined a subset of the existing JavaScript language that implicitly embeds enough static type information to allow a browser’s existing JS engine to achieve much better performance once the optimizer knew to look for it. “Super hacky,” in Wagner’s words.</p>
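<p>To give a feel for the hack (this is an illustrative sketch, not code from the asm.js spec; the module and function names are invented), the static type information is embedded through ordinary JavaScript coercion idioms, such as <code class="language-plaintext highlighter-rouge">x | 0</code> to mark a value as a 32-bit integer:</p>

```javascript
// Minimal asm.js-style sketch. The "use asm" directive flags the module
// for an optimizing engine, and the bitwise coercions double as static
// type annotations: `x | 0` means int32. Engines without asm.js support
// still run this as ordinary JavaScript and get the same results.
function AsmAdder(stdlib) {
  "use asm";
  function add(x, y) {
    x = x | 0;          // parameter x: int32
    y = y | 0;          // parameter y: int32
    return (x + y) | 0; // return type: int32
  }
  return { add: add };
}

console.log(AsmAdder(globalThis).add(2, 3)); // 5
```

<p>Because every coercion is a no-op on values already of the right type, the subset stays a valid JavaScript program while still giving an ahead-of-time compiler everything it needs.</p>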

<p>Meanwhile, Google was developing <a href="https://developer.chrome.com/docs/native-client/">Native Client (NaCl)</a> and its successor <a href="https://chrome.jscn.org/docs/native-client/nacl-and-pnacl/#portable-native-client-pnacl">Portable Native Client (PNaCl)</a> to sandbox and run native code in Chrome.</p>

<p>JF Bastien was on the Chrome team at the time, where he helped finalize the Armv7 version of NaCl. According to Bastien, NaCl was a secure sandbox, but it wasn’t portable, and didn’t “fully respect the ethos of the web.”</p>

<p>PNaCl placed untrusted code in a separate process from the rest of a web page and its JavaScript—a reasonable choice for sandboxing, but one that made it difficult for JavaScript to call PNaCl code or vice versa. The connection between PNaCl and the rest of the browser relied on message passing, which required an entirely separate API surface for graphics, audio, and networking. It also forced asynchrony as a programming model, a huge lift to execute successfully. This created obstacles to broader adoption. If PNaCl was going to serve as the basis for a multi-browser approach to native code on the web, what standards body would govern the APIs?</p>

<p>asm.js took a different approach, which Dan Gohman—then at Mozilla—describes as a “trusted call stack,” where compiled code and JavaScript could share the same stack.</p>

<p>“That means asm.js was able to coexist with JavaScript,” he says. “You can call into JavaScript, and JavaScript can call into you.” This design decision—later inherited by WebAssembly—would prove foundational, enabling everything from seamless browser integration to calling functions across isolation boundaries in the Component Model.</p>

<h2 id="were-going-to-tell-each-others-managers-that-the-other-ones-on-board">“We’re going to tell each other’s managers that the other one’s on board.”</h2>

<p>By the end of 2013, developers were already compiling C++ games to asm.js using the <a href="https://emscripten.org/">Emscripten</a> toolchain, created by Alon Zakai. The Mozilla team was contemplating whether and how to integrate asm.js into the browser as a solution for running native code.</p>

<p>At Google, the V8 team used asm.js workloads as one of the benchmarks for a new optimizing compiler called TurboFan. Ben Titzer led the TurboFan effort. Like many of the Wasm co-designers, he was social with engineers at Mozilla, Apple, and Microsoft. Sometimes people moved from one company’s browser team to another, and working in the space led naturally to collaboration and more casual socialization. “Drinking and talking about the web,” as Bastien says.</p>

<p>One day, Titzer got into conversation with Wagner about the Mozilla team’s plans for asm.js.</p>

<p>“We’re talking to the Google folks,” Wagner says, “and they were like, ‘We hate this. It’s weird, it’s gross, it’s ad hoc—why would you even do this? If you want to do this, just do a real bytecode.’ And we said: well, we <em>could</em>, and this would be the polyfill of it.”</p>

<p>“[asm.js] fundamentally depended on things like array buffers being efficient,” Titzer recalls. “I remember having a conversation with Luke about resizable array buffers and whether they could be detached. He was basically trying to convince us that this is a good thing. And he had mentioned off-handedly that maybe Mozilla were thinking that asm.js wasn’t the right way to go. Maybe we should design a bytecode. And my ears perked up at that.”</p>

<p>Titzer and Wagner agreed to work together on the project, but now they needed to secure broader buy-in. Together, Wagner says, he and Titzer made a plan: “We’re going to tell each other’s managers that the other one’s on board.”</p>

<p>Titzer began building prototypes, and by late 2014, the V8 team’s interest gave the effort critical momentum. Soon the PNaCl team at Google signed on, and Bastien became one of the project’s key organizers.</p>

<h2 id="neither-web-nor-assembly">Neither web, nor assembly?</h2>

<p>When someone defines WebAssembly, odds are even that they’ll adapt the old joke about the Holy Roman Empire to say the technology is “neither web, nor assembly.” That is to say, it’s neither specific to the web nor <em>strictly</em> an assembly language, but rather a bytecode format targeting a virtual instruction set architecture.</p>
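<p>That virtual instruction set also has a standard textual rendering, WAT, which makes the "not strictly assembly" point concrete. A complete (if invented) module, with arbitrary function and export names:</p>

```wat
;; A tiny but complete WebAssembly module in the text format (WAT):
;; a stack machine targeting a virtual instruction set architecture.
(module
  (func $add (param i32 i32) (result i32)
    local.get 0   ;; push the first parameter
    local.get 1   ;; push the second parameter
    i32.add)      ;; pop both, push their sum
  (export "add" (func $add)))
```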

<p>So where, exactly, did the name come from?</p>

<p>“We wanted <em>asm</em> in it because of the asm.js heritage,” Wagner says, “and we wanted <em>web</em> because all the cool standards of the time had <em>web</em>, like WebGL, WebGPU…we wanted to be very clear that this is a pro-web thing.”</p>

<p>The co-designers briefly considered “WebAsm” but (perhaps wisely) passed on that one. So “Asm” was spelled out into “Assembly.” WebAssembly.</p>

<p>Bastien recalls internal resistance to the name. “We know it’s going to be used outside the web,” he says. “We’re <em>designing</em> it to be used outside the web.” But no other suggestions were forthcoming, and “WebAssembly” stuck.</p>

<p>This writer has certainly used the “neither web nor assembly” line more than once, and so did several of the Wasm co-designers I interviewed, but Wagner gently pushes back on the characterization.</p>

<p>“Setting aside the asm.js path dependency, perhaps ‘bytecode’ or ‘intermediate language’ would’ve been a bit more accurate,” he says, “but when people say it’s not ‘web’ because it’s being used outside the web… well, what’s the definition of the web? Is it only things in browsers? The W3C paints a much broader picture of the <a href="https://www.w3.org/standards/">open web platform</a> that I think covers a lot more of the places where WebAssembly runs today and where we want it to run in the future.”</p>

<h2 id="ship-as-fast-as-you-humanly-can-before-this-whole-coalition-falls-apart">“Ship as fast as you humanly can before this whole coalition falls apart.”</h2>

<p>With the Chrome and Firefox teams on the same page, the co-designers turned to the teams at Apple and Microsoft.</p>

<p>Microsoft’s Chakra team, which powered the Edge browser’s JavaScript engine, had already implemented asm.js optimizations—Wagner had personally relicensed Mozilla source code to make adoption easier. After some “intense Q&amp;A” (in Wagner’s words), the Chakra team got on board. At Apple, JavaScriptCore team lead Fil Pizlo (the same Fil of <a href="https://fil-c.org/">Fil-C</a>) was instrumental in securing buy-in.</p>

<p>The four browser engines—Mozilla’s SpiderMonkey, Google’s V8, Microsoft’s Chakra, and Apple’s JavaScriptCore—would ship WebAssembly support within months of each other. Bastien, who chaired the WebAssembly Community Group during this period, helped set up the organizational structure and operating pace for the W3C standardization process. Before Wagner’s first public commit, the team hashed out the basic shape of the project in a shared Google Doc. Wagner then transcribed those agreements into public markdown files.</p>

<p>The formal announcement was coordinated: on June 17, 2015, all four browsers simultaneously released blogs linking to each other. Brendan Eich <a href="https://brendaneich.com/2015/06/from-asm-js-to-webassembly/">posted his own blog</a>, giving the project the imprimatur of JavaScript’s creator, and riffing on his trademark close to presentations:</p>

<blockquote>
  <p>I usually finish with a joke: “Always bet on JS”. I look forward to working “and wasm” into that line — no joke.</p>
</blockquote>

<p>As the project progressed, Lin Clark’s communication was instrumental in building community understanding, such as in Mozilla blogs like <a href="https://hacks.mozilla.org/2017/02/creating-and-working-with-webassembly-modules/">Creating and working with WebAssembly modules</a>.</p>

<p>For the group working on Wasm, the pressure to ship was intense. “Ship as fast as you humanly can before this whole coalition falls apart,” was the prevailing sentiment, according to Wagner. In retrospect, the urgency proved prescient. Had WebAssembly been delayed, the <a href="https://spectreattack.com/">Spectre vulnerability</a>—disclosed in early 2018—might have <a href="https://blog.mozilla.org/security/2018/01/03/mitigations-landing-new-class-timing-attack">complicated</a> the <a href="https://developer.chrome.com/blog/meltdown-spectre#high-resolution_timers">threading story</a> and handed ammunition to those who preferred PNaCl’s out-of-process isolation model. Firefox shipped first in March 2017, with Chrome following weeks later. By the end of the year, all four major browsers supported WebAssembly.</p>

<h2 id="treating-it-as-a-real-thing">Treating it as a real thing</h2>

<p>Wagner remembers discovering that Facebook had quietly integrated asm.js into their site to compress JPEGs before upload. “They didn’t tell us,” he says. “They just did it.”</p>

<p>As Wasm passed from idea to spec to reality, more and more organizations were getting interested. At the second in-person meeting between browser vendors, before the spec was anywhere near complete, a representative from Zynga showed up. Best known at the time for Farmville and other Facebook games, Zynga had built a billion-dollar business on Flash. With Flash on the way to deprecation, they were looking for an alternative.</p>

<p>Gaming had been part of the conversation from the beginning—early demos featured Unity’s “Angry Bots” running across multiple browsers. Now the growing interest of other web application teams was informing the development of the project. Adobe engineer Sean Parent provided crucial early feedback on the need for features like threads and robust compute capabilities, driven by the effort to bring Photoshop to the web.</p>

<iframe width="560" height="315" src="https://www.youtube.com/embed/hSeB9I_mK6A?si=qeryPFGP1EmUeOux" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>

<p>“I realized that not only was it Zynga, not only was it Unity, but also Adobe wants to ship Photoshop and Google Earth wants to ship a new version of Google Earth,” Titzer says. “I realized that there’s all these huge applications that want to come onto the platform, and they’re treating it as a real thing. That, I think, is the point where I realized this is actually going to be something that makes a huge difference.”</p>

<h2 id="name-an-api-in-the-world-and-its-in-scope">“Name an API in the world and it’s in scope.”</h2>

<p>asm.js helped answer the hardest questions for the WebAssembly MVP: What is the security model? What features are in scope? When someone proposed adding coroutines or stack switching, the response was simple, according to Gohman: “That’s not in asm.js. Out of scope. End of story.” While the team allowed “a few things that weren’t first-class in asm.js,” in Bastien’s words, asm.js served as a guiding light.</p>

<p>The next phase of WebAssembly’s evolution offered no such scaffold. With the core spec shipped and browsers onboard, attention turned to running WebAssembly outside the browser—on servers, at the edge, in embedded systems. This meant defining WASI, the <a href="https://wasi.dev/">WebAssembly System Interface</a>, and eventually the <a href="https://component-model.bytecodealliance.org/introduction.html">Component Model</a>. Together, these specifications could allow Wasm binaries to communicate with one another. These “Wasm components” would be able to securely interoperate regardless of the language they were written in.</p>

<p>The design space was suddenly vast.</p>

<p>“What surprised me most is how hard it was to figure out what to do for Wasm outside the browser, if not copy POSIX,” Wagner says. The Unix-style approach was tempting—just give WebAssembly modules access to files, sockets, and processes in the familiar way. But Wagner saw a trap. “If you just copy POSIX, you’re just going to have to reimplement containers but with Wasm on the inside,” which is not an unambiguous win: it somewhat improves portability, but imposes execution overhead. And if it’s not a significant improvement, why do a bunch of work on it?</p>

<p>Gohman, who went on to lead much of the WASI design work, recalls the early days as intimidating. “You add WASI to the mix and it’s like, now we’re going to add APIs to everything,” he says. “Graphics, networking, input devices—everything you can do from a browser, but also now we’re in servers too. We’re going to talk about databases. Name an API in the world and it’s in scope for WASI.”</p>

<p>The challenge was compounded by the need to design cross-language APIs without the scaffold of existing standards. Should WASI present a C-style interface? A JavaScript-style one? An RPC protocol? The answer, eventually, was none of the above: the team developed <a href="https://component-model.bytecodealliance.org/design/wit.html">WebAssembly Interface Type (WIT)</a>, an interface definition language from which tooling can generate idiomatic bindings for any target language.</p>
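<p>To give a feel for the shape of the language (a hypothetical example, invented for illustration rather than taken from the WASI specs), a WIT definition describes functions and types abstractly, and per-language bindings generators map them onto each language's idioms:</p>

```wit
// Hypothetical WIT package, invented for illustration.
package example:greeter@0.1.0;

world greeter {
  // A bindings generator turns this into an idiomatic signature in
  // each target language, e.g. `fn greet(name: String) -> String`
  // in Rust or `greet(name: string): string` in TypeScript.
  export greet: func(name: string) -> string;
}
```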

<p>In March 2019, Mozilla <a href="https://hacks.mozilla.org/2019/03/standardizing-wasi-a-webassembly-system-interface/">announced WASI</a> and caught the attention of Solomon Hykes, creator of Docker. Hykes famously posted on Twitter:</p>

<p><img src="/images/wasi_tweet.png" alt="If WASM+WASI existed in 2008, we wouldn't have needed to create Docker. That's how important it is." /></p>

<p>The first iteration, now called WASI Preview 1, provided basic capabilities like file I/O and environment variables, but lacked networking and threading. Lin Clark continued to help communicate the vision for the project in blogs like <a href="https://hacks.mozilla.org/2019/03/standardizing-wasi-a-webassembly-system-interface/">Standardizing WASI: A system interface to run WebAssembly outside the web</a>.</p>

<p>Five years later, in January 2024, the WASI Subgroup <a href="https://bytecodealliance.org/articles/WASI-0.2">launched WASI 0.2</a>—also known as Preview 2—which incorporated the Component Model and expanded the available APIs.</p>

<p>WASI 0.3 is on the horizon in 2026, bringing native async and cooperative threads, with a 1.0 release set to follow.</p>

<h2 id="the-next-ten-years">The next ten years</h2>

<p>Titzer is now Associate Research Professor in the Software and Societal Systems Department at Carnegie Mellon, where he has turned his attention to embedded systems and artificial intelligence—two domains where WebAssembly’s core properties might prove transformative. He’s been working on projects integrating WebAssembly into industrial controllers and cyber-physical systems.</p>

<p>“[Industrial automation companies] have the mobile code problem,” he explains, referring to <a href="https://owasp.org/www-community/vulnerabilities/Unsafe_Mobile_Code">software that may be transmitted across a network and then executed on a remote machine</a>. “At its core, if Wasm solved any problem, it’s running untrusted mobile code.” The same principle applies to sandboxing AI-generated applications. “You’ve got AI generating code—who knows what it does? Do you trust this code? No.”</p>

<p>Bastien agrees. “AI coding agents are pretty insecure right now, especially third-party plugins. Forget just injection, right? Like, I’m going to run a bunch of code I don’t trust. Wasm is a pretty interesting fit.”</p>

<p>Meanwhile, some of Wasm’s innovations gleaned outside the browser context may return home. Wagner sees the Component Model improving the quality of web developers’ experience compiling their language of choice to run in the browser, either mixed into their existing mostly-JS web app, or implementing the whole web app itself.</p>

<p>Today, WebAssembly runs in billions of users’ browsers, as well as edge networks, clouds, and embedded systems. The project has achieved standardization and understated ubiquity. It’s almost certainly running in one of your most commonly used apps, on one of your everyday devices, right now. What and where could Wasm be in ten years? The fundamentals of the architecture, going all the way back to asm.js, stuck a toe in the door of a vast possibility space.</p>

<p>In Gohman’s view, WebAssembly represents “one of the few chances that the computing industry has at actually building an execution environment that’s truly cloud native. Wasm combines an architecture which differs from what traditional operating systems are designed around, starting with the trusted call stack, and broad relevance, starting with the Web.” It will take persistence, but for perhaps the first time in fifty years, he says, there’s a chance to innovate at the boundary between kernel and user space.</p>

<p>“It’s gonna be a long road,” he says. “We’re going to build a lot of cool stuff. We’re going to have a lot of fun.”</p>]]></content><author><name>Eric Gregory</name></author><summary type="html"><![CDATA[In April of 2015, Luke Wagner made the first commits to a new repository called WebAssembly/design, adding a high-level design document for a “binary format to serve as a web compilation target.”]]></summary></entry><entry><title type="html">A Function Inliner for Wasmtime and Cranelift</title><link href="https://bytecodealliance.org/articles/inliner" rel="alternate" type="text/html" title="A Function Inliner for Wasmtime and Cranelift" /><published>2025-11-19T00:00:00+00:00</published><updated>2025-11-19T00:00:00+00:00</updated><id>https://bytecodealliance.org/articles/inliner</id><content type="html" xml:base="https://bytecodealliance.org/articles/inliner"><![CDATA[<p>Function inlining is one of the most important compiler optimizations, not
because of its direct effects, but because of the follow-up optimizations it
unlocks. It may reveal, for example, that an otherwise-unknown function
parameter value is bound to a constant argument, which makes a conditional
branch unconditional, which in turn exposes that the function will always return
the same value. Inlining is the catalyst of modern compiler optimization.</p>

<!--end_excerpt-->

<blockquote>
  <p><em>Note: This is cross-posted from <a href="https://fitzgen.com/2025/11/19/inliner.html">my personal
blog</a>.</em></p>
</blockquote>

<p><a href="https://wasmtime.dev/">Wasmtime</a> is a WebAssembly runtime that focuses on safety and fast Wasm
execution. But despite that focus on speed, Wasmtime has historically chosen not
to perform inlining in its optimizing compiler backend, <a href="https://cranelift.dev/">Cranelift</a>. There were
two reasons for this surprising decision: first, Cranelift is a per-function
compiler designed such that Wasmtime can compile all of a Wasm module’s
functions in parallel. Inlining is inter-procedural and requires synchronization
between function compilations; that synchronization reduces parallelism. Second,
Wasm modules are generally produced by an optimizing toolchain, like LLVM, that
already did all the beneficial inlining. Any calls remaining in the module will
not benefit from inlining — perhaps they are on slow paths marked
<code class="language-plaintext highlighter-rouge">[[unlikely]]</code> or the callee is annotated with <code class="language-plaintext highlighter-rouge">#[inline(never)]</code>. But
WebAssembly’s <a href="https://github.com/WebAssembly/component-model/">component model</a> changes this calculus.</p>

<p>With the component model, developers can <a href="https://github.com/WebAssembly/component-model/blob/main/design/mvp/Linking.md">compose</a> multiple Wasm modules —
each produced by different toolchains — into a single program. Those
toolchains only had a local view of the call graph, limited to their own module,
and they couldn’t see cross-module or <a href="https://github.com/bytecodealliance/wasmtime/blob/b900d7460e03e2b7b5e87211900ef5a2691b41de/crates/environ/src/component/translate/adapt.rs#L1-L24">fused adapter</a> function definitions. None
of them, therefore, had an opportunity to inline calls to such functions. Only
the Wasm runtime’s compiler, which has the final, complete call graph and
function definitions in hand, has that opportunity.</p>

<p>Therefore we implemented function inlining in Wasmtime and Cranelift. Its
initial implementation landed in Wasmtime version 36; however, it remains
off by default and is still baking. You can test it out via the <code class="language-plaintext highlighter-rouge">-C inlining=y</code>
command-line flag or the
<a href="https://docs.rs/wasmtime/36.0.0/wasmtime/struct.Config.html#method.compiler_inlining"><code class="language-plaintext highlighter-rouge">wasmtime::Config::compiler_inlining</code></a> method. The rest of
this article describes function inlining in more detail, digs into the guts of
our implementation and rationale for its design choices, and finally looks at
some early performance results.</p>

<h2 id="function-inlining">Function Inlining</h2>

<p>Function inlining is a compiler optimization where a call to a function <code class="language-plaintext highlighter-rouge">f</code> is
replaced by a copy of <code class="language-plaintext highlighter-rouge">f</code>’s body. This removes function call overheads (spilling
caller-save registers, setting up the call frame, etc…) which can be
beneficial on its own. But inlining’s main benefits are indirect: it enables
subsequent optimization of <code class="language-plaintext highlighter-rouge">f</code>’s body in the context of the call site. That
context is important — a parameter’s previously unknown value might be
bound to a constant argument and exposing that to the optimizer might cascade
into a large code clean up.</p>

<p>Consider the following example, where function <code class="language-plaintext highlighter-rouge">g</code> calls function <code class="language-plaintext highlighter-rouge">f</code>:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">f</span><span class="p">(</span><span class="n">x</span><span class="p">:</span> <span class="nb">u32</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">bool</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">x</span> <span class="o">&lt;</span> <span class="nn">u32</span><span class="p">::</span><span class="n">MAX</span> <span class="o">/</span> <span class="mi">2</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">fn</span> <span class="nf">g</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">u32</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">a</span> <span class="o">=</span> <span class="mi">42</span><span class="p">;</span>
    <span class="k">if</span> <span class="nf">f</span><span class="p">(</span><span class="n">a</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">return</span> <span class="n">a</span><span class="p">;</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>After inlining the call to <code class="language-plaintext highlighter-rouge">f</code>, function <code class="language-plaintext highlighter-rouge">g</code> looks something like this:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">g</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">u32</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">a</span> <span class="o">=</span> <span class="mi">42</span><span class="p">;</span>

    <span class="k">let</span> <span class="n">x</span> <span class="o">=</span> <span class="n">a</span><span class="p">;</span>
    <span class="k">let</span> <span class="n">f_result</span> <span class="o">=</span> <span class="n">x</span> <span class="o">&lt;</span> <span class="nn">u32</span><span class="p">::</span><span class="n">MAX</span> <span class="o">/</span> <span class="mi">2</span><span class="p">;</span>

    <span class="k">if</span> <span class="n">f_result</span> <span class="p">{</span>
        <span class="k">return</span> <span class="n">a</span><span class="p">;</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Now the whole subexpression that defines <code class="language-plaintext highlighter-rouge">f_result</code> only depends on constant
values, so the optimizer can replace that subexpression with its known value:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">g</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">u32</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">a</span> <span class="o">=</span> <span class="mi">42</span><span class="p">;</span>

    <span class="k">let</span> <span class="n">f_result</span> <span class="o">=</span> <span class="k">true</span><span class="p">;</span>
    <span class="k">if</span> <span class="n">f_result</span> <span class="p">{</span>
        <span class="k">return</span> <span class="n">a</span><span class="p">;</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This reveals that the <code class="language-plaintext highlighter-rouge">if</code>-<code class="language-plaintext highlighter-rouge">else</code> conditional will, in fact, unconditionally
transfer control to the consequent, and <code class="language-plaintext highlighter-rouge">g</code> can be simplified into the
following:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">g</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">u32</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">a</span> <span class="o">=</span> <span class="mi">42</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">a</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>In isolation, inlining <code class="language-plaintext highlighter-rouge">f</code> was a marginal transformation. When considered
holistically, however, it unlocked a plethora of subsequent simplifications that
ultimately led to <code class="language-plaintext highlighter-rouge">g</code> returning a constant value rather than computing anything
at run-time.</p>

<h2 id="implementation">Implementation</h2>

<p>Cranelift’s unit of compilation is a single function, which Wasmtime leverages
to compile each function in a Wasm module in parallel, speeding up compile times
on multi-core systems. But inlining a function at a particular call site
requires that function’s definition, which implies parallelism-hurting
synchronization or some other compromise, like additional read-only copies of
function bodies. So this was the first goal of our implementation: to preserve
as much parallelism as possible.</p>

<p>Additionally, although Cranelift is primarily developed for Wasmtime by
Wasmtime’s developers, it is independent from Wasmtime. It is a reusable library
and is reused, for example, by the Rust project as <a href="https://github.com/rust-lang/rustc_codegen_cranelift">an alternative backend for
<code class="language-plaintext highlighter-rouge">rustc</code></a>. But a large part of inlining, in practice, is the heuristics
for deciding when inlining a call is likely beneficial, and those heuristics can
be domain specific. Wasmtime generally wants to leave most calls out-of-line,
inlining only cross-module calls, while <code class="language-plaintext highlighter-rouge">rustc</code> wants something much more
aggressive to boil away its <a href="https://doc.rust-lang.org/stable/std/iter/trait.Iterator.html"><code class="language-plaintext highlighter-rouge">Iterator</code></a> combinators and the like. So our second
implementation goal was to separate how we inline a function call from the
decision of whether to inline that call.</p>

<p>These goals led us to a layered design where Cranelift has an optional inlining
pass, but the Cranelift embedder (e.g. Wasmtime) must provide a callback to
it. The inlining pass invokes the callback for each call site, and the callback
returns one of two commands: either “leave the call as-is” or “here is a function
body; replace the call with it”. Cranelift is responsible for the inlining
transformation and the embedder is responsible for deciding whether to inline a
function call and, if so, getting that function’s body (along with whatever
synchronization that requires).</p>
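<p>To make that division of labor concrete, here is a minimal sketch of the callback shape in Rust. Everything in it — <code class="language-plaintext highlighter-rouge">InlineCommand</code>, <code class="language-plaintext highlighter-rouge">inline_pass</code>, and the strings standing in for IR function bodies — is hypothetical and heavily simplified; it illustrates the layering, not Cranelift’s actual API.</p>

```rust
/// What the embedder's callback tells the inliner to do at one call site.
/// (Hypothetical names; the string is a stand-in for a real IR function body.)
enum InlineCommand {
    /// Leave the call as-is.
    KeepCall,
    /// Replace the call with this function body.
    Inline(String),
}

/// The "Cranelift" side: it owns the transformation, but asks the embedder's
/// callback what to do at each call site.
fn inline_pass<F>(call_sites: &[&str], mut decide: F) -> Vec<String>
where
    F: FnMut(&str) -> InlineCommand,
{
    call_sites
        .iter()
        .map(|&callee| match decide(callee) {
            InlineCommand::KeepCall => format!("call {callee}"),
            InlineCommand::Inline(body) => format!("inlined({body})"),
        })
        .collect()
}

fn main() {
    // The "embedder" side: a Wasmtime-like policy that only inlines
    // cross-module calls.
    let result = inline_pass(&["same_module::f", "other_module::g"], |callee| {
        if callee.starts_with("other_module::") {
            InlineCommand::Inline(format!("<body of {callee}>"))
        } else {
            InlineCommand::KeepCall
        }
    });
    assert_eq!(result[0], "call same_module::f");
    assert_eq!(result[1], "inlined(<body of other_module::g>)");
}
```

<p>The point of the shape is that the policy (the closure) and the mechanism (<code class="language-plaintext highlighter-rouge">inline_pass</code>) can evolve independently, which is exactly what lets <code class="language-plaintext highlighter-rouge">rustc</code> and Wasmtime share one inliner with very different heuristics.</p>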

<p>The <a href="https://github.com/bytecodealliance/wasmtime/blob/7fb8b55a8d3003a926753f4c3fcd676c813ebf98/cranelift/codegen/src/inline.rs#L332">mechanics</a> of the inlining transformation — wiring arguments to
parameters, renaming values, and copying instructions and basic blocks into the
caller — are, well, mechanical. Cranelift makes extensive use of arenas
for various entities in its <abbr title="intermediate representation">IR</abbr>,
and we begin by appending the callee’s arenas to the caller’s arenas, renaming
entity references from the callee’s arena indices to their new indices in the
caller’s arenas as we do so. Next we copy the callee’s block layout into the
caller and replace the original <code class="language-plaintext highlighter-rouge">call</code> instruction with a <code class="language-plaintext highlighter-rouge">jump</code> to the caller’s
inlined version of the callee’s entry block. Cranelift uses block parameters,
rather than phi nodes, so the call arguments simply become <code class="language-plaintext highlighter-rouge">jump</code>
arguments. Finally, we translate each instruction from the callee into the
caller. This is done via a pre-order traversal to ensure that we process value
definitions before value uses, simplifying instruction operand rewriting. The
changes to Wasmtime’s compilation orchestration are more interesting.</p>
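<p>The arena-append-and-rename step from the paragraph above can be sketched like so. The flat <code class="language-plaintext highlighter-rouge">Vec</code> arenas and the function name are illustrative stand-ins for Cranelift’s real entity arenas and index types, assumed here for brevity:</p>

```rust
/// Append the callee's value arena onto the caller's and rename the callee's
/// value references: a callee-side index `i` now lives at caller index
/// `i + offset`, where `offset` is the caller arena's length before the append.
fn append_and_rename<'a>(
    caller_values: &mut Vec<&'a str>,
    callee_values: &[&'a str],
    callee_refs: &[usize],
) -> Vec<usize> {
    let offset = caller_values.len();
    caller_values.extend_from_slice(callee_values);
    callee_refs.iter().map(|&i| i + offset).collect()
}

fn main() {
    let mut caller = vec!["v0", "v1"]; // caller already defines two values
    let callee = ["w0", "w1", "w2"];   // callee's values get appended
    let renamed = append_and_rename(&mut caller, &callee, &[0, 2]);
    assert_eq!(renamed, vec![2, 4]);   // callee refs shifted by the old length
    assert_eq!(caller[renamed[0]], "w0");
}
```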

<p>The following pseudocode describes Wasmtime’s compilation orchestration before
Cranelift gained an inlining pass and also when inlining is disabled:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Compile each function in parallel.</span>
<span class="k">let</span> <span class="n">objects</span> <span class="o">=</span> <span class="n">parallel</span> <span class="n">map</span> <span class="k">for</span> <span class="n">func</span> <span class="k">in</span> <span class="n">wasm</span><span class="py">.functions</span> <span class="p">{</span>
    <span class="nf">compile</span><span class="p">(</span><span class="n">func</span><span class="p">)</span>
<span class="p">};</span>

<span class="c1">// Combine the functions into one region of executable memory, resolving</span>
<span class="c1">// relocations by mapping function references to PC-relative offsets.</span>
<span class="k">return</span> <span class="nf">link</span><span class="p">(</span><span class="n">objects</span><span class="p">)</span>
</code></pre></div></div>

<p>The naive way to update that process to use Cranelift’s inlining pass might look
something like this:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Optionally perform some pre-inlining optimizations in parallel.</span>
<span class="n">parallel</span> <span class="k">for</span> <span class="n">func</span> <span class="k">in</span> <span class="n">wasm</span><span class="py">.functions</span> <span class="p">{</span>
    <span class="nf">pre_optimize</span><span class="p">(</span><span class="n">func</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// Do inlining sequentially.</span>
<span class="k">for</span> <span class="n">func</span> <span class="k">in</span> <span class="n">wasm</span><span class="py">.functions</span> <span class="p">{</span>
    <span class="n">func</span><span class="nf">.inline</span><span class="p">(|</span><span class="n">f</span><span class="p">|</span> <span class="k">if</span> <span class="nf">should_inline</span><span class="p">(</span><span class="n">f</span><span class="p">)</span> <span class="p">{</span>
        <span class="nf">Some</span><span class="p">(</span><span class="n">wasm</span><span class="py">.functions</span><span class="p">[</span><span class="n">f</span><span class="p">])</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="nb">None</span>
    <span class="p">})</span>
<span class="p">}</span>

<span class="c1">// And then proceed as before.</span>
<span class="k">let</span> <span class="n">objects</span> <span class="o">=</span> <span class="n">parallel</span> <span class="n">map</span> <span class="k">for</span> <span class="n">func</span> <span class="k">in</span> <span class="n">wasm</span><span class="py">.functions</span> <span class="p">{</span>
    <span class="nf">compile</span><span class="p">(</span><span class="n">func</span><span class="p">)</span>
<span class="p">};</span>
<span class="k">return</span> <span class="nf">link</span><span class="p">(</span><span class="n">objects</span><span class="p">)</span>
</code></pre></div></div>

<p>Inlining is performed sequentially, rather than in parallel, which is a
bummer. But if we tried to make that loop parallel by logically running each
function’s inlining pass in its own thread, then a callee function we are
inlining might or might not have had its transitive function calls inlined
already depending on the whims of the scheduler. That leads to non-deterministic
output, and our compilation must be deterministic, so it’s a
non-starter.<sup id="fnref:determinism" role="doc-noteref"><a href="#fn:determinism" class="footnote" rel="footnote">1</a></sup> But whether a function has already had transitive
inlining done or not leads to another problem.</p>

<p>With this naive approach, we are either limited to one layer of inlining or else
potentially duplicating inlining effort, repeatedly inlining <code class="language-plaintext highlighter-rouge">e</code> into <code class="language-plaintext highlighter-rouge">f</code> each
time we inline <code class="language-plaintext highlighter-rouge">f</code> into <code class="language-plaintext highlighter-rouge">g</code>, <code class="language-plaintext highlighter-rouge">h</code>, and <code class="language-plaintext highlighter-rouge">i</code>. This is because <code class="language-plaintext highlighter-rouge">f</code> may come before
or after <code class="language-plaintext highlighter-rouge">g</code> in our <code class="language-plaintext highlighter-rouge">wasm.functions</code> list. We would prefer it if <code class="language-plaintext highlighter-rouge">f</code> already
contained <code class="language-plaintext highlighter-rouge">e</code> and was already optimized accordingly, so that every caller of <code class="language-plaintext highlighter-rouge">f</code>
didn’t have to redo that same work when inlining calls to <code class="language-plaintext highlighter-rouge">f</code>.</p>

<p>This suggests we should <a href="https://en.wikipedia.org/wiki/Topological_sorting">topologically sort</a> our functions based on their call
graph, so that we inline in a bottom-up manner, from leaf functions (those that
do not call any others) towards root functions (those that are not called by any
others, typically <code class="language-plaintext highlighter-rouge">main</code> and other top-level exported functions). Given a
topological sort, we know that whenever we are inlining <code class="language-plaintext highlighter-rouge">f</code> into <code class="language-plaintext highlighter-rouge">g</code> either (a)
<code class="language-plaintext highlighter-rouge">f</code> has already had its own inlining done or (b) <code class="language-plaintext highlighter-rouge">f</code> and <code class="language-plaintext highlighter-rouge">g</code> participate in a
cycle. Case (a) is ideal: we aren’t repeating any work because it’s already been
done. Case (b), when we find cycles, means that <code class="language-plaintext highlighter-rouge">f</code> and <code class="language-plaintext highlighter-rouge">g</code> are mutually
recursive. We cannot fully inline recursive calls in general (just as you cannot
fully unroll a loop in general) so we will simply avoid inlining these
calls.<sup id="fnref:chains" role="doc-noteref"><a href="#fn:chains" class="footnote" rel="footnote">2</a></sup> So topological sort avoids repeating work, but our inlining
phase is still sequential.</p>

<p>At the heart of our proposed topological sort is a call graph traversal that
visits callees before callers. To parallelize inlining, you could imagine that,
while traversing the call graph, we track how many still-uninlined callees each
caller function has. Then we batch all functions whose associated counts are
currently zero (i.e. they aren’t waiting on anything else to be inlined first)
into a layer and process them in parallel. Next, we decrement each of their
callers’ counts and collect the next layer of ready-to-go functions, continuing
until all functions have been processed.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">call_graph</span> <span class="o">=</span> <span class="nn">CallGraph</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">wasm</span><span class="py">.functions</span><span class="p">);</span>

<span class="k">let</span> <span class="n">counts</span> <span class="o">=</span> <span class="p">{</span> <span class="n">f</span><span class="p">:</span> <span class="n">call_graph</span><span class="nf">.num_callees_of</span><span class="p">(</span><span class="n">f</span><span class="p">)</span> <span class="k">for</span> <span class="n">f</span> <span class="k">in</span> <span class="n">wasm</span><span class="py">.functions</span> <span class="p">};</span>

<span class="k">let</span> <span class="n">layer</span> <span class="o">=</span> <span class="p">[</span> <span class="n">f</span> <span class="k">for</span> <span class="n">f</span> <span class="k">in</span> <span class="n">wasm</span><span class="py">.functions</span> <span class="k">if</span> <span class="n">counts</span><span class="p">[</span><span class="n">f</span><span class="p">]</span> <span class="o">==</span> <span class="mi">0</span> <span class="p">];</span>
<span class="k">while</span> <span class="n">layer</span> <span class="n">is</span> <span class="n">not</span> <span class="n">empty</span> <span class="p">{</span>
    <span class="n">parallel</span> <span class="k">for</span> <span class="n">func</span> <span class="k">in</span> <span class="n">layer</span> <span class="p">{</span>
        <span class="n">func</span><span class="nf">.inline</span><span class="p">(</span><span class="o">...</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="k">let</span> <span class="n">next_layer</span> <span class="o">=</span> <span class="p">[];</span>
    <span class="k">for</span> <span class="n">func</span> <span class="k">in</span> <span class="n">layer</span> <span class="p">{</span>
        <span class="k">for</span> <span class="n">caller</span> <span class="k">in</span> <span class="n">call_graph</span><span class="nf">.callers_of</span><span class="p">(</span><span class="n">func</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">counts</span><span class="p">[</span><span class="n">caller</span><span class="p">]</span> <span class="o">-=</span> <span class="mi">1</span><span class="p">;</span>
            <span class="k">if</span> <span class="n">counts</span><span class="p">[</span><span class="n">caller</span><span class="p">]</span> <span class="o">==</span> <span class="mi">0</span> <span class="p">{</span>
                <span class="n">next_layer</span><span class="nf">.push</span><span class="p">(</span><span class="n">caller</span><span class="p">)</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="n">layer</span> <span class="o">=</span> <span class="n">next_layer</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This algorithm will leverage available parallelism, and it avoids repeating work
via the same dependency-based scheduling that topological sorting did, but it
has a flaw. It will not terminate when it encounters recursion cycles in the
call graph. If function <code class="language-plaintext highlighter-rouge">f</code> calls function <code class="language-plaintext highlighter-rouge">g</code> which also calls <code class="language-plaintext highlighter-rouge">f</code>, for
example, then it will not schedule either of them into a layer because they are
both waiting for the other to be processed first. One way we can avoid this
problem is by avoiding cycles.</p>

<p>If you partition a graph’s nodes into maximal disjoint sets such that, within
each set, every node is reachable from every other node, you get that graph’s
<a href="https://en.wikipedia.org/wiki/Strongly_connected_component"><em>strongly-connected components</em></a> (SCCs). If a node does not participate in a
cycle, then it will be in its own singleton SCC. The members of a cycle, on the
other hand, will all be grouped into the same SCC, since those nodes are all
reachable from each other.</p>

<p>In the following example, the dotted boxes designate the graph’s SCCs:</p>

<p><img src="/images/inliner-scc.svg" alt="" /></p>

<p>Ignoring edges between nodes within the same SCC, and only considering edges
across SCCs, gives us the graph’s <em>condensation</em>. The condensation is always
acyclic, because the original graph’s cycles are “hidden” within the SCCs.</p>

<p>Here is the condensation of the previous example:</p>

<p><img src="/images/inliner-condensation.svg" alt="" /></p>
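<p>The pseudocode below treats SCC computation as a black box, but it is worth seeing how little machinery it takes. Here is a sketch of Kosaraju’s two-pass algorithm, one standard way to compute SCCs (Wasmtime’s actual implementation may differ); the graph is an edge list where <code class="language-plaintext highlighter-rouge">u → v</code> means “<code class="language-plaintext highlighter-rouge">u</code> calls <code class="language-plaintext highlighter-rouge">v</code>”:</p>

```rust
/// Assign each of `n` nodes an SCC id using Kosaraju's algorithm:
/// DFS the forward graph recording finish order, then DFS the reversed
/// graph in reverse finish order; each reverse-DFS tree is one SCC.
fn sccs(n: usize, edges: &[(usize, usize)]) -> Vec<usize> {
    let mut fwd = vec![vec![]; n];
    let mut rev = vec![vec![]; n];
    for &(u, v) in edges {
        fwd[u].push(v);
        rev[v].push(u);
    }

    // Pass 1: iterative DFS on the forward graph, recording finish order.
    let mut order = Vec::with_capacity(n);
    let mut seen = vec![false; n];
    for start in 0..n {
        if seen[start] {
            continue;
        }
        seen[start] = true;
        let mut stack = vec![(start, 0usize)];
        while let Some(&(node, i)) = stack.last() {
            if i < fwd[node].len() {
                stack.last_mut().unwrap().1 += 1;
                let next = fwd[node][i];
                if !seen[next] {
                    seen[next] = true;
                    stack.push((next, 0));
                }
            } else {
                order.push(node);
                stack.pop();
            }
        }
    }

    // Pass 2: flood-fill the reversed graph in reverse finish order.
    let mut comp = vec![usize::MAX; n];
    let mut num = 0;
    for &start in order.iter().rev() {
        if comp[start] != usize::MAX {
            continue;
        }
        comp[start] = num;
        let mut stack = vec![start];
        while let Some(node) = stack.pop() {
            for &next in &rev[node] {
                if comp[next] == usize::MAX {
                    comp[next] = num;
                    stack.push(next);
                }
            }
        }
        num += 1;
    }
    comp
}

fn main() {
    // f(0) and g(1) are mutually recursive; both call leaf h(2).
    let comp = sccs(3, &[(0, 1), (1, 0), (0, 2), (1, 2)]);
    assert_eq!(comp[0], comp[1]); // the cycle collapses into one SCC
    assert_ne!(comp[0], comp[2]); // the leaf is its own singleton SCC
}
```

<p>With SCC ids in hand, building the condensation (or its reverse) is a matter of mapping each edge’s endpoints to their components and dropping the edges whose endpoints land in the same component.</p>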

<p>We can adapt our parallel-inlining algorithm to operate on strongly-connected
components, and now it will correctly terminate because we’ve removed all
cycles. First, we find the call graph’s SCCs and create the reverse (or
transpose) condensation, where an edge <code class="language-plaintext highlighter-rouge">a→b</code> is flipped to <code class="language-plaintext highlighter-rouge">b→a</code>. We do this
because we will query this graph to find the callers of a given function <code class="language-plaintext highlighter-rouge">f</code>,
not the functions that <code class="language-plaintext highlighter-rouge">f</code> calls. I am not aware of an existing name for the
reverse condensation, so, at Chris Fallin’s brilliant suggestion, I have decided
to call it an <em>evaporation</em>. From there, the algorithm largely remains as it was
before, although we keep track of counts and layers by SCC rather than by
function.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">call_graph</span> <span class="o">=</span> <span class="nn">CallGraph</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">wasm</span><span class="py">.functions</span><span class="p">);</span>
<span class="k">let</span> <span class="n">components</span> <span class="o">=</span> <span class="nn">StronglyConnectedComponents</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">call_graph</span><span class="p">);</span>
<span class="k">let</span> <span class="n">evaporation</span> <span class="o">=</span> <span class="nn">Evaporation</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">components</span><span class="p">);</span>

<span class="k">let</span> <span class="n">counts</span> <span class="o">=</span> <span class="p">{</span> <span class="n">c</span><span class="p">:</span> <span class="n">evaporation</span><span class="nf">.num_callees_of</span><span class="p">(</span><span class="n">c</span><span class="p">)</span> <span class="k">for</span> <span class="n">c</span> <span class="k">in</span> <span class="n">components</span> <span class="p">};</span>

<span class="k">let</span> <span class="n">layer</span> <span class="o">=</span> <span class="p">[</span> <span class="n">c</span> <span class="k">for</span> <span class="n">c</span> <span class="k">in</span> <span class="n">components</span> <span class="k">if</span> <span class="n">counts</span><span class="p">[</span><span class="n">c</span><span class="p">]</span> <span class="o">==</span> <span class="mi">0</span> <span class="p">];</span>
<span class="k">while</span> <span class="n">layer</span> <span class="n">is</span> <span class="n">not</span> <span class="n">empty</span> <span class="p">{</span>
    <span class="n">parallel</span> <span class="k">for</span> <span class="n">func</span> <span class="k">in</span> <span class="n">scc</span> <span class="k">in</span> <span class="n">layer</span> <span class="p">{</span>
        <span class="n">func</span><span class="nf">.inline</span><span class="p">(</span><span class="o">...</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="k">let</span> <span class="n">next_layer</span> <span class="o">=</span> <span class="p">[];</span>
    <span class="k">for</span> <span class="n">scc</span> <span class="k">in</span> <span class="n">layer</span> <span class="p">{</span>
        <span class="k">for</span> <span class="n">caller_scc</span> <span class="k">in</span> <span class="n">evaporation</span><span class="nf">.callers_of</span><span class="p">(</span><span class="n">scc</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">counts</span><span class="p">[</span><span class="n">caller_scc</span><span class="p">]</span> <span class="o">-=</span> <span class="mi">1</span><span class="p">;</span>
            <span class="k">if</span> <span class="n">counts</span><span class="p">[</span><span class="n">caller_scc</span><span class="p">]</span> <span class="o">==</span> <span class="mi">0</span> <span class="p">{</span>
                <span class="n">next_layer</span><span class="nf">.push</span><span class="p">(</span><span class="n">caller_scc</span><span class="p">);</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="n">layer</span> <span class="o">=</span> <span class="n">next_layer</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This is the algorithm we <a href="https://github.com/bytecodealliance/wasmtime/blob/250b99229b34241bd08f8026c4c9720d9d5233ab/crates/wasmtime/src/compile.rs#L702">use</a> in Wasmtime, modulo minor tweaks here and there to
engineer some data structures and combine some loops. After parallel inlining,
the rest of the compiler pipeline continues in parallel for each function,
yielding unlinked machine code. Finally, we link all that together and resolve
relocations, same as we did previously.</p>

<p>Heuristics are the only implementation detail left to discuss, but there isn’t
much to say that hasn’t already been said. Wasmtime prefers not to inline calls
within the same Wasm module, while cross-module calls are a strong hint that we
should consider inlining. Beyond that, our heuristics are extremely naive at the
moment, and only consider the code sizes of the caller and callee
functions. There is a lot of room for improvement here, and we intend to make
those improvements on-demand as people start playing with the inliner. For
example, there are many things we don’t consider in our heuristics today, but
possibly should:</p>

<ul>
  <li>Hints from WebAssembly’s <a href="https://github.com/WebAssembly/compilation-hints/blob/main/proposals/compilation-hints/Overview.md">compilation-hints proposal</a></li>
  <li>The number of edges to a callee function in the call graph</li>
  <li>Whether any of a call’s arguments are constants</li>
  <li>Whether the call is inside a loop or a block marked as “cold”</li>
  <li>Etc…</li>
</ul>
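<p>For a flavor of what “only consider the code sizes” looks like, here is an illustrative size-based policy in the spirit described above: inline only cross-module calls, and only when caller and callee are small enough. The struct, thresholds, and names are made up for this sketch; they are not Wasmtime’s actual values.</p>

```rust
/// Hypothetical summary of one call site (not Wasmtime's real types).
struct CallSite {
    caller_module: u32,
    callee_module: u32,
    caller_size: usize, // rough instruction counts
    callee_size: usize,
}

fn should_inline(site: &CallSite) -> bool {
    // Calls within the same module stay out-of-line.
    if site.caller_module == site.callee_module {
        return false;
    }
    // Only inline small callees, and avoid bloating already-large callers.
    // The thresholds here are arbitrary illustrations.
    site.callee_size <= 32 && site.caller_size + site.callee_size <= 4096
}

fn main() {
    let cross = CallSite { caller_module: 0, callee_module: 1, caller_size: 100, callee_size: 10 };
    let local = CallSite { caller_module: 0, callee_module: 0, caller_size: 100, callee_size: 10 };
    assert!(should_inline(&cross));
    assert!(!should_inline(&local));
}
```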

<h2 id="some-initial-results">Some Initial Results</h2>

<p>The speed-up you get (or don’t get) from enabling inlining will vary from
program to program. Here are a couple of synthetic benchmarks.</p>

<p>First, let’s investigate the simplest case possible, a cross-module call of an
empty function in a loop:</p>

<div class="language-scheme highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nf">component</span>
  <span class="c1">;; Define one module, exporting an empty function `f`.</span>
  <span class="p">(</span><span class="nf">core</span> <span class="nv">module</span> <span class="nv">$M</span>
    <span class="p">(</span><span class="nf">func</span> <span class="p">(</span><span class="nf">export</span> <span class="s">"f"</span><span class="p">)</span>
      <span class="nv">nop</span>
    <span class="p">)</span>
  <span class="p">)</span>

  <span class="c1">;; Define another module, importing `f`, and exporting a function</span>
  <span class="c1">;; that calls `f` in a loop.</span>
  <span class="p">(</span><span class="nf">core</span> <span class="nv">module</span> <span class="nv">$N</span>
    <span class="p">(</span><span class="nf">import</span> <span class="s">"m"</span> <span class="s">"f"</span> <span class="p">(</span><span class="nf">func</span> <span class="nv">$f</span><span class="p">))</span>
    <span class="p">(</span><span class="nf">func</span> <span class="p">(</span><span class="nf">export</span> <span class="s">"g"</span><span class="p">)</span> <span class="p">(</span><span class="nf">param</span> <span class="nv">$counter</span> <span class="nv">i32</span><span class="p">)</span>
      <span class="p">(</span><span class="nf">loop</span> <span class="nv">$loop</span>
        <span class="c1">;; When counter is zero, return.</span>
        <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nf">i32</span><span class="o">.</span><span class="nv">eq</span> <span class="p">(</span><span class="nf">local</span><span class="o">.</span><span class="nv">get</span> <span class="nv">$counter</span><span class="p">)</span> <span class="p">(</span><span class="nf">i32</span><span class="o">.</span><span class="nv">const</span> <span class="mi">0</span><span class="p">))</span>
          <span class="p">(</span><span class="nf">then</span> <span class="p">(</span><span class="nf">return</span><span class="p">)))</span>
        <span class="c1">;; Do our cross-module call.</span>
        <span class="p">(</span><span class="nf">call</span> <span class="nv">$f</span><span class="p">)</span>
        <span class="c1">;; Decrement the counter and continue to the next iteration</span>
        <span class="c1">;; of the loop.</span>
        <span class="p">(</span><span class="nf">local</span><span class="o">.</span><span class="nv">set</span> <span class="nv">$counter</span> <span class="p">(</span><span class="nf">i32</span><span class="o">.</span><span class="nv">sub</span> <span class="p">(</span><span class="nf">local</span><span class="o">.</span><span class="nv">get</span> <span class="nv">$counter</span><span class="p">)</span>
                                     <span class="p">(</span><span class="nf">i32</span><span class="o">.</span><span class="nv">const</span> <span class="mi">1</span><span class="p">)))</span>
        <span class="p">(</span><span class="nf">br</span> <span class="nv">$loop</span><span class="p">))</span>
    <span class="p">)</span>
  <span class="p">)</span>

  <span class="c1">;; Instantiate and link our modules.</span>
  <span class="p">(</span><span class="nf">core</span> <span class="nv">instance</span> <span class="nv">$m</span> <span class="p">(</span><span class="nf">instantiate</span> <span class="nv">$M</span><span class="p">))</span>
  <span class="p">(</span><span class="nf">core</span> <span class="nv">instance</span> <span class="nv">$n</span> <span class="p">(</span><span class="nf">instantiate</span> <span class="nv">$N</span> <span class="p">(</span><span class="nf">with</span> <span class="s">"m"</span> <span class="p">(</span><span class="nf">instance</span> <span class="nv">$m</span><span class="p">))))</span>

  <span class="c1">;; Lift and export the looping function.</span>
  <span class="p">(</span><span class="nf">func</span> <span class="p">(</span><span class="nf">export</span> <span class="s">"g"</span><span class="p">)</span> <span class="p">(</span><span class="nf">param</span> <span class="s">"n"</span> <span class="nv">u32</span><span class="p">)</span>
    <span class="p">(</span><span class="nf">canon</span> <span class="nv">lift</span> <span class="p">(</span><span class="nf">core</span> <span class="nv">func</span> <span class="nv">$n</span> <span class="s">"g"</span><span class="p">))</span>
  <span class="p">)</span>
<span class="p">)</span>
</code></pre></div></div>

<p>We can inspect the machine code that this compiles down to via the <code class="language-plaintext highlighter-rouge">wasmtime
compile</code> and <code class="language-plaintext highlighter-rouge">wasmtime objdump</code> commands. Let’s focus only on the looping
function. Without inlining, we see a loop around a call, as we would expect:</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">00000020</span> <span class="nf">wasm</span><span class="p">[</span><span class="mi">1</span><span class="p">]::</span><span class="nv">function</span><span class="p">[</span><span class="mi">1</span><span class="p">]:</span>
        <span class="c1">;; Function prologue.</span>
        <span class="err">20:</span> <span class="nf">pushq</span>   <span class="o">%</span><span class="nb">rbp</span>
        <span class="err">21:</span> <span class="nf">movq</span>    <span class="o">%</span><span class="nb">rsp</span><span class="p">,</span> <span class="o">%</span><span class="nb">rbp</span>

        <span class="c1">;; Check for stack overflow.</span>
        <span class="err">24:</span> <span class="nf">movq</span>    <span class="mi">8</span><span class="p">(</span><span class="o">%</span><span class="nb">rdi</span><span class="p">),</span> <span class="o">%</span><span class="nv">r10</span>
        <span class="err">28:</span> <span class="nf">movq</span>    <span class="mh">0x10</span><span class="p">(</span><span class="o">%</span><span class="nv">r10</span><span class="p">),</span> <span class="o">%</span><span class="nv">r10</span>
        <span class="err">2</span><span class="nl">c:</span> <span class="nf">addq</span>    <span class="kc">$</span><span class="mh">0x30</span><span class="p">,</span> <span class="o">%</span><span class="nv">r10</span>
        <span class="err">30:</span> <span class="nf">cmpq</span>    <span class="o">%</span><span class="nb">rsp</span><span class="p">,</span> <span class="o">%</span><span class="nv">r10</span>
        <span class="err">33:</span> <span class="nf">ja</span>      <span class="mh">0x89</span>

        <span class="c1">;; Allocate this function's stack frame, save callee-save</span>
        <span class="c1">;; registers, and shuffle some registers.</span>
        <span class="err">39:</span> <span class="nf">subq</span>    <span class="kc">$</span><span class="mh">0x20</span><span class="p">,</span> <span class="o">%</span><span class="nb">rsp</span>
        <span class="err">3</span><span class="nl">d:</span> <span class="nf">movq</span>    <span class="o">%</span><span class="nb">rbx</span><span class="p">,</span> <span class="p">(</span><span class="o">%</span><span class="nb">rsp</span><span class="p">)</span>
        <span class="err">41:</span> <span class="nf">movq</span>    <span class="o">%</span><span class="nv">r14</span><span class="p">,</span> <span class="mi">8</span><span class="p">(</span><span class="o">%</span><span class="nb">rsp</span><span class="p">)</span>
        <span class="err">46:</span> <span class="nf">movq</span>    <span class="o">%</span><span class="nv">r15</span><span class="p">,</span> <span class="mh">0x10</span><span class="p">(</span><span class="o">%</span><span class="nb">rsp</span><span class="p">)</span>
        <span class="err">4</span><span class="nl">b:</span> <span class="nf">movq</span>    <span class="mh">0x40</span><span class="p">(</span><span class="o">%</span><span class="nb">rdi</span><span class="p">),</span> <span class="o">%</span><span class="nb">rbx</span>
        <span class="err">4</span><span class="nl">f:</span> <span class="nf">movq</span>    <span class="o">%</span><span class="nb">rdi</span><span class="p">,</span> <span class="o">%</span><span class="nv">r15</span>
        <span class="err">52:</span> <span class="nf">movq</span>    <span class="o">%</span><span class="nb">rdx</span><span class="p">,</span> <span class="o">%</span><span class="nv">r14</span>

        <span class="c1">;; Begin loop.</span>
        <span class="c1">;;</span>
        <span class="c1">;; Test our counter for zero and break out if so.</span>
        <span class="err">55:</span> <span class="nf">testl</span>   <span class="o">%</span><span class="nb">r14d</span><span class="p">,</span> <span class="o">%</span><span class="nb">r14d</span>
        <span class="err">58:</span> <span class="nf">je</span>      <span class="mh">0x72</span>
        <span class="c1">;; Do our cross-module call.</span>
        <span class="err">5</span><span class="nl">e:</span> <span class="nf">movq</span>    <span class="o">%</span><span class="nv">r15</span><span class="p">,</span> <span class="o">%</span><span class="nb">rsi</span>
        <span class="err">61:</span> <span class="nf">movq</span>    <span class="o">%</span><span class="nb">rbx</span><span class="p">,</span> <span class="o">%</span><span class="nb">rdi</span>
        <span class="err">64:</span> <span class="nf">callq</span>   <span class="mi">0</span>
        <span class="c1">;; Decrement our counter.</span>
        <span class="err">69:</span> <span class="nf">subl</span>    <span class="kc">$</span><span class="mi">1</span><span class="p">,</span> <span class="o">%</span><span class="nb">r14d</span>
        <span class="c1">;; Continue to the next iteration of the loop.</span>
        <span class="err">6</span><span class="nl">d:</span> <span class="nf">jmp</span>     <span class="mh">0x55</span>

        <span class="c1">;; Function epilogue: restore callee-save registers and</span>
        <span class="c1">;; deallocate this function's stack frame.</span>
        <span class="err">72:</span> <span class="nf">movq</span>    <span class="p">(</span><span class="o">%</span><span class="nb">rsp</span><span class="p">),</span> <span class="o">%</span><span class="nb">rbx</span>
        <span class="err">76:</span> <span class="nf">movq</span>    <span class="mi">8</span><span class="p">(</span><span class="o">%</span><span class="nb">rsp</span><span class="p">),</span> <span class="o">%</span><span class="nv">r14</span>
        <span class="err">7</span><span class="nl">b:</span> <span class="nf">movq</span>    <span class="mh">0x10</span><span class="p">(</span><span class="o">%</span><span class="nb">rsp</span><span class="p">),</span> <span class="o">%</span><span class="nv">r15</span>
        <span class="err">80:</span> <span class="nf">addq</span>    <span class="kc">$</span><span class="mh">0x20</span><span class="p">,</span> <span class="o">%</span><span class="nb">rsp</span>
        <span class="err">84:</span> <span class="nf">movq</span>    <span class="o">%</span><span class="nb">rbp</span><span class="p">,</span> <span class="o">%</span><span class="nb">rsp</span>
        <span class="err">87:</span> <span class="nf">popq</span>    <span class="o">%</span><span class="nb">rbp</span>
        <span class="err">88:</span> <span class="nf">retq</span>

        <span class="c1">;; Out-of-line traps.</span>
        <span class="err">89:</span> <span class="nf">ud2</span>
            <span class="err">╰─╼</span> <span class="nl">trap:</span> <span class="nf">StackOverflow</span>
</code></pre></div></div>

<p>When we enable inlining, then <code class="language-plaintext highlighter-rouge">M::f</code> gets inlined into <code class="language-plaintext highlighter-rouge">N::g</code>. Despite <code class="language-plaintext highlighter-rouge">N::g</code>
becoming a leaf function, we will still <code class="language-plaintext highlighter-rouge">push %rbp</code> and all that in the prologue
and pop it in the epilogue, because Wasmtime always enables frame pointers. But
because it no longer needs to shuffle values into ABI argument registers or
allocate any stack space, it doesn’t need to do any explicit stack checks, and
nearly all the rest of the code also goes away. All that is left is a loop
decrementing a counter to zero:<sup id="fnref:remove-loops" role="doc-noteref"><a href="#fn:remove-loops" class="footnote" rel="footnote">3</a></sup></p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">00000020</span> <span class="nf">wasm</span><span class="p">[</span><span class="mi">1</span><span class="p">]::</span><span class="nv">function</span><span class="p">[</span><span class="mi">1</span><span class="p">]:</span>
        <span class="c1">;; Function prologue.</span>
        <span class="err">20:</span> <span class="nf">pushq</span>   <span class="o">%</span><span class="nb">rbp</span>
        <span class="err">21:</span> <span class="nf">movq</span>    <span class="o">%</span><span class="nb">rsp</span><span class="p">,</span> <span class="o">%</span><span class="nb">rbp</span>

        <span class="c1">;; Loop.</span>
        <span class="err">24:</span> <span class="nf">testl</span>   <span class="o">%</span><span class="nb">edx</span><span class="p">,</span> <span class="o">%</span><span class="nb">edx</span>
        <span class="err">26:</span> <span class="nf">je</span>      <span class="mh">0x34</span>
        <span class="err">2</span><span class="nl">c:</span> <span class="nf">subl</span>    <span class="kc">$</span><span class="mi">1</span><span class="p">,</span> <span class="o">%</span><span class="nb">edx</span>
        <span class="err">2</span><span class="nl">f:</span> <span class="nf">jmp</span>     <span class="mh">0x24</span>

        <span class="c1">;; Function epilogue.</span>
        <span class="err">34:</span> <span class="nf">movq</span>    <span class="o">%</span><span class="nb">rbp</span><span class="p">,</span> <span class="o">%</span><span class="nb">rsp</span>
        <span class="err">37:</span> <span class="nf">popq</span>    <span class="o">%</span><span class="nb">rbp</span>
        <span class="err">38:</span> <span class="nf">retq</span>
</code></pre></div></div>

<p>With this simplest of examples, we can just count the difference in number of
instructions in each loop body:</p>

<ul>
  <li>12 without inlining (7 in <code class="language-plaintext highlighter-rouge">N::g</code> and 5 in <code class="language-plaintext highlighter-rouge">M::f</code>: 2 to push the
frame pointer, 2 to pop it, and 1 to return)</li>
  <li>4 with inlining</li>
</ul>

<p>But we might as well verify that the inlined version really is faster via some
quick-and-dirty benchmarking with <a href="https://github.com/sharkdp/hyperfine"><code class="language-plaintext highlighter-rouge">hyperfine</code></a>. This won’t measure <em>only</em> Wasm
execution time; it also measures spawning a whole Wasmtime process, loading code
from disk, etc…, but it will work for our purposes if we crank up the number
of iterations:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ hyperfine \
    "wasmtime run --allow-precompiled -Cinlining=n --invoke 'g(100000000)' no-inline.cwasm" \
    "wasmtime run --allow-precompiled -Cinlining=y --invoke 'g(100000000)' yes-inline.cwasm"

Benchmark 1: wasmtime run --allow-precompiled -Cinlining=n --invoke 'g(100000000)' no-inline.cwasm
  Time (mean ± σ):     138.2 ms ±   9.6 ms    [User: 132.7 ms, System: 6.7 ms]
  Range (min … max):   128.7 ms … 167.7 ms    19 runs

Benchmark 2: wasmtime run --allow-precompiled -Cinlining=y --invoke 'g(100000000)' yes-inline.cwasm
  Time (mean ± σ):      37.5 ms ±   1.1 ms    [User: 33.0 ms, System: 5.8 ms]
  Range (min … max):    35.7 ms …  40.8 ms    77 runs

Summary
  'wasmtime run --allow-precompiled -Cinlining=y --invoke 'g(100000000)' yes-inline.cwasm' ran
    3.69 ± 0.28 times faster than 'wasmtime run --allow-precompiled -Cinlining=n --invoke 'g(100000000)' no-inline.cwasm'
</code></pre></div></div>

<p>Okay so if we measure Wasm doing almost nothing but empty function calls and
then we measure again after removing function call overhead, we get a big speed
up — it would be disappointing if we didn’t! But maybe we can benchmark
something a tiny bit more realistic.</p>

<p>A program that we commonly reach for when benchmarking is a <a href="https://github.com/bytecodealliance/sightglass/blob/f888169c8d9e274dd784822d3028897e5ec18385/benchmarks/pulldown-cmark/rust-benchmark/src/main.rs">small wrapper</a>
around <a href="https://github.com/pulldown-cmark/pulldown-cmark/">the <code class="language-plaintext highlighter-rouge">pulldown-cmark</code> markdown library</a> that parses the <a href="https://commonmark.org/">CommonMark</a>
specification (which is itself written in markdown) and renders that to
HTML. This is Real World™ code operating on Real World™ inputs that matches Real
World™ use cases people have for Wasm. Granted, good benchmarking is incredibly
difficult, but this program is nonetheless a pretty good candidate for inclusion
in our corpus. There’s just one hiccup: in order for our inliner to activate
normally, we need a program using components and making cross-module calls, and
this program doesn’t do that. But we don’t have a good corpus of such benchmarks
yet because this kind of component composition is still relatively new, so let’s
keep using our <code class="language-plaintext highlighter-rouge">pulldown-cmark</code> program but measure our inliner’s effects via a
more circuitous route.</p>

<p>Wasmtime has tunables to enable the inlining of intra-module
calls<sup id="fnref:wasmtime-intra-module" role="doc-noteref"><a href="#fn:wasmtime-intra-module" class="footnote" rel="footnote">4</a></sup> and <code class="language-plaintext highlighter-rouge">rustc</code> and LLVM have tunables for disabling
inlining<sup id="fnref:rustc-disable-inlining" role="doc-noteref"><a href="#fn:rustc-disable-inlining" class="footnote" rel="footnote">5</a></sup>. Therefore we can roughly estimate the speed
ups our inliner might unlock on a similar, but extensively componentized and
cross-module calling, program by:</p>

<ul>
  <li>
    <p>Disabling inlining when compiling the Rust source code to Wasm</p>
  </li>
  <li>
    <p>Compiling the resulting Wasm binary to native code with Wasmtime twice: once
with inlining disabled, and once with intra-module call inlining enabled</p>
  </li>
  <li>
    <p>Comparing those two different compilations’ execution speeds</p>
  </li>
</ul>
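<p>Concretely, the recipe looks something like the following. (This is an illustrative sketch, not a copy-pasteable script: the crate layout and file names are hypothetical, and the flags are the ones given in the footnotes below.)</p>

```shell
# 1. Compile the Rust benchmark to Wasm with rustc/LLVM inlining disabled.
RUSTFLAGS="-Cllvm-args=--inline-threshold=0 -Cllvm-args=--inlinehint-threshold=0 -Zinline-mir=no" \
    cargo +nightly build --release --target wasm32-wasip1

# 2. Pre-compile that Wasm to native code twice with Wasmtime: once with
#    inlining disabled, and once with intra-module call inlining enabled.
wasmtime compile -Cinlining=n \
    target/wasm32-wasip1/release/pulldown-cmark.wasm -o without-inlining.cwasm
wasmtime compile -Cinlining=y -C cranelift-wasmtime-inlining-intra-module=yes \
    target/wasm32-wasip1/release/pulldown-cmark.wasm -o with-inlining.cwasm

# 3. Benchmark the two .cwasm files against each other (e.g. with Sightglass,
#    or quick-and-dirty with hyperfine as above).
```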

<p>Running this experiment with <a href="https://github.com/bytecodealliance/sightglass">Sightglass</a>, our internal benchmarking
infrastructure and tooling, yields the following results:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>execution :: instructions-retired :: pulldown-cmark.wasm

  Δ = 7329995.35 ± 2.47 (confidence = 99%)

  with-inlining is 1.26x to 1.26x faster than without-inlining!

  [35729153 35729164.72 35729173] without-inlining
  [28399156 28399169.37 28399179] with-inlining
</code></pre></div></div>

<h2 id="conclusion">Conclusion</h2>

<p>Wasmtime and Cranelift now have a function inliner! Test it out via the <code class="language-plaintext highlighter-rouge">-C
inlining=y</code> command-line flag or via the
<a href="https://docs.rs/wasmtime/36.0.0/wasmtime/struct.Config.html#method.compiler_inlining"><code class="language-plaintext highlighter-rouge">wasmtime::Config::compiler_inlining</code></a> method. Let us know if
you run into any bugs or whether you see any speed-ups when running Wasm
components containing multiple core modules.</p>

<p>Thanks to <a href="https://cfallin.org/">Chris Fallin</a> and <a href="https://www.venge.net/graydon/">Graydon Hoare</a> for reading early drafts of this
piece and providing valuable feedback. Any errors that remain are my own.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:determinism" role="doc-endnote">
      <p>Deterministic compilation gives a number of benefits: testing is
easier, debugging is easier, builds can be byte-for-byte reproducible, it is
well-behaved in the face of incremental compilation and fine-grained
caching, etc… <a href="#fnref:determinism" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:chains" role="doc-endnote">
      <p>For what it is worth, this still allows collapsing chains of
mutually-recursive calls (<code class="language-plaintext highlighter-rouge">a</code> calls <code class="language-plaintext highlighter-rouge">b</code> calls <code class="language-plaintext highlighter-rouge">c</code> calls <code class="language-plaintext highlighter-rouge">a</code>) into a single,
self-recursive call (<code class="language-plaintext highlighter-rouge">abc</code> calls <code class="language-plaintext highlighter-rouge">abc</code>). Our actual implementation does not
do this in practice, preferring additional parallelism instead, but it could
in theory. <a href="#fnref:chains" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:remove-loops" role="doc-endnote">
      <p>Cranelift cannot currently remove loops without side effects,
and generally doesn’t mess with control-flow at all in its mid-end. We’ve
had various discussions about how we might best fit control-flow-y
optimizations into Cranelift’s mid-end architecture over the years, but it
also isn’t something that we’ve seen would be very beneficial for actual,
Real World™ Wasm programs, given that (a) LLVM has already done much of
this kind of thing when producing the Wasm, and (b) we do some
branch-folding when lowering from our mid-level IR to our machine-specific
IR. Maybe we will revisit this sometime in the future if it crops up more
often after inlining. <a href="#fnref:remove-loops" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:wasmtime-intra-module" role="doc-endnote">
      <p><code class="language-plaintext highlighter-rouge">-C cranelift-wasmtime-inlining-intra-module=yes</code> <a href="#fnref:wasmtime-intra-module" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:rustc-disable-inlining" role="doc-endnote">
      <p><code class="language-plaintext highlighter-rouge">-Cllvm-args=--inline-threshold=0</code>,
<code class="language-plaintext highlighter-rouge">-Cllvm-args=--inlinehint-threshold=0</code>, and <code class="language-plaintext highlighter-rouge">-Zinline-mir=no</code> <a href="#fnref:rustc-disable-inlining" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Nick Fitzgerald</name></author><summary type="html"><![CDATA[Function inlining is one of the most important compiler optimizations, not because of its direct effects, but because of the follow-up optimizations it unlocks. It may reveal, for example, that an otherwise-unknown function parameter value is bound to a constant argument, which makes a conditional branch unconditional, which in turn exposes that the function will always return the same value. Inlining is the catalyst of modern compiler optimization.]]></summary></entry><entry><title type="html">Exceptions in Cranelift and Wasmtime</title><link href="https://bytecodealliance.org/articles/wasmtime-exceptions" rel="alternate" type="text/html" title="Exceptions in Cranelift and Wasmtime" /><published>2025-11-06T00:00:00+00:00</published><updated>2025-11-06T00:00:00+00:00</updated><id>https://bytecodealliance.org/articles/wasmtime-exceptions</id><content type="html" xml:base="https://bytecodealliance.org/articles/wasmtime-exceptions"><![CDATA[<p>This is a blog post outlining the odyssey I recently took to implement
the <a href="https://github.com/webassembly/exception-handling">Wasm exception-handling
proposal</a> in
<a href="https://wasmtime.dev/">Wasmtime</a>, the open-source
<a href="https://webassembly.org/">WebAssembly</a> engine for which I’m a core
team member/maintainer, and its <a href="https://cranelift.dev/">Cranelift</a>
compiler backend.</p>

<!--end_excerpt-->

<p><em>Note: this is a cross-post with my <a href="https://cfallin.org/blog/">personal blog</a>;
this post is also available
<a href="https://cfallin.org/blog/2025/11/06/exceptions/">here</a>.</em></p>

<p>When first discussing this work, I made an off-the-cuff estimate in
the Wasmtime biweekly project meeting that it would be “maybe two
weeks on the compiler side and a week in Wasmtime”. Reader, I need to
make a confession now: I was wrong and it was <em>not</em> a three-week
task. This work spanned from late March to August of this year
(roughly half-time, to be fair; I wear many hats). Let that be a
lesson!<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>

<p>In this post we’ll first cover what exceptions are and why some
languages want them (and what other languages do instead) – in
particular what the big deal is about (so-called) “zero-cost”
exception handling. Then we’ll see how Wasm has specified a
bytecode-level foundation that serves as a least-common denominator
but also has some unique properties. We’ll then take a roundtrip
through what it means for a <em>compiler</em> to support exceptions – the
control-flow implications, how one reifies the communication with the
unwinder, how all this intersects with the ABI, etc. – before finally
looking at how Wasmtime puts it all together (and is careful to avoid
performance pitfalls and stay true to the intended performance of the
spec).</p>

<h2 id="why-exceptions">Why Exceptions?</h2>

<p>Many readers will already be familiar with exceptions as they are
present in languages as widely varied as
<a href="https://en.wikipedia.org/wiki/Python_(programming_language)">Python</a>,
<a href="https://en.wikipedia.org/wiki/Java_(programming_language)">Java</a>,
<a href="https://en.wikipedia.org/wiki/JavaScript">JavaScript</a>,
<a href="https://en.wikipedia.org/wiki/C%2B%2B">C++</a>,
<a href="https://en.wikipedia.org/wiki/Lisp_(programming_language)">Lisp</a>,
<a href="https://en.wikipedia.org/wiki/OCaml">OCaml</a>, and many more. But let’s
briefly review so we can (i) be precise what we mean by an exception,
and (ii) discuss <em>why</em> exceptions are so popular.</p>

<p><a href="https://en.wikipedia.org/wiki/Exception_handling_(programming)">Exception
handling</a>
is a mechanism for <em>nonlocal flow control</em>. In particular, most
flow-control constructs are <em>intraprocedural</em> (send control to other
code in the current function) and <em>lexical</em> (target a location that
can be known statically). For example, <code class="language-plaintext highlighter-rouge">if</code> statements and <code class="language-plaintext highlighter-rouge">loop</code>s
both work this way: they stay within the local function, and we know
exactly where they will transfer control. In contrast, exceptions are
(or can be) <em>interprocedural</em> (can transfer control to some point in
some other function) and <em>dynamic</em> (target a location that depends on
runtime state).<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></p>

<p>To unpack that a bit: an exception is <em>thrown</em> when we want to signal
an error or some other condition that requires “unwinding” the current
computation, i.e., backing out of the current context; and it is
<em>caught</em> by a “handler” that is interested in the particular kind of
exception and is currently “active” (waiting to catch that
exception). That handler can be in the current function, or in any
function that has called it. Thus, an exception throw and catch can
result in an abnormal, early return from a function.</p>
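<p>As a concrete, runnable illustration of this throw/catch shape: Rust prefers <code class="language-plaintext highlighter-rouge">Result</code> for recoverable errors (more on that below), but its panic machinery exposes the same kind of dynamic, interprocedural transfer. This is just a sketch of the concept, not anything Wasm-specific:</p>

```rust
use std::panic;

fn inner() {
    // A "throw": control leaves `inner` and every frame between here and
    // the nearest active handler, with no normal returns in between.
    panic!("something went wrong");
}

fn main() {
    // A "handler": catch_unwind marks this frame as willing to catch
    // unwinds originating anywhere in the dynamic extent of the closure.
    let result = panic::catch_unwind(|| {
        inner();
        42 // never reached; the unwind skips past this point
    });
    // The handler observes the unwind as an Err.
    assert!(result.is_err());
}
```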

<p>One can understand the need for this mechanism by considering how
programs can handle errors. In some languages, such as Rust, it is
common to see function signatures of the form <code class="language-plaintext highlighter-rouge">fn foo(...) -&gt;
Result&lt;T, E&gt;</code>. The
<a href="https://doc.rust-lang.org/std/result/enum.Result.html"><code class="language-plaintext highlighter-rouge">Result</code></a> type
indicates that <code class="language-plaintext highlighter-rouge">foo</code> normally returns a value of type <code class="language-plaintext highlighter-rouge">T</code>, but may
produce an error of type <code class="language-plaintext highlighter-rouge">E</code> instead. The key to making this ergonomic
is providing some way to “short-circuit” execution if an error is
returned, propagating that error upward: that is, Rust’s <code class="language-plaintext highlighter-rouge">?</code> operator,
for example, which turns into essentially “if there was an error,
return that error from this function”.<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> This is quite conceptually
nice in many ways: why should error handling be different than any
other data flow in the program? Let’s describe the type of results to
include the possibility of errors; and let’s use normal control flow
to handle them. So we can write code like</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">f</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="nb">u32</span><span class="p">,</span> <span class="n">Error</span><span class="o">&gt;</span> <span class="p">{</span>
  <span class="k">if</span> <span class="n">bad</span> <span class="p">{</span>
    <span class="k">return</span> <span class="nf">Err</span><span class="p">(</span><span class="nn">Error</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="o">...</span><span class="p">));</span>
  <span class="p">}</span>
  <span class="nf">Ok</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="p">}</span>

<span class="k">fn</span> <span class="nf">g</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="nb">u32</span><span class="p">,</span> <span class="n">Error</span><span class="o">&gt;</span> <span class="p">{</span>
  <span class="c1">// The `?` propagates any error to our caller, returning early.</span>
  <span class="k">let</span> <span class="n">result</span> <span class="o">=</span> <span class="nf">f</span><span class="p">()</span><span class="o">?</span><span class="p">;</span>
  <span class="nf">Ok</span><span class="p">(</span><span class="n">result</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>

<p>and we don’t have to do anything special in <code class="language-plaintext highlighter-rouge">g</code> to propagate errors
from <code class="language-plaintext highlighter-rouge">f</code> further, other than use the <code class="language-plaintext highlighter-rouge">?</code> operator.</p>

<p>But there is a <em>cost</em> to this: it means that every error-producing
function has a larger return type, which might have ABI implications
(another return register at least, if not a stack-allocated
representation of the <code class="language-plaintext highlighter-rouge">Result</code> and the corresponding loads/stores to
memory), and also, there is at least one conditional branch after
every call to such a function that checks if we need to handle the
error. The dynamic efficiency of the “happy path” (with no thrown
exceptions) is thus impacted. Ideally, we skip any cost unless an
error actually occurs (and then perhaps we accept slightly more cost
in that case, as tradeoffs often go).</p>
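<p>To make that cost concrete, here is roughly what the <code class="language-plaintext highlighter-rouge">?</code> in <code class="language-plaintext highlighter-rouge">g</code> above expands to. (Simplified sketch: the real desugaring also runs an error conversion via <code class="language-plaintext highlighter-rouge">From::from</code>, and here we thread a <code class="language-plaintext highlighter-rouge">bad</code> flag as a parameter to keep the example self-contained.)</p>

```rust
#[derive(Debug, PartialEq)]
struct Error;

fn f(bad: bool) -> Result<u32, Error> {
    if bad {
        return Err(Error);
    }
    Ok(0)
}

fn g(bad: bool) -> Result<u32, Error> {
    // What `let result = f(bad)?;` becomes, more or less: every call site
    // pays a conditional branch on the happy path just to inspect the Result.
    let result = match f(bad) {
        Ok(v) => v,
        Err(e) => return Err(e),
    };
    Ok(result + 1)
}

fn main() {
    assert_eq!(g(false), Ok(1));
    assert_eq!(g(true), Err(Error));
}
```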

<p>It turns out that this is possible with the <em>help of the language
runtime</em>. Consider what happens if we omit the <code class="language-plaintext highlighter-rouge">Result</code> return types
and error checks at each return. We will need to reach the code that
handles the error in some other way. Perhaps we can jump directly to
this code somehow?</p>

<p>The key idea of “zero-cost exception handling” is to get the compiler
to build side-tables to <em>tell us</em> where this code – known as a
“handler” – is. We can walk the callstack, visiting our caller and
its caller and onward, until we find a function that would be
interested in the error condition we are raising. This logic is
implemented with the help of these side-tables and some code in the
language runtime called the “unwinder” (because it “unwinds” the
stack). If no errors are raised, then none of this logic is executed
at runtime. And we no longer have our explicit checks for error
returns in the “happy path” where no errors occur. This is why this
style of error-handling is commonly called “zero-cost”:
more precisely, it is zero-cost when <em>no</em> errors occur, but the
unwinding in case of error can still be expensive.</p>

<p>This is the status quo for exception-handling implementations in most
production languages: for example, in the C++ world, exception
handling is commonly implemented via the <a href="https://itanium-cxx-abi.github.io/cxx-abi/abi-eh.html">Itanium C++
ABI</a><sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>, which
defines a comprehensive set of tables emitted by the compiler and a
complex dance between the system unwinding library and
compiler-generated code to find and transfer control to
handlers. Handler tables and stack unwinders are common in interpreted
and just-in-time (JIT)-compiled language implementations, too: for
example, SpiderMonkey has <a href="https://searchfox.org/firefox-main/rev/50a34d25155fd70628ee69c7d68a2509c0e3445d/js/src/vm/StencilEnums.h#18">try
notes</a>
on its bytecode (so named for “try blocks”) and a
<a href="https://searchfox.org/firefox-main/rev/a5316cedc669bcec09efae23521e0af6b9d3d257/js/src/jit/JitFrames.cpp#691">HandleException</a>
function that <a href="https://searchfox.org/firefox-main/rev/a5316cedc669bcec09efae23521e0af6b9d3d257/js/src/jit/JitFrames.cpp#751-845">walks stack
frames</a>
to find a handler.</p>

<h2 id="the-wasm-exception-handling-spec">The Wasm Exception-Handling Spec</h2>

<p>The WebAssembly specification now (since version 3.0) has <a href="https://github.com/webassembly/exception-handling">exception
handling</a>. This
proposal was a long time in the making by various folks in the
standards, toolchain and browser worlds, and the CG (standards group)
has now merged it into the spec and included it in the
recently-released “Wasm 3.0” milestone. If you’re already familiar
with the proposal, you can skip over this section to the Cranelift-
and Wasmtime-specific bits below.</p>

<p>First: let’s discuss <em>why</em> Wasm needs an extension to the bytecode
definition to support exceptions. As we described above, the key idea
of zero-cost exception handling is that an unwinder visits stack
frames and looks for handlers, transferring control directly to the
first handler it finds, outside the normal function return
path. Because the call stack is <em>protected</em>, or not directly readable
or writable from Wasm code (part of Wasm’s <a href="https://en.wikipedia.org/wiki/Control-flow_integrity">control-flow
integrity</a>
aspect), an unwinder that works this way must necessarily be a
privileged part of the Wasm runtime itself. We can’t implement it in
“userspace” because there is no way for Wasm bytecode to transfer
control directly back to a distant caller, aside from a chain of
returns. This missing functionality is what the extension to the
specification adds.</p>

<p>The implementation comes down to only three opcodes (!), and some new
types in the bytecode-level type system. (In other words – given the
length of this post – it’s deceptively simple.) These opcodes are:</p>

<ul>
  <li>
    <p><code class="language-plaintext highlighter-rouge">try_table</code>, which wraps an inner body, and specifies <em>handlers</em> to
be active during that body. For example:</p>

    <pre><code class="language-wat">(block $b1    ;; defines a label for a forward edge to the end of this block
  (block $b2  ;; likewise, another label
    (try_table
      (catch $tag1 $b1) ;; exceptions with tag `$tag1` will be caught by code at $b1
      (catch_all $b2)   ;; all other exceptions will be caught by code at $b2

      body...)))
</code></pre>

    <p>In this example, if an exception is thrown from within the code in
<code class="language-plaintext highlighter-rouge">body</code>, and it matches one of the specified tags (more below!),
control will transfer to the location defined by the end of the
given block. (This is the same as other control-flow transfers in
Wasm: for example, a branch <code class="language-plaintext highlighter-rouge">br $b1</code> also jumps to the end of
<code class="language-plaintext highlighter-rouge">$b1</code>.)</p>

    <p>This construct is the single all-purpose “catch” mechanism, and is
powerful enough to directly translate typical <code class="language-plaintext highlighter-rouge">try</code>/<code class="language-plaintext highlighter-rouge">catch</code> blocks
in most programming languages with exceptions.</p>
  </li>
  <li>
    <p><code class="language-plaintext highlighter-rouge">throw</code>: an instruction to directly throw a new exception. It
carries the tag for the exception, like: <code class="language-plaintext highlighter-rouge">throw $tag1</code>.</p>
  </li>
  <li>
    <p><code class="language-plaintext highlighter-rouge">throw_ref</code>, used to rethrow an exception that has already been
caught and is held by reference (more below!).</p>
  </li>
</ul>

<p>And that’s it! We implement those three opcodes and we are “done”.</p>

<h3 id="payloads">Payloads</h3>

<p>That’s not the whole story, of course. Ordinarily a source language
will offer the ability to carry some <em>data</em> as part of an exception:
that is, the error condition is not just one of a static set of kinds
of errors, but contains some fields as well. (E.g.: not just “file not
found”, but “file not found: $PATH”.)</p>

<p>One could build this on top of a bytecode-level exception-throw
mechanism that only had throw/catch with static tags, with the help of
some global state, but that would be cumbersome; instead, the Wasm
specification offers <em>payloads</em> on each exception. For full
generality, this payload can actually take the form of a <em>list</em> of
values; i.e., it is a full product type (struct type).</p>

<p>We alluded to “tags” above but didn’t describe them in detail. These
tags are key to the payload definition: each tag is effectively a type
definition that specifies its list of payload value types as
well. (Technically, in the Wasm AST, a tag definition names a
<em>function type</em> with only parameters, no returns, which is a nice way
of reusing an existing entity/concept.) Now we show how they are
defined with a sample module:</p>

<pre><code class="language-wat">(module
 ;; Define a "tag", which serves to define the specific kind of exception
 ;; and specify its payload values.
 (tag $t (param i32 i64))

 (func $f (param i32 i64)
       ;; Throw an exception, to be caught by whatever handler is "closest"
       ;; dynamically.
       (throw $t (local.get 0) (local.get 1)))

 (func $g (result i32 i64)
       (block $b (result i32 i64)
              ;; Run a body below, with the given handlers (catch-clauses)
              ;; in-scope to catch any matching exceptions.
              ;;
              ;; Here, if an exception with tag `$t` is thrown within the body,
              ;; control is transferred to the end of block `$b` (as if we had
              ;; branched to it), with the payload values for that exception
              ;; pushed to the operand stack.
              (try_table (catch $t $b)
                         (call $f (i32.const 1) (i64.const 2)))
              (i32.const 3)
              (i64.const 4))))
</code></pre>

<p>Here we’ve defined one tag (the Wasm text format lets us attach a name
<code class="language-plaintext highlighter-rouge">$t</code>, but in the binary format it is only identified by its index, 0),
with two payload values. We can throw an exception with this tag given
values of these types (as in function <code class="language-plaintext highlighter-rouge">$f</code>) and we can catch it if we
specify a catch destination as the end of a block meant to return
exactly those types as well.  Here, if function <code class="language-plaintext highlighter-rouge">$g</code> is invoked, the
exception payload values <code class="language-plaintext highlighter-rouge">1</code> and <code class="language-plaintext highlighter-rouge">2</code> will be thrown with the
exception, which will be caught by the <code class="language-plaintext highlighter-rouge">try_table</code>; the results of
<code class="language-plaintext highlighter-rouge">$g</code> will be <code class="language-plaintext highlighter-rouge">1</code> and <code class="language-plaintext highlighter-rouge">2</code>. (The values <code class="language-plaintext highlighter-rouge">3</code> and <code class="language-plaintext highlighter-rouge">4</code> are present to allow
the Wasm module to validate, i.e. have correct types, but they are
dynamically unreachable because of the throw in <code class="language-plaintext highlighter-rouge">$f</code> and will not be
returned.)</p>

<p>This is an instance where Wasm, being a bytecode, can afford to
generalize a bit relative to real-metal ISAs and offer conveniences to
the Wasm producer (i.e., toolchain generating Wasm modules). In this
sense, it is a little more like a compiler IR. In contrast, most other
exception-throw ABIs have a fixed definition of payload, e.g., one or
two machine register-sized values. In practice some producers might
choose a small fixed signature for all exception tags anyway, but
there is no reason to impose such an artificial limit if there is a
compiler and runtime behind the Wasm in any case.</p>

<h3 id="unwind-cleanup-and-destructors">Unwind, Cleanup, and Destructors</h3>

<p>So far, we’ve seen how Wasm’s primitives can allow for basic exception
throws and catches, but what about languages with scoped resources,
e.g. C++ with its destructors? If one writes something like</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">Scoped</span> <span class="p">{</span>
    <span class="n">Scoped</span><span class="p">()</span> <span class="p">{}</span>
    <span class="o">~</span><span class="n">Scoped</span><span class="p">()</span> <span class="p">{</span> <span class="n">cleanup</span><span class="p">();</span> <span class="p">}</span>
<span class="p">};</span>

<span class="kt">void</span> <span class="n">f</span><span class="p">()</span> <span class="p">{</span>
    <span class="n">Scoped</span> <span class="n">s</span><span class="p">;</span>
    <span class="k">throw</span> <span class="n">my_exception</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>

<p>then the <code class="language-plaintext highlighter-rouge">throw</code> should transfer control out of <code class="language-plaintext highlighter-rouge">f</code> and upward to
whatever handler matches, but the destructor of <code class="language-plaintext highlighter-rouge">s</code> still needs to run
and call <code class="language-plaintext highlighter-rouge">cleanup</code>. This is not quite a “catch” because we don’t want
to terminate the search: we aren’t actually handling the error
condition.</p>

<p>The usual approach to compile such a program is to “catch and
rethrow”. That is, the program is lowered to something like</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">try</span> <span class="p">{</span>
    <span class="k">throw</span> <span class="p">...</span>
<span class="p">}</span> <span class="n">catch_any</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">cleanup</span><span class="p">();</span>
    <span class="n">rethrow</span> <span class="n">e</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>where <code class="language-plaintext highlighter-rouge">catch_any</code> catches <em>any</em> exception propagating past this point
on the stack, and <code class="language-plaintext highlighter-rouge">rethrow</code> re-throws the same exception.</p>

<p>Wasm’s exception primitives provide exactly the pieces we need for
this: a <code class="language-plaintext highlighter-rouge">catch_all_ref</code> clause, which <em>catches all exceptions</em> and
<em>boxes the caught exception as a reference</em>; and a <code class="language-plaintext highlighter-rouge">throw_ref</code>
instruction, which <em>re-throws a previously-caught exception</em>.<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">5</a></sup></p>

<p>In actuality there is a two-by-two matrix of “catch” options: we can
<code class="language-plaintext highlighter-rouge">catch</code> a specific tag or <code class="language-plaintext highlighter-rouge">catch_all</code>; and we can catch and
immediately unpack the exception into its payload values (as we saw
above), or we can catch it as a reference. So we have <code class="language-plaintext highlighter-rouge">catch</code>,
<code class="language-plaintext highlighter-rouge">catch_ref</code>, <code class="language-plaintext highlighter-rouge">catch_all</code>, and <code class="language-plaintext highlighter-rouge">catch_all_ref</code>.<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">6</a></sup></p>
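<p>To make the matrix concrete, here is a small Rust sketch (the names are hypothetical, not Wasmtime's implementation) of how a clause list like a <code class="language-plaintext highlighter-rouge">try_table</code>'s selects a handler for a thrown tag: the first matching clause wins, and the <code class="language-plaintext highlighter-rouge">*_ref</code> variants differ only in whether the payload is unpacked or the exception is boxed as a reference.</p>

```rust
/// A dynamic tag identity (stand-in for a real Wasm tag).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Tag(u32);

/// The four kinds of catch clauses; labels are just block names here.
#[derive(Debug, PartialEq)]
enum Clause {
    /// `catch $t $label`: match tag, unpack payload values.
    Catch(Tag, &'static str),
    /// `catch_ref $t $label`: match tag, pass boxed exception reference.
    CatchRef(Tag, &'static str),
    /// `catch_all $label`: match any exception, no payload access.
    CatchAll(&'static str),
    /// `catch_all_ref $label`: match any exception, boxed reference.
    CatchAllRef(&'static str),
}

/// Return the first clause in the list that matches `thrown`.
fn select<'a>(clauses: &'a [Clause], thrown: Tag) -> Option<&'a Clause> {
    clauses.iter().find(|c| match c {
        Clause::Catch(t, _) | Clause::CatchRef(t, _) => *t == thrown,
        Clause::CatchAll(_) | Clause::CatchAllRef(_) => true,
    })
}
```

<p>If no clause matches, the exception keeps propagating to the next frame up the stack.</p>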

<h3 id="dynamic-identity-and-compositionality">Dynamic Identity and Compositionality</h3>

<p>There is one final detail to the Wasm proposal, and in fact it’s the
part that I find the most interesting and unique. Given the above
introduction, and any familiarity with exception systems in other
language semantics and/or runtime systems, one might expect that the
“tags” identifying kinds of exceptions and matching throws with
particular catch handlers would be static labels. In other words, if I
throw an exception with tag <code class="language-plaintext highlighter-rouge">$tA</code>, then the first handler for <code class="language-plaintext highlighter-rouge">$tA</code>
anywhere up the stack, from any module, should catch it.</p>

<p>However, one of Wasm’s most significant properties as a bytecode is
its emphasis on isolation. It has a distinction between static
<em>modules</em> and dynamic <em>instances</em> of those modules, and modules have
no “static members”: every entity (e.g., memory, table, or global
variable) defined by a module is replicated per instance of that
module. This creates a clean separation between instances and means
that, for example, one can freely reuse a common module (say, some
kind of low-level glue or helper module) with separate instances in
many places without them somehow communicating or interfering with
each other.</p>

<p>Consider what happens if we have an instance A that invokes some other
(dynamically provided) function reference which ultimately invokes a
callback in A. Say that the instance throws an exception from within
its callback in order to unwind all the way to its outer stack frames,
across the intermediate functions in some other Wasm instance(s):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                A.f   ---------call---------&gt;   B.g   --------call---------&gt;    A.callback
                 ^                                                                  v
               catch $t                                                           throw $t
                 |                                                                  |
                 `----------------------------&lt;-------------------------------------'
</code></pre></div></div>

<p>The instance A expects that the exception that it throws from its
callback function to <code class="language-plaintext highlighter-rouge">f</code> is a <em>local</em> concern to that instance only,
and that B cannot interfere. After all, if the exception tag is
defined inside A, and Wasm preserves modularity, then B should not be
able to name that tag to catch exceptions by that tag, even if it also
uses exception handling internally. The two modules should not
interact: that is the meaning of modularity, and it permits us to
reason about each instance’s behavior locally, with the effects of
“the rest of the world” confined to imports and exports.</p>

<p>Unfortunately, if one designed a straightforward “static” tag-matching
scheme, this might not be the case if B were an instance of the same
module as A: in that case, if B also used a tag <code class="language-plaintext highlighter-rouge">$t</code> internally and
registered handlers for that tag, it could interfere with the desired
throw/catch behavior, and violate modularity.</p>

<p>So the Wasm exception handling standard specifies that tags have
<em>dynamic instances</em> as well, just as memories, tables and globals
do. (Put in programming-language theory terms, tags are <em>generative</em>.)
Each instance of a module creates its own dynamic identities for the
statically-defined tags in those modules, and uses those dynamic
identities to tag exceptions and find handlers. This means that no
matter what instance B is, above, if instance A does not export its
tag <code class="language-plaintext highlighter-rouge">$t</code> for B to import, there is no way for B to catch the thrown
exception explicitly (it can still catch <em>all</em> exceptions, and it may
do so and rethrow to perform some cleanup). Local modular reasoning is
restored.</p>
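<p>One way to picture generativity is this Rust sketch (the types and names here are invented for illustration, not Wasmtime internals): instantiation mints a fresh dynamic identity for each statically-defined tag, so two instances of the same module can never confuse each other's tags.</p>

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Global counter for minting fresh tag identities.
static NEXT_TAG_ID: AtomicU64 = AtomicU64::new(0);

/// A dynamic tag identity, created fresh per instance.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct TagInstance(u64);

/// A module statically declares how many tags it defines.
struct Module {
    num_tags: usize,
}

/// An instance carries its own dynamic identities for those tags.
struct Instance {
    tags: Vec<TagInstance>,
}

/// Instantiating a module allocates fresh identities for its tags,
/// just as it allocates fresh memories, tables, and globals.
fn instantiate(module: &Module) -> Instance {
    let tags = (0..module.num_tags)
        .map(|_| TagInstance(NEXT_TAG_ID.fetch_add(1, Ordering::Relaxed)))
        .collect();
    Instance { tags }
}
```

<p>Throw/catch matching then compares these dynamic identities, not static tag indices.</p>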

<p>Once we have tags as dynamic entities, just like Wasm memories, we can
take the same approach that we do for the other entities to allow them
to be imported and exported. Thus, visibility of exception payloads
and ability for modules to catch certain exceptions is completely
controlled by the instantiation graph and the import/export linking,
just as for all other Wasm storage.</p>

<p>This is surprising (or at least was to me)! It creates some pretty
unique implementation challenges in the unwinder – in essence, it
means that we need to know about instance identity for each stack
frame, not just static code location and handler list.</p>

<h2 id="compiling-exceptions-in-cranelift">Compiling Exceptions in Cranelift</h2>

<p>Before we implement the primitives for exception handling in Wasmtime,
we need to support exceptions in our underlying compiler backend,
Cranelift.</p>

<p>Why should this be a compiler concern? What is special about
exceptions that makes them different from, say, new Wasm instructions
that implement additional mathematical operators (when we already have
many arithmetic operators in the IR), or Wasm memories (when we
already have loads/stores in the IR)?</p>

<p>In brief, the complexities come in three flavors: new kinds of control
flow, fundamentally different than ordinary branches or calls in that
they are “externally actuated” (by the unwinder); a new facet of the
ABI (that we get to define!) that governs how the unwinder interacts
with compiled code; and interactions between the “scoped” nature of
handlers and inlining in particular. We’ll talk about each below.</p>

<p>Note that much of this discussion started with an
<a href="https://github.com/bytecodealliance/rfcs/pull/36">RFC</a> for
Wasmtime/Cranelift, which had been posted way back in August of 2024
by Daniel Hillerstrom with help from my colleague Nick Fitzgerald, and
was discussed then; many of the choices within were subsequently
refined as I discovered interesting nuances during implementation and
we talked them through.</p>

<h3 id="control-flow">Control Flow</h3>

<p>There are a few ways to think about exception handlers from the point
of view of compiler <a href="https://en.wikipedia.org/wiki/Intermediate_representation">IR (intermediate
representation)</a>.
First, let’s recognize that exception handling (i) is a form of
control flow, and (ii) has all the same implications for various compiler
stages that other kinds of control flow do. For example, the register
allocator has to consider how to get registers into the right state
whenever control moves from one basic block to the next (“edge
moves”); exception catches are a new kind of edge, and so the regalloc
needs to be aware of that, too.</p>

<p>One could see every call or other opcode that could throw as having
regular control-flow edges to every possible handler that could
match. I’ll call this the “regular edges” approach. The upside is that
it’s pretty simple to retrofit: one “only” needs to add new kinds of
control-flow opcodes that have out-edges, but that’s already a kind of
thing that IRs have. The disadvantage is that, in functions with a lot
of possible throwing opcodes and/or handlers, the overhead can get
quite high. And control-flow graph overhead is a bad kind of overhead:
many analyses’ runtimes are heavily dependent on the edge and node (basic
block) counts, sometimes superlinearly.</p>
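<p>A back-of-the-envelope count (a sketch, not measured Cranelift data) shows why these exceptional edges multiply so badly under the "regular edges" encoding: every potentially-throwing site grows an out-edge to every handler in scope.</p>

```rust
/// With the "regular edges" approach, exceptional edges scale as the
/// product of throwing sites and in-scope handlers, on top of the
/// ordinary control-flow edges the function already has.
fn total_edge_count(
    normal_edges: usize,
    throwing_sites: usize,
    handlers_in_scope: usize,
) -> usize {
    normal_edges + throwing_sites * handlers_in_scope
}
```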

<p>The other major option is to build a kind of <em>implicit</em> new control
flow into the IR’s semantics. For example, one could lower the
source-language semantics of a “try block” down to regions in the IR,
with one set of handlers attached.  This is clearly more efficient
than adding out-edges from (say) every callsite within the try-block
to every handler in scope. On the other hand, it’s hard to overstate
how invasive this change would be: <em>every</em> traversal
over IR, analyzing dataflow or reachability or any other property, has
to consider these new implicit edges anyway. In a large established
compiler like Cranelift, we can lean on Rust’s type system for a lot
of different kinds of refactors, but changing a fundamental invariant
goes beyond that: we would likely have a long tail of issues stemming
from such a change, and it would permanently increase the cognitive
overhead of making new changes to the compiler. In general we want to
trend toward a smaller, simpler core and compositional rather than
entangled complexity.</p>

<p>Thus, the choice is clear: in Cranelift we opted to introduce one new
instruction, <code class="language-plaintext highlighter-rouge">try_call</code>, that calls a function and catches (some)
exceptions.  In other words, there are now two possible kinds of
return paths: a normal return or (possibly one of many) exceptional
return(s). The handled exceptions and block targets are enumerated in
an <em>exception table</em>. Because there are control-flow edges stemming
from this opcode, it is a block terminator, like a conditional
branch. It looks something like (in Cranelift’s IR, CLIF):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>function %f0(i32) -&gt; i32, f32, f64 {
    sig0 = (i32) -&gt; f32 tail
    fn0 = %g(i32) -&gt; f32 tail

    block0(v1: i32):
        v2 = f64const 0x1.0
        ;; exception-catching callsite
        try_call fn0(v1), sig0, block1(ret0, v2), [ tag0: block2(exn0), default: block3(exn0) ]

    ;; normal return path
    block1(v3: f32, v4: f64):
        v5 = iconst.i32 1
        return v5, v3, v4

    ;; exception handler for tag0
    block2(v6: i64):
        v7 = ireduce.i32 v6
        v8 = iadd_imm.i32 v7, 1
        v9 = f32const 0x0.0        
        return v8, v9, v2

    ;; exception handler for all other exceptions
    block3(v10: i64):
        v11 = ireduce.i32 v10
        v12 = f32const 0x0.0
        v13 = f64const 0x0.0
        return v11, v12, v13
}
</code></pre></div></div>

<p>There are a few aspects to note here. First, why are we only concerned
with calls? What about other sources of exceptions? This is an
important invariant in the IR: exception <em>throws</em> are <em>only externally
sourced</em>. In other words, for any thrown exception, if we go deep
enough into the callstack, we will find that the throw was
implemented by calling out into the runtime.  The IR itself has no
other opcodes that throw! This turns out to be sufficient: (i) we only
need to build what Wasmtime needs, here, and (ii) we can implement
Wasm’s throw opcodes as “libcalls”, or calls into the Wasmtime
runtime. So, within Cranelift-compiled code, exception throws always
happen at callsites. We can thus get away with adding only one opcode,
<code class="language-plaintext highlighter-rouge">try_call</code>, and attach handler information directly to that opcode.</p>
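<p>One way to picture this invariant in ordinary Rust control flow (a loose analogy, not Wasmtime's actual unwinding mechanism): if <code class="language-plaintext highlighter-rouge">Result::Err</code> stands in for the unwinder, then every throw originates in a call into the runtime, and callsites are the only places with an exceptional return path.</p>

```rust
/// A caught exception: a tag identity plus the two payload values.
#[derive(Debug, PartialEq)]
struct Exception {
    tag: u32,
    payload: (u64, u64),
}

/// The "libcall": Wasm's throw opcode lowers to a call into the runtime,
/// which initiates the unwind (modeled here as an `Err` return).
fn throw_libcall(tag: u32, payload: (u64, u64)) -> Result<(), Exception> {
    Err(Exception { tag, payload })
}

/// A compiled function body: the only way it can "throw" is by calling out.
fn callee() -> Result<u64, Exception> {
    throw_libcall(0, (1, 2))?;
    Ok(42) // dynamically unreachable, like the constants in `$g` above
}

/// A `try_call`-like callsite: one normal return path, one handler path.
fn caller() -> u64 {
    match callee() {
        Ok(v) => v,                                        // normal-return edge
        Err(e) if e.tag == 0 => e.payload.0 + e.payload.1, // handler for tag 0
        Err(_) => unreachable!("no catch_all in this sketch"),
    }
}
```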

<p>The next characteristic of note is that handlers are ordinary basic
blocks.  This may not seem remarkable unless one has seen other
compiler IRs, such as LLVM’s, where exception handlers are definitely
special: they start with “landing pad” instructions, and cannot be
branched to as ordinary basic blocks. That might look something like:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>function %f() {
    block0:
        ;; Callsite defining a return value `v0`, with normal
        ;; return path to `block1` and exception handler `block2`.
        v0 = try_call ..., block1, [ tag0: block2 ]
        
    block1:
        ;; Normal return; use returned value.
        return v0
        
    block2 exn_handler: ;; Specially-marked block!
        ;; Exception handler payload value.
        v1 = exception_landing_pad
        ...
}
</code></pre></div></div>

<p>This bifurcation of kinds of blocks (normal and exception handler) is
undesirable from our point of view: just as exceptional edges add a
new cross-cutting concern that every analysis and transform needs to
consider, so would new kinds of blocks with restrictions. It was an
explicit design goal (and we have tests to show it!) that the same
block can be both an ordinary block and a handler block – not because
that would be common, necessarily (handlers usually do very different
things than normal code paths), but because it’s one less weird quirk
of the IR.</p>

<p>But then if handlers are normal blocks, the data flow question becomes
very interesting. An exception-catching call, unlike every other
opcode in our IR, has <em>conditionally-defined values</em>: that is, its
normal function return value(s) are available only if the callee
returns normally, and the <em>exception payload value(s)</em>, which are
passed in from the unwinder and carry information about the caught
exception, are available only if the callee throws an exception that
we catch. How can we ensure that these values are represented such
that they can only be used in valid ways? We can’t make them all
regular SSA definitions of the opcode: that would mean that all
successors (regular return and exceptional) get to use them, as in:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>function %f() {
    block0:
        ;; Callsite defining a return value `v0`, with normal return path
        ;; to `block1` and exception handler `block2`.
        v0 = try_call ..., block1, [ tag0: block2 ]
      
    block1:
        ;; Use `v0` legally: it is defined on normal return.
        return v0
      
    block2:
        ;; Oops! We use `v0` here, but the normal return value is undefined
        ;; when an exception is caught and control reaches this handler block.
        return v0
}
</code></pre></div></div>

<p>This is the reason that a compiler may choose to make handler blocks
special: by bifurcating the universe of blocks, one ensures that
normal-return and exceptional-return values are used only where
appropriate. Some compiler IRs reify exceptional return payloads via
“landing pad” instructions that must start handler blocks, just as
phis start regular blocks (in phi- rather than blockparam-based
SSA). But, again, this bifurcation is undesirable.</p>

<p>Our insight here, after <a href="https://github.com/bytecodealliance/rfcs/pull/36">a lot of
discussion</a>, was to
put the definitions where they belong: <em>on the edges</em>. That is,
regular returns are only defined once we know we’re following the
regular-return edge, and likewise for exception payloads. But we don’t
want to have special instructions that must be in the successor
blocks: that’s a weird distributed invariant and, again, likely to
lead to bugs when transforming IR. Instead, we leverage the fact that
we use <em>blockparam-based SSA</em> and we widen the domain of allowable
block-call arguments.</p>

<p>Whereas previously one might end a block like <code class="language-plaintext highlighter-rouge">brif v1, block2(v2,
v3), block3(v4, v5)</code>, i.e. with blockparams assigned values in the
chosen successor via a list of value-uses in the branch, we now allow
(i) SSA values, (ii) a special “normal return value” sentinel, or
(iii) a special “exceptional return value” sentinel. The latter two
are indexed because there can be more than one of each. So one can
write a block-call in a <code class="language-plaintext highlighter-rouge">try_call</code> as <code class="language-plaintext highlighter-rouge">block2(ret0, v1, ret1)</code>, which
passes the two return values of the call and a normal SSA value; or
<code class="language-plaintext highlighter-rouge">block3(exn0, exn1)</code>, which passes just the two exception payload
values.  We do have a new well-formedness check on the IR that ensures
that (i) normal returns are used only in the normal-return blockcall,
and exception payloads are used only in the handler-table blockcalls;
(ii) normal returns’ indices are bounded by the signature; and (iii)
exception payloads’ indices are bounded by the ABI’s number of
exception payload values; but all of these checks are local to the
instruction, not distributed across blocks. That’s nice, and conforms
with the way that all of our other instructions work, too. (Block-call
argument types are then checked against block-parameter types in the
successor block, but that happens the same as for any branch.) So we
have, repeating from above, a callsite like</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    block1:
        try_call fn0(v1), block2(ret0), [ tag0: block3(exn0, exn1) ]
</code></pre></div></div>

<p>with all of the desired properties: only one kind of block, explicit
control flow, and SSA values defined only where they are legal to use.</p>
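<p>A minimal Rust sketch of this widened block-call argument domain (the names are illustrative; Cranelift's actual types differ) shows how the well-formedness rules stay local to a single instruction:</p>

```rust
/// A block-call argument on a `try_call`: either an ordinary SSA value,
/// or one of the two kinds of indexed sentinels.
#[derive(Clone, Copy, Debug, PartialEq)]
enum BlockArg {
    Value(u32), // an ordinary SSA value number
    Ret(usize), // i-th normal return value of the call
    Exn(usize), // i-th exception payload value from the unwinder
}

/// Validate one block-call on a `try_call`: `Ret` sentinels are allowed
/// only on the normal-return edge and must be bounded by the signature;
/// `Exn` sentinels are allowed only on handler edges and must be bounded
/// by the ABI's number of exception payload values.
fn validate(
    args: &[BlockArg],
    is_handler_edge: bool,
    num_rets: usize,
    num_payloads: usize,
) -> bool {
    args.iter().all(|a| match *a {
        BlockArg::Value(_) => true,
        BlockArg::Ret(i) => !is_handler_edge && i < num_rets,
        BlockArg::Exn(i) => is_handler_edge && i < num_payloads,
    })
}
```

<p>Everything here is checked at the <code class="language-plaintext highlighter-rouge">try_call</code> itself; no invariant is distributed across successor blocks.</p>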

<p>All of this may seem somewhat obvious in hindsight, but as attested by
the above GitHub discussions and Cranelift weekly meeting minutes, it
was far from clear when we started how to design all of this to
maximize simplicity and generality and minimize quirks and
footguns. I’m pretty happy with our final design: it feels like a
natural extension of our core blockparam-SSA control flow graph, and I
managed to <a href="https://github.com/bytecodealliance/wasmtime/pull/10510">put it into the
compiler</a>
without too much trouble at all (well, <a href="https://github.com/bytecodealliance/wasmtime/pull/10502">a
few</a>
<a href="https://github.com/bytecodealliance/wasmtime/pull/10485">PRs</a> and
<a href="https://github.com/bytecodealliance/wasmtime/pull/10555">associated</a>
<a href="https://github.com/bytecodealliance/wasmtime/pull/10554">fixes</a> to
Cranelift
<a href="https://github.com/bytecodealliance/regalloc2/pull/214">and</a>
<a href="https://github.com/bytecodealliance/regalloc2/pull/220">regalloc2</a>
<a href="https://github.com/bytecodealliance/regalloc2/pull/216">functionality</a>
<a href="https://github.com/bytecodealliance/regalloc2/pull/224">and testing</a>;
and I’m sure I’ve missed a few).</p>

<h3 id="data-flow-and-abi">Data Flow and ABI</h3>

<p>So we have defined an IR that can express exception handlers – what
about the interaction between this function body and the unwinder? We
will need to define a different kind of semantics to nail down that
interface: in essence, it is a property of the <a href="https://en.wikipedia.org/wiki/Application_Binary_Interface">ABI (Application
Binary
Interface)</a>.</p>

<p>As mentioned above, existing exception-handling ABIs exist for native
code, such as compiled C++. While we are certainly willing to draw
inspiration from native ABIs and align with them as much as makes
sense, in Wasmtime we already define our own ABI<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">7</a></sup>, and so we are
not necessarily constrained by existing standards.</p>

<p>In particular, there is a very good reason not to follow them here: to
unwind to a particular exception handler, register state must be
restored as specified in the ABI, and the standard Itanium ABI
requires the usual callee-saved (“non-volatile”) registers on the
target ISA to be restored. But this requires (i) having the register
state at time of throw, and (ii) processing unwind metadata at each
stack frame as we walk up the stack, reading out values of saved
registers from stack frames. The latter is <a href="https://github.com/bytecodealliance/wasmtime/pull/2710">already
supported</a>
with a generic “unwind pseudoinstruction” framework I built four years
ago, but would still add complexity to our unwinder, and this
complexity would be load-bearing for correctness; and the former is
extremely difficult with Wasmtime’s normal runtime-entry
trampolines. So we instead choose to have a simpler exception ABI: all
<code class="language-plaintext highlighter-rouge">try_call</code>s, that is, callsites with handlers, clobber <em>all</em>
registers. This means that the compiler’s ordinary register-allocation
behavior will save all live values to the stack and restore them on
either a normal or exceptional return. We only have to restore the
stack (stack pointer and frame pointer registers) and redirect the
program counter (PC) to a handler.</p>

<p>The other aspect of the ABI that matters to the exception-throw
unwinder is exceptional payload. The native Itanium ABI specifies two
registers on most platforms (e.g.: <code class="language-plaintext highlighter-rouge">rax</code> and <code class="language-plaintext highlighter-rouge">rdx</code> on x86-64, or <code class="language-plaintext highlighter-rouge">x0</code>
and <code class="language-plaintext highlighter-rouge">x1</code> on aarch64) to carry runtime-defined payload; so for
simplicity, we adopt the same convention.</p>

<p>That’s all well and good; now how do we implement <code class="language-plaintext highlighter-rouge">try_call</code> with the
appropriate register-allocator behavior to conform to this? We already
have fairly complex ABI handling
(<a href="https://github.com/bytecodealliance/wasmtime/blob/main/cranelift/codegen/src/machinst/abi.rs">machine-independent</a>
<a href="https://github.com/bytecodealliance/wasmtime/blob/main/cranelift/codegen/src/isa/x64/abi.rs">and</a>
<a href="https://github.com/bytecodealliance/wasmtime/blob/main/cranelift/codegen/src/isa/aarch64/abi.rs">five</a>
<a href="https://github.com/bytecodealliance/wasmtime/blob/main/cranelift/codegen/src/isa/s390x/abi.rs">different</a>
<a href="https://github.com/bytecodealliance/wasmtime/blob/main/cranelift/codegen/src/isa/riscv64/abi.rs">architecture</a>
<a href="https://github.com/bytecodealliance/wasmtime/blob/main/cranelift/codegen/src/isa/pulley_shared/abi.rs">implementations</a>)
in Cranelift, but it follows a general pattern: we generate a single
instruction at the register-allocator level, and emit uses and defs
with fixed-register constraints. That is, we tell regalloc that
parameters must be in certain registers (e.g., <code class="language-plaintext highlighter-rouge">rdi</code>, <code class="language-plaintext highlighter-rouge">rsi</code>, <code class="language-plaintext highlighter-rouge">rdx</code>,
<code class="language-plaintext highlighter-rouge">rcx</code>, <code class="language-plaintext highlighter-rouge">r8</code>, <code class="language-plaintext highlighter-rouge">r9</code> on x86-64 System-V calling-convention platforms, or
<code class="language-plaintext highlighter-rouge">x0</code> up to <code class="language-plaintext highlighter-rouge">x7</code> on aarch64 platforms) and let it handle any necessary
moves. So in the simplest case, a call might look like (on aarch64),
with register-allocator uses/defs and constraints annotated:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>bl (call) v0 [def, fixed(x0)], v1 [use, fixed(x0)], v2 [use, fixed(x1)]
</code></pre></div></div>

<p>It is not always this simple, however: calls are not actually always a
single instruction, and this turned out to be quite problematic for
exception-handling support. In particular, when values are returned in
memory, as the ABI specifies they must be when there are more return
values than registers, we add (or added, prior to this work!) load
instructions <em>after</em> the call to load the extra results from their
locations on the stack. So a callsite might generate instructions like</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>bl v0 [def, fixed(x0)], ..., v7 [def, fixed(x7)] # first eight return values
ldr v8, [sp]     # ninth return value
ldr v9, [sp, #8] # tenth return value
</code></pre></div></div>

<p>and so on. This is problematic simply because we said that the
<code class="language-plaintext highlighter-rouge">try_call</code> was a terminator; and it is at the IR level, but no longer
at the regalloc level, and regalloc expects correctly-formed
control-flow graphs as well. So I had to <a href="https://github.com/bytecodealliance/wasmtime/pull/10502">do a
refactor</a> to
merge these return-value loads into a single regalloc-level
pseudoinstruction, and in turn this cascaded into a few regalloc fixes
(<a href="https://github.com/bytecodealliance/regalloc2/pull/226">allowing more than 256
operands</a> and
<a href="https://github.com/bytecodealliance/regalloc2/pull/214">more aggressively splitting live-ranges to allow worst-case
allocation</a>,
plus a <a href="https://github.com/bytecodealliance/regalloc2/pull/220">fix to the live range-splitting
fix</a> and a
<a href="https://github.com/bytecodealliance/regalloc2/pull/216">fuzzing
improvement</a>).</p>

<p>There is one final question that might arise when considering the
interaction of exception handling and register allocation in
Cranelift-compiled code. In Cranelift, we have an invariant that the
register allocator is allowed to insert <em>moves</em> between any two
instructions – register-to-register, or loads or stores to/from
spill-slots in the stack frame, or moves between different spill-slots
– and indeed it does this whenever there is more state than fits in
registers. It also needs to insert <em>edge moves</em> “between” blocks,
because when jumping to another spot in the code, we might need the
register values in a differently-assigned configuration. When we have
an unwinder that jumps to a different spot in the code to invoke a
handler, we need to ensure that all the proper moves have executed so
the state is as expected.</p>

<p>The answer here turns out to be a careful argument that we don’t need
to do anything at all. (That’s the best kind of solution to a problem,
but only if one is correct!) The crux of the argument has to do with
critical edges. A critical edge is one from a block with multiple
successors to one with multiple predecessors: for example, in the graph</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>   A    D
  / \  /
 B   C
</code></pre></div></div>

<p>where A can jump to B or C, and D can also jump to C, then A-to-C is a
critical edge. The problem with critical edges is that there is
nowhere to put code that has to run on the transition from A to C (it
can’t go in A, because we may go to B or C; and it can’t go in C,
because we may have come from A or D). So the register allocator
prohibits them, and we “split” them when generating code by inserting
empty blocks (<code class="language-plaintext highlighter-rouge">e</code> below) on them:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>   A    D
  / \   |
 |   e  |
 |   \ /
 B    C
</code></pre></div></div>
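<p>To make the splitting step concrete, here is a toy sketch of the transformation in Rust. It is an illustrative model only — the names are hypothetical and Cranelift’s real CFG machinery is quite different — but it shows the core rule: retarget each edge from a multi-successor block to a multi-predecessor block through a fresh empty block.</p>

```rust
use std::collections::HashMap;

// A toy CFG: block name -> ordered successor list. Hypothetical model;
// Cranelift's real CFG and block types are different.
type Cfg = HashMap<String, Vec<String>>;

// Count how many predecessors each block has.
fn pred_counts(cfg: &Cfg) -> HashMap<String, usize> {
    let mut preds: HashMap<String, usize> = HashMap::new();
    for succs in cfg.values() {
        for s in succs {
            *preds.entry(s.clone()).or_insert(0) += 1;
        }
    }
    preds
}

// Split every critical edge (multi-successor block -> multi-predecessor
// block) by routing it through a fresh empty block, which gives edge
// code (e.g., regalloc's edge moves) an unambiguous home.
fn split_critical_edges(cfg: &mut Cfg) -> Vec<String> {
    let preds = pred_counts(cfg);
    let mut inserted = Vec::new();
    let mut to_add: Vec<(String, String)> = Vec::new(); // (edge block, original target)
    for (block, succs) in cfg.iter_mut() {
        if succs.len() < 2 {
            continue; // an edge out of a single-successor block is never critical
        }
        for s in succs.iter_mut() {
            if preds.get(s.as_str()).copied().unwrap_or(0) >= 2 {
                let e = format!("{}_to_{}", block, s);
                to_add.push((e.clone(), s.clone()));
                inserted.push(e.clone());
                *s = e; // retarget the edge through the new empty block
            }
        }
    }
    for (e, target) in to_add {
        cfg.insert(e, vec![target]);
    }
    inserted
}

fn main() {
    // The example from the text: A -> {B, C}, D -> {C}; A-to-C is critical.
    let mut cfg: Cfg = HashMap::new();
    cfg.insert("A".into(), vec!["B".into(), "C".into()]);
    cfg.insert("D".into(), vec!["C".into()]);
    let inserted = split_critical_edges(&mut cfg);
    assert_eq!(inserted, vec!["A_to_C".to_string()]);
    assert_eq!(cfg["A"], vec!["B".to_string(), "A_to_C".to_string()]);
    assert_eq!(cfg["A_to_C"], vec!["C".to_string()]);
}
```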

<p>The key insight is that a <code class="language-plaintext highlighter-rouge">try_call</code> always has more than one
successor as long as it has a handler (because it must always have a
normal return-path successor too)<sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">8</a></sup>; and in this case, because we
split critical edges, the immediate successor block on the
exception-catch path has only one predecessor. So the register
allocator can always put its moves that have to run on catching an
exception in the successor (handler) block rather than the predecessor
block. Our rule for where to put edge moves prefers the successor
(block “after” the edge) unless it has multiple in-edges, so this was
already the case. The only thing we have to be careful about is to
record the address of the <em>inserted edge block</em>, if any (<code class="language-plaintext highlighter-rouge">e</code> above),
rather than the IR-level handler block (<code class="language-plaintext highlighter-rouge">C</code> above), in the handler
table.</p>

<p>And that’s pretty much it, as far as register allocation is concerned!</p>

<p>We’ve now covered the basics of Cranelift’s exception support. At this
point, having landed the compiler half but not the Wasmtime half, I
context-switched away for a bit, and in the meantime, bjorn3 picked
this support up right away as a means to add panic-unwinding support
to
<a href="https://github.com/rust-lang/rustc_codegen_cranelift"><code class="language-plaintext highlighter-rouge">rustc_codegen_cranelift</code></a>,
the Cranelift-based Rust compiler backend. With <a href="https://github.com/bytecodealliance/wasmtime/pull/10593">a few small
changes</a> they
contributed, and a <a href="https://github.com/bytecodealliance/wasmtime/pull/10709">followup edge-case
fix</a> and a
<a href="https://github.com/bytecodealliance/wasmtime/pull/10609">refactor</a>,
panic-unwinding support in <code class="language-plaintext highlighter-rouge">rustc_codegen_cranelift</code> was working. That
was very good intermediate validation that what I had built was usable
and relatively solid.</p>

<h2 id="exceptions-in-wasmtime">Exceptions in Wasmtime</h2>

<p>We have a compiler that supports exceptions; we understand Wasm
exception semantics; let’s build support into Wasmtime! How hard could
it be?</p>

<h3 id="challenge-1-garbage-collection-interactions">Challenge 1: Garbage Collection Interactions</h3>

<p>I started by sketching out the codegen for each of the three opcodes
(<code class="language-plaintext highlighter-rouge">try_table</code>, <code class="language-plaintext highlighter-rouge">throw</code>, and <code class="language-plaintext highlighter-rouge">throw_ref</code>). My mental model at the very
beginning of this work, having read but not fully internalized the
Wasm exception-handling proposal, was that I would be able to
implement a “basic” throw/catch first, and then somehow build the
<code class="language-plaintext highlighter-rouge">exnref</code> objects later. And I had figured I could build <code class="language-plaintext highlighter-rouge">exnref</code>s in a
(in hindsight) somewhat hacky way, by aggregating values together in a
kind of tuple and creating a table of such tuples indexed by exnrefs,
just as Wasmtime does for externrefs.</p>

<p>This understanding quickly gave way to a deeper one when I realized a
few things:</p>

<ul>
  <li>
    <p>Exception objects (exnrefs) can carry references to other GC objects
(that is, GC types can be part of the payload signature of an
exception), and GC objects can store exnrefs in fields. Hence,
exnrefs need to be traced, and can participate in GC cycles; this
either implies an additional collector on top of our GC collector
(ugh) or means that exception objects need to be on the GC heap
when GC is enabled.</p>
  </li>
  <li>
    <p>We’ll need a host API to introspect and build exception objects, and
we already have nice host APIs for GC objects.</p>
  </li>
</ul>

<p>There was a question <a href="https://github.com/bytecodealliance/wasmtime/pull/11230">in an extensively-discussed
PR</a> whether
we could build a cheap “subset” implementation that doesn’t mandate
the existence of a GC heap for storing exception objects. This would
be great in theory for guests that use exceptions for C-level
setjmp/longjmp but no other GC features.  However, it’s a little
tricky for a few reasons. First, this would require the subset to
exclude <code class="language-plaintext highlighter-rouge">throw_ref</code> (so we don’t have to invent another kind of
exception object storage). But it’s not great to subset the spec –
and <code class="language-plaintext highlighter-rouge">throw_ref</code> is not just for GC guest languages, but also for
rethrows. Second, more generally, this is additional maintenance and
testing surface that we’d rather not have for now. Instead, we expect
that we can make GC cheap enough, and its growth heuristic smart
enough, that a “frequent setjmp/longjmp” stress-test of exceptions (for
example) should live within a very small (e.g., few-kilobyte) GC heap,
essentially approximating the purpose-built storage. My colleague Nick
Fitzgerald (who built and is driving improvements to Wasmtime’s GC
support) wrote up <a href="https://github.com/bytecodealliance/wasmtime/issues/11256">a nice
issue</a>
describing the tradeoffs and ideas we have.</p>

<p>All of that said, we’ll only build one exception object implementation
– great! – but it will have to be a new kind of GC object. This
spawned a <a href="https://github.com/bytecodealliance/wasmtime/pull/11230">large
PR</a> to build
out exception objects first, prior to actual support for throwing and
catching them, with host APIs to allocate them and inspect their
fields. In essence, they are structs with immutable fields and with a
less-exposed type lattice and no subtyping.</p>

<h3 id="challenge-2-generative-tags-and-dynamic-identity">Challenge 2: Generative Tags and Dynamic Identity</h3>

<p>So there I was, implementing the <code class="language-plaintext highlighter-rouge">throw</code> instruction’s libcall
(runtime implementation), and finally getting to the heart of the
matter: the unwinder itself, which walks stack frames to find a
matching exception handler.  This is the final bit of functionality
that ties it all together. We’re almost there!</p>

<p>But wait: check out that <a href="https://webassembly.github.io/spec/core/exec/instructions.html#xref-syntax-instructions-syntax-instr-control-mathsf-throw-x">spec
language</a>.
We load the “tag address” from the store in step 9: we allocate the
exception instance <code class="language-plaintext highlighter-rouge">{tag z.tags[x], fields val^n}</code>. What is this
<code class="language-plaintext highlighter-rouge">tags</code> array on the store (<code class="language-plaintext highlighter-rouge">z</code>) in the runtime semantics? Tags have
dynamic identity, not static identity! (This is the part where I
learned about the thing I described
<a href="#dynamic-identity-and-compositionality">above</a>.)</p>

<p>This was a problem, because I had defined exception tables to
associate handlers with tags that were identified by integer (<code class="language-plaintext highlighter-rouge">u32</code>)
– like most other entities in Cranelift IR, I had figured this would
be sufficient to let Wasmtime define indices (say: index of the tag in
the module), and then we could compare static tag IDs.</p>

<p>Perhaps this is no problem: the static index defines the entity ID in
the module (defined or imported tag), and we can compare that and the
instance ID to see if a handler is a match. But how do we get the
instance ID from the stack frame?</p>

<p>It turns out that Wasmtime didn’t have a way, because nothing had
needed that yet. (This deficiency had been noticed before when
implementing Wasm coredumps, but there hadn’t been enough reason or
motivation to fix it then.) So I <a href="https://github.com/bytecodealliance/wasmtime/issues/11285">filed an
issue</a> with
a few ideas. We could add a new field in every frame storing the
instance pointer – and in fact this is a simple version of what at
least one other production Wasm implementation, in the SpiderMonkey
web engine,
<a href="https://searchfox.org/firefox-main/rev/643d732886fe0de4e2a3eee3c5ed9bd0d47c77cf/js/src/wasm/WasmFrame.h#112-115">does</a>
(though as described in that <code class="language-plaintext highlighter-rouge">[SMDOC]</code> comment, it only stores
instance pointers on transitions between frames of different
instances; this is enough for the unwinder when walking linearly up
the stack). But that would add overhead to <em>every</em> Wasm function (or
with SpiderMonkey’s approach, require adding trampolines between
instances, which would be a large change for Wasmtime), and exception
handling is still used somewhat rarely in practice.  Ideally we’d have
a “pay-as-you-go” scheme with as little extra complexity as possible.</p>

<p>Instead, I came up with an idea to <a href="https://github.com/bytecodealliance/wasmtime/pull/11321">add “dynamic context” items to
exception handler
lists</a>. The
idea is that we inject an SSA value into the list and it is stored in
a stack location that is given in the handler table metadata, so the
stack-walker can find it. To Cranelift, this is some arbitrary opaque
value; Wasmtime will use it to store the raw instance pointer
(<code class="language-plaintext highlighter-rouge">vmctx</code>) for use by the unwinder.</p>

<p>This filled out the design to a more general state nicely: it is
symmetric with exception payload, in the sense that the compiled code
can communicate context or state <em>to</em> the unwinder as it reads the
frames, and the unwinder in turn can communicate data <em>to</em> the
compiled code when it unwinds.</p>

<p>It turns out – though I didn’t intend this at all at the time – that
this also nicely solves the <em>inlining problem</em>. In brief, we want all
of our IR to be “local”, not treating the function boundary specially;
this way, IR can be composed by the inliner without anything
breaking. Storing some “current instance” state for the whole function
will, of course, break when we inline a function from one module
(hence instance) into another!</p>

<p>Instead, we can give a nice operational semantics to handler tables
with dynamic-context items: the unwinder should read left-to-right,
updating its “current dynamic context” at each dynamic-context item,
and checking for a tag match at tag-handler items. Then the inliner
can <em>compose</em> exception tables: when a <code class="language-plaintext highlighter-rouge">try_call</code> callsite inlines a
function body as its callee, and that body itself has any other
callsites, we attach a handler table that simply concatenates the
exception table items.</p>

<p>It’s important, here, to point out another surprising fact about Wasm
semantics: we <em>cannot do certain optimizations</em> to resolve handlers
statically or optimize the handler list, or at least not naively,
without global program analysis to understand where tags come
from. For example, if we see a handler for tag 0 then one for tag 1,
and we see a throw for tag 1 directly inside the <code class="language-plaintext highlighter-rouge">try_table</code>’s body, we
cannot necessarily resolve it: tag 0 and tag 1 could be the same tag!</p>

<p>Wait, how can that be? Well, consider <em>tag imports</em>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(module
  (import "test" "e0" (tag $e0))
  (import "test" "e1" (tag $e1))

  (func ...
        (try_table
                   (catch $e0 $b0)
                   (catch $e1 $b1)
                   (throw $e1)
                   (unreachable))))
</code></pre></div></div>

<p>We could instantiate this module with the same dynamic tag instance
for both imports, in which case the first handler (to block <code class="language-plaintext highlighter-rouge">$b0</code>)
matches; or with two separate tags, in which case the second handler (to block <code class="language-plaintext highlighter-rouge">$b1</code>) matches. The only way
to win the optimization game is not to play – we have to preserve the
original handler list.  Fortunately, that makes the compiler’s job
easier. We transcribe the <code class="language-plaintext highlighter-rouge">try_table</code>’s handlers directly to Cranelift
exception-handler tables, and those directly to metadata in the
compiled module, read in exactly that order by the unwinder’s
handler-matching logic.</p>
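<p>The scan-and-match semantics can be modeled in a few lines of Rust. This is a toy sketch with hypothetical types (the real handler tables are serialized compiler metadata); it shows the left-to-right reading with dynamic-context updates, and demonstrates why the two-import example above cannot be resolved statically: the same handler list matches differently depending on how the tags were instantiated.</p>

```rust
// A toy model of exception-handler table items and the unwinder's
// left-to-right scan. Hypothetical types, not Wasmtime's actual format.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct DynamicTag(usize); // dynamic identity: which runtime tag instance

enum Item {
    // Update the "current dynamic context" (for Wasmtime: an instance
    // pointer stored in a known stack slot).
    Context(usize),
    // Handler for a module-level tag index, resolved against the
    // current context; `label` stands in for the handler's code offset.
    Tag { tag_index: u32, label: &'static str },
    // Catch-all handler: matches any thrown tag.
    CatchAll { label: &'static str },
}

// Resolve a (context, tag index) pair to a dynamic tag identity,
// modeling each instance's table of imported/defined tags.
fn resolve(contexts: &[Vec<DynamicTag>], ctx: usize, tag_index: u32) -> DynamicTag {
    contexts[ctx][tag_index as usize]
}

// Scan left to right, updating the current context at each Context item
// and checking for a dynamic-identity match at each handler item.
fn find_handler(
    items: &[Item],
    contexts: &[Vec<DynamicTag>],
    mut ctx: usize,
    thrown: DynamicTag,
) -> Option<&'static str> {
    for item in items {
        match item {
            Item::Context(c) => ctx = *c,
            Item::Tag { tag_index, label } => {
                if resolve(contexts, ctx, *tag_index) == thrown {
                    return Some(*label);
                }
            }
            Item::CatchAll { label } => return Some(*label),
        }
    }
    None
}

fn main() {
    // Instance 0 imports tags 0 and 1. With both imports bound to the
    // *same* dynamic tag, the first handler wins; with distinct tags,
    // the second one matches instead — so handlers for "different"
    // static indices cannot be statically deduplicated or reordered.
    let aliased = vec![vec![DynamicTag(7), DynamicTag(7)]];
    let distinct = vec![vec![DynamicTag(7), DynamicTag(8)]];
    let items = [
        Item::Tag { tag_index: 0, label: "$b0" },
        Item::Tag { tag_index: 1, label: "$b1" },
    ];
    assert_eq!(find_handler(&items, &aliased, 0, DynamicTag(7)), Some("$b0"));
    assert_eq!(find_handler(&items, &distinct, 0, DynamicTag(8)), Some("$b1"));
}
```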

<h3 id="challenge-3-rooting">Challenge 3: Rooting</h3>

<p>Since exception objects are GC-managed objects, we have to ensure that
they are properly <em>rooted</em>: that is, any handles to these objects
outside of references inside other GC objects need to be known to the
GC so the objects remain alive (and so the references are updated in
the case of a moving GC).</p>

<p>Within a Wasm-to-Wasm exception throw scenario, this is fairly easy:
the references are rooted in the compiled code on either side of the
control-flow transfer, and the reference only briefly passes through
the unwinder. As long as we are careful to handle it with the
appropriate types, all will work fine.</p>

<p>Passing exceptions across the host/Wasm boundary is another matter,
though. We support the full matrix of {host, Wasm} x {host, Wasm}
exception catch/throw pairs: that is, exceptions can be thrown from
native host code called by Wasm (via a Wasm import), and exceptions
can be thrown out of Wasm code and returned as a kind of error to the
host code that invoked the Wasm. This works by boxing the exception
inside an <code class="language-plaintext highlighter-rouge">anyhow::Error</code> so we use Rust-style value-based error
propagation (via <code class="language-plaintext highlighter-rouge">Result</code> and the <code class="language-plaintext highlighter-rouge">?</code> operator) in host code.</p>

<p>What happens when we have a value inside the <code class="language-plaintext highlighter-rouge">Error</code> that holds an
exception object in the Wasmtime <code class="language-plaintext highlighter-rouge">Store</code>? How does Wasmtime know this
is rooted?</p>

<p>The answer in Wasmtime prior to recent work was to use one of two
kinds of external rooting wrappers: <code class="language-plaintext highlighter-rouge">Rooted</code> and
<code class="language-plaintext highlighter-rouge">ManuallyRooted</code>. Both wrappers hold an index into a table contained
inside the <code class="language-plaintext highlighter-rouge">Store</code>, and that table contains the actual GC
reference. This allows the GC to easily see the roots and update them.</p>

<p>The difference lies in the lifetime disciplines: <code class="language-plaintext highlighter-rouge">ManuallyRooted</code>
requires, as the name implies, manual unrooting; it has no <code class="language-plaintext highlighter-rouge">Drop</code>
implementation, and so easily creates leaks. <code class="language-plaintext highlighter-rouge">Rooted</code>, on the other
hand, has a LIFO (last-in first-out) discipline based on a <code class="language-plaintext highlighter-rouge">Scope</code>, an
RAII type created by the embedder (user) of Wasmtime. <code class="language-plaintext highlighter-rouge">Rooted</code> GC
references that escape that dynamic scope are unrooted, and will cause
an error (panic) at runtime if used. Neither of those behaviors is
ideal for a value type – an exception – that is <em>meant</em> to escape
scopes via <code class="language-plaintext highlighter-rouge">?</code>-propagation.</p>
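<p>As a rough model of that LIFO discipline (hypothetical types; the real <code class="language-plaintext highlighter-rouge">Rooted</code> API is tied to the <code class="language-plaintext highlighter-rouge">Store</code>, uses RAII for scopes, and tags handles so escaped ones fail loudly), a scope records the root table’s length on entry and truncates back to it on exit, unrooting everything created inside:</p>

```rust
// A toy model of scope-based (LIFO) GC rooting. Hypothetical types; the
// real Wasmtime API differs in detail.
struct RootTable {
    roots: Vec<u64>, // stand-in for actual GC references
}

#[derive(Clone, Copy, Debug)]
struct RootedHandle {
    index: usize, // index into the store's root table
}

impl RootTable {
    fn new() -> Self {
        RootTable { roots: Vec::new() }
    }
    // Enter a scope: remember how many roots exist right now.
    fn enter_scope(&self) -> usize {
        self.roots.len()
    }
    // Root a reference within the current scope.
    fn root(&mut self, gc_ref: u64) -> RootedHandle {
        self.roots.push(gc_ref);
        RootedHandle { index: self.roots.len() - 1 }
    }
    // Exit a scope: unroot everything created inside it, LIFO-style.
    fn exit_scope(&mut self, mark: usize) {
        self.roots.truncate(mark);
    }
    // A handle that escaped its scope no longer resolves; the real API
    // surfaces this as a runtime error rather than a silent None.
    fn get(&self, r: RootedHandle) -> Option<u64> {
        self.roots.get(r.index).copied()
    }
}

fn main() {
    let mut table = RootTable::new();
    let outer = table.root(100);
    let mark = table.enter_scope();
    let inner = table.root(200);
    assert_eq!(table.get(inner), Some(200));
    table.exit_scope(mark);
    // `inner` escaped its scope: it is no longer rooted.
    assert_eq!(table.get(inner), None);
    assert_eq!(table.get(outer), Some(100));
}
```

<p>This is exactly why the scheme fights <code class="language-plaintext highlighter-rouge">?</code>-propagation: an exception handle returned through an error path outlives every scope it was created in.</p>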

<p>The design that we landed on, instead, takes a different and much
simpler approach: the <code class="language-plaintext highlighter-rouge">Store</code> has a single, explicit root slot for the
“pending exception”, and host code can set this and then return a
<em>sentinel value</em> (<code class="language-plaintext highlighter-rouge">wasmtime::ThrownException</code>) in the <code class="language-plaintext highlighter-rouge">Result</code>’s error
type (boxed up into an <code class="language-plaintext highlighter-rouge">anyhow::Error</code>). This easily allows
propagation to work as expected, with no unbounded leaks (there is
only one pending exception that is rooted) and no unrooted propagating
exceptions (because no actual GC reference propagates, only the
sentinel).</p>
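<p>The shape of this design can be sketched with plain standard-library types. Everything below is a hypothetical stand-in except the name <code class="language-plaintext highlighter-rouge">ThrownException</code>: the store owns the single rooted slot, and only a zero-sized sentinel travels through the <code class="language-plaintext highlighter-rouge">Result</code>:</p>

```rust
// A toy model of the "pending exception" design: the store owns the one
// rooted exception; host code propagates only a sentinel via `?`.
// Hypothetical types modeled on the text, not Wasmtime's actual API.
struct Store {
    pending_exception: Option<String>, // stand-in for a rooted exnref
}

// Zero-sized sentinel carried in the error path; no GC reference is in
// it, so nothing can escape unrooted.
#[derive(Debug, PartialEq)]
struct ThrownException;

impl Store {
    fn throw(&mut self, exn: String) -> ThrownException {
        self.pending_exception = Some(exn); // root it in the store's slot
        ThrownException
    }
    fn take_pending_exception(&mut self) -> Option<String> {
        self.pending_exception.take()
    }
}

// A host function that throws: set the pending exception on the store,
// return the sentinel as the error value.
fn host_inner(store: &mut Store) -> Result<(), ThrownException> {
    Err(store.throw("division by zero".to_string()))
}

// Intermediate host code needs no special handling: the sentinel
// propagates like any Rust error.
fn host_outer(store: &mut Store) -> Result<(), ThrownException> {
    host_inner(store)?;
    Ok(())
}

fn main() {
    let mut store = Store { pending_exception: None };
    let result = host_outer(&mut store);
    assert_eq!(result, Err(ThrownException));
    // The caller (e.g., the unwinder entry point) retrieves the real
    // exception from the single rooted slot.
    assert_eq!(store.take_pending_exception().as_deref(), Some("division by zero"));
    assert_eq!(store.take_pending_exception(), None);
}
```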

<p>As a side-quest, while thinking through this rooting dilemma, I also
<a href="https://github.com/bytecodealliance/wasmtime/issues/11445">realized</a>
that it <em>should</em> be possible to create an “owned” rooted reference
that behaves more like a conventional owned Rust value (e.g. <code class="language-plaintext highlighter-rouge">Box</code>);
hence <a href="https://github.com/bytecodealliance/wasmtime/pull/11514"><code class="language-plaintext highlighter-rouge">OwnedRooted</code> was born to replace
<code class="language-plaintext highlighter-rouge">ManuallyRooted</code></a>.
This type works without requiring access to the <code class="language-plaintext highlighter-rouge">Store</code> to unroot when
dropped; the key idea is to hold a refcount to a separate tiny
allocation that is used as a “drop flag”, and then have the store
periodically scan these drop-flags and lazily remove roots, with a
thresholding algorithm to give that scanning amortized linear-time
behavior.<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">9</a></sup></p>
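<p>The drop-flag idea can be sketched using <code class="language-plaintext highlighter-rouge">Rc</code> strong counts as the refcounted flag allocation. This is a hypothetical model, not <code class="language-plaintext highlighter-rouge">OwnedRooted</code>’s actual internals: dropping the handle drops its <code class="language-plaintext highlighter-rouge">Rc</code>, and a periodic sweep notices the count falling to one and removes the root.</p>

```rust
use std::rc::Rc;

// A toy model of drop-flag-based rooting: each owned handle shares an
// Rc with its root table entry. When the handle is dropped, the entry's
// strong count falls to 1, and a periodic sweep removes the root.
// Hypothetical sketch; OwnedRooted's real internals differ.
struct Entry {
    gc_ref: u64,  // stand-in for the actual GC reference
    flag: Rc<()>, // the tiny shared "drop flag" allocation
}

struct OwnedHandle {
    _flag: Rc<()>, // dropping this is all the unrooting the user does
}

struct LazyRootTable {
    entries: Vec<Entry>,
    sweep_threshold: usize,
}

impl LazyRootTable {
    fn root(&mut self, gc_ref: u64) -> OwnedHandle {
        let flag = Rc::new(());
        let handle = OwnedHandle { _flag: Rc::clone(&flag) };
        self.entries.push(Entry { gc_ref, flag });
        // Sweep lazily, only once the table has grown past a threshold
        // set to twice the live count after the previous sweep; that
        // amortizes the linear scan to O(1) per root.
        if self.entries.len() >= self.sweep_threshold {
            self.sweep();
        }
        handle
    }
    fn sweep(&mut self) {
        // Keep entries whose handle is still alive (count > 1).
        self.entries.retain(|e| Rc::strong_count(&e.flag) > 1);
        self.sweep_threshold = (self.entries.len() * 2).max(8);
    }
    fn live_roots(&self) -> Vec<u64> {
        self.entries
            .iter()
            .filter(|e| Rc::strong_count(&e.flag) > 1)
            .map(|e| e.gc_ref)
            .collect()
    }
}

fn main() {
    let mut table = LazyRootTable { entries: Vec::new(), sweep_threshold: 8 };
    let keep = table.root(1);
    {
        let _temp = table.root(2); // dropped at the end of this block
    }
    table.sweep();
    assert_eq!(table.live_roots(), vec![1]);
    drop(keep);
    table.sweep();
    assert!(table.live_roots().is_empty());
}
```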

<p>Now that we have this, in theory, we could pass an
<code class="language-plaintext highlighter-rouge">OwnedRooted&lt;ExnRef&gt;</code> directly in the <code class="language-plaintext highlighter-rouge">Error</code> type to propagate
exceptions through host code; but the store-rooted approach is simple
enough, has a marginal performance advantage (no separate allocation),
and so I don’t see a strong need to change the API at the moment.</p>

<h3 id="life-of-an-exception-quick-walkthrough">Life of an Exception: Quick Walkthrough</h3>

<p>Now that we’ve discussed all the design choices, let’s walk through
the life of an exception throw/catch, from start to finish. Let’s assume
a Wasm-to-Wasm throw/catch for simplicity here.</p>

<ul>
  <li>First, the Wasm program is executing within a <code class="language-plaintext highlighter-rouge">try_table</code>, which
results in exception-handler <a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/cranelift/src/translate/code_translator.rs#L609-L613">catch blocks being
created</a>
for each handler case listed in the <code class="language-plaintext highlighter-rouge">try_table</code> instruction. The
<a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/cranelift/src/translate/code_translator.rs#L4325"><code class="language-plaintext highlighter-rouge">create_catch_block</code></a>
function generates code that invokes
<a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/cranelift/src/func_environ/gc/enabled.rs#L415"><code class="language-plaintext highlighter-rouge">translate_exn_unbox</code></a>,
which reads out all of the fields from the exception object and
pushes them onto the Wasm operand stack in the handler path. This
handler block is registered in the
<a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/cranelift/src/translate/stack.rs#L570"><code class="language-plaintext highlighter-rouge">HandlerState</code></a>,
which tracks the current lexical stack of handlers (and hands out
checkpoints so that when we pop out of a Wasm block-type operator,
we can pop the handlers off the state as well). These handlers are
<a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/cranelift/src/translate/stack.rs#L611">provided as an
iterator</a>
which is <a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/cranelift/src/translate/code_translator.rs#L661">passed to the <code class="language-plaintext highlighter-rouge">translate_call</code>
method</a>
and eventually ends up <a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/cranelift/src/func_environ.rs#L2366-L2379">creating an exception
table</a>
on a <code class="language-plaintext highlighter-rouge">try_call</code> instruction. This <code class="language-plaintext highlighter-rouge">try_call</code> will invoke whatever
Wasm code is about to throw the exception.</li>
  <li>Then, the Wasm program reaches a <code class="language-plaintext highlighter-rouge">throw</code> opcode, which is
<a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/cranelift/src/translate/code_translator.rs#L621">translated</a>
via
<a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/cranelift/src/func_environ.rs#L2825"><code class="language-plaintext highlighter-rouge">FuncEnvironment::translate_exn_throw</code></a>
to a <a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/cranelift/src/func_environ/gc/enabled.rs#L473-L482">three-operation
sequence</a>
that fetches the current instance ID (via a libcall into the
runtime), allocates a new exception object with that instance ID and
a fixed tag number and fills in its slots with the given values
popped from the Wasm operand stack, and delegates to <code class="language-plaintext highlighter-rouge">throw_ref</code>.</li>
  <li>The <code class="language-plaintext highlighter-rouge">throw_ref</code> opcode implementation then invokes the
<a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/cranelift/src/func_environ/gc/enabled.rs#L519"><code class="language-plaintext highlighter-rouge">throw_ref</code></a>
libcall.</li>
  <li>This libcall is deceptively simple: its
<a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/wasmtime/src/runtime/vm/libcalls.rs#L1695-L1707">implementation</a>
sets the pending exception on the store, and returns a sentinel that
signals a pending exception. That’s it!</li>
  <li>This works because the glue code for <em>all</em> libcalls processes errors
(via the
<a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/wasmtime/src/runtime/vm/traphandlers.rs#L152"><code class="language-plaintext highlighter-rouge">HostResult</code></a>
trait implementations) and eventually reaches <a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/wasmtime/src/runtime/vm/traphandlers.rs#L773-L786">this
case</a>
which sees a pending exception sentinel and invokes
<a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/wasmtime/src/runtime/vm/throw.rs#L15"><code class="language-plaintext highlighter-rouge">compute_handler</code></a>. Now
we’re getting to the heart of the exception-throw implementation.</li>
  <li><code class="language-plaintext highlighter-rouge">compute_handler</code> walks the stack with
<a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/unwinder/src/throw.rs#L45"><code class="language-plaintext highlighter-rouge">Handler::find</code></a>,
which itself is based on
<a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/unwinder/src/stackwalk.rs#L248"><code class="language-plaintext highlighter-rouge">visit_frames</code></a>,
which does about what one would expect for code with a frame-pointer
chain: it walks the singly-linked list of frames. At each frame, the
<a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/wasmtime/src/runtime/vm/throw.rs#L51">closure</a>
that <code class="language-plaintext highlighter-rouge">compute_handler</code> gave to <code class="language-plaintext highlighter-rouge">Handler::find</code> looks up the program
counter in that frame (which will be a return address, i.e., the
instruction after the call that created the next lower frame) using
<a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/wasmtime/src/runtime/module/registry.rs#L74"><code class="language-plaintext highlighter-rouge">lookup_module_by_pc</code></a>
to find a <code class="language-plaintext highlighter-rouge">Module</code>, which itself has an
<a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/unwinder/src/exception_table.rs#L225"><code class="language-plaintext highlighter-rouge">ExceptionTable</code></a>
(a parser for serialized metadata produced during compilation from
Cranelift metadata) that knows how to look up a PC within a
module. This will produce an <a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/unwinder/src/exception_table.rs#L310"><code class="language-plaintext highlighter-rouge">Iterator</code> over
handlers</a>
which we <a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/wasmtime/src/runtime/vm/throw.rs#L63">test in
order</a>
to see if any match. (The groups of exception-handler table items
that come out of Cranelift are post-processed
<a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/unwinder/src/exception_table.rs#L144">here</a>
to generate the tables that the above routines search.)</li>
  <li>If we find a handler, that is, if <a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/wasmtime/src/runtime/vm/throw.rs#L108-L109">the dynamic tag instance is the
same</a>
or <a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/wasmtime/src/runtime/vm/throw.rs#L67">we reach a catch-all
handler</a>,
then we have an exception handler! We return the PC and SP to
restore
<a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/wasmtime/src/runtime/vm/throw.rs#L112-L120">here</a>,
computing SP via an FP-to-SP offset (i.e., the size of the frame),
which is fixed and included in the exception tables when we
construct them.</li>
  <li>That action then becomes an <code class="language-plaintext highlighter-rouge">UnwindState::UnwindToWasm</code>
<a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/wasmtime/src/runtime/vm/traphandlers.rs#L779">here</a>.</li>
  <li>This <code class="language-plaintext highlighter-rouge">UnwindToWasm</code> state then triggers <a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/wasmtime/src/runtime/vm/traphandlers.rs#L913-L933">this
case</a>
in the <code class="language-plaintext highlighter-rouge">unwind</code> libcall, which is invoked whenever any libcall
returns an error code; that eventually calls the no-return function
<code class="language-plaintext highlighter-rouge">resume_to_exception_handler</code>, which is a little function written in
inline assembly that does exactly what it says on the tin. <a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/unwinder/src/arch/x86.rs#L32-L34">These
three
instructions</a>
set <code class="language-plaintext highlighter-rouge">rsp</code> and <code class="language-plaintext highlighter-rouge">rbp</code> to their new values, and jump to the new <code class="language-plaintext highlighter-rouge">rip</code>
(PC). The same stub exists for each of our four native-compilation
architectures (x86-64 above,
<a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/unwinder/src/arch/aarch64.rs#L60-L62">aarch64</a>,
<a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/unwinder/src/arch/riscv64.rs#L29-L31">riscv64</a>,
and
<a href="https://github.com/bytecodealliance/wasmtime/blob/8e22ff89f6affe4f79fcebdb416d0ab401d43c97/crates/unwinder/src/arch/s390x.rs#L32-L33">s390x</a><sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">10</a></sup>).
That transfers control to the catch-block created above, and the
Wasm continues running, unboxing the exception payload and running
the handler!</li>
</ul>
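<p>Condensed to its essence, the search in the middle of that walkthrough is a frame-pointer walk: follow the chain of frames, look up each frame’s return PC in the exception tables, and stop at the first matching handler, computing SP from the fixed FP-to-SP offset. A toy model with hypothetical types (the real code also handles module lookup, catch-all handlers, and dynamic context):</p>

```rust
use std::collections::HashMap;

// A toy model of the unwinder's handler search. Hypothetical types; the
// real walk follows raw frame pointers and parses serialized metadata.
struct Frame {
    fp: usize,        // frame pointer value (here: just an address-like ID)
    return_pc: usize, // return address into this frame's code
}

struct HandlerEntry {
    tag: u32,               // dynamic tag this handler matches
    handler_pc: usize,      // where to resume execution
    fp_to_sp_offset: usize, // fixed frame size: SP = FP - offset
}

// Per-call-site exception table: return PC -> handlers, in order.
type ExceptionTable = HashMap<usize, Vec<HandlerEntry>>;

// Walk frames youngest-first; at each, look up the PC and test handlers
// in order. Returns (resume PC, restored SP, restored FP) on a match.
fn compute_handler(
    frames: &[Frame],
    table: &ExceptionTable,
    thrown_tag: u32,
) -> Option<(usize, usize, usize)> {
    for frame in frames {
        if let Some(handlers) = table.get(&frame.return_pc) {
            for h in handlers {
                if h.tag == thrown_tag {
                    let sp = frame.fp - h.fp_to_sp_offset;
                    return Some((h.handler_pc, sp, frame.fp));
                }
            }
        }
    }
    None // no handler: the exception escapes to the host
}

fn main() {
    let mut table: ExceptionTable = HashMap::new();
    // The call site at return PC 0x40 has a handler for tag 3.
    table.insert(
        0x40,
        vec![HandlerEntry { tag: 3, handler_pc: 0x80, fp_to_sp_offset: 0x20 }],
    );
    let frames = [
        Frame { fp: 0x1000, return_pc: 0x99 }, // youngest frame: no handlers
        Frame { fp: 0x1040, return_pc: 0x40 }, // has a matching handler
    ];
    assert_eq!(compute_handler(&frames, &table, 3), Some((0x80, 0x1020, 0x1040)));
    assert_eq!(compute_handler(&frames, &table, 4), None);
}
```

<p>The returned triple corresponds to the <code class="language-plaintext highlighter-rouge">rip</code>/<code class="language-plaintext highlighter-rouge">rsp</code>/<code class="language-plaintext highlighter-rouge">rbp</code> values that the inline-assembly resume stub installs in the final step.</p>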

<h2 id="conclusion">Conclusion</h2>

<p>So we have Wasm exception handling now! For all of the interesting
design questions we had to work through, the end was pretty
anticlimactic. I landed <a href="https://github.com/bytecodealliance/wasmtime/pull/11326">the final
PR</a>, and
after a follow-up cleanup PR
(<a href="https://github.com/bytecodealliance/wasmtime/pull/11467">1</a>) and
some fuzzbug fixes
(<a href="https://github.com/bytecodealliance/wasmtime/pull/11500">1</a>
<a href="https://github.com/bytecodealliance/wasmtime/pull/11507">2</a>
<a href="https://github.com/bytecodealliance/wasmtime/pull/11530">3</a>
<a href="https://github.com/bytecodealliance/wasmtime/pull/11531">4</a>
<a href="https://github.com/bytecodealliance/wasmtime/pull/11535">5</a>
<a href="https://github.com/bytecodealliance/wasmtime/pull/11564">6</a>
<a href="https://github.com/bytecodealliance/wasmtime/pull/11554">7</a>) having
mostly to do with null-pointer handling and other edge cases in the
type system, plus one interaction with tail-calls (and a
separate/pre-existing <a href="https://github.com/bytecodealliance/wasmtime/pull/11689">s390x ABI
bug</a> that it
uncovered), it has been basically stable. We pretty quickly got a few
user reports:
<a href="https://bytecodealliance.zulipchat.com/#narrow/channel/217117-cranelift/topic/Running.20lua.205.2E1.20wasi/near/535368031">here</a>
it was reported as working for a Lua interpreter using setjmp/longjmp
inside Wasm based on exceptions, and
<a href="https://bytecodealliance.zulipchat.com/#narrow/channel/217126-wasmtime/topic/WebAssembly.20exceptions.20proposal.20is.20now.20implemented/near/536299383">here</a>
it enabled Kotlin-on-Wasm to run and <a href="https://bytecodealliance.zulipchat.com/#narrow/channel/217126-wasmtime/topic/WebAssembly.20exceptions.20proposal.20is.20now.20implemented/near/543748960">pass a large
testsuite</a>.
Not bad!</p>

<!--
  https://github.com/bytecodealliance/regalloc2/pull/220: +82 -84
  https://github.com/bytecodealliance/regalloc2/pull/223: +5 -6
  https://github.com/bytecodealliance/regalloc2/pull/224: +16 -19
  https://github.com/bytecodealliance/wasmtime/pull/10510: +4199 -423
  https://github.com/bytecodealliance/wasmtime/pull/10609: +109 -100
  https://github.com/bytecodealliance/wasmtime/pull/10709: +49 -5
  https://github.com/bytecodealliance/regalloc2/pull/226: +26 -10
  https://github.com/bytecodealliance/regalloc2/pull/221: +1 -1
  https://github.com/bytecodealliance/wasmtime/pull/10571: +36 -24
  https://github.com/bytecodealliance/regalloc2/pull/225: +1 -1
  https://github.com/bytecodealliance/wasmtime/pull/10590: +19 -5
  https://github.com/bytecodealliance/regalloc2/pull/227: +1 -1
  https://github.com/bytecodealliance/wasmtime/pull/10747: +84 -5
  https://github.com/bytecodealliance/wasmtime/pull/10748: +84 -5
  https://github.com/bytecodealliance/wasmtime/pull/10919: +1472 -347
  https://github.com/bytecodealliance/wasmtime/pull/11230: +2490 -191
  https://github.com/bytecodealliance/regalloc2/pull/231: +93 -12
  https://github.com/bytecodealliance/wasmtime/pull/11321: +1771 -322
  https://github.com/bytecodealliance/wasmtime/pull/11326: +2593 -523
  https://github.com/bytecodealliance/wasmtime/pull/11467: +269 -227
  https://github.com/bytecodealliance/wasmtime/pull/11500: +246 -116
  https://github.com/bytecodealliance/wasmtime/pull/11507: +15 -1
  https://github.com/bytecodealliance/wasmtime/pull/11511: +6 -2
  https://github.com/bytecodealliance/wasmtime/pull/11514: +758 -531
  https://github.com/bytecodealliance/wasmtime/pull/11530: +12 -2
  https://github.com/bytecodealliance/wasmtime/pull/11531: +1 -0
  https://github.com/bytecodealliance/wasmtime/pull/11533: +31 -20
  https://github.com/bytecodealliance/wasmtime/pull/11535: +31 -1
  https://github.com/bytecodealliance/wasmtime/pull/11554: +2 -0
  https://github.com/bytecodealliance/wasmtime/pull/11564: +29 -5
  https://github.com/bytecodealliance/wasmtime/pull/10485: +303 -203
  https://github.com/bytecodealliance/regalloc2/pull/212: +1 -2
  https://github.com/bytecodealliance/wasmtime/pull/10502: +1338 -767
  https://github.com/bytecodealliance/regalloc2/pull/214: +7 -3
  https://github.com/bytecodealliance/regalloc2/pull/216: +56 -8
  https://github.com/bytecodealliance/wasmtime/pull/10554: +15 -26
  https://github.com/bytecodealliance/wasmtime/pull/10555: +13 -6
-->

<p>All told, this took 37 PRs with a diff-stat of <code class="language-plaintext highlighter-rouge">+16264 -4004</code> (16KLoC
total) – certainly not the “small-to-medium-sized” project I had
initially optimistically expected, but I’m happy we were able to build
it out and get it to a stable state relatively easily. It was a
rewarding journey in a different way than a lot of my past work
(mostly on the Cranelift side) – where many of my past projects have
been really very open-ended design or even research questions, here we
had the high-level shape already and all of the work was in designing
high-quality details and working out all the interesting interactions
with the rest of the system. I’m happy with how clean the IR design
turned out in particular, and I don’t think it would have done so
without the really excellent continual discussion with the rest of the
Cranelift and Wasmtime contributors (thanks to Nick Fitzgerald and
Alex Crichton in particular here).</p>

<p>As an aside: I am happy to see how, aside from use-cases for Wasm
exception handling, the exception support in Cranelift itself has been
useful too.  As mentioned above, <code class="language-plaintext highlighter-rouge">cg_clif</code> picked it up almost as soon
as it was ready; but then, as an unexpected and pleasant surprise,
Alex subsequently <a href="https://github.com/bytecodealliance/wasmtime/pull/11592">rewrote Wasmtime’s trap
unwinding</a> to
use Cranelift exception handlers in our entry trampolines rather than
a setjmp/longjmp, as the latter have longstanding semantic
questions/issues in Rust. This took <a href="https://github.com/bytecodealliance/wasmtime/pull/11629">one more
intrinsic</a>,
which I implemented after discussing with Alex how best to expose
exception handler addresses to custom unwind logic without the full
exception unwinder, but was otherwise a pretty direct application of
<code class="language-plaintext highlighter-rouge">try_call</code> and our exception ABI.  General building blocks prove
generally useful, it seems!</p>

<hr />

<p><em>Thanks to Alex Crichton and Nick Fitzgerald for providing feedback on
a draft of this post!</em></p>
<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>To explain myself a bit, I underestimated the interactions of
  exception handling with garbage collection (GC); I hadn’t
  realized yet that <code class="language-plaintext highlighter-rouge">exnref</code>s were a full first-class value and
  would need to be supported including in the host API. Also, it
  turns out that exceptions can cross the host/guest boundary, and
  goodness knows that gets really fun really fast. I was <em>only</em>
  off by a factor of two on the compiler side at least! <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>From an implementation perspective, the dynamic, interprocedural
  nature of exceptions is what makes them far more interesting,
  and involved, than classical control flow such as conditionals,
  loops, or calls! This is why we need a mechanism that involves
  runtime data structures, “stack walks”, and lookup tables,
  rather than simply generating a jump to the right place: the
  target of an exception-throw can only be computed at runtime,
  and we need a convention to transfer control with “payload” to
  that location. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p>For those so inclined, this is a
  <a href="https://en.wikipedia.org/wiki/Monad_(functional_programming)"><em>monad</em></a>,
  and e.g. <a href="https://en.wikipedia.org/wiki/Haskell">Haskell</a>
  implements the ability to have “result or error” types that
  return from a sequence early via
  <a href="https://hackage.haskell.org/package/base-4.21.0.0/docs/Data-Either.html#t:Either"><code class="language-plaintext highlighter-rouge">Either</code></a>,
  explicitly describing the concept as such. The <code class="language-plaintext highlighter-rouge">?</code> operator
  serves as the “bind” of the monad: it connects an
  error-producing computation with a use of the non-error value,
  returning the error directly if one is given instead. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
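      <p>As a small, hypothetical Rust illustration of the same idea
  (the names are mine, not from the post): the <code class="language-plaintext highlighter-rouge">?</code> below either
  unwraps the <code class="language-plaintext highlighter-rouge">Ok</code> value or returns the <code class="language-plaintext highlighter-rouge">Err</code> to the caller early,
  acting as the monadic bind:</p>

      <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical illustration: `?` as the "bind" of the Result monad.
fn parse_and_double(s: &amp;str) -&gt; Result&lt;i32, std::num::ParseIntError&gt; {
    let n: i32 = s.parse()?; // on Err, return it to the caller immediately
    Ok(n * 2)
}

fn main() {
    assert_eq!(parse_and_double("21"), Ok(42));
    assert!(parse_and_double("not a number").is_err());
}
</code></pre></div></div>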
    </li>
    <li id="fn:4" role="doc-endnote">
      <p>So named for the <a href="https://en.wikipedia.org/wiki/IA-64">Intel Itanium
  (IA-64)</a>, an instruction-set
  architecture that happened to be the first ISA where this scheme was
  implemented for C++, and is now essentially dead (before its time! woefully
  misunderstood!) but for that legacy… <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:9" role="doc-endnote">
      <p>It’s worth briefly noting here that the Wasm exception handling
  proposal went through a somewhat twisty journey, with an earlier
  variant (now called “legacy exception handling”) that shipped in
  some browsers but was never standardized handling rethrows in a
  different way. In particular, that proposal did not offer
  first-class exception object references that could be rethrown;
  instead, it had an explicit <code class="language-plaintext highlighter-rouge">rethrow</code> instruction. I wasn’t
  around for the early debates about this design, but in my
  opinion, providing first-class exception object references that
  can be plumbed around via ordinary dataflow is far nicer. It
  also permits a simpler implementation, as long as one literally
  implements the semantics by always allocating an exception
  object.<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">11</a></sup> <a href="#fnref:9" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p>To be precise, because it may be a little surprising:
  <code class="language-plaintext highlighter-rouge">catch_ref</code> pushes both the payload values <em>and</em> the exception
  reference onto the operand stack at the handler destination. In
  essence, the rule is: tag-specific variants always unpack the
  payloads; and <em>also</em>, <code class="language-plaintext highlighter-rouge">_ref</code> variants always push the exception
  reference. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:10" role="doc-endnote">
      <p>In particular, we have defined our own ABI in Wasmtime to allow
   universal tail calls between any two signatures to work, as
   required by the Wasm tail-calling opcodes. This ABI, called
   “<code class="language-plaintext highlighter-rouge">tail</code>”, is based on the standard System V calling convention
   but differs in that the callee cleans up any stack arguments. <a href="#fnref:10" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:11" role="doc-endnote">
      <p>It’s not compiler hacking without excessive trouble from
   edge-cases, of course, so we had one <a href="https://github.com/bytecodealliance/wasmtime/pull/10709">interesting
   bug</a>
   from the <em>empty handler-list</em> case, which means we have to
   force edge-splitting anyway for all <code class="language-plaintext highlighter-rouge">try_call</code>s. <a href="#fnref:11" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p>Of course, while doing this, I managed to create
  <a href="https://github.com/bytecodealliance/wasmtime/security/advisories/GHSA-vvp9-h8p2-xwfc">CVE-2025-61670</a>
  in the C/C++ API by a combination of (i) a simple typo in the C
  FFI bindings (<code class="language-plaintext highlighter-rouge">as</code> vs. <code class="language-plaintext highlighter-rouge">from</code>, which is important when
  transferring ownership!) and (ii) not realizing that the C++
  wrapper does not properly maintain single ownership. We didn’t
  have ASAN tests, so I didn’t see this upfront; Alex discovered
  the issue while updating the Python bindings (which quickly
  found the leak) and managed the CVE. Sorry and thanks! <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:8" role="doc-endnote">
      <p>It turns out that even three lines of assembly are hard to get
  right: the s390x variant <a href="https://github.com/bytecodealliance/wasmtime/pull/10973">had a
  bug</a>
  where we got the register constraints wrong (GPR 0 is special on
  s390x, and a branch-to-register can only take GPR 1–15; we
  needed a different constraint to represent that) and had a
  miscompilation as a result. Thanks to our resident s390x
  compiler hacker Ulrich Weigand for tracking this down. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p>Of course, always boxing exceptions is not the only way to
  implement the proposal. It should be possible to “unbox”
  exceptions and skip the allocation, carrying payloads directly
  through some other engine state, if they are not caught as
  references. We haven’t implemented this optimization in Wasmtime
  and we expect the allocation performance for small exception
  objects to be adequate for most use-cases. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Chris Fallin</name></author><summary type="html"><![CDATA[This is a blog post outlining the odyssey I recently took to implement the Wasm exception-handling proposal in Wasmtime, the open-source WebAssembly engine for which I’m a core team member/maintainer, and its Cranelift compiler backend.]]></summary></entry><entry><title type="html">Wasmtime 35 Brings AArch64 Support in Winch</title><link href="https://bytecodealliance.org/articles/winch-aarch64-support" rel="alternate" type="text/html" title="Wasmtime 35 Brings AArch64 Support in Winch" /><published>2025-08-14T00:00:00+00:00</published><updated>2025-08-14T00:00:00+00:00</updated><id>https://bytecodealliance.org/articles/winch-aarch64-support</id><content type="html" xml:base="https://bytecodealliance.org/articles/winch-aarch64-support"><![CDATA[<p><a href="https://wasmtime.dev/">Wasmtime</a> is a fast, secure, standards
compliant and lightweight WebAssembly (Wasm) runtime.</p>

<p>As of Wasmtime 35, Winch <a href="https://docs.wasmtime.dev/stability-tiers.html#aarch64">supports AArch64 for Core
Wasm</a>
proposals, along with additional Wasm proposals like the <a href="https://component-model.bytecodealliance.org/">Component
Model</a> and <a href="https://github.com/WebAssembly/custom-page-sizes/blob/main/proposals/custom-page-sizes/Overview.md">Custom Page
Sizes</a>.
<!--end_excerpt--></p>

<p>Embedders can
<a href="https://docs.wasmtime.dev/api/wasmtime/struct.Config.html#method.strategy">configure</a>
Wasmtime to use either <a href="https://cranelift.dev/">Cranelift</a> or
<a href="https://github.com/bytecodealliance/wasmtime/tree/main/winch">Winch</a>
as the Wasm compiler depending on the use-case: Cranelift is an
optimizing compiler aiming to generate fast code. Winch is a
‘baseline’ compiler, aiming for fast compilation and low-latency
startup.</p>

<p>This blog post will cover the main changes needed to accommodate
support for AArch64 in Winch.</p>

<h2 id="quick-tour-of-winchs-architecture">Quick Tour of Winch’s Architecture</h2>

<p>To achieve its low-latency goal, Winch focuses on converting Wasm code
to assembly code for the target Instruction Set Architecture (ISA) as
quickly as possible. Unlike Cranelift, Winch’s architecture
intentionally avoids using an intermediate representation or complex
register allocation algorithms in its compilation process. For this
reason, baseline compilers are also referred to as single-pass
compilers.</p>

<p>Winch’s architecture can be broadly divided into two parts:
ISA-agnostic and ISA-specific.</p>

<p><img src="/articles/img/2025-07-16-winch-aarch64/compilation-process.png" alt="Winch's Architecture" /></p>

<p>Adding support for AArch64 to Winch involved adding a new
implementation of the <code class="language-plaintext highlighter-rouge">MacroAssembler</code> trait, which is ultimately in
charge of emitting AArch64 assembly. Winch’s ISA-agnostic components
remained unchanged and are shared with the existing x86_64
implementation.</p>

<p>Winch’s code generation context implements
<a href="https://crates.io/crates/wasmparser"><code class="language-plaintext highlighter-rouge">wasmparser</code></a>’s
<a href="https://docs.rs/wasmparser/0.235.0/wasmparser/trait.VisitOperator.html"><code class="language-plaintext highlighter-rouge">VisitOperator</code></a>
trait, which requires defining handlers for each Wasm opcode:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">visit_i32_const</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="k">Self</span><span class="p">::</span><span class="n">Output</span> <span class="p">{</span>
  <span class="c1">// Code generation starts here.</span>
<span class="p">}</span>
</code></pre></div></div>

<p>When an opcode handler is invoked, the Code Generation Context
prepares all the necessary values and registers, followed by the
machine code emission of the sequence of instructions to represent the
Wasm instruction in the target ISA.</p>
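<p>The single-pass flavor of this can be sketched with a toy visitor
(entirely hypothetical, not Winch’s actual code) that emits
AArch64-style assembly strings directly as it sees each opcode:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Toy single-pass "compiler": each opcode handler emits code
// immediately, with no intermediate representation (illustrative only).
enum Op { I32Const(i32), I32Add }

struct Emitter {
    asm: Vec&lt;String&gt;,
    depth: usize, // current operand-stack depth, mapped onto registers
}

impl Emitter {
    fn visit(&amp;mut self, op: &amp;Op) {
        match op {
            Op::I32Const(v) =&gt; {
                self.asm.push(format!("mov w{}, #{}", self.depth, v));
                self.depth += 1;
            }
            Op::I32Add =&gt; {
                self.depth -= 1;
                self.asm.push(format!(
                    "add w{}, w{}, w{}",
                    self.depth - 1, self.depth - 1, self.depth
                ));
            }
        }
    }
}

fn main() {
    let mut e = Emitter { asm: Vec::new(), depth: 0 };
    for op in [Op::I32Const(1), Op::I32Const(2), Op::I32Add] {
        e.visit(&amp;op);
    }
    assert_eq!(e.asm, ["mov w0, #1", "mov w1, #2", "add w0, w0, w1"]);
}
</code></pre></div></div>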

<p>Last but not least, the register allocator uses a simple
round-robin approach over the available ISA registers. When a
requested register is unavailable, all the current live values at the
current program point are saved to memory (known as value spilling),
thereby freeing the requested register for immediate use.</p>
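<p>The spill-everything strategy described above can be sketched as a
toy allocator (hypothetical; these are not Winch’s actual data
structures):</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Toy round-robin register allocator with whole-set spilling
// (illustrative only).
struct RegAlloc {
    free: Vec&lt;usize&gt;, // available register indices
    live: Vec&lt;usize&gt;, // registers currently holding live values
    spills: usize,    // how many values were pushed to memory
}

impl RegAlloc {
    fn request(&amp;mut self) -&gt; usize {
        if self.free.is_empty() {
            // No register available: spill every live value to the
            // stack, freeing all registers for immediate reuse.
            self.spills += self.live.len();
            self.free.extend(self.live.drain(..));
        }
        let r = self.free.remove(0); // simple round-robin order
        self.live.push(r);
        r
    }
}

fn main() {
    let mut ra = RegAlloc { free: vec![0, 1], live: Vec::new(), spills: 0 };
    assert_eq!(ra.request(), 0);
    assert_eq!(ra.request(), 1);
    // The third request triggers a spill of both live values.
    assert_eq!(ra.request(), 0);
    assert_eq!(ra.spills, 2);
}
</code></pre></div></div>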

<h2 id="emitting-aarch64-assembly">Emitting AArch64 Assembly</h2>

<h3 id="shadow-stack-pointer-ssp">Shadow Stack Pointer (SSP)</h3>

<p>AArch64 defines very specific restrictions with regards to the usage
of the stack pointer register (SP). Concretely, SP must be 16-byte
aligned whenever it is used to address stack memory. Given that
Winch’s register allocation algorithm requires value spilling at
arbitrary program points, it can be challenging to maintain such
alignment.</p>

<p>Note that this requirement applies only when SP is used to address
stack memory: SP may be unaligned otherwise, other registers may be
used for stack memory addressing, and those other registers are not
required to be 16-byte aligned. To avoid less efficient approaches,
such as overallocating memory to ensure alignment each time a value is
saved, Winch’s architecture employs a <em>shadow stack pointer</em> approach.</p>

<p>Winch’s shadow stack pointer approach defines <code class="language-plaintext highlighter-rouge">x28</code> as the base register
for stack memory addressing, enabling:</p>

<ul>
  <li>8-byte stack slots for live value spilling.</li>
  <li>8-byte aligned stack memory loads.</li>
</ul>

<h3 id="signal-handlers">Signal handlers</h3>

<p>Wasmtime can be
<a href="https://docs.wasmtime.dev/api/wasmtime/struct.Config.html#method.signals_based_traps">configured</a>
to leverage signals-based traps to detect exceptional situations in
Wasm programs e.g., an out-of-bounds memory access. Traps are
synchronous exceptions, and when they are raised, they are caught and
handled by code defined in Wasmtime’s runtime. These handlers are Rust
functions compiled to the target ISA, following the native calling
convention, which implies that whenever there is a transition from
Winch generated code to a signal handler, SP must be 16-byte
aligned. Note that even though Wasmtime can be configured to avoid
signals-based traps, Winch does not yet support that option.</p>

<p>Given that traps can happen at arbitrary program points, Winch’s
approach to ensure 16-byte alignment for SP is two-fold:</p>

<ul>
  <li>Emit a series of instructions that will
correctly align SP before each potentially-trapping Wasm instruction.
Note that this could result in overallocation of stack memory if SP is
not 16-byte aligned.</li>
  <li>Exclusively use SSP as the canonical stack pointer value, copying
the value of SSP to SP after each allocation/deallocation. This
maintains the SP &gt;= SSP invariant, which ensures that SP always
reflects an overapproximation of the consumed stack space, and it
allows the generated code to avoid an extra move instruction when
overallocation due to alignment happens, as described in the
previous point.</li>
</ul>
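<p>The overallocation in the first bullet amounts to rounding a
stack-space requirement up to the next 16-byte boundary; a minimal
sketch of that arithmetic (illustrative, not Winch’s actual code):</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Round a stack-space requirement up to a power-of-two alignment,
// e.g. AArch64's required 16-byte SP alignment.
fn align_up(bytes: u64, align: u64) -&gt; u64 {
    debug_assert!(align.is_power_of_two());
    (bytes + align - 1) &amp; !(align - 1)
}

fn main() {
    assert_eq!(align_up(0, 16), 0);
    assert_eq!(align_up(8, 16), 16);  // 8 bytes overallocated
    assert_eq!(align_up(16, 16), 16); // already aligned
    assert_eq!(align_up(24, 16), 32);
}
</code></pre></div></div>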

<p>It’s worth noting that the approach mentioned above doesn’t take into
account asynchronous exceptions, also known as interrupts. Further
testing and development is needed in order to ensure that Winch
generated code for AArch64 can correctly handle interrupts e.g.,
<code class="language-plaintext highlighter-rouge">SIGALRM</code>.</p>

<h3 id="immediate-value-handling">Immediate Value Handling</h3>

<p>To minimize register pressure and reduce the need for spilling values,
Winch’s instruction selection prioritizes emitting instructions that
support immediate operands whenever possible, such as <code class="language-plaintext highlighter-rouge">mov x0,
#imm</code>. However, due to the fixed-width instruction encoding in AArch64
(which always uses 32-bit instructions), encoding large immediate
values directly within a single instruction can sometimes be
impossible. In such cases, the immediate is first loaded into an
auxiliary register—often a “scratch” or temporary register—and then
used in subsequent instructions that require register operands.</p>

<p>Scratch registers offer the advantage that they are not tracked by the
register allocator, reducing the possibility of register allocator
induced spills. However, they should be used sparingly and only for
short-lived operations.</p>

<p>AArch64’s fixed 32-bit instruction encoding imposes stricter limits on
the size of immediate values that can be encoded directly, unlike
other ISAs supported by Winch, such as x86_64, which support
variable-length instructions and can encode larger immediates more
easily.</p>
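<p>To make the constraint concrete: AArch64’s <code class="language-plaintext highlighter-rouge">movz</code>, for instance,
can only materialize a 16-bit immediate placed at one of four
16-bit-aligned positions; anything else requires a multi-instruction
sequence through a register. A rough sketch of that check (a
simplified model of the encoding rule, not taken from Winch):</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// True if `v` is a 16-bit chunk shifted by 0, 16, 32, or 48 bits,
// i.e. encodable by a single movz instruction (simplified model).
fn fits_single_movz(v: u64) -&gt; bool {
    [0u32, 16, 32, 48]
        .iter()
        .any(|&amp;s| (v &gt;&gt; s) &lt;= 0xffff &amp;&amp; (v &gt;&gt; s) &lt;&lt; s == v)
}

fn main() {
    assert!(fits_single_movz(0x1234));           // low 16 bits
    assert!(fits_single_movz(0xabcd_0000_0000)); // 16 bits at bit 32
    assert!(!fits_single_movz(0x1_0001));        // spans two chunks
}
</code></pre></div></div>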

<p>Before supporting AArch64, Winch’s ISA-agnostic component assumed a
single scratch register per ISA. While this worked well for x86_64,
where most instructions can encode a broad range of immediates
directly, it proved problematic for AArch64: in instruction sequences
where the scratch register was already in use, an instruction that
needed to materialize an immediate could silently clobber it.</p>

<p>Consider the following snippet from Winch’s ISA-agnostic code for
computing a Wasm table element address:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 1. Load index into the scratch register.</span>
<span class="n">masm</span><span class="nf">.mov</span><span class="p">(</span><span class="n">scratch</span><span class="nf">.writable</span><span class="p">(),</span> <span class="n">index</span><span class="nf">.into</span><span class="p">(),</span> <span class="n">bound_size</span><span class="p">)</span><span class="o">?</span><span class="p">;</span> 
<span class="c1">// 2. Multiply with an immediate element size.</span>
<span class="n">masm</span><span class="nf">.mul</span><span class="p">(</span>
	<span class="n">scratch</span><span class="nf">.writable</span><span class="p">(),</span>
	<span class="n">scratch</span><span class="nf">.inner</span><span class="p">(),</span>
	<span class="nn">RegImm</span><span class="p">::</span><span class="nf">i32</span><span class="p">(</span><span class="n">table_data</span><span class="py">.element_size</span><span class="nf">.bytes</span><span class="p">()</span> <span class="k">as</span> <span class="nb">i32</span><span class="p">),</span>
	<span class="n">table_data</span><span class="py">.element_size</span><span class="p">,</span>
<span class="p">)</span><span class="o">?</span><span class="p">;</span>
<span class="n">masm</span><span class="nf">.load_ptr</span><span class="p">(</span>
	<span class="n">masm</span><span class="nf">.address_at_reg</span><span class="p">(</span><span class="n">base</span><span class="p">,</span> <span class="n">table_data</span><span class="py">.offset</span><span class="p">)</span><span class="o">?</span><span class="p">,</span>
	<span class="nd">writable!</span><span class="p">(</span><span class="n">base</span><span class="p">),</span>
<span class="p">)</span><span class="o">?</span><span class="p">;</span>
<span class="n">masm</span><span class="nf">.mov</span><span class="p">(</span><span class="nd">writable!</span><span class="p">(</span><span class="n">tmp</span><span class="p">),</span> <span class="n">base</span><span class="nf">.into</span><span class="p">(),</span> <span class="n">ptr_size</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
<span class="n">masm</span><span class="nf">.add</span><span class="p">(</span><span class="nd">writable!</span><span class="p">(</span><span class="n">base</span><span class="p">),</span> <span class="n">base</span><span class="p">,</span> <span class="n">scratch</span><span class="nf">.inner</span><span class="p">()</span><span class="nf">.into</span><span class="p">(),</span> <span class="n">ptr_size</span><span class="p">)</span>
</code></pre></div></div>

<p>In step 1, the code clobbers the designated scratch register. More
critically, if the immediate passed to <code class="language-plaintext highlighter-rouge">Masm::mul</code> cannot be encoded
directly in the AArch64 mul instruction, the <code class="language-plaintext highlighter-rouge">Masm::mul</code> implementation
will load the immediate into a register—clobbering the scratch
register again—and emit a register-based multiplication instruction.</p>

<p>One way to address this limitation is to avoid using a scratch
register for the index altogether and instead request a register from
the register allocator. This approach, however, increases register
pressure and potentially raises memory traffic, particularly in
architectures like x86_64.</p>

<p>Winch’s preferred solution is to introduce an explicit scratch register
allocator that provides a small pool of scratch registers (e.g., x16
and x17 in AArch64). By managing scratch registers explicitly, Winch
can safely allocate and use them without risking accidental
clobbering, especially when generating code for architectures with
stricter immediate encoding constraints.</p>
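<p>A hypothetical sketch of such a scratch-register pool (the names and
shape are illustrative, not Winch’s actual types): acquiring hands out
a distinct register and releasing returns it, so two overlapping uses
can no longer clobber each other:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Tiny scratch-register pool (e.g. x16/x17 on AArch64), illustrative only.
struct ScratchPool {
    free: Vec&lt;&amp;'static str&gt;,
}

impl ScratchPool {
    fn acquire(&amp;mut self) -&gt; &amp;'static str {
        self.free.pop().expect("scratch registers exhausted")
    }
    fn release(&amp;mut self, r: &amp;'static str) {
        self.free.push(r);
    }
}

fn main() {
    let mut pool = ScratchPool { free: vec!["x17", "x16"] };
    let a = pool.acquire(); // e.g. for the table index
    let b = pool.acquire(); // e.g. to materialize a large immediate
    assert_ne!(a, b);       // overlapping uses get distinct registers
    pool.release(b);
    pool.release(a);
}
</code></pre></div></div>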

<h2 id="whats-next">What’s Next</h2>

<p>Though it wasn’t a radical change, completing AArch64 support in
Winch marks a new stage for the compiler’s architecture, laying a
more robust foundation for future ISA additions.</p>

<p>Contributions are welcome! If you’re interested in contributing, you can:</p>

<ul>
  <li>Start by reading <a href="https://docs.wasmtime.dev/contributing.html">Wasmtime’s contributing documentation</a></li>
  <li>Check out <a href="https://github.com/orgs/bytecodealliance/projects/12/views/4">Winch’s project board</a></li>
</ul>

<h2 id="thats-a-wrap">That’s a wrap</h2>

<p>Thanks to everyone who <a href="https://github.com/bytecodealliance/wasmtime/issues/8321">contributed</a>
to the completeness of the AArch64 backend!
Thanks also to <a href="https://github.com/fitzgen">Nick Fitzgerald</a> and
<a href="https://github.com/cfallin">Chris Fallin</a> for their feedback on early
drafts of this article.</p>]]></content><author><name>Saúl Cabrera</name></author><summary type="html"><![CDATA[Wasmtime is a fast, secure, standards compliant and lightweight WebAssembly (Wasm) runtime. As of Wasmtime 35, Winch supports AArch64 for Core Wasm proposals, along with additional Wasm proposals like the Component Model and Custom Page Sizes.]]></summary></entry><entry><title type="html">Running WebAssembly (Wasm) Components From the Command Line</title><link href="https://bytecodealliance.org/articles/invoking-component-functions-in-wasmtime-cli" rel="alternate" type="text/html" title="Running WebAssembly (Wasm) Components From the Command Line" /><published>2025-05-21T00:00:00+00:00</published><updated>2025-05-21T00:00:00+00:00</updated><id>https://bytecodealliance.org/articles/invoking-component-functions-in-wasmtime-cli</id><content type="html" xml:base="https://bytecodealliance.org/articles/invoking-component-functions-in-wasmtime-cli"><![CDATA[<p>Wasmtime’s 33.0.0 release supports invoking Wasm component exports directly from the command line with the new <code class="language-plaintext highlighter-rouge">--invoke</code> flag. 
This article walks through building a Wasm component in Rust and using <code class="language-plaintext highlighter-rouge">wasmtime run --invoke</code> to execute specific functions (enabling powerful workflows for scripting, testing, and integrating Wasm into modern development pipelines).
<!--end_excerpt--></p>

<h2 id="the-evolution-of-wasmtimes-cli">The Evolution of Wasmtime’s CLI</h2>

<p>Wasmtime’s <code class="language-plaintext highlighter-rouge">run</code> subcommand has traditionally supported running Wasm modules as well as invoking that <strong>module</strong>’s exported function. However, with the evolution of the Wasm Component Model, this article focuses on a newer capability: creating a component that exports a function and then demonstrating how to invoke that <strong>component</strong>’s exported function.</p>

<p>By the end of this article, you’ll be ready to create Wasm components and orchestrate their exported component functions to improve your workflow’s efficiency and promote reuse. Potential examples include:</p>

<ul>
  <li>Shell Scripting: Embed Wasm logic directly into Bash or Python scripts for seamless automation.</li>
  <li>CI/CD Pipelines: Validate components in GitHub Actions, GitLab CI, or other automation tools without embedding them in host applications.</li>
  <li>Cross-Language Testing: Quickly verify that interfaces match across implementations in Rust, JavaScript, and Python.</li>
  <li>Debugging: Inspect exported functions during development with ease.</li>
  <li>Microservices: Chain components in serverless workflows, such as compress → encrypt → upload, leveraging Wasm’s modularity.</li>
</ul>
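<p>As a taste of the shell-scripting use case (this assumes a
<code class="language-plaintext highlighter-rouge">wasm_answer.wasm</code> component exporting an <code class="language-plaintext highlighter-rouge">answer</code> function, along the
lines of what we build below; the exact argument syntax and output
format may vary across Wasmtime versions), a script can call an export
directly:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>wasmtime run <span class="nt">--invoke</span> <span class="s1">'answer()'</span> wasm_answer.wasm
</code></pre></div></div>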

<h2 id="tooling--dependencies">Tooling &amp; Dependencies</h2>

<p>If you want to follow along, please install:</p>

<ul>
  <li><a href="https://www.rust-lang.org/tools/install">Rust</a> (if you already have Rust installed, make sure you are on <a href="https://github.com/rust-lang/rust/releases">the latest version</a>),</li>
  <li><a href="https://crates.io/crates/cargo"><code class="language-plaintext highlighter-rouge">cargo</code></a> (if already installed, please make sure you are on <a href="https://crates.io/crates/cargo">the latest version</a>),</li>
  <li><a href="https://crates.io/crates/cargo-component"><code class="language-plaintext highlighter-rouge">cargo component</code></a> (if already installed, please make sure you are on <a href="https://crates.io/crates/cargo-component">the latest version</a>), and</li>
  <li><a href="https://docs.wasmtime.dev/cli-install.html"><code class="language-plaintext highlighter-rouge">wasmtime</code> CLI</a> (or use a <a href="https://docs.wasmtime.dev/cli-install.html#download-precompiled-binaries">precompiled binary</a>). If already installed, ensure you are using <a href="https://github.com/bytecodealliance/wasmtime/releases">v33.0.0</a> or newer.</li>
</ul>

<p>You can check versions using the following commands:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>rustc <span class="nt">--version</span>
<span class="gp">$</span><span class="w"> </span>cargo <span class="nt">--version</span>
<span class="gp">$</span><span class="w"> </span>cargo component <span class="nt">--version</span>
<span class="gp">$</span><span class="w"> </span>wasmtime <span class="nt">--version</span>
</code></pre></div></div>

<p>We must explicitly <code class="language-plaintext highlighter-rouge">add</code> the <code class="language-plaintext highlighter-rouge">wasm32-wasip2</code> target. This ensures that our component adheres to WASI’s system interface for non-browser environments (e.g., file system access, sockets, randomness, etc.):</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>rustup target add wasm32-wasip2
</code></pre></div></div>

<h2 id="creating-a-new-wasm-component-with-rust">Creating a New Wasm Component With Rust</h2>

<p>Let’s start by creating a new Wasm library that we will later convert to a Wasm component using <code class="language-plaintext highlighter-rouge">cargo component</code> and the <code class="language-plaintext highlighter-rouge">wasm32-wasip2</code> target:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>cargo component new <span class="nt">--lib</span> wasm_answer
<span class="gp">$</span><span class="w"> </span><span class="nb">cd </span>wasm_answer
</code></pre></div></div>

<p>If you open the <code class="language-plaintext highlighter-rouge">Cargo.toml</code> file, you will notice that the <code class="language-plaintext highlighter-rouge">cargo component</code> command has automatically added some essential configuration: the <code class="language-plaintext highlighter-rouge">wit-bindgen-rt</code> dependency (with the <code class="language-plaintext highlighter-rouge">["bitflags"]</code> feature) under <code class="language-plaintext highlighter-rouge">[dependencies]</code>, and the <code class="language-plaintext highlighter-rouge">crate-type = ["cdylib"]</code> setting under the <code class="language-plaintext highlighter-rouge">[lib]</code> section.</p>

<p>Your <code class="language-plaintext highlighter-rouge">Cargo.toml</code> should now include these entries (as shown in the example below):</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[package]</span>
<span class="py">name</span> <span class="p">=</span> <span class="s">"wasm_answer"</span>
<span class="py">version</span> <span class="p">=</span> <span class="s">"0.1.0"</span>
<span class="py">edition</span> <span class="p">=</span> <span class="s">"2024"</span>

<span class="nn">[dependencies]</span>
<span class="nn">wit-bindgen-rt</span> <span class="o">=</span> <span class="p">{</span> <span class="py">version</span> <span class="p">=</span> <span class="s">"0.41.0"</span><span class="p">,</span> <span class="py">features</span> <span class="p">=</span> <span class="nn">["bitflags"]</span> <span class="p">}</span>

<span class="nn">[lib]</span>
<span class="py">crate-type</span> <span class="p">=</span> <span class="nn">["cdylib"]</span>

<span class="nn">[package.metadata.component]</span>
<span class="py">package</span> <span class="p">=</span> <span class="s">"component:wasm-answer"</span>

<span class="nn">[package.metadata.component.dependencies]</span>
</code></pre></div></div>

<p>The directory structure of the <code class="language-plaintext highlighter-rouge">wasm_answer</code> example is automatically scaffolded out for us by <code class="language-plaintext highlighter-rouge">cargo component</code>:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>tree wasm_answer
<span class="go">
wasm_answer
├── Cargo.lock
├── Cargo.toml
├── src
│   ├── bindings.rs
│   └── lib.rs
└── wit
    └── world.wit
</span></code></pre></div></div>

<h2 id="wit">WIT</h2>

<p>If we open the <code class="language-plaintext highlighter-rouge">wit/world.wit</code> file that <code class="language-plaintext highlighter-rouge">cargo component</code> created for us, we can see that it generates a minimal <code class="language-plaintext highlighter-rouge">world.wit</code> that exports a raw function:</p>

<pre><code class="language-wit">package component:wasm-answer;

/// An example world for the component to target.
world example {
    export hello-world: func() -&gt; string;
}
</code></pre>

<p>We can simply adjust the <code class="language-plaintext highlighter-rouge">export</code> line (as shown below):</p>

<pre><code class="language-wit">package component:wasm-answer;

/// An example world for the component to target.
world example {
    export get-answer: func() -&gt; u32;
}
</code></pre>

<blockquote>
  <p><strong>But, instead, let’s use an interface to export our function!</strong></p>
</blockquote>

<p>While the above approach works, the <a href="https://github.com/bytecodealliance/component-docs/blob/main/component-model/examples/tutorial/wit/adder/world.wit">recommended best practice</a> is to <strong>wrap related functions inside an interface, which you then export from your world</strong>. This is more modular, extensible, and aligns with how the Wasm Interface Type (WIT) format is used in multi-function or real-world components. Let’s update the <code class="language-plaintext highlighter-rouge">wit/world.wit</code> file as follows:</p>

<pre><code class="language-wit">package component:wasm-answer;

interface answer {
    get-answer: func() -&gt; u32;
}

world example {
    export answer;
}
</code></pre>

<p>Next, we update our <code class="language-plaintext highlighter-rouge">src/lib.rs</code> file accordingly, by pasting in the following Rust code:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[allow(warnings)]</span>
<span class="k">mod</span> <span class="n">bindings</span><span class="p">;</span>

<span class="k">use</span> <span class="nn">bindings</span><span class="p">::</span><span class="nn">exports</span><span class="p">::</span><span class="nn">component</span><span class="p">::</span><span class="nn">wasm_answer</span><span class="p">::</span><span class="nn">answer</span><span class="p">::</span><span class="n">Guest</span><span class="p">;</span>

<span class="k">struct</span> <span class="n">Component</span><span class="p">;</span>

<span class="k">impl</span> <span class="n">Guest</span> <span class="k">for</span> <span class="n">Component</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">get_answer</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">u32</span> <span class="p">{</span>
        <span class="mi">42</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="nn">bindings</span><span class="p">::</span><span class="nd">export!</span><span class="p">(</span><span class="n">Component</span> <span class="n">with_types_in</span> <span class="n">bindings</span><span class="p">);</span>
</code></pre></div></div>

<p>Now, let’s create the Wasm component with our exported <code class="language-plaintext highlighter-rouge">get_answer()</code> function:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>cargo component build <span class="nt">--target</span> wasm32-wasip2
</code></pre></div></div>

<p>Our newly generated <code class="language-plaintext highlighter-rouge">.wasm</code> file now lives at the following location:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>file target/wasm32-wasip2/debug/wasm_answer.wasm
<span class="go">target/wasm32-wasip2/debug/wasm_answer.wasm: WebAssembly (wasm) binary module version 0x1000d
</span></code></pre></div></div>

<p>We can also use the <code class="language-plaintext highlighter-rouge">--release</code> option which optimises builds for production:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>cargo component build <span class="nt">--target</span> wasm32-wasip2 <span class="nt">--release</span>
</code></pre></div></div>

<p>If we check the sizes of the <code class="language-plaintext highlighter-rouge">debug</code> and <code class="language-plaintext highlighter-rouge">release</code> builds, we see that they come to <code class="language-plaintext highlighter-rouge">2.1M</code> and <code class="language-plaintext highlighter-rouge">16K</code>, respectively.</p>

<p>Debug:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span><span class="nb">du</span> <span class="nt">-mh</span> target/wasm32-wasip2/debug/wasm_answer.wasm
<span class="go">2.1M	target/wasm32-wasip2/debug/wasm_answer.wasm
</span></code></pre></div></div>

<p>Release:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span><span class="nb">du</span> <span class="nt">-mh</span> target/wasm32-wasip2/release/wasm_answer.wasm
<span class="go">16K	target/wasm32-wasip2/release/wasm_answer.wasm
</span></code></pre></div></div>

<h2 id="how-invoke-works-a-practical-example">How Invoke Works: A Practical Example</h2>

<p>The <code class="language-plaintext highlighter-rouge">wasmtime run</code> command can take one positional argument and just run a <code class="language-plaintext highlighter-rouge">.wasm</code> or <code class="language-plaintext highlighter-rouge">.wat</code> file:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>wasmtime run foo.wasm
<span class="gp">$</span><span class="w"> </span>wasmtime run foo.wat
</code></pre></div></div>

<h3 id="invoke-wasm-modules">Invoke: Wasm Modules</h3>

<p>In the case of a Wasm <strong>module</strong> that exports a raw function directly, the <code class="language-plaintext highlighter-rouge">run</code> command accepts an optional <code class="language-plaintext highlighter-rouge">--invoke</code> argument, which is the name of an exported raw function (of the <strong>module</strong>) to run:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>wasmtime run <span class="nt">--invoke</span> initialize foo.wasm
</code></pre></div></div>

<h3 id="invoke-wasm-components">Invoke: Wasm Components</h3>

<p>In the case of a Wasm <strong>component</strong> that uses typed interfaces (defined <a href="https://component-model.bytecodealliance.org/design/wit.html">in WIT</a>, in concert with <a href="https://component-model.bytecodealliance.org/design/components.html">the Component Model</a>), the <code class="language-plaintext highlighter-rouge">run</code> command now also accepts the optional <code class="language-plaintext highlighter-rouge">--invoke</code> argument for calling an exported function of a <strong>component</strong>.</p>

<p>However, calling an exported function of a <strong>component</strong> uses <a href="https://github.com/bytecodealliance/wasm-tools/tree/a56e8d3d2a0b754e0465c668f8e4b68bad97590f/crates/wasm-wave#readme">WAVE</a> (a human-oriented text encoding of Wasm Component Model values). For example:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>wasmtime run <span class="nt">--invoke</span> <span class="s1">'initialize()'</span> foo.wasm
</code></pre></div></div>

<blockquote>
  <p>You will notice the different syntax of <code class="language-plaintext highlighter-rouge">initialize</code> versus <code class="language-plaintext highlighter-rouge">'initialize()'</code> when referring to a <strong>module</strong> versus a <strong>component</strong>, respectively.</p>
</blockquote>

<p>Back to our <code class="language-plaintext highlighter-rouge">get-answer()</code> example:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>wasmtime run <span class="nt">--invoke</span> <span class="s1">'get-answer()'</span> target/wasm32-wasip2/debug/wasm_answer.wasm
<span class="go">42
</span></code></pre></div></div>

<p>You will notice that the above <code class="language-plaintext highlighter-rouge">get-answer()</code> function call does not pass in any arguments. Let’s discuss how to represent the arguments passed into function calls in a structured way (using WAVE).</p>

<h4 id="wasm-value-encodingwave">Wasm Value Encoding (WAVE)</h4>

<p>Transferring and invoking complex argument data via the command line is challenging, especially with Wasm components that use diverse value types. To simplify this, Wasm Value Encoding (<a href="https://github.com/bytecodealliance/wasm-tools/blob/main/crates/wasm-wave/README.md">WAVE</a>) was introduced, offering a concise way to represent structured values directly on the command line.</p>

<p>WAVE provides a standard way to encode function calls and their results. It is a human-oriented text encoding of Wasm Component Model values, designed to be consistent with the <a href="https://github.com/WebAssembly/component-model/blob/main/design/mvp/WIT.md">WIT IDL format</a>.</p>
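<p>As a quick illustration (a sketch based on the WAVE documentation; the concrete values and field names below are arbitrary, and the annotations after <code class="language-plaintext highlighter-rouge">//</code> are explanatory rather than part of the encoding), WAVE renders common Component Model value types as follows:</p>

<pre><code>true, false           // bool
42, -7, 3.14          // integer and floating-point values
'a'                   // char (single-quoted)
"hello"               // string (double-quoted)
[1, 2, 3]             // list&lt;u32&gt;
(1, "two")            // tuple&lt;u32, string&gt;
{x: 1.0, y: 2.5}      // a record with fields x and y
some("value"), none   // option&lt;string&gt;
ok(42), err("oops")   // result&lt;u32, string&gt;
</code></pre>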

<p>Below are a few additional pointers for constructing your <code class="language-plaintext highlighter-rouge">wasmtime run --invoke</code> commands using WAVE.</p>

<h4 id="quotes">Quotes</h4>

<p>As shown above, the component’s exported function name and mandatory parentheses are contained in one set of single quotes, i.e., <code class="language-plaintext highlighter-rouge">'get-answer()'</code>:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>wasmtime run <span class="nt">--invoke</span> <span class="s1">'get-answer()'</span> target/wasm32-wasip2/release/wasm_answer.wasm
</code></pre></div></div>

<p>The result from our correctly typed command above is as follows:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="go">42
</span></code></pre></div></div>

<h4 id="parentheses">Parentheses</h4>

<p>Parentheses after the exported function’s name are mandatory. The presence of the parentheses <code class="language-plaintext highlighter-rouge">()</code> signifies function invocation, as opposed to the function name merely being referenced. If your function takes a string argument, ensure that you enclose the string in double quotes (inside the parentheses). For example:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>wasmtime run <span class="nt">--invoke</span> <span class="s1">'initialize("hello")'</span> foo.wasm
</code></pre></div></div>

<p>If your exported function takes more than one argument, ensure that each argument is separated by a comma <code class="language-plaintext highlighter-rouge">,</code> as shown below:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>wasmtime run <span class="nt">--invoke</span> <span class="s1">'initialize("Pi", 3.14)'</span> foo.wasm
<span class="gp">$</span><span class="w"> </span>wasmtime run <span class="nt">--invoke</span> <span class="s1">'add(1, 2)'</span> foo.wasm
</code></pre></div></div>

<h2 id="recap-wasm-modules-versus-wasm-components">Recap: Wasm Modules versus Wasm Components</h2>

<p>Let’s wrap this article up with a recap to crystallize your knowledge.</p>

<h3 id="earlier-wasmtime-run-support-for-modules">Earlier Wasmtime Run Support for Modules</h3>

<p>If we are not using the Component Model and just creating a module, we use a simple command like <code class="language-plaintext highlighter-rouge">wasmtime run foo.wasm</code> (<strong>without</strong> WAVE syntax). This approach typically applies to command modules, which export a <code class="language-plaintext highlighter-rouge">_start</code> function, or to reactor modules, which can optionally export the <code class="language-plaintext highlighter-rouge">wasi:cli/run</code> interface—standardized to enable consistent execution semantics.</p>

<p>Example of running a Wasm <strong>module</strong> that exports a raw function directly:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>wasmtime run <span class="nt">--invoke</span> initialize foo.wasm
</code></pre></div></div>
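<p>For reference, a minimal core module exporting such a raw function could be written in WebAssembly text format as follows (a sketch; the export name <code class="language-plaintext highlighter-rouge">initialize</code> and its return value are illustrative):</p>

<pre><code class="language-wat">(module
  ;; export a raw function named "initialize" that returns an i32
  (func (export "initialize") (result i32)
    i32.const 42))
</code></pre>

<p>Saved as <code class="language-plaintext highlighter-rouge">foo.wat</code>, this export can then be invoked with <code class="language-plaintext highlighter-rouge">wasmtime run --invoke initialize foo.wat</code>.</p>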

<h3 id="wasmtime-run-support-for-components">Wasmtime Run Support for Components</h3>

<p>As Wasm evolves with the Component Model, developers gain fine-grained control over component execution and composition. Components using WIT can now be run with <code class="language-plaintext highlighter-rouge">wasmtime run</code>, using the optional <code class="language-plaintext highlighter-rouge">--invoke</code> argument to call exported functions (<strong>with</strong> <a href="https://github.com/bytecodealliance/wasm-tools/tree/main/crates/wasm-wave">WAVE</a>).</p>

<p>Example of running a Wasm <strong>component</strong> that exports a function:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>wasmtime run <span class="nt">--invoke</span> <span class="s1">'add(1, 2)'</span> foo.wasm
</code></pre></div></div>

<p>For more information, visit the <a href="https://docs.wasmtime.dev/cli-options.html#run">cli-options section</a> of the Wasmtime documentation.</p>

<h2 id="benefits-and-usefulness">Benefits and Usefulness</h2>

<p>The addition of support for the run <code class="language-plaintext highlighter-rouge">--invoke</code> feature <a href="https://github.com/bytecodealliance/wasmtime/pull/10054">for components</a> allows users to specify and execute exported functions from a Wasm component, enabling greater flexibility for testing, debugging, and integration. The ability to execute arbitrary exported functions directly from the command line opens up a world of possibilities for integrating Wasm into modern development pipelines.</p>

<p><strong>This evolution from monolithic Wasm modules to composable, CLI-friendly components exemplifies the versatility and power of Wasm in real-world scenarios.</strong></p>]]></content><author><name>Tim McCallum</name></author><summary type="html"><![CDATA[Wasmtime’s 33.0.0 release supports invoking Wasm component exports directly from the command line with the new --invoke flag. This article walks through building a Wasm component in Rust and using wasmtime run --invoke to execute specific functions (enabling powerful workflows for scripting, testing, and integrating Wasm into modern development pipelines).]]></summary></entry><entry><title type="html">Wasmtime Becomes the First Bytecode Alliance Core Project</title><link href="https://bytecodealliance.org/articles/wasmtime-core-project" rel="alternate" type="text/html" title="Wasmtime Becomes the First Bytecode Alliance Core Project" /><published>2025-04-30T00:00:00+00:00</published><updated>2025-04-30T00:00:00+00:00</updated><id>https://bytecodealliance.org/articles/wasmtime-core-project</id><content type="html" xml:base="https://bytecodealliance.org/articles/wasmtime-core-project"><![CDATA[<p>The Bytecode Alliance is very happy to announce a significant milestone for both <a href="https://wasmtime.dev/">Wasmtime</a> and the Bytecode Alliance: Wasmtime has officially been promoted to become the BA’s first Core Project. As someone deeply involved in Wasmtime and the proposal process, I’m incredibly excited to share this news and what it signifies.</p>

<h2 id="defining-core-projects">Defining Core Projects</h2>

<p>Within the Bytecode Alliance, we’ve established two tiers for the projects under our umbrella: Hosted and Core. While all <a href="https://bytecodealliance.org/projects">projects in the BA</a>, Hosted and Core alike, are required to drive forward and align with our <a href="https://github.com/bytecodealliance/governance/blob/main/mission.md">mission</a> and <a href="https://github.com/bytecodealliance/governance/blob/main/operational-principles.md">operational principles</a>, Core Projects represent the flagships of the Alliance.</p>

<p>This distinction isn’t merely symbolic. Core Projects are held to even more rigorous standards concerning governance maturity, security practices, community health, and strategic alignment with the BA’s goals. You can find the detailed criteria in our <a href="https://github.com/bytecodealliance/governance/blob/main/TSC/core-and-hosted-projects.md">Core and Hosted Project Requirements</a>. In return for meeting these heightened expectations, Core Projects gain direct representation on the Bytecode Alliance Technical Steering Committee (TSC), playing a crucial role in guiding the technical evolution of the Alliance. Establishing this tier, and having Wasmtime be the first project to meet its requirements, is a vital step in maturing the BA’s governance structure.</p>

<h2 id="wasmtime-a-natural-fit-as-the-inaugural-core-project">Wasmtime: A Natural Fit as the Inaugural Core Project</h2>

<p>Wasmtime is a fast, scalable, highly secure, and embeddable WebAssembly runtime in wide use across many different environments.</p>

<p>From its inception, Wasmtime was designed to embody the core tenets of the Bytecode Alliance. Its focus on providing a fast, secure, and standards-compliant WebAssembly runtime aligns directly with the BA’s mission to create state-of-the-art foundations emphasizing security, efficiency, and modularity.</p>

<p>Wasmtime has been instrumental in turning the <a href="https://component-model.bytecodealliance.org/">Component Model</a> vision of fine-grained sandboxing and capabilities-based security – what we initially called “nanoprocesses” – into a practical reality. It has consistently served as a proving ground for cutting-edge standards work, particularly the Component Model and WASI, driving innovation while maintaining strict standards compliance. <a href="https://bytecodealliance.org/articles/security-and-correctness-in-wasmtime">Our commitment</a> to robust security practices, including extensive fuzzing and a rigorous security response process, is non-negotiable.</p>

<p>The journey to Core Project status involved formally documenting how Wasmtime meets these stringent requirements. You can find this documentation in our <a href="https://github.com/bytecodealliance/governance/blob/main/projects/core/wasmtime/proposal-core.md">proposal for Core Project status</a>, which provides evidence for the Wasmtime project’s mature governance, security posture, CI/CD processes, community health, and widespread production adoption. Based on this evidence and the TSC’s strong recommendation, the Board of Directors unanimously agreed that Wasmtime not only fulfills the criteria but is strategically vital to the Alliance’s success, making it the ideal candidate to become the first Core Project.</p>

<h2 id="re-joining-the-tsc">Re-Joining the TSC</h2>

<p>After the Core Project promotion, the <a href="https://github.com/orgs/bytecodealliance/teams/wasmtime-core">Wasmtime core team</a> has <a href="https://bytecodealliance.zulipchat.com/#narrow/channel/217126-wasmtime/topic/ANN.3A.20The.20Wasmtime.20core.20team.20has.20chosen.20a.20TSC.20representative!/with/505661632">appointed me</a> to represent the project on the TSC, so I re-joined the TSC in this new role.</p>

<h2 id="more-information">More Information</h2>

<p>You can find more information about Wasmtime in a number of places:</p>
<ul>
  <li><a href="https://github.com/bytecodealliance/wasmtime">GitHub Repository</a></li>
  <li><a href="https://wasmtime.dev/">Homepage</a></li>
  <li><a href="https://docs.wasmtime.dev/">Documentation</a></li>
</ul>

<p>And you can join the conversation in the Bytecode Alliance community’s <a href="https://bytecodealliance.zulipchat.com">chat platform</a>, which has a <a href="https://bytecodealliance.zulipchat.com/#narrow/channel/217126-wasmtime">dedicated channel for Wasmtime</a>.</p>]]></content><author><name>Till Schneidereit</name></author><summary type="html"><![CDATA[The Bytecode Alliance is very happy to announce a significant milestone for both Wasmtime and the Bytecode Alliance: Wasmtime has officially been promoted to become the BA’s first Core Project. As someone deeply involved in Wasmtime and the proposal process, I’m incredibly excited to share this news and what it signifies.]]></summary></entry><entry><title type="html">Wasmtime LTS Releases</title><link href="https://bytecodealliance.org/articles/wasmtime-lts" rel="alternate" type="text/html" title="Wasmtime LTS Releases" /><published>2025-04-22T00:00:00+00:00</published><updated>2025-04-22T00:00:00+00:00</updated><id>https://bytecodealliance.org/articles/wasmtime-lts</id><content type="html" xml:base="https://bytecodealliance.org/articles/wasmtime-lts"><![CDATA[<p><a href="https://wasmtime.dev/">Wasmtime</a> is a lightweight WebAssembly runtime built for
speed, security, and standards-compliance. Wasmtime now supports
long-term-support (LTS) releases that are maintained with security fixes for 2
years after their initial release.</p>

<!--end_excerpt-->

<p>The Wasmtime project releases a new version once a month with new features, bug
fixes, and performance improvements. Previously, though, these releases were only
supported for 2 months, meaning that embedders needed to follow the Wasmtime
project pretty closely to receive security updates. This rate of change can be
too fast for users so Wasmtime now supports <a href="https://github.com/bytecodealliance/rfcs/blob/main/accepted/wasmtime-lts.md">LTS
releases</a>.</p>

<p>Every 12th version of Wasmtime will now be considered an LTS release and will
receive security fixes for 2 years, or 24 months. This means that users can now
update Wasmtime once-a-year instead of once-a-month and be guaranteed that they
will always receive security updates. Wasmtime’s 24.0.0 release has been
retroactively classified as an LTS release and will be supported until August 20,
2026. Wasmtime’s upcoming 36.0.0 release on August 20, 2025 will be supported
until August 20, 2027, meaning that users will have one year starting in August
to upgrade from 24.0.0 to 36.0.0.</p>

<p>You can view a table of Wasmtime’s releases <a href="https://docs.wasmtime.dev/stability-release.html">in the documentation
book</a> which has information on
all currently supported releases, upcoming releases, and information about
previously supported releases. The high-level summary of Wasmtime’s LTS release
channel is:</p>

<ul>
  <li>LTS releases receive patch updates for 2 years after their initial release.</li>
  <li>Patch releases are guaranteed to preserve API compatibility.</li>
  <li>Patch releases strive to maintain tooling compatibility (e.g. the Rust version
required to compile Wasmtime) from the time of release. Depending on EOL dates
from components such as GitHub Actions images, however, this may need minor
updates.</li>
  <li>Patch releases are guaranteed to be issued for any security bug found in
historical releases of Wasmtime.</li>
  <li>Patch releases may be issued to fix non-security bugs as they are
discovered. The Wasmtime project will rely on contributions to provide
backports for these fixes.</li>
  <li>Patch releases will not be issued for new features to Wasmtime, even if a
contribution is made to backport a new feature.</li>
</ul>
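
<p>For Rust embedders, staying on an LTS line amounts to pinning the major version of the <code class="language-plaintext highlighter-rouge">wasmtime</code> crate in <code class="language-plaintext highlighter-rouge">Cargo.toml</code> (a minimal sketch):</p>

<pre><code class="language-toml">[dependencies]
# Stay on the 24.x LTS line; patch releases are picked up automatically
# by semver-compatible resolution.
wasmtime = "24"
</code></pre>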

<p>If you’re a current user of Wasmtime and would like to use an LTS release then
it’s recommended to either downgrade to the 24.0.0 version or wait for this
August to upgrade to the 36.0.0 version. Wasmtime 34.0.0, to be released June
20, 2025, will be supported up until the release of Wasmtime 36.0.0 on August
20, 2025.</p>]]></content><author><name>Alex Crichton</name></author><summary type="html"><![CDATA[Wasmtime is a lightweight WebAssembly runtime built for speed, security, and standards-compliance. Wasmtime now supports long-term-support (LTS) releases that are maintained with security fixes for 2 years after their initial release.]]></summary></entry><entry><title type="html">WAMR 2024: A Year in Review</title><link href="https://bytecodealliance.org/articles/wamr-2024-summary" rel="alternate" type="text/html" title="WAMR 2024: A Year in Review" /><published>2025-02-19T00:00:00+00:00</published><updated>2025-02-19T00:00:00+00:00</updated><id>https://bytecodealliance.org/articles/wamr-2024-summary</id><content type="html" xml:base="https://bytecodealliance.org/articles/wamr-2024-summary"><![CDATA[<p>In 2024, the WAMR community saw many thrilling advancements, including the development of new features, increased industrial use, and an improved experience for developers. Passionate developers and industry professionals have come together to enhance and expand WAMR in ways we couldn’t have imagined. From exciting new tools to a growing community, there’s a lot to be proud of. Let’s take a closer look at the key highlights of WAMR 2024, showcasing the community’s efforts, new features, and the establishment of the Embedded Special Interest Group (ESIG).
<!--end_excerpt--></p>

<h2 id="community-contributions-a-year-of-growth">Community Contributions: A Year of Growth</h2>

<p>The WAMR community has shown <a href="https://next.ossinsight.io/analyze/bytecodealliance?period=past_12_months&amp;repoIds=184654298#overview">incredible dedication and enthusiasm throughout 2024</a>. Here are some impressive numbers that highlight the community’s contributions:</p>

<ul>
  <li>707 New PRs: The community has been actively involved in enhancing WAMR, with 707 new PRs submitted this year.</li>
  <li>292 New Issues: Developers have identified and reported 292 new issues, helping to improve the stability and performance of WAMR.</li>
  <li>861 New Stars on GitHub: The project gained 861 new stars, reflecting its growing popularity and recognition.</li>
  <li>236 Active Participants: With 236 active participants, the community has been vibrant and engaged, driving WAMR forward with their collective efforts.</li>
</ul>

<p>Breaking down the contributions further:</p>

<ul>
  <li>Intel and Others: Nearly half of the PRs (43.85%) were created by Intel, while the remaining 56.15% were created by the community, including independent contributors and customers integrating WAMR into their products.</li>
  <li>Community Contributions: The major driving force within the community is company contributors, who provided approximately 85% of the PRs among those created by non-Intel contributors.</li>
</ul>

<p>The top three non-Intel organizational contributors have made significant impacts:</p>

<ul>
  <li>Midokura: Contributed 33.33% of organized PRs and helped review 35.01% of PRs.</li>
  <li>Amazon: Contributed 14.33% of organized PRs and helped review 19.90% of PRs.</li>
  <li>Xiaomi: Contributed 12.40% of organized PRs and helped review 15.11% of PRs.</li>
</ul>

<p>These contributions have been instrumental in driving WAMR forward, and we extend our heartfelt thanks to everyone involved.</p>

<h2 id="new-features-in-wamr-2024">New Features in WAMR 2024</h2>

<p>Several exciting new features have been added to WAMR in 2024, aimed at enhancing the development experience and expanding the capabilities of WAMR. Here are some of the key features:</p>

<h3 id="development-tools-simplifying-wasm-development">Development Tools: Simplifying Wasm Development</h3>

<p>One of the most exciting additions to WAMR in 2024 is the introduction of new development tools aimed at simplifying Wasm development. These tools include:</p>

<ul>
  <li>Linux perf for Wasm Functions: This tool allows developers to profile Wasm functions directly, providing insights into performance bottlenecks.</li>
  <li>AOT Debugging: Ahead-of-time (AOT) debugging support has been added, making it easier to debug Wasm applications.</li>
  <li>Call Stack Dumps: Enhanced call stack dumps provide detailed information about the execution flow, aiding in troubleshooting and optimization.</li>
</ul>

<p>Before these tools, developing a Wasm application or plugin using a host language was a complex task. Mapping Wasm functions back to the source code written in the host language required deep knowledge and was often cumbersome. Debugging information from the runtime and the host language felt like two foreign languages trying to communicate without a translator. These new development tools act as that much-needed translator, bridging the gap and making Wasm development more accessible and efficient.</p>

<h3 id="shared-heap-efficient-memory-sharing">Shared Heap: Efficient Memory Sharing</h3>

<p>Another significant feature introduced in 2024 is the shared heap. This feature addresses the challenge of sharing memory between the host and Wasm. Traditionally, copying data at the host-Wasm border was inefficient, and existing solutions like externref lacked flexibility and toolchain support.</p>

<p>The shared heap approach uses a pre-allocated region of linear memory as a “swap” area. Both the embedded system and Wasm can store and access shared objects here without the need for copying. However, this feature comes with its own set of challenges. Unlike memory grown via memory.grow(), this new region isn’t controlled by the Wasm module, which may not even be aware of it. This requires runtime APIs to map the host-provided memory area into linear memory, making it a runtime-level solution rather than a Wasm opcode.</p>

<p>It’s important to note that the shared heap is an experimental feature, and the intent is to work towards a standardized approach within the WebAssembly Community Group (CG). This will help set expectations for early adopters and ensure alignment with the broader Wasm ecosystem. As the feature evolves, feedback from the community will be crucial in shaping its development and eventual standardization.</p>

<h3 id="newly-implemented-features">Newly Implemented Features</h3>

<p>Several features have been finalized in 2024, further enhancing WAMR’s capabilities:</p>

<ul>
  <li>GC: Garbage collection features for the interpreter, LLVM-JIT, and AOT have been finalized.</li>
  <li>Legacy Exception Handling: Legacy exception handling for the interpreter has been added.</li>
  <li>WASI-NN: Support for WASI-NN with OpenVINO and llama.cpp backends has been introduced.</li>
  <li>WASI Preview1 Support: Ongoing support for WASI on ESP-IDF and Zephyr.</li>
  <li>Memory64: Table64 support (part of the Memory64 proposal) for the interpreter and AOT has been finalized.</li>
</ul>

<p>These new features and improvements are designed to make WAMR more powerful and easier to use, catering to the needs of developers and industry professionals alike.</p>

<h2 id="active-engagement-in-embedded-special-interest-group-esig">Active engagement in Embedded Special Interest Group (ESIG)</h2>

<p>In the embedded industry, the perspective on Wasm differs somewhat from the cloud-centric view that the current Wasm Community Group (CG) often focuses on. To address these unique requirements, the Embedded Special Interest Group (ESIG) was established in 2024. This group aims to find solutions that prioritize performance, footprint, and stability, tailored specifically for embedded devices.</p>

<p>The ESIG has already achieved several accomplishments this year, thanks to shared understanding and close collaboration with customers. By focusing on the unique needs of the embedded industry, ESIG is paving the way for more specialized and efficient Wasm solutions.</p>

<h2 id="industrial-adoption">Industrial adoption</h2>

<p>The adoption of WAMR in the industry has been remarkable, with several key players integrating WAMR into their systems to leverage its performance and flexibility. Here are some notable examples:</p>

<p>Alibaba’s Microservice Engine (MSE) has adopted WAMR as a Wasm runtime to execute Wasm plugins in their Higress gateway. This integration has resulted in an <a href="https://www.alibabacloud.com/blog/higresss-new-wasm-runtime-greatly-improves-performance_601025">impressive ~50% performance improvement</a>, showcasing the efficiency and robustness of WAMR in real-world applications.</p>

<p>WAMR has also been integrated into Runwasi as one of the Wasm runtimes to execute Wasm in containerd. This integration allows for seamless execution of Wasm modules within containerized environments, providing a versatile and efficient solution for running Wasm applications.</p>

<p>For more information on industrial adoptions and other use cases, please refer to <a href="https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/ADOPTERS.md">this link</a>.</p>

<p>These examples highlight the growing trust and reliance on WAMR in various industrial applications, demonstrating its capability to deliver significant performance enhancements and operational efficiencies.</p>

<h2 id="conclusion">Conclusion</h2>

<p>2024 has been a transformative year for WAMR, marked by significant community contributions, innovative features, and the establishment of the ESIG. As we look ahead, we are excited about the continued growth and evolution of WAMR, driven by the passion and dedication of our community. We invite you to join us on this journey, explore the new features, and contribute to the future of WebAssembly Micro Runtime.</p>

<p>Thank you for being a part of the WAMR community. Here’s to an even more exciting 2025!</p>]]></content><author><name>Liang He</name></author><summary type="html"><![CDATA[In 2024, the WAMR community saw many thrilling advancements, including the development of new features, increased industrial use, and an improved experience for developers. Passionate developers and industry professionals have come together to enhance and expand WAMR in ways we couldn’t have imagined. From exciting new tools to a growing community, there’s a lot to be proud of. Let’s take a closer look at the key highlights of WAMR 2024, showcasing the community’s efforts, new features, and the establishment of the Embedded Special Interest Group (ESIG).]]></summary></entry><entry><title type="html">Bytecode Alliance Election Results</title><link href="https://bytecodealliance.org/articles/election-results" rel="alternate" type="text/html" title="Bytecode Alliance Election Results" /><published>2025-01-14T00:00:00+00:00</published><updated>2025-01-14T00:00:00+00:00</updated><id>https://bytecodealliance.org/articles/election-results</id><content type="html" xml:base="https://bytecodealliance.org/articles/election-results"><![CDATA[<p>Each December the Bytecode Alliance conducts elections to fill important roles on our governing Board and Technical Steering Committee (TSC). I’m pleased to announce the results of our just-held December 2024 election, in which our Recognized Contributors (RCs) selected three Elected Delegates to the TSC and one At-Large Director to represent them on the Alliance Board.</p>

<!--end_excerpt-->

<h3 id="tsc-elected-delegates">TSC Elected Delegates</h3>

<p>The Bytecode Alliance Technical Steering Committee acts as the top-level governing body for projects and Special Interest Groups hosted by the Alliance, ensuring they further the Alliance’s mission and are conducted in accordance with our values and principles. The TSC also oversees the Bytecode Alliance Recognized Contributor program to encourage and engage individual contributors as participants in Alliance projects and groups. As defined in its <a href="https://github.com/bytecodealliance/governance/blob/main/TSC/charter.md#composition">charter</a>, the TSC is composed of representatives from each Alliance Core Project and individuals selected by Recognized Contributors.</p>

<p>Our new TSC Elected Delegates (and their GitHub IDs, as we know each other in our RC community) are:</p>

<ul>
  <li>Andrew Brown (@abrown)</li>
  <li>Bailey Hayes (@ricochet)</li>
  <li>Oscar Spencer (@ospencer)</li>
</ul>

<p>They will each serve a two-year term on the TSC.</p>

<h3 id="at-large-director">At-Large Director</h3>

<p>Our RCs are also represented by two At-Large Directors they select to serve on our Board (as described in our organization <a href="https://bytecodealliance.org/assets/bylaws.pdf">bylaws</a>), with overlapping two-year terms staggered to start each January. In this most recent election, the Recognized Contributors chose Bailey Hayes (@ricochet) as At-Large Director.</p>

<h3 id="congratulations">Congratulations!</h3>

<p>I look forward to working with each of our electees, and am happy to introduce them here as part of bringing them onboard in their new roles. You’ll find our full Board and TSC listed on the <a href="https://bytecodealliance.org/about">About</a> page of our website.</p>

<p>Thank you to all our Recognized Contributors for taking part in the election process and, more generally, for their ongoing support of Alliance projects and communities. I’d also like to thank our outgoing leadership for their outstanding work: Nick Fitzgerald (@fitzgen) as TSC Chair and Elected Delegate, and Till Schneidereit (@tschneidereit) as Elected Delegate and At-Large Director.</p>