<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Simon Marlow</title>
    <link href="https://simonmar.github.io/atom.xml" rel="self" />
    <link href="https://simonmar.github.io" />
    <id>https://simonmar.github.io/atom.xml</id>
    <author>
        <name>Simon Marlow</name>
        
        <email>marlowsd@gmail.com</email>
        
    </author>
    <updated>2025-06-11T00:00:00Z</updated>
    <entry>
    <title>Browsing Stackage with VS Code and Glean</title>
    <link href="https://simonmar.github.io/posts/2025-06-11-Glean-stackage-vscode.html" />
    <id>https://simonmar.github.io/posts/2025-06-11-Glean-stackage-vscode.html</id>
    <published>2025-06-11T00:00:00Z</published>
    <updated>2025-06-11T00:00:00Z</updated>
    <summary type="html"><![CDATA[<div class="post">
  <h1 class="post-title">Browsing Stackage with VS Code and Glean</h1>
  <span class="post-date">June 11, 2025</span>
  <p>Have you ever wished you could browse all the Haskell packages
together in your IDE, with full navigation using go-to-definition
and find-references? Here’s a demo of something I hacked together
while at <a href="https://zfoh.ch/zurihac2025/">ZuriHac 2025</a> over the weekend:</p>
<p><video src="/images/vscode-stackage-glean.webm" controls=""><a href="/images/vscode-stackage-glean.webm">Video</a></video></p>
<p>In the <a href="2025-05-22-Glean-Haskell.html">previous post</a> I talked about
how to index all of Hackage (actually Stackage, strictly speaking,
because it’s not in general possible to build all of Hackage together)
using <a href="https://glean.software">Glean</a>. Since that post I made some
more progress on the indexer:</p>
<ul>
<li><p>The indexer now <a href="https://github.com/facebookincubator/Glean/pull/532">indexes
types</a>. You can
see type-on-hover working in the demo. The types are similar to what
you see in the Haddock-generated hyperlinked source, except that
here it’s always using the type of the definition and not the type
at the usage site, which might be more specific. That’s a TODO for
later.</p></li>
<li><p>Fixed a bunch of things, enriched the index with details about
constructors, fields and class methods, and made indexing more
efficient.</p></li>
</ul>
<p>The DB size including types is now about <strong>850MB</strong>, and it takes
<strong>just under 8 minutes</strong> on my 9-year-old laptop to index the nearly
3000 packages in my Stackage LTS-21.21 snapshot. (Note: the figures
here were updated on 12 June 2025 when I redid the measurements.)</p>
<h2 id="hooking-it-up-to-vs-code">Hooking it up to VS Code</h2>
<p>The architecture looks like this:</p>
<p><img src="/images/vscode-glean-arch.svg" /></p>
<p>The LSP server is a modified version of
<a href="https://github.com/josephsumabat/static-ls">static-ls</a>, which is
already designed to provide an LSP service based on static
information. I just reimplemented a few of its handlers to make calls
to Glass instead of the existing hie/hiedb implementations. You can
see the changes on <a href="https://github.com/simonmar/static-ls/commits/glean/">my fork of
static-ls</a>. Of
course, these changes are still quite hacky and not suitable for
upstreaming.</p>
<p><a href="https://github.com/facebookincubator/Glean/tree/main/glean/glass">Glass</a>
is a “Language-agnostic Symbol Server”. Essentially it provides an API
abstraction over Glean with operations that are useful for code
navigation and search.</p>
<h2 id="where-to-next">Where to next?</h2>
<p>There remain a few issues to solve before this can be useful.</p>
<ul>
<li><p><strong>Make Glean more easily installable.</strong> There’s a general consensus that
<code>cabal install glean</code> would lower the barrier to entry
significantly; in order to do this we need to build the folly
dependency using Cabal.</p></li>
<li><p><strong>Clean up and ship the LSP server, somehow.</strong> Once Glean is
cabal-installable, we can depend on it from an LSP server package.</p></li>
<li><p><strong>Think about continuous integration to build the Glean
DB</strong>. Perhaps this can piggyback off the stackage CI infra? If we
can already build a complete stackage snapshot, and Glean is
easily installable, then indexing would be fairly
straightforward. I’d love to hear suggestions on how best to do
this.</p></li>
</ul>
<p>And looking forwards a bit further:</p>
<ul>
<li><p><strong>Think about how to handle multiple package versions.</strong> There’s no
fundamental problem with indexing multiple package versions, except
that Glass’s SymbolID format currently doesn’t include the package
version, but that’s easily fixable. We could, for example, build
multiple Stackage LTS snapshots and index them all in a single Glean
DB. There would be advantages to doing this: if two Stackage snapshots
had packages in common, for instance, the Glean DB would contain only a
single copy of each shared package. A lot of the type structure would
be shared too.</p></li>
<li><p><strong>Provide search functionality in the LSP.</strong> Glean can provide
simple textual search for names, and with some work could also
provide Hoogle-like type search.</p></li>
<li><p><strong>Think about how to index local projects and local changes</strong>. Glean
supports <em>stacked</em> and
<a href="https://glean.software/blog/incremental/"><em>incremental</em></a> DBs, so we
could build a DB for a local project stacked on top of the full
Stackage DB. You would be able to go-to-definition directly from
a file in your project to the packages it depends on in
Stackage. We could re-index new <code>.hie</code> files as they are
generated, rather like how static-ls currently handles changes.</p></li>
<li><p><strong>Integrate with HLS?</strong> Perhaps Glean could be used to handle
references outside of the current project, switching seamlessly
from GHC-based navigation to Glean-based navigation if you jump
into a non-local package.</p></li>
</ul>
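<p>To make the stacking idea a little more concrete, here is a toy model in Haskell. It is not Glean’s API or implementation: it assumes a DB is just a map from a hypothetical <code>(module, name)</code> symbol to a definition location, and that a stacked DB can see its own facts plus all the facts of the DB underneath it. A lookup from a local project can then resolve a Stackage symbol without the project DB holding a copy of it.</p>

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Hypothetical symbol and location types for this sketch.
type Symbol = (String, String)      -- (module, name)
type Location = (FilePath, Int)     -- (file, line)

-- A toy DB: either a base DB, or a DB stacked on top of another one.
data DB
  = Base (Map Symbol Location)
  | Stacked (Map Symbol Location) DB

-- A stacked DB sees its own facts and, through the stack, every fact of
-- the DBs underneath it. Assuming each symbol is defined in only one
-- layer, the order in which layers are searched does not matter.
lookupDef :: Symbol -> DB -> Maybe Location
lookupDef sym (Base facts) = Map.lookup sym facts
lookupDef sym (Stacked facts below) =
  case Map.lookup sym facts of
    Just loc -> Just loc
    Nothing  -> lookupDef sym below

-- Hypothetical example data: a "Stackage" base and a local project on top.
stackageDB :: DB
stackageDB = Base (Map.fromList
  [ (("Data.Aeson", "encode"), ("aeson-2.1.2.1/src/Data/Aeson.hs", 181)) ])

projectDB :: DB
projectDB = Stacked (Map.fromList
  [ (("MyApp.Main", "run"), ("app/Main.hs", 10)) ]) stackageDB
```

<p>In GHCi, <code>lookupDef ("Data.Aeson", "encode") projectDB</code> returns the Stackage location, while <code>stackageDB</code> on its own cannot see the project’s symbols.</p>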
<h2 id="more-use-cases">More use cases?</h2>
<p>I talked with a few people at ZuriHac about potential use cases for
Glean within the Haskell ecosystem. Using it on <code>haskell.org</code> came up
a few times, as a way to power search, navigation and analysis. Also
mentioned was the possibility of using it as a Hoogle
backend. Potentially we could replace the Haddock-generated
hyperlinked sources on <code>haskell.org</code> with a Glean-based browser, which
would allow navigating links between packages and find-references.</p>
<p>Another use case that came up was the possibility of doing impact
analysis for core library changes (or any API changes, really). Some of
this is already possible using find-references, but more complex cases,
such as finding instances that override certain methods, won’t be
possible until we extend the indexer to capture richer
information.</p>
<p>If you’re interested in using Glean for something, why not jump on the
<a href="https://discord.com/channels/280033776820813825/505370075402862594/808027763868827659">Glean discord server</a> and tell us about it!</p>
</div>
]]></summary>
</entry>
<entry>
    <title>Indexing Hackage: Glean vs. hiedb</title>
    <link href="https://simonmar.github.io/posts/2025-05-22-Glean-Haskell.html" />
    <id>https://simonmar.github.io/posts/2025-05-22-Glean-Haskell.html</id>
    <published>2025-05-22T00:00:00Z</published>
    <updated>2025-05-22T00:00:00Z</updated>
    <summary type="html"><![CDATA[<div class="post">
  <h1 class="post-title">Indexing Hackage: Glean vs. hiedb</h1>
  <span class="post-date">May 22, 2025</span>
  <p>I thought it might be fun to try to use Glean to index as much of
Hackage as I could, and then do some rough comparisons against <a
href="https://github.com/wz1000/HieDb">hiedb</a> and also play around to see what interesting queries
we could run against a database of all the code in Hackage.</p>
<p>This project was mostly just for fun: Glean is not going to replace
<code>hiedb</code> any time soon, for reasons that will become clear. Neither are
we ready (yet) to build an HLS plugin that can use Glean, but
hopefully this at least demonstrates that such a thing should be
possible, and Glean might offer some advantages over <code>hiedb</code> in
performance and flexibility.</p>
<p>A bit of background:</p>
<ul>
<li><p><a href="https://glean.software">Glean</a> is a code-indexing system
that we developed at Meta. It’s used internally at Meta for a wide
range of use cases, including code browsing, documentation
generation and code analysis. You can read about the ways in which
Glean is used at Meta in <a
href="https://engineering.fb.com/2024/12/19/developer-tools/glean-open-source-code-indexing/">Indexing
Code At Scale with Glean</a>.</p></li>
<li><p><a href="https://github.com/wz1000/HieDb">hiedb</a> is a code-indexing system for Haskell. It takes
the <code>.hie</code> files that GHC produces when given the option
<code>-fwrite-ide-info</code> and writes the information to a SQLite database
in various tables. The idea is that putting the information in a DB
allows certain operations that an IDE needs to do, such as
go-to-definition, to be fast.</p></li>
</ul>
<p>You can think of Glean as a general-purpose system that does the same
job as <code>hiedb</code>, but for multiple languages and with a more flexible
data model. The open-source version of Glean comes with indexers for
<a href="https://glean.software/docs/indexer/intro/">ten languages or
so</a>, and moreover Glean supports <a
href="https://sourcegraph.com/blog/announcing-scip">SCIP</a> which has
indexers for various languages available from Sourcegraph.</p>
<p>Since a <code>hiedb</code> is just a SQLite DB with a few tables, if you want you
can query it directly using SQL. However, most users will access the
data through either the command-line <code>hiedb</code> tool or through the API,
which provide the higher-level operations such as go-to-definition and
find-references. Glean has a similar setup: you can make raw queries
in Glean’s query language (<a
href="https://glean.software/docs/angle/intro/">Angle</a>) via the
<a href="https://glean.software/docs/shell/">Glean shell</a> or the <a href="https://glean.software/docs/cli/">command-line tool</a>, while the higher-level
operations that know about symbols and references are provided by a
separate system called <a href="https://github.com/facebookincubator/Glean/tree/main/glean/glass">Glass</a> which also has a command-line tool and
API. In Glean the raw data is language-specific, while the Glass
interface provides a language-agnostic view of the data in a way
that’s useful for tools that need to navigate or search code.</p>
<h2 id="an-ulterior-motive">An ulterior motive</h2>
<p>In part all of this was an excuse to rewrite Glean’s Haskell
indexer. We built a Haskell indexer a while ago but it’s pretty
limited in what information it stores, only capturing enough
information to do go-to-definition and find-references and only for a
subset of identifiers. Furthermore the old indexer works by first
producing a <code>hiedb</code> and consuming that, which is both unnecessary and
limits the information we can collect. By processing the <code>.hie</code> files
directly we have access to richer information, and we don’t have the
intermediate step of creating the <code>hiedb</code> which can be slow.</p>
<h2 id="the-rest-of-this-post">The rest of this post</h2>
<p>The rest of the post is organised as follows; feel free to jump
around:</p>
<ul>
<li><p><a href="#performance">Performance</a>: a few results comparing <code>hiedb</code> with Glean on an
index of all of Hackage</p></li>
<li><p><a href="#what-other-queries-can-we-do-with-glean">Queries</a>: A couple of examples of queries we can do with
a Glean index of Hackage: searching by name, and finding dead code.</p></li>
<li><p><a href="#apparatus">Apparatus</a>: more details on how I set
everything up and how it all works.</p></li>
<li><p><a href="#whats-next">What’s next</a>: some thoughts on what we still need to add to
the indexer.</p></li>
</ul>
<h1 id="performance">Performance</h1>
<p>All of this was performed on a build of 2900+ packages from Hackage;
for more details see <a href="#building-all-of-hackage">Building all of Hackage</a>
below.</p>
<h2 id="indexing-performance">Indexing performance</h2>
<p>I used this hiedb command:</p>
<pre><code>hiedb index -D /tmp/hiedb . --skip-types</code></pre>
<p>I’m using <code>--skip-types</code> because at the time of writing I haven’t
implemented type indexing in Glean’s Haskell indexer, so this should
hopefully give a more realistic comparison.</p>
<p>This was the Glean command:</p>
<pre><code>glean --service localhost:1234 \
  index haskell-hie --db stackage/0 \
  --hie-indexer $(cabal list-bin hie-indexer) \
  ~/code/stackage/dist-newstyle/build/x86_64-linux/ghc-9.4.7 \
  --src &#39;$PACKAGE&#39;</code></pre>
<p>Time to index:</p>
<ul>
<li>hiedb: 1021s</li>
<li>Glean: 470s</li>
</ul>
<p>I should note that in the case of Glean the only parallelism is
between the indexer and the server that is writing to the DB. We
didn’t try to index multiple <code>.hie</code> files in parallel, although that
would be fairly trivial to do. I suspect <code>hiedb</code> is also
single-threaded just going by the CPU load during indexing.</p>
<h2 id="size-of-the-resulting-db">Size of the resulting DB</h2>
<ul>
<li>hiedb: 5.2GB</li>
<li>Glean: 0.8GB</li>
</ul>
<p>It’s quite possible that hiedb is simply storing more information, but
Glean does have a rather efficient storage system based on RocksDB.</p>
<h2 id="performance-of-find-references">Performance of find-references</h2>
<p>Let’s look up all the references of <code>Data.Aeson.encode</code>:</p>
<pre><code>hiedb -D /tmp/hiedb name-refs encode Data.Aeson</code></pre>
<p>This is the query using Glass:</p>
<pre><code>cabal run glass-democlient -- --service localhost:12345 \
  references stackage/hs/aeson/Data/Aeson/var/encode</code></pre>
<p>This is the raw query using Glean:</p>
<pre><code>glean --service localhost:1234 query --db stackage/0 \
  &#39;{ Refs.file, Refs.uses[..] } where Refs : hs.NameRefs; Refs.target.occ.name = &quot;encode&quot;; Refs.target.mod.name = &quot;Data.Aeson&quot;&#39;</code></pre>
<ul>
<li><code>hiedb</code>: 2.3s</li>
<li><code>glean</code> (via Glass): 0.39s</li>
<li><code>glean</code> (raw query): 0.03s</li>
</ul>
<p>(side note: <code>hiedb</code> found 416 references while Glean found 415. I
haven’t yet checked where this discrepancy comes from.)</p>
<p>But these results don’t really tell the whole story.</p>
<p>In the case of <code>hiedb</code>, <code>name-refs</code> does a full table scan, so it’s
going to take time proportional to the number of refs in the DB. Glean,
meanwhile, has indexed the references by name, so it can serve this
query very efficiently. The actual query takes a few milliseconds; the
main overhead is encoding and decoding the results.</p>
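<p>The difference between the two access paths can be illustrated with a small Haskell model. This is not hiedb’s or Glean’s actual code: a hypothetical <code>Ref</code> record stands in for a row of the refs table, <code>refsByScan</code> filters every row the way an unindexed SQL query must, and <code>buildIndex</code> groups the rows by their target <code>(module, name)</code> once, up front, so that each later lookup is a single map lookup.</p>

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Hypothetical stand-in for one row of a refs table.
data Ref = Ref
  { refModule :: String    -- module defining the target, e.g. "Data.Aeson"
  , refName   :: String    -- occurrence name, e.g. "encode"
  , refFile   :: FilePath  -- file containing the reference
  , refLine   :: Int
  } deriving (Eq, Show)

-- Unindexed lookup: every query walks the whole table, so its cost is
-- proportional to the total number of refs, not to the size of the answer.
refsByScan :: String -> String -> [Ref] -> [Ref]
refsByScan m n = filter (\r -> refModule r == m && refName r == n)

-- Indexed lookup: group the refs by target once, keeping their order...
type RefIndex = Map (String, String) [Ref]

buildIndex :: [Ref] -> RefIndex
buildIndex rs =
  Map.fromListWith (flip (++)) [ ((refModule r, refName r), [r]) | r <- rs ]

-- ...then each query is a logarithmic map lookup plus the size of the answer.
refsByIndex :: String -> String -> RefIndex -> [Ref]
refsByIndex m n = Map.findWithDefault [] (m, n)
```

<p>Both functions return the same references; the index simply moves the work from query time to build time.</p>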
<p>The Glass query takes longer than the raw Glean query because
Glass also fetches additional information about each
reference, so it performs a lot more queries.</p>
<p>We can also do the raw <code>hiedb</code> query using the sqlite shell:</p>
<pre><code>sqlite&gt; select count(*) from refs where occ = &quot;v:encode&quot; AND mod = &quot;Data.Aeson&quot;;
417
Run Time: real 2.038 user 1.213905 sys 0.823001</code></pre>
<p>Of course <code>hiedb</code> could index the refs table to make this query much
faster, but it’s interesting to note that Glean has already done that
and it was <em>still</em> quicker to index and produced a smaller DB.</p>
<h2 id="performance-of-find-definition">Performance of find-definition</h2>
<p>Let’s find the definition of <code>Data.Aeson.encode</code>, first with <code>hiedb</code>:</p>
<pre><code>$ hiedb -D /tmp/hiedb name-def encode Data.Aeson
Data.Aeson:181:1-181:7</code></pre>
<p>Now with Glass:</p>
<pre><code>$ cabal run glass-democlient -- --service localhost:12345 \
  describe stackage/hs/aeson/Data/Aeson/var/encode
stackage@aeson-2.1.2.1/src/Data/Aeson.hs:181:1-181:47</code></pre>
<p>(worth noting that <code>hiedb</code> is giving the span of the identifier only,
while Glass is giving the span of the whole definition. This is just a
different choice; the <code>.hie</code> file contains both.)</p>
<p>And the raw query using Glean:</p>
<pre><code>$ glean --service localhost:1234 query --db stackage/0 --recursive \
  &#39;{ Loc.file, Loc.span } where Loc : hs.DeclarationLocation; N : hs.Name; N.occ.name = &quot;encode&quot;; N.mod.name = &quot;Data.Aeson&quot;; Loc.name = N&#39; | jq
{
  &quot;id&quot;: 18328391,
  &quot;key&quot;: {
    &quot;tuplefield0&quot;: {
      &quot;id&quot;: 9781189,
      &quot;key&quot;: &quot;aeson-2.1.2.1/src/Data/Aeson.hs&quot;
    },
    &quot;tuplefield1&quot;: {
      &quot;start&quot;: 4136,
      &quot;length&quot;: 46
    }
  }
}</code></pre>
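<p>The raw result gives the span as a byte offset and a length (<code>start</code> and <code>length</code>), while Glass reports a <code>line:column</code> range. A minimal sketch of that conversion in Haskell follows. It is not Glass’s code: it treats each <code>Char</code> as one byte (so it is only exact for ASCII source), and it assumes the end column is one past the last character, which is consistent with the <code>181:1-181:47</code> range Glass printed for a 46-byte span.</p>

```haskell
-- A source position as (line, column), both 1-based.
type Pos = (Int, Int)

-- Split on '\n', keeping a final empty line so that an offset just after a
-- newline maps to column 1 of the following line.
splitLines :: String -> [String]
splitLines s = case break (== '\n') s of
  (l, [])       -> [l]
  (l, _ : rest) -> l : splitLines rest

-- Position of a byte offset, assuming one byte per Char (ASCII source).
posOf :: String -> Int -> Pos
posOf src off = (length ls, length (last ls) + 1)
  where
    ls = splitLines (take off src)

-- Convert a { start, length } span to a line:col range whose end column is
-- one past the last character of the span.
spanToRange :: String -> Int -> Int -> (Pos, Pos)
spanToRange src start len = (posOf src start, posOf src (start + len))

-- Render a range in the same style as the Glass output, e.g. "181:1-181:47".
showRange :: (Pos, Pos) -> String
showRange ((l1, c1), (l2, c2)) =
  show l1 ++ ":" ++ show c1 ++ "-" ++ show l2 ++ ":" ++ show c2
```

<p>For example, in the two-line source <code>"module M where\nfoo = 1\n"</code>, the 7-byte span starting at offset 15 covers <code>foo = 1</code> and renders as <code>2:1-2:8</code>.</p>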
<p>Times:</p>
<ul>
<li>hiedb: 0.18s</li>
<li>Glean (via Glass): 0.05s</li>
<li>Glean (raw query): 0.01s</li>
</ul>
<p>In fact, there’s some overhead when using the Glean CLI; we can get a
better picture of the real query time using the shell:</p>
<pre><code>stackage&gt; { Loc.file, Loc.span } where Loc : hs.DeclarationLocation; N : hs.Name; N.occ.name = &quot;encode&quot;; N.mod.name = &quot;Data.Aeson&quot;; Loc.name = N
{
  &quot;id&quot;: 18328391,
  &quot;key&quot;: {
    &quot;tuplefield0&quot;: { &quot;id&quot;: 9781189, &quot;key&quot;: &quot;aeson-2.1.2.1/src/Data/Aeson.hs&quot; },
    &quot;tuplefield1&quot;: { &quot;start&quot;: 4136, &quot;length&quot;: 46 }
  }
}

1 results, 2 facts, 0.89ms, 696176 bytes, 2435 compiled bytes</code></pre>
<p>The query itself takes less than 1ms.</p>
<p>Again, the issue with <code>hiedb</code> is that its data is not indexed in a way
that makes this query efficient: the <code>defs</code> table is indexed by the
pair <code>(hieFile,occ)</code> not <code>occ</code> alone. Interestingly, when the module
is known it ought to be possible to do a more efficient query with
<code>hiedb</code> by first looking up the <code>hieFile</code> and then using that to query
<code>defs</code>.</p>
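<p>The two access patterns can be modelled with a couple of maps. This is a toy sketch, not hiedb’s schema or code: <code>ModsTable</code> is a hypothetical stand-in for a table mapping module names to <code>.hie</code> files, and <code>DefsTable</code> is keyed by the <code>(hieFile, occ)</code> pair as described above. A lookup by <code>occ</code> alone cannot use that key and has to scan, whereas a lookup that knows the module can resolve the <code>hieFile</code> first and then do one keyed lookup.</p>

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Hypothetical toy tables, loosely modelled on the description above.
type ModsTable = Map String FilePath                -- module name -> .hie file
type DefsTable = Map (FilePath, String) (Int, Int)  -- (hieFile, occ) -> (line, col)

-- Name-only lookup: the key starts with hieFile, which we don't know,
-- so every entry has to be examined.
defsByOcc :: String -> DefsTable -> [(FilePath, (Int, Int))]
defsByOcc occ defs = [ (f, loc) | ((f, o), loc) <- Map.toList defs, o == occ ]

-- Module-qualified lookup: find the module's hieFile first, then do a
-- single keyed lookup on the (hieFile, occ) pair.
defByModule :: String -> String -> ModsTable -> DefsTable -> Maybe (Int, Int)
defByModule m occ mods defs = do
  hie <- Map.lookup m mods
  Map.lookup (hie, occ) defs
```

<p>The <code>v:</code> prefix on occurrence names in the usage below is assumed to follow the <code>occ = "v:encode"</code> value in the SQLite query earlier in this post: <code>defByModule "Data.Aeson" "v:encode" mods defs</code>.</p>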
<h1 id="what-other-queries-can-we-do-with-glean">What other queries can we do with Glean?</h1>
<p>I’ll look at a couple of examples here, but really the possibilities
are endless. We can collect whatever data we like from the <code>.hie</code>
file, and design the schema around whatever efficient queries we want
to support.</p>
<h2 id="search-by-case-insensitive-prefix">Search by case-insensitive prefix</h2>
<p>Let’s search for all identifiers that start with the case-insensitive
prefix <code>"withasync"</code>:</p>
<pre><code>$ glass-democlient --service localhost:12345 \
  search stackage/withasync -i | wc -l
55</code></pre>
<p>In less than 0.1 seconds we find 55 such identifiers in Hackage. (The
output isn’t very readable so I didn’t include it here, but for
example this finds results not just in <code>async</code> but in a bunch of
packages that wrap <code>async</code> too.)</p>
<p>Case-insensitive prefix search is supported by an index that Glean
produces when the DB is created. It works in the same way as efficient
find-references; more details on that <a href="#how-does-it-work">below</a>.</p>
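<p>A sorted index makes prefix search cheap because all keys sharing a prefix form one contiguous range. Here is a toy Haskell sketch of the idea using <code>Data.Map</code>; it is illustrative only and not how Glean represents its index. Names are stored under a lowercased key, so a case-insensitive prefix query becomes a range scan over the sorted keys.</p>

```haskell
import Data.Char (toLower)
import Data.List (isPrefixOf)
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Toy case-insensitive index: lowercased name -> original spellings.
type NameIndex = Map String [String]

buildNameIndex :: [String] -> NameIndex
buildNameIndex names =
  Map.fromListWith (flip (++)) [ (map toLower n, [n]) | n <- names ]

-- Keys are sorted, so the keys starting with the prefix are contiguous:
-- drop the keys that sort before the prefix, then take keys while they
-- still start with it. Both steps use binary search over the map.
prefixSearch :: String -> NameIndex -> [String]
prefixSearch prefix idx = concat (Map.elems inRange)
  where
    p       = map toLower prefix
    fromP   = Map.dropWhileAntitone (< p) idx
    inRange = Map.takeWhileAntitone (p `isPrefixOf`) fromP
```

<p>For instance, with <code>withAsync</code>, <code>WithAsyncBound</code> and <code>wait</code> in the index, <code>prefixSearch "WITHASYNC"</code> returns the first two and never visits the keys outside the matching range.</p>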
<p>Why only prefix and not suffix or infix? What about fuzzy search? We
could certainly provide a suffix search too; infix gets more tricky
and it’s not clear that Glean is the best tool to use for infix or
fuzzy text search: there are better data representations for that kind
of thing. Still, case-insensitive prefix search is a useful thing to
have.</p>
<p>Could we support Hoogle using Glean? Absolutely. That said, Hoogle
doesn’t seem too slow. Also, we would need to index types in Glean before it
could be used for type search.</p>
<h2 id="identify-dead-code">Identify dead code</h2>
<p>Dead code is, by definition, code that isn’t used anywhere. We have a
handy way to find that: any identifier with no references isn’t
used. But it’s not <em>quite</em> that simple: we want to ignore references
in imports and exports, and from the type signature.</p>
<p>Admittedly finding unreferenced code within Hackage isn’t all that
useful, because the libraries in Hackage are consumed by end-user code
that we haven’t indexed so we can’t see all the references. But you
could index your own project using Glean and use it to find dead
code. In fact, I did that for Glean itself and identified one entire
module that was dead, amongst a handful of other dead things.</p>
<p>Here’s a query to find dead code:</p>
<pre><code>N where
  N = hs.Name _;
  N.sort.external?;
  hs.ModuleSource { mod = N.mod, file = F };
  !(
    hs.NameRefs { target = N, file = RefFile, uses = R };
    RefFile != F;
    coderef = (R[..]).kind
  )</code></pre>
<p>Without going into all the details, here’s roughly how it works:</p>
<ul>
<li><code>N = hs.Name _;</code> declares <code>N</code> to be a fact of <code>hs.Name</code></li>
<li><code>N.sort.external?;</code> requires <code>N</code> to be external (i.e. exported), as
opposed to a local variable</li>
<li><code>hs.ModuleSource { mod = N.mod, file = F };</code> finds the file <code>F</code>
corresponding to this name’s module</li>
<li>The last part checks that there are no references to
this name that are (a) in a different file and (b) in code,
i.e. not import/export references. Restricting to other files isn’t
<em>exactly</em> what we want, but it’s enough to exclude references from
the type signature. Ideally we would be able to identify those more
precisely (that’s on the TODO list).</li>
</ul>
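<p>The same logic can be written as an ordinary Haskell function over in-memory lists, which may make the structure of the query easier to follow. The types below are simplified, hypothetical versions of <code>hs.Name</code>, <code>hs.NameRefs</code> and the reference kinds (each reference carries a single kind here), and the <code>moduleFile</code> parameter plays the role of <code>hs.ModuleSource</code>. This sketches the query’s logic, not how Glean evaluates it.</p>

```haskell
import qualified Data.Set as Set

-- Simplified, hypothetical stand-ins for the facts used by the query.
data Name = Name
  { nameModule   :: String
  , nameOcc      :: String
  , nameExternal :: Bool   -- True for exported top-level names
  } deriving (Eq, Ord, Show)

data RefKind = CodeRef | ImportRef | ExportRef
  deriving (Eq, Show)

data NameRef = NameRef
  { refTarget :: Name
  , refFile   :: FilePath
  , refKind   :: RefKind
  } deriving (Show)

-- External names with no code reference from outside their own module's
-- file. Like the Angle query, this ignores import/export references and
-- uses "other file" as an approximation for "not the type signature".
deadNames :: (String -> FilePath) -> [Name] -> [NameRef] -> [Name]
deadNames moduleFile names refs =
  [ n | n <- names, nameExternal n, n `Set.notMember` used ]
  where
    used = Set.fromList
      [ refTarget r
      | r <- refs
      , refKind r == CodeRef
      , refFile r /= moduleFile (nameModule (refTarget r))
      ]
```

<p>A name referenced only from its own file, or only from import lists, ends up in the result, while a name used in code elsewhere does not.</p>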
<p>You can try this on Hackage and it will find a lot of stuff. It might
be useful to focus on particular modules to find things that aren’t
used anywhere. For example, I was interested in which identifiers in
<code>Control.Concurrent.Async</code> aren’t used:</p>
<pre><code>N where
  N = hs.Name _;
  N.mod.name = &quot;Control.Concurrent.Async&quot;;
  N.mod.unit = &quot;async-2.2.4-inplace&quot;;
  N.sort.external?;
  hs.ModuleSource { mod = N.mod, file = F };
  !(
    hs.NameRefs { target = N, file = RefFile, uses = R };
    RefFile != F;
    coderef = (R[..]).kind
  )</code></pre>
<p>This finds 21 identifiers, which I can use to decide what to deprecate!</p>
<h1 id="apparatus">Apparatus</h1>
<h2 id="building-all-of-hackage">Building all of Hackage</h2>
<p>The goal was to build as much of Hackage as possible and then to index
it using both <code>hiedb</code> and Glean, and see how they differ.</p>
<p>To avoid problems with dependency resolution, I used a Stackage LTS
snapshot of package versions. Using LTS-21.21 and GHC 9.4.7, I was
able to build 2922 packages. About 50 failed for some reason or other.</p>
<p>I used this <code>cabal.project</code> file:</p>
<pre><code>packages: */*.cabal
import: https://www.stackage.org/lts-21.21/cabal.config

package *
    ghc-options: -fwrite-ide-info

tests: False
benchmarks: False

allow-newer: *</code></pre>
<p>I then did a large <code>cabal get</code> to fetch all the packages in LTS-21.21,
and built them with:</p>
<pre><code>cabal build all --keep-going</code></pre>
<p>After a few retries to install any required RPMs to get the dependency
resolution phase to pass, and to delete a few packages that weren’t
going to configure successfully, I went away for a few hours to let
the build complete.</p>
<p>It’s entirely possible there’s a better way to do this that I don’t
know about; please let me know!</p>
<h2 id="building-glean">Building Glean</h2>
<p>The Haskell indexer I’m using is in <a
href="https://github.com/facebookincubator/Glean/pull/522">this pull
request</a> which at the time of writing isn’t merged yet. (Since I’ve
left Meta I’m just a regular open-source contributor and have to wait
for my PRs to be merged just like everyone else!).</p>
<p>Admittedly Glean is not the easiest thing in the world to build,
mainly because it has a couple of troublesome dependencies:
<a href="https://github.com/facebook/folly">folly</a> (Meta’s library of
highly-optimised C++ utilities) and <a href="https://rocksdb.org/">RocksDB</a>.
Glean depends on very up-to-date versions of these libraries, so we
can’t use any distro-packaged versions.</p>
<p>Full instructions for building Glean are
<a href="https://glean.software/docs/building/">here</a> but roughly it goes like
this on Linux:</p>
<ul>
<li>Install a bunch of dependencies with <code>apt</code> or <code>yum</code></li>
<li>Build the C++ dependencies with <code>./install-deps.sh</code> and set some env vars</li>
<li><code>make</code></li>
</ul>
<p>The <code>Makefile</code> is needed because there are some codegen steps that
would be awkward to incorporate into the Cabal setup. After the first
<code>make</code> you can usually just switch to <code>cabal</code> for rebuilding stuff
unless you change something (e.g. a schema) that requires re-running
the codegen.</p>
<h2 id="running-glean">Running Glean</h2>
<p>I’ve done everything here with a running Glean server, which was
started like this:</p>
<pre><code>cabal run exe:glean-server -- \
  --db-root /tmp/db \
  --port 1234 \
  --schema glean/schema/source</code></pre>
<p>While it’s possible to run Glean queries directly on the DB without a
server, running a server is the normal way because it avoids the
latency from opening the DB each time, and it keeps an in-memory cache
which significantly speeds up repeated queries.</p>
<p>The examples that use Glass were done using a running Glass server,
started like this:</p>
<pre><code>cabal run glass-server -- --service localhost:1234 --port 12345</code></pre>
<h2 id="how-does-it-work">How does it work?</h2>
<p>The interesting part of the Haskell indexer is the schema in <a
href="https://github.com/facebookincubator/Glean/blob/8f49a6bfe1217657d19287d6d583b13c4a8154f8/glean/schema/source/hs.angle#L83">hs.angle</a>. Every
language that Glean indexes needs a schema, which describes the data
that the indexer will store in the DB. Unlike an SQL schema, a Glean
schema looks more like a set of datatype declarations, and it really
does correspond to a set of (code-generated) types that you can work
with when programmatically writing data, making queries, or inspecting
results. For more about Glean schemas, see <a
href="https://glean.software/docs/schema/basic/">the
documentation</a>.</p>
<p>Being able to design your own schema means that you can design
something that is a close match for the requirements of the language
you’re indexing. In our Glean schema for Haskell, we use a <code>Name</code>,
<code>OccName</code>, and <code>Module</code> structure that’s similar to the one GHC uses
internally and is stored in the <code>.hie</code> files.</p>
<p>The <a href="https://github.com/facebookincubator/Glean/blob/e523edae14657db4038df4f7676b0072baf268ed/glean/lang/haskell/HieIndexer/Index.hs">indexer
itself</a>
just reads the <code>.hie</code> files and produces Glean data using datatypes
that are generated from the schema. For example, here’s a fragment of
the indexer that produces <code>Module</code> facts, which contain a <code>ModuleName</code>
and a <code>UnitName</code>:</p>
<div class="sourceCode" id="cb18"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="ot">mkModule ::</span> <span class="dt">Glean.NewFact</span> m <span class="ot">=&gt;</span> <span class="dt">GHC.Module</span> <span class="ot">-&gt;</span> m <span class="dt">Hs.Module</span></span>
<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a>mkModule <span class="fu">mod</span> <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a>  modname <span class="ot">&lt;-</span> Glean.makeFact <span class="op">@</span><span class="dt">Hs.ModuleName</span> <span class="op">$</span></span>
<span id="cb18-4"><a href="#cb18-4" aria-hidden="true" tabindex="-1"></a>    fsToText (GHC.moduleNameFS (GHC.moduleName <span class="fu">mod</span>))</span>
<span id="cb18-5"><a href="#cb18-5" aria-hidden="true" tabindex="-1"></a>  unitname <span class="ot">&lt;-</span> Glean.makeFact <span class="op">@</span><span class="dt">Hs.UnitName</span> <span class="op">$</span></span>
<span id="cb18-6"><a href="#cb18-6" aria-hidden="true" tabindex="-1"></a>    fsToText (unitFS (GHC.moduleUnit <span class="fu">mod</span>))</span>
<span id="cb18-7"><a href="#cb18-7" aria-hidden="true" tabindex="-1"></a>  Glean.makeFact <span class="op">@</span><span class="dt">Hs.Module</span> <span class="op">$</span></span>
<span id="cb18-8"><a href="#cb18-8" aria-hidden="true" tabindex="-1"></a>    <span class="dt">Hs.Module_key</span> modname unitname</span></code></pre></div>
<p>Also interesting is how we support fast find-references. This is
done using a <a href="https://glean.software/docs/derived/#stored-derived-predicates">stored derived
predicate</a>
in the schema:</p>
<pre><code>predicate NameRefs:
  {
    target: Name,
    file: src.File,
    uses: [src.ByteSpan]
  } stored {Name, File, Uses} where
  FileXRefs {file = File, refs = Refs};
  {name = Name, spans = Uses} = Refs[..];</code></pre>
<p>Here <code>NameRefs</code> is a predicate (which you can think of as a datatype,
or a table in SQL) defined in terms of another predicate,
<code>FileXRefs</code>. The facts of the predicate <code>NameRefs</code> (rows of the table)
are derived automatically using this definition when the DB is
created. If you’re familiar with SQL, a stored derived predicate in
Glean is rather like a materialized view in SQL.</p>
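<p>The materialized-view analogy can be sketched in Haskell. The types are simplified, hypothetical stand-ins for <code>FileXRefs</code> and <code>NameRefs</code>: <code>deriveNameRefs</code> corresponds to the two clauses of the predicate definition, and <code>storeNameRefs</code> models the “stored” part by computing the derived facts once and indexing them by <code>target</code>, so find-references becomes a keyed lookup. This illustrates the idea, not Glean’s storage engine.</p>

```haskell
import qualified Data.Map.Strict as Map

-- Simplified, hypothetical stand-ins for the schema types.
type Name = String
type File = FilePath

data ByteSpan = ByteSpan { spanStart :: Int, spanLength :: Int }
  deriving (Eq, Show)

-- One FileXRefs fact: the cross-references in one file, grouped by name.
data FileXRefs = FileXRefs { xrefFile :: File, xrefRefs :: [(Name, [ByteSpan])] }

-- One NameRefs fact, mirroring { target, file, uses }.
data NameRefs = NameRefs { target :: Name, file :: File, uses :: [ByteSpan] }
  deriving (Eq, Show)

-- For every FileXRefs fact and every element of its refs array, emit one
-- NameRefs fact, like FileXRefs {file = File, refs = Refs};
-- {name = Name, spans = Uses} = Refs[..] in the schema above.
deriveNameRefs :: [FileXRefs] -> [NameRefs]
deriveNameRefs xs =
  [ NameRefs name (xrefFile x) spans | x <- xs, (name, spans) <- xrefRefs x ]

-- Model of "stored": derive the facts once and index them by target.
type NameRefsIndex = Map.Map Name [NameRefs]

storeNameRefs :: [NameRefs] -> NameRefsIndex
storeNameRefs rs = Map.fromListWith (flip (++)) [ (target r, [r]) | r <- rs ]

-- Find-references is now a single keyed lookup.
findReferences :: Name -> NameRefsIndex -> [NameRefs]
findReferences = Map.findWithDefault []
```

<p>As with a materialized view, the cost of the derivation is paid once when the DB is built rather than on every query.</p>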
<h1 id="whats-next">What’s next?</h1>
<p>As I mentioned earlier, the indexer doesn’t yet index types, so that
would be an obvious next step. There are a handful of weird corner
cases that aren’t handled correctly, particularly around record
selectors, and it would be good to iron those out.</p>
<p>Longer term, ideally the Glean data would be rich enough to produce the
Haddock docs. In fact, Meta’s internal code browser does produce
documentation on the fly from Glean data for some languages: Hack and
C++ in particular. Doing it for Haskell is a bit tricky because, while
I believe the <code>.hie</code> file does contain enough information to do this,
it’s not easy to reconstruct the full ASTs for declarations. Doing it
by running the compiler (perhaps using the Haddock API) would be
an option, but that involves a deeper integration with Cabal, so it’s
somewhat more awkward to go that route.</p>
<p>Could HLS use Glean? Perhaps it would be useful to have a full Hackage
index to be able to go-to-definition from library references? As a
plugin this might make sense, but there are a lot of things to fix and
polish before it’s really practical.</p>
<p>Longer term, should we be thinking about replacing hiedb with Glean?
Again, we’re some way off from that. The issue of incremental updates
is an interesting one: Glean does support <a href="https://glean.software/docs/implementation/incrementality/">incremental
indexing</a>
but so far it’s been aimed at speeding up whole-repository indexing
rather than supporting IDE features.</p>
</div>
]]></summary>
</entry>
<entry>
    <title>Rethinking Static Reference Tables in GHC</title>
    <link href="https://simonmar.github.io/posts/2018-06-22-New-SRTs.html" />
    <id>https://simonmar.github.io/posts/2018-06-22-New-SRTs.html</id>
    <published>2018-06-22T00:00:00Z</published>
    <updated>2018-06-22T00:00:00Z</updated>
    <summary type="html"><![CDATA[<div class="post">
  <h1 class="post-title">Rethinking Static Reference Tables in GHC</h1>
  <span class="post-date">June 22, 2018</span>
  <p>It seems rare these days to be able to make an improvement that’s
unambiguously better on every axis. Most changes involve a tradeoff
of some kind. With a compiler, the tradeoff is often between
performance and code size (e.g. specialising code to make it faster
leaves us with more code), or between performance and complexity
(e.g. adding a fancy new optimisation), or between compile-time
performance and runtime performance.</p>
<p>Recently I was lucky enough to be able to finish a project I’ve been
working on intermittently in GHC for several years, and the result was
satisfyingly better on just about every axis.</p>
<ul>
<li><p>Code size: overall binary sizes are reduced by ~5% for large
programs, ~3% for smaller programs.</p></li>
<li><p>Runtime performance: no measurable change on benchmarks, although
some really bad corner cases where the old code performed terribly
should now be gone.</p></li>
<li><p>Complexity: some complex representations were removed from the
runtime, making GC simpler, and the compiler itself also became
simpler.</p></li>
<li><p>Compile-time performance: slightly improved (0.2%).</p></li>
</ul>
<p>To explain what the change is, first we’ll need some background.</p>
<h2 id="garbage-collecting-cafs">Garbage collecting CAFs</h2>
<p>A Constant Applicative Form (CAF) is a top-level thunk. For example:</p>
<pre><code>myMap :: HashMap Text Int
myMap = HashMap.fromList [
  -- lots of data
  ]</code></pre>
<p>Now, <code>myMap</code> is represented in the compiled program by a static
closure that looks like this:</p>
<p><img src="/images/static-closure.png" /></p>
<p>When the program demands the value of <code>myMap</code> for the first time, the
representation will change to this:</p>
<p><img src="/images/evaluated-static-closure.png" /></p>
<p>At this point, we have a reference from the original static closure,
which is part of the compiled program, into the dynamic heap. The
garbage collector needs to know about this reference, because it has
to treat the value of <code>myMap</code> as live data, and ensure that this
reference remains valid.</p>
<p>How could we do that? One way would be to just keep all the CAFs
alive for ever. We could keep a list of them and use the list as a
source of roots in the GC. That would work, but we’d never be able to
garbage-collect any top-level data. Back in the distant past GHC used
to work this way, but it interacted badly with the full-laziness
optimisation which likes to float things out to the top level - we had
to be really careful not to float things out as CAFs because the data
would be retained for ever.</p>
<p>Or, we could track the liveness of CAFs properly, like we do for other
data. But how can we find all the references to <code>myMap</code>? The problem
with top-level closures is that their references appear in <em>code</em>, not
just <em>data</em>. For example, somewhere else in our program we might have</p>
<pre><code>myLookup :: String -&gt; Maybe Int
myLookup name = HashMap.lookup name myMap</code></pre>
<p>and in the compiled code for <code>myLookup</code> will be a reference to
<code>myMap</code>.</p>
<p>To be able to know when we should keep <code>myMap</code> alive, the garbage
collector has to traverse all the references from code as well as
data.</p>
<p>Of course, actually searching through the code for symbols isn’t
practical, so GHC produces an additional data structure for all the
code it compiles, called the Static Reference Table (SRT). The SRT
for <code>myLookup</code> will contain a reference to <code>myMap</code>.</p>
<p>The naive way to do this would be to just have a table of all the
static references for each code block. But it turns out that there are
a lot of opportunities for sharing between SRTs - lots of code
blocks refer to the same things - so it makes sense to try to use a
more optimised representation.</p>
<p>The representation that GHC 8.4 and earlier used was this:</p>
<p><img src="/images/old-srt.png" /></p>
<p>All the static references in a module were collected together into a
single table (<code>ThisModule_srt</code> in the diagram), and every static
closure selects the entries it needs with a combination of a pointer
(<code>srt</code>) into the table and a bitmap (<code>srt_bitmap</code>).</p>
<p>This had a few problems:</p>
<ul>
<li><p>On a 64-bit machine we need at least 96 bits for the SRT in every
static closure and continuation that has at least one static
reference: 64 bits to point to the table and a 32-bit bitmap.</p></li>
<li><p>Sometimes the heuristics in the compiler for generating the table
worked really badly. I observed some cases with particularly large
modules where we generated an SRT containing two entries that were
thousands of entries apart in the table, which required a huge
bitmap.</p></li>
<li><p>There was complex code in the RTS for traversing these bitmaps, and
complex code in the compiler to generate this table that nobody
really understood.</p></li>
</ul>
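<p>To make the old scheme concrete, here is a toy model in Haskell. It is not GHC’s code: <code>encode</code> and <code>selectEntries</code> are made-up names, entries are plain strings, and the bitmap is an unbounded <code>Integer</code> rather than a fixed-width word. It shows how a closure picked its entries out of the module-wide table, and why two needed entries that sit far apart force a bitmap spanning everything in between.</p>
<pre><code>import Data.Bits (setBit, testBit)

-- Toy model of the pre-8.6 SRT scheme: one table per module, and for each
-- closure a start index into that table plus a bitmap of selected entries.

-- | Encode the entries a closure needs as (start index, bitmap, bitmap width).
-- Returns Nothing if the closure has no static references.
encode :: [String] -> [String] -> Maybe (Int, Integer, Int)
encode table needed
  | null idxs = Nothing
  | otherwise = Just (start, bitmap, width)
  where
    idxs   = [ i | (i, e) <- zip [0 ..] table, e `elem` needed ]
    start  = minimum idxs
    -- The bitmap must cover every slot from the first needed entry to the last.
    width  = maximum idxs - start + 1
    bitmap = foldl setBit 0 [ i - start | i <- idxs ]

-- | Decode: walk the table from the start index, keeping entries whose bit is set.
selectEntries :: [String] -> Int -> Integer -> [String]
selectEntries table start bitmap =
  [ e | (i, e) <- zip [0 ..] (drop start table), testBit bitmap i ]</code></pre>
<p>With the table <code>["a","b","c","d"]</code>, a closure needing <code>b</code> and <code>d</code> gets start index 1 and bitmap <code>0b101</code>; a closure needing only the first and last of 5,000 entries needs a 5,000-bit bitmap for just two references.</p>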
<h2 id="the-shiny-new-way">The shiny new way</h2>
<p>The basic idea is quite straightforward: instead of the single table
and bitmap representation, each code block that needs an SRT will have
an associated SRT object, like this:</p>
<p><img src="/images/new-srt.png" /></p>
<p>Firstly, this representation is a lot simpler, because an SRT object
has exactly the same representation as a static constructor, so we
need no new code in the GC to handle it. All the code to deal with
bitmaps goes away.</p>
<p>However, just making this representation change by itself will cause a
lot of code growth, because we lose many of the optimisations and
sharing that we were able to do with the table and bitmap
representation.</p>
<p>But the new representation has some great opportunities for
optimisation of its own, and exploiting all these optimisations
results in more compact code than before.</p>
<h3 id="we-never-need-a-singleton-srt">We never need a singleton SRT</h3>
<p>If an SRT has one reference in it, we replace the pointer to the SRT
with a pointer to the referenced closure itself.</p>
<p><img src="/images/singleton-srt.png" /></p>
<h3 id="the-srt-field-for-each-code-block-can-be-32-bits-not-96">The SRT field for each code block can be 32 bits, not 96</h3>
<p>Since we only need a pointer, not a pointer and a bitmap, the overhead
goes down to 64 bits. Furthermore, by exploiting the fact that we can
represent local pointers by 32-bit offsets (on x86_64), the overhead
goes down to 32 bits.</p>
<p><img src="/images/relative-srt-ref.png" /></p>
<h3 id="we-can-common-up-identical-srts">We can common up identical SRTs</h3>
<p>This is an obvious one: if multiple code blocks have the same set of
static references, they can share a single SRT object.</p>
<h3 id="we-can-drop-duplicate-references-from-an-srt">We can drop duplicate references from an SRT</h3>
<p>Sometimes an SRT refers to a closure that is also referred to by
something that is reachable from the same SRT. For example:</p>
<p><img src="/images/new-srt-drop.png" /></p>
<p>In this case we can drop the reference to <code>x</code> in the outer SRT,
because it’s already contained in the inner SRT. That leaves the
outer SRT with a single reference, which means the SRT object itself
can just disappear, by the singleton optimisation mentioned earlier.</p>
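<p>The singleton, commoning and duplicate-dropping optimisations can be sketched with another toy model. Again this is illustrative, not GHC’s implementation: closures are named by strings, the hypothetical <code>reachable</code> function stands in for what the compiler already knows about the SRTs it has built, and reachability is assumed to be acyclic (a reference is dropped whenever some <em>other</em> reference in the same SRT already reaches it).</p>
<pre><code>import Data.List (nub, sort)

-- | What a code block's SRT field points at in this toy model.
data SRTField
  = NoSRT               -- no static references at all
  | Direct String       -- singleton SRT: point straight at the closure
  | SRTObject [String]  -- a real SRT object with these (sorted) entries
  deriving (Eq, Show)

-- | Build the SRT field for a block that references `refs`.
-- `reachable c` lists the closures transitively reachable from c.
buildSRT :: (String -> [String]) -> [String] -> SRTField
buildSRT reachable refs =
  case pruned of
    []  -> NoSRT
    [r] -> Direct r
    rs  -> SRTObject rs
  where
    uniq   = nub refs
    -- Drop any reference that another reference in the same SRT already reaches.
    pruned = sort [ r | r <- uniq
                      , not (any (\o -> o /= r && r `elem` reachable o) uniq) ]

-- | Common up identical SRTs: each distinct entry list is emitted once,
-- and every block with that list shares the same object.
commonUp :: [SRTField] -> [[String]]
commonUp fields = nub [ es | SRTObject es <- fields ]</code></pre>
<p>If <code>reachable</code> maps <code>inner</code> to <code>["x"]</code>, a block referencing both <code>x</code> and <code>inner</code> comes out as <code>Direct "inner"</code>: the duplicate is dropped, and the remaining singleton needs no SRT object at all.</p>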
<h3 id="for-a-function-we-can-combine-the-srt-with-the-static-closure-itself">For a function, we can combine the SRT with the static closure itself</h3>
<p>A top-level function with an SRT would look like this:</p>
<p><img src="/images/new-srt-fun.png" /></p>
<p>We might as well just merge the two objects together, and put the SRT
entries in the function closure, to give this:</p>
<p><img src="/images/new-srt-fun2.png" /></p>
<p>Together, these optimisations were enough to reduce code size compared
with the old table/bitmap representation.</p>
<h2 id="show-me-the-code">Show me the code</h2>
<ul>
<li><a href="https://phabricator.haskell.org/D4632">An overhaul of the SRT representation </a></li>
<li><a href="https://phabricator.haskell.org/D4634">Save a word in the info table on x86_64</a></li>
<li><a href="https://phabricator.haskell.org/D4637">Merge FUN_STATIC closure with its SRT</a></li>
</ul>
<p>Look out for (slightly) smaller binaries in GHC 8.6.1.</p>
</div>
]]></summary>
</entry>
<entry>
    <title>Fixing 17 space leaks in GHCi, and keeping them fixed</title>
    <link href="https://simonmar.github.io/posts/2018-06-20-Finding-fixing-space-leaks.html" />
    <id>https://simonmar.github.io/posts/2018-06-20-Finding-fixing-space-leaks.html</id>
    <published>2018-06-20T00:00:00Z</published>
    <updated>2018-06-20T00:00:00Z</updated>
    <summary type="html"><![CDATA[<div class="post">
  <h1 class="post-title">Fixing 17 space leaks in GHCi, and keeping them fixed</h1>
  <span class="post-date">June 20, 2018</span>
  <p>In this post I want to tackle a couple of problems that have irritated
me from time to time when working with Haskell.</p>
<ul>
<li><p><strong>GHC provides some powerful tools for debugging space leaks, but
sometimes they’re not enough</strong>. The heap profiler shows you what’s in
the heap, but it doesn’t provide detailed visibility into the chain of
references that cause a particular data structure to be
retained. Retainer profiling was supposed to help with this, but in
practice it’s pretty hard to extract the signal you need - retainer
profiling will show you one relationship at a time, but you want to
see the whole chain of references.</p></li>
<li><p><strong>Once you’ve fixed a space leak, how can you write a regression test
for it</strong>? Sometimes you can make a test case that will use <code>O(n)</code>
memory if it leaks instead of <code>O(1)</code>, and then it’s
straightforward. But what if your leak is only a constant factor?</p></li>
</ul>
<p>We recently noticed an interesting space leak in GHCi. If we loaded a
set of modules, and then loaded the same set of modules again, GHCi
would need twice as much memory as just loading the modules
once. That’s not supposed to happen - GHCi should release whatever
data it was holding about the first set of modules when loading a new
set. What’s more, after further investigation we found that this
effect wasn’t repeated the <em>third</em> time we loaded the modules; only
one extra set of modules was being retained.</p>
<p><img src="/images/ghci-leak.png" /></p>
<p>Conventional methods for finding the space leak were not helpful in
this case. GHCi is a complex beast, and just reproducing the problem
proved difficult. So I decided to try a trick I’d thought about for a
long time but never actually put into practice: using GHC’s <em>weak
pointers</em> to detect data that should be dead, but isn’t.</p>
<h2 id="weak-pointers-can-detect-space-leaks">Weak pointers can detect space leaks</h2>
<p>The <a
href="http://hackage.haskell.org/package/base-4.11.1.0/docs/System-Mem-Weak.html">System.Mem.Weak</a>
library provides operations for creating “weak” pointers. A weak
pointer is a reference to an object that doesn’t keep the object
alive. If we have a weak pointer, we can attempt to <em>dereference</em> it,
which will either succeed and return the value it points to, or it
will fail in the event that the value has been garbage collected. So
a weak pointer can detect when things are garbage collected, which is
exactly what we want for detecting space leaks.</p>
<p>Here’s the idea:</p>
<ol type="1">
<li>Call <code>mkWeakPtr v Nothing</code> where <code>v</code> is the value you’re interested in.</li>
<li>Wait until you believe <code>v</code> should be garbage.</li>
<li>Call <code>System.Mem.performGC</code> to force a full GC.</li>
<li>Call <code>System.Mem.Weak.deRefWeak</code> on the weak pointer to see if <code>v</code> is alive or not.</li>
</ol>
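<p>Those steps fit in a few lines of Haskell. This is a minimal sketch rather than the code from the GHC patch, and <code>track</code> and <code>leakCheck</code> are hypothetical names. Each tracked value gets its own labelled weak pointer, so several components of a structure can be checked at once. Bear in mind that the <code>System.Mem.Weak</code> documentation warns that weak pointers to ordinary Haskell values are fragile, because the compiler is free to optimise away or duplicate the underlying object.</p>
<pre><code>import Control.Monad (filterM)
import Data.Maybe (isJust)
import System.Mem (performGC)
import System.Mem.Weak (deRefWeak, mkWeakPtr)

-- | A labelled liveness probe. The probe holds only the weak pointer,
-- so it does not keep the tracked value alive by itself.
type Tracked = (String, IO Bool)

-- | Step 1: take a weak pointer to a value we expect to become garbage later.
track :: String -> a -> IO Tracked
track label v = do
  w <- mkWeakPtr v Nothing
  pure (label, isJust <$> deRefWeak w)

-- | Steps 3 and 4: force a full GC, then report every tracked value that
-- is still alive. Call this once you believe the values should be dead.
leakCheck :: [Tracked] -> IO [String]
leakCheck tracked = do
  performGC
  alive <- filterM snd tracked
  pure (map fst alive)</code></pre>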
<p>Here’s <a href="https://phabricator.haskell.org/D4658">how I
implemented this for GHCi</a>. One thing to note is that just because
<code>v</code> was garbage-collected doesn’t mean that there aren’t still pieces
of <code>v</code> being retained, so you might need to have several weak pointers
to different components of <code>v</code>, like I did in the GHC patch. These
really did detect multiple different space leaks.</p>
<p>This patch reliably detected leaks in trivial examples, including many
of the tests in GHCi’s own test suite. That meant we had a way to
reproduce the problem without having to use unpredictable measurement
methods like memory usage or heap profiles. This made it much easier
to iterate on finding the problems.</p>
<h2 id="back-to-the-space-leaks-in-ghci">Back to the space leaks in GHCi</h2>
<p>That still leaves us with the problem of how to actually diagnose the
leak and find the cause. Here the techniques are going to get a bit
more grungy: we’ll use <code>gdb</code> to poke around in the heap at runtime,
along with some custom utilities in the GHC runtime to help us search
through the heap.</p>
<p>To set things up for debugging, we need to:</p>
<ol type="1">
<li>Compile GHC with <code>-g</code> and <code>-debug</code>, to add debugging info to the binary and debugging functionality to the runtime, respectively.</li>
<li>Load up GHCi in gdb (that’s a bit fiddly and I won’t go into the details here).</li>
<li>Set things up to reproduce the test case.</li>
</ol>
<pre><code>*Main&gt; :l
Ok, no modules loaded.
-fghci-leak-check: Linkable is still alive!
Prelude&gt;</code></pre>
<p>The <code>-fghci-leak-check</code> code just spat out a message when it
detected a leak. We can <code>Ctrl-C</code> to break into <code>gdb</code>:</p>
<pre><code>Program received signal SIGINT, Interrupt.
0x00007ffff17c05b3 in __select_nocancel ()
    at ../sysdeps/unix/syscall-template.S:84
84	../sysdeps/unix/syscall-template.S: No such file or directory.</code></pre>
<p>Next I’m going to search the heap for instances of the <code>LM</code>
constructor, which corresponds to the <code>Linkable</code> type that the leak
detector found. There should be none of these alive, because the <code>:l</code>
command tells GHCi to unload everything, so any <code>LM</code>
constructors we find must be leaking:</p>
<pre><code>(gdb) p findPtr(ghc_HscTypes_LM_con_info,1)
0x4201a073d8 = ghc:HscTypes.LM(0x4201a074b0, 0x4201a074c8, 0x4201a074e2)
--&gt;
0x4200ec2000 = WEAK(key=0x4201a073d9 value=0x4201a073d9 finalizer=0x7ffff2a077d0)
0x4200ec2000 = WEAK(key=0x4201a073d9 value=0x4201a073d9 finalizer=0x7ffff2a077d0)
0x42017e2088 = ghc-prim:GHC.Types.:(0x4201a073d9, 0x7ffff2e9f679)
0x42017e2ae0 = ghc-prim:GHC.Types.:(0x4201a073d9, 0x7ffff2e9f679)
$1 = void</code></pre>
<p>The <code>findPtr</code> function comes from the RTS; it’s designed
specifically for searching through the heap from inside
<code>gdb</code>. I asked it to search for <code>ghc_HscTypes_LM_con_info</code>,
which is the info pointer for the <code>LM</code> constructor: every
instance of that constructor will have this pointer as its first word.</p>
<p>The <code>findPtr</code> function doesn’t just search for objects in the heap, it
also attempts to find the object’s parent, and will continue tracing
back through the chain of ancestors until it finds multiple parents.</p>
<p>In this case, it found a single <code>LM</code> constructor, which had four
parents: two <code>WEAK</code> objects and two <code>ghc-prim:GHC.Types.:</code> objects,
which are the list constructor <code>(:)</code>. The <code>WEAK</code> objects we know
about: those are the weak pointers used by the leak-checking code. So
we need to trace the parents of the other objects, which we can do with
another call to <code>findPtr</code>:</p>
<pre><code>(gdb) p findPtr(0x42017e2088,1)
0x42016e9c08 = ghc:Linker.PersistentLinkerState(0x42017e2061, 0x7ffff3c2bc63, 0x42017e208a, 0x7ffff2e9f679, 0x42016e974a, 0x7ffff2e9f679)
--&gt;
0x42016e9728 = THUNK(0x7ffff74790c0, 0x42016e9c41, 0x42016e9c09)
--&gt;
0x42016e9080 = ghc:Linker.PersistentLinkerState(0x42016e9728, 0x7ffff3c2e7bb, 0x7ffff2e9f679, 0x7ffff2e9f679, 0x42016e974a, 0x7ffff2e9f679)
--&gt;
0x4200dbe8a0 = THUNK(0x7ffff7479138, 0x42016e9081, 0x42016e90b9, 0x42016e90d1, 0x42016e90e9)
--&gt;
0x42016e0b00 = MVAR(head=END_TSO_QUEUE, tail=END_TSO_QUEUE, value=0x4200dbe8a0)
--&gt;
0x42016e0828 = base:GHC.MVar.MVar(0x42016e0b00)
--&gt;
0x42016e0500 = MUT_VAR_CLEAN(var=0x42016e0829)
--&gt;
0x4200ec6b80 = base:GHC.STRef.STRef(0x42016e0500)
--&gt;
$2 = void</code></pre>
<p>This time we traced through several objects, until we came to an
<code>STRef</code>, and <code>findPtr</code> found no further parents. Perhaps the next
parent is a CAF (a top-level thunk) which <code>findPtr</code> won’t find because
it only searches the heap. Anyway, in the chain we have two
<code>PersistentLinkerState</code> objects, and some <code>THUNK</code>s - it looks like
perhaps we’re holding onto an old version of the
<code>PersistentLinkerState</code>, which contains the leaking <code>Linkable</code> object.</p>
<p>Let’s pick one <code>THUNK</code> and take a closer look.</p>
<pre><code>(gdb) p4 0x42016e9728
0x42016e9740:	0x42016e9c09
0x42016e9738:	0x42016e9c41
0x42016e9730:	0x0
0x42016e9728:	0x7ffff74790c0 &lt;sorW_info&gt;</code></pre>
<p>The <code>p4</code> command is just a macro for dumping memory (you can get these
macros from <a
href="https://ghc.haskell.org/trac/ghc/wiki/Debugging/CompiledCode">here</a>).</p>
<p>The header of the object is <code>0x7ffff74790c0 &lt;sorW_info&gt;</code>, which is just a
compiler-generated symbol. How can we find out what code this object
corresponds to? Fortunately, GHC’s new <code>-g</code> option generates DWARF
debugging information which <code>gdb</code> can understand, and because we
compiled GHC itself with <code>-g</code> we can get <code>gdb</code> to tell us what code
this address corresponds to:</p>
<pre><code>(gdb) list *0x7ffff74790c0
0x7ffff74790c0 is in sorW_info (compiler/ghci/Linker.hs:1129).
1124
1125	      itbl_env&#39;     = filterNameEnv keep_name (itbl_env pls)
1126	      closure_env&#39;  = filterNameEnv keep_name (closure_env pls)
1127	
1128	      new_pls = pls { itbl_env = itbl_env&#39;,
1129	                      closure_env = closure_env&#39;,
1130	                      bcos_loaded = remaining_bcos_loaded,
1131	                      objs_loaded = remaining_objs_loaded }
1132	
1133	  return new_pls</code></pre>
<p>In this case it told us that the object corresponds to line 1129 of
<code>compiler/ghci/Linker.hs</code>. This is all part of the function
<code>unload_wkr</code>, which is part of the code for unloading compiled
code in GHCi. It looks like we’re on the right track.</p>
<p>Now, <code>-g</code> isn’t perfect - the line it pointed to isn’t actually a
thunk. But it’s close: the line it points to refers to <code>closure_env'</code> which is defined on line 1126, and it is
indeed a thunk. Moreover, we can see that it has a reference to <code>pls</code>,
which is the original <code>PersistentLinkerState</code> before the unloading
operation.</p>
<p>To avoid this leak, we could pattern-match on <code>pls</code> eagerly rather
than doing the lazy record selection <code>(closure_env pls)</code> in the
definition of <code>closure_env'</code>. That’s exactly what I did to fix this
particular leak, as you can see in <a
href="https://phabricator.haskell.org/D4872">the patch that fixes
it</a>.</p>
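<p>The shape of that bug is easy to reproduce in miniature. The sketch below uses a made-up <code>LinkerState</code> record as a stand-in for <code>PersistentLinkerState</code>; it is not the code from the patch. In <code>unloadLazy</code>, the new environment is a thunk whose free variable is the whole old record, so the old <code>loaded</code> field stays reachable until that thunk is forced. <code>unloadStrict</code> pattern-matches on the old record first, so the thunk captures only the old environment field.</p>
<pre><code>-- Hypothetical stand-in for PersistentLinkerState.
data LinkerState = LinkerState
  { closureEnv :: [(String, Int)]  -- stands in for closure_env
  , loaded     :: [String]         -- imagine this holds large Linkables
  }

-- Leaky: closureEnv' is a thunk that refers to `pls` itself, so the old
-- state (including its `loaded` field) is retained until it is forced.
unloadLazy :: (String -> Bool) -> LinkerState -> LinkerState
unloadLazy keep pls =
  let closureEnv' = filter (keep . fst) (closureEnv pls)
  in pls { closureEnv = closureEnv', loaded = [] }

-- Fixed: match on the old state eagerly. The thunk for the filtered
-- environment now refers only to `env`, not to the old record.
unloadStrict :: (String -> Bool) -> LinkerState -> LinkerState
unloadStrict keep LinkerState { closureEnv = env } =
  LinkerState { closureEnv = filter (keep . fst) env, loaded = [] }</code></pre>
<p>Both functions compute the same result; they differ only in what the unevaluated result keeps alive. This is the kind of leak an optimising compile may well remove on its own, which fits with the observation below that many of these leaks only appeared without optimisation.</p>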
<p>Fixing one leak isn’t necessarily enough: the data structure might be
retained in multiple different ways, and it won’t be garbage collected
until all the references are squashed. In total I found</p>
<ul>
<li><a href="https://phabricator.haskell.org/D4659">7 leaks in GHCi</a> that were
collectively responsible for the original leak, and</li>
<li><a href="https://phabricator.haskell.org/D4872">A further 10 leaks</a>
that only appeared when GHC was compiled without optimisation. (It
seems that GHC’s optimiser is pretty good at fixing space leaks by
itself.)</li>
</ul>
<p>You might ask how anyone could have found these without undergoing
this complicated debugging process, and whether there are more lurking
that we haven’t found yet. These are really good questions, and I
don’t have a good answer for either. But at least we’re in a better
place now:</p>
<ul>
<li>The leaks are fixed, and we have a regression test to prevent them
being reintroduced.</li>
<li>If you happen to write a patch that introduces a leak, you’ll
know what the patch is, so you have a head start in debugging it.</li>
</ul>
<h2 id="could-we-do-better">Could we do better?</h2>
<p>Obviously this is all a bit painful and we could definitely build
better tools to make this process easier. Perhaps something based on
<code>heap-view</code> which was <a
href="https://phabricator.haskell.org/D3055">recently added to
GHC</a>? I’d love to see someone tackle this.</p>
</div>
]]></summary>
</entry>
<entry>
    <title>Hotswapping Haskell</title>
    <link href="https://simonmar.github.io/posts/2017-10-17-hotswapping-haskell.html" />
    <id>https://simonmar.github.io/posts/2017-10-17-hotswapping-haskell.html</id>
    <published>2017-10-17T00:00:00Z</published>
    <updated>2017-10-17T00:00:00Z</updated>
    <summary type="html"><![CDATA[<div class="post">
  <h1 class="post-title">Hotswapping Haskell</h1>
  <span class="post-date">October 17, 2017</span>
  <p><em>This is a guest post by <a href="https://github.com/JonCoens">Jon
Coens</a>. Jon has worked on the Haxl project since its beginning in
2013, and nowadays he works on broadening Haskell use within Facebook.</em></p>
<p>From developing code through deployment, Facebook needs to move fast. This is especially true for one of our <a href="https://code.facebook.com/posts/745068642270222/fighting-spam-with-haskell/">anti-abuse systems</a> that deploys hundreds of code changes every day. Releasing a large application (hundreds of Kloc) that many times a day presents plenty of intriguing challenges. Haskell’s strict type system means we’re able to confidently push new code knowing that we can’t crash the server, but getting those changes out to many thousands of machines as fast as possible requires some ingenuity.</p>
<p>Given the application size and deployment speed constraints:</p>
<ul>
<li><p>Building a new application binary for every change would take too long</p></li>
<li><p>Starting and tearing down millions of heavy processes a day would create undue churn on other infrastructure</p></li>
<li><p>Splitting the service into multiple smaller services would slow down developers.</p></li>
</ul>
<p>To overcome these constraints, our solution is to build a shared object file that contains only the set of frequently changing business logic and dynamically load it into our server process. With some clever house-keeping, the server drops old unneeded shared objects to make way for new ones without dropping any requests.</p>
<p>It’s like driving a car down the road, having a new engine fall into your lap, installing it on-the-fly, and dumping the old engine behind you, all while never touching the brakes.</p>
<h2 id="show-me-the-code">Show Me The Code!</h2>
<p>For those who want a demo, look <a href="https://github.com/fbsamples/ghc-hotswap">here</a>. Make sure you have GHC 8.2.1 or later, then follow the <a href="https://github.com/fbsamples/ghc-hotswap/blob/master/README.md"><code>README</code></a> for how to configure the projects.</p>
<h2 id="what-about">What about…</h2>
<h3 id="a-statically-built-server">A statically built server</h3>
<p>The usual way of deploying updates requires building a fully statically-linked binary and shipping that to every machine. This has many benefits, the biggest of which is a streamlined and well-understood deployment, but it results in long update times due to the size of our large final binary. Each business logic change, no matter how small, needs to re-link the entire binary and be shipped out to all machines. Both binary link time and distribution time are correlated with file size, so the larger the binary, the longer the updates. In our case, the application binary’s size is too large for us to do frequent updates by this method.</p>
<h3 id="ghci-as-a-service">GHCi-as-a-service</h3>
<p>GHCi’s incremental module reloading is another way of updating code quickly. Mimicking the local development workflow, you could ship code updates to each service, and instruct them to reload as necessary. Continually re-interpreting the code significantly decreases the amount of time to distribute an update. In fact, a previous version of our application (not based on Haskell) worked this way. This approach severely hinders performance, however. Running interpreted code is strictly slower than optimized compiled code, and GHCi can’t currently handle running multiple requests at the same time.</p>
<p>The model of reloading libraries in GHCi closely matches what we want our end behavior to look like. What about loading those libraries into a non-interpreted Haskell binary?</p>
<h2 id="shipping-shared-objects-for-great-good">Shipping shared objects for great good</h2>
<p>Using GHCi’s dynamic linker (the <code>GHCi.ObjLink</code> API), our update deployment looks roughly as follows:</p>
<ul>
<li><p>Commit a code change onto trunk</p></li>
<li><p>Incrementally build a shared object file containing the frequently-changing business logic</p></li>
<li><p>Ship that file to each machine</p></li>
<li><p>In each process, use GHCi’s dynamic linker to load in the new shared object and look up a symbol from it (while continuing to serve requests using the previous code)</p></li>
<li><p>If all succeeds, start serving requests using the new code and mark the previous shared object for unloading by the GC</p></li>
</ul>
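<p>The last two steps amount to keeping the server on the old code until the new code is ready, then switching over in a single step. Here is a minimal sketch of that switch, assuming the handles from the currently active shared object live in an <code>IORef</code>. <code>Current</code>, <code>withCurrent</code> and <code>swapIn</code> are hypothetical names, not part of <code>ghc-hotswap</code>, and unloading the old object is left out (see <em>Safely Transition Updates</em>).</p>
<pre><code>import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)

-- | Holds whatever the currently active shared object exported.
newtype Current a = Current (IORef a)

newCurrent :: a -> IO (Current a)
newCurrent = fmap Current . newIORef

-- | Each request reads the current handles once and uses them throughout,
-- so a swap never changes the code underneath a request in flight.
withCurrent :: Current a -> (a -> IO r) -> IO r
withCurrent (Current ref) k = readIORef ref >>= k

-- | Install handles from a successfully loaded object and hand back the old
-- ones, so the caller can arrange for their object to be unloaded later.
swapIn :: Current a -> a -> IO a
swapIn (Current ref) new = atomicModifyIORef' ref (\old -> (new, old))</code></pre>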
<p>This minimizes the amount of time between making a code change and having it running in an efficient production environment. It only rebuilds the minimum set of code, deploys a much smaller file to each server, and keeps the server running through each update.</p>
<p>Not every module or application can follow this update model as there are some crucial constraints to consider when figuring out what can go into the shared object.</p>
<ol type="1">
<li>The symbol API boundaries into and out of the shared object must remain constant</li>
<li>The main binary cannot persist any reference to code or data originating from the shared object, because that will prevent the GC from unloading the object.</li>
</ol>
<p>Fortunately, our use-case fits this mold.</p>
<h2 id="details">Details</h2>
<p>We’ll talk about a handful of libraries and some example code:</p>
<ul>
<li><p><a href="https://downloads.haskell.org/~ghc/master/libraries/ghci/ghci/GHCi-ObjLink.html"><strong>GHCi.ObjLink</strong></a> - A library provided by GHC</p></li>
<li><p><a href="https://github.com/fbsamples/ghc-hotswap/tree/master/ghc-hotswap"><strong>ghc-hotswap</strong></a> - A library, built on <code>GHCi.ObjLink</code>, that handles loading and swapping shared objects</p></li>
<li><p><a href="https://github.com/fbsamples/ghc-hotswap/tree/master/ghc-hotswap-types"><strong>ghc-hotswap-types</strong></a> - User-written code to define the API</p></li>
<li><p><a href="https://github.com/fbsamples/ghc-hotswap/tree/master/ghc-hotswap-so"><strong>ghc-hotswap-so</strong></a> - User-written code that lives in the shared object</p></li>
<li><p><a href="https://github.com/fbsamples/ghc-hotswap/tree/master/ghc-hotswap-demo"><strong>ghc-hotswap-demo</strong></a> - User-written application utilizing the above</p></li>
</ul>
<h3 id="loading-and-extracting-from-the-shared-object">Loading and extracting from the shared object</h3>
<p>Let’s start with bringing in a new shared object, the guts of which can be found in <a href="https://github.com/fbsamples/ghc-hotswap/blob/master/ghc-hotswap/GHC/Hotswap.hs">loadNewSO</a>. It makes heavy use of the <a href="https://github.com/ghc/ghc/blob/master/libraries/ghci/GHCi/ObjLink.hs">GHCi.ObjLink</a> library.
We need the name of an exported symbol to look up inside the shared object (<code>symName</code>) and the file path to where the shared object lives (<code>newSO</code>). With these, we can return an instance of some data that originates from that shared object.</p>
<pre><code>initObjLinker DontRetainCAFs</code></pre>
<p>GHCi’s linker needs to be initialized before use, and fortunately the call is idempotent. “DontRetainCAFs” tells the linker and GC not to retain CAFs (Constant Applicative Forms, i.e. top-level values) in the shared object. GHCi normally retains all CAFs as the user can type an expression that refers to anything at all, but for hot-swapping this would prevent the object from being unloaded as we would have references into the object from the heap-resident CAFs.</p>
<pre><code>loadObj newSO
resolved &lt;- resolveObjs
unless resolved $
  ...</code></pre>
<p>This maps the shared object into the memory of the main process, brings the shared object’s symbols into GHCi’s symbol table, and ensures any undefined symbols in the SO are present in the main binary. If any of these fail, an exception is thrown.</p>
<pre><code>c_sym &lt;- lookupSymbol symName</code></pre>
<p>Here we ask GHCi’s symbol table whether the given name exists; if it does, we get back a pointer to that symbol.</p>
<pre><code>h &lt;- case c_sym of
  Nothing -&gt; throwIO ...
  Just p_sym -&gt;
    bracket (callExport $ castPtrToFunPtr p_sym) freeStablePtr deRefStablePtr</code></pre>
<p>When we get a pointer to the symbol (<code>Just p_sym</code>), a couple of things happen. We know that the underlying symbol is a function (as we’ll ensure later), so we cast it to a function pointer. A <code>FunPtr</code> doesn’t do us much good on its own, so we use <code>callExport</code> to turn it into a callable Haskell function and then execute it. This call is the first thing to run code originating from the shared object. Since our call returns a <code>StablePtr a</code>, we dereference and then free the stable pointer, resulting in our value of type <code>a</code> from the shared object.</p>
<p>We want to query the shared object and get a Haskell value back. The best way to do that safely and without baking in too much low-level knowledge is for the shared object to expose a function using <code>foreign export</code>. The Haskell value must therefore be returned wrapped in a <code>StablePtr</code>, and so we have to get at the value itself using <code>deRefStablePtr</code>, before finally releasing the <code>StablePtr</code> with <code>freeStablePtr</code>.</p>
<pre><code>purgeObj newSO
return h</code></pre>
<p>Assuming everything has gone well, we purge GHCi’s symbol table of all symbols defined from our shared object and then return the value we retrieved. Purging the symbols makes room for the next shared object to come in and resolve successfully without fully unloading the shared object that we’re actively holding references to. We could tell GHCi to unload the shared object at this point, but this would cause the GC to aggressively crawl the entire shared object every single time, which is a lot of unnecessary work. Purging retains the code in the process to make the GC’s work lighter while making room for the next shared object. See <em>Safely Transition Updates</em> for when to unload the shared object.</p>
<p>The project that defines the code for the shared object must be generated in a relocatable fashion. It must be configured with the <code>--enable-library-for-ghci</code> flag, otherwise <code>loadObj</code> and <code>resolveObjs</code> will throw a fit.</p>
<h3 id="defining-the-shared-objects-api">Defining the shared object’s API</h3>
<p>During compilation, the function names from code turn into quasi-human-readable symbol names. Ensuring you look up the correct symbol name from a shared object can become brittle if you rely on hardcoded munged names. To mitigate this, we define a single data type to house all the symbols we want to expose to the main application, and export a function that returns it using a <code>foreign export ccall</code> declaration from Haskell’s FFI. This guarantees we can export a particular symbol with a name we control.
By placing all our data behind a single symbol (that both the shared object and main binary can depend on), we reduce the coupling to only a couple of points.</p>
<p>Let’s look at <a href="https://github.com/fbsamples/ghc-hotswap/blob/master/ghc-hotswap-types/Types.hs">Types.hs</a>.</p>
<pre><code>data SOHandles = SOHandles
  { someData :: Text
  , someFn :: Int -&gt; IO ()
  } deriving (Generic, NFData)</code></pre>
<p>Here’s our common structure for everything we want to expose out of the shared object. Notice that you can put constants, like <code>someData</code>, as well as full functions to execute, like <code>someFn</code>.</p>
<pre><code>type SOHandleExport = IO (StablePtr SOHandles)</code></pre>
<p>This defines the type of the extraction function that the main binary will run to get an instance of the handles from the shared object.</p>
<pre><code>foreign import ccall &quot;dynamic&quot;
  callExport :: FunPtr SOHandleExport -&gt; SOHandleExport</code></pre>
<p>Here we use Haskell’s FFI to turn a function pointer to our export function into an ordinary Haskell function that we can call. The “dynamic” form of <code>foreign import ccall</code> <a href="https://www.haskell.org/onlinereport/haskell2010/haskellch8.html">does exactly this</a>. We used it earlier when loading the shared object.</p>
<p>Next let’s look at code for the <a href="https://github.com/fbsamples/ghc-hotswap/blob/master/ghc-hotswap-so/SO/Handles.hs">shared object itself</a>.
Note that we depend on and import the <code>Types</code> module defined in <code>ghc-hotswap-types</code>.</p>
<pre><code>foreign export ccall &quot;hs_soHandles&quot;
  hsNewSOHandle :: SOHandleExport</code></pre>
<p>This uses the FFI to explicitly export a Haskell function called <code>hsNewSOHandle</code> under the symbol name <code>hs_soHandles</code>. This is the function our main binary will end up calling, so we give it the type of our export function, <code>SOHandleExport</code>.</p>
<pre><code>hsNewSOHandle = newStablePtr SOHandles
  { ...
  }</code></pre>
<p>In our definition of this function, we return a stable pointer to an instance of our data type, which the main application will end up reading.</p>
<p>Using these common types, we’ve reduced the coupling to two points: the main binary calls the export through <code>callExport</code>, and the shared object exports the symbol as “hs_soHandles”. <code>loadNewSO</code> brings the two together.</p>
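<p>To see the whole handshake in one place, here is a minimal, self-contained sketch that runs in a single process. It assumes a simplified <code>SOHandles</code> (a <code>String</code> instead of <code>Text</code>, no <code>NFData</code>). Since there is no separate object file, it fakes the shared object’s export with a “wrapper” import that turns a Haskell action into a <code>FunPtr</code>; in the real setup that pointer comes from looking up <code>hs_soHandles</code> in the loaded object. <code>mkExport</code> and <code>extract</code> are hypothetical names.</p>
<pre><code>-- Sketch: simulates the shared-object handshake inside one process.
import Control.Exception (bracket)
import Foreign.Ptr (FunPtr, freeHaskellFunPtr)
import Foreign.StablePtr (StablePtr, deRefStablePtr, freeStablePtr, newStablePtr)

-- Simplified stand-in for the SOHandles type in Types.hs.
data SOHandles = SOHandles
  { someData :: String
  , someFn :: Int -&gt; IO ()
  }

type SOHandleExport = IO (StablePtr SOHandles)

-- As in Types.hs: call a function pointer as a Haskell function.
foreign import ccall &quot;dynamic&quot;
  callExport :: FunPtr SOHandleExport -&gt; SOHandleExport

-- Hypothetical: manufacture the function pointer that, in the real
-- setup, would come from looking up hs_soHandles in the loaded object.
foreign import ccall &quot;wrapper&quot;
  mkExport :: SOHandleExport -&gt; IO (FunPtr SOHandleExport)

-- The shared object's side: return the handles in a StablePtr.
hsNewSOHandle :: SOHandleExport
hsNewSOHandle = newStablePtr SOHandles
  { someData = &quot;hello from the shared object&quot;
  , someFn = \n -&gt; putStrLn (&quot;someFn called with &quot; ++ show n)
  }

-- The main binary's side: call the export, read the value out of the
-- StablePtr, and free the StablePtr even if dereferencing fails.
extract :: FunPtr SOHandleExport -&gt; IO SOHandles
extract fp = bracket (callExport fp) freeStablePtr deRefStablePtr

main :: IO ()
main = do
  fp &lt;- mkExport hsNewSOHandle
  handles &lt;- extract fp
  freeHaskellFunPtr fp
  putStrLn (someData handles)
  someFn handles 42</code></pre>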
<h3 id="safely-transition-updates">Safely Transition Updates</h3>
<p>With some extra care, we can cleanly transition to new shared objects while minimizing the amount of work the GC needs to do.</p>
<p>Let’s look closer at <a href="https://github.com/fbsamples/ghc-hotswap/blob/master/ghc-hotswap/GHC/Hotswap.hs">Hotswap.hs</a>.</p>
<p><code>registerHotswap</code> uses <code>loadNewSO</code> to load the first shared object and then provides some accessor functions on the data extracted. We save some state associated with the shared object: the path to the object, the value we extract, as well as a lock to keep track of usage.</p>
<p>The <code>unWrap</code> function reads the state for the latest shared object and runs a user-supplied action on the extracted value. Wrapping the user function in the read lock ensures we won’t accidentally try to remove the underlying code while it’s actively being used. Without this, we run the risk of creating unnecessary stress on the GC.</p>
<p>The updater function (<code>updateState</code>) assumes we already have one shared object mapped into memory with its symbol table purged.</p>
<pre><code>newVal &lt;- force &lt;$&gt; loadNewSO dynamicCall symbolName nextPath</code></pre>
<p>We first attempt to load in the next shared object located at <code>nextPath</code>, using the same export call and symbol name as before. At this point we actually have two shared objects mapped into memory at the same time; one being the old object that’s actively being used and the other being the new object with our desired updates.</p>
<p>Next we build some state associated with this object, and swap it into our state <code>MVar</code>.</p>
<pre><code>oldState &lt;- swapMVar mvar newState</code></pre>
<p>After this call, any user that uses <code>unWrap</code> will get the new version of code that was just loaded up. This is when we would observe the update being “live” in our application.</p>
<pre><code>L.withWrite (lock oldState) $
  unloadObj (path oldState)</code></pre>
<p>Here we finally ask the GC to unload the old object. Once the write lock is obtained, no readers are present, so nothing can be running code from this old shared object (unless someone is nefariously holding onto some state). Calling <code>unloadObj</code> doesn’t immediately unmap the object; it only tells the GC that the object may be unloaded. At the next major GC, the runtime checks that nothing references anything in that shared object, and if nothing does, unloads it completely.</p>
<p>At this point we now have only the next shared object mapped in memory and being used in the main application.</p>
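<p>The load, swap, and unload steps above can be sketched in a self-contained way. This is not the ghc-hotswap code: it replaces the <code>MVar</code> plus read/write lock with an STM <code>TVar</code> holding a per-object reader count (from the <code>stm</code> package, which ships with GHC), and it takes hypothetical <code>load</code> and <code>unload</code> functions as parameters so it runs without the GHCi linker. In ghc-hotswap their roles are played by <code>loadNewSO</code> and <code>unloadObj</code>. Field names such as <code>soReaders</code> are made up for the illustration.</p>
<pre><code>import Control.Concurrent.STM
import Control.Exception (bracket, evaluate)

-- State for one loaded object (hypothetical field names).
data SOState a = SOState
  { soPath :: FilePath
  , soValue :: a
  , soReaders :: TVar Int -- unWrap calls currently using this object
  }

newtype Hotswap a = Hotswap (TVar (SOState a))

newState :: FilePath -&gt; a -&gt; IO (SOState a)
newState path v = SOState path v &lt;$&gt; newTVarIO 0

registerHotswap :: (FilePath -&gt; IO a) -&gt; FilePath -&gt; IO (Hotswap a)
registerHotswap load path = do
  v &lt;- load path &gt;&gt;= evaluate -- WHNF only; ghc-hotswap uses force
  st &lt;- newState path v
  Hotswap &lt;$&gt; newTVarIO st

-- Reading the current state and registering as a reader happen in one
-- transaction, so nobody can start using an object after it is swapped out.
unWrap :: Hotswap a -&gt; (a -&gt; IO b) -&gt; IO b
unWrap (Hotswap cur) act = bracket acquire release (act . soValue)
  where
    acquire = atomically $ do
      st &lt;- readTVar cur
      modifyTVar' (soReaders st) (+ 1)
      return st
    release st = atomically $ modifyTVar' (soReaders st) (subtract 1)

updateState :: (FilePath -&gt; IO a) -&gt; (FilePath -&gt; IO ()) -&gt; Hotswap a -&gt; FilePath -&gt; IO ()
updateState load unload (Hotswap cur) nextPath = do
  newVal &lt;- load nextPath &gt;&gt;= evaluate -- both objects are now mapped
  new &lt;- newState nextPath newVal
  old &lt;- atomically $ do -- the update goes live here
    o &lt;- readTVar cur
    writeTVar cur new
    return o
  -- Plays the role of the write lock: wait until no reader uses the old object.
  atomically $ readTVar (soReaders old) &gt;&gt;= check . (== 0)
  unload (soPath old)

main :: IO ()
main = do
  let load p = putStrLn (&quot;load &quot; ++ p) &gt;&gt; return (length p)
      unload p = putStrLn (&quot;unload &quot; ++ p)
  h &lt;- registerHotswap load &quot;v1.o&quot;
  unWrap h print
  updateState load unload h &quot;v2-patched.o&quot;
  unWrap h print</code></pre>
<p>Registering as a reader inside the same transaction that reads the current state closes the window in which a reader could pick up the old object just before the swap and start using it after the writer has decided it is idle.</p>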
<h2 id="shortcomings-future-work">Shortcomings / Future work</h2>
<h3 id="beware-sticky-shared-objects">Beware sticky shared objects</h3>
<p>The trickiest problem we’ve come across has been when the GC doesn’t want to drop old shared objects. Eventually so many shared objects are linked at once that the process runs out of space to load in a new object, stalling all updates until the process is restarted. We’ll call this problem <em>shared object retention</em>, or just <em>retention</em>.</p>
<p>An object is unloaded when (a) we’ve called <code>unloadObj</code> on it, and (b) the GC determines that there are no references from heap data into the object. Retention can therefore only happen if we have some persistent data that lives across a shared object swap. Obviously it’s better if you can avoid this, but sometimes it’s necessary: e.g. in Sigma the persistent data consists of the pre-initialized data sources that we use with the Haxl monad, amongst other things. The first step in avoiding retention is to be very clear about what this data is, and to fully audit it.</p>
<p>For retention to occur, the persistent data must be mutable in some way (e.g. contain an <code>IORef</code>), and we must write something into the persistent <code>IORef</code> while executing code from the shared object. The data we wrote into the <code>IORef</code> can end up referring to the shared object in two ways:</p>
<ul>
<li><p>If it contains a thunk or a function, these will refer to code in the shared object.</p></li>
<li><p>If it contains data where the datatype is defined in the shared object (rather than in the packages that the object depends on, which are statically linked), then again we have a reference from the heap-resident data into the shared object, which will cause retention.</p></li>
</ul>
<p>So to avoid retention while having mutable persistent data, the rules of thumb are:</p>
<ol type="1">
<li><p><code>rnf</code> everything before writing into the persistent <code>IORef</code>, and ensure that any manual <code>NFData</code> instances don’t lie.</p></li>
<li><p>Don’t store values that contain functions.</p></li>
<li><p>Don’t store values that use datatypes defined in the shared object.</p></li>
</ol>
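<p>Here is a minimal sketch of what following those rules might look like, assuming the persistent state is a hypothetical counter map held in an <code>IORef</code>. It uses the <code>deepseq</code> and <code>containers</code> packages, both of which ship with GHC; <code>modifyForced</code> is a made-up helper name.</p>
<pre><code>import Control.DeepSeq (NFData, force)
import Data.IORef
import qualified Data.Map.Strict as Map

-- Rules 2 and 3: plain data (no functions) whose types come from
-- statically linked packages (containers, base), not from the shared object.
type Persistent = IORef (Map.Map String Int)

-- Rule 1: atomicModifyIORef' evaluates the new contents to WHNF, and
-- WHNF of (force m) means m is fully evaluated, so no thunk pointing
-- at code in the shared object is left in the IORef.
modifyForced :: NFData a =&gt; IORef a -&gt; (a -&gt; a) -&gt; IO ()
modifyForced ref f = atomicModifyIORef' ref (\old -&gt; (force (f old), ()))

main :: IO ()
main = do
  store &lt;- newIORef Map.empty :: IO Persistent
  modifyForced store (Map.insertWith (+) &quot;requests&quot; 1)
  modifyForced store (Map.insertWith (+) &quot;requests&quot; 1)
  readIORef store &gt;&gt;= print</code></pre>
<p>As rule 1 warns, this is only as good as the <code>NFData</code> instances involved: a hand-written instance that skips a field would leave a thunk behind.</p>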
<p>Debugging retention problems can be really hard, involving attaching to the process with gdb and then following the offending references from the heap. We hope that the new DWARF support in GHC 8.2 will be able to help here.</p>
<h3 id="linker-addressable-memory-is-limited">Linker addressable memory is limited</h3>
<p>Calling the built file a shared object is a bit of a misnomer, as it isn’t compiled with <code>-fPIC</code> and is actually just an object file. Files like these can only be loaded into the lower 2GB of memory (the x86_64 small memory model uses 32-bit relative jumps), which can become restrictive when your object file gets large. Since the update mechanism relies on having multiple objects in memory at the same time, fragmentation of the mappable address space can become a problem. We’ve already made a few improvements to the GHCi linker to reduce the impact of these problems, but we’re running out of options.</p>
<p>Ideally we’d switch to using true shared objects (built with <code>-fPIC</code>) to remove this limitation. It requires some work to get there, though: GHC’s dynamic linking support is designed to support a model where each package is in a separate shared library, whereas we want a mixed static/dynamic model.</p>
<p><a href="https://github.com/JonCoens"><em>Jon Coens</em></a></p>
</div>
]]></summary>
</entry>
<entry>
    <title>Asynchronous Exceptions in Practice</title>
    <link href="https://simonmar.github.io/posts/2017-01-24-asynchronous-exceptions.html" />
    <id>https://simonmar.github.io/posts/2017-01-24-asynchronous-exceptions.html</id>
    <published>2017-01-24T00:00:00Z</published>
    <updated>2017-01-24T00:00:00Z</updated>
    <summary type="html"><![CDATA[<div class="post">
  <h1 class="post-title">Asynchronous Exceptions in Practice</h1>
  <span class="post-date">January 24, 2017</span>
  <p>Asynchronous exceptions are a controversial feature of Haskell. You
can throw an exception to another thread, at any time; all you need is
its <code>ThreadId</code>:</p>
<pre><code>throwTo :: Exception e =&gt; ThreadId -&gt; e -&gt; IO ()</code></pre>
<p>The other thread will receive the exception immediately, whatever it
is doing. So you have to be ready for an asynchronous exception to
fire at any point in your code. Isn’t that a scary thought?</p>
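<p>A minimal example of throwing to another thread, using only <code>base</code>. The main thread kills a sleeping worker; <code>forkFinally</code> makes sure we learn how the worker ended even when it is interrupted. The timings are arbitrary.</p>
<pre><code>import Control.Concurrent
import Control.Exception

main :: IO ()
main = do
  done &lt;- newEmptyMVar
  -- The worker would sleep for 10 seconds; forkFinally runs the
  -- handler however the worker exits, including by an async exception.
  tid &lt;- forkFinally (threadDelay 10000000 &gt;&gt; return &quot;finished normally&quot;)
                     (putMVar done)
  threadDelay 100000
  -- throwTo does not return until the exception has been raised in tid.
  throwTo tid ThreadKilled
  r &lt;- takeMVar done
  case r of
    Left e -&gt; putStrLn (&quot;worker interrupted: &quot; ++ show e)
    Right s -&gt; putStrLn s</code></pre>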
<p>It’s an old idea - in fact, when we originally added asynchronous
exceptions to Haskell (and wrote <a href="http://simonmar.github.io/bib/papers/async.pdf">a paper</a> about it), it
was shortly after Java had removed the equivalent feature, because it
was impossible to program with.</p>
<p>So how do we get away with it in Haskell? I wrote a little about the
rationale in <a
href="http://chimera.labs.oreilly.com/books/1230000000929/ch09.html">my
book</a>. Basically it comes down to this: if we want to be able to
interrupt purely functional code, asynchronous exceptions are the only
way, because polling would be a side-effect. Therefore the remaining
problem is how to make asynchronous exceptions safe for the impure
parts of our code. Haskell provides functionality for disabling
asynchronous exceptions during critical sections (<code>mask</code>) and
abstractions based around it that can be used for safe resource
acquisition (<code>bracket</code>).</p>
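<p>To make that concrete, here is a sketch of how <code>bracket</code> can be built from <code>mask</code>; it closely mirrors the definition in <code>base</code>. The name <code>myBracket</code> and the dummy resource are ours, for illustration.</p>
<pre><code>import Control.Exception

myBracket :: IO a -&gt; (a -&gt; IO b) -&gt; (a -&gt; IO c) -&gt; IO c
myBracket acquire release use =
  mask $ \restore -&gt; do
    -- Async exceptions are masked here, except at interruptible
    -- (blocking) operations, so we can't be interrupted between
    -- acquiring the resource and installing the handler.
    a &lt;- acquire
    -- Only the body runs unmasked; release runs on every exit path.
    r &lt;- restore (use a) `onException` release a
    _ &lt;- release a
    return r

main :: IO ()
main = do
  r &lt;- try $ myBracket (putStrLn &quot;acquire&quot; &gt;&gt; return &quot;resource&quot;)
                       (\_ -&gt; putStrLn &quot;release&quot;)
                       (\_ -&gt; throwIO (userError &quot;boom&quot;))
  print (r :: Either IOException ())</code></pre>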
<p>At Facebook I’ve had the opportunity to work with asynchronous
exceptions in a large-scale real-world setting, and here’s what I’ve
learned:</p>
<ul>
<li><p>They’re really useful, particularly for catching bugs that cause
excessive use of resources.</p></li>
<li><p>In the vast majority of our Haskell codebase we don’t need to worry
about them at all. The documentation that we give to our users who
write Haskell code to run on our platform doesn’t mention
asynchronous exceptions.</p></li>
<li><p>But some parts of the code can be <em>really hard to get right</em>. Code
in the <code>IO</code> monad dealing with multithreading or talking to foreign
libraries, for example, has to care about cleaning up resources and
recovering safely in the event of an asynchronous exception.</p></li>
</ul>
<p>Let me take each of those points in turn and elaborate.</p>
<h2 id="where-asynchronous-exceptions-are-useful">Where asynchronous exceptions are useful</h2>
<p>The motivating example often used is timeouts, for example of
connections in a network service. But this example is not all that
convincing: in a network server we’re probably writing code that’s
mostly in the <code>IO</code> monad, we know the places where we’re blocking, and
we could use other mechanisms to implement timeouts that would be less
“dangerous” but almost as reliable as asynchronous exceptions.</p>
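<p>For reference, <code>base</code> already provides timeouts built on asynchronous exceptions: <code>System.Timeout.timeout</code> arranges for an exception to be thrown to the calling thread if the action runs too long. A minimal sketch, where <code>slowHandler</code> is a hypothetical stand-in for a slow network request:</p>
<pre><code>import Control.Concurrent (threadDelay)
import System.Timeout (timeout)

-- Hypothetical slow request handler: takes 2 seconds.
slowHandler :: IO String
slowHandler = threadDelay 2000000 &gt;&gt; return &quot;response&quot;

main :: IO ()
main = do
  -- The limit is in microseconds (0.5 s here). If it expires, timeout
  -- interrupts slowHandler with an async exception and returns Nothing.
  r &lt;- timeout 500000 slowHandler
  putStrLn (maybe &quot;request timed out&quot; id r)</code></pre>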
<p>In <a
href="https://code.facebook.com/posts/745068642270222/fighting-spam-with-haskell/">Sigma</a>,
we use asynchronous exceptions to
prevent huge requests from degrading the performance of our server
for other clients.</p>
<p>In a complex system, it’s highly likely that some requests will end up
using an excessive amount of resources. Perhaps there’s a bug in the
code that sometimes causes it to use a lot of CPU (or even an infinite
loop), or perhaps the code fetches some data to operate on, and the
data ends up being unexpectedly large. In principle we could find all
these cases and fix them, but in practice, large systems can have
surprising emergent behaviour and we can’t guarantee to find all the
bugs outside production.</p>
<h3 id="beware-elephants">Beware Elephants</h3>
<p>So sometimes a request turns out to be an elephant, and we have to
deal with it. If we do nothing, the elephant will trample around,
slowing everything down, or maxing out some resource like memory or
network bandwidth, which can cause failures for other requests
running on the system.</p>
<p><img src="/images/elephant.jpg" /></p>
<p>One way or another something is going to die. We would rather it was
the elephant, and not the many other requests currently running on the
same machine. Stopping the elephant minimises the destruction. The
elephant’s owner will then fix their problem, and we’ve mitigated a
bug with minimal disruption.</p>
<p>Our elephant gun is called <em>Allocation Limits</em>. The Haskell runtime
keeps track of how much memory each Haskell thread has allocated in
total, and if that total exceeds the limit we set, the thread receives
an asynchronous exception, namely <code>AllocationLimitExceeded</code>. The user
code running on our platform is not permitted to catch this exception,
instead the server catches it, logs some data to aid debugging, and
sends an error back to the client that initiated the request.</p>
<p>We’re using “memory allocated” as a proxy for “work done”. Most
computation in Haskell allocates memory, so this is a more predictable
measure than wall-clock time. It’s a fairly crude way to identify
excessively large requests, but it works well for us.</p>
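<p>The runtime exposes allocation limits through <code>System.Mem</code>. Below is a minimal sketch of the general idea, not Sigma’s code: a hypothetical <code>runWithAllocLimit</code> helper runs a request in its own thread with a limit in bytes, and, as in the setup described above, the caller rather than the request catches <code>AllocationLimitExceeded</code>. The 10 MB limit and the summing “elephant” are arbitrary choices for illustration.</p>
<pre><code>import Control.Concurrent
import Control.Exception
import Data.Int (Int64)
import System.Mem (enableAllocationLimit, setAllocationCounter)

runWithAllocLimit :: Int64 -&gt; IO a -&gt; IO (Either SomeException a)
runWithAllocLimit limitBytes request = do
  result &lt;- newEmptyMVar
  _ &lt;- forkFinally
         (do setAllocationCounter limitBytes -- the counter counts down
             enableAllocationLimit -- throw when it drops below zero
             request)
         (putMVar result)
  takeMVar result

main :: IO ()
main = do
  -- An &quot;elephant&quot;: sums ten million Integers, allocating as it goes.
  r &lt;- runWithAllocLimit (10 * 1024 * 1024)
                         (evaluate (sum [1 .. 10000000 :: Integer]))
  case r of
    Left e | Just AllocationLimitExceeded &lt;- fromException e -&gt;
      putStrLn &quot;request killed: allocation limit exceeded&quot;
    Left e -&gt; putStrLn (&quot;request failed: &quot; ++ show e)
    Right n -&gt; print n</code></pre>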
<p>Here’s what happened when we enabled allocation limits early on during
Sigma’s development. The graph tracks the maximum amount of live memory
across different groups of machines. It turns out that a very small
fraction of requests were consuming a huge amount of
resources, and enabling allocation limits squashed them nicely:</p>
<p><img src="/images/alloclimits.jpg" /></p>
<p>Allocation limits have helped protect us from disaster on several
occasions. One time, an infinite loop made its way into production;
the result was that our monitoring showed an increase in requests
hitting the allocation limit. The logged data allowed us to narrow it
down to one particular type of request, so we were quickly able
to identify the change that caused the problem, undo it, and notify
the owner. Nobody else noticed.</p>
<h2 id="in-the-vast-majority-of-code-we-dont-need-to-worry-about-asynchronous-exceptions">In the vast majority of code, we don’t need to worry about asynchronous exceptions</h2>
<p>Because you don’t have to poll for asynchronous exceptions, they
work almost everywhere. All pure code works with asynchronous
exceptions without change.</p>
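<p>As a small illustration (a sketch, not code from our platform), the pure value below knows nothing about exceptions, yet <code>timeout</code> can still interrupt it while <code>evaluate</code> is forcing it. One caveat: GHC delivers asynchronous exceptions when the running thread reaches a safe point such as a heap check, so this relies on the computation allocating, as the boxed <code>Integer</code> arithmetic here does; a tight loop that never allocates may need compiling with <code>-fno-omit-yields</code>.</p>
<pre><code>import Control.Exception (evaluate)
import System.Timeout (timeout)

-- Ordinary pure code: no IO, no polling. Summing a trillion Integers
-- takes far longer than the timeout used below.
bigSum :: Integer
bigSum = sum [1 .. 10 ^ (12 :: Int)]

main :: IO ()
main = do
  r &lt;- timeout 100000 (evaluate bigSum) -- give up after 0.1 s
  putStrLn (maybe &quot;pure computation interrupted&quot; show r)</code></pre>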
<p>In our platform, clients write code on top of the <a href="https://github.com/facebook/Haxl">Haxl</a>
framework in which I/O is provided only via a fixed set of APIs that
we control, so we can guarantee that those APIs are safe, and
therefore all of the client code is safe by virtue of abstraction.</p>
<h2 id="some-parts-of-the-code-can-be-really-hard-to-get-right">Some parts of the code can be <em>really hard to get right</em></h2>
<p>That leaves the parts of the code that implement the I/O libraries and
other lower level functionality. These are the places where we have
to care about asynchronous exceptions: if an async exception fires
when we have just opened a connection to a remote server, we have to
close it again and free all the resources associated with the
connection, for example.</p>
<p>In principle, you can follow a few guidelines to be safe.</p>
<ul>
<li><p>Use <code>bracket</code> when allocating any kind of resource that needs to be
explicitly released. This is not specific to asynchronous
exceptions: coping with ordinary synchronous exceptions
requires a good resource-allocation discipline, so your code should
be using <code>bracket</code> anyway.</p></li>
<li><p>Use the <code>async</code> package, which avoids some of the common
problems; for example, it forks threads inside <code>mask</code> so that
asynchronous exceptions can’t leak.</p></li>
</ul>
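<p>The “fork inside <code>mask</code>” point is easier to see in code. Here is a sketch of the pattern using only <code>base</code> (it is essentially what <code>forkFinally</code> does); <code>forkWithResult</code> is a hypothetical name.</p>
<pre><code>import Control.Concurrent
import Control.Exception

-- The child inherits the masked state from forkIO, so its try is in
-- place before any async exception can be delivered; restore unmasks
-- only the user's action. The MVar is therefore always filled.
forkWithResult :: IO a -&gt; IO (MVar (Either SomeException a))
forkWithResult action = do
  var &lt;- newEmptyMVar
  _ &lt;- mask $ \restore -&gt;
         forkIO (try (restore action) &gt;&gt;= putMVar var)
  return var

main :: IO ()
main = do
  var &lt;- forkWithResult (evaluate (length [1 .. 1000000 :: Int]))
  r &lt;- takeMVar var
  either (\e -&gt; putStrLn (&quot;failed: &quot; ++ show e)) print r</code></pre>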
<p>Nevertheless it’s still possible to go wrong. Here are some ways:</p>
<ul>
<li><p>If you want asynchronous exceptions to work, be careful you don’t
accidentally run inside <code>mask</code>, or <code>uninterruptibleMask</code>. We’ve seen
examples of third-party libraries that run callbacks inside <code>mask</code>
(e.g. the <code>hinotify</code> library <a
href="https://github.com/kolmodin/hinotify/pull/22">until
recently</a>). Use <code>getMaskingState</code> to assert that you’re not
masked when you don’t want to be.</p></li>
<li><p>Be careful that those asynchronous exceptions don’t escape from a
thread if the thread is created by calling a <code>foreign export</code>,
because uncaught exceptions will terminate the whole process.
Unlike when using <code>async</code>, a <code>foreign export</code> can’t be created
inside <code>mask</code> (this is something that should be fixed in GHC,
really).</p></li>
<li><p>Catching all exceptions seems like a good idea when you want to be
bullet-proof, but if you catch and discard the <code>ThreadKilled</code>
exception it becomes really hard to actually kill that thread.</p></li>
<li><p>If you’re coordinating with some foreign code and the Haskell code
gets an asynchronous exception, make sure that the foreign code will
also clean up properly.</p></li>
</ul>
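<p>Two of those checks are easy to encode as helpers. Here is a sketch using only <code>base</code>; the names <code>assertUnmasked</code> and <code>catchSync</code> are ours, for illustration. The second helper catches everything except asynchronous exceptions such as <code>ThreadKilled</code>, which it re-throws so the thread can still be killed.</p>
<pre><code>import Control.Exception
import Control.Monad (unless)

-- Fail loudly if we're masked when we don't expect to be.
assertUnmasked :: String -&gt; IO ()
assertUnmasked what = do
  st &lt;- getMaskingState
  unless (st == Unmasked) $
    error (what ++ &quot;: unexpectedly running with &quot; ++ show st)

-- A catch-all for synchronous exceptions only: anything wrapped in
-- SomeAsyncException (ThreadKilled, for example) is re-thrown.
catchSync :: IO a -&gt; (SomeException -&gt; IO a) -&gt; IO a
catchSync action handler = action `catch` \e -&gt;
  case fromException e of
    Just (SomeAsyncException _) -&gt; throwIO e
    Nothing -&gt; handler e

main :: IO ()
main = do
  assertUnmasked &quot;main&quot;
  r &lt;- catchSync (evaluate (1 `div` (0 :: Int))) (\_ -&gt; return 0)
  print r
  mask_ (assertUnmasked &quot;inside mask_&quot;) `catch` \(ErrorCall msg) -&gt; putStrLn msg</code></pre>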
<p>The type system is of no help at all with finding these bugs; the only
way you can find them is with careful eyeballs, good abstractions,
lots of testing, and plenty of assertions.</p>
<h2 id="its-worth-it">It’s worth it</h2>
<p>My claim is, even though some of the low-level code can be hard to get
right, the benefits are worth it.</p>
<p>Asynchronous exceptions generalise several exceptional conditions that
relate to resource consumption: stack overflow, timeouts, allocation
limits, and heap overflow exceptions. We only have to make our code
asynchronous-exception-safe once, and it’ll work with all these
different kinds of errors. What’s more, being able to terminate
threads with confidence that they will clean up promptly and exit is
really useful. (It would be nice to do a comparison with Erlang here,
but not having written a lot of this kind of code in Erlang I can’t
speak with any authority.)</p>
<p>In a high-volume network service, having a guarantee that a class of
runaway requests will be caught and killed off can help reliability,
and give you breathing room when things go wrong.</p>
</div>
]]></summary>
</entry>
<entry>
    <title>Haskell in the Datacentre</title>
    <link href="https://simonmar.github.io/posts/2016-12-08-Haskell-in-the-datacentre.html" />
    <id>https://simonmar.github.io/posts/2016-12-08-Haskell-in-the-datacentre.html</id>
    <published>2016-12-08T00:00:00Z</published>
    <updated>2016-12-08T00:00:00Z</updated>
    <summary type="html"><![CDATA[<div class="post">
  <h1 class="post-title">Haskell in the Datacentre</h1>
  <span class="post-date">December  8, 2016</span>
  <p>At Facebook we run Haskell on thousands of servers, together handling
over a million requests per second. Obviously we’d like to make the
most efficient use of hardware and get the most throughput per server
that we can. So how do you tune a Haskell-based server to run well?</p>
<p>Over the past few months we’ve been tuning our server to squeeze out
as much performance as we can per machine, and this has involved
changes throughout the stack. In this post I’ll tell you about some
changes we made to GHC’s runtime scheduler.</p>
<h2 id="summary">Summary</h2>
<p>We made one primary change: GHC’s runtime is based around an M:N
threading model which is designed to map a large number (M) of
lightweight Haskell threads onto a small number (N) of heavyweight OS
threads. In our application M is fixed and not all that big: we can
max out a server’s resources when M is about 3-4x the number of
cores, and meanwhile setting N to the number of cores wasn’t enough to
let us use all the CPU (I’ll explain why shortly).</p>
<p>To cut to the chase, we ended up increasing N to be the same as M (or
close to it), and this bought us an extra 10-20% throughput per
machine. It wasn’t as simple as just setting some command-line
options, because GHC’s garbage collector is designed to run with N
equal to the number of cores, so I had to make some changes to the way
GHC schedules things to make this work.</p>
<p>All these improvements are <a
href="https://phabricator.haskell.org/rGHC76ee260778991367b8dbf07ecf7afd31f826c824">upstream</a>
<a
href="https://phabricator.haskell.org/rGHCf703fd6b50f0ae58bc5f5ddb927a2ce28eeaddf6">in</a>
<a
href="https://phabricator.haskell.org/rGHCe68195a96529cf1cc2d9cc6a9bc05183fce5ecea">GHC</a>,
and they’ll be available in GHC 8.2.1, due early 2017.</p>
<h2 id="background-capabilities">Background: Capabilities</h2>
<p>When the GHC runtime starts, it creates a number of <em>capabilities</em>
(also sometimes called HEC, for Haskell Execution Context). The
number of capabilities is determined by the <code>-N</code> flag when you start
the Haskell program, e.g. <code>prog +RTS -N4</code> would run <code>prog</code> with 4
capabilities.</p>
<p>A capability is the <em>ability to run Haskell code</em>. It consists of
an allocation area (also called <em>nursery</em>) for allocating memory, a
queue of lightweight Haskell threads to run, and one or more OS
threads (called <em>workers</em>) that will run the Haskell code. Each
capability can run a single Haskell thread at a time; if the Haskell
thread blocks, the next Haskell thread in the queue runs, and so on.</p>
<p>Typically we choose the number of capabilities to be equal to the
number of physical cores on the machine. This makes sense: there is
no advantage in trying to run more Haskell threads simultaneously than
we have physical cores.</p>
<h2 id="how-our-server-maps-onto-this">How our server maps onto this</h2>
<p>Our system is based on the C++ Thrift server, which provides a fixed
set of worker threads that pull requests from a queue and execute
them. We choose the number of worker threads to be high enough that
we can fully utilize the server, but not so high that we create too
much contention and increase latency under maximum load.</p>
<p>Each worker thread calls into Haskell via a <code>foreign export</code> to do the
actual work. The GHC runtime then chooses a capability to run the
call. It normally picks an idle capability, and the call executes
immediately. If there are no idle capabilities, the call blocks on
the queue of a capability until the capability yields control to it.</p>
<h2 id="the-problem">The problem</h2>
<p>At high load, even though we have enough threads to fully utilize the
CPU cores, the intermediate layer of scheduling where GHC assigns
threads to capabilities means that we sometimes have threads idle that
could be running. Sometimes there are multiple runnable
workers on one capability while other capabilities are idle, and the
runtime takes a little while to load-balance during which time we’re
not using all the available CPU capacity.</p>
<p>Meanwhile the kernel is doing its own scheduling, trying to map those
OS threads onto CPUs. Obviously the kernel has a rather more
sophisticated scheduler than GHC and could do a better job of mapping
those M threads onto its N cores, but we aren’t letting it. In this
scenario, the extra layer of scheduling in GHC is just a drag on
performance.</p>
<h2 id="first-up-a-bug-in-the-load-balancer.">First up, a bug in the load-balancer.</h2>
<p>While investigating this I found a <a href="https://phabricator.haskell.org/rGHC1fa92ca9b1ed4cf44e2745830c9e9ccc2bee12d5">bug in the way GHC’s load-balancing
worked</a> - it could cause a large number of spurious wakeups of other
capabilities while load-balancing. Fixing this was worth a few
percent right away, but I had my sights set on larger gains.</p>
<h2 id="couldnt-we-just-increase-the-number-of-capabilities">Couldn’t we just increase the number of capabilities?</h2>
<p>Well yes, and of course we tried just bumping up the <code>-N</code> value, but
increasing <code>-N</code> beyond the number of cores just tends to increase CPU
usage without increasing throughput.</p>
<p>Why? Well, the problem is the garbage collector. The GC keeps all its
threads running trying to steal work from each other, and when we have
more threads than we have real cores, the spinning threads are
slowing down the threads doing the actual work.</p>
<h2 id="increasing-the-number-of-capabilities-without-slowing-down-gc">Increasing the number of capabilities without slowing down GC</h2>
<p>What we’d like to do is to have a larger set of mutator threads, but
only use a subset of those when it’s time to GC. That’s exactly what
this new flag does:</p>
<pre><code>+RTS -qn&lt;threads&gt;</code></pre>
<p>For example, on a 24-core machine you might use <code>+RTS -N48 -qn24</code> to
have 48 mutator threads, but only 24 threads during GC. This is great
for using hyperthreads too, because hyperthreads work well for the
mutator but not for the GC.</p>
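<p>To check what a running program actually got, something like the sketch below may help. It assumes a threaded build, and that your GHC’s <code>GHC.RTS.Flags</code> exposes the <code>-qn</code> setting as the <code>parGcThreads</code> field of <code>ParFlags</code>. What that field reports when <code>-qn</code> is not given explicitly may vary between GHC versions.</p>
<pre><code>-- Build with -threaded -rtsopts, then run e.g.: ./prog +RTS -N48 -qn24
import Control.Concurrent (getNumCapabilities)
import GHC.RTS.Flags (getParFlags, parGcThreads)

main :: IO ()
main = do
  caps &lt;- getNumCapabilities
  flags &lt;- getParFlags
  putStrLn (&quot;capabilities (-N): &quot; ++ show caps)
  putStrLn (&quot;GC threads (-qn): &quot; ++ show (parGcThreads flags))</code></pre>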
<p>Which threads does the runtime choose to do the GC? The scheduler has
a heuristic which looks at which capabilities are currently inactive
and chooses those to be idle, to avoid having to synchronise with
threads that are currently asleep.</p>
<h3 id="rts--qn-will-now-be-turned-on-by-default"><code>+RTS -qn</code> will now be turned on by default!</h3>
<p>This is a slight digression, but it turns out that setting <code>+RTS -qn</code>
to the number of CPU cores is always a good idea if <code>-N</code> is too large.
So the runtime will be <a
href="https://phabricator.haskell.org/rGHC6c47f2efa3f8f4639f375d34f54c01a60c9a1a82">doing
this by default from now on</a>. If <code>-N</code> accidentally gets set too
large, performance won’t drop quite so badly as it did with GHC 8.0
and earlier.</p>
<h2 id="capability-affinity">Capability affinity</h2>
<p>Now we can safely increase the number of capabilities well beyond the
number of real cores, provided we set a smaller number of GC threads
with <code>+RTS -qn</code>.</p>
<p>The final step that we took in Sigma is to map our server threads 1:1
with capabilities. When the C++ server thread calls into Haskell,
it immediately gets a capability, there’s never any blocking, and nor
does the GHC runtime need to do any load-balancing.</p>
<p>How is this done? There’s a new C API exposed by the RTS:</p>
<pre><code>void rts_setInCallCapability (int preferred_capability, int affinity);</code></pre>
<p>In each thread you call this to map that thread to a particular
capability. For example you might call it like this:</p>
<pre><code>static std::atomic&lt;int&gt; counter;
...
rts_setInCallCapability(counter.fetch_add(1), 0);</code></pre>
<p>And ensure that you call this once per thread. The <code>affinity</code>
argument is for binding a thread to a CPU core, which might be useful
if you’re also using GHC’s affinity setting (<code>+RTS -qa</code>). In our case
we haven’t found this to be useful.</p>
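<p>For lightweight Haskell threads, the nearest analogue in <code>base</code> is <code>forkOn</code>, which pins a thread to a given capability. The sketch below is illustrative only and is not how Sigma uses the C API: it starts one worker per capability and reports where each one runs. Build with <code>-threaded</code> and run with, say, <code>+RTS -N4</code>.</p>
<pre><code>import Control.Concurrent
import Control.Monad (forM, forM_)

main :: IO ()
main = do
  n &lt;- getNumCapabilities
  dones &lt;- forM [0 .. n - 1] $ \i -&gt; do
    done &lt;- newEmptyMVar
    -- Pin worker i to capability i (forkOn takes it modulo n).
    _ &lt;- forkOn i $ do
      (cap, pinned) &lt;- myThreadId &gt;&gt;= threadCapability
      putMVar done (&quot;worker &quot; ++ show i ++ &quot; on capability &quot; ++ show cap
                    ++ (if pinned then &quot; (pinned)&quot; else &quot;&quot;))
    return done
  forM_ dones (\d -&gt; takeMVar d &gt;&gt;= putStrLn)</code></pre>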
<h2 id="future">Future</h2>
<p>You might be thinking, <em>but isn’t the great thing about Haskell
that we have lightweight threads?</em> Yes, absolutely. We do make
use of lightweight threads in our system, but the main server threads
that we inherit from the C++ Thrift server are heavyweight OS threads.</p>
<p>Fortunately in our case we can fully load the system with 3-4
heavyweight threads per core, and this solution works nicely with the
constraints of our platform. But if the ratio of I/O waiting to CPU
work in our workload increased, we would need more threads per core to
keep the CPU busy, and the balance tips towards wanting lightweight
threads. Furthermore, using lightweight threads would make the system
more resilient to increases in latency from downstream services.</p>
<p>In the future we’ll probably move to lightweight threads, but in the
meantime these changes to scheduling mean that we can squeeze all the
available throughput from the existing architecture.</p>
</div>
]]></summary>
</entry>
<entry>
    <title>Haskell positions at Facebook</title>
    <link href="https://simonmar.github.io/posts/2016-08-24-haskell-positions-at-facebook.html" />
    <id>https://simonmar.github.io/posts/2016-08-24-haskell-positions-at-facebook.html</id>
    <published>2016-08-24T00:00:00Z</published>
    <updated>2016-08-24T00:00:00Z</updated>
    <summary type="html"><![CDATA[<div class="post">
  <h1 class="post-title">Haskell positions at Facebook</h1>
  <span class="post-date">August 24, 2016</span>
  <p>Want to write Haskell for a living? At Facebook we’re looking for
Spam Fighters. A large part of this job involves writing Haskell code
to run on our Sigma/Haxl platform.</p>
<p>It’s a fascinating and exciting area to work in, using state of the
art tools and systems, working with amazing people, and of course you
get to write Haskell every day. Come and see what it’s like to write
Haskell code that runs at Facebook scale!</p>
<p><a
href="https://www.facebook.com/careers/jobs/a0I1200000IA7KYEA1/">Job description and application</a></p>
<p>Note: this is for the Menlo Park (California, USA) office.</p>
</div>
]]></summary>
</entry>
<entry>
    <title>Stack traces in GHCi, coming in GHC 8.0.1</title>
    <link href="https://simonmar.github.io/posts/2016-02-12-Stack-traces-in-GHCi.html" />
    <id>https://simonmar.github.io/posts/2016-02-12-Stack-traces-in-GHCi.html</id>
    <published>2016-02-12T00:00:00Z</published>
    <updated>2016-02-12T00:00:00Z</updated>
    <summary type="html"><![CDATA[<div class="post">
  <h1 class="post-title">Stack traces in GHCi, coming in GHC 8.0.1</h1>
  <span class="post-date">February 12, 2016</span>
  <p><strong>tl;dr</strong></p>
<p>In the upcoming GHC 8.0.1 release, if you start GHCi with <code>ghci -fexternal-interpreter -prof</code> (any packages you use must be built for
profiling), then you get access to detailed stack traces for all the
code you load into GHCi. Stack traces can be accessed via <code>assert</code>,
<code>error</code>, <a
href="http://haddock.stackage.org/lts-5.1/base-4.8.2.0/Debug-Trace.html#v:traceStack">Debug.Trace.traceStack</a>, and the API in <a href="http://haddock.stackage.org/lts-5.1/base-4.8.2.0/GHC-Stack.html">GHC.Stack</a>.</p>
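<p>As a rough sketch of the API side (the function below is hypothetical), <code>traceStack</code> prints a message together with the current stack to stderr, and <code>currentCallStack</code> returns the stack as a list of strings. Both rely on the profiler’s cost-centre stacks, so outside a profiled setup such as <code>ghci -fexternal-interpreter -prof</code> the stack is likely to be empty.</p>
<pre><code>import Debug.Trace (traceStack)
import GHC.Stack (currentCallStack)

-- Hypothetical function we want to debug.
lookupUser :: Int -&gt; String
lookupUser uid = traceStack (&quot;lookupUser &quot; ++ show uid) (&quot;user&quot; ++ show uid)

main :: IO ()
main = do
  putStrLn (lookupUser 7)
  stack &lt;- currentCallStack
  mapM_ putStrLn stack</code></pre>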
<h2 id="background">Background</h2>
<p>Haxl users at Facebook do a lot of development and testing inside
GHCi. In fact, we’ve built a customized version of GHCi that runs
code in our <code>Haxl</code> monad by default instead of the <code>IO</code> monad, and has
a handful of extra commands to support common workflows needed by our
developers.</p>
<p>Some of our codebase is pre-compiled, but the code being actively
worked on is just loaded on the fly into GHCi during development and
run with the interpreter. This works surprisingly well even for large
codebases like ours, especially if you enable parallel compilation and
use a bigger heap (e.g. <code>ghci -j8 +RTS -A128m</code>). This is a pretty
smooth setup: right inside GHCi we can test the production code against
real data, and interact with all of the services that our production
systems talk to, while having a nice interactive edit/compile/test
cycle.</p>
<p>However, one thing is missed by many developers, especially those
coming from other languages: easy access to a <strong>stack trace</strong> when
debugging. So, towards the end of last year, I set about finding a
workable solution that we could deploy to our users without impacting
their workflows.</p>
<h2 id="show-me-a-stack-trace">Show me a stack trace!</h2>
<p>To cut to the chase, in GHC 8.0.1 you can fire up ghci like this:</p>
<pre><code>$ ghci -fexternal-interpreter -prof</code></pre>
<p>and you have stack traces on, by default, for all the code you load
into ghci. Let’s try an example.</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Exception</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>myTail xs <span class="ot">=</span> assert (<span class="fu">not</span> (<span class="fu">null</span> xs)) <span class="op">$</span> <span class="fu">tail</span> xs</span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a><span class="ot">myMap ::</span> (a <span class="ot">-&gt;</span> b) <span class="ot">-&gt;</span> [a] <span class="ot">-&gt;</span> [b]</span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a>myMap f [] <span class="ot">=</span> []</span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a>myMap f (x<span class="op">:</span>xs) <span class="ot">=</span> f x <span class="op">:</span> myMap f xs</span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a>main <span class="ot">=</span> <span class="fu">print</span> (myMap myTail [[<span class="dv">3</span>],[]])</span></code></pre></div>
<p>We have a map-alike function called <code>myMap</code>, and a tail-alike function
called <code>myTail</code>. We want to find out if <code>myTail</code> is called with an
empty list, so we added an assert. Ok, so it’s a contrived example,
but let’s see what happens:</p>
<pre><code>$ ghci -fexternal-interpreter -prof
GHCi, version 8.1.20160127: http://www.haskell.org/ghc/  :? for help
Prelude&gt; :l ~/scratch/tailtest.hs 
[1 of 1] Compiling Main             ( /home/smarlow/scratch/tailtest.hs, interpreted )
Ok, modules loaded: Main.
*Main&gt; main
[[],*** Exception: Assertion failed
CallStack (from ImplicitParams):
  assert, called at /home/smarlow//scratch/tailtest.hs:3:13 in main:Main
  myTail, called at /home/smarlow//scratch/tailtest.hs:9:21 in main:Main
CallStack (from -prof):
  Main.myTail (/home/smarlow/scratch/tailtest.hs:3:13-34)
  Main.myTail (/home/smarlow/scratch/tailtest.hs:3:13-44)
  Main.myMap (/home/smarlow/scratch/tailtest.hs:7:18-20)
  Main.myMap (/home/smarlow/scratch/tailtest.hs:7:18-33)
  Main.main (/home/smarlow/scratch/tailtest.hs:9:15-35)
  Main.main (/home/smarlow/scratch/tailtest.hs:9:8-36)
*Main&gt; </code></pre>
<p>Now, we got two stack traces, both printed by <code>assert</code>. The first
comes from <a
href="http://downloads.haskell.org/~ghc/latest/docs/html/users_guide/other-type-extensions.html#special-implicit-params">ImplicitParams</a>,
which knows the location of the call site of <code>assert</code> because <code>assert</code>
has a special <code>?callStack :: CallStack</code> constraint in its type.</p>
<p>The second stack trace is the new one, generated by GHCi running in
<code>-prof</code> mode, and has the full call stack all the way from <code>main</code>,
including the fact that <code>myTail</code> was called by <code>myMap</code>. That is, it’s
a dynamic call stack, not a lexical one.</p>
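<p>You can ask for the lexical, ImplicitParams-style stack in your own functions too. Here is a minimal sketch, assuming GHC 8.0 or later, where the constraint is spelled <code>HasCallStack</code> and exported from <code>GHC.Stack</code>. The function <code>safeTail</code> is a hypothetical example, not part of any library:</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">import GHC.Stack (HasCallStack, callStack, prettyCallStack)

-- Hypothetical helper: a tail that reports its call site on empty input.
-- The HasCallStack constraint makes GHC pass the caller's source location
-- in implicitly, so no explicit stack argument is needed.
safeTail :: HasCallStack =&gt; [a] -&gt; Either String [a]
safeTail []     = Left ("safeTail: empty list\n" ++ prettyCallStack callStack)
safeTail (_:xs) = Right xs
</code></pre></div>
<p>The stack only extends past a call site if the calling function also carries a <code>HasCallStack</code> constraint, which is why this kind of stack has to be threaded through the code by hand.</p>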
<h2 id="dumping-the-stack-from-anywhere">Dumping the stack from anywhere</h2>
<p>Using <code>assert</code> is one way to get access to a stack trace, but
sometimes you just want to print out the stack when a particular
condition is hit, or when a function is called, to see what’s going
on. For this reason we have <a
href="http://haddock.stackage.org/lts-5.1/base-4.8.2.0/Debug-Trace.html#v:traceStack"><code>Debug.Trace.traceStack</code></a>.
This is like <code>trace</code>, but it also prints out the current stack trace.
For example, I just picked a random place in the code of Happy,
inserted a call to <code>traceStack</code>, loaded Happy into <code>ghci -fexternal-interpreter -prof</code>, ran it and got this:</p>
<pre><code>closure1
CallStack (from -prof):
  LALR.closure1.addItems.fn (LALR.lhs:106:28-48)
  LALR.closure1.addItems.fn (LALR.lhs:(106,28)-(110,84))
  LALR.closure1.addItems.fn (LALR.lhs:(104,40)-(111,31))
  LALR.closure1.addItems.new_new_items (LALR.lhs:100:59-74)
  LALR.closure1.addItems.new_new_items (LALR.lhs:100:37-75)
  LALR.closure1.addItems.new_new_items (LALR.lhs:(99,33)-(101,53))
  GenUtils.mkClosure (GenUtils.lhs:28:28-36)
  GenUtils.mkClosure (GenUtils.lhs:28:20-36)
  LALR.closure1 (LALR.lhs:91:16-67)
  LALR.closure1 (LALR.lhs:91:11-68)
  LALR.genActionTable.possActions (LALR.lhs:489:44-64)
  LALR.genActionTable.possActions (LALR.lhs:(489,33)-(490,60))
  LALR.genActionTable.actionTable (LALR.lhs:471:34-53)
  LALR.genActionTable.actionTable (LALR.lhs:(469,26)-(471,54))
  LALR.genActionTable.actionTable (LALR.lhs:(468,23)-(472,61))
  Main.main2.runParserGen.action (Main.lhs:114:49-77)
  Main.main2.runParserGen.action (Main.lhs:114:27-78)
  Main.main2.runParserGen (Main.lhs:(96,9)-(276,9))
  Main.main2.runParserGen (Main.lhs:(90,9)-(276,10))
  Main.main2.runParserGen (Main.lhs:(86,9)-(276,10))
  Main.main2.runParserGen (Main.lhs:(85,9)-(276,10))
  Main.main2 (Main.lhs:74:20-43)
  Main.main2 (Main.lhs:(64,9)-(78,61))
  Main.main (Main.lhs:57:9-18)</code></pre>
<p>You’ll notice that each function appears on the stack multiple
times. This is because the annotations are based on scopes, and
GHC tries to insert annotations in useful-looking places. There might
well be room for refinement here in the future.</p>
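<p>A <code>traceStack</code> call can also serve as a conditional probe. Here is a minimal sketch in which <code>checkNonNegative</code> is a hypothetical function. The stack portion of the message is only populated when the code runs with profiling, for example under <code>ghci -fexternal-interpreter -prof</code>:</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">import Debug.Trace (traceStack)

-- Hypothetical probe: return the argument unchanged, but if it is negative
-- first print a message and the current stack trace to stderr.
checkNonNegative :: Int -&gt; Int
checkNonNegative n
  | n &lt; 0     = traceStack ("checkNonNegative: got " ++ show n) n
  | otherwise = n
</code></pre></div>
<p>Like <code>trace</code>, <code>traceStack</code> has type <code>String -&gt; a -&gt; a</code>, so the message appears when the result is demanded, which is not necessarily where the call sits in the source.</p>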
<h2 id="any-drawbacks">Any drawbacks?</h2>
<ol type="1">
<li><p>You have to compile your packages with profiling. Use
<code>--enable-library-profiling</code> when running Cabal, or set
<code>library-profiling: True</code> in your <code>.cabal/config</code>, or do the Stack
equivalent.</p></li>
<li><p>Stack traces for calls to <code>error</code> are hit-and-miss, because the <code>error</code> calls
are often lifted to the top level as a CAF, which breaks the stack
simulation that the profiler does. I have ideas for some workarounds
for this that I plan to try in the future.</p></li>
<li><p>Interpreted code will run more slowly. But this is only for
debugging—we didn’t change the source code, so everything still runs
at full speed when compiled normally. You can also pre-compile some
of your code; don’t forget to use <code>-prof</code>, and add <code>-fprof-auto-calls</code>
to get stack-trace annotations for the code you compile. You can
<code>:set -fobject-code -fprof-auto-calls</code> inside GHCi itself to use
compiled code by default.</p></li>
</ol>
<h2 id="how-does-it-work">How does it work?</h2>
<p>We’re using the existing stack-simulation that happens in GHC’s
profiler, called “cost-centre stacks”. However, running in profiled
mode wasn’t supported by the interpreter, and there were some serious
shenanigans involved to make it possible to run profiled code in GHCi
without slowing down GHCi itself.</p>
<p>There are various differences in the way Haskell code runs in
profiling mode. The layout of heap objects is different, because every
heap object points to information about the call stack that created
it. This is necessary to get accurate stack simulations in the
presence of things like higher-order functions, but it’s also
important for the heap profiler, so that it can tell who created each
heap object. When running in profiling mode, we have to do various
things to maintain the runtime’s simulation of the call stack.</p>
<p>The first step was to make the interpreter itself work in profiling
mode (as in, interpret code correctly and not crash). Fortunately
this wasn’t nearly as difficult as I’d anticipated: the interpreter
and byte-code compiler were already nicely abstracted over the things
that change in profiling mode. At this point we can already do things
that weren’t possible before: profile GHCi itself, and profile
Template Haskell.</p>
<p>Next, I had to make the interpreter actually simulate the call stack
for interpreted code. Again, this was reasonably straightforward, and
involved using the breakpoints that GHCi already inserts into the
interpreted code as SCC annotations for the profiler.</p>
<p>So far so good: this actually worked quite nicely, but there was one
huge drawback. To actually use it, we have to compile GHC itself with
profiling. That works, except that it slows down GHCi by a factor of 2-3
when compiling code. That was too big a hit to deploy
this as part of the standard workflow for our Haxl users at Facebook,
so I needed to find a way to make it work without the overhead on the
compiler.</p>
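<p>Compiled code gets its cost-centre stack entries from SCC annotations as well. These are either inserted by flags such as <code>-fprof-auto</code> and <code>-fprof-auto-calls</code> or written by hand. Here is a small sketch, assuming a profiled build. The names <code>average</code> and <code>printCostCentreStack</code> are hypothetical, while <code>currentCallStack</code> is the <code>GHC.Stack</code> function that reads the profiler’s simulated stack:</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">import GHC.Stack (currentCallStack)

-- Hand-written SCC annotations: under -prof each one pushes a named cost
-- centre while its expression is evaluated; without -prof they are ignored.
average :: [Double] -&gt; Double
average xs = ({-# SCC "average_sum" #-} sum xs)
           / ({-# SCC "average_len" #-} fromIntegral (length xs))

-- Print the profiler's current cost-centre stack, one entry per line.
-- In a non-profiled build the list is typically empty.
printCostCentreStack :: IO [String]
printCostCentreStack = do
  stack &lt;- currentCallStack
  mapM_ putStrLn stack
  return stack
</code></pre></div>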
<h3 id="enter-remote-ghci">Enter Remote GHCi</h3>
<p>The solution is to separate the compiler from the interpreter, using a
scheme that I’ve called Remote GHCi. The idea is that by putting the
compiler and the interpreter in separate processes, the compiler can
be running at full speed on a normal non-profiled runtime, while the
interpreter is running in a separate process using the profiled
runtime.</p>
<p><img src="/images/ghc-iserv.png" /></p>
<p>The main complication is arranging that all the interactions between
the compiler and the interpreter happen via serialized messages over a
pipe. We currently have about 50 different message types; you can see
them all <a
href="https://phabricator.haskell.org/diffusion/GHC/browse/master/libraries/ghci/GHCi/Message.hs">here</a>.
We’re currently using the <code>binary</code> library together with <code>Generic</code>
instance generation, but serialization and deserialization using
<code>binary</code> is definitely a bottleneck, so I’m looking forward to moving
to the new CBOR-based serialization library when it’s ready.</p>
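<p>You can try the <code>binary</code>-plus-<code>Generic</code> approach in isolation. The sketch below uses a made-up <code>Msg</code> type, not GHC’s actual message definitions. It shows a value surviving the encode/decode round trip that every message makes across the pipe:</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">{-# LANGUAGE DeriveGeneric #-}
import Data.Binary (Binary, decode, encode)
import GHC.Generics (Generic)

-- Hypothetical message type, for illustration only.
data Msg
  = RunStmt String       -- evaluate a statement in the interpreter
  | LoadObject FilePath  -- link an object file into the interpreter
  | Shutdown             -- ask the interpreter process to exit
  deriving (Show, Eq, Generic)

-- With a Generic instance available, binary supplies default put/get methods.
instance Binary Msg

-- Serialize to a lazy ByteString and read it back, as the two processes would.
roundTrip :: Msg -&gt; Msg
roundTrip = decode . encode
</code></pre></div>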
<p>It turns out that making this separation has a number of advantages
aside from stack traces, which are listed on <a
href="https://ghc.haskell.org/trac/ghc/wiki/RemoteGHCi">the RemoteGHCi
wiki page</a>.</p>
<p>GHCJS has been doing something similar for a while to support Template
Haskell. In fact, I used the GHCJS Template Haskell code as a
starting point, integrated it with GHC proper and built it out to
fully support GHCi (with a couple of exceptions, notably the debugger
doesn’t currently work, and <code>dynCompileExpr</code> in the GHC API cannot be
supported in general).</p>
<p>Remote GHCi also works for Template Haskell and Quasi Quotes, and has
the advantage that when compiling TH code with <code>-prof -fexternal-interpreter</code>, you don’t need to first compile it without
<code>-prof</code>, because we can run the <code>-prof</code> code directly in the external
interpreter process.</p>
<h2 id="three-kinds-of-stack-trace-in-ghc-8.0.1">Three kinds of stack trace in GHC 8.0.1</h2>
<p>There’s a lot happening on the stack trace front. We now have no fewer
than three ways to get a stack trace:</p>
<ul>
<li>Profiling: <code>ghc -prof -fprof-auto</code> and <code>ghci -fexternal-interpreter -prof</code></li>
<li>ImplicitParams, with the magic <code>?callStack :: CallStack</code> constraint (now called <code>HasCallStack</code>).</li>
<li>DWARF: <code>ghc -g</code></li>
</ul>
<p>Each of these has advantages and disadvantages, and none of them are
subsumed by any of the others (sadly!). I’ll try to summarise:</p>
<ul>
<li><p><strong>Profiling</strong></p>
<ul>
<li>Detailed, dynamic, call stacks</li>
</ul>
<p>But:</p>
<ul>
<li>Requires recompiling your code, or loading it into GHCi</li>
<li>2-3x runtime overhead compiled, 20-40x interpreted</li>
<li>Not so great for <code>error</code> and <code>undefined</code> right now</li>
</ul></li>
<li><p><strong>ImplicitParams</strong></p>
<ul>
<li>Good for finding the call site of particular functions, like
<code>error</code> or <code>undefined</code></li>
</ul>
<p>But:</p>
<ul>
<li>Requires explicit code changes to propagate the stack</li>
<li>Some runtime overhead (stacks get constructed and passed around at
runtime)</li>
<li>Shows up in types as <code>HasCallStack</code> constraints</li>
<li>Lexical, not dynamic. (In <code>g = map f</code>, <code>g</code> calls <code>f</code> rather than
<code>map</code> calling <code>f</code>)</li>
</ul>
<p>Could you change GHC so that it automatically adds <code>HasCallStack</code>
constraints everywhere and also hides them from the user, to get
the effect of full call-stack coverage? Maybe; that would be an
alternative to the scheme I’ve implemented on top of profiling.
One difficult area is CAFs, though. If a constraint is added to a
CAF, then the CAF is re-evaluated each time it is called, which is
obviously undesirable. The profiler goes to some lengths to avoid
changing the asymptotic cost of things, but trades off some information in
the stack simulation in the process, which is why calls to <code>error</code>
sometimes don’t get accurate call stack information.</p></li>
<li><p><strong>DWARF</strong></p>
<ul>
<li>No runtime overhead, can be deployed in production.</li>
<li>Good when you’re not willing to sacrifice any performance, but
having some information is better than none when something goes
wrong.</li>
</ul>
<p>But:</p>
<ul>
<li>Gives the raw execution stack, so we lose information due to
tail-calls and lazy evaluation.</li>
</ul></li>
</ul>
<h2 id="conclusion">Conclusion</h2>
<p>We now have full stack traces inside GHCi, provided you compile your
packages for profiling, and use <code>ghci -fexternal-interpreter -prof</code>.</p>
<p>Remote GHCi is not the default in GHC 8.0.1, but it’s available with
the flag <code>-fexternal-interpreter</code>. Please try it out and let me know
how you get on!</p>
</div>
]]></summary>
</entry>
<entry>
    <title>Fun With Haxl (Part 1)</title>
    <link href="https://simonmar.github.io/posts/2015-10-20-Fun-With-Haxl-1.html" />
    <id>https://simonmar.github.io/posts/2015-10-20-Fun-With-Haxl-1.html</id>
    <published>2015-10-20T00:00:00Z</published>
    <updated>2015-10-20T00:00:00Z</updated>
    <summary type="html"><![CDATA[<div class="post">
  <h1 class="post-title">Fun With Haxl (Part 1)</h1>
  <span class="post-date">October 20, 2015</span>
  <p>This is a blog-post version of a talk I recently gave at the <a
href="https://skillsmatter.com/conferences/7069-haskell-exchange-2015">Haskell
eXchange 2015</a>. The video of the talk is <a
href="https://skillsmatter.com/skillscasts/6644-keynote-from-simon-marlow">here</a>,
but there were a lot of questions during the talk which aren’t very
audible on the video, so hopefully this post will be useful to folks
who weren’t at the event.</p>
<p>If you want to play with the examples yourself, the code is available
<a href="https://github.com/simonmar/haskell-eXchange-2015">on
github</a>, and to run the examples you’ll need to <code>cabal install haxl sqlite</code> first, or the <code>stack</code> equivalent.</p>
<h2 id="what-is-haxl">What is Haxl?</h2>
<p><a href="https://github.com/facebook/Haxl">Haxl</a> is a library that
was developed for solving a very specific problem at Facebook: we
wanted to write purely functional code, including data-fetching
operations, and have the data-fetches automatically batched and
performed concurrently as far as possible. This is exactly what Haxl
does, and it has been <a
href="https://code.facebook.com/posts/745068642270222/fighting-spam-with-haskell/">running
in production at Facebook</a> as part of the anti-abuse infrastructure
for nearly a year now.</p>
<p>Although it was designed for this specific purpose, we can put Haxl to
use for a wide range of tasks where implicit concurrency is needed:
not just data-fetching, but other remote data operations (including
writes), and it works perfectly well for batching and overlapping
local I/O operations too. In this blog post (series) I’ll start by
looking at how to use Haxl for its intended purpose, and then
move on to give examples of some of the other things we can use Haxl
for. In the final example, I’ll use Haxl to implement a parallel
build system.</p>
<h2 id="example-accessing-data-for-a-blog">Example: accessing data for a blog</h2>
<p>Let’s suppose you’re writing a blog (an old-fashioned one with
dynamically-generated pages!) and you want to store the content and
metadata for the blog in a database. I’ve made an example database
called <code>blog.sqlite</code>, and we can poke around to see what’s in it:</p>
<pre><code>$ sqlite3 blog.sqlite
SQLite version 3.8.2 2013-12-06 14:53:30
Enter &quot;.help&quot; for instructions
Enter SQL statements terminated with a &quot;;&quot;
sqlite&gt; .tables
postcontent  postinfo     postviews  
sqlite&gt; .schema postinfo
CREATE TABLE postinfo(postid int, postdate timestamp, posttopic text);
sqlite&gt; .schema postcontent
CREATE TABLE postcontent(postid int, content text);
sqlite&gt; select * from postinfo;
1|2014-11-20 10:00:00|topic1
2|2014-11-20 10:01:00|topic2
3|2014-11-20 10:02:00|topic3
...
sqlite&gt; select * from postcontent;
1|example content 1
2|example content 2
3|example content 3
...</code></pre>
<p>There are a couple of tables that we’re interested in: <code>postinfo</code>,
which contains the metadata, and <code>postcontent</code>, which contains the
content. Both are indexed by <code>postid</code>, an integer key for each post.</p>
<p>Now, let’s make a little Haskell API for accessing the blog data.
I’ll do this twice: first by calling an SQL library directly, and then
using Haxl, to compare the two.</p>
<p>The code for the direct implementation is in <a href="https://github.com/simonmar/haskell-eXchange-2015/blob/3ae0e34a051201eb77721bee2e940ec1f764a0df/BlogDB.hs">BlogDB.hs</a>, using
the simple <code>sqlite</code> package for accessing the sqlite DB (there are
other more elaborate and type-safe abstractions for accessing
databases, but that is orthogonal to the issues we’re interested in
here, so I’m using <code>sqlite</code> to keep things simple).</p>
<p>In our simple API, there’s a monad, <code>Blog</code>, in which we can access the
blog data, a function <code>run</code> for executing a <code>Blog</code> computation, and
two operations, <code>getPostIds</code> and <code>getPostContent</code>, for making specific
queries in the <code>Blog</code> monad. To summarise:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Blog</span> a  <span class="co">-- a monad</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="ot">run ::</span> <span class="dt">Blog</span> a <span class="ot">-&gt;</span> <span class="dt">IO</span> a</span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">PostId</span> <span class="ot">=</span> <span class="dt">Int</span></span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">PostContent</span> <span class="ot">=</span> <span class="dt">String</span></span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a><span class="ot">getPostIds     ::</span> <span class="dt">Blog</span> [<span class="dt">PostId</span>]</span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a><span class="ot">getPostContent ::</span> <span class="dt">PostId</span> <span class="ot">-&gt;</span> <span class="dt">Blog</span> <span class="dt">PostContent</span></span></code></pre></div>
<p>The implementation of the API will print out the queries it is making,
so that we can see what’s happening when we call these functions.
Let’s use this API to query our example DB:</p>
<pre><code>GHCi, version 7.11.20150924: http://www.haskell.org/ghc/  :? for help
[1 of 1] Compiling BlogDB           ( BlogDB.hs, interpreted )
Ok, modules loaded: BlogDB.
*BlogDB&gt; run getPostIds
select postid from postinfo;
[1,2,3,4,5,6,7,8,9,10,11,12]
*BlogDB&gt; run $ getPostIds &gt;&gt;= mapM getPostContent
select postid from postinfo;
select content from postcontent where postid = 1;
select content from postcontent where postid = 2;
select content from postcontent where postid = 3;
select content from postcontent where postid = 4;
select content from postcontent where postid = 5;
select content from postcontent where postid = 6;
select content from postcontent where postid = 7;
select content from postcontent where postid = 8;
select content from postcontent where postid = 9;
select content from postcontent where postid = 10;
select content from postcontent where postid = 11;
select content from postcontent where postid = 12;
[&quot;example content 1&quot;,&quot;example content 2&quot;,&quot;example content 3&quot;,&quot;example content 4&quot;,&quot;example content 5&quot;,&quot;example content 6&quot;,&quot;example content 7&quot;,&quot;example content 8&quot;,&quot;example content 9&quot;,&quot;example content 10&quot;,&quot;example content 11&quot;,&quot;example content 12&quot;]
*BlogDB&gt; </code></pre>
<h2 id="the-problem-batching-queries">The problem: batching queries</h2>
<p>Now, the issue with this API is that every call to <code>getPostContent</code>
results in a separate <code>select</code> query. The <code>mapM</code> call in the above
example gave rise to one <code>select</code> query to fetch the contents of each
post separately.</p>
<p>Ideally, rather than</p>
<pre><code>select content from postcontent where postid = 1;
select content from postcontent where postid = 2;
select content from postcontent where postid = 3;</code></pre>
<p>what we would like to see is something like</p>
<pre><code>select content from postcontent where postid in (1,2,3);</code></pre>
<p>This kind of batching is particularly important when the database is
remote, or large, or both.</p>
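<p>Producing the batched form is mechanical. The sketch below is hypothetical and is not the code from the example repository. It builds one query for a list of ids and selects <code>postid</code> as well as <code>content</code>, because rows can come back in any order and have to be matched to the requests that asked for them:</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">import Data.List (intercalate)

type PostId = Int
type PostContent = String

-- One query for many posts: ... where postid in (1,2,3);
batchedContentQuery :: [PostId] -&gt; String
batchedContentQuery ids =
  "select postid,content from postcontent where postid in ("
    ++ intercalate "," (map show ids) ++ ");"

-- Put the returned rows back in the order the ids were requested;
-- Nothing marks a post that the database did not return.
matchRows :: [PostId] -&gt; [(PostId, PostContent)] -&gt; [Maybe PostContent]
matchRows ids rows = [lookup pid rows | pid &lt;- ids]
</code></pre></div>
<p>Rendering the ids with <code>show</code> is only reasonable because they are integers. For other key types, a parameterized query is the safer choice.</p>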
<p>One way to solve the problem is to add a new API for this query, e.g.:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="ot">multiGetPostContents ::</span> [<span class="dt">PostId</span>] <span class="ot">-&gt;</span> <span class="dt">IO</span> [<span class="dt">PostContent</span>]</span></code></pre></div>
<p>But there are several problems with this:</p>
<ul>
<li><p>Clients have to remember to call it, rather than using <code>mapM</code>.</p></li>
<li><p>If we’re fetching post content in multiple parts of our code, we
would have to arrange to do the fetching in one place and plumb the
results to the places that need the data, which might involve
restructuring our code in an unnatural way, purely for efficiency
reasons.</p></li>
<li><p>From a taste perspective, <code>multiGetPostContents</code> duplicates
the functionality of <code>mapM getPostContent</code>, which is ugly.</p></li>
</ul>
<p>This is the problem that Haxl was designed to solve. We’ll look at
how to implement this API on top of Haxl in the next couple of sections, but
just to demonstrate the effect, let’s try it out first:</p>
<pre><code>Prelude&gt; :l HaxlBlog
[1 of 2] Compiling BlogDataSource   ( BlogDataSource.hs, interpreted )
[2 of 2] Compiling HaxlBlog         ( HaxlBlog.hs, interpreted )
Ok, modules loaded: HaxlBlog, BlogDataSource.
*HaxlBlog&gt; run $ getPostIds &gt;&gt;= mapM getPostContent
select postid from postinfo;
select postid,content from postcontent where postid in (12,11,10,9,8,7,6,5,4,3,2,1)
[&quot;example content 1&quot;,&quot;example content 2&quot;,&quot;example content 3&quot;,&quot;example content 4&quot;,&quot;example content 5&quot;,&quot;example content 6&quot;,&quot;example content 7&quot;,&quot;example content 8&quot;,&quot;example content 9&quot;,&quot;example content 10&quot;,&quot;example content 11&quot;,&quot;example content 12&quot;]
*HaxlBlog&gt;</code></pre>
<p>Even though we used the standard <code>mapM</code> function to perform multiple
<code>getPostContent</code> calls, they were batched together and executed as a
single <code>select</code> query.</p>
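<p>How can plain <code>mapM</code> end up issuing a single query? The following is a hypothetical, deliberately tiny model of the idea, not Haxl’s actual implementation, which is considerably more involved. A computation is either finished or blocked on some post ids, and the <code>Applicative</code> instance merges the ids of two blocked computations so they can be fetched together. It assumes GHC 7.10 or later, where <code>mapM</code> on lists is defined as <code>traverse</code> and therefore goes through <code>&lt;*&gt;</code>:</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">import qualified Data.Map as Map

type PostId = Int
type PostContent = String
type Cache = Map.Map PostId PostContent

-- A computation is either finished, or blocked on some post ids together
-- with a continuation to resume once those posts are in the cache.
data Fetch a = Done a | Blocked [PostId] (Cache -&gt; Fetch a)

instance Functor Fetch where
  fmap f (Done a)        = Done (f a)
  fmap f (Blocked ids k) = Blocked ids (fmap f . k)

-- The key step: when both sides are blocked, merge their requests into one batch.
instance Applicative Fetch where
  pure = Done
  Done f         &lt;*&gt; x              = fmap f x
  Blocked ids k  &lt;*&gt; Done x         = Blocked ids (\c -&gt; k c &lt;*&gt; Done x)
  Blocked is1 k1 &lt;*&gt; Blocked is2 k2 = Blocked (is1 ++ is2) (\c -&gt; k1 c &lt;*&gt; k2 c)

-- Bind cannot see past its first argument, so it must wait for that batch.
instance Monad Fetch where
  Done a        &gt;&gt;= f = f a
  Blocked ids k &gt;&gt;= f = Blocked ids (\c -&gt; k c &gt;&gt;= f)

getPostContent :: PostId -&gt; Fetch PostContent
getPostContent pid = Blocked [pid] (\c -&gt; Done (Map.findWithDefault "" pid c))

-- Run to completion, returning the result and the batches issued, in order.
runFetch :: ([PostId] -&gt; Cache) -&gt; Fetch a -&gt; (a, [[PostId]])
runFetch lookupBatch = go Map.empty
  where
    go _ (Done a) = (a, [])
    go cache (Blocked ids k) =
      let cache'      = Map.union (lookupBatch ids) cache
          (a, rounds) = go cache' (k cache')
      in (a, ids : rounds)
</code></pre></div>
<p>In this model, running <code>mapM getPostContent [1,2,3]</code> issues the single batch <code>[1,2,3]</code>. By contrast, <code>getPostContent 1 &gt;&gt;= \_ -&gt; getPostContent 2</code> needs two rounds, because <code>&gt;&gt;=</code> cannot see the second request until the first result is available.</p>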
<h2 id="introduction-to-haxl">Introduction to Haxl</h2>
<p>You can find the full documentation for Haxl <a
href="http://hackage.haskell.org/package/haxl">here</a>, but in this
section I’ll walk through the most important parts, and then we’ll
implement our own data source for the blog database.</p>
<p>Haxl is a Monad:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">GenHaxl</span> u a</span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Functor</span> (<span class="dt">GenHaxl</span> u)</span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Applicative</span> (<span class="dt">GenHaxl</span> u)</span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Monad</span> (<span class="dt">GenHaxl</span> u)</span></code></pre></div>
<p>It is generalised over a type variable <code>u</code>, which can be used to pass
around some user-defined data throughout a Haxl computation. For
example, in our application at Facebook we instantiate <code>u</code> with the
data passed in with the request that we’re processing.</p>
<p>Essentially, there is a <code>Reader</code> monad built into Haxl (this might
not be the cleanest design, but it is the way it is). Throughout the
following we’re not going to be using the <code>u</code> parameter, and I’ll
often instantiate it with <code>()</code>, like this:</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Haxl</span> a <span class="ot">=</span> <span class="dt">GenHaxl</span> () a</span></code></pre></div>
<p>The most important operation in Haxl is <code>dataFetch</code>:</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="ot">dataFetch ::</span> (<span class="dt">DataSource</span> u r, <span class="dt">Request</span> r a) <span class="ot">=&gt;</span> r a <span class="ot">-&gt;</span> <span class="dt">GenHaxl</span> u a</span></code></pre></div>
<p>This is how a user of Haxl fetches some data from a <em>data source</em>
(in our example, from the blog database). The Haxl library is designed
so that you can use multiple user-defined data sources simultaneously.</p>
<p>The argument of type <code>r a</code> is a request, where <code>r</code> is the request type
constructor, and <code>a</code> is the type of the result we’re expecting. The
<code>r</code> type is defined by the data source you’re using, which should also
supply appropriate instances of <code>DataSource</code> and <code>Request</code>. For
example, the request type for our blog looks like this:</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">BlogRequest</span> a <span class="kw">where</span></span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a>  <span class="dt">FetchPosts</span><span class="ot">       ::</span> <span class="dt">BlogRequest</span> [<span class="dt">PostId</span>]</span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a>  <span class="dt">FetchPostContent</span><span class="ot"> ::</span> <span class="dt">PostId</span> <span class="ot">-&gt;</span> <span class="dt">BlogRequest</span> <span class="dt">PostContent</span></span></code></pre></div>
<p>Note that we’re using a GADT, because we have two different requests
which each produce a result of a different type.</p>
<p>Next, our request type needs to satisfy the <code>Request</code> constraint.
<code>Request</code> is defined like this:</p>
<div class="sourceCode" id="cb12"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Request</span> req a <span class="ot">=</span></span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a>  ( <span class="dt">Eq</span> (req a)</span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a>  , <span class="dt">Hashable</span> (req a)</span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a>  , <span class="dt">Typeable</span> (req a)</span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a>  , <span class="dt">Show</span> (req a)</span>
<span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a>  , <span class="dt">Show</span> a</span>
<span id="cb12-7"><a href="#cb12-7" aria-hidden="true" tabindex="-1"></a>  )</span></code></pre></div>
<p>That is, it is a synonym for a handful of type class constraints that
are all straightforward boilerplate. (Defining constraint synonyms
like this requires the <code>ConstraintKinds</code> extension, and is a handy
trick to know.)</p>
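<p>Satisfying these constraints for a GADT like <code>BlogRequest</code> takes a little care, because GHC rejects an ordinary <code>deriving</code> clause for it. Below is a sketch of the <code>Eq</code> and <code>Show</code> parts using standalone deriving, plus a cut-down constraint synonym in the same style. The names <code>ShowableRequest</code> and <code>describe</code> are hypothetical. Haxl’s real <code>Request</code> also needs <code>Hashable</code>, from the <code>hashable</code> package, which is omitted here. <code>Typeable</code> instances are provided automatically by GHC 7.10 and later.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">{-# LANGUAGE GADTs, StandaloneDeriving, ConstraintKinds #-}

type PostId = Int
type PostContent = String

data BlogRequest a where
  FetchPosts       :: BlogRequest [PostId]
  FetchPostContent :: PostId -&gt; BlogRequest PostContent

-- Standalone deriving works where a plain deriving clause is rejected.
deriving instance Eq (BlogRequest a)
deriving instance Show (BlogRequest a)

-- A constraint synonym in the style of Haxl's Request (needs ConstraintKinds).
type ShowableRequest req a = (Eq (req a), Show (req a), Show a)

-- Any request type meeting the synonym can be rendered, e.g. for logging.
describe :: ShowableRequest req a =&gt; req a -&gt; a -&gt; String
describe req result = show req ++ " returned " ++ show result
</code></pre></div>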
<p>The other constraint we need to satisfy is <code>DataSource</code>, which is
defined like this:</p>
<div class="sourceCode" id="cb13"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> (<span class="dt">DataSourceName</span> req, <span class="dt">StateKey</span> req, <span class="dt">Show1</span> req)</span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a>       <span class="ot">=&gt;</span> <span class="dt">DataSource</span> u req <span class="kw">where</span></span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a>  fetch</span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a><span class="ot">    ::</span> <span class="dt">State</span> req</span>
<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a>    <span class="ot">-&gt;</span> <span class="dt">Flags</span></span>
<span id="cb13-6"><a href="#cb13-6" aria-hidden="true" tabindex="-1"></a>    <span class="ot">-&gt;</span> u</span>
<span id="cb13-7"><a href="#cb13-7" aria-hidden="true" tabindex="-1"></a>    <span class="ot">-&gt;</span> [<span class="dt">BlockedFetch</span> req]</span>
<span id="cb13-8"><a href="#cb13-8" aria-hidden="true" tabindex="-1"></a>    <span class="ot">-&gt;</span> <span class="dt">PerformFetch</span></span></code></pre></div>
<p><code>DataSource</code> has a single method, <code>fetch</code>, which is used by Haxl to
execute requests for this data source. The key point is that <code>fetch</code>
is passed a list of <code>BlockedFetch</code> values, each of which contains
a single request. The <code>BlockedFetch</code> type is defined like this:</p>
<div class="sourceCode" id="cb14"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">BlockedFetch</span> r <span class="ot">=</span> <span class="kw">forall</span> a<span class="op">.</span> <span class="dt">BlockedFetch</span> (r a) (<span class="dt">ResultVar</span> a)</span></code></pre></div>
<p>That is, it contains a request of type <code>r a</code>, and a <code>ResultVar a</code>
which is a container to store the result in. The <code>fetch</code>
implementation can store the result using one of these two functions:</p>
<div class="sourceCode" id="cb15"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="ot">putSuccess ::</span> <span class="dt">ResultVar</span> a <span class="ot">-&gt;</span> a <span class="ot">-&gt;</span> <span class="dt">IO</span> ()</span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a><span class="ot">putFailure ::</span> (<span class="dt">Exception</span> e) <span class="ot">=&gt;</span> <span class="dt">ResultVar</span> a <span class="ot">-&gt;</span> e <span class="ot">-&gt;</span> <span class="dt">IO</span> ()</span></code></pre></div>
<p>Because <code>fetch</code> is passed a <em>list</em> of <code>BlockedFetch</code>, it can collect
together requests and satisfy them using a single query to the
database, or perform them concurrently, or use whatever methods are
available for performing multiple requests simultaneously.</p>
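<p>To make the shape of a batched <code>fetch</code> concrete, here is a small self-contained model. It is not Haxl’s code. It assumes <code>ResultVar</code> can be modelled as an empty <code>MVar</code> holding either an exception or a value, and it uses a pure <code>Map</code> in place of the blog database. The <code>FetchPostContent</code> constructor, <code>PostNotFound</code>, <code>fakeDatabase</code>, <code>fetchPosts</code> and <code>runBatch</code> are hypothetical names chosen for the illustration.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">{-# LANGUAGE ExistentialQuantification, GADTs #-}
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, readMVar)
import Control.Exception (Exception, SomeException, toException)
import qualified Data.Map as Map

-- Toy ResultVar (an assumption, not Haxl's type): an empty MVar filled once.
newtype ResultVar a = ResultVar (MVar (Either SomeException a))

putSuccess :: ResultVar a -&gt; a -&gt; IO ()
putSuccess (ResultVar v) x = putMVar v (Right x)

putFailure :: Exception e =&gt; ResultVar a -&gt; e -&gt; IO ()
putFailure (ResultVar v) e = putMVar v (Left (toException e))

-- Same shape as the BlockedFetch type in the text.
data BlockedFetch r = forall a. BlockedFetch (r a) (ResultVar a)

type PostId = Int

-- Hypothetical request type with a single constructor.
data BlogRequest a where
  FetchPostContent :: PostId -&gt; BlogRequest String

newtype PostNotFound = PostNotFound PostId deriving Show
instance Exception PostNotFound

-- Stands in for the postcontent table.
fakeDatabase :: Map.Map PostId String
fakeDatabase = Map.fromList [(i, "example content " ++ show i) | i &lt;- [1 .. 6]]

requestedId :: BlockedFetch BlogRequest -&gt; PostId
requestedId (BlockedFetch (FetchPostContent pid) _) = pid

-- Store one looked-up row (or a failure) into the request's ResultVar.
answer :: Map.Map PostId String -&gt; BlockedFetch BlogRequest -&gt; IO ()
answer rows (BlockedFetch (FetchPostContent pid) var) =
  case Map.lookup pid rows of
    Just content -&gt; putSuccess var content
    Nothing      -&gt; putFailure var (PostNotFound pid)

-- Answer every blocked request from ONE lookup over all the ids.
-- Returns the ids that went into that single query.
fetchPosts :: [BlockedFetch BlogRequest] -&gt; IO [PostId]
fetchPosts blocked = do
  let ids  = map requestedId blocked
      rows = Map.filterWithKey (\k _ -&gt; k `elem` ids) fakeDatabase
  mapM_ (answer rows) blocked
  return ids

-- Build one BlockedFetch per id, run fetchPosts, then read every result.
runBatch :: [PostId] -&gt; IO ([PostId], [Either SomeException String])
runBatch pids = do
  vars &lt;- mapM (const newEmptyMVar) pids
  let blocked = zipWith (\p v -&gt; BlockedFetch (FetchPostContent p) (ResultVar v)) pids vars
  queried &lt;- fetchPosts blocked
  results &lt;- mapM readMVar vars
  return (queried, results)</code></pre></div>
<p>In GHCi, <code>runBatch [2, 1, 7]</code> answers all three requests from a single lookup; the first two succeed and the third comes back as a <code>Left</code> wrapping <code>PostNotFound 7</code>.</p>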
<p>The <code>fetch</code> method returns <code>PerformFetch</code>, which is defined like this:</p>
<div class="sourceCode" id="cb16"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">PerformFetch</span></span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a>  <span class="ot">=</span> <span class="dt">SyncFetch</span>  (<span class="dt">IO</span> ())</span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a>  <span class="op">|</span> <span class="dt">AsyncFetch</span> (<span class="dt">IO</span> () <span class="ot">-&gt;</span> <span class="dt">IO</span> ())</span></code></pre></div>
<p>For our purposes here, we’ll only use <code>SyncFetch</code>, which should contain an
<code>IO</code> action whose job it is to fill in all the results in the
<code>BlockedFetch</code>es before it returns. The alternative <code>AsyncFetch</code> can
be used to overlap requests from multiple data sources.</p>
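<p>One way to picture the difference between the two constructors is a scheduler that collects a <code>PerformFetch</code> from each data source in a round. The sketch below is an illustrative model, not Haxl’s scheduler: it wraps every <code>AsyncFetch</code> around the combined <code>SyncFetch</code> actions, so the synchronous work runs while the asynchronous requests are outstanding. <code>runRound</code>, <code>asyncSource</code>, <code>syncSource</code> and the event log are hypothetical.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

data PerformFetch
  = SyncFetch  (IO ())
  | AsyncFetch (IO () -&gt; IO ())

-- One round of fetching: each AsyncFetch is wrapped around the combined
-- SyncFetch actions, which therefore run while the async requests are in flight.
runRound :: [PerformFetch] -&gt; IO ()
runRound fetches = wrapAsync syncs
  where
    syncs     = sequence_ [io | SyncFetch io &lt;- fetches]
    wrapAsync = foldr (.) id [f | AsyncFetch f &lt;- fetches]

-- Hypothetical asynchronous source: issue the request on another thread,
-- run the inner action, then block until its own result arrives.
asyncSource :: IORef [String] -&gt; PerformFetch
asyncSource logRef = AsyncFetch $ \inner -&gt; do
  done &lt;- newEmptyMVar
  _ &lt;- forkIO $ do
    threadDelay 10000          -- pretend this is network latency
    putMVar done ()
  logEvent logRef "async: request issued"
  inner
  takeMVar done
  logEvent logRef "async: result received"

-- Hypothetical synchronous source: finishes all its work before returning.
syncSource :: IORef [String] -&gt; PerformFetch
syncSource logRef = SyncFetch (logEvent logRef "sync: query done")

logEvent :: IORef [String] -&gt; String -&gt; IO ()
logEvent ref msg = modifyIORef ref (++ [msg])

demoRound :: IO [String]
demoRound = do
  ref &lt;- newIORef []
  runRound [syncSource ref, asyncSource ref]
  readIORef ref</code></pre></div>
<p>Here <code>demoRound</code> returns <code>["async: request issued", "sync: query done", "async: result received"]</code>: the synchronous query ran in the window between issuing the asynchronous request and waiting for it.</p>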
<p>Lastly, let’s talk about state. Most data sources will need some
state; in the case of our blog database we need to keep track of the
handle to the database so that we don’t have to open a fresh one each
time we make some queries. In Haxl, data source state is represented
using an associated data type called <code>State</code>, which is defined by the
<code>StateKey</code> class:</p>
<div class="sourceCode" id="cb17"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> <span class="dt">Typeable</span> f <span class="ot">=&gt;</span> <span class="dt">StateKey</span> (<span class="ot">f ::</span> <span class="op">*</span> <span class="ot">-&gt;</span> <span class="op">*</span>) <span class="kw">where</span></span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a>  <span class="kw">data</span> <span class="dt">State</span> f</span></code></pre></div>
<p>So every data source with request type <code>req</code> defines a state of type
<code>State req</code>, which can of course be empty if the data source doesn’t
need any state. Our blog data source defines it like this:</p>
<div class="sourceCode" id="cb18"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">StateKey</span> <span class="dt">BlogRequest</span> <span class="kw">where</span></span>
<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a>  <span class="kw">data</span> <span class="dt">State</span> <span class="dt">BlogRequest</span> <span class="ot">=</span> <span class="dt">BlogDataState</span> <span class="dt">SQLiteHandle</span></span></code></pre></div>
<p>The <code>State req</code> for a data source is passed to <code>fetch</code> each time it is
called.</p>
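<p>A minimal model of per-data-source state, assuming the <code>StateKey</code> class from the text (written with <code>Type</code> rather than the older <code>*</code> kind syntax). Instead of a real <code>SQLiteHandle</code>, the state holds a hypothetical in-memory table and a counter of round trips; <code>FakeHandle</code>, <code>newBlogState</code>, <code>fetchContents</code> and <code>queriesSoFar</code> are illustrative names. The point is that the handle is created once and the same state value is handed to every fetch.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">{-# LANGUAGE TypeFamilies, KindSignatures #-}
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Data.Kind (Type)
import Data.Typeable (Typeable)
import qualified Data.Map as Map

-- As in the text, with Type in place of *.
class Typeable f =&gt; StateKey (f :: Type -&gt; Type) where
  data State f

-- Hypothetical request type; its constructors are not needed here.
data BlogRequest a

-- Hypothetical stand-in for SQLiteHandle: a table plus a query counter.
data FakeHandle = FakeHandle
  { fhTable   :: Map.Map Int String
  , fhQueries :: IORef Int
  }

instance StateKey BlogRequest where
  data State BlogRequest = BlogDataState FakeHandle

-- "Open the database" once; the resulting state is reused by every fetch.
newBlogState :: IO (State BlogRequest)
newBlogState = do
  counter &lt;- newIORef 0
  let table = Map.fromList [(i, "example content " ++ show i) | i &lt;- [1 .. 6]]
  return (BlogDataState (FakeHandle table counter))

-- The state arrives as an argument, just as State req is passed to fetch.
fetchContents :: State BlogRequest -&gt; [Int] -&gt; IO [Maybe String]
fetchContents (BlogDataState h) ids = do
  modifyIORef' (fhQueries h) (+ 1)   -- one round trip for the whole batch
  return [Map.lookup i (fhTable h) | i &lt;- ids]

queriesSoFar :: State BlogRequest -&gt; IO Int
queriesSoFar (BlogDataState h) = readIORef (fhQueries h)</code></pre></div>
<p>Two calls to <code>fetchContents</code> with the same state share one handle, so <code>queriesSoFar</code> reports two round trips regardless of how many ids each call asked for.</p>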
<p>The full implementation of our example data source is in <a
href="https://github.com/simonmar/haskell-eXchange-2015/blob/3ae0e34a051201eb77721bee2e940ec1f764a0df/BlogDataSource.hs">BlogDataSource.hs</a>.</p>
<h2 id="how-do-we-run-some-haxl">How do we run some Haxl?</h2>
<p>There’s a <code>runHaxl</code> function:</p>
<div class="sourceCode" id="cb19"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="ot">runHaxl ::</span> <span class="dt">Env</span> u <span class="ot">-&gt;</span> <span class="dt">GenHaxl</span> u a <span class="ot">-&gt;</span> <span class="dt">IO</span> a</span></code></pre></div>
<p>It takes an argument of type <code>Env u</code>. This is the “environment” that
a Haxl computation runs in, and contains various things needed by the
framework. It also contains the data source state, and to build an
<code>Env</code> we need to supply the initial state. Here’s how to get an <code>Env</code>:</p>
<div class="sourceCode" id="cb20"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a><span class="ot">initEnv ::</span> <span class="dt">StateStore</span> <span class="ot">-&gt;</span> u <span class="ot">-&gt;</span> <span class="dt">IO</span> (<span class="dt">Env</span> u)</span></code></pre></div>
<p>The <code>StateStore</code> contains the states for all the data sources we’re
using. It is constructed with these two functions:</p>
<div class="sourceCode" id="cb21"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="ot">stateEmpty ::</span> <span class="dt">StateStore</span></span>
<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a><span class="ot">stateSet ::</span> <span class="dt">StateKey</span> f <span class="ot">=&gt;</span> <span class="dt">State</span> f <span class="ot">-&gt;</span> <span class="dt">StateStore</span> <span class="ot">-&gt;</span> <span class="dt">StateStore</span></span></code></pre></div>
<p>To see how to put these together, take a look at <a
href="https://github.com/simonmar/haskell-eXchange-2015/blob/3ae0e34a051201eb77721bee2e940ec1f764a0df/HaxlBlog.hs">HaxlBlog.hs</a>.</p>
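<p>As a rough model of what <code>stateEmpty</code> and <code>stateSet</code> provide, the sketch below keeps at most one state per data source in a <code>Map</code> keyed by the request type’s <code>TypeRep</code>, with each state hidden behind an existential wrapper. It is an illustration under those assumptions, not Haxl’s source; <code>SomeState</code>, <code>stateGet</code>, <code>castState</code>, the two toy data sources and the <code>"blog.db"</code> path are written for this example. With the real library, the finished store is what you pass to <code>initEnv</code>.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">{-# LANGUAGE TypeFamilies, KindSignatures, ScopedTypeVariables,
             ExistentialQuantification, GADTs, TypeOperators #-}
import Data.Kind (Type)
import Data.Proxy (Proxy (..))
import Data.Type.Equality ((:~:) (..))
import Data.Typeable (TypeRep, Typeable, eqT, typeRep)
import qualified Data.Map as Map

class Typeable f =&gt; StateKey (f :: Type -&gt; Type) where
  data State f

-- Hides which data source a state belongs to.
data SomeState = forall f. StateKey f =&gt; SomeState (State f)

-- At most one state per data source, keyed by the request type.
newtype StateStore = StateStore (Map.Map TypeRep SomeState)

stateEmpty :: StateStore
stateEmpty = StateStore Map.empty

stateSet :: forall f. StateKey f =&gt; State f -&gt; StateStore -&gt; StateStore
stateSet st (StateStore m) =
  StateStore (Map.insert (typeRep (Proxy :: Proxy f)) (SomeState st) m)

-- Hypothetical lookup: find the entry for r, then check its type with eqT.
stateGet :: forall r. StateKey r =&gt; StateStore -&gt; Maybe (State r)
stateGet (StateStore m) = Map.lookup (typeRep (Proxy :: Proxy r)) m &gt;&gt;= unwrap
  where
    unwrap :: SomeState -&gt; Maybe (State r)
    unwrap (SomeState st) = castState st

castState :: forall f r. (StateKey f, StateKey r) =&gt; State f -&gt; Maybe (State r)
castState st = case eqT :: Maybe (f :~: r) of
  Just Refl -&gt; Just st
  Nothing   -&gt; Nothing

-- Two hypothetical data sources with different kinds of state.
data BlogRequest a
data UserRequest a

instance StateKey BlogRequest where
  data State BlogRequest = BlogDataState FilePath

instance StateKey UserRequest where
  data State UserRequest = UserState Int

store :: StateStore
store = stateSet (UserState 42) (stateSet (BlogDataState "blog.db") stateEmpty)

blogPath :: StateStore -&gt; Maybe FilePath
blogPath s = case stateGet s of
  Just (BlogDataState path) -&gt; Just path
  Nothing                   -&gt; Nothing</code></pre></div>
<p>Keying on the request type means each data source can only ever see its own state, and the <code>eqT</code> check turns a lookup at the wrong type into <code>Nothing</code> rather than an unsafe cast.</p>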
<h2 id="trying-it-out">Trying it out</h2>
<p>We saw a small example of our Haxl data source working earlier, but
just to round off this first part of the series and whet your appetite
for the next part, here are a couple more examples.</p>
<p>Haxl batches things together when we use the <code>Applicative</code> operators:</p>
<pre><code>*HaxlBlog&gt; run $ (,) &lt;$&gt; getPostContent 1 &lt;*&gt; getPostContent 2
select postid,content from postcontent where postid in (2,1)
(&quot;example content 1&quot;,&quot;example content 2&quot;)</code></pre>
<p>Even if we have multiple <code>mapM</code> calls, they get batched together:</p>
<pre><code>*HaxlBlog&gt; run $ (,) &lt;$&gt; mapM getPostContent [1..3] &lt;*&gt; mapM getPostContent [4..6]
select postid,content from postcontent where postid in (6,5,4,3,2,1)
([&quot;example content 1&quot;,&quot;example content 2&quot;,&quot;example content 3&quot;],[&quot;example content 4&quot;,&quot;example content 5&quot;,&quot;example content 6&quot;])</code></pre>
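<p>Why does <code>&lt;*&gt;</code> batch while <code>&gt;&gt;=</code> cannot? A heavily simplified model shows the idea. It assumes a single hard-wired data source mapping post ids to strings, and the <code>Fetch</code> and <code>Result</code> types and <code>runFetch</code> are hypothetical, not Haxl’s <code>GenHaxl</code>. A computation either finishes or reports the requests it is blocked on together with a continuation; <code>&lt;*&gt;</code> evaluates both sides before blocking, so their requests are merged into one round.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">import qualified Data.Map as Map

type PostId = Int
type Cache  = Map.Map PostId String

-- One step of a computation: finished, or blocked on some requests
-- together with the rest of the computation.
data Result a = Done a | Blocked [PostId] (Fetch a)

newtype Fetch a = Fetch { runStep :: Cache -&gt; Result a }

instance Functor Fetch where
  fmap f (Fetch g) = Fetch $ \c -&gt; case g c of
    Done a       -&gt; Done (f a)
    Blocked rs k -&gt; Blocked rs (fmap f k)

-- &lt;*&gt; runs BOTH sides before blocking, so their requests are merged.
instance Applicative Fetch where
  pure x = Fetch (const (Done x))
  Fetch f &lt;*&gt; Fetch x = Fetch $ \c -&gt; case (f c, x c) of
    (Done g,         Done a)         -&gt; Done (g a)
    (Done g,         Blocked rs k)   -&gt; Blocked rs (g &lt;$&gt; k)
    (Blocked rs k,   Done a)         -&gt; Blocked rs (($ a) &lt;$&gt; k)
    (Blocked rs1 k1, Blocked rs2 k2) -&gt; Blocked (rs1 ++ rs2) (k1 &lt;*&gt; k2)

-- &gt;&gt;= cannot look past a blocked left-hand side.
instance Monad Fetch where
  Fetch m &gt;&gt;= f = Fetch $ \c -&gt; case m c of
    Done a       -&gt; runStep (f a) c
    Blocked rs k -&gt; Blocked rs (k &gt;&gt;= f)

-- A request to the single, hard-wired data source.
getPostContent :: PostId -&gt; Fetch String
getPostContent pid = Fetch $ \c -&gt; case Map.lookup pid c of
  Just s  -&gt; Done s
  Nothing -&gt; Blocked [pid] (getPostContent pid)

-- Run to completion, answering each round of blocked requests as one
-- batch; also return the batches in the order they were issued.
runFetch :: Fetch a -&gt; (a, [[PostId]])
runFetch = go Map.empty []
  where
    go cache batches m = case runStep m cache of
      Done a       -&gt; (a, reverse batches)
      Blocked rs k -&gt;
        let answers = Map.fromList [(r, "example content " ++ show r) | r &lt;- rs]
        in  go (Map.union cache answers) (rs : batches) k</code></pre></div>
<p>In this model, <code>runFetch ((,) &lt;$&gt; mapM getPostContent [1..3] &lt;*&gt; mapM getPostContent [4..6])</code> issues the single batch <code>[1,2,3,4,5,6]</code>, because <code>mapM</code> on a list is built from <code>&lt;*&gt;</code>. By contrast, <code>do { a &lt;- getPostContent 1; b &lt;- getPostContent 2; return (a, b) }</code> needs two batches, <code>[1]</code> and then <code>[2]</code>.</p>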
<p>In Part 2 we’ll talk more about batching, and introduce the upcoming
<code>ApplicativeDo</code> extension which will allow Haxl to automatically
parallelize sequential-looking <code>do</code>-expressions.</p>
</div>
]]></summary>
</entry>

</feed>
