<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Ian Howson on Ian Howson</title>
    <link>https://ianhowson.com/rss.xml</link>
    <description>Recent content in Ian Howson on Ian Howson</description>
    <generator>Hugo -- gohugo.io</generator>
    <managingEditor>ian@mutexlabs.com (Ian Howson)</managingEditor>
    <webMaster>ian@mutexlabs.com (Ian Howson)</webMaster>
    <lastBuildDate>Fri, 24 Nov 2017 10:52:17 +0100</lastBuildDate>
    <atom:link href="https://ianhowson.com/rss.xml" rel="self" type="application/rss+xml" />
    
    <item>
      <title>Why the MacBook Pro 15&#34; has a discrete GPU</title>
      <link>https://ianhowson.com/blog/macbook-pro-15-discrete-gpu/</link>
      <pubDate>Thu, 19 Jan 2017 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/blog/macbook-pro-15-discrete-gpu/</guid>
      <description>

&lt;p&gt;For a long time, I couldn&amp;rsquo;t figure out why the 2016 MBPr15 had a discrete GPU. It&amp;rsquo;s not significantly faster than the integrated GPU, it&amp;rsquo;s no good for games, there aren&amp;rsquo;t many games on Mac &lt;em&gt;anyway&lt;/em&gt;, and it doesn&amp;rsquo;t provide any significant improvement for creatives (Photoshop, Final Cut, etc).&lt;/p&gt;

&lt;p&gt;It comes at a large cost. It takes a huge amount of space, reducing battery capacity and increasing weight. It requires extra cooling, consuming space, reducing battery capacity and increasing weight. The extra power consumption&amp;hellip; you guessed it&amp;hellip; costs battery capacity and increases weight. dGPUs on the previous 15&amp;rdquo; line have been a nuisance (massive, unexplained power consumption, lower customer satisfaction). And because it&amp;rsquo;s AMD, you can&amp;rsquo;t do any significant GPGPU workloads (NVIDIA&amp;rsquo;s CUDA is miles ahead of AMD&amp;rsquo;s stuff).&lt;/p&gt;

&lt;p&gt;I think I&amp;rsquo;ve worked out why Apple had to put a dGPU on the MBP15.&lt;/p&gt;

&lt;h2 id=&#34;5k-is-apple-s-future&#34;&gt;5K is Apple&amp;rsquo;s future&lt;/h2&gt;

&lt;p&gt;Windows machines are going to standardise on 4K. It will be cheap, thanks to consumer TVs, and it&amp;rsquo;s well supported already by hardware.&lt;/p&gt;

&lt;p&gt;Apple is pushing 5K.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Technical challenges will keep the PC manufacturers out&lt;/li&gt;
&lt;li&gt;27&amp;rdquo; is about as big as you can make a display before it&amp;rsquo;s impractically large&lt;/li&gt;
&lt;li&gt;2560px wide (unscaled) gives the right pixel density for a 27&amp;rdquo; display&lt;/li&gt;
&lt;li&gt;For video editors, 5K lets you run an unscaled 4K video and have some space for software controls and chrome&lt;/li&gt;
&lt;li&gt;Photo and video work needs to be pixel-perfect (no scaling) so you can&amp;rsquo;t just scale 4K to 2560px wide&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;1920px wide is too coarse on 27&amp;rdquo; and 3840px is too fine. Windows &lt;em&gt;still&lt;/em&gt; lags with HiDPI support (it&amp;rsquo;s basically there, but there are a lot of rough edges). With the sole exception of the MacBook Air 13&amp;rdquo;, Apple now has HiDPI on &lt;em&gt;every single device&lt;/em&gt;.&lt;/p&gt;
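
&lt;p&gt;The pixel arithmetic behind these widths can be sketched roughly as follows. This is only an illustration: it assumes a 5K panel is 5120x2880 and consumer 4K is 3840x2160 (UHD), and the helper names &lt;code&gt;pointsAt2x&lt;/code&gt;, &lt;code&gt;scaleFactor&lt;/code&gt; and &lt;code&gt;spareAround&lt;/code&gt; are made up for this example.&lt;/p&gt;

&lt;pre&gt;
```javascript
// Assumed panel sizes (not stated in the post): 5K is 5120x2880, UHD 4K is 3840x2160.
const FIVE_K = { width: 5120, height: 2880 };
const UHD_4K = { width: 3840, height: 2160 };

// HiDPI at 2x: every UI point is drawn with a 2x2 block of physical pixels.
function pointsAt2x(panel) {
  return { width: panel.width / 2, height: panel.height / 2 };
}

// Scale factor needed to present a panel at a given UI width.
function scaleFactor(panel, uiWidth) {
  return panel.width / uiWidth;
}

// Pixels left over when a video is shown 1:1 on a panel.
function spareAround(panel, video) {
  return { width: panel.width - video.width, height: panel.height - video.height };
}

console.log(pointsAt2x(FIVE_K));          // { width: 2560, height: 1440 }
console.log(pointsAt2x(UHD_4K));          // { width: 1920, height: 1080 }
console.log(scaleFactor(UHD_4K, 2560));   // 1.5
console.log(spareAround(FIVE_K, UHD_4K)); // { width: 1280, height: 720 }
```
&lt;/pre&gt;

&lt;p&gt;A factor of 1.5 means each UI point covers a non-whole number of pixels, which is why a 4K panel scaled to 2560px wide can&amp;rsquo;t be pixel-perfect.&lt;/p&gt;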

&lt;p&gt;When Apple released the 5K iMac, they had to design and manufacture custom chips. No PC manufacturer wants to do that. They ship whatever they can buy off the shelf.&lt;/p&gt;

&lt;h2 id=&#34;5k-requires-multistream&#34;&gt;5K requires multistream&lt;/h2&gt;

&lt;p&gt;Current 5K tech requires two display streams. It&amp;rsquo;s not transmitted over the wire as one super-high-resolution display. It&amp;rsquo;s split into two smaller ones and reconstructed on the display. From the PC&amp;rsquo;s point of view (but hidden by the OS), a single 5K display is two small displays arranged a certain way.&lt;/p&gt;

&lt;h2 id=&#34;intel-s-integrated-gpus-have-a-maximum-of-three-outputs&#34;&gt;Intel&amp;rsquo;s integrated GPUs have a maximum of three outputs&lt;/h2&gt;

&lt;p&gt;You can see where I&amp;rsquo;m going with this. The iGPUs have three outputs. You need two to run a 5K display. So with the iGPU, you can run the internal display and &lt;em&gt;one&lt;/em&gt; external 5K display.&lt;/p&gt;

&lt;p&gt;That&amp;rsquo;s kind of shoddy for a top-of-the-line machine; people expect to run two external displays. So the only thing Apple can do is add more outputs. How do you add more outputs? You need to add a whole GPU!&lt;/p&gt;

&lt;p&gt;AMD was probably chosen because NVIDIA&amp;rsquo;s parts usually have the same three output limitation. (Possibly one could run the iGPU and dGPU together, but I imagine there would be technical headaches and the performance gap would be difficult to explain). AMD&amp;rsquo;s GPUs typically support six outputs. That&amp;rsquo;s exactly three 5K displays.&lt;/p&gt;
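
&lt;p&gt;The output counting above can be written down as a tiny sketch. The function name &lt;code&gt;max5kDisplays&lt;/code&gt; is hypothetical, and the last call assumes the internal panel would also take one of the dGPU&amp;rsquo;s outputs, which is a guess on my part.&lt;/p&gt;

&lt;pre&gt;
```javascript
// Each 5K panel is driven as two display streams.
const STREAMS_PER_5K = 2;

// How many 5K displays fit in a GPU's output budget?
// `reservedOutputs` covers outputs that are already spoken for,
// such as the laptop's internal panel.
function max5kDisplays(totalOutputs, reservedOutputs) {
  const free = Math.max(0, totalOutputs - reservedOutputs);
  return Math.floor(free / STREAMS_PER_5K);
}

console.log(max5kDisplays(3, 1)); // 1: three-output iGPU, one used internally
console.log(max5kDisplays(6, 0)); // 3: six outputs, nothing reserved
console.log(max5kDisplays(6, 1)); // 2: six outputs, one reserved (assumption)
```
&lt;/pre&gt;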

&lt;p&gt;Also, you don&amp;rsquo;t want to add a &lt;em&gt;big&lt;/em&gt; GPU on a laptop (unless it&amp;rsquo;s gaming specific). GPU peak power consumption is &lt;em&gt;way&lt;/em&gt; higher than for CPUs. You need to design the machine to remove the consequent heat output. Some software runs the GPU hard unnecessarily, just as it does the CPU. This isn&amp;rsquo;t obvious to users, but they complain that random software (like Flash) kills their battery life and makes the machine hot. For these users (the vast majority!) the best thing you can do is &lt;em&gt;limit their power consumption&lt;/em&gt; by giving them underpowered hardware. They probably won&amp;rsquo;t notice and can&amp;rsquo;t do much damage.&lt;/p&gt;

&lt;h2 id=&#34;the-future&#34;&gt;The future?&lt;/h2&gt;

&lt;p&gt;Intel&amp;rsquo;s iGPUs will probably eventually support more outputs (and/or single-stream 5K). The dGPU will then be totally redundant and can be removed for weight reduction or battery life improvement. This won&amp;rsquo;t be a reality for a few more years, at least. In the meantime, Apple has a unique feature and competitive advantage.&lt;/p&gt;

&lt;p&gt;Intel is putting more and more GPU power onto their regular CPUs all the time. There&amp;rsquo;s not really a middle ground any more. Current iGPUs are good enough for all non-gaming tasks. Gaming requires a high-power dGPU, and Apple&amp;rsquo;s not catering to that market.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>GRAPE: the Generic Risk Assessment Process Explained</title>
      <link>https://ianhowson.com/blog/grape-risk-assessment/</link>
      <pubDate>Tue, 10 Jan 2017 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/blog/grape-risk-assessment/</guid>
      <description>

&lt;style&gt;
.shade {
  background-color: #ccddff !important;
}

.hide {
  display: none !important;
}
&lt;/style&gt;

&lt;p&gt;Practically every risk assessment process is the same. Rather than reading boring stuff, use my handy checklist. I will then authorise you an additional fifteen (15) minutes of Facebook time to be used outside of working hours.&lt;/p&gt;

&lt;h2 id=&#34;1-write-down-all-of-the-things-that-can-go-wrong&#34;&gt;1. Write down all of the things that can go wrong&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;All&lt;/em&gt; of them.&lt;/p&gt;

&lt;p&gt;Come on, use your imagination.&lt;/p&gt;

&lt;h2 id=&#34;2-for-each-thing-that-can-go-wrong-tell-me&#34;&gt;2. For each thing that can go wrong, tell me:&lt;/h2&gt;

&lt;h3 id=&#34;what-is-it&#34;&gt;What is it?&lt;/h3&gt;

&lt;div class=&#39;ui large fluid input&#39;&gt;&lt;input type=&#39;text&#39; placeholder=&#39;Aliens come from space and turn cows into chickens&#39;&gt;&lt;/div&gt;

&lt;!-- FIXME: you need to add the &#39;input&#39; module to semantic to make this work --&gt;

&lt;h3 id=&#34;how-likely-is-it-to-happen-likelihood&#34;&gt;How likely is it to happen? (Likelihood)&lt;/h3&gt;

&lt;div class=&#39;ui middle aligned selection vertical list&#39;&gt;
  &lt;div class=&#39;item lik&#39; data-val=&#39;1&#39;&gt;
    &lt;img class=&#39;ui avatar image&#39; src=&#39;https://ianhowson.com/images/grape-trump.png&#39;&gt;
    &lt;div class=&#39;content&#39;&gt;
      &lt;div class=&#39;header&#39;&gt;Unlikely&lt;/div&gt;
      &lt;div class=&#39;description&#39;&gt;e.g. Trump gets elected President&lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;

  &lt;div class=&#39;item lik&#39; data-val=&#39;2&#39;&gt;
    &lt;img class=&#39;ui avatar image&#39; src=&#39;https://ianhowson.com/images/grape-cake.png&#39;&gt;
    &lt;div class=&#39;content&#39;&gt;
      &lt;div class=&#39;header&#39;&gt;Moderate&lt;/div&gt;
      &lt;div class=&#39;description&#39;&gt;You eat a second dessert with dinner tonight&lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;

  &lt;div class=&#39;item lik&#39; data-val=&#39;3&#39;&gt;
    &lt;img class=&#39;ui avatar image&#39; src=&#39;https://ianhowson.com/images/grape-bear.png&#39;&gt;
    &lt;div class=&#39;content&#39;&gt;
      &lt;div class=&#39;header&#39;&gt;Likely&lt;/div&gt;
      &lt;div class=&#39;description&#39;&gt;Does the pope shit in the woods?&lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;h3 id=&#34;supposing-it-happens-how-bad-will-it-be-impact&#34;&gt;Supposing it happens, how bad will it be? (Impact)&lt;/h3&gt;

&lt;div class=&#39;ui middle aligned selection vertical list&#39;&gt;
  &lt;div class=&#39;item imp&#39; data-val=&#39;1&#39;&gt;
    &lt;img class=&#39;ui avatar image&#39; src=&#39;https://ianhowson.com/images/grape-bagel.png&#39;&gt;
    &lt;div class=&#39;content&#39;&gt;
      &lt;div class=&#39;header&#39;&gt;Mild&lt;/div&gt;
      &lt;div class=&#39;description&#39;&gt;Somebody ate the last bagel&lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;

  &lt;div class=&#39;item imp&#39; data-val=&#39;2&#39;&gt;
    &lt;i class=&#39;big blue thumbs down icon&#39;&gt;&lt;/i&gt;
    &lt;div class=&#39;content&#39;&gt;
      &lt;div class=&#39;header&#39;&gt;Moderate&lt;/div&gt;
      &lt;div class=&#39;description&#39;&gt;Facebook is down&lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;

  &lt;div class=&#39;item imp&#39; data-val=&#39;3&#39;&gt;
    &lt;i class=&#39;big red fire icon&#39;&gt;&lt;/i&gt;
    &lt;div class=&#39;content&#39;&gt;
      &lt;div class=&#39;header&#39;&gt;Severe&lt;/div&gt;
      &lt;div class=&#39;description&#39;&gt;I&amp;rsquo;d lose my job. Oh, and people might die or something.&lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;h3 id=&#34;will-you-go-to-prom-with-me-seriously&#34;&gt;Will you go to prom with me? (Seriously)&lt;/h3&gt;

&lt;div class=&#39;ui selection horizontal list&#39;&gt;
  &lt;div class=&#39;item prom-yes&#39;&gt;
    &lt;i class=&#39;large radio icon prom-yes-icon&#39;&gt;&lt;/i&gt;
    &lt;div class=&#39;content&#39;&gt;
      &lt;div class=&#39;header&#39;&gt;Yes&lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&#39;item prom-no&#39;&gt;
    &lt;i class=&#39;large radio icon&#39;&gt;&lt;/i&gt;
    &lt;div class=&#39;content&#39;&gt;
      &lt;div class=&#39;header&#39;&gt;No&lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&#39;item prom-retort&#39; style=&#39;display: none&#39;&gt;
    &lt;i class=&#39;icon&#39;&gt;&lt;/i&gt;
    &lt;div class=&#39;content&#39;&gt;
      &lt;div class=&#39;description&#39;&gt;&lt;strong&gt;Did you mean: &lt;i&gt;Yes&lt;/i&gt;&lt;/strong&gt;&lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;h2 id=&#34;3-how-bad-is-it-risk&#34;&gt;3. How bad is it? (Risk)&lt;/h2&gt;

&lt;div class=&#39;ui info message risk&#39;&gt;How should I know? Just check some boxes!&lt;/div&gt;

&lt;div class=&#39;hide ui icon green message risk&#39; data-val=&#39;2&#39;&gt;
  &lt;i class=&#39;tiny thumbs up icon&#39;&gt;&lt;/i&gt;
  &lt;div class=&#34;content&#34;&gt;
    &lt;div class=&#34;header&#34;&gt;It&#39;s fine&lt;/div&gt;
    &lt;p&gt;Go and sleep soundly.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&#39;hide ui icon yellow message risk&#39; data-val=&#39;3&#39;&gt;
  &lt;i class=&#39;tiny meh icon&#39;&gt;&lt;/i&gt;
  &lt;div class=&#34;content&#34;&gt;
    &lt;div class=&#34;header&#34;&gt;Low&lt;/div&gt;
    &lt;p&gt;It&#39;s probably nothing... but you should go check it out.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&#39;hide ui icon orange message risk&#39; data-val=&#39;4&#39;&gt;
  &lt;i class=&#39;tiny calendar icon&#39;&gt;&lt;/i&gt;
  &lt;div class=&#34;content&#34;&gt;
    &lt;div class=&#34;header&#34;&gt;Moderate&lt;/div&gt;
    &lt;p&gt;Time to call a meeting.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&#39;hide ui icon red message risk&#39; data-val=&#39;5&#39;&gt;
  &lt;img class=&#39;ui tiny circular image&#39; src=&#39;https://ianhowson.com/images/grape-kard.png&#39; style=&#39;margin-right: 24px&#39;&gt; 
  &lt;div class=&#34;content&#34;&gt;
    &lt;div class=&#34;header&#34;&gt;Super bad&lt;/div&gt;
    &lt;p&gt;Like Keeping Up With The Kardashians season 12.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&#39;hide ui icon black message risk&#39; data-val=&#39;6&#39;&gt;
  &lt;i class=&#39;tiny red bomb icon&#39;&gt;&lt;/i&gt;
  &lt;div class=&#34;content&#34;&gt;
    &lt;div class=&#34;header&#34;&gt;Extreme&lt;/div&gt;
    &lt;p&gt;Are you sure the building isn&#39;t on fire &lt;i&gt;right now&lt;/i&gt;?&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&#39;ui segment&#39;&gt;
  &lt;div class=&#34;ui equal width four column center aligned padded grid&#34;&gt;
    &lt;div class=&#34;row&#34;&gt;
      &lt;div class=&#39;column&#39;&gt;&lt;strong&gt;Impact&lt;/strong&gt;&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class=&#34;row&#34;&gt;
      &lt;div class=&#34;column&#34;&gt;&amp;nbsp;&lt;/div&gt;
      &lt;div class=&#34;green column&#34;&gt;&lt;i class=&#39;hide check circle icon mat&#39; data-val=&#39;11&#39;&gt;&lt;/i&gt;&amp;nbsp;&lt;/div&gt;
      &lt;div class=&#34;yellow column&#34;&gt;&lt;i class=&#39;hide check circle icon mat&#39; data-val=&#39;12&#39;&gt;&lt;/i&gt;&amp;nbsp;&lt;/div&gt;
      &lt;div class=&#34;orange column&#34;&gt;&lt;i class=&#39;hide check circle icon mat&#39; data-val=&#39;13&#39;&gt;&lt;/i&gt;&amp;nbsp;&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class=&#34;row&#34;&gt;
      &lt;div class=&#34;column&#34;&gt;&lt;strong&gt;Likelihood&lt;/strong&gt;&lt;/div&gt;
      &lt;div class=&#34;yellow column&#34;&gt;&lt;i class=&#39;hide check circle icon mat&#39; data-val=&#39;21&#39;&gt;&lt;/i&gt;&amp;nbsp;&lt;/div&gt;
      &lt;div class=&#34;orange column&#34;&gt;&lt;i class=&#39;hide check circle icon mat&#39; data-val=&#39;22&#39;&gt;&lt;/i&gt;&amp;nbsp;&lt;/div&gt;
      &lt;div class=&#34;red column&#34;&gt;&lt;i class=&#39;hide check circle icon mat&#39; data-val=&#39;23&#39;&gt;&lt;/i&gt;&amp;nbsp;&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class=&#34;row&#34;&gt;
      &lt;div class=&#34;column&#34;&gt;&amp;nbsp;&lt;/div&gt;
      &lt;div class=&#34;orange column&#34;&gt;&lt;i class=&#39;hide check circle icon mat&#39; data-val=&#39;31&#39;&gt;&lt;/i&gt;&amp;nbsp; &lt;/div&gt;
      &lt;div class=&#34;red column&#34;&gt;&lt;i class=&#39;hide check circle icon mat&#39; data-val=&#39;32&#39;&gt;&lt;/i&gt;&amp;nbsp;&lt;/div&gt;
      &lt;div class=&#34;black column&#34;&gt;&lt;i class=&#39;hide check circle icon mat&#39; data-val=&#39;33&#39;&gt;&lt;/i&gt;&amp;nbsp;&lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;h2 id=&#34;4-bonus-points-can-you-do-anything-to-control-the-risk&#34;&gt;4. Bonus points: can you do anything to &lt;em&gt;control&lt;/em&gt; the risk?&lt;/h2&gt;

&lt;p&gt;Go back and change the likelihood and impact figures yourself. What am I, your mother?&lt;/p&gt;

&lt;h2 id=&#34;5-are-you-happy-with-your-current-level-of-risk&#34;&gt;5. Are you happy with your current level of risk?&lt;/h2&gt;

&lt;p&gt;You might want to tell an adult. Your manager, your CEO or your dog are good candidates. Really, tell as many people as you can so that when they come looking for a scapegoat, you have someone else to point to.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&#34;disclaimer&#34;&gt;Disclaimer&lt;/h2&gt;

&lt;p&gt;This is meant to be funny. It&amp;rsquo;s still a better process than no process. Risk management is serious business and you should take all due care. I am not a lawyer, get professional advice, take two aspirin and call me in the morning, etc.&lt;/p&gt;

&lt;script src=&#34;https://cdnjs.cloudflare.com/ajax/libs/jquery/1.12.4/jquery.min.js&#34; integrity=&#34;sha256-ZosEbRLbNQzLpnKIkEdrPv7lOy9C27hHQ+Xp8a4MxAQ=&#34; crossorigin=&#34;anonymous&#34;&gt;&lt;/script&gt;
&lt;script&gt;
$(document).ready(function () {
  // Totally aware that this blows. Did you pay for this? No? Then quit yer complaining.

  var imp = 0;
  var lik = 0;

  var updateRisk = function () {
    if (imp == 0 || lik == 0) {
        return;
    }

    // You&#39;re not seriously thinking of using this for your process, are you? Don&#39;t do this!
    var risk = parseInt(imp, 10) + parseInt(lik, 10);

    $(&#39;.risk&#39;).addClass(&#39;hide&#39;);
    $(&#39;.risk[data-val=&#39; + risk + &#39;]&#39;).removeClass(&#39;hide&#39;);

    $(&#39;.mat&#39;).addClass(&#39;hide&#39;);
    // The matrix cells are keyed row-first: likelihood (row), then impact (column).
    $(&#39;.mat[data-val=&#39; + lik + imp + &#39;]&#39;).removeClass(&#39;hide&#39;);
  };

  $(&#39;.prom-no&#39;).click(function () {
    $(&#39;.prom-retort&#39;).show();
  });

  $(&#39;.prom-yes&#39;).click(function () {
    $(&#39;.prom-yes-icon&#39;).addClass(&#39;red heart&#39;);
    $(&#39;.prom-yes-icon&#39;).removeClass(&#39;radio&#39;);
  });

  $(&#39;.lik&#39;).click(function (event) {
    var obj = $(event.currentTarget);
    $(&#39;.lik&#39;).removeClass(&#39;shade&#39;);
    obj.addClass(&#39;shade&#39;);
    // Clicking a likelihood row stores the likelihood value.
    lik = obj.attr(&#39;data-val&#39;);
    updateRisk();
  });

  $(&#39;.imp&#39;).click(function (event) {
    var obj = $(event.currentTarget);
    $(&#39;.imp&#39;).removeClass(&#39;shade&#39;);
    obj.addClass(&#39;shade&#39;);
    // Clicking an impact row stores the impact value.
    imp = obj.attr(&#39;data-val&#39;);
    updateRisk();
  });
});
&lt;/script&gt;
</description>
    </item>
    
    <item>
      <title>Attacks on embedded systems</title>
      <link>https://ianhowson.com/iot/attacks-on-embedded-systems/</link>
      <pubDate>Fri, 06 Jan 2017 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/iot/attacks-on-embedded-systems/</guid>
      <description>

&lt;h2 id=&#34;system&#34;&gt;System&lt;/h2&gt;

&lt;p&gt;I&amp;rsquo;m not going into too much detail here as these attacks are well covered in other material. Consider attacks on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The physical device &amp;ndash; what physical controls does the device have to prevent tampering?

&lt;ul&gt;
&lt;li&gt;Tamper switches&lt;/li&gt;
&lt;li&gt;Locks&lt;/li&gt;
&lt;li&gt;Armor (e.g. safes)&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;Network interfaces

&lt;ul&gt;
&lt;li&gt;Particularly those often deployed without encryption &amp;ndash; Nordic, BLE, Zigbee&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;Web interfaces&lt;/li&gt;
&lt;li&gt;Software interfaces&lt;/li&gt;
&lt;li&gt;Social engineering (your staff)&lt;/li&gt;
&lt;li&gt;Cryptographic protocols. Many embedded systems use cryptosystems with known attacks, very short key lengths, or simply incorrect implementations. Because the attacker has control over the host hardware, timing attacks are also much easier to execute.&lt;/li&gt;
&lt;/ul&gt;
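
&lt;p&gt;To make the timing attack point concrete, here is a minimal sketch of the kind of comparison that leaks, next to a safer one. It assumes a Node.js environment and that the secret and the guess are &lt;code&gt;Buffer&lt;/code&gt;s; &lt;code&gt;leakyEqual&lt;/code&gt; and &lt;code&gt;constantTimeEqual&lt;/code&gt; are hypothetical names, not code from any particular device.&lt;/p&gt;

&lt;pre&gt;
```javascript
const crypto = require('crypto');

// Leaky: returns at the first mismatching byte, so the run time hints at
// how many leading bytes of the guess were correct.
function leakyEqual(a, b) {
  if (a.length !== b.length) return false;
  for (let i = 0; i !== a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Safer: crypto.timingSafeEqual compares every byte regardless of where the
// first difference is. It throws on unequal lengths, so check that first.
function constantTimeEqual(a, b) {
  if (a.length !== b.length) return false;
  return crypto.timingSafeEqual(a, b);
}

const secret = Buffer.from('0123456789abcdef', 'ascii');
const guess = Buffer.from('0123456789abcdeX', 'ascii');
console.log(leakyEqual(secret, guess));         // false
console.log(constantTimeEqual(secret, guess));  // false
console.log(constantTimeEqual(secret, secret)); // true
```
&lt;/pre&gt;

&lt;p&gt;Both functions give the same answers; the difference is only in how long they take, which is exactly what an attacker with the device on their bench can measure.&lt;/p&gt;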

&lt;h2 id=&#34;extracting-firmware&#34;&gt;Extracting firmware&lt;/h2&gt;

&lt;p&gt;The chief weakness of an embedded device is that it&amp;rsquo;s physically not in your control. The attacker has total control of a single device, and if they learn enough about the software stack, they can develop exploits that work across many devices.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bypassing microcontroller code locks&lt;/li&gt;
&lt;li&gt;RAM/parallel bus sniffing&lt;/li&gt;
&lt;li&gt;Reading firmware through the bootloader&lt;/li&gt;
&lt;li&gt;Reading a serial flash chip without removing it from the PCB&lt;/li&gt;
&lt;li&gt;Reading code from microcontrollers&lt;/li&gt;
&lt;li&gt;Removing a flash chip from the PCB&lt;/li&gt;
&lt;li&gt;SPI bus sniffing&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;extracting-keys&#34;&gt;Extracting keys&lt;/h2&gt;

&lt;p&gt;Many embedded devices carry valuable private crypto keys. Methods to extract these include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Acoustic key extraction from chips&lt;/li&gt;
&lt;li&gt;Differential power analysis&lt;/li&gt;
&lt;li&gt;Glitching&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;attacks-on-chips&#34;&gt;Attacks on chips&lt;/h2&gt;

&lt;p&gt;Very few ICs are designed with security in mind. They contain valuable firmware and crypto keys. Methods to attack them include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Decapping&lt;/li&gt;
&lt;li&gt;Microprobing&lt;/li&gt;
&lt;li&gt;Optical key extraction from chip backside&lt;/li&gt;
&lt;li&gt;Optical ROM extraction&lt;/li&gt;
&lt;li&gt;Partial flash reprogramming through light exposure&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;other&#34;&gt;Other&lt;/h2&gt;

&lt;p&gt;The hardware of embedded devices can be manipulated in interesting ways to expose security problems.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Connecting debuggers&lt;/li&gt;
&lt;li&gt;Device cloning (by manufacturer or third party)&lt;/li&gt;
&lt;li&gt;Finding JTAG ports&lt;/li&gt;
&lt;li&gt;Finding serial ports&lt;/li&gt;
&lt;li&gt;Identity cloning by connecting multiple devices to the same legitimate identity chip&lt;/li&gt;
&lt;li&gt;Inhibiting clocks&lt;/li&gt;
&lt;li&gt;Inhibiting reset&lt;/li&gt;
&lt;li&gt;Manipulating RTCs&lt;/li&gt;
&lt;li&gt;Modifying serial numbers and identity chips&lt;/li&gt;
&lt;/ul&gt;

&lt;!-- TODO fill in a tutorial for each or at least a list of references and papers --&gt;

&lt;h2 id=&#34;firmware-analysis&#34;&gt;Firmware analysis&lt;/h2&gt;

&lt;p&gt;Firmware can be obtained from a running device or as an update package from the vendor. The challenge, then, is to make sense of it and find security flaws.&lt;/p&gt;

&lt;p&gt;Binary analysis is well covered in the existing reverse engineering literature, but there are some embedded-specific tools available.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;binwalk&lt;/li&gt;
&lt;li&gt;decompilers&lt;/li&gt;
&lt;/ul&gt;
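
&lt;p&gt;The core idea behind signature-scanning tools like binwalk can be sketched in a few lines: walk the image looking for well-known magic numbers and report where they appear. This is not binwalk&amp;rsquo;s implementation, just an illustration assuming Node.js; the signature table is a tiny hand-picked sample, and &lt;code&gt;scanFirmware&lt;/code&gt; and the synthetic &lt;code&gt;demo&lt;/code&gt; image are made up for the example.&lt;/p&gt;

&lt;pre&gt;
```javascript
// A tiny sample of well-known magic numbers. Real tools carry far more.
const SIGNATURES = [
  { name: 'gzip', magic: Buffer.from([0x1f, 0x8b, 0x08]) },
  { name: 'squashfs (little-endian)', magic: Buffer.from('hsqs', 'ascii') },
  { name: 'uImage header', magic: Buffer.from([0x27, 0x05, 0x19, 0x56]) },
  { name: 'ELF', magic: Buffer.from([0x7f, 0x45, 0x4c, 0x46]) },
];

// Return every offset at which a known signature appears, sorted by offset.
function scanFirmware(image) {
  const hits = [];
  for (const sig of SIGNATURES) {
    let pos = image.indexOf(sig.magic);
    while (pos !== -1) {
      hits.push({ offset: pos, name: sig.name });
      pos = image.indexOf(sig.magic, pos + 1);
    }
  }
  return hits.sort((a, b) => a.offset - b.offset);
}

// Synthetic image: a uImage header at offset 16 and a squashfs magic at 64.
const demo = Buffer.concat([
  Buffer.alloc(16),
  Buffer.from([0x27, 0x05, 0x19, 0x56]),
  Buffer.alloc(44),
  Buffer.from('hsqs', 'ascii'),
]);

console.log(scanFirmware(demo));
```
&lt;/pre&gt;

&lt;p&gt;Short magic numbers will produce false positives on real images, so treat each hit as a lead to verify rather than a fact.&lt;/p&gt;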
</description>
    </item>
    
    <item>
      <title>Is responsible disclosure appropriate for IoT devices?</title>
      <link>https://ianhowson.com/iot/responsible-disclosure/</link>
      <pubDate>Tue, 03 Jan 2017 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/iot/responsible-disclosure/</guid>
      <description>

&lt;p&gt;Story time.&lt;/p&gt;

&lt;p&gt;When I was an undergraduate, we did a research project. I looked at how difficult it was to break symmetric crypto using FPGAs. Nothing super novel, but it added some data points to the literature.&lt;/p&gt;

&lt;p&gt;We presented these projects to industry. An IBM employee asked, &amp;ldquo;why are you &lt;em&gt;breaking&lt;/em&gt; crypto? Are you a terrorist?&amp;rdquo; I did a double take. She was serious.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;d forgotten that regular people on the street haven&amp;rsquo;t internalised why we intentionally break things. So let&amp;rsquo;s lay it out there and see if it still works.&lt;/p&gt;

&lt;h2 id=&#34;why-we-break-things&#34;&gt;Why we break things&lt;/h2&gt;

&lt;p&gt;We don&amp;rsquo;t know how secure our {software|devices|systems} are. We can&amp;rsquo;t &lt;em&gt;prove&lt;/em&gt; the &lt;em&gt;absence&lt;/em&gt; of security problems. We &lt;em&gt;can&lt;/em&gt; prove the &lt;em&gt;presence&lt;/em&gt; of security problems.&lt;/p&gt;

&lt;p&gt;We don&amp;rsquo;t know what other people know, either. If we have reports of what is &lt;em&gt;broken&lt;/em&gt;, we conservatively assume that attackers already &lt;em&gt;know this&lt;/em&gt; and assess our risk accordingly. Absence of reports gives us weak confidence that a system has not been broken (but &lt;a href=&#34;https://ianhowson.com/iot/negative-reporting/&#34;&gt;negative reporting&lt;/a&gt; is not commonplace at this time, so it doesn&amp;rsquo;t tell us much).&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s take a concrete example: breaking DES. I want to use DES in an application. My one-line threat model is &amp;ldquo;I want it to withstand attack by opposing militaries&amp;rdquo;. Should I use DES?&lt;/p&gt;

&lt;p&gt;Well, the literature (my paper!) says that back in 2003, you could break a DES-encrypted message with a $100 FPGA board in about three weeks. My 2016 estimate is that you can break DES in about 6 seconds, if you can muster the computing power of the Bitcoin network.&lt;/p&gt;

&lt;p&gt;A large proportion of the Bitcoin network is controlled by one government. If I needed to keep something secret from an opposing military for more than 6 seconds, I would not use DES.&lt;/p&gt;
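
&lt;p&gt;The reasoning here is just keyspace divided by key-testing rate, compared against how long the secret has to last. A rough sketch, assuming the quoted times are worst-case sweeps of the whole 56-bit DES keyspace (the rates below are back-calculated from those times, not measured, and the function names are made up for the example):&lt;/p&gt;

&lt;pre&gt;
```javascript
const DES_KEY_BITS = 56; // DES uses a 56-bit key

// Number of candidate keys for a given key length.
function keyspace(bits) {
  return 2 ** bits;
}

// Seconds to try every key at a steady rate (worst case, no shortcuts).
function sweepSeconds(bits, keysPerSecond) {
  return keyspace(bits) / keysPerSecond;
}

// Key-testing rate implied by a quoted worst-case break time.
function impliedRate(bits, seconds) {
  return keyspace(bits) / seconds;
}

// Does the cipher outlast the time the secret needs to stay secret?
function withstands(bits, keysPerSecond, secretLifetimeSeconds) {
  return sweepSeconds(bits, keysPerSecond) > secretLifetimeSeconds;
}

const THREE_WEEKS = 21 * 24 * 60 * 60; // in seconds
const fpgaRate = impliedRate(DES_KEY_BITS, THREE_WEEKS);
const networkRate = impliedRate(DES_KEY_BITS, 6);
const ONE_HOUR = 60 * 60;

console.log(fpgaRate.toExponential(2));    // implied keys/second, single FPGA board
console.log(networkRate.toExponential(2)); // implied keys/second, whole network
console.log(THREE_WEEKS / 6);              // 302400, the speed-up between the two
console.log(withstands(DES_KEY_BITS, fpgaRate, ONE_HOUR));    // true
console.log(withstands(DES_KEY_BITS, networkRate, ONE_HOUR)); // false
```
&lt;/pre&gt;

&lt;p&gt;Swap in your own attacker rate and secret lifetime; the threat model decides which row of that output you care about.&lt;/p&gt;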

&lt;p&gt;If my threat model says &amp;ldquo;I want it to withstand attack by college students&amp;rdquo;, then yeah&amp;hellip; you can probably use DES! Maybe not against Engineering students, but most of them, sure.&lt;/p&gt;

&lt;p&gt;Crypto and attacks by foreign powers are a clear-cut case. We know that computing and attacks improve constantly. Defenders (people, governments, militaries, companies) have a legitimate case to use cryptography. Attacking cryptography and publishing results of breaks is therefore valuable.&lt;/p&gt;

&lt;h2 id=&#34;responsible-disclosure&#34;&gt;Responsible disclosure&lt;/h2&gt;

&lt;p&gt;Another common case is where we attack software &amp;ndash; say, a public-facing web application. Let&amp;rsquo;s call it BookFace. You attack BookFace wanting to know if your private data is safe there. You find problems. What do you do with this information?&lt;/p&gt;

&lt;h3 id=&#34;keep-it-to-yourself&#34;&gt;Keep it to yourself?&lt;/h3&gt;

&lt;p&gt;Don&amp;rsquo;t tell anyone. This does the public a disservice; an attack is possible but they don&amp;rsquo;t know about it. Most government infosec organisations (e.g. the NSA) take this approach. They want to keep attacks to themselves so they can be used later.&lt;/p&gt;

&lt;h3 id=&#34;sell-it&#34;&gt;Sell it?&lt;/h3&gt;

&lt;p&gt;People will buy information on how to perform an attack. This is where malware black markets come from. This puts a &lt;em&gt;lower bound on the value of a discovered attack&lt;/em&gt;. If you (the vendor) want some other form of disclosure, you&amp;rsquo;d better make sure that what you&amp;rsquo;re offering is better than the market price of the attack. Not all attackers are motivated by the mad props they get by publishing papers.&lt;/p&gt;

&lt;h3 id=&#34;tell-the-world-about-it&#34;&gt;Tell the world about it?&lt;/h3&gt;

&lt;p&gt;You could just tell the world. You get your mad props immediately. As in the academic world, once you&amp;rsquo;ve published there is no risk that someone else will publish your discovery first and steal your mad props.&lt;/p&gt;

&lt;p&gt;Crypto research takes the &amp;lsquo;tell the world&amp;rsquo; approach. Crypto is widely deployed and there&amp;rsquo;s no single entity that can do anything about a break. Adversaries are usually governments. You might as well just publish the result and flee to a country with no extradition laws.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;ve just breached BookFace, anyone can now breach BookFace and steal people&amp;rsquo;s private data. You might have good intentions, but not everyone does.&lt;/p&gt;

&lt;h3 id=&#34;tell-the-vendor-about-it&#34;&gt;Tell the vendor about it?&lt;/h3&gt;

&lt;p&gt;You could tell the vendor first. In the past, this would often result in lawsuits and gag orders, with the vendor trying to force you never to reveal details of the attack.&lt;/p&gt;

&lt;p&gt;Most vendors nowadays are more enlightened and will work with you. This is the path that Responsible Disclosure takes.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You tell the vendor in private.&lt;/li&gt;
&lt;li&gt;They get a reasonable amount of time to &lt;strong&gt;fix the problem&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;You can then publish details of the attack.&lt;/li&gt;
&lt;li&gt;You get mad props.&lt;/li&gt;
&lt;li&gt;The vendor gets a more secure system.&lt;/li&gt;
&lt;li&gt;Nobody&amp;rsquo;s nudes get leaked.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let&amp;rsquo;s make a checklist of Desirable Outcomes For Responsible Disclosure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vendor can fix the system&lt;/li&gt;
&lt;li&gt;Researcher gets mad props&lt;/li&gt;
&lt;li&gt;Users remain secure in their warm beddy-bies&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;but-for-iot&#34;&gt;But for IoT&amp;hellip;&lt;/h2&gt;

&lt;p&gt;IoT vendors usually can&amp;rsquo;t fix their problems.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The vendor does not physically control the device&lt;/li&gt;
&lt;li&gt;The vendor usually cannot push remote updates to a device&lt;/li&gt;
&lt;li&gt;Users might not know that the device exists&lt;/li&gt;
&lt;li&gt;Users might not care&lt;/li&gt;
&lt;li&gt;Updating the device might require downtime, and not every IoT device is permitted downtime&lt;/li&gt;
&lt;li&gt;Infrastructure devices take a &lt;em&gt;long&lt;/em&gt; time to update&lt;/li&gt;
&lt;li&gt;Regulatory restrictions will make approval to distribute an update very slow and with tremendous cost to the vendor&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So let&amp;rsquo;s review the checklist:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Researcher gets mad props: check&lt;/li&gt;
&lt;li&gt;Vendor can fix the system: not really&lt;/li&gt;
&lt;li&gt;Users remain secure: no&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By publishing the attack, we didn&amp;rsquo;t really improve the state of the world. Miscreants can reuse the attack, while vendors and users remain insecure.&lt;/p&gt;

&lt;h2 id=&#34;who-s-the-vendor&#34;&gt;Who&amp;rsquo;s the vendor?&lt;/h2&gt;

&lt;p&gt;Crypto algorithms are used by everyone. We don&amp;rsquo;t bother to disclose attacks because there is no single authority.&lt;/p&gt;

&lt;p&gt;Websites and software have a single controlling entity. Websites are the ideal case for responsible disclosure because they are completely under the control of a single entity. They update, everyone updates.&lt;/p&gt;

&lt;p&gt;Desktop software and apps are controlled by the user, but &lt;em&gt;most&lt;/em&gt; users will update &lt;em&gt;most&lt;/em&gt; of the time. Not always. Windows updates are forced; this comes at a cost to the user but is the responsible thing to do for the broader community.&lt;/p&gt;

&lt;p&gt;IoT devices have a single vendor, but that vendor has very little control over the deployed device. Disclosing an attack to an IoT vendor doesn&amp;rsquo;t really help them.&lt;/p&gt;

&lt;h2 id=&#34;are-you-on-the-right-side&#34;&gt;Are you on the right side?&lt;/h2&gt;

&lt;!-- Say you&#39;re living under a totalitarian government that hides information from its citizens and disappears people. It&#39;s in the citizen&#39;s interests to break the government messages and systems. That&#39;s an easy decision, from the perspective of a citizen. --&gt;

&lt;p&gt;Responsible disclosure is about improving vendor and user security while making sure researchers are recognised for their research efforts.&lt;/p&gt;

&lt;p&gt;Consider attacks on content control systems &amp;ndash; DVD, Blu-ray, Xbox, PlayStation, iOS, Pay TV. The security system is designed to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prevent anyone other than the manufacturer from distributing content (i.e. stop piracy)&lt;/li&gt;
&lt;li&gt;for games, ensure a secure and fair computing platform (i.e. stop cheaters)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So let&amp;rsquo;s say you publish an attack on one of these. The vendor will improve their future systems, but they can&amp;rsquo;t do anything about the existing ones (these are all embedded devices in people&amp;rsquo;s homes, of course). Depending on the severity of the breach, that content might be open forever and cheating rampant.&lt;/p&gt;

&lt;p&gt;So, let&amp;rsquo;s look at our Responsible Disclosure checklist again:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vendor can fix the system: sort of. Maybe the attack can be mitigated by a software update. Maybe not.&lt;/li&gt;
&lt;li&gt;Researcher gets mad props: Oh yeah. These are high-value systems with significant effort put into defence. These are gold.&lt;/li&gt;
&lt;li&gt;Users remain secure: Well&amp;hellip;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The user might not &lt;em&gt;want&lt;/em&gt; to be secure. Users generally don&amp;rsquo;t like content protection. The security system protects the vendor, not the user.&lt;/p&gt;

&lt;p&gt;So by breaking content protection, did you do good? Did you improve the world? That&amp;rsquo;s a tricky question which I&amp;rsquo;m not touching; &amp;ldquo;content wants to be free&amp;rdquo;, &amp;ldquo;commercial interests need to be protected&amp;rdquo;, &amp;ldquo;artists won&amp;rsquo;t produce art if they won&amp;rsquo;t be compensated&amp;rdquo; and so on. I&amp;rsquo;m not getting into that.&lt;/p&gt;

&lt;h2 id=&#34;so-what-to-do&#34;&gt;So what to do?&lt;/h2&gt;

&lt;p&gt;Let&amp;rsquo;s think:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you (the researcher) publish, jerkasses will use your attack. Vendors and users can&amp;rsquo;t do much about it.&lt;/li&gt;
&lt;li&gt;If you don&amp;rsquo;t publish, you don&amp;rsquo;t get recognition, and users/vendors remain ignorant of the vulnerabilities in a product.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One major problem with academic science research is that papers often lack enough detail to reproduce the result. Specifically, software and datasets are not released. The researcher gets props but does not advance human knowledge. We don&amp;rsquo;t even know if they did the research or just made it up.&lt;/p&gt;

&lt;p&gt;Recently, there has been movement in the &amp;lsquo;full disclosure&amp;rsquo; direction: &lt;em&gt;everything&lt;/em&gt; is released. People can repeat the analyses. This can be bad for the publishing researcher (the new information can be used to dispute the result) but is good for scientific progress as we have a fuller, more confident &amp;lsquo;truth&amp;rsquo; than before.&lt;/p&gt;

&lt;p&gt;Likewise, with security research, you could publish &lt;em&gt;everything&lt;/em&gt;. Method, firmware images, exploit. Make it easy for someone to verify your result. This poses a particular risk for IoT systems where it is difficult for the vendor to react to the publication.&lt;/p&gt;

&lt;p&gt;You could publish &lt;em&gt;just enough&lt;/em&gt; to prove that an attack is possible, in line with traditional scientific research. The recent Pay TV hacks are a good example; the &lt;em&gt;methods&lt;/em&gt; are shown along with some pretty compelling evidence. We don&amp;rsquo;t know for sure that it works; a demonstration device is great evidence but not absolute proof. There&amp;rsquo;s not enough information that you could do it yourself. The existence of the paper and talks makes it much easier to do, but the attack is technically challenging enough that the most common attackers (people at home) probably can&amp;rsquo;t execute it.&lt;/p&gt;

&lt;p&gt;This is a good position. The vendor and users are not significantly hurt. The researcher shows evidence of a successful attack on a challenging system.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s no universal answer. For IoT, the fundamental problem is that devices aren&amp;rsquo;t updated. It&amp;rsquo;s not even clear that &amp;ldquo;vendors must do remote OTA updates&amp;rdquo; is a good strategy. Not all devices can afford downtime.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Why extract firmware?</title>
      <link>https://ianhowson.com/iot/why-extract-firmware/</link>
      <pubDate>Tue, 20 Dec 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/iot/why-extract-firmware/</guid>
      <description>

&lt;p&gt;In a black-box penetration test &amp;ndash; say, for a web application &amp;ndash; the attacker has very limited knowledge of how the software works. All that you know is what you can gather from the outside. This makes it difficult to detect vulnerabilities. At the other extreme, copy protection of desktop software has been completely unsuccessful. The attacker controls the hardware and therefore can manipulate the software, bypassing security controls.&lt;/p&gt;

&lt;p&gt;For an attacker, an IoT device starts somewhere in the middle. The attacker does not have the firmware and must probe from the outside. If the attacker can obtain or manipulate the firmware, their job becomes much easier. Common ways that an attacker can obtain the firmware are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;download it from the Internet (e.g. as a device update that your company releases)&lt;/li&gt;
&lt;li&gt;convince the device&amp;rsquo;s firmware to send it&lt;/li&gt;
&lt;li&gt;extract it from the device hardware&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once the attacker has a copy of the firmware, it&amp;rsquo;s usually easy to figure out what&amp;rsquo;s inside using standard software security techniques. &lt;a href=&#34;http://binwalk.org/&#34;&gt;Binwalk&lt;/a&gt; is a firmware analysis tool which can tell you what&amp;rsquo;s inside a firmware image.&lt;/p&gt;

&lt;h2 id=&#34;external-threats&#34;&gt;External threats&lt;/h2&gt;

&lt;p&gt;The attacker might be able to find out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;operating system type and version (and can then learn any known vulnerabilities for same)&lt;/li&gt;
&lt;li&gt;any third-party software bundled with it (and its documented vulnerabilities)&lt;/li&gt;
&lt;li&gt;any hidden services, especially those used for manufacturing and testing&lt;/li&gt;
&lt;li&gt;password hashes and &amp;ndash; surprisingly often &amp;ndash; plaintext passwords&lt;/li&gt;
&lt;li&gt;web routes that are not visible from the public web service (again, often leftovers from manufacturing and testing)&lt;/li&gt;
&lt;/ul&gt;
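
&lt;p&gt;As a rough sketch of how little effort this takes, assume the image has already been unpacked into a directory (Binwalk can do the unpacking). The two helper names below, and the choice of file names to look for, are my own illustrations rather than anything standard; only &lt;code&gt;find&lt;/code&gt; and &lt;code&gt;grep&lt;/code&gt; are needed:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;# Hypothetical helper: list files that commonly hold credentials or keys.
# $1 is the directory the firmware was unpacked into (an assumed layout).
list_credential_files() {
    find &#34;$1&#34; -type f \( -name passwd -o -name shadow -o -name &#39;*.pem&#39; -o -name &#39;*.key&#39; \)
}

# Hypothetical helper: list every file that mentions the word password,
# ignoring case. Binary files that contain the word are listed too.
list_password_mentions() {
    grep -ril password &#34;$1&#34;
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Running &lt;code&gt;list_credential_files&lt;/code&gt; and &lt;code&gt;list_password_mentions&lt;/code&gt; against the unpacked directory gives a first list of files worth reading by hand. It is a starting point, not a complete audit.&lt;/p&gt;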

&lt;p&gt;Given a firmware image, the attacker may be able to run it on their own hardware (with greater privileges) or obtain the source code from the Internet. This all gives the attacker more information to work with and more opportunities to find a vulnerability.&lt;/p&gt;

&lt;p&gt;An attacker that can &lt;em&gt;modify&lt;/em&gt; firmware and have your device run it has some new opportunities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They can use their hardware to run their own software.&lt;/li&gt;
&lt;li&gt;They can bypass software controls. For example, a door lock could be modified to unlock in response to a special card. A multiplayer video game could show players that are hidden behind walls.&lt;/li&gt;
&lt;li&gt;They can mount impersonation attacks. For example, an attacker could remove a device from a target site, modify the firmware (removing or modifying controls) and reinstall it in the target site.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;protection-of-secrets&#34;&gt;Protection of secrets&lt;/h2&gt;

&lt;p&gt;Some device firmware contains secrets &amp;ndash; valuable IP, maintenance passwords, company secrets or content decryption keys.&lt;/p&gt;

&lt;h2 id=&#34;device-cloning&#34;&gt;Device cloning&lt;/h2&gt;

&lt;p&gt;A common fear of vendors is that their manufacturing partners will manufacture and sell devices without the involvement of the vendor. &lt;a href=&#34;https://www.bunniestudios.com/blog/?page_id=1022&#34;&gt;Pirate SD cards&lt;/a&gt; are a real problem, and there have been stories of &lt;a href=&#34;http://qz.com/771727/chinas-factories-in-shenzhen-can-copy-products-at-breakneck-speed-and-its-time-for-the-rest-of-the-world-to-get-over-it/&#34;&gt;Kickstarter projects being cloned and sold in China&lt;/a&gt;. Low-quality clone devices cost both profits and reputation for the business.&lt;/p&gt;

&lt;p&gt;In response, some vendors only have their manufacturing partners write basic bringup firmware to the device &amp;ndash; just enough to test that the hardware is working correctly. This probably helps, but there are a multitude of other ways to obtain firmware &amp;ndash; assuming that the pirate manufacturer doesn&amp;rsquo;t just write their own.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>How to extract firmware from a device</title>
      <link>https://ianhowson.com/iot/extracting-firmware/</link>
      <pubDate>Tue, 20 Dec 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/iot/extracting-firmware/</guid>
      <description>

&lt;h2 id=&#34;through-the-application&#34;&gt;Through the application&lt;/h2&gt;

&lt;p&gt;If you (the attacker) can get a shell on the device, usually it is trivial to SCP the filesystem contents out of the device. This shell might come from any of the traditional software or network vulnerabilities. It is also common for serial ports on the device to expose a login shell.&lt;/p&gt;

&lt;p&gt;Standard software and network security countermeasures apply. In particular, try to disable debug ports in production firmware.&lt;/p&gt;

&lt;h2 id=&#34;through-the-bootloader&#34;&gt;Through the bootloader&lt;/h2&gt;

&lt;p&gt;The bootloader is in a particularly weak position in an IoT device:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It is usually unencrypted&lt;/li&gt;
&lt;li&gt;There are minimal security controls&lt;/li&gt;
&lt;li&gt;It is accessible through multiple ports &amp;ndash; usually an onboard serial port and often through network interfaces&lt;/li&gt;
&lt;li&gt;It must be able to access main device Flash in order to boot the system&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is common for just the bootloader to be written to the device during manufacturing. The final firmware image is written to the device through the bootloader at a later stage. It is &lt;strong&gt;unusual&lt;/strong&gt; that the bootloader is configured to disable console and Flash access after device programming is complete.&lt;/p&gt;

&lt;p&gt;Conveniently for old folks like me, bootloaders usually still use RS232 serial ports for access. The old XModem/ZModem/Kermit protocols are often shipped and can be used to copy files off device Flash.&lt;/p&gt;

&lt;p&gt;Countermeasures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customise the bootloader to allow the bare minimum facilities required to bring up the device (booting, Flash writes and network drivers). Disable Flash/memory reads and filesystem access.&lt;/li&gt;
&lt;li&gt;Disable debug ports after device configuration is complete&lt;/li&gt;
&lt;li&gt;Use your platform&amp;rsquo;s secure boot facilities (if any)&lt;/li&gt;
&lt;li&gt;Once device firmware has been written, reconfigure the bootloader to only allow booting, not memory/Flash reads and writes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;through-jtag-programming-and-debug-headers&#34;&gt;Through JTAG, programming and debug headers&lt;/h2&gt;

&lt;p&gt;It&amp;rsquo;s extremely unlikely that you can remove these facilities altogether; they&amp;rsquo;re important for manufacturing. You can make small gains by making them difficult or inconvenient to use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Don&amp;rsquo;t connect main Flash to the JTAG chain&lt;/li&gt;
&lt;li&gt;Don&amp;rsquo;t label the ports&lt;/li&gt;
&lt;li&gt;Use test pads instead of pins, connectors or holes in the PCB. Bonus points if you can cover the pads (e.g. conformal coating) after firmware has been written.&lt;/li&gt;
&lt;li&gt;Groups of pads or pins are especially suspicious to the attacker. The number of pads and some voltage measurements will usually reveal the purpose of the whole group. Spread the pads around the PCB.&lt;/li&gt;
&lt;li&gt;Put traces on inner layers on the PCB where they&amp;rsquo;re more difficult to access. Consider a security mesh over the top to disable the device if broken.&lt;/li&gt;
&lt;li&gt;Disable CPU debug facilities (both in software and by not connecting the pads to the PCB)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;by-accessing-the-memory-bus&#34;&gt;By accessing the memory bus&lt;/h2&gt;

&lt;p&gt;This is unusual and difficult. Notably, this method was used to &lt;a href=&#34;http://bunniefoo.com/nostarch/HackingTheXbox_Free.pdf&#34;&gt;extract encryption keys from the original Xbox (page 125)&lt;/a&gt;. Many of the same countermeasures apply:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Put the traces on inner layers of the PCB and add a security mesh&lt;/li&gt;
&lt;li&gt;Memory encryption is a possibility, but is generally ineffective and costly&lt;/li&gt;
&lt;li&gt;iPhone and many Android phones stack the RAM onto the same package as the CPU/SoC, making it physically challenging to access the memory bus without destroying the whole package&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;by-removing-the-flash-chip&#34;&gt;By removing the Flash chip&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;I have never encountered an IoT device which did not give me a useful firmware image once its main Flash was removed and dumped.&lt;/strong&gt; There must be &lt;em&gt;someone&lt;/em&gt; shipping a device with encrypted Flash; please get in contact with me if you find one!&lt;/p&gt;

&lt;p&gt;Some devices ship with an SD card as their boot media. The solution here is obvious: remove the card and dump it using a regular computer. Once, I had to get the soldering iron out as the device didn&amp;rsquo;t use a socket. That&amp;rsquo;s as tough as it gets.&lt;/p&gt;

&lt;p&gt;(I enjoy the irony that an 8-bit CPU can boot from an SD card that contains a 32-bit CPU &amp;ndash; and that this is the most cost-effective way to do things in some applications.)&lt;/p&gt;

&lt;p&gt;Most devices will use a soldered-on Flash device. There are a few common variations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Serial SPI NOR (usually small &amp;ndash; kilobytes to a few megabytes)&lt;/li&gt;
&lt;li&gt;Serial NAND (larger, bigger CPUs)&lt;/li&gt;
&lt;li&gt;Parallel NOR/NAND&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many designs will allow the Flash chip to be read while still on the PCB. This is great for the attacker; less risk of damage to the device and less work. For a serial Flash device, this is achieved by lifting the TX/RX/MISO/MOSI lines and connecting them to an off-the-shelf device programmer. You can do the same with a parallel part, but it&amp;rsquo;s usually easiest to just remove the whole chip and put it in a socket.&lt;/p&gt;

&lt;p&gt;Another option is to solder to the exposed Flash pins and disable the main CPU &amp;ndash; either by holding it in reset or disabling the clocks.&lt;/p&gt;

&lt;p&gt;Removing a Flash chip is easy. The exact method varies depending on the board, but the simplest thing for most devices is to heat it with a hot air rework station and lift the chip off the board with tweezers. Higher density boards make this more challenging. Sometimes as an attacker you don&amp;rsquo;t care about destroying the host PCB &amp;ndash; anything you learn from one device will be applicable to another anyway.&lt;/p&gt;

&lt;!-- *** TODO that macbook uefi clip might be a good pic here *** --&gt;

&lt;!-- TODO also the iphone flash removal post
A modern iPhone is challenging because of the component density, but still doable; see here --&gt;

&lt;h3 id=&#34;countermeasures&#34;&gt;Countermeasures&lt;/h3&gt;

&lt;p&gt;By far the best countermeasure to physical Flash attacks is to &lt;strong&gt;encrypt the firmware and use a trusted boot facility&lt;/strong&gt;. Then, an attacker has nothing of value to extract from the Flash. Even without trusted boot, encrypted firmware is a lot better than nothing.&lt;/p&gt;

&lt;p&gt;Failing that, you&amp;rsquo;re limited to physical controls:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Glob or epoxy the Flash to the board&lt;/li&gt;
&lt;li&gt;Use a parallel Flash chip; it raises the cost of attack&lt;/li&gt;
&lt;li&gt;Use a BGA part. This prevents in-place access to the Flash chip, raises the cost of inserting the part into a socket and makes reassembly more challenging. Make sure you don&amp;rsquo;t expose the vias to the outer layers of the PCB, or you&amp;rsquo;ve just given attackers a convenient place to solder to!&lt;/li&gt;
&lt;li&gt;Hide data traces on inner layers of the PCB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Xilinx FPGAs have an interesting solution to this. Their firmware (bitstream) is stored encrypted in an external Flash chip. The decryption keys are stored in write-only SRAM within the FPGA package. An external battery keeps the key storage SRAM alive. The keys never need to leave the chip.&lt;/p&gt;

&lt;p&gt;There are obvious downsides to this &amp;ndash; a battery is expensive and large &amp;ndash; but if you&amp;rsquo;re using an FPGA already, cost, space and power consumption are probably not big constraints.&lt;/p&gt;

&lt;h2 id=&#34;by-removing-the-microcontroller&#34;&gt;By removing the microcontroller&lt;/h2&gt;

&lt;p&gt;If you&amp;rsquo;re looking at a small device (usually 8-bit), it&amp;rsquo;s likely that the CPU, RAM and Flash are integrated onto the same die.&lt;/p&gt;

&lt;p&gt;These parts all have code locks that prevent the program Flash from being read externally. There are plenty of examples of people defeating those locks. Many of these attacks require the microcontroller to be removed and the package melted off. This is beyond the ability of most at-home attackers.&lt;/p&gt;

&lt;!-- TODO talk about some of the interesting attacks
package removal/decapping
security mesh
power supply glitching
selective deprogramming - tape over the main flash but leave the lock bit exposed --&gt;

&lt;p&gt;Countermeasures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enable the code lock after production firmware has been written&lt;/li&gt;
&lt;li&gt;Consider globbing the part to the PCB&lt;/li&gt;
&lt;li&gt;Many microcontrollers ship with code security features such as self-destruction and encryption&lt;/li&gt;
&lt;/ul&gt;

&lt;!-- TODO *** not only the main flash is interesting; all of the sub devices, sub flash chips and boot chips are interesting. e.g. an spi boot rom that chain boots to a nand flash. ethernet eeproms can be modified or cloned. even random id chips can be spoofed. --&gt;

&lt;!-- TODO Most countermeasures merely slow down, not stop, an attacker. Your threat model will tell you what sort of attacker you&#39;re trying to guard against. People at home with a soldering iron can be slowed down easily. Dedicated electronics hackers with rework equipment and chip decapping facilities are harder. State-level attackers are almost impossible to stop at the hardware level. --&gt;

&lt;!-- TODO There aren&#39;t any great countermeasures at this stage. The best we have are things like chip credit cards which contain a private key; they&#39;re difficult to clone. But a whole computing environment with peripherals? that&#39;s hard. iphone does a good job; the cpu (SoC) and ram are physically bonded together; the crypto keys and tpm are in the same die. the device is also really expensive and the risk of device destruction extremely high. and at the end of the day, the code is public-key signed, so the device doesn&#39;t even contain private keys (the attack cannot be reused on another device without disassembling it). so there&#39;s no good attack there. --&gt;

&lt;!-- TODO inline encryption, obfuscation, &#39;white box cryptography&#39; (code protection measures) - still not super effective if someone can put a debugger on. perhaps someone wants to misuse the firmware/keys instead of just extract it --&gt;
</description>
    </item>
    
    <item>
      <title>Miscellaneous</title>
      <link>https://ianhowson.com/iot/miscellaneous/</link>
      <pubDate>Mon, 19 Dec 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/iot/miscellaneous/</guid>
      <description>

&lt;h2 id=&#34;cryptography-demands-that-your-cpu-and-memory-subsystems-be-perfect&#34;&gt;Cryptography demands that your CPU and memory subsystems be perfect&lt;/h2&gt;

&lt;p&gt;In normal operation, your device can, surprisingly, tolerate a lot of small errors. The occasional bitflip in RAM won&amp;rsquo;t hurt anything and a slightly out-of-spec CPU (e.g. low voltage or noisy power supply) will work well enough. Most applications don&amp;rsquo;t do enough CPU/RAM work that errors are a big problem.&lt;/p&gt;

&lt;p&gt;Cryptography is different; it will stress your CPU and RAM for a long period. It is also completely intolerant of errors. A single flipped bit will completely ruin the result of a crypto operation.&lt;/p&gt;

&lt;p&gt;A quick-and-dirty test, for Linux systems at least, is to repeatedly hash a file in RAM. If you&amp;rsquo;ve got a &lt;code&gt;tmpfs&lt;/code&gt; mounted on &lt;code&gt;/tmp&lt;/code&gt;, for example, you can:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;dd &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;if&lt;/span&gt;=/dev/urandom of=/tmp/junk bs=1M count=&amp;lt;most of RAM - e.g. &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39;12&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;for&lt;/span&gt; a 16MB machine&amp;gt;
md5sum /tmp/junk&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Repeatedly run the &lt;code&gt;md5sum&lt;/code&gt;. If you ever get different results, you&amp;rsquo;re seeing memory corruption.&lt;/p&gt;
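&lt;p&gt;The repetition is easy to script. This is a minimal sketch, assuming a POSIX shell with &lt;code&gt;md5sum&lt;/code&gt; and &lt;code&gt;cut&lt;/code&gt; available; the function name &lt;code&gt;hash_is_stable&lt;/code&gt; and its two arguments (file and number of rounds) are made up for illustration:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;# Hash the file $1 a total of $2 times and stop at the first digest
# that differs from the first one. Returns non-zero on a mismatch.
hash_is_stable() {
    first=$(md5sum &#34;$1&#34; | cut -d&#39; &#39; -f1)
    i=2
    while [ &#34;$i&#34; -le &#34;$2&#34; ]; do
        next=$(md5sum &#34;$1&#34; | cut -d&#39; &#39; -f1)
        if [ &#34;$next&#34; != &#34;$first&#34; ]; then
            echo &#34;mismatch on round $i: $first vs $next&#34;
            return 1
        fi
        i=$((i + 1))
    done
    echo &#34;all $2 hashes matched: $first&#34;
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For example, &lt;code&gt;hash_is_stable /tmp/junk 1000&lt;/code&gt; hashes the junk file a thousand times. How many rounds count as enough is a judgment call; leaving it running overnight is a reasonable start.&lt;/p&gt;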
</description>
    </item>
    
    <item>
      <title>Pressures on highly regulated industries</title>
      <link>https://ianhowson.com/iot/highly-regulated-industries/</link>
      <pubDate>Mon, 19 Dec 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/iot/highly-regulated-industries/</guid>
      <description>

&lt;h2 id=&#34;change-is-expensive&#34;&gt;Change is expensive&lt;/h2&gt;

&lt;p&gt;Highly regulated industries like automotive, aerospace and medical have a common pressure: &lt;strong&gt;change is expensive&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;These industries need to comply with regulations before their product is allowed to be sold. On top of designing, developing, testing, marketing and selling a product, meeting regulatory requirements takes massive amounts of resources.&lt;/p&gt;

&lt;p&gt;Every market has different regulatory requirements, so products can&amp;rsquo;t be sold in a region until that regulator is satisfied.&lt;/p&gt;

&lt;p&gt;Regulators typically take months, occasionally years, to receive a submission, process it and respond. While the regulator is busy, the company will move its efforts to the next product. This works against security efforts; while regulators are doing their thing, engineers have forgotten about the product or left the company. By the time you come back to produce an update, nobody will remember why or how the old device worked.&lt;/p&gt;

&lt;p&gt;Should a regulator reject your submission, it will take a long time (weeks to months) before you can resubmit and try again. This delay kills startup companies.&lt;/p&gt;

&lt;p&gt;Most of the time, firmware is an item that regulators want to control. You&amp;rsquo;ll need to prove that the firmware is safe and has been well tested. Sometimes there are mandatory design standards that must be met. This goes beyond just the firmware that your company writes: any software that is included in your product must meet some standard. &lt;em&gt;This is a huge problem for any Linux-based system&lt;/em&gt; as you can&amp;rsquo;t realistically prove the safety of a large body of third-party code. You can test and submit it as a black box, sometimes.&lt;/p&gt;

&lt;p&gt;Complexity is your enemy, more so than usual. Simpler designs mean fewer parts and less external software. This means simpler and faster regulatory submissions. The tradeoff? Security, of course! It&amp;rsquo;s tough to justify doing &lt;strong&gt;additional work&lt;/strong&gt; on a device when that work will &lt;strong&gt;disable functionality&lt;/strong&gt; to guard against a &lt;strong&gt;possible future event&lt;/strong&gt;. You need to get this thing shipped now!&lt;/p&gt;

&lt;p&gt;Regulators pay attention to cryptography. Some markets restrict its use, some limit its strength, and some outlaw it entirely. If your product uses cryptography, you&amp;rsquo;ve limited the markets that you can sell to and made your interactions with regulators slower.&lt;/p&gt;

&lt;p&gt;Regulators are paying more attention to security, though there&amp;rsquo;s nothing concrete right now. Their concerns are mostly around user safety. DDoS from virus-infected pacemakers, not so much.&lt;/p&gt;

&lt;h2 id=&#34;so-what&#34;&gt;So what?&lt;/h2&gt;

&lt;p&gt;There&amp;rsquo;s a &lt;em&gt;massive&lt;/em&gt; cost to changing the design. You&amp;rsquo;re going to choose an old CPU that is boring, trusted and well documented. It needs to be available for a long time in the future.&lt;/p&gt;

&lt;p&gt;Despite tremendous advances in mobile CPUs, you probably can&amp;rsquo;t use them because they&amp;rsquo;ll only be manufactured for a small number of years. If your CPU becomes obsolete, you have to redo the hardware design and resubmit it to the regulators; that&amp;rsquo;s expensive.&lt;/p&gt;

&lt;p&gt;You&amp;rsquo;re going to emphasise &lt;em&gt;simplicity&lt;/em&gt;. Fewer hardware components mean better reliability, and less documentation where you have to prove that they&amp;rsquo;re safe. Smaller firmware means less testing and less documentation. Less can go wrong.&lt;/p&gt;

&lt;p&gt;You&amp;rsquo;re going to develop more in-house rather than buy solutions in. Because you&amp;rsquo;re manufacturing and supporting for decades, your toolchains and any third-party software need to be stable for a long time. Stability is a problem for security, though &amp;ndash; a small security fix might need you to pull in newer, untested code. What works for web development (constant updates and change) works poorly for IoT.&lt;/p&gt;

&lt;p&gt;You&amp;rsquo;re going to make as few changes to the design and the firmware as possible because changes might need to be approved by the regulators again. Even minor changes can have unexpected implications.&lt;/p&gt;

&lt;p&gt;For many devices, the firmware that they receive at manufacture time is the most recent firmware they&amp;rsquo;ll ever receive. It needs to be solid! You probably can&amp;rsquo;t just patch it remotely later. If your simple bugfix has a surprise bug, that bug will be out there forever.&lt;/p&gt;

&lt;h2 id=&#34;but-pacemakers-can-be-remotely-hacked&#34;&gt;But&amp;hellip; pacemakers can be remotely hacked!&lt;/h2&gt;

&lt;p&gt;Sort of. Not really.&lt;/p&gt;

&lt;p&gt;In this light, it is easy to understand the decisions made by pacemaker manufacturers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The CPU won&amp;rsquo;t support much, if any, cryptography &amp;ndash; it needs to be boring, low power and reliable.&lt;/li&gt;
&lt;li&gt;Their device has to run for decades.&lt;/li&gt;
&lt;li&gt;There&amp;rsquo;s a reasonable control against &amp;ldquo;attacker reprograms pacemaker&amp;rdquo;: the attacker has to be physically close to the device. At that range, they could just stab the victim with a knife!&lt;/li&gt;
&lt;li&gt;The surgery is highly invasive. &lt;em&gt;Any&lt;/em&gt; reduction in reliability is really bad.&lt;/li&gt;
&lt;li&gt;IEC 62304 makes it difficult (but not impossible) to use externally sourced software. An SSL stack is a large, complex piece of software. Even symmetric cryptography raises new regulatory hurdles.&lt;/li&gt;
&lt;li&gt;People receiving pacemakers already have a serious health issue. You probably don&amp;rsquo;t want to delay any medical treatment by adding extra security controls (e.g. authentication) to the process.&lt;/li&gt;
&lt;li&gt;Medical device companies tend to protect their IP through lawyers and patents, not technology.&lt;/li&gt;
&lt;/ul&gt;

&lt;!-- TODO practically any device with RF interfaces goes through a regulatory phase. this makes dev costly. you don&#39;t have to recertify if you do firmware updates, do you? some devices control RF emissions through sofwtare, like cars control pollution emissions through sofwtare.
 --&gt;
</description>
    </item>
    
    <item>
      <title>Software developers shouldn&#39;t build threat models</title>
      <link>https://ianhowson.com/iot/developers-and-threat-models/</link>
      <pubDate>Mon, 19 Dec 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/iot/developers-and-threat-models/</guid>
      <description>

&lt;p&gt;Often, the software or firmware developers end up building threat models. This is a terrible idea for two reasons:&lt;/p&gt;

&lt;h2 id=&#34;1-it-conflicts-with-their-goals&#34;&gt;1. It conflicts with their goals.&lt;/h2&gt;

&lt;p&gt;Software developers have one priority: ship the product faster. Faster faster faster. You&amp;rsquo;re asking them to produce a report (taking time) which will require them to implement controls (taking more time). Some of the controls will be burdensome and require major changes. So they will silently omit threats which are difficult to control, downgrade their probability/impact, or select an ineffective but easy-to-implement control.&lt;/p&gt;

&lt;h2 id=&#34;2-the-software-developers-in-your-company-probably-have-little-or-no-security-training&#34;&gt;2. The software developers in your company probably have little or no security training.&lt;/h2&gt;

&lt;p&gt;Wait, let me back up. Do you even &lt;em&gt;have&lt;/em&gt; software developers? A large proportion of IoT vendors don&amp;rsquo;t employ software developers at all &amp;ndash; the hardware/electrical engineers write the firmware. Sometimes the firmware is outsourced, so the firmware vendors aren&amp;rsquo;t interested in satisfying long-term needs.&lt;/p&gt;

&lt;p&gt;So, IF you have software people and IF they have security training and IF they ever use it, they might be in a position to discuss security issues. But given point (1) &amp;ndash; effective threat modelling conflicts with their goals &amp;ndash; it&amp;rsquo;s not a great idea.&lt;/p&gt;

&lt;h2 id=&#34;so-who-builds-the-threat-model&#34;&gt;So who builds the threat model?&lt;/h2&gt;

&lt;p&gt;Someone who isn&amp;rsquo;t interested in the ship date. Preferably, someone who has no relationship with your firmware team.&lt;/p&gt;

&lt;p&gt;If you use an external consultant, they can deliver the bad news and leave without causing conflict between staff.&lt;/p&gt;

&lt;h2 id=&#34;what-do-i-do-with-the-threat-model&#34;&gt;What do I do with the threat model?&lt;/h2&gt;

&lt;p&gt;It becomes software requirements. (You have requirements, right?)&lt;/p&gt;

&lt;p&gt;Because they&amp;rsquo;re requirements, the developers are more free to select an appropriate control. It fits in with their existing workflow and gives them something to test against. (You have tests, right?)&lt;/p&gt;

&lt;p&gt;I know, I know: most orgs don&amp;rsquo;t have requirements or tests. Hopefully your threat model does not show any significant risks to your business. If it does, you&amp;rsquo;re now in a much better position to advocate for proper testing. You can clearly show the risk and cost of NOT testing.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Why is IoT security different?</title>
      <link>https://ianhowson.com/iot/why-is-iot-security-different/</link>
      <pubDate>Mon, 19 Dec 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/iot/why-is-iot-security-different/</guid>
      <description>

&lt;h2 id=&#34;cost&#34;&gt;Cost&lt;/h2&gt;

&lt;p&gt;Most IoT devices sold nowadays are sold as a piece of hardware. You pay once for the hardware and it is expected to work for a long time. As a result, the vendor is under tremendous pressure to keep the manufacturing cost of the hardware low; it entirely dictates their profit margin.&lt;/p&gt;

&lt;p&gt;Because there is pressure to reduce manufacturing costs, every component comes under scrutiny. Do we need 64MB of RAM, or can we get by with 32? A less capable CPU will shave 50 cents off the BOM cost. The hardware capabilities are reduced to the bare minimum, and unfortunately for security, features like cryptography tend to be demanding of hardware resources. The vast majority of IoT devices in the wild are simply not capable of strong crypto as it is presently used.&lt;/p&gt;

&lt;h2 id=&#34;business-models&#34;&gt;Business models&lt;/h2&gt;

&lt;p&gt;Traditional software costs nothing to duplicate. There are two common business models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Buy once, use forever&lt;/li&gt;
&lt;li&gt;Ongoing subscription&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The software industry is moving to subscription plans because consumers expect regular updates and support. They also expect to get new features, but hate to pay for them. Software vendors incur large costs to support and update a software product after it has already been sold.&lt;/p&gt;

&lt;p&gt;Much of the support and update burden is &lt;strong&gt;security patches&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;IoT devices mostly use the &amp;ldquo;buy once, use forever&amp;rdquo; model. Unfortunately, this means that the vendor has little incentive to update their device once it has been released. Updates cost money. They would prefer that customers buy a new device and throw away the old one.&lt;/p&gt;

&lt;p&gt;There are some businesses which discount hardware costs by selling an ongoing subscription (e.g. Internet or cell phone service), but in these cases the service is valuable and the hardware is a necessary cost. You wouldn&amp;rsquo;t pay $5/month to use an IoT lightbulb, for instance. Where a service must be provided for a long time, it is usually priced into the up-front purchase price. No IoT lightbulb costs $200 to manufacture.&lt;/p&gt;

&lt;p&gt;As a result, IoT vendors rarely release security patches for their products.&lt;/p&gt;

&lt;h2 id=&#34;hardware-capabilities&#34;&gt;Hardware capabilities&lt;/h2&gt;

&lt;p&gt;IoT devices use different CPUs to those found in a modern laptop or desktop. They are always less powerful &amp;ndash; sometimes dramatically so. Where a typical (2016) laptop will have 8GB of RAM, there are CPUs in IoT devices which have less than 20 bytes (yes, bytes!) of RAM. There are several reasons for this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cost, obviously. Smaller CPUs are cheaper.&lt;/li&gt;
&lt;li&gt;Smaller CPUs use less power. Less power means less heat, longer battery life, less cooling, smaller size and lower manufacturing cost.&lt;/li&gt;
&lt;li&gt;IoT devices typically integrate most or all of their peripherals onto the CPU package, further reducing size/cost/power.&lt;/li&gt;
&lt;li&gt;Many IoT devices handle real-time tasks, and these are often easier to develop on a non-desktop-class operating system.&lt;/li&gt;
&lt;li&gt;Some IoT devices are manufactured for an extended period (over five years) with minimal hardware changes (aerospace, automotive, medical). The parts &lt;em&gt;must&lt;/em&gt; be manufactured for at least that time. Typically, older parts must be selected, with fewer capabilities.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are thousands of architectures in common use. I discuss a few common classifications in &lt;a href=&#34;https://ianhowson.com/iot/hardware-classes&#34;&gt;hardware classes&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The result of all of this is that &lt;strong&gt;not all IoT devices are capable of strong cryptography&lt;/strong&gt;. At the time of writing (2016), IoT devices with the same crypto capabilities as a desktop PC are rare.&lt;/p&gt;

&lt;p&gt;This isn&amp;rsquo;t as simple as &amp;ldquo;Moore&amp;rsquo;s Law will fix it&amp;rdquo;. The non-cost benefits (power, development effort, predictability) of smaller CPUs are enormous. It&amp;rsquo;ll be a long time before we can fit something with the power of the original Raspberry Pi (700MHz 32-bit ARM, 50-5000mW) into the space and power envelope of a TinyAVR (4MHz 8-bit AVR, 5mW) &amp;ndash; and the AVR would still boot faster.&lt;/p&gt;

&lt;h2 id=&#34;software-development-practices&#34;&gt;Software development practices&lt;/h2&gt;

&lt;p&gt;Software (firmware!) is usually developed alongside the hardware device. It&amp;rsquo;s very common for firmware to be developed by someone who isn&amp;rsquo;t a specialised software developer (often they&amp;rsquo;re electrical engineers first and learn software development on-the-job). They are therefore less likely to be educated in security practice than a software developer.&lt;/p&gt;

&lt;p&gt;Software/firmware is also seen as a cost and usually takes longer than the hardware development. It therefore delays release to market and is abbreviated as much as possible.&lt;/p&gt;

&lt;h2 id=&#34;physical-environment&#34;&gt;Physical environment&lt;/h2&gt;

&lt;p&gt;IoT devices operate in all imaginable physical environments &amp;ndash; underwater, inside human bodies, in space.&lt;/p&gt;

&lt;p&gt;As a result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The vendor usually doesn&amp;rsquo;t control the physical environment&lt;/li&gt;
&lt;li&gt;The hardware can be damaged or operate incorrectly outside its designed physical environment&lt;/li&gt;
&lt;li&gt;The hardware itself is an avenue that attackers can use to learn more about the device&lt;/li&gt;
&lt;li&gt;Physical and temporal proximity are often used as security controls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Remember, a Fitbit is a $100 computer whose purpose in life is to be &lt;em&gt;shaken&lt;/em&gt;. You would never do this to a regular computer!&lt;/p&gt;

&lt;h2 id=&#34;unattended-operation&#34;&gt;Unattended operation&lt;/h2&gt;

&lt;p&gt;Despite mainstream media reporting, IoT devices &lt;em&gt;already&lt;/em&gt; surround you and control much of your life. You don&amp;rsquo;t know that they exist. You&amp;rsquo;re certainly not aware that they need to be maintained.&lt;/p&gt;

&lt;p&gt;Many classes of IoT devices &amp;ndash; building controls, SCADA devices, medical implants &amp;ndash; need to operate for a long time with no human intervention. For the security practitioner, that means that they need to operate &lt;em&gt;securely&lt;/em&gt; for a long time with nobody patching them or monitoring them.&lt;/p&gt;

&lt;p&gt;An unpatched Windows XP machine on the Internet will be compromised within a few minutes, but at least someone will &lt;em&gt;notice&lt;/em&gt; that it has been compromised. The IoT devices in tomorrow&amp;rsquo;s news story have already been deployed somewhere and forgotten.&lt;/p&gt;

&lt;h2 id=&#34;huge-variability-in-architectures&#34;&gt;Huge variability in architectures&lt;/h2&gt;

&lt;p&gt;On the desktop, practically all machines run Windows on an Intel CPU. On servers, Linux or Windows on Intel. On phones, iOS or Android on ARM.&lt;/p&gt;

&lt;p&gt;On IoT devices, there is no dominant platform. There are hundreds of different CPUs and dozens of different operating systems. Many devices use a custom operating system or no operating system at all. Even stock operating systems are heavily customised.&lt;/p&gt;

&lt;p&gt;If an attacker compromises iOS or Windows, they can reuse the same method over a massive install base. Because these platforms are constantly attacked and have strong corporate backing, they&amp;rsquo;re very robust at this point in time.&lt;/p&gt;

&lt;p&gt;IoT devices are all different. They&amp;rsquo;re generally very easy to compromise, but the same exploit isn&amp;rsquo;t usable against many devices. Given that attackers have a finite amount of time to spend attacking and exploiting devices, they&amp;rsquo;ll spend their effort on more lucrative targets (the most impact for the least effort).&lt;/p&gt;

&lt;!--

TODO review

## Implications

* Consumers consistently prefer cheaper products. It&#39;s difficult to sell a device on the merits of having better security.
  * Sometimes the security features actively work against the consumer -- consider DVD players, for instance. A DVD player which can play discs from any region benefits the consumer. You can&#39;t sell a high-security DVD player (Sony!) because *they don&#39;t want that*.
* Vendors often go out of business, but the products stay in the wild. They will not receive security updates.
* Modern CPUs provide hardware features which provide additional protection against security problems (ASLR, virtualisation). Because IoT devices are using the cheapest CPU they can, they often do not have these features available.
* As a designer, your product will probably not be updated once it is released. **You need to design it upfront assuming that it is going to be breached.** I write more about this in &lt;&lt;design-for-failure&gt;&gt;.

--&gt;
</description>
    </item>
    
    <item>
      <title>How do we fix IoT security?</title>
      <link>https://ianhowson.com/iot/fix-iot-security/</link>
      <pubDate>Tue, 13 Dec 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/iot/fix-iot-security/</guid>
      <description>

&lt;p&gt;So, given the constraints that I&amp;rsquo;ve ranted on about over and over again:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The business can&amp;rsquo;t easily switch to a subscription model&lt;/li&gt;
&lt;li&gt;There is downward pressure on hardware costs&lt;/li&gt;
&lt;li&gt;Cheap devices will probably never have good security controls&lt;/li&gt;
&lt;li&gt;Devices in the field will never be updated&lt;/li&gt;
&lt;li&gt;Developers are only interested in the ship date, not security&lt;/li&gt;
&lt;li&gt;Users don&amp;rsquo;t understand security, won&amp;rsquo;t learn it and won&amp;rsquo;t pay more for it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What are we to do? How do we improve the current state of affairs?&lt;/p&gt;

&lt;h2 id=&#34;produce-a-threat-model-and-publish-it&#34;&gt;Produce a threat model and publish it&lt;/h2&gt;

&lt;p&gt;It&amp;rsquo;s a big dream, but I would like to see &lt;em&gt;every&lt;/em&gt; device have a threat model produced, and I would like the vendors to &lt;em&gt;publish&lt;/em&gt; those threat models.&lt;/p&gt;

&lt;p&gt;This serves two purposes. One, the vendor thinks about security, if only briefly. And two, consumers use the threat model to judge if the device is appropriate to their situation (or, more likely, judge based on whether a threat model has been published at all).&lt;/p&gt;

&lt;h2 id=&#34;don-t-rely-on-the-user-to-make-a-security-decision&#34;&gt;Don&amp;rsquo;t rely on the user to make a security decision&lt;/h2&gt;

&lt;p&gt;Users, generally, do not take the time to learn about security. They also don&amp;rsquo;t necessarily make the right security decision. Where possible, you need to make it for them.&lt;/p&gt;

&lt;p&gt;This will sometimes conflict with your goals. If you&amp;rsquo;re selling a WiFi access point, you&amp;rsquo;ll get fewer support calls if you leave it open by default. Adding reasonable security (unique WPA2 passwords on each device) costs you money in manufacturing time and documentation, but especially in support.&lt;/p&gt;

&lt;p&gt;Apple doesn&amp;rsquo;t &lt;em&gt;require&lt;/em&gt;, but it &lt;em&gt;strongly encourages&lt;/em&gt; users to use passcodes and Touch ID. It also enables disk encryption by default. These are good decisions. They&amp;rsquo;re well tolerated by most users.&lt;/p&gt;

&lt;p&gt;Many IP cameras use a default password and open UPnP ports by default. These are bad decisions. If the user does not explicitly intervene (they won&amp;rsquo;t!) then the camera is exposed to the world.&lt;/p&gt;

&lt;p&gt;Forcing software updates is a good step in this direction, though users tend to hate it.&lt;/p&gt;

&lt;h2 id=&#34;government-regulation&#34;&gt;Government regulation&lt;/h2&gt;

&lt;p&gt;Probably nothing will come of regulation. Design and manufacturing occur across several countries; regulation would need to be on &lt;em&gt;sale&lt;/em&gt;, like the EU with RoHS. Unlike RoHS, which is easy to define (you can&amp;rsquo;t use this list of materials), &amp;lsquo;adequate security&amp;rsquo; is completely different depending on the type of device and the context in which it is used.&lt;/p&gt;

&lt;p&gt;For many product categories, adding regulatory overhead would completely kill the category as a business.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s difficult regulation to write. You can&amp;rsquo;t write something blanket like &amp;ldquo;all comms must be encrypted&amp;rdquo;; this would make many devices impossible to design. Security needs to be tailored to the environment. As a result, a conventional risk management process is more appropriate (identify the risks, implement appropriate controls, then the regulator signs off to say that you&amp;rsquo;ve done that well enough.)&lt;/p&gt;

&lt;p&gt;Product categories where there is significant risk of personal or property damage (medical devices, cars, industrial equipment, aircraft) might need specific security regulation. Chances are, they already &lt;em&gt;have&lt;/em&gt; regulation by virtue of being high risk.&lt;/p&gt;

&lt;p&gt;Regulation can also be politically motivated. You don&amp;rsquo;t want bogus media coverage of &amp;lsquo;pacemakers can be remotely hacked&amp;rsquo; or &amp;lsquo;terrorists can explode your laptop&amp;rsquo; to impact your business.&lt;/p&gt;

&lt;h2 id=&#34;positive-and-negative-reporting&#34;&gt;Positive &lt;em&gt;and&lt;/em&gt; negative reporting&lt;/h2&gt;

&lt;p&gt;I&amp;rsquo;ve advocated for reports of &lt;a href=&#34;https://ianhowson.com/iot/negative-reporting/&#34;&gt;negative results&lt;/a&gt; &amp;ndash; i.e. &amp;ldquo;we pentested this device and did not find any problems&amp;rdquo;. I believe that these are more useful than positive (&amp;ldquo;dis shit be busted&amp;rdquo;) reports.&lt;/p&gt;

&lt;p&gt;If negative reports became more commonplace, they might give vendors a reason to actually &lt;em&gt;think&lt;/em&gt; about security. They want people to publish good things about them!&lt;/p&gt;

&lt;p&gt;Right now, researchers have to speculate as to the sort of security threats a device is designed to handle. If researchers publish a report claiming vulnerabilities that a device was never intended to handle, the vendor loses both ways: they paid the cost of implementing security &lt;em&gt;and&lt;/em&gt; they still got bad press. If vendors explained their threat model up-front, it would prime the conversation; all future discussion would be from that reference point, and &lt;strong&gt;vendors get to choose&lt;/strong&gt; that reference point.&lt;/p&gt;

&lt;p&gt;Right now, &amp;lsquo;security researcher publishes bad report&amp;rsquo; is the most plausible security threat that many vendors face.&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;ve got the &amp;lsquo;stick&amp;rsquo; side of incentives right &amp;ndash; vendors that ship bad security sometimes get bad press. We should have a &amp;lsquo;carrot&amp;rsquo; side too &amp;ndash; vendors that take the time to document and openly discuss their security decisions get good press.&lt;/p&gt;

&lt;h2 id=&#34;make-the-network-resilient&#34;&gt;Make the network resilient&lt;/h2&gt;

&lt;p&gt;Mirai and botnets are not an IoT phenomenon. They&amp;rsquo;ve been around for decades, often using unpatched desktop/laptop machines. There are still Windows XP machines out there, and they&amp;rsquo;re not receiving updates any more!&lt;/p&gt;

&lt;p&gt;We need to get more vendors producing updates, and we need to get more end-users &lt;em&gt;installing&lt;/em&gt; updates, but we can never patch &lt;em&gt;everything&lt;/em&gt;. I think a more pragmatic way to proceed is to make the network tolerate and/or prevent malicious behaviour.&lt;/p&gt;

&lt;p&gt;Broadly, botnets are used for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DDoS&lt;/li&gt;
&lt;li&gt;Sending spam&lt;/li&gt;
&lt;li&gt;Bitcoin mining (probably not any more)&lt;/li&gt;
&lt;li&gt;Relays or proxies for other nefarious activities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can&amp;rsquo;t do much about relaying, but DDoS and spam are easy to detect: lots of outbound traffic. DDoS is usually a lot of IP/UDP to a small number of IPs. Spam is &lt;em&gt;particularly&lt;/em&gt; easy to detect because it&amp;rsquo;s identifiable by destination port number (TCP port 25, SMTP)!&lt;/p&gt;

&lt;p&gt;Consumer routers could potentially rate limit this traffic and/or warn the user (though communicating with the user is an unsolved problem). Internet service providers, likewise, could detect this on the consumer side. Both, however, would incur costs to do so.&lt;/p&gt;
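
&lt;p&gt;To make that concrete, here&amp;rsquo;s a rough Python sketch of the sort of check a router could run over one device&amp;rsquo;s outbound traffic. Treat all of it as assumption: the flow tuple layout, the &lt;code&gt;flag_device&lt;/code&gt; name and both thresholds are invented for illustration, and a real router would have to pull these counts from its own packet filter.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;from collections import Counter

SMTP_PORT = 25  # server-to-server mail port; a camera or light bulb has no reason to use it


def flag_device(flows, udp_packet_limit=10000, max_flood_targets=3):
    """Return reasons why one device's outbound traffic looks like spam or DDoS.

    flows is an iterable of (protocol, dst_ip, dst_port, packets) tuples seen
    in one time window. The layout and the thresholds are hypothetical.
    """
    reasons = []
    udp_packets = Counter()
    for protocol, dst_ip, dst_port, packets in flows:
        if protocol == "tcp" and dst_port == SMTP_PORT:
            reasons.append("outbound SMTP to " + dst_ip)
        elif protocol == "udp":
            udp_packets[dst_ip] += packets

    # DDoS pattern: a lot of UDP aimed at a small number of IPs.
    total = sum(udp_packets.values())
    if total > udp_packet_limit and max_flood_targets >= len(udp_packets):
        reasons.append("UDP flood to " + ", ".join(sorted(udp_packets)))
    return reasons
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A non-empty result would be the trigger for rate limiting or a warning. The thresholds would need tuning per device class; a games console legitimately sends far more UDP than a thermostat.&lt;/p&gt;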

&lt;h2 id=&#34;network-enforced-killswitches&#34;&gt;Network-enforced killswitches&lt;/h2&gt;

&lt;p&gt;A more radical proposition would be to globally share information on compromised devices, as we do with spam blacklists. Routers could automatically take corrective action (blocking UPnP or all network traffic) against bad devices.&lt;/p&gt;

&lt;p&gt;At a basic level, you&amp;rsquo;d want to be able to block ranges of MAC addresses. Some sort of software version detection would also be needed. This could potentially cover XP machines.&lt;/p&gt;
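
&lt;p&gt;Blocking a MAC range is just a prefix match. Here&amp;rsquo;s a minimal Python sketch, assuming the shared list arrives as hex prefixes (a vendor&amp;rsquo;s 24-bit prefix, or something longer for a single production batch). The &lt;code&gt;MacBlocklist&lt;/code&gt; class and the example prefixes are made up; no such shared list exists today.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;def normalise_mac(mac):
    """Lower-case a MAC address and strip the usual separators."""
    return mac.lower().replace(":", "").replace("-", "")


class MacBlocklist:
    """Block MAC ranges expressed as hex prefixes of any length."""

    def __init__(self, prefixes):
        self.prefixes = tuple(normalise_mac(p) for p in prefixes)

    def is_blocked(self, mac):
        # str.startswith accepts a tuple and matches if any prefix matches.
        return normalise_mac(mac).startswith(self.prefixes)


# Hypothetical list: one whole vendor prefix plus a narrower batch.
blocklist = MacBlocklist(["00:11:22", "66:77:88:9A"])
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;MAC addresses are trivially spoofed and say nothing about firmware version, which is why the version detection mentioned above would still be needed.&lt;/p&gt;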

&lt;p&gt;Consumer Internet routers are unfortunately very price-sensitive. It&amp;rsquo;s unlikely that they would add the software to do this, especially given it might harm the vendor&amp;rsquo;s other products.&lt;/p&gt;

&lt;p&gt;Done properly, this might incentivise vendors to properly address security in their products rather than risk being blacklisted.&lt;/p&gt;

&lt;h2 id=&#34;consumer-education&#34;&gt;Consumer education&lt;/h2&gt;

&lt;p&gt;For the entire history of computing, consumer education on security has been a total failure. Users just don&amp;rsquo;t care about security and they don&amp;rsquo;t want to learn about it.&lt;/p&gt;

&lt;p&gt;The most effective security is either enforced on users (which they hate) or is built-in and convenient &amp;ndash; e.g. Touch ID.&lt;/p&gt;

&lt;h2 id=&#34;better-frameworks&#34;&gt;Better frameworks&lt;/h2&gt;

&lt;p&gt;Ehhh.&lt;/p&gt;

&lt;p&gt;So here&amp;rsquo;s the thing. The popular dev boards from Intel, Google, Raspberry Pi and so on &amp;ndash; they&amp;rsquo;re all Linux machines. Which is fine. But:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Because they run Linux, only expensive devices can use them.&lt;/li&gt;
&lt;li&gt;Because they run Linux, you don&amp;rsquo;t &lt;em&gt;need&lt;/em&gt; to think that hard about the software stack. We&amp;rsquo;ve got decades of great network software stacks for desktop-class machines! You can afford to go crazy:

&lt;ul&gt;
&lt;li&gt;all comms go over OpenSSL&lt;/li&gt;
&lt;li&gt;all flash is encrypted&lt;/li&gt;
&lt;li&gt;use the trusted boot features&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;They don&amp;rsquo;t do anything about the IoT-specific issues like hardware security, your business being uninterested in shipping updates, or your electrical engineers having no security training.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So go and build your product on one &amp;ndash; you can find out quickly if it&amp;rsquo;s going to work. Building on cheap, small hardware is a premature optimisation. Just don&amp;rsquo;t be surprised if manufacturing and/or management ask you to save $20 BOM cost by switching away from Linux and your favourite framework.&lt;/p&gt;

&lt;!--

MERGE THESE NOTES


## Regulate and sanction IoT vendors


https://www.ftc.gov/news-events/press-releases/2014/02/ftc-approves-final-order-settling-charges-against-trendnet-inc

## Make the network resilient

## Education

engineers
management
consumers

consumers is probably where the work needs to go, but pretty much the entire history of effective security is assuming that consumers don&#39;t know anything.

engineers often know, but it just doesn&#39;t make sense for many businesses to emphasise security. management would prefer not to know - plausible deniability! &#39;nobody ever told us!&#39;


---

So if it&#39;s all so bleak, what do we do?

There are always going to be insecure devices on the Internet. You can&#39;t make everyone upgrade (hello, Windows XP!) and IoT devices are often never going to be upgraded.

So we need to make the network resilient to attacks from compromised devices. Ideas:

* &#39;Application profiles&#39; that describe to the upstream gateway what an allowable set of network permissions are, along the same lines as an AppArmor profile. For example, an IP camera would be allowed to receive up to 5 inbound connections, and it can talk to those, but it can&#39;t independently create connections on its own. The device would be preprogrammed with this (or they could be distributed online post-hoc in case a manufacturer does not provide one) and it is broadcast to the gateway.
* &#39;Respond only&#39; is a good policy for many devices. Many others will talk to a central server; you could whitelist IPs, for instance.
 --&gt;
</description>
    </item>
    
    <item>
      <title>Negative reporting and security research</title>
      <link>https://ianhowson.com/iot/negative-reporting/</link>
      <pubDate>Tue, 13 Dec 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/iot/negative-reporting/</guid>
      <description>&lt;p&gt;In academia, there&amp;rsquo;s an entrenched problem where &lt;em&gt;negative results are not published&lt;/em&gt;. Researchers found something interesting and published a paper; that&amp;rsquo;s great! Now we need other teams to confirm or deny whether this happened. &lt;strong&gt;This does not happen&lt;/strong&gt;. Reproductions (either positive or negative) are rarely published, and negative results especially (we ran the test but did not observe the same outcome) do not get published.&lt;/p&gt;

&lt;p&gt;This is bad, because a single &lt;em&gt;claim&lt;/em&gt; of a result is not very strong evidence. If multiple independent sources achieve the same result, &lt;em&gt;that&lt;/em&gt; is something we can be confident of knowing is true.&lt;/p&gt;

&lt;p&gt;Likewise, in security, we only report positive results. We only report on things which are broken. All day, every day, every IoT device that people look at has security problems.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Every. Single. One&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;So you might as well assume that everything is insecure. Security almost &lt;em&gt;demands&lt;/em&gt; that you take that approach &amp;ndash; it&amp;rsquo;s the safe, conservative way to build secure systems! Don&amp;rsquo;t know how secure something is? &lt;strong&gt;Assume that it is insecure&lt;/strong&gt; and plan accordingly.&lt;/p&gt;

&lt;p&gt;What &lt;em&gt;is&lt;/em&gt; useful in this environment is&amp;hellip; negative results! &amp;ldquo;We tested this device and were not able to break into it.&amp;rdquo; That&amp;rsquo;s &lt;em&gt;extremely&lt;/em&gt; useful information! Yeah, you want someone to double-check it, to try more things. You want to know what things the researchers found that smelled funny but didn&amp;rsquo;t constitute a vulnerability. You want to know what the researchers tried. You can then refer back to your threat model, adjust your &amp;lsquo;probability&amp;rsquo; figures appropriately, and have a better idea of what risks your business is exposed to.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Please publish negative results&lt;/strong&gt;. Please tell us attacks you tried but which failed. You&amp;rsquo;ll stand out as doing something odd. You might look silly if someone contradicts you later. But it&amp;rsquo;s the best information we can get right now.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Design assuming your security controls will fail</title>
      <link>https://ianhowson.com/iot/design-for-failure/</link>
      <pubDate>Sat, 10 Dec 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/iot/design-for-failure/</guid>
      <description>

&lt;p&gt;When you produce your threat model, you&amp;rsquo;re multiplying the &lt;em&gt;probability&lt;/em&gt; of a threat by the &lt;em&gt;impact&lt;/em&gt; of that threat.&lt;/p&gt;
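
&lt;p&gt;In code, that multiplication is about as simple as it sounds. This is a minimal Python sketch with made-up threats and numbers; the &lt;code&gt;Threat&lt;/code&gt; class and &lt;code&gt;rank_threats&lt;/code&gt; are names I&amp;rsquo;ve invented here, not part of any threat modelling tool.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    probability: float  # estimated chance of the threat occurring, 0.0 to 1.0
    impact: float       # estimated cost if it does occur, in any consistent unit

    @property
    def risk(self):
        return self.probability * self.impact


def rank_threats(threats):
    """Sort threats so the highest probability x impact comes first."""
    return sorted(threats, key=lambda t: t.risk, reverse=True)


# Made-up entries for illustration only.
threats = [
    Threat("radio link jammed", probability=0.05, impact=5_000),
    Threat("default password left unchanged", probability=0.9, impact=50_000),
    Threat("firmware extracted over debug port", probability=0.3, impact=20_000),
]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The numbers will always be guesses. The point of writing them down is that you can argue about them and revise them as you learn more.&lt;/p&gt;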

&lt;p&gt;IoT devices have three characteristics that work against you, as a device designer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They probably won&amp;rsquo;t get updates. It&amp;rsquo;s not in the financial interests of the business.&lt;/li&gt;
&lt;li&gt;Even if an update is available, users aren&amp;rsquo;t likely to install it. Users often don&amp;rsquo;t realise that the IoT device &lt;em&gt;exists&lt;/em&gt;!&lt;/li&gt;
&lt;li&gt;Some IoT devices operate for a long time &amp;ndash; decades, in some cases. Think SCADA hardware, HVAC controls or implantable medical devices.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As a general rule, attacks only get better with time. As a designer, you&amp;rsquo;re not just defending against the attacks that exist today &amp;ndash; you need to defend against future attacks that haven&amp;rsquo;t been invented yet!&lt;/p&gt;

&lt;p&gt;For these reasons, I believe that you should design your device &lt;em&gt;assuming that it will be breached&lt;/em&gt;. Once you adopt the mindset that some or all of your security controls &lt;em&gt;will&lt;/em&gt; fail, you&amp;rsquo;re in a better position to design them to minimise the impact of a breach.&lt;/p&gt;

&lt;h2 id=&#34;key-revocation&#34;&gt;Key revocation&lt;/h2&gt;

&lt;p&gt;Key revocation isn&amp;rsquo;t a &lt;em&gt;great&lt;/em&gt; strategy given the constraints of IoT devices, but there&amp;rsquo;s plenty of prior work that you can draw from.&lt;/p&gt;

&lt;p&gt;For example, the designers of DVD and Blu-ray assumed that some of the encryption keys would be leaked. They&amp;rsquo;re distributed in every single device (or software instance), they&amp;rsquo;re difficult to control (DVD/Blu-ray players are cost sensitive) and attackers have a strong incentive to extract the keys (high-quality duplication and distribution of content). Knowing this, they designed the system so that leaked keys could be revoked and new content could not be played on compromised hardware. This also incentivises the hardware designers slightly &amp;ndash; if one of their players lost a key, they would be &amp;lsquo;punished&amp;rsquo; by having their players unable to play new content. (One might debate if this is a punishment or a blessing; they would have to explain to customers why their player won&amp;rsquo;t play new content, but many customers would just buy a new player.)&lt;/p&gt;

&lt;h2 id=&#34;defence-in-depth&#34;&gt;Defence in depth&lt;/h2&gt;

&lt;p&gt;If you use the model that individual controls will fail, the obvious solution is to have &lt;strong&gt;multiple&lt;/strong&gt; controls for a particular risk.&lt;/p&gt;

&lt;p&gt;The controls need to be as independent as possible. Having two separate software checks for unauthorised access doesn&amp;rsquo;t help you if the attacker uses a debugger to bypass them both. Having a software check &lt;em&gt;and&lt;/em&gt; an external hardware check would help.&lt;/p&gt;

&lt;h2 id=&#34;partitioning&#34;&gt;Partitioning&lt;/h2&gt;

&lt;p&gt;Keep high-risk areas of the system separate from low-risk areas. Where possible, separate independent risky systems. You don&amp;rsquo;t want a breach in one subsystem to impact another.&lt;/p&gt;

&lt;p&gt;SSH (the software) does this through &amp;ldquo;privilege separation&amp;rdquo;. Parts of it need to run as the root (administrative) user, but most of it can operate as a less-privileged user. To reduce the attack surface, SSH is split into different sections with different privilege requirements.&lt;/p&gt;

&lt;p&gt;Cars are another great example. Modern cars want integration between all car systems &amp;ndash; the user wants to be able to control everything from one interface. On one hand, you could reduce manufacturing costs by sharing CPUs and networks between the media and safety-critical systems. On the other hand, you don&amp;rsquo;t want a vulnerability in your media system to affect safety-critical systems. Best to keep them separate and control the interfaces between them carefully.&lt;/p&gt;

&lt;h2 id=&#34;canaries-and-tamper-detection&#34;&gt;Canaries and tamper detection&lt;/h2&gt;

&lt;p&gt;If you&amp;rsquo;re storing sensitive data (e.g. content decryption keys) you might be better off destroying the keys and/or device if an intrusion is detected.&lt;/p&gt;

&lt;p&gt;This can be done in software. What an intrusion looks like varies tremendously between applications, but you might look for changes in memory that should not change, commands on debug ports, or &amp;lsquo;kill&amp;rsquo; commands on interfaces that follow the same pattern as regular commands but are designed to catch fuzzing.&lt;/p&gt;
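
&lt;p&gt;Here&amp;rsquo;s a toy Python sketch of the decoy-command idea: a couple of opcodes sit in the same numbering space as the real ones, no legitimate client ever sends them, and receiving one destroys the key. The opcodes, the &lt;code&gt;Device&lt;/code&gt; class and the responses are all hypothetical; real firmware would more likely be C and would have to erase the key from flash, not just from memory.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;class Device:
    """Toy command handler with decoy opcodes that act as a fuzzing canary."""

    COMMANDS = {0x01: "read_status", 0x02: "set_level"}
    CANARIES = {0x03, 0x7F}  # undocumented opcodes that no legitimate client sends

    def __init__(self, content_key):
        self.content_key = content_key
        self.tripped = False

    def handle(self, opcode):
        if self.tripped:
            return "locked"
        if opcode in self.CANARIES:
            # Someone is probing the command space: destroy the secret and lock up.
            self.content_key = None
            self.tripped = True
            return "locked"
        return self.COMMANDS.get(opcode, "unknown")
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Whether to answer &amp;ldquo;locked&amp;rdquo; or to keep pretending everything is fine is a design choice; staying quiet gives the attacker less feedback.&lt;/p&gt;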

&lt;p&gt;If you&amp;rsquo;ve got a Linux system, a great strategy would be to build AppArmor profiles for your application. Any AppArmor violations &lt;em&gt;could&lt;/em&gt; trip a self-destruct. (Just limiting your application is a great start, even if you do nothing about violations!)&lt;/p&gt;

&lt;p&gt;If you (the vendor) have remote telemetry coming from devices in the field, it might alert you to attacks in progress.&lt;/p&gt;

&lt;p&gt;You can also use hardware modules that store secrets in a tamperproof manner. They cost money, of course.&lt;/p&gt;

&lt;h2 id=&#34;remote-killswitch&#34;&gt;Remote killswitch&lt;/h2&gt;

&lt;p&gt;The device can be remotely deactivated &amp;ndash; perhaps by network control, perhaps by a physical button or radio signal.&lt;/p&gt;

&lt;p&gt;This is useful for devices which can cause personal injury if something goes wrong &amp;ndash; autonomous aircraft, industrial equipment, surgery robots, perhaps even cars.&lt;/p&gt;

&lt;p&gt;Ideally, you want the &amp;lsquo;killswitch computer&amp;rsquo; to be separate from the &amp;lsquo;application computer&amp;rsquo; &amp;ndash; a compromised or damaged application computer might not execute the kill command correctly.&lt;/p&gt;

&lt;p&gt;A recent example of this is the Galaxy Note 7 recall, where &lt;a href=&#34;http://arstechnica.com/gadgets/2016/12/report-samsung-planning-to-permanently-disable-us-note-7s-soon/&#34;&gt;an OTA update disables charging&lt;/a&gt; to reduce the risk of fires.&lt;/p&gt;

&lt;!-- ## Monitoring and self-healing --&gt;

&lt;!--## Programmed suicide

if the device does not get a &#39;stay alive you&#39;re up to date&#39; message periodically, self-kill. if your company is dead anyway, you might as well not be a burden on the living.

remember that time is not reliable most of the time.
--&gt;

&lt;!--
TODO using apparmor/selinux to segment processes on an iot device; it’s lightweight (no major requirements beyond kernel) and can significantly reduce the impact of an intrusion. you’re basically trying to build a system that survives an intrusion (risk is high, you need to reduce the impact)
 --&gt;
</description>
    </item>
    
    <item>
      <title>Frequently asked questions</title>
      <link>https://ianhowson.com/iot/faq/</link>
      <pubDate>Thu, 08 Dec 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/iot/faq/</guid>
      <description>

&lt;h2 id=&#34;tl-dr-what-is-going-wrong&#34;&gt;TL;DR: what is going wrong?&lt;/h2&gt;

&lt;p&gt;There is tremendous pressure for IoT devices to be cheap to manufacture. Cheap means that hardware capabilities are the bare minimum, firmware and security design quality suffer, and there will be no support after the device is sold. As a result, at release time, most devices have obvious vulnerabilities. The older a design gets, the more we learn about breaking it and the weaker the device gets. As there is no support, none of these vulnerabilities will be resolved.&lt;/p&gt;

&lt;h2 id=&#34;tl-dr-how-do-we-fix-it&#34;&gt;TL;DR: how do we fix it?&lt;/h2&gt;

&lt;p&gt;As a designer, the OWASP IoT recommendations are a good place to start. Produce a threat model and a risk analysis.&lt;/p&gt;

&lt;p&gt;Globally, the device side can&amp;rsquo;t be fixed. Most new devices will be full of vulnerabilities forever. People need to know that this is the case and trust devices accordingly. Our networks need to be resilient against potentially malicious devices.&lt;/p&gt;

&lt;h2 id=&#34;why-is-iot-special&#34;&gt;Why is IoT special?&lt;/h2&gt;

&lt;p&gt;It isn&amp;rsquo;t. The vast majority of software, systems and cloud security applies to IoT with no modification.&lt;/p&gt;

&lt;p&gt;Where IoT does differ, it is because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;hardware and firmware are bundled together, causing severe cost constraints&lt;/li&gt;
&lt;li&gt;hardware and firmware are often developed by the same team; they usually don&amp;rsquo;t have software security training&lt;/li&gt;
&lt;li&gt;the business usually gets revenue through hardware sales, not subscription services, and so there&amp;rsquo;s no incentive to produce security updates after the sale is made&lt;/li&gt;
&lt;li&gt;often hardware capabilities are tiny, so you don&amp;rsquo;t have the same range of security controls available&lt;/li&gt;
&lt;li&gt;you don&amp;rsquo;t physically control the hardware, opening up a range of physical and electronic attacks&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;why-don-t-we-just-put-regulations-on-device-manufacturers&#34;&gt;Why don&amp;rsquo;t we just put regulations on device manufacturers?&lt;/h2&gt;

&lt;p&gt;Because all device manufacturing is done in China. How does the U.S. or EU mandate something as vague as &amp;lsquo;secure device&amp;rsquo; in China?&lt;/p&gt;

&lt;p&gt;Besides, as a consumer, would you pay another $40 for your Internet router? Or an ongoing fee to keep the firmware up-to-date? Of course not! You love cheap stuff and unless you&amp;rsquo;re in the narrow subset of the population that actually knows that infosec is a thing, you&amp;rsquo;re going to buy the cheapest device that does what you want.&lt;/p&gt;

&lt;h2 id=&#34;who-are-you-and-why-should-i-listen-to-you&#34;&gt;Who are you and why should I listen to you?&lt;/h2&gt;

&lt;p&gt;I&amp;rsquo;ve been shipping embedded systems for over a decade. Most of them connected to the Internet. Some of them have been implanted into people&amp;rsquo;s bodies. Some of them have been hacked. I&amp;rsquo;ve also spent a lot of time attacking them, both as a penetration tester and as part of my own test procedures.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>The same keys on every device</title>
      <link>https://ianhowson.com/iot/same-keys-every-device/</link>
      <pubDate>Wed, 07 Dec 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/iot/same-keys-every-device/</guid>
      <description>

&lt;p&gt;In &amp;ldquo;&lt;a href=&#34;https://ianhowson.com/iot/loading-device-firmware/&#34;&gt;How does firmware get onto the device?&lt;/a&gt;&amp;rdquo;, we learned that every single IoT device is identical after leaving the manufacturing floor. This is different to traditional network security where every computer has unique private keys and unique passwords.&lt;/p&gt;

&lt;h2 id=&#34;common-variations-between-devices&#34;&gt;Common variations between devices&lt;/h2&gt;

&lt;p&gt;Anything with an Ethernet, WiFi or Bluetooth interface gets a unique MAC address. Theoretically, these &lt;em&gt;could&lt;/em&gt; be programmed at test/bringup time. In practice, this is hard to do (manufacturing floors don&amp;rsquo;t have reliable network access and can&amp;rsquo;t afford downtime). It turns out that EEPROM vendors will sell you tiny EEPROMs preprogrammed with a range of guaranteed unique MAC addresses, so they just populate the board with one of those. The production process is thus consistent! Many modern CPUs/SoCs/radio interfaces also provide a built-in unique device ID or MAC address, and so we use those.&lt;/p&gt;

&lt;p&gt;(There are plenty of Ethernet devices out there that don&amp;rsquo;t have unique MAC addresses at all; they just generate one randomly and hope for the best. Thirty cents saved. Just sayin&amp;rsquo;.)&lt;/p&gt;

&lt;p&gt;You can script your device to generate its own private keys at first boot. I know that Raspberry Pi firmware images do this. I don&amp;rsquo;t know of any production IoT devices that do this, probably because it delays bringup by a minute, and on a manufacturing floor, more time means more floor space, which means higher manufacturing cost.&lt;/p&gt;

&lt;p&gt;So, in practice, we have thousands to millions of devices being shipped which have &lt;em&gt;exactly&lt;/em&gt; the same private keys and &lt;em&gt;exactly&lt;/em&gt; the same hidden passwords.&lt;/p&gt;

&lt;h2 id=&#34;why-is-this-a-problem&#34;&gt;Why is this a problem?&lt;/h2&gt;

&lt;p&gt;Remember risk analysis: likelihood of breach times impact of a breach.&lt;/p&gt;

&lt;p&gt;For something like an SSL private key, the likelihood of a breach is very low. Consider the Xbox public key attacks. We&amp;rsquo;ve got unlimited access to the public keys, the hardware and known plaintexts. But without being able to generate that private key, we&amp;rsquo;re stuck.&lt;/p&gt;

&lt;!-- TODO url for the xbox attacks --&gt;

&lt;!-- TODO there was also PS3 that did a similar thing - its private keys were leaked --&gt;

&lt;p&gt;&lt;strong&gt;If&lt;/strong&gt; that private key is leaked or reconstructed somehow, the whole system falls apart. People can write their own software and run pirate games. The impact is &lt;em&gt;massive&lt;/em&gt;: it would end that product line and most future development for it.&lt;/p&gt;

&lt;p&gt;Consider a DVD player. It contains a symmetric encryption key which is shared across that class of players. If one player&amp;rsquo;s key leaks, the key is leaked for &lt;em&gt;all&lt;/em&gt; players using that key. There was consideration given to this in the DVD CSS scheme (new content will not be decryptable using a revoked key), but the impact is still huge. All past content is now decryptable because one device out of &lt;em&gt;billions&lt;/em&gt; was compromised. So the likelihood and impact are both fairly high; this is (was) a high-risk system.&lt;/p&gt;

&lt;p&gt;Impersonation becomes a real issue. If you&amp;rsquo;re trying to attack a particular device on a target network (say, a router or a camera), you don&amp;rsquo;t have to gain access to that exact device. You can buy a device &lt;em&gt;of the same type&lt;/em&gt; and attack that instead, in the comfort of your own home/office/dungeon, taking as much time as you like. You can buy a &lt;em&gt;hundred&lt;/em&gt; of them and subject them to a range of attacks, including destructive attacks. You can backdoor one of your devices and physically swap it with the target device. And in extreme examples &amp;ndash; say the device carries a private key &amp;ndash; you can extract the private key from your device and use it to perform crypto-level attacks on the target device.&lt;/p&gt;

&lt;p&gt;None of this is possible if each device has its own unique private keys.&lt;/p&gt;

&lt;h2 id=&#34;ok-so-i-ll-generate-private-keys-on-first-boot&#34;&gt;OK, so I&amp;rsquo;ll generate private keys on first boot&lt;/h2&gt;

&lt;p&gt;Did you put in a reasonable source of randomness? Most embedded devices have no random number source; many don&amp;rsquo;t even have a clock (which is a weak but tolerable substitute). There&amp;rsquo;s no point in generating keys on first boot if every device &amp;lsquo;randomly&amp;rsquo; generates the same keys because they&amp;rsquo;re starting from the same state. Manufacturing processes encourage every device to be identical!&lt;/p&gt;

&lt;p&gt;Your unique identifier can help a lot. It&amp;rsquo;s predictable, but at least every device will be different.&lt;/p&gt;
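
&lt;p&gt;As a rough sketch of that idea (in Python for readability; a real device would do this in its firmware language), the helper below mixes the unique ID with whatever weak entropy the device can scrape together before seeding its key generator. The name &lt;code&gt;derive_key_seed&lt;/code&gt; and the example IDs are hypothetical. The ID only guarantees that devices differ; it adds no secrecy, so the seed is only as unpredictable as the other input.&lt;/p&gt;

```python
import hashlib


def derive_key_seed(device_id: bytes, weak_entropy: bytes) -> bytes:
    """Mix a unique (but predictable) device ID with weak entropy.

    Hypothetical helper: the ID guarantees that two devices never start
    from the same state; any secrecy comes only from weak_entropy
    (timing jitter, ADC noise and so on).
    """
    h = hashlib.sha256()
    # Length-prefix each field so (b"ab", b"c") and (b"a", b"bc") differ.
    for field in (device_id, weak_entropy):
        h.update(len(field).to_bytes(4, "big"))
        h.update(field)
    return h.digest()


# Two devices with identical firmware and identical (empty) entropy
# still end up with different seeds because their IDs differ.
seed_a = derive_key_seed(bytes.fromhex("001122334455"), b"")
seed_b = derive_key_seed(bytes.fromhex("001122334456"), b"")
```

&lt;p&gt;Anyone who knows the ID and the firmware can reproduce a seed built this way from the ID alone, so treat this as a floor, not a substitute for a real random number source.&lt;/p&gt;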

&lt;!-- TODO
iot/embedded devices frequently reuse private keys: http://blog.sec-consult.com/2016/09/house-of-keys-9-months-later-40-worse.html?m=1
http://blog.sec-consult.com/2015/11/house-of-keys-industry-wide-https.html

http://blog.sec-consult.com/2016/04/smart-home-security.html?m=1
--&gt;
</description>
    </item>
    
    <item>
      <title>How does firmware get onto the device?</title>
      <link>https://ianhowson.com/iot/loading-device-firmware/</link>
      <pubDate>Wed, 07 Dec 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/iot/loading-device-firmware/</guid>
      <description>

&lt;p&gt;One of the big differences between IoT devices and software is that IoT devices are &lt;em&gt;manufactured&lt;/em&gt;. Manufacturing processes focus on consistency and reproducibility; all variations must be eliminated.&lt;/p&gt;

&lt;p&gt;Once the hardware is assembled, the firmware must be written to the device. Usually, this is done in one of three ways: during testing, using pre-programmed chips, or during chip manufacturing.&lt;/p&gt;

&lt;h2 id=&#34;during-testing&#34;&gt;During testing&lt;/h2&gt;

&lt;p&gt;During manufacturing, a hardware assembly will undergo basic electrical testing. Usually this is achieved by putting the assembly into a test jig, connecting to test points on the assembly with pogo pins, and checking that various signals and voltages on the board are within tolerance. Assemblies that fail these checks are returned for rework or discarded.&lt;/p&gt;

&lt;p&gt;This is a convenient time to write firmware to the device. This happens automatically if the device passes electrical testing. It costs a few seconds to a minute, depending on the design of the device.&lt;/p&gt;

&lt;p&gt;Often, a small bootstrap firmware image is all that is written. Test jig time is expensive and it&amp;rsquo;s slow to write large amounts of data. It&amp;rsquo;s also difficult to change the firmware image after the manufacturing line is set up. From a project management point of view, a late firmware project doesn&amp;rsquo;t delay setup of the manufacturing line. So a minimal bootstrap image is written through the jig, and final production firmware (including any updates) is written at a later stage.&lt;/p&gt;

&lt;p&gt;This is the most common way to get initial firmware onto a device.&lt;/p&gt;

&lt;h2 id=&#34;pre-programmed-chips&#34;&gt;Pre-programmed chips&lt;/h2&gt;

&lt;p&gt;Most devices use blank off-the-shelf chips and write their own image to them. Sometimes, it&amp;rsquo;s easier to program the chips before assembly onto the PCB. This might be done as an in-house manufacturing step using a dedicated device programmer, or it might be done by a third party (usually the chip manufacturer). The same guidelines as above apply:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the image is difficult to change, so you either write a bootstrap image or be &lt;em&gt;very&lt;/em&gt; confident that you&amp;rsquo;re not going to change the image later&lt;/li&gt;
&lt;li&gt;obviously, there&amp;rsquo;s no permissible variation between the images&lt;/li&gt;
&lt;li&gt;incorrectly written firmware can write off an entire assembly, so your cost will increase slightly (assembly writeoffs, plus your chip is going to cost a little more)&lt;/li&gt;
&lt;li&gt;you have to test the assembly &lt;em&gt;anyway&lt;/em&gt;, so there&amp;rsquo;s not a big argument to be made for reducing the number of test points on the PCB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is mostly used when you have a stable, high-volume product.&lt;/p&gt;

&lt;h2 id=&#34;built-in-cpu-support&#34;&gt;Built-in CPU support&lt;/h2&gt;

&lt;p&gt;Many modern CPUs, SoCs and microcontrollers have bootloaders built in. Some of these are remarkably sophisticated, able to interface with MicroSD storage, parse FAT filesystems or communicate over network interfaces.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re custom-building your SoC (say, for hidden crypto/trust modules) then you&amp;rsquo;re in a good position to bake in a bootloader that works for your hardware platform and thus save some manufacturing time.&lt;/p&gt;

&lt;p&gt;You might not need to write firmware to the device at all until just before it&amp;rsquo;s shipped.&lt;/p&gt;

&lt;h2 id=&#34;all-devices-are-the-same&#34;&gt;All devices are the same&lt;/h2&gt;

&lt;p&gt;Manufacturing processes are driven by two concerns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Minimise cost. Cost is driven by manufacturing time, equipment cost, floor space and human interaction.&lt;/li&gt;
&lt;li&gt;Minimise variability. A consistent, predictable process can be optimised for lower cost.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At the end of the manufacturing process, &lt;em&gt;every single device is identical&lt;/em&gt;. Any variation &amp;ndash; except in very specific, controlled ways, such as a unique ID chip &amp;ndash; is considered waste and must be eliminated.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Commentary on the Sony IPELA IP Camera backdoor</title>
      <link>https://ianhowson.com/iot/sony-ipela-backdoor/</link>
      <pubDate>Wed, 07 Dec 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/iot/sony-ipela-backdoor/</guid>
      <description>

&lt;p&gt;It turns out that a range of Sony IP cameras had a hidden telnet/SSH server: &lt;a href=&#34;http://blog.sec-consult.com/2016/12/backdoor-in-sony-ipela-engine-ip-cameras.html?m=1&#34;&gt;http://blog.sec-consult.com/2016/12/backdoor-in-sony-ipela-engine-ip-cameras.html?m=1&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&#34;what-s-good-about-the-design&#34;&gt;What&amp;rsquo;s good about the design?&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The servers weren&amp;rsquo;t wide open to the world&lt;/strong&gt;. Getting access required:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a firmware dump (easy; available on the public Internet)&lt;/li&gt;
&lt;li&gt;analysis of the dump (usually hard, but reasonably automated in this case)&lt;/li&gt;
&lt;li&gt;reversing password hashes

&lt;ul&gt;
&lt;li&gt;Difficulty depends on the password, but unfortunately one of them was &amp;lsquo;admin&amp;rsquo;. The other is as-yet unknown, so&amp;hellip; hard?&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;disassembly of the firmware to figure out how to access the servers (hard)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So while this looks bad on the surface, this hack actually required a lot of effort. While obviously critically flawed, in the ecosystem of IoT devices, &lt;em&gt;this one is better than most&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sony responded appropriately&lt;/strong&gt; and released an update for the cameras.&lt;/p&gt;

&lt;h2 id=&#34;what-s-bad-about-the-design&#34;&gt;What&amp;rsquo;s bad about the design?&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The servers weren&amp;rsquo;t disabled before the cameras were shipped out&lt;/strong&gt;. This, in my mind, is the critical problem. The manufacturing line &lt;em&gt;needs&lt;/em&gt; to have privileged access to the device; this is where firmware gets uploaded, hardware gets calibrated and the device is tested. The servers need to be present. They &lt;em&gt;must not&lt;/em&gt; be enabled after device shipment.&lt;/p&gt;

&lt;p&gt;I can understand each device having the same passwords. This is a manufacturing convenience which saves money and time. Every device gets the same passwords, has the same public keys and the same binary firmware image. If you&amp;rsquo;re coming from a desktop/mobile security perspective this is problematic, but in the IoT space, sorry, cost concerns override the impurity of having a million devices with the same keys.&lt;/p&gt;

&lt;p&gt;Some devices &amp;ndash; especially Internet routers &amp;ndash; will assign a different password either at manufacture time or based off a unique device ID embedded in the hardware. The cameras certainly have a unique MAC address, so the hardware is present.&lt;/p&gt;
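
&lt;p&gt;To illustrate the second option, here is a minimal sketch (in Python) of deriving a per-device default password from the MAC address. This is not how Sony or any particular vendor does it; &lt;code&gt;default_password&lt;/code&gt;, the secret and the MAC addresses are all made up for the example. It assumes the secret lives only on the manufacturing line and is never written into the firmware image, otherwise anyone with a firmware dump could derive every password.&lt;/p&gt;

```python
import base64
import hashlib
import hmac


def default_password(factory_secret: bytes, mac_address: str, length: int = 12) -> str:
    """Derive a per-device default password from its MAC address.

    Hypothetical scheme: factory_secret stays on the manufacturing line;
    the device stores only the resulting password (or a hash of it),
    which is also printed on the device label.
    """
    normalised = mac_address.lower().replace(":", "").replace("-", "")
    digest = hmac.new(factory_secret, normalised.encode("ascii"), hashlib.sha256).digest()
    # Base32 gives characters that are easy to print and type.
    return base64.b32encode(digest).decode("ascii")[:length].lower()


secret = b"example-only-factory-secret"
pw_one = default_password(secret, "00:11:22:33:44:55")
pw_two = default_password(secret, "00:11:22:33:44:56")
```

&lt;p&gt;Every device still runs the same firmware image; only the one value written at test time differs, which keeps the production process consistent.&lt;/p&gt;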

&lt;h2 id=&#34;what-did-we-learn&#34;&gt;What did we learn?&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;If you have hidden secrets on your device, they will be discovered given enough time&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Once any secrets are out, they&amp;rsquo;re out for a whole class of devices&lt;/strong&gt;. All cameras with this firmware are now vulnerable. One key mitigation for this type of attack is that each device (singular) gets different keys and passwords; at least then a breach of one device only affects that device.&lt;/p&gt;

&lt;h2 id=&#34;speculation&#34;&gt;Speculation&lt;/h2&gt;

&lt;p&gt;I don&amp;rsquo;t think this was intentional (in the sense of &amp;ldquo;hey let&amp;rsquo;s run telnet and SSH nobody will notice&amp;rdquo;). Certainly Sony have enough smart engineers to consider the security ramifications of onboard web, telnet and SSH servers; it&amp;rsquo;s even likely that they have a threat model and risk analysis. They also have enough history manufacturing this sort of device that they know that manufacturing wants certain access to the device. And this same mistake hasn&amp;rsquo;t been found in other Sony devices to date.&lt;/p&gt;

&lt;p&gt;My bet is that the firmware engineers have handed this off to manufacturing saying, &amp;ldquo;hey, we enable telnet and SSH so you can test and calibrate on the line. Make sure you turn it off.&amp;rdquo; And manufacturing, being a totally different sort of engineer, have written their scripts that run over SSH, set up the production lines and forgotten the warning. Or maybe they left it on not understanding the security impact; disabling the servers makes their life difficult if they want to retest or service a device. It&amp;rsquo;s an easy mistake to make in a big company.&lt;/p&gt;

&lt;h2 id=&#34;what-s-the-impact&#34;&gt;What&amp;rsquo;s the impact?&lt;/h2&gt;

&lt;p&gt;Usual vulnerability disclosure ethics require that you give the vendor some time to correct the vulnerability before publishing it. This has been done here; all credit to the SEC Consult team. But I can&amp;rsquo;t help but feel that this is bad policy for IoT devices. Sony have produced new firmware, distributed it, and yet&amp;hellip; the vast majority of cameras in the wild will not get the update. They will be vulnerable and had someone not gone looking (using the specialist knowledge and tools above) it&amp;rsquo;s unlikely that the vulnerability would have been discovered. Certainly, there are more profitable places for miscreants to search for vulnerabilities on their own.&lt;/p&gt;

&lt;p&gt;So while I advocate openness and disclosure, I think the usual disclosure policy might need some adjustment for devices which can&amp;rsquo;t be easily updated.&lt;/p&gt;

&lt;p&gt;Now that the details are public, any Internet-connected device still running this firmware can be easily harvested for botnets.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>What&#39;s the least I can know?</title>
      <link>https://ianhowson.com/iot/iot-security-start-here/</link>
      <pubDate>Tue, 29 Nov 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/iot/iot-security-start-here/</guid>
      <description>&lt;p&gt;&lt;strong&gt;IoT devices have security issues because they&amp;rsquo;re built to be as cheap as possible.&lt;/strong&gt; The hardware required to provide adequate security is expensive, large, and consumes a lot of power. IoT devices stay &amp;lsquo;in the field&amp;rsquo; for a long time and the business model of most vendors does not incentivise them to produce security updates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Produce a threat model and a risk analysis&lt;/strong&gt; before you do anything else. Most devices do not need strong security.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href=&#34;https://ianhowson.com/iot/hardware-classes/&#34;&gt;Figure out what sort of hardware you have&lt;/a&gt;&lt;/strong&gt;. This will dictate what security controls are available to you. The vast majority of IoT devices in the field are not capable of strong security.&lt;/p&gt;

&lt;p&gt;The pinnacle of IoT security is the modern iPhone. By learning about its security measures, you will learn a lot about what is required to produce a secure IoT device. It&amp;rsquo;s difficult and expensive.&lt;/p&gt;

&lt;p&gt;If your device controls something of value, you should &lt;strong&gt;assume that your device will be compromised&lt;/strong&gt;. Plan accordingly. &lt;a href=&#34;https://ianhowson.com/iot/design-for-failure/&#34;&gt;Design the device to minimise the impact of a breach&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>What is IoT?</title>
      <link>https://ianhowson.com/iot/what-is-iot/</link>
      <pubDate>Sat, 26 Nov 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/iot/what-is-iot/</guid>
      <description>

&lt;p&gt;The Internet of Things blah blah revolutionise fifty billions blah blah everything connects to the Internet blah change your life.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s a simple, boring, accurate definition: Internet of Things is a new name for &amp;lsquo;embedded systems&amp;rsquo;. There are &lt;em&gt;already&lt;/em&gt; billions of invisible networked devices controlling parts of your life, and they&amp;rsquo;ve been running for decades. The major change is that more of them are connecting to the Internet.&lt;/p&gt;

&lt;h2 id=&#34;ok-what-s-an-embedded-system&#34;&gt;OK, what&amp;rsquo;s an embedded system?&lt;/h2&gt;

&lt;p&gt;A computer that is part of a device. Perhaps an electronic device that contains a computer.&lt;/p&gt;

&lt;p&gt;Where a general-purpose computer can be adapted by the user to suit many applications, an embedded system is pre-programmed for a single, specialised purpose. Usually you will buy the device hardware and software as a single unit for a single purpose.&lt;/p&gt;

&lt;h2 id=&#34;you-said-that-these-have-been-around-for-decades-how-is-that-possible&#34;&gt;You said that these have been around for decades. How is that possible?&lt;/h2&gt;

&lt;p&gt;Here are a few examples. You will own many of them.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Internet routers&lt;/li&gt;
&lt;li&gt;HVAC controls&lt;/li&gt;
&lt;li&gt;Car ECUs&lt;/li&gt;
&lt;li&gt;Fitbit&lt;/li&gt;
&lt;li&gt;Electronic children&amp;rsquo;s toys&lt;/li&gt;
&lt;li&gt;Battery chargers&lt;/li&gt;
&lt;li&gt;DVD players&lt;/li&gt;
&lt;li&gt;Hearing aids&lt;/li&gt;
&lt;li&gt;That box on the street that connects your house to the fibre network&lt;/li&gt;
&lt;li&gt;Electronic door locks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are all embedded systems. They all contain a programmable computer. Most of them connect to networks, some of them to the Internet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Every single one of these&lt;/strong&gt; has a slew of security vulnerabilities. They were all designed to be cheap to manufacture. Practically none of them receive security updates.&lt;/p&gt;

&lt;p&gt;Some IoT/embedded devices that give particular attention to security features are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DVD players (copy protection of discs)&lt;/li&gt;
&lt;li&gt;Game consoles (copy protection of discs)&lt;/li&gt;
&lt;li&gt;Pay TV/cable boxes (ensuring that customers have paid for their service)&lt;/li&gt;
&lt;li&gt;Smartphones (sandboxed execution of downloaded apps)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We&amp;rsquo;ll come back to these, as they provide great examples of the cost tradeoffs that we need to make to achieve good security.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Blockchains on IoT devices</title>
      <link>https://ianhowson.com/iot/blockchains-on-iot-devices/</link>
      <pubDate>Fri, 25 Nov 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/iot/blockchains-on-iot-devices/</guid>
      <description>

&lt;p&gt;Assuming you have an application that warrants building a blockchain and further, that it needs to be running on an IoT device, there are a few major implications for your device&amp;rsquo;s design and cost that follow.&lt;/p&gt;

&lt;h2 id=&#34;1-you-need-the-full-suite-of-cryptographic-capabilities&#34;&gt;1. You need the full suite of cryptographic capabilities&lt;/h2&gt;

&lt;p&gt;The smallest device that you&amp;rsquo;re reasonably going to fit a blockchain application into will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Be capable of running Linux&lt;/li&gt;
&lt;li&gt;Have flexible storage (i.e. a filesystem with integrity guarantees)&lt;/li&gt;
&lt;li&gt;Have decent Internet connectivity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This excludes most of the cheap/low power hardware platforms and sets a minimum hardware cost starting at around $15.&lt;/p&gt;

&lt;p&gt;You could squeeze things harder &amp;ndash; blockchain crypto only requires kilobytes of RAM &amp;ndash; but in 2016, your effort needs to be in getting the blockchain side of things right, not in trying to reimplement everything to save RAM.&lt;/p&gt;

&lt;h2 id=&#34;2-you-need-flexible-storage&#34;&gt;2. You need flexible storage&lt;/h2&gt;

&lt;p&gt;You&amp;rsquo;re going to store a lot of data and update it regularly. A filesystem isn&amp;rsquo;t a strict requirement, but it&amp;rsquo;s going to make your life a lot easier.&lt;/p&gt;

&lt;p&gt;Most devices have unreliable power and so your life will be a lot easier if you use a filesystem with atomicity and integrity guarantees. Better to re-download and re-verify the last few blocks than the whole chain.&lt;/p&gt;

&lt;h2 id=&#34;3-you-need-robust-updates&#34;&gt;3. You need robust updates&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Blockchain is new and largely untested&lt;/li&gt;
&lt;li&gt;We&amp;rsquo;re regularly finding vulnerabilities in old, well-reviewed codebases&lt;/li&gt;
&lt;li&gt;Attackers will have a good reason to attack your system&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So you will need to be able to roll out updates to your devices quickly and securely.&lt;/p&gt;

&lt;p&gt;How you do this in a decentralised manner is an interesting problem. Again, if you&amp;rsquo;re centralising updates, you might as well centralise the whole database. If you &lt;em&gt;don&amp;rsquo;t&lt;/em&gt; centralise updates, who provides updates, and how do you trust them?&lt;/p&gt;

&lt;h2 id=&#34;4-think-about-what-happens-as-your-blockchain-grows&#34;&gt;4. Think about what happens as your blockchain grows&lt;/h2&gt;

&lt;p&gt;Blockchains only grow in length. As transactions are added, the chain gets longer, without bound.&lt;/p&gt;

&lt;p&gt;This length is burdensome for Bitcoin right now &amp;ndash; we&amp;rsquo;re at about 80GB, which is about $40 worth of flash memory.&lt;/p&gt;

&lt;p&gt;If your device is going to run for say, five years, you need to provide enough storage to last &lt;em&gt;the life of the device&lt;/em&gt;, not just the current size of the blockchain.&lt;/p&gt;
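
&lt;p&gt;A back-of-the-envelope version of that sizing, as a Python sketch. The 80GB starting point is the figure above; the growth rate and the linear-growth assumption are placeholders, so plug in numbers for your own chain.&lt;/p&gt;

```python
def storage_needed_gb(current_gb: float, growth_gb_per_year: float,
                      lifetime_years: float, headroom: float = 0.2) -> float:
    """Estimate the flash needed to hold a growing chain for the device's life.

    Assumes linear growth, which is a simplification; headroom is a
    fractional safety margin on top of the projected size.
    """
    projected = current_gb + growth_gb_per_year * lifetime_years
    return projected * (1.0 + headroom)


# 80 GB today (from the text); 40 GB/year of growth is a made-up figure.
estimate = storage_needed_gb(80, 40, 5, headroom=0.0)
```

&lt;p&gt;With those placeholder numbers, a five-year device needs 280GB before any safety margin, several times the size of the chain on the day it ships.&lt;/p&gt;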

&lt;p&gt;To address this, many Bitcoin clients only track recent transactions &amp;ndash; say, the most recent gigabyte. This reduces the size of the storage required, and importantly &lt;em&gt;bounds&lt;/em&gt; it to something predictable. It has two problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You don&amp;rsquo;t have the whole state of the system available. If you need data which is not in your &amp;lsquo;recent transactions&amp;rsquo; list, you need to retrieve it from somewhere. If you&amp;rsquo;re doing a property ownership blockchain, for instance, the last transaction on a property in question may have been 25 years ago.&lt;/li&gt;
&lt;li&gt;You need to have a trusted third party who stores the whole transaction history. The difficulty of finding a &amp;lsquo;trusted third party&amp;rsquo; is much of why blockchains are interesting right now!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both of these problems can be solved (hashes over un-stored parts of the blockchain, DHT/torrents for retrieval), but impose further restrictions on your application. We don&amp;rsquo;t have concrete, reliable solutions to them right now.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re considering running servers to store more transaction history, the same problems as with IoT updates exist: your company may not exist in a few years and so your blockchain will stop working. Of course, this defeats the whole point of a blockchain!&lt;/p&gt;

&lt;h2 id=&#34;5-consider-battery-life&#34;&gt;5. Consider battery life&lt;/h2&gt;

&lt;p&gt;Devices running the above stack need moderate amounts of power, as far as IoT devices go. At minimum, you&amp;rsquo;ll need to be fixed to an external power source or have a li-ion battery and charger. You can&amp;rsquo;t run a Linux machine on a primary coin cell for any length of time.&lt;/p&gt;

&lt;p&gt;Most modern devices achieve good battery life by turning off the CPU as much as possible. Blockchains, by their design, require a decent amount of computation just to track the active state of the chain.&lt;/p&gt;

&lt;h2 id=&#34;6-mining-on-iot-devices&#34;&gt;6. Mining on IoT devices?&lt;/h2&gt;

&lt;p&gt;You could, if you really wanted to, but your device will then consume so much power that it will need to be plugged into a wall socket. If you (as a miner) wanted a disproportionate amount of mining power, you could run the same software on a desktop machine (or GPU, or ASIC, or whatever). Using proof-of-work mining on battery-powered devices (e.g. cell phones) is a bad idea.&lt;/p&gt;

&lt;p&gt;If you can find a proof-of-stake algorithm that you trust (in 2016, still an open problem) then that would probably be feasible to run on a battery-powered device.&lt;/p&gt;

&lt;h2 id=&#34;7-reliable-internet-connectivity&#34;&gt;7. Reliable Internet connectivity&lt;/h2&gt;

&lt;p&gt;Many applications for IoT blockchains involve devices that are occasionally offline. This is fine if you&amp;rsquo;re only reading data out of the blockchain, but:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you might miss out on recent transactions&lt;/li&gt;
&lt;li&gt;you can&amp;rsquo;t add transactions to the blockchain without being online&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;8-so-we-just-use-cellphones-then&#34;&gt;8. So we just use cellphones, then?&lt;/h2&gt;

&lt;p&gt;Pretty much. A modern cellphone with a good Internet connection ticks all of the boxes. Thanks to economies of scale, you can&amp;rsquo;t build custom hardware cheaper.&lt;/p&gt;

&lt;!--
TODO
https://www.google.com.au/search?q=blockchains+on+iot+devices&amp;oq=blockchains+on+iot+devices&amp;aqs=chrome..69i57.5478j0j7&amp;sourceid=chrome&amp;ie=UTF-8
--&gt;
</description>
    </item>
    
    <item>
      <title>Hardware classes of embedded/IoT devices</title>
      <link>https://ianhowson.com/iot/hardware-classes/</link>
      <pubDate>Wed, 23 Nov 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/iot/hardware-classes/</guid>
      <description>

&lt;p&gt;Every IoT device has a different hardware design. Each has different capabilities and makes different security tradeoffs. It&amp;rsquo;s helpful to describe broad &amp;lsquo;hardware classes&amp;rsquo; or &amp;lsquo;technology levels&amp;rsquo; of devices.&lt;/p&gt;

&lt;p&gt;These classes let us classify the capabilities and cost of devices and help us to understand why particular tradeoffs are made in a device&amp;rsquo;s design.&lt;/p&gt;

&lt;!-- TODO
    this is an interesting parallel to your hardware classes piece: https://imgtec.com/markets/internet-of-things/ - they have a table with different types
        differentiation by network type (wifi, btle etc) is interesting
 --&gt;

&lt;h2 id=&#34;huge&#34;&gt;Huge&lt;/h2&gt;

&lt;p&gt;These are 32 or 64-bit CPUs, usually ARM, with an MMU and external RAM. They can run Linux. Implicit in this is that they can also run OpenSSL and perform cryptographic operations (particularly public key operations) quickly.&lt;/p&gt;

&lt;p&gt;Power envelope is the largest of anything discussed here, with minimum continuous power draw typically in the 10mW range and very large peak power consumption (watts). Raspberry Pi and BeagleBone are common development boards that fall into this class.&lt;/p&gt;

&lt;p&gt;These are easy to develop for &amp;ndash; you can use an off-the-shelf Linux distribution in most cases. Your chip vendor will probably supply one for you.&lt;/p&gt;

&lt;p&gt;Because you&amp;rsquo;ve got lots of RAM and can afford dynamic allocation, you can use higher-level languages than C or C++. Your development will be faster and cheaper if you use something like Python or Go for non-real-time parts of the application.&lt;/p&gt;

&lt;p&gt;The hardware requires a CPU, external RAM (sometimes on-package, but always a separate die), external flash and relatively complex power supplies. The increase in COGS is typically $15 or more, which translates to a $30-$100 difference in the final retail price of the device.&lt;/p&gt;

&lt;h3 id=&#34;running-linux&#34;&gt;Running Linux?&lt;/h3&gt;

&lt;p&gt;Linux is a big, heavy operating system for an embedded system. It has longer boot times, unpredictable real-time behaviour and a large, complex software stack that is difficult to reason about. You gain fast firmware development, easy access to third-party software and flexible filesystems.&lt;/p&gt;

&lt;p&gt;Linux can take a long time to boot &amp;ndash; often longer than users will tolerate. Typical boot time is 30-60 seconds. Sub-20 seconds is achievable without much work. There are research efforts to bring this down to under five seconds. Five seconds is still too long for many applications (control systems &amp;ndash; medical, industrial, drones) but they will usually use separate microcontrollers for the time-critical functions.&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;possibility&lt;/em&gt; that a Linux system might not boot reliably or in a predictable amount of time (e.g. failed &lt;code&gt;fsck&lt;/code&gt;, down network) can be enough to eliminate it from many applications. Without a UI or human to power-cycle it, you may not be able to bring it up again.&lt;/p&gt;

&lt;p&gt;Lots of Linux systems ship with an extra microcontroller (like IPMI) that can reset the machine if it doesn&amp;rsquo;t respond in a fixed period of time. Often this micro will need to talk over the network &amp;ndash; with all of the associated security risks. Now you have &lt;em&gt;two&lt;/em&gt; embedded systems to worry about!&lt;/p&gt;

&lt;p&gt;Embedded Linux devices are vulnerable to all of the same problems that a Linux machine on the Internet has. You&amp;rsquo;ll need to ship regular security updates, which will probably require a longish period of downtime to apply.&lt;/p&gt;

&lt;h2 id=&#34;large&#34;&gt;Large&lt;/h2&gt;

&lt;p&gt;You can get the same CPU power as a Huge device (32-bit ARM clocked at anything you like) but bundled with on-chip RAM and flash. This is a great compromise for many devices. COGS is significantly lower due to the integrated storage and the device will have more built-in peripherals.&lt;/p&gt;

&lt;p&gt;These can have megabytes of flash and RAM or as little as kilobytes. Power consumption can be very low, but you must go smaller for the lowest-power devices (microwatt and lower).&lt;/p&gt;

&lt;p&gt;You can&amp;rsquo;t (as of 2016) run Linux on these as they do not have enough RAM or an MMU. You need to select an RTOS or uClinux. You can&amp;rsquo;t just pull software components from the Internet; you must consider how they will be integrated into your firmware. As a result, firmware development time will be longer.&lt;/p&gt;

&lt;p&gt;You &lt;em&gt;might&lt;/em&gt; be able to run a higher-level language on here (perhaps Go or JavaScript), but this is rare in practice. Even though the hardware is capable, most developers elect to use C or C++. You &lt;em&gt;do&lt;/em&gt; have the luxury of dynamic memory allocation, should you choose to use it.&lt;/p&gt;

&lt;p&gt;As you&amp;rsquo;re using an RTOS from a third-party vendor, you probably have all of the vulnerabilities that a Linux system does, but without the intense scrutiny that Linux has. In other words, you&amp;rsquo;re just as vulnerable, but you don&amp;rsquo;t know it.&lt;/p&gt;

&lt;p&gt;These devices are capable of the whole range of cryptographic operations, but because you&amp;rsquo;re using a custom software stack, you can&amp;rsquo;t just drop in OpenSSL. You need to be very careful to ensure that any cryptographic libraries you include are correct, secure, and legal for you to bundle with your product (in the export controls sense). Supporting the full range of certificate operations (expirations, revocations, updates) requires flexibility in your use of flash, and that is challenging as you don&amp;rsquo;t have a general-purpose filesystem to rely on.&lt;/p&gt;

&lt;p&gt;Increase in COGS is $1-$5.&lt;/p&gt;

&lt;p&gt;Sometimes these are embedded into other SoCs, such as WiFi modules.&lt;/p&gt;

&lt;h2 id=&#34;medium&#34;&gt;Medium&lt;/h2&gt;

&lt;p&gt;These are 16-bit CPUs with integrated RAM and flash. There is no dominant architecture at this time. You&amp;rsquo;ve probably got 16-1024KB of onboard RAM. Clock rates range from 4-80MHz.&lt;/p&gt;

&lt;p&gt;You can use C++ but the storage and runtime overheads may be burdensome. You might be restricted to plain C. You &lt;em&gt;could&lt;/em&gt; use dynamic memory allocation, but you probably don&amp;rsquo;t have enough RAM to do so safely.&lt;/p&gt;

&lt;p&gt;To save power there are provisions for switching the CPU and peripherals off. You usually won&amp;rsquo;t see sub-1MHz clock rates.&lt;/p&gt;

&lt;p&gt;Public key crypto is doable but takes significant development effort. The CPU time required might be noticeable by the user. You should consider using an external cryptoprocessor.&lt;/p&gt;

&lt;p&gt;These are often embedded in BLE and Bluetooth chipsets.&lt;/p&gt;

&lt;h2 id=&#34;small&#34;&gt;Small&lt;/h2&gt;

&lt;p&gt;8-bit CPUs e.g. AVR8, 8-bit PIC, 8051.&lt;/p&gt;

&lt;p&gt;Often Harvard architecture (split instruction/data memories). Often the instruction memory is mapped directly to flash. This is interesting for security as buffer overflows will trash data RAM but not affect instructions.&lt;/p&gt;

&lt;p&gt;These are often embedded in smaller RF chips such as the Nordic nRF series.&lt;/p&gt;

&lt;p&gt;They can have tiny power consumption if programmed appropriately; they can run for years on a coin cell.&lt;/p&gt;

&lt;p&gt;You &lt;em&gt;technically&lt;/em&gt; can run C++ code on these, but there&amp;rsquo;s no point. You can&amp;rsquo;t fit much code on them in the first place, C++ gets poor compiled code density, and you&amp;rsquo;d only use C++ if you have complex software anyway. So just stick with C.&lt;/p&gt;

&lt;p&gt;Because quiescent power consumption is so low, some applications stop thinking in terms of continuous power consumption (e.g. 10µA continuous) and start thinking in terms of number of power-consuming operations (e.g. 400 door strike activations on a single primary cell). You don&amp;rsquo;t use rechargeable cells in these applications because self-discharge is too high. This saves further on BOM cost.&lt;/p&gt;
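
&lt;p&gt;To put rough numbers on those two ways of budgeting, here is a minimal sketch. The 225mAh capacity is an assumed, typical figure for a CR2032-style coin cell, and the charge per activation and the reserve percentage are hypothetical values picked only for illustration; none of them come from a real design.&lt;/p&gt;

```python
# Two ways of budgeting a primary cell: continuous draw vs. counted operations.
# All figures are illustrative assumptions, not measurements.

HOURS_PER_YEAR = 24 * 365


def battery_life_years(capacity_mah, average_current_ua):
    """Idealised runtime in years; ignores self-discharge and cutoff voltage."""
    hours = capacity_mah * 1000.0 / average_current_ua  # mAh -> uAh, then divide by uA
    return hours / HOURS_PER_YEAR


def activations_available(capacity_uah, charge_per_activation_uah, reserve_percent):
    """Whole number of operations available after holding back a reserve."""
    usable_uah = capacity_uah * (100 - reserve_percent) // 100
    return usable_uah // charge_per_activation_uah


# Continuous view: 10 uA average draw from an assumed 225 mAh cell.
print(round(battery_life_years(225.0, 10.0), 2))

# Operation-count view: hypothetical 450 uAh per activation, 20% held in reserve.
print(activations_available(225000, 450, 20))
```

&lt;p&gt;Under these assumptions the continuous view gives a bit over two and a half years, and the operation-count view gives 400 activations. Real cells derate with temperature and pulse load, so treat both as upper bounds.&lt;/p&gt;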

&lt;p&gt;Symmetric crypto is usually OK on these, but key setup time can be noticeable. Public key crypto is generally impossible as they don&amp;rsquo;t have enough RAM to hold the key. Of course, any cryptography is going to run the CPU hard for a long time, and this is going to hurt your battery life.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re using coin cells to run your device, the peak power consumption can pull their voltage low enough that the CPU will brown out (or worse, latch up). This is problematic for boot-time firmware verification; a device will be fine if you leave it on, but if you turn it off it won&amp;rsquo;t be able to start again.&lt;/p&gt;

&lt;h2 id=&#34;tiny&#34;&gt;Tiny&lt;/h2&gt;

&lt;p&gt;8-bit, under 128 bytes of static RAM and maybe a kilobyte of flash &amp;ndash; tinyAVR, for example.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s probably not enough RAM to store a symmetric key, so &lt;em&gt;any&lt;/em&gt; crypto is challenging here. Some algorithms have smaller working set sizes, but you&amp;rsquo;re in desperate territory. The vast majority of designs that need cryptography will select a large CPU or have a dedicated cryptoprocessor to do the heavy lifting.&lt;/p&gt;

&lt;p&gt;Typical clock rates are 128kHz or 4-16MHz. If you&amp;rsquo;re in the MHz range and have enough RAM, you can run &lt;em&gt;some&lt;/em&gt; symmetric algorithms. kHz-range designs will incur noticeable delays.&lt;/p&gt;

&lt;p&gt;You can use C. You might use assembler if you&amp;rsquo;re counting pennies.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>CPU clock rates are meaningless now</title>
      <link>https://ianhowson.com/blog/cpu-clock-rates-are-meaningless-now/</link>
      <pubDate>Tue, 15 Nov 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/blog/cpu-clock-rates-are-meaningless-now/</guid>
      <description>

&lt;h2 id=&#34;history&#34;&gt;History&lt;/h2&gt;

&lt;p&gt;Way back when, you bought a CPU, and it had a marked clock rate. You ran it at that rate. The end.&lt;/p&gt;

&lt;p&gt;Later you got a turbo button, but that was more for application compatibility. We were spinning for a fixed number of cycles to mark time!&lt;/p&gt;

&lt;p&gt;Around the Athlon time, we started throttling CPUs. It turned out that they could be damaged if run too hot for too long, and laptops were having trouble getting the heat out. So Intel (and later AMD) parts started to slow themselves down if they got to a dangerous temperature.&lt;/p&gt;

&lt;h2 id=&#34;turbo-boost&#34;&gt;Turbo Boost&lt;/h2&gt;

&lt;p&gt;Later still &amp;ndash; within the last five years &amp;ndash; we got &amp;lsquo;Turbo Boost&amp;rsquo;. Originally, this was to reflect that the CPU could run faster &lt;em&gt;for a very brief time&lt;/em&gt;, but eventually we would be unable to remove the heat fast enough and the CPU could reach dangerous temperatures again. In some ways, this reflected the thermal mass of the CPU, its heatspreader and the immediate heatsink. Heatpipes were now in common use, and while they could remove a lot of heat from a small area, they couldn&amp;rsquo;t change the rate of heat conductance rapidly. While desktops were usually designed to remove all of the heat that the CPU could produce at maximum power, laptops couldn&amp;rsquo;t afford this &amp;ndash; the space and weight required were just too great.&lt;/p&gt;

&lt;p&gt;Recently, &amp;ldquo;a brief time&amp;rdquo; has become &amp;ldquo;a really long time&amp;rdquo;. My wife&amp;rsquo;s MacBook Air, for instance, runs at a &amp;lsquo;base clock&amp;rsquo; of 1.6GHz. If you watch the actual CPU speed, however, it &lt;em&gt;never&lt;/em&gt; runs at 1.6GHz. If it&amp;rsquo;s idle, it will run at less than 1GHz (and it&amp;rsquo;ll actually be asleep for much of that). If you work it hard, it&amp;rsquo;ll increase to 2.4GHz and stay there for as long as the workload lasts. So there&amp;rsquo;s no thermal mass effect here &amp;ndash; it&amp;rsquo;s just 1GHz/sleeping for low load, 2.4GHz for high load, and somewhere in the middle for a mixed load.&lt;/p&gt;

&lt;p&gt;Under high load, the clock rate is determined by the cooling capacity of the laptop. But &amp;ndash; importantly! &amp;ndash; there are no circumstances under which the laptop will &amp;lsquo;prefer&amp;rsquo; to run at its rated &amp;lsquo;base clock&amp;rsquo; of 1.6GHz. There&amp;rsquo;s no point. The CPU can adjust its clock rate anywhere from about 1GHz to 2.4GHz in fine-grained steps, and it chooses the exact clock rate that it needs to balance performance and energy efficiency.&lt;/p&gt;

&lt;h2 id=&#34;so-what-is-the-base-clock&#34;&gt;So what is the base clock?&lt;/h2&gt;

&lt;p&gt;Intel Ark has this to say about &amp;ldquo;Processor Base Frequency&amp;rdquo;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The processor base frequency is the operating point where TDP is defined.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Nowhere does it say &amp;ldquo;this is the preferred frequency&amp;rdquo; or &amp;ldquo;this is the maximum&amp;rdquo; or &amp;ldquo;this is the most efficient point&amp;rdquo;. It&amp;rsquo;s just where the processor runs at its TDP. &lt;strong&gt;The TDP is chosen by Intel!&lt;/strong&gt; The exact same CPU can be sold at two different TDPs, at two different clock rates, to two different markets (e.g. laptop and desktop).&lt;/p&gt;

&lt;p&gt;TDP is &amp;lsquo;thermal design power&amp;rsquo; &amp;ndash; typically between 5W and 150W for modern Intel chips. Importantly, though, it&amp;rsquo;s an arbitrarily chosen number. For a laptop part, TDP is chosen to be smaller &amp;ndash; say, 15W. For a desktop or server part, TDP is larger &amp;ndash; 35-135W. TDP is important for manufacturers because it dictates how big a cooling solution is needed. If they have to move a &amp;lsquo;nominal&amp;rsquo; 15W from a laptop CPU instead of 135W for a many-core server CPU, they can use a smaller and lighter cooler.&lt;/p&gt;

&lt;p&gt;Higher clock speeds and core counts draw more power, all of which has to be removed as heat. TDP is arbitrarily selected to suit the end-user, but it doesn&amp;rsquo;t imply that the CPU is more or less capable than another. We know that our &amp;lsquo;1.6GHz&amp;rsquo; CPU can run at 2.4GHz! It&amp;rsquo;s just that &lt;strong&gt;at the TDP, this is how fast we can run in steady state&lt;/strong&gt;. The same CPU could run faster forever if you have a big enough cooler!&lt;/p&gt;

&lt;p&gt;So, &amp;lsquo;base clock&amp;rsquo; is a pointless figure now. Intel and the machine manufacturers publish it, but it&amp;rsquo;s more like &amp;ldquo;under these circumstances (workload, ambient temperature and heatsink efficiency), we can run this CPU at this clock rate indefinitely&amp;rdquo;.&lt;/p&gt;

&lt;h2 id=&#34;cooling&#34;&gt;Cooling&lt;/h2&gt;

&lt;p&gt;The computer manufacturer thus has a big impact on how fast the CPU will run, because they design the cooling system. A too-small cooling system (e.g. MacBook Air 11&amp;rdquo; or 2015 MacBook) will constrain CPU performance simply because under load, the CPU will heat up and the clock speed will need to be reduced. A too-small cooling system is great for the manufacturer (less weight and volume leads to a smaller, lighter laptop) but you&amp;rsquo;re trading off CPU performance. Cooling efficiency is never reported!&lt;/p&gt;

&lt;p&gt;For CPUs in the same series and with the same nominal TDP, there might be advantages to the faster ones. They&amp;rsquo;re &lt;em&gt;sold as&lt;/em&gt; faster for the same rated TDP, and conversely they &lt;em&gt;might&lt;/em&gt; run slightly cooler at the same clock rate. Given that the difference in clock rate is usually tiny (10%) and the price difference can be huge (hundreds of dollars) there&amp;rsquo;s rarely any point in buying the faster parts.&lt;/p&gt;

&lt;p&gt;All of this is wrapped up in the GHz figure &amp;ndash; the one the consumer looks at &amp;ndash; but it&amp;rsquo;s no guarantee that performance is actually better. A laptop with a high clock rate, high TDP CPU might perform worse than one with a lower clock rate if the cooling is inadequate.&lt;/p&gt;

&lt;h2 id=&#34;case-study-the-2016-macbook-pros&#34;&gt;Case study: the 2016 MacBook Pros&lt;/h2&gt;

&lt;p&gt;There&amp;rsquo;s an interesting comparison to be made between the 2016 MacBook Pros. The &amp;lsquo;Escape Edition&amp;rsquo; has a 2.0GHz CPU, while the &amp;lsquo;Touch Bar&amp;rsquo; model has a 2.9GHz CPU. On the outside, the machines look identical (except for the Touch Bar). Inside, the differences are tremendous. It&amp;rsquo;s a completely different design. Notably, the Escape Edition has a single CPU fan, while Touch Bar has two fans and bigger heatsinks.&lt;/p&gt;

&lt;p&gt;The Escape Edition&amp;rsquo;s CPU is an &lt;a href=&#34;https://ark.intel.com/products/91156/Intel-Core-i5-6360U-Processor-4M-Cache-up-to-3_10-GHz&#34;&gt;i5-6360U&lt;/a&gt;, while the Touch Bar&amp;rsquo;s is an &lt;a href=&#34;https://ark.intel.com/products/91166/Intel-Core-i5-6267U-Processor-4M-Cache-up-to-3_30-GHz&#34;&gt;i5-6267U&lt;/a&gt;. Other than the TDP and Base Frequency, the parts are &lt;em&gt;identical&lt;/em&gt;!&lt;/p&gt;

&lt;p&gt;At the time of writing, the &lt;a href=&#34;http://browser.primatelabs.com/mac-benchmarks&#34;&gt;Geekbench single-core benchmarks&lt;/a&gt; show:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Escape Edition (2.0GHz): 3608&lt;/li&gt;
&lt;li&gt;Touch Bar (2.9GHz): 3769&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Touch Bar model has a 45% faster base clock. We&amp;rsquo;re testing a CPU-bound workload. We would expect it to get close to a 45% increase in performance. In reality, it only gets a &lt;strong&gt;4.5%&lt;/strong&gt; increase. The clock rate does not tell the full story!&lt;/p&gt;

&lt;p&gt;The Escape Edition CPU has a maximum Turbo speed of 3.1GHz, while the Touch Bar CPU has a maximum Turbo speed of 3.3GHz &amp;ndash; a 6.5% increase. This more closely explains the difference in benchmark results!&lt;/p&gt;

&lt;p&gt;Better yet, the 1.2GHz MacBook scores &lt;em&gt;3003&lt;/em&gt;. That&amp;rsquo;s 80% of the performance of the Touch Bar model with 41% of the base clock rate.&lt;/p&gt;
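
&lt;p&gt;The percentages for the two MacBook Pros are easy to re-derive. This is a small sketch that uses only the clock rates and Geekbench scores quoted above; the dictionary layout and function name are just for illustration.&lt;/p&gt;

```python
# Re-derive the case-study percentages from the figures quoted in the text.

def percent_increase(baseline, value):
    """Percentage by which value exceeds baseline."""
    return (value / baseline - 1.0) * 100.0


escape_edition = {"base_ghz": 2.0, "turbo_ghz": 3.1, "geekbench": 3608}
touch_bar = {"base_ghz": 2.9, "turbo_ghz": 3.3, "geekbench": 3769}

for key in ("base_ghz", "turbo_ghz", "geekbench"):
    gain = percent_increase(escape_edition[key], touch_bar[key])
    print(key, round(gain, 1))
```

&lt;p&gt;This prints gains of 45.0 for the base clock, 6.5 for the Turbo clock and 4.5 for the benchmark score. The benchmark gain sits much closer to the Turbo gain than to the base clock gain.&lt;/p&gt;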

&lt;!-- TODO do a chart --&gt;
</description>
    </item>
    
    <item>
      <title>How to enable the oplog on Ubuntu MongoDB for Meteor</title>
      <link>https://ianhowson.com/blog/oplog-on-ubuntu-mongodb-meteor/</link>
      <pubDate>Mon, 15 Aug 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/blog/oplog-on-ubuntu-mongodb-meteor/</guid>
      <description>

&lt;p&gt;If you install MongoDB from Ubuntu 14.04 LTS, there are a few steps that you need to take to enable the oplog for use with Meteor.&lt;/p&gt;

&lt;p&gt;To enable the oplog, we need to enable replication. We’re not actually going to replicate to any other servers.&lt;/p&gt;

&lt;h2 id=&#34;1-enable-oplog&#34;&gt;1. Enable oplog&lt;/h2&gt;

&lt;p&gt;In &lt;code&gt;/etc/mongodb.conf&lt;/code&gt;, add&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;replSet = rs0
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Restart mongodb:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;service mongodb restart
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As root, run &lt;code&gt;mongo&lt;/code&gt; to get a shell. Run:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;rs.initiate({_id:&amp;quot;rs0&amp;quot;, members: [{&amp;quot;_id&amp;quot;:1, &amp;quot;host&amp;quot;:&amp;quot;127.0.0.1:27017&amp;quot;}]})
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You should see something like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
        &amp;quot;info&amp;quot; : &amp;quot;Config now saved locally.  Should come online in about a minute.&amp;quot;,
        &amp;quot;ok&amp;quot; : 1
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You can run &lt;code&gt;rs.conf()&lt;/code&gt; and &lt;code&gt;rs.status()&lt;/code&gt; for more information.&lt;/p&gt;

&lt;h2 id=&#34;2-add-a-user-to-access-the-oplog&#34;&gt;2. Add a user to access the oplog&lt;/h2&gt;

&lt;p&gt;Switch to the admin database with:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;rs0:PRIMARY&amp;gt; use admin
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;then create the user:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;rs0:PRIMARY&amp;gt; db.addUser({user: &amp;quot;oplogger&amp;quot;, pwd: &amp;quot;password&amp;quot;, roles: [], otherDBRoles: {local: [&amp;quot;read&amp;quot;]}})
&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id=&#34;3-tweak-your-meteor-config-to-use-the-oplog&#34;&gt;3. Tweak your Meteor config to use the oplog&lt;/h2&gt;

&lt;p&gt;In your Meteor environment settings, add:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;MONGO_OPLOG_URL=mongodb://oplogger:password@127.0.0.1/local?authSource=admin
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you’re using my setup with Meteor running in Docker containers on AWS machines, you need to use the host IP like so:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;MONGO_OPLOG_URL=mongodb://oplogger:password@172.17.0.1/local?authSource=admin
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Note that you must use the &lt;code&gt;local&lt;/code&gt; database, not whatever your application is configured for. Also note that this has security implications if you intend to run separate applications on the same database server.&lt;/p&gt;
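
&lt;p&gt;If your real password contains characters such as &lt;code&gt;@&lt;/code&gt; or &lt;code&gt;/&lt;/code&gt;, it needs to be percent-encoded before it goes into the URL. Here is a minimal sketch of building the value in Python; the &lt;code&gt;oplog_url&lt;/code&gt; helper is hypothetical and not part of Meteor or MongoDB.&lt;/p&gt;

```python
from urllib.parse import quote_plus


def oplog_url(user, password, host, auth_source="admin"):
    """Build a MONGO_OPLOG_URL value; the oplog lives in the 'local' database."""
    return "mongodb://{}:{}@{}/local?authSource={}".format(
        quote_plus(user), quote_plus(password), host, auth_source
    )


# Hypothetical password containing reserved characters.
print("MONGO_OPLOG_URL=" + oplog_url("oplogger", "p@ss/word", "172.17.0.1"))
```

&lt;p&gt;With a plain password like the one used above, the helper produces exactly the URL shown earlier, so it only makes a difference when reserved characters are involved.&lt;/p&gt;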

&lt;h2 id=&#34;4-restart-your-meteor-application&#34;&gt;4. Restart your Meteor application&lt;/h2&gt;

&lt;p&gt;If it comes up without errors, that’s a really good sign!&lt;/p&gt;

&lt;h2 id=&#34;5-confirm-that-the-oplog-is-being-used&#34;&gt;5. Confirm that the oplog is being used&lt;/h2&gt;

&lt;p&gt;There’s some advice at &lt;a href=&#34;https://github.com/meteor/docs/blob/version-NEXT/long-form/oplog-observe-driver.md&#34;&gt;https://github.com/meteor/docs/blob/version-NEXT/long-form/oplog-observe-driver.md&lt;/a&gt; but it’s pretty old and requires you to change your application. To be continued!&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Etymotic ER4XR review</title>
      <link>https://ianhowson.com/blog/etymotic-er4xr-review/</link>
      <pubDate>Tue, 05 Jul 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/blog/etymotic-er4xr-review/</guid>
      <description>

&lt;!-- TODO put a glamour pic of the earphones in here somewhere --&gt;

&lt;h2 id=&#34;how-do-they-sound&#34;&gt;How do they sound?&lt;/h2&gt;

&lt;p&gt;Pretty much as you&amp;rsquo;d expect ER-4s to sound. Flat, with very well defined treble. Compared with the original ER-4P, the ER4XR has much more &amp;lsquo;present&amp;rsquo; bass. It&amp;rsquo;s not overwhelming or boomy. It&amp;rsquo;s just there. You don&amp;rsquo;t have to search for it like with the ER-4P.&lt;/p&gt;

&lt;p&gt;I got the ER4XRs to replace some broken UM3xs. The ER4XR seemed to have very harsh treble initially, but this settled down after a few minutes of listening. This isn&amp;rsquo;t surprising as the UM3x has fairly muted treble. It&amp;rsquo;s probably a psychological effect.&lt;/p&gt;

&lt;p&gt;The UM3x has much more powerful bass than the ER4XR. Whether it&amp;rsquo;s better would be a matter of personal taste. I did enjoy the bass of the UM3x, but I also enjoy the treble of the ER4XR.&lt;/p&gt;

&lt;p&gt;The ER4XR bass can surprise on occasion. The bassline of Red Hot Chili Peppers&amp;rsquo; &amp;lsquo;The Getaway&amp;rsquo; gave some unexpected thrills. On the other hand, I found the drums in Tool&amp;rsquo;s Forty Six &amp;amp; 2 to be a little underwhelming; they&amp;rsquo;re pretty fantastic on the UM3x.&lt;/p&gt;

&lt;p&gt;iPhone has a &amp;lsquo;Bass Booster&amp;rsquo; EQ option. It&amp;rsquo;s a little too much, but it helps some tracks. Forty Six &amp;amp; 2 goes back to punching me right in the eardrums without losing too much midrange. On my Mac, I boost the bass with AU Lab and they respond extremely well.&lt;/p&gt;

&lt;!-- TODO: image of bass boost in AU Lab; perhaps do a whole blog post about this  --&gt;

&lt;h2 id=&#34;input-level&#34;&gt;Input level&lt;/h2&gt;

&lt;p&gt;No complaints. About &lt;sup&gt;1&lt;/sup&gt;&amp;frasl;&lt;sub&gt;3&lt;/sub&gt; on my iPhone is comfortable for regular listening. The UM3x was extremely sensitive, and this was a nuisance.&lt;/p&gt;

&lt;h2 id=&#34;isolation&#34;&gt;Isolation&lt;/h2&gt;

&lt;p&gt;I fly a lot and have noisy children, so isolation is very important to me, even more than sound quality.&lt;/p&gt;

&lt;p&gt;The ER4XR has the same great isolation that you&amp;rsquo;d expect from the ER-4, of course. It&amp;rsquo;s far superior to what you get with the UM3x. I could hold a conversation with my UM3xs inserted; that&amp;rsquo;s not possible with the ER4XR.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s also easier to get good insertion depth with the ER4XR. I found that the body of the UM3x got in the way.&lt;/p&gt;

&lt;p&gt;I use foam pads, and they fit securely on the barrel of the earphones. I feel comfortable inserting the pads right inside my ear canal. With the UM3x, pads would sometimes slip off the barrel and get lodged in my ear canal. (I eventually painted some nail polish around the barrel to thicken it, which helped a lot.)&lt;/p&gt;

&lt;h2 id=&#34;microphonics-and-cabling&#34;&gt;Microphonics and cabling&lt;/h2&gt;

&lt;p&gt;The top segment of the cable (beyond the splitter) will induce a lot of noise. If I pull the splitter tight up under my chin that eliminates most of it, but that looks silly and gets in the way.&lt;/p&gt;

&lt;p&gt;What &lt;em&gt;does&lt;/em&gt; work well for me is running the cable over my ears, behind my neck, left over my left shoulder and clipping it to my shirt. I get no microphonics and it&amp;rsquo;s out of the way.&lt;/p&gt;

&lt;p&gt;I do miss the over-the-ear cable from the UM3x. It was also possible to lie on my side with those; it&amp;rsquo;s impossible with the ER4XR.&lt;/p&gt;

&lt;p&gt;One major plus to having a straight (not over-the-ear) design is that there&amp;rsquo;s a lot less tangling of the earphones themselves.&lt;/p&gt;

&lt;p&gt;The cable is nice and long. I had an aftermarket cable for UM3x which hardly tangled at all, but the ER4XR one is pretty good. It&amp;rsquo;s not braided all the way, which helps.&lt;/p&gt;

&lt;p&gt;After about two years the cable has become damaged at the earphone strain relief points. This is a bit disappointing, especially as the cable is expensive to replace (USD50/EUR50). I was able to repair and reinforce my old cable, color-coding the left and right in the process. I tried some aftermarket MMCX cables but none work as the ER4XR&amp;rsquo;s MMCX connector is recessed into the earphones.&lt;/p&gt;

&lt;!-- TODO image of strain relief points --&gt;

&lt;h2 id=&#34;the-price&#34;&gt;The price&lt;/h2&gt;

&lt;p&gt;I paid AUD$200 for my old (used) ER-4Ps; I paid probably AUD$450 for the UM3x. The ER4XRs were AUD$539 landed due to the exchange rate and UPS international shipping.&lt;/p&gt;

&lt;p&gt;This sounds like a lot, but:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Historically, I get about five years out of a set of IEMs&lt;/li&gt;
&lt;li&gt;Etymotic are relatively popular and so there&amp;rsquo;s a decent market for them used. If I choose to upgrade I’ll get some cash back.&lt;/li&gt;
&lt;li&gt;Etymotic use their own pads, and they&amp;rsquo;re cheap to replace. This actually matters! Comply pads are about $5/pair, Etymotic are $1, and I figure a pair lasts a month, so that&amp;rsquo;s 60x$4=$240 saved over the life of the earphones.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;the-pads&#34;&gt;The pads&lt;/h2&gt;

&lt;p&gt;Comply pads don&amp;rsquo;t last very long. They&amp;rsquo;re quite soft and tear easily under normal use. I can wash a pair once (just soak in boiling water) but they don&amp;rsquo;t survive a second wash.&lt;/p&gt;

&lt;p&gt;Etymotic pads are a bit rougher, but once inserted there&amp;rsquo;s no comfort difference.&lt;/p&gt;

&lt;p&gt;Comply do offer multiple colours. I use the audiologist convention of blue pads for the left ear and red for the right. The ER4XR markings are difficult to see and you can&amp;rsquo;t get different colours.&lt;/p&gt;

&lt;p&gt;Apparently soaking the pads in hydrogen peroxide will remove the earwax without drying out the pads. I haven&amp;rsquo;t tried it yet.&lt;/p&gt;

&lt;h2 id=&#34;ergonomics&#34;&gt;Ergonomics&lt;/h2&gt;

&lt;p&gt;The ER4XR plug tip is narrow, so it&amp;rsquo;ll fit your iPhone while it&amp;rsquo;s in the case.&lt;/p&gt;

&lt;p&gt;The cable clip is too loose &amp;ndash; it slips off and gets lost. I wrap tape around the clip to stop it coming apart.&lt;/p&gt;

&lt;!-- TODO image of tape --&gt;

&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;These are great earphones. There&amp;rsquo;s very little that I could suggest to improve them.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Turbo Boost and MPI</title>
      <link>https://ianhowson.com/blog/turbo-boost-and-mpi/</link>
      <pubDate>Wed, 04 May 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/blog/turbo-boost-and-mpi/</guid>
      <description>&lt;p&gt;There&amp;rsquo;s this attitude when optimising that if you&amp;rsquo;re not maxing out all of your processing resources, you&amp;rsquo;re wasting them.&lt;/p&gt;

&lt;p&gt;Utilisation is a good guideline, but it&amp;rsquo;s missing the wood for the trees. You actually want your task to run faster! Using more resources doesn&amp;rsquo;t guarantee that your job will run faster. If you have idle resources, you will &lt;em&gt;usually&lt;/em&gt; get gains by using them, but it&amp;rsquo;s not a guarantee.&lt;/p&gt;

&lt;p&gt;MPI programs are often written in the form (A):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;across all nodes:
    do the same task&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;rather than (B):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;do a task on node 0
broadcast the result to all nodes&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Form A is ridiculously wasteful of hardware resources &amp;ndash; it uses N times the CPU cycles of form B, but is often slightly faster. Why? In form B, the total time is &amp;lt;time to run your task&amp;gt; plus &amp;lt;time to broadcast&amp;gt;. In form A, under the assumption that CPU cores are independent, the time is &amp;lt;time to run your task&amp;gt;. You&amp;rsquo;ve saved &amp;lt;time to broadcast&amp;gt;, which matters if it&amp;rsquo;s a measurable percentage of &amp;lt;time to run your task&amp;gt;.&lt;/p&gt;
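
&lt;p&gt;That comparison can be written down as a tiny cost model. This is only a sketch under the same assumption that cores are independent; the function names and the example timings are made up for illustration.&lt;/p&gt;

```python
# Wall-clock and CPU cost of the two forms, assuming independent cores.

def form_a_wall_s(task_s):
    """Every node runs the task itself, so wall time is just the task time."""
    return task_s


def form_b_wall_s(task_s, broadcast_s):
    """Node 0 runs the task, then everyone waits for the broadcast."""
    return task_s + broadcast_s


def form_a_cpu_s(task_s, nodes):
    """Total CPU time burned when all nodes repeat the same work."""
    return task_s * nodes


# Hypothetical numbers: a 10 s task, a 0.5 s broadcast, 16 nodes.
print(form_a_wall_s(10.0), form_b_wall_s(10.0, 0.5), form_a_cpu_s(10.0, 16))
```

&lt;p&gt;With those made-up numbers, form A finishes half a second sooner but spends 160 CPU-seconds to do 10 seconds of useful work.&lt;/p&gt;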

&lt;p&gt;Are CPU cores independent? They&amp;rsquo;re less independent now than they used to be. Traditionally, the memory bus was the primary shared resource, and form A gets pretty good cache utilisation. All cores are doing the same job and will have the same memory access patterns, so caches mostly cover up the increased memory traffic.&lt;/p&gt;

&lt;p&gt;Since 2011, Intel CPUs have supported Turbo Boost. This feature lets a single core run at higher than the nominal clock rate for a short period of time. This is motivated largely by thermal considerations. Obviously, the silicon is capable of running at a higher clock rate &amp;ndash; otherwise it wouldn&amp;rsquo;t work at all, ever. The nominal clock speed is a self-imposed restriction that reflects that the heat cannot be removed from such a small area (maybe 1x1mm?) at a high enough rate to keep the core at a safe temperature. For a multicore package running at a lower clock rate, there&amp;rsquo;s more total heat but it&amp;rsquo;s spread across the package better. The individual cores do not reach a dangerous temperature.&lt;/p&gt;

&lt;p&gt;So now, you can choose between multiple cores doing the same task &lt;em&gt;at a lower clock rate&lt;/em&gt; versus single-core task+broadcast &lt;em&gt;at a higher clock rate&lt;/em&gt;. Does this matter? It depends a lot on your hardware environment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;are you using a turbo-capable CPU?&lt;/li&gt;
&lt;li&gt;are you virtualised?&lt;/li&gt;
&lt;li&gt;are there other jobs running that are using cores on your CPU?&lt;/li&gt;
&lt;li&gt;is cooling on the machine sufficient that you can keep the machine in turbo for any length of time?&lt;/li&gt;
&lt;/ul&gt;
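
&lt;p&gt;If the answers to those questions mean that a single core really can turbo, the trade-off changes. Here is a rough sketch that models the task as a fixed number of cycles. The clock rates and broadcast times are hypothetical, and a real run will also depend on memory bandwidth and on how long the turbo state can be held.&lt;/p&gt;

```python
# Compare form A at the all-core clock with form B at the single-core turbo clock.

def compare_forms(task_cycles, all_core_hz, turbo_hz, broadcast_s):
    """Return (faster_form, form_a_seconds, form_b_seconds)."""
    form_a_s = task_cycles / all_core_hz             # every core busy, no turbo
    form_b_s = task_cycles / turbo_hz + broadcast_s  # one core in turbo, then broadcast
    # min() over (time, name) pairs picks the quicker form; a tie goes to "A".
    faster = min((form_a_s, "A"), (form_b_s, "B"))[1]
    return faster, form_a_s, form_b_s


# Hypothetical machine: 2.0 GHz with all cores loaded, 3.0 GHz single-core turbo.
print(compare_forms(6.0e9, 2.0e9, 3.0e9, 0.5))  # cheap broadcast
print(compare_forms(6.0e9, 2.0e9, 3.0e9, 1.5))  # expensive broadcast
```

&lt;p&gt;In this toy model form B wins whenever the broadcast costs less than the time saved by the higher clock (one second here), which is why the answer has to be measured on your own hardware.&lt;/p&gt;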

&lt;p&gt;If you&amp;rsquo;re in a cloud environment (EC2, DigitalOcean, etc) then there&amp;rsquo;s almost no chance that you can turbo a core as other people will be running on the same machine. Because you don&amp;rsquo;t have exclusive access to the CPUs, your cores might complete the same job in different amounts of time. A synchronised task will complete in the worst-case execution time, so your final time might be better if you reduce the number of cores you use.&lt;/p&gt;

&lt;p&gt;On something like CUDA, there are good reasons to run the same task on many cores even if many of them are idle. The hardware architecture rewards you for orderly memory access and you usually have a scarcity of memory bandwidth, not cores. If you can do exactly the same task across many cores (where &amp;lsquo;exactly&amp;rsquo; means &amp;lsquo;the same instructions in lockstep&amp;rsquo;) then you can use all of the cores. There&amp;rsquo;s no way to fit spare tasks or other users into the spare cores as you can in a CPU-based environment, so if you don&amp;rsquo;t use them, they get wasted. Even taking different branches breaks the &amp;lsquo;exactly&amp;rsquo; requirement, so you&amp;rsquo;re usually better off wasting cycles on some cores than having the size of your thread group drop from 16 to 1.&lt;/p&gt;

&lt;p&gt;What&amp;rsquo;s the take-home lesson? Test, test, test. Don&amp;rsquo;t assume. CPU cores since 2011 are less independent than they used to be, and the common assumption that running identical tasks across many cores costs nothing often doesn&amp;rsquo;t hold any more. Test it again.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>More MPI performance optimisation</title>
      <link>https://ianhowson.com/blog/more-mpi-performance-optimisation/</link>
      <pubDate>Wed, 04 May 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/blog/more-mpi-performance-optimisation/</guid>
      <description>&lt;p&gt;Previously, I compared the following forms commonly used for MPI programs; (A):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;across all nodes:
    do the same task&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;or (B):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;do a task on node 0
broadcast the result to all nodes&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Under the assumption that CPUs are independent and identical, form A&amp;rsquo;s total execution time is T, the time taken to run one instance of the task. Form B&amp;rsquo;s is T+B, where B is the time taken to broadcast the result. Since B cannot be less than zero, T &amp;le; T+B, so form A should never be slower.&lt;/p&gt;

&lt;p&gt;Right? Wrong! Maybe.&lt;/p&gt;

&lt;p&gt;T is not constant. Even given a node full of identical unloaded CPUs, T will vary for no particular reason. In my previous post I covered some of the reasons why CPUs are not independent and thus T is not going to be the same for all CPUs.&lt;/p&gt;

&lt;p&gt;Form A cannot complete until all CPUs have finished running the task. If T is different for different CPUs, the final execution time is the &lt;em&gt;worst-case&lt;/em&gt; execution time for all CPUs. Form A has execution time max(T(all CPUs)); form B has execution time T0+B.&lt;/p&gt;

&lt;p&gt;Note that any sensible OS will assign your form B (single-threaded) task to the CPU with the lowest load and hence indirectly attain the lowest T0.&lt;/p&gt;

&lt;p&gt;This is particularly relevant in cloud environments, which are constantly oversold, so you are always sharing CPUs with someone else. If you use all of your assigned CPUs in a form A program, you&amp;rsquo;re going to experience a lot of variability in the execution time, and your final execution time is going to suffer. Using fewer CPUs than you&amp;rsquo;ve paid for may actually reduce overall execution time.&lt;/p&gt;

&lt;p&gt;Form A is better if time B is relatively high &amp;ndash; you have a lot of data to move around or your nodes are not sharing a memory bus.&lt;/p&gt;
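&lt;p&gt;To make the comparison concrete, here is a minimal sketch of the two timing models. The per-CPU times and the broadcast cost are made-up numbers, not measurements, and the function names are mine. Form B assumes the OS really does place the task on the fastest CPU.&lt;/p&gt;

```javascript
// Hypothetical per-CPU task times in milliseconds (illustrative, not measured).
const taskTimes = [100, 104, 98, 131];
// Hypothetical broadcast cost in milliseconds.
const broadcastTime = 12;

// Form A: every CPU runs the task, so we wait for the slowest one.
function formATime(times) {
  return Math.max(...times);
}

// Form B: one CPU runs the task, then broadcasts the result.
// Assumes the task lands on the least-loaded (fastest) CPU.
function formBTime(times, broadcast) {
  return Math.min(...times) + broadcast;
}

console.log("form A:", formATime(taskTimes), "ms");
console.log("form B:", formBTime(taskTimes, broadcastTime), "ms");
```

&lt;p&gt;With these numbers, one slow CPU is enough to make form A lose. Shrink the spread between CPUs or raise the broadcast cost and form A wins again, which is why you need to measure.&lt;/p&gt;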

&lt;p&gt;So should we always use form B? No! Test, test, test!&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>How I build Meteor apps</title>
      <link>https://ianhowson.com/blog/meteor/</link>
      <pubDate>Mon, 28 Mar 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/blog/meteor/</guid>
      <description>

&lt;h2 id=&#34;policies&#34;&gt;Policies&lt;/h2&gt;

&lt;h3 id=&#34;publications-and-subscriptions&#34;&gt;Publications and subscriptions&lt;/h3&gt;

&lt;p&gt;I disagree with the official advice to put the subscription as close to use as possible. Most of the time, I put all subscriptions in the global namespace. This works well for me, with a few obvious exceptions (huge and/or rapidly changing collections). Those get special effort to keep things performant, but remember: premature optimisation is still the root of all evil. Global subscriptions save so much development effort and are almost always the right thing to do.&lt;/p&gt;

&lt;p&gt;I view publications more like &amp;lsquo;file permissions&amp;rsquo; than &amp;lsquo;send this data to the client&amp;rsquo;. That is, the client can see anything that is published at any time. It&amp;rsquo;s the server&amp;rsquo;s job to make sure that that is a sensible (appropriate and safe) subset of the data, and all that the client has to worry about is how best to present it. Remember that from a security point of view, you can&amp;rsquo;t effectively control data once it&amp;rsquo;s on the client.&lt;/p&gt;

&lt;h3 id=&#34;security&#34;&gt;Security&lt;/h3&gt;

&lt;p&gt;I almost always disable client-side updates and inserts on collections, preferring to use server-side Methods (with paranoid checking) wherever possible. The development overhead of doing this is minimal.&lt;/p&gt;

&lt;p&gt;Overly permissive publications and subscriptions are by far the most common security issue. Check that your client-side caches are cleared when the user logs out. Manually query the client-side collections to ensure that only the exact data the client requires is actually present &amp;ndash; it&amp;rsquo;s easy to mess up the publication side and overpublish.&lt;/p&gt;
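&lt;p&gt;One way to make that manual check less tedious is a small helper that lists any fields the client holds beyond the ones you meant to publish. This is only a sketch: &lt;code&gt;unexpectedFields&lt;/code&gt; is my own name, and in the browser console you would feed it something like the result of &lt;code&gt;Tasks.find().fetch()&lt;/code&gt;, where &lt;code&gt;Tasks&lt;/code&gt; is a hypothetical collection.&lt;/p&gt;

```javascript
// Return the sorted list of field names that appear in any document
// but are not in the list of fields you intended to publish.
function unexpectedFields(docs, allowedFields) {
  const extras = new Set();
  docs.forEach(function (doc) {
    Object.keys(doc).forEach(function (key) {
      if (!allowedFields.includes(key)) {
        extras.add(key);
      }
    });
  });
  return Array.from(extras).sort();
}

// Made-up documents standing in for a client-side collection.
const clientDocs = [
  { _id: "1", title: "Buy milk" },
  { _id: "2", title: "Call Bob", ownerEmail: "bob@example.com" }
];
console.log(unexpectedFields(clientDocs, ["_id", "title"]));
```

&lt;p&gt;An empty result means nothing extra leaked into those documents; anything else is a field to chase back to a publication.&lt;/p&gt;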

&lt;p&gt;Packages are a huge security risk. They&amp;rsquo;re not well audited at this stage and Meteor is small enough that many useful packages only have a small number of users.&lt;/p&gt;

&lt;p&gt;Packages sometimes publish data automatically. It&amp;rsquo;s rare that this is mentioned in the documentation. You need to check that any publications fit with your access control model and (again) do not publish any more than is necessary.&lt;/p&gt;

&lt;p&gt;Often, you&amp;rsquo;ll need to build an admin interface. Every user&amp;rsquo;s browser will receive a copy of its client-side code, and you need to think about whether that&amp;rsquo;s a risk. I usually leave admin interfaces in the same application. Occasionally it&amp;rsquo;s worth building admin code in a separate application pointing to the same MongoDB instance. This will create overheads for development. There are packages to automate this but I haven&amp;rsquo;t evaluated any yet.&lt;/p&gt;

&lt;p&gt;The official Meteor Security Guide is excellent and worth reading carefully.&lt;/p&gt;

&lt;h3 id=&#34;package-updates&#34;&gt;Package updates&lt;/h3&gt;

&lt;p&gt;Package updates and Meteor updates will cause breaking changes. You will need to retest everything.&lt;/p&gt;

&lt;p&gt;So far, there is no mechanism to tell you which updates are security-related and which are merely bug/feature fixes. I hope that this is remedied soon. Individual package authors pay practically no attention to security; you&amp;rsquo;re on your own there.&lt;/p&gt;

&lt;h3 id=&#34;schemas&#34;&gt;Schemas&lt;/h3&gt;

&lt;p&gt;I don&amp;rsquo;t bother. Most data models are simple enough that it&amp;rsquo;s not necessary.&lt;/p&gt;

&lt;h3 id=&#34;error-tracking&#34;&gt;Error tracking&lt;/h3&gt;

&lt;p&gt;I install Raven/Sentry on every app that I deploy. It&amp;rsquo;s almost no effort and it will show you amazing debug information if anything goes wrong at runtime.&lt;/p&gt;

&lt;h3 id=&#34;google-analytics&#34;&gt;Google Analytics&lt;/h3&gt;

&lt;p&gt;There&amp;rsquo;s probably something better. This is fine for now.&lt;/p&gt;

&lt;h3 id=&#34;standard-stuff&#34;&gt;Standard stuff&lt;/h3&gt;

&lt;h2 id=&#34;deployments&#34;&gt;Deployments&lt;/h2&gt;

&lt;!-- TODO: add links to your other blog posts --&gt;

&lt;p&gt;I deploy small apps to Docker instances with a shared Mongo instance on a cheap 1GB VPS (BinaryLane, because they provide great performance at a great price and are in Sydney). This fits about a dozen low-traffic apps. Anything larger gets migrated to its own instance &amp;ndash; usually to Amazon or DigitalOcean where it can programmatically scale.&lt;/p&gt;

&lt;h2 id=&#34;standard-packages&#34;&gt;Standard packages&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;raix:handlebar-helpers&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Iron Router&lt;/em&gt;. I haven&amp;rsquo;t taken the time to learn Flow Router or decide if it&amp;rsquo;s going to be an improvement.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Semantic-UI&lt;/em&gt;. I&amp;rsquo;m no designer, but with Semantic-UI, nobody can tell the difference. I used to use Bootstrap but find Semantic much easier and prettier.&lt;/p&gt;

&lt;p&gt;I use &lt;em&gt;Blaze&lt;/em&gt; simply because I haven’t learned React or Angular yet. I should probably learn React at some point.&lt;/p&gt;

&lt;h2 id=&#34;structure&#34;&gt;Structure&lt;/h2&gt;

&lt;p&gt;I put this last as it&amp;rsquo;s almost entirely personal preference and completely unimportant.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;app/
    client/
        subscriptions.js
        templates and JS go here
    lib/
        collections.js
        shared JS goes here
    server/
        publications.js
        methods.js
        other server-side JS goes here
    public/
deployment/
    deploy.yaml - Ansible playbook
    hosts - the name of my server
doc/
    documentation in Markdown format
README.md&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
    </item>
    
    <item>
      <title>Connecting Meteor to Sentry</title>
      <link>https://ianhowson.com/blog/connecting-meteor-to-sentry/</link>
      <pubDate>Fri, 11 Mar 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/blog/connecting-meteor-to-sentry/</guid>
      <description>

&lt;p&gt;Sentry answers a critical question: are my users experiencing errors while using my application? You can test all day, but your users will do different things using different software, and they’ll find bugs that you won’t.&lt;/p&gt;

&lt;p&gt;Sentry collects error logs from your application and aggregates them for later resolution. And if you’re proactive, you can directly contact users who have problems!&lt;/p&gt;

&lt;p&gt;I assume that you already have a Sentry instance set up, either paid through &lt;a href=&#34;http://getsentry.com&#34;&gt;getsentry.com&lt;/a&gt;, or self-hosted (which I use).&lt;/p&gt;

&lt;h2 id=&#34;set-up-a-new-project-in-sentry&#34;&gt;Set up a new project in Sentry&lt;/h2&gt;

&lt;p&gt;Both the client and the server side will log to the same project. You probably want one project for your production deployment and one for any staging or development deployments. There’s no reason not to use it in development (even on your local machine) &amp;ndash; just keep the clutter separate from your production logs!&lt;/p&gt;

&lt;p&gt;I use ‘Other’ for the Platform, as Meteor isn’t common enough yet to have its own integration helpers. You can also use Node.js; it makes little difference.&lt;/p&gt;

&lt;p&gt;You also need to whitelist the domain that your application is running on. This is controlled from the ‘Client Security’ section of the Settings tab. The easiest thing to do is to just allow errors to be submitted from anywhere by adding &amp;lsquo;*&amp;rsquo; to the whitelist:&lt;/p&gt;

&lt;p&gt;&lt;img src=&#39;https://ianhowson.com/images/meteor-sentry-client.png&#39; alt=&#39;Client configuration for Meteor connecting to Sentry&#39; width=&#39;600px&#39;&gt;&lt;/p&gt;

&lt;h2 id=&#34;configure-your-meteor-project&#34;&gt;Configure your Meteor project&lt;/h2&gt;

&lt;p&gt;First, add the logging plugin:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;meteor add deepwell:raven&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Sentry uses strings called &amp;ldquo;DSNs&amp;rdquo; to identify clients that are sending it events. You need to provide these to your Meteor project.&lt;/p&gt;

&lt;p&gt;To set up the server, create &lt;code&gt;server/raven.js&lt;/code&gt; with the following:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-js&#34; data-lang=&#34;js&#34;&gt;RavenLogger.initialize({
    server: &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39;&amp;lt;long DSN&amp;gt;&amp;#39;&lt;/span&gt;
});&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;where &amp;lt;long DSN&amp;gt; is the first (longer) DSN value.&lt;/p&gt;

&lt;p&gt;Similarly, to set up the client, create &lt;code&gt;client/raven.js&lt;/code&gt; with:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-js&#34; data-lang=&#34;js&#34;&gt;RavenLogger.initialize({
    client: &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39;&amp;lt;short DSN&amp;gt;&amp;#39;&lt;/span&gt;
});&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&amp;lt;short DSN&amp;gt; is the second (shorter) DSN value.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Why the difference between client and server DSN?&lt;/p&gt;

&lt;p&gt;The execution environment for the client is totally untrusted (it’s random people on the Internet) so there’s no point in authenticating them strongly. Random people can and might push garbage into your logs. You just need to be aware of that when you analyse them.&lt;/p&gt;

&lt;p&gt;Your server is (hopefully!) trustworthy, so you can trust it with a longer authentication string, which prevents random or malicious users from pushing useless log entries.&lt;/p&gt;

&lt;p&gt;The short DSN is just like a username. The long DSN is like a username/password pair. You don’t need to give the clients the password because you don’t trust them anyway.&lt;/p&gt;
&lt;/blockquote&gt;
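&lt;p&gt;The username/password analogy can be shown in a few lines. This sketch assumes the DSN layout &lt;code&gt;https://PUBLIC_KEY:SECRET_KEY@HOST/PROJECT_ID&lt;/code&gt;; the keys and host are made up, and &lt;code&gt;shortDsnFromLong&lt;/code&gt; is my own helper, not part of any Sentry or Raven API. In practice you would copy both values from the Sentry settings page.&lt;/p&gt;

```javascript
// Derive the short (client) DSN from the long (server) DSN by dropping the secret.
// Assumes the layout https://PUBLIC_KEY:SECRET_KEY@HOST/PROJECT_ID
function shortDsnFromLong(longDsn) {
  const url = new URL(longDsn);
  url.password = ""; // remove the secret; the public key stays as the username
  return url.href;
}

// Made-up example values, not real credentials.
const exampleLongDsn = "https://abc123:def456@sentry.example.com/2";
console.log(shortDsnFromLong(exampleLongDsn));
```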

&lt;h2 id=&#34;using-a-settings-file&#34;&gt;Using a settings file&lt;/h2&gt;

&lt;p&gt;I strongly recommend that you use a &lt;code&gt;settings.json&lt;/code&gt; file to store the DSN keys. This lets you easily switch between production and development configurations. This looks something like:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-js&#34; data-lang=&#34;js&#34;&gt;{
  &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#34;public&amp;#34;&lt;/span&gt; : {
    &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#34;ravenClientDSN&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#34;&amp;lt;shortDSN&amp;gt;&amp;#34;&lt;/span&gt;
  },
  &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#34;private&amp;#34;&lt;/span&gt; : {
    &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#34;ravenServerDSN&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#34;&amp;lt;longDSN&amp;gt;&amp;#34;&lt;/span&gt;
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then, your server init code looks like:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-js&#34; data-lang=&#34;js&#34;&gt;RavenLogger.initialize({
    server: Meteor.settings.&lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;private&lt;/span&gt;.ravenServerDSN
});&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;and the client:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-js&#34; data-lang=&#34;js&#34;&gt;RavenLogger.initialize({
    client: Meteor.settings.&lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;public&lt;/span&gt;.ravenClientDSN
});&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&#34;test&#34;&gt;Test&lt;/h2&gt;

&lt;p&gt;Start your app. On the client console, run:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-js&#34; data-lang=&#34;js&#34;&gt;RavenLogger.log(&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39;This is a test message&amp;#39;&lt;/span&gt;);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Sentry should show your message:&lt;/p&gt;

&lt;p&gt;&lt;img src=&#39;https://ianhowson.com/images/meteor-sentry-message.png&#39; alt=&#39;Sample client message from Meteor on Sentry&#39; width=&#39;400px&#39;&gt;&lt;/p&gt;

&lt;p&gt;Similarly, somewhere in server code (even in a new file), temporarily insert the line:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-js&#34; data-lang=&#34;js&#34;&gt;RavenLogger.log(&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#34;This is a message sent from the server&amp;#34;&lt;/span&gt;);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&#34;using-this-in-practice&#34;&gt;Using this in practice&lt;/h2&gt;

&lt;p&gt;Server-side exceptions should be caught and logged automatically. No extra work is required there.&lt;/p&gt;

&lt;p&gt;On the client, exceptions are &lt;em&gt;not&lt;/em&gt; automatically caught and logged. There’s probably an easy way to automatically wrap Meteor code, but I haven’t worked it out. Right now, you need to either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;wrap relevant chunks of code to catch and manually log exceptions&lt;/li&gt;
&lt;li&gt;place &lt;code&gt;RavenLogger.log()&lt;/code&gt; statements at relevant points (e.g. at assertion failures, places where you want telemetry)&lt;/li&gt;
&lt;/ul&gt;
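&lt;p&gt;The first option can be as small as a wrapper function. This is a sketch rather than anything the package provides: &lt;code&gt;withErrorLogging&lt;/code&gt; is my own name, and the reporter is passed in as a plain function so that in a real app it could forward the message to &lt;code&gt;RavenLogger.log()&lt;/code&gt;.&lt;/p&gt;

```javascript
// Wrap a function so that any exception it throws is reported, then re-thrown.
// `report` is a stand-in for your logger; it is injected so the wrapper
// can be exercised without Sentry.
function withErrorLogging(fn, report) {
  return function () {
    try {
      return fn.apply(this, arguments);
    } catch (err) {
      report(err.message);
      throw err; // keep the original failure visible to the caller
    }
  };
}

// Usage with a fake reporter that just collects messages.
const reported = [];
const risky = withErrorLogging(function (n) {
  if (n === 0) {
    throw new Error("n must not be zero");
  }
  return 10 / n;
}, function (message) {
  reported.push(message);
});

console.log(risky(5));
try {
  risky(0);
} catch (err) {
  // already reported by the wrapper
}
console.log(reported);
```

&lt;p&gt;Re-throwing matters: the wrapper should add logging, not hide the failure from the rest of your code.&lt;/p&gt;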

&lt;h2 id=&#34;further-reading&#34;&gt;Further reading&lt;/h2&gt;

&lt;p&gt;&lt;a href=&#34;https://atmospherejs.com/deepwell/raven&#34;&gt;https://atmospherejs.com/deepwell/raven&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>How to deploy a Meteor project on your VPS using Docker</title>
      <link>https://ianhowson.com/blog/deploy-meteor-on-vps-using-docker/</link>
      <pubDate>Tue, 19 Jan 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/blog/deploy-meteor-on-vps-using-docker/</guid>
      <description>

&lt;p&gt;I have a lot of little Meteor projects. Here are the exact steps that I take to deploy them on a VPS in a resource-efficient manner. A cheap VPS with 1GB of RAM costs about $10/month and can support dozens of small Meteor deployments on its own.&lt;/p&gt;

&lt;p&gt;I use a shared MongoDB instance and run the Meteor projects in Docker containers. nginx sits at the front end (port 80) and directs traffic to the appropriate Meteor project.&lt;/p&gt;

&lt;!-- insert diagram of how it&#39;s all configured - nginx at the frontend, docker containers in the middle, single mongo instance at the back --&gt;

&lt;h2 id=&#34;for-the-whole-server&#34;&gt;For the whole server&lt;/h2&gt;

&lt;p&gt;I&amp;rsquo;m starting with an Ubuntu 14.04 LTS 64-bit server running on DigitalOcean. The same should work for any recent Debian-like distribution and any VPS host (e.g. Amazon EC2). I assume that you are running as &lt;code&gt;root&lt;/code&gt;.&lt;/p&gt;

&lt;h3 id=&#34;install-packages&#34;&gt;Install packages&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;aptitude update
aptitude upgrade -y
aptitude install nginx mongodb &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href=&#34;https://docs.docker.com/engine/installation/ubuntulinux/&#34;&gt;Follow the instructions to install Docker&lt;/a&gt;. If you don&amp;rsquo;t want to read all of that, paste the following into your terminal:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
&lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;echo&lt;/span&gt; &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#34;deb https://apt.dockerproject.org/repo ubuntu-trusty main&amp;#34;&lt;/span&gt; | sudo tee /etc/apt/sources.list.d/docker.list
aptitude update
aptitude install docker-engine&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3 id=&#34;download-the-relevant-docker-images&#34;&gt;Download the relevant Docker images&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;docker pull meteorhacks/meteord:base&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3 id=&#34;expose-mongodb-to-docker-containers&#34;&gt;Expose MongoDB to Docker containers&lt;/h3&gt;

&lt;p&gt;In &lt;code&gt;/etc/mongodb.conf&lt;/code&gt;, change:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;bind_ip = &lt;span style=&#34;color:#ff0;font-weight:bold&#34;&gt;127&lt;/span&gt;.0.0.1&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;to&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;bind_ip = &lt;span style=&#34;color:#ff0;font-weight:bold&#34;&gt;127&lt;/span&gt;.0.0.1,10.0.3.1&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&#34;for-each-meteor-application-you-want-to-deploy&#34;&gt;For each Meteor application you want to deploy&lt;/h2&gt;

&lt;h3 id=&#34;add-a-database-and-user-to-mongo&#34;&gt;Add a database and user to Mongo&lt;/h3&gt;

&lt;p&gt;Run &lt;code&gt;mongo&lt;/code&gt; and enter the following.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-js&#34; data-lang=&#34;js&#34;&gt;use &amp;lt;databasename&amp;gt;
db.addUser( { user: &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#34;&amp;lt;username&amp;gt;&amp;#34;&lt;/span&gt;,
              pwd: &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#34;&amp;lt;password&amp;gt;&amp;#34;&lt;/span&gt;,
              roles: [ &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#34;readWrite&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#34;dbAdmin&amp;#34;&lt;/span&gt; ]
            } )&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Replace &amp;lt;databasename&amp;gt;, &amp;lt;username&amp;gt; and &amp;lt;password&amp;gt; as appropriate.&lt;/p&gt;

&lt;p&gt;Note that this uses Mongo polling, not the oplog.&lt;/p&gt;

&lt;h4 id=&#34;but-the-oplog-doesn-t-polling-suck&#34;&gt;But&amp;hellip; the oplog! Doesn&amp;rsquo;t polling suck?&lt;/h4&gt;

&lt;p&gt;Yes, polling sucks. Remember that this setup is meant for small deployments and &lt;em&gt;multiple&lt;/em&gt; deployments where you have many applications sharing the same MongoDB. I haven&amp;rsquo;t figured out (yet!) how to securely share the oplog between applications &amp;ndash; you don&amp;rsquo;t want one insecure application to compromise the others. If this installation started to grow, you might notice that your machine load was high (maybe due to polling, maybe something else) and you&amp;rsquo;d be best moving to a dedicated MongoDB server with oplog access. But you&amp;rsquo;re small for now, so don&amp;rsquo;t sweat it. Premature optimisation is the root of all evil, etc.&lt;/p&gt;

&lt;h3 id=&#34;set-up-your-nginx-frontend-proxy&#34;&gt;Set up your nginx frontend proxy&lt;/h3&gt;

&lt;p&gt;In &lt;code&gt;/etc/nginx/sites-enabled/&amp;lt;sitename&amp;gt;&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-nginx&#34; data-lang=&#34;nginx&#34;&gt;&lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;server&lt;/span&gt; {
        &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;server_name&lt;/span&gt; &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;lt;hostname&amp;gt;&lt;/span&gt;;

        &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;access_log&lt;/span&gt; /var/log/nginx/&amp;lt;sitename&amp;gt;.access.log;

        &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;location&lt;/span&gt; &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;/&lt;/span&gt; {
                &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;proxy_pass&lt;/span&gt;         &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;http://localhost:9001&lt;/span&gt;;
                &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;proxy_redirect&lt;/span&gt;     off;

                &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;proxy_set_header&lt;/span&gt;   &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;Host&lt;/span&gt;              $host;
                &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;proxy_set_header&lt;/span&gt;   &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;X-Real-IP&lt;/span&gt;         $remote_addr;
                &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;proxy_set_header&lt;/span&gt;   &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;X-Forwarded-For&lt;/span&gt;   $proxy_add_x_forwarded_for;
                &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;proxy_set_header&lt;/span&gt;   &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;X-Forwarded-Proto&lt;/span&gt; $scheme;
        }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Replace &amp;lt;sitename&amp;gt; with a shortname for your project (e.g. todolist) and &amp;lt;hostname&amp;gt; with the URL that you want to access your project at (e.g. todolist.example.com).&lt;/p&gt;

&lt;h2 id=&#34;every-time-you-update-a-new-version-of-the-application-and-the-first-time&#34;&gt;Every time you deploy a new version of the application (and the first time)&lt;/h2&gt;

&lt;h3 id=&#34;build-a-meteor-bundle&#34;&gt;Build a Meteor bundle&lt;/h3&gt;

&lt;p&gt;Within the Meteor project directory:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;meteor build --architecture=os.linux.x86_64 ./&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This will create a new .tar.gz file in your project directory.&lt;/p&gt;

&lt;p&gt;(The minification process is a little stricter than the standard &lt;code&gt;meteor run&lt;/code&gt;, so you might run into new syntax errors and the like.)&lt;/p&gt;

&lt;h3 id=&#34;copy-it-to-the-docker-host&#34;&gt;Copy it to the Docker host&lt;/h3&gt;

&lt;p&gt;You need a directory to store the bundles; all of the .tar.gz files in the directory will be decompressed by the Docker image.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;rsync --inplace -vP &amp;lt;bundlename&amp;gt;.tar.gz root@vpshost.example.com:/opt/whatever/whatever.tar.gz&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Each bundle either needs to go into a unique directory, or you need to clear out old ones when you upload new ones.&lt;/p&gt;

&lt;!-- TODO: think about having fallback if the new container won&#39;t start; want to be able to have multiple bundles available --&gt;

&lt;!-- TODO: automatically insert a timestamp using bash script, use rsync for the copy --&gt;

&lt;!--
### Update settings.json on the server

If you&#39;re using a settings file, you need to copy this separately to your bundle.

    scp settings.json root@vpshost.example.com:/opt/whatever/
    --&gt;

&lt;h3 id=&#34;shut-down-the-old-container-if-necessary&#34;&gt;Shut down the old container (if necessary)&lt;/h3&gt;

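&lt;p&gt;If a previous version is still running, find it with &lt;code&gt;docker ps&lt;/code&gt;, then stop and remove it with &lt;code&gt;docker stop &amp;lt;container-id&amp;gt;&lt;/code&gt; and &lt;code&gt;docker rm &amp;lt;container-id&amp;gt;&lt;/code&gt;. The old container has to release the host port before the new one can bind it.&lt;/p&gt;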

&lt;h3 id=&#34;run-the-new-container&#34;&gt;Run the new container&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;docker run -d &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;    -e ROOT_URL=http://&amp;lt;app-url&amp;gt; &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;    -e MONGO_URL=mongodb://&amp;lt;user&amp;gt;:&amp;lt;password&amp;gt;@10.0.3.1/&amp;lt;database-name&amp;gt; &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;    -v /&amp;lt;bundle-dir&amp;gt;:/bundle &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;    -p &lt;span style=&#34;color:#ff0;font-weight:bold&#34;&gt;9001&lt;/span&gt;:80 &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;    meteorhacks/meteord:base&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
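&lt;p&gt;Note that 10.0.3.1 is the IP address of the host, where the MongoDB server is running, as seen from inside the container. If the bridge interface on your host uses a different address, use that address here and in &lt;code&gt;bind_ip&lt;/code&gt;.&lt;/p&gt;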
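&lt;p&gt;One thing to watch when filling in &lt;code&gt;MONGO_URL&lt;/code&gt;: a password containing characters such as &lt;code&gt;@&lt;/code&gt;, &lt;code&gt;:&lt;/code&gt; or &lt;code&gt;/&lt;/code&gt; has to be percent-encoded, or the URL will not parse the way you expect. A minimal sketch, with a helper name and values of my own invention:&lt;/p&gt;

```javascript
// Build a MONGO_URL, percent-encoding the credentials so that characters
// such as "@", ":" or "/" cannot be mistaken for URL delimiters.
function mongoUrl(user, password, host, database) {
  return "mongodb://" + encodeURIComponent(user) + ":" +
    encodeURIComponent(password) + "@" + host + "/" + database;
}

// Made-up values for illustration only.
console.log(mongoUrl("todolist", "p@ss/word", "10.0.3.1", "todolist"));
```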

&lt;!-- * TODO cover how to include the settings file --&gt;

</description>
    </item>
    
    <item>
      <title>Why Apple would make a phone with no headphone jack</title>
      <link>https://ianhowson.com/blog/2016-01-18-why-apple-would-make-a-phone-with-no-headphone-jack/</link>
      <pubDate>Mon, 18 Jan 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/blog/2016-01-18-why-apple-would-make-a-phone-with-no-headphone-jack/</guid>
      <description>&lt;p&gt;TL;DR: Apple&amp;rsquo;s BLE audio streaming protocol cannot be easily copied and allows them to make a thinner phone than its competitors without fear of copycats.&lt;/p&gt;

&lt;p&gt;In order for Apple&amp;rsquo;s phones to remain competitive and continue demanding a premium price, they need to be differentiated against Android phones. One of Apple&amp;rsquo;s strategies for this is to integrate technologies which the Android manufacturers can&amp;rsquo;t. A prime example is their purchase of Authentec to produce Touch ID sensors - Authentec&amp;rsquo;s sensors are the best on the market, and now no other phone manufacturer can use them. They can&amp;rsquo;t redevelop the technology thanks to high technical difficulty and extensive patent protection.&lt;/p&gt;

&lt;p&gt;Apple has Bluetooth Low Energy audio streaming technology. BLE doesn&amp;rsquo;t normally support audio streaming; there&amp;rsquo;s no standard protocol, and conventional Bluetooth audio streaming has a lot of problems (high energy consumption, inconsistent codec support, poor audio quality, difficult pairing). So an Apple-controlled BLE audio streaming protocol is important: they can solve all of the above problems. Apple has enough market share - and owns Beats, a headphone manufacturer - that it can make such a protocol commonplace. Apple thus has a high quality product that other manufacturers cannot easily duplicate.&lt;/p&gt;

&lt;p&gt;So a headphone-jack-free phone is technically feasible. Without the headphone jack, Apple can make the phone even thinner than before. The diameter of the headphone jack dictates the thickness of the iPhone 6 and 6S. Because no other manufacturer has good audio streaming tech, they can&amp;rsquo;t remove the headphone jack from their phones, and the jack continues to dictate the thickness of their phones. iPhone 7 can be thinner than the competitors - and Apple establishes another difficult-to-duplicate point of differentiation.&lt;/p&gt;

&lt;p&gt;Predictions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Beats headphones will have a built-in receiver and will not look different at all&lt;/li&gt;
&lt;li&gt;Apple will sell chips to device manufacturers who want to make wireless headphones under MFi, like they do with Lightning ports. They will not open the protocol so as to lock out non-Apple devices.&lt;/li&gt;
&lt;li&gt;iPhone 7 will ship with an external headphone dongle that plugs into the phone&amp;rsquo;s Lightning port and provides a 3.5mm jack&lt;/li&gt;
&lt;li&gt;Apple (or an MFi licensee) will sell an external headphone adaptor that connects to the iPhone over BLE. It will have a small internal battery, achieve 12 hours of listening time, and charge using a built-in Lightning connector that plugs directly into the bottom of the iPhone (much like the Apple Pencil does with the iPad Pro). It will have a small lapel clip to attach to clothing.&lt;/li&gt;
&lt;li&gt;There will be an iPhone 7 with no headphone jack - consider it an iPhone Air. The iPhone 7 Plus &lt;em&gt;might&lt;/em&gt; have a headphone jack. The cheap iPhone (iPhone 5S, maybe iPhone 6) will still have a headphone jack.&lt;/li&gt;
&lt;li&gt;Android manufacturers will come up with a low-quality alternative protocol - but strict adherence to the BLE spec will restrict audio quality to very low levels. Latency will be poor.&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>How to fix your Skull Shaver Bald Eagle if it turns itself off</title>
      <link>https://ianhowson.com/blog/skull-shaver/</link>
      <pubDate>Mon, 18 Jan 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/blog/skull-shaver/</guid>
      <description>

&lt;p&gt;I used my Bald Eagle successfully three times. On the fourth, I was travelling. I got halfway through cutting my hair and it turned itself off. I pressed the button, it ran for a few seconds and turned itself off again. I charged it, I rinsed it, I soaked it, but it still would not stay on for more than a few seconds. I looked around for a hat so I could go and buy a disposable razor.&lt;/p&gt;

&lt;p&gt;Skull Shaver customer service didn&amp;rsquo;t respond to my email. I saw a lot of similar complaints on Amazon. I started to worry a bit &amp;ndash; being in Australia, I had little hope of getting a warranty replacement, and shipping was expensive and slow.&lt;/p&gt;

&lt;p&gt;The motor unit seemed to run alright &amp;ndash; was there something wrong with the heads? I pulled them apart, and surprise, surprise:&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;https://ianhowson.com/images/skullshaver-hair.jpeg&#34; alt=&#34;Head component of Bald Eagle full of hair&#34; /&gt; &lt;img src=&#34;https://ianhowson.com/images/skullshaver-hair-top.jpeg&#34; alt=&#34;Top of Bald Eagle heads full of hair&#34; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HAIR&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So much hair.&lt;/p&gt;

&lt;p&gt;I cleaned it out and haven&amp;rsquo;t had a problem since. Here&amp;rsquo;s how you can do the same.&lt;/p&gt;

&lt;h3 id=&#34;step-1-detach-the-heads&#34;&gt;Step 1: Detach the heads.&lt;/h3&gt;

&lt;p&gt;Grasp the heads and pull directly away from the battery.&lt;/p&gt;

&lt;h3 id=&#34;step-2-open-up-one-of-the-heads&#34;&gt;Step 2: Open up one of the heads&lt;/h3&gt;

&lt;p&gt;There&amp;rsquo;s a little notch between the silver and black part of the head. I stick a thumbnail in there and twist to separate the two halves.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;https://ianhowson.com/images/skullshaver-gap.jpeg&#34; alt=&#34;Gap between Bald Eagle head components&#34; /&gt; &lt;img src=&#34;https://ianhowson.com/images/skullshaver-gap-thumb.jpeg&#34; alt=&#34;Where to put your thumbnail to separate halves of Bald Eagle heads&#34; /&gt;&lt;/p&gt;

&lt;h3 id=&#34;step-3-clean-out-the-hair&#34;&gt;Step 3: Clean out the hair&lt;/h3&gt;

&lt;p&gt;I used the little nylon brush that came with the unit.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;https://ianhowson.com/images/skullshaver-clean-head.jpeg&#34; alt=&#34;Clean Bald Eagle head bottom&#34; /&gt; &lt;img src=&#34;https://ianhowson.com/images/skullshaver-clean-top.jpeg&#34; alt=&#34;Clean Bald Eagle head top&#34; /&gt;&lt;/p&gt;

&lt;h3 id=&#34;step-4-reassemble-the-head&#34;&gt;Step 4: Reassemble the head&lt;/h3&gt;

&lt;p&gt;The two halves just click together.&lt;/p&gt;

&lt;h3 id=&#34;step-5-clean-the-other-three-heads&#34;&gt;Step 5: Clean the other three heads&lt;/h3&gt;

&lt;h3 id=&#34;step-6-clean-the-centre-head&#34;&gt;Step 6: Clean the centre head&lt;/h3&gt;

&lt;p&gt;It just twists off. There are some little arrows that show the direction. Mine is usually pretty clean, though.&lt;/p&gt;

&lt;h3 id=&#34;step-7-wash-the-hair-down-the-sink-before-your-wife-sees-it&#34;&gt;Step 7: Wash the hair down the sink before your wife sees it&lt;/h3&gt;

&lt;p&gt;This is what was in my shaver heads. Yes, I was doing the recommended &amp;ldquo;immerse in water and turn it on&amp;rdquo; routine.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;https://ianhowson.com/images/skullshaver-sink-hair.jpeg&#34; alt=&#34;Sink full of hair from my &#39;clean&#39; Bald Eagle&#34; /&gt;&lt;/p&gt;

&lt;h2 id=&#34;so-why-does-this-happen-i-clean-mine-thoroughly-after-every-shave&#34;&gt;So why does this happen? I clean mine thoroughly after every shave!&lt;/h2&gt;

&lt;p&gt;In my case, I think there were two causes. One, I only shave my head once a week &amp;ndash; so the hair is longer than the recommended 0.4mm. The hair didn&amp;rsquo;t seem to escape the heads during the cleaning routine.&lt;/p&gt;

&lt;p&gt;Two, the hair that came out was clumped together and thick with grease. I use sunscreen a lot. It didn&amp;rsquo;t occur to me that that would get inside the shaver and gum it up.&lt;/p&gt;

&lt;p&gt;I thought initially that the grease was part of the head mechanism, intended to lubricate it, but given that it&amp;rsquo;s all stainless steel and nylon, it probably doesn&amp;rsquo;t need lubrication.&lt;/p&gt;

&lt;p&gt;So, the gears got gummed up. The load on the motor and the batteries increased. The lithium batteries probably have a polyswitch (resettable fuse) for safety &amp;ndash; you don&amp;rsquo;t want them catching fire if there&amp;rsquo;s a short circuit. The polyswitch tripped, the motor stopped. It cooled, reset, and I pressed the power button again. The polyswitch tripped again. Repeat.&lt;/p&gt;

&lt;h2 id=&#34;epilogue&#34;&gt;Epilogue&lt;/h2&gt;

&lt;p&gt;For me, regular cleaning helped, but did not solve the problem. Even with a completely clean head, I wasn&amp;rsquo;t able to finish a shave.&lt;/p&gt;

&lt;p&gt;I pulled mine apart and figured out how to add a mechanical switch that would bypass the electronics. I ran out of time and never got any love from customer service, so I wound up throwing it in the bin. Man, that was expensive.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m using Mach 3s now and am very happy.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Persistent computer hardware myths</title>
      <link>https://ianhowson.com/blog/persistent-computer-hardware-myths/</link>
      <pubDate>Sun, 17 Jan 2016 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/blog/persistent-computer-hardware-myths/</guid>
      <description>

&lt;h2 id=&#34;1-don-t-buy-ssds-because-they-ll-wear-out-if-you-write-too-much&#34;&gt;1. Don’t buy SSDs because they’ll wear out if you write too much&lt;/h2&gt;

&lt;p&gt;OK, yeah. They&amp;rsquo;ll wear out. But you&amp;rsquo;ll probably throw them out first.&lt;/p&gt;

&lt;p&gt;SSDs have a finite write lifespan, sure. But unless you&amp;rsquo;re running a database server with a heavy write workload, it&amp;rsquo;s not going to wear out for at least 10 years.&lt;/p&gt;

&lt;p&gt;All SSDs track their lifespan. They are perfectly capable of warning you when they&amp;rsquo;re wearing out through the SMART system. They won&amp;rsquo;t just forget everything; they can degrade gracefully.&lt;/p&gt;

&lt;p&gt;Sudden death is definitely a common way for SSDs to fail &amp;ndash; but this is true of spinning disks as well. The cause is not the flash media wearing out; it’s the usual problems with any electronic device failing suddenly. Spinning disks certainly fail suddenly and for a litany of mechanical reasons that SSDs aren’t vulnerable to.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;ve got 10-year-old SSDs (which have been running for all 10 years) which still show &amp;gt; 90% lifetime remaining.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s just not an issue.&lt;/p&gt;

&lt;p&gt;The worst instance of this that I&amp;rsquo;ve heard is &amp;ldquo;don&amp;rsquo;t buy an SSD because it will wear out. Buy a regular hard drive instead.&amp;rdquo;. Talk about throwing the baby out with the bathwater. You&amp;rsquo;re going to have a slow computer just so it won&amp;rsquo;t wear out in what, 10 years? Just buy another one!&lt;/p&gt;

&lt;p&gt;(A spinning disk won&amp;rsquo;t last that long, anyway.)&lt;/p&gt;

&lt;h2 id=&#34;2-disk-encryption-is-slow&#34;&gt;2. Disk encryption is slow&lt;/h2&gt;

&lt;p&gt;This wasn&amp;rsquo;t true even when encryption was done in software, and now that it&amp;rsquo;s 100% hardware accelerated (on both phones and regular computers), it&amp;rsquo;s a complete non-issue. Hard drives (even SSDs) are really slow compared with CPUs. The bottleneck is the drive, not the encryption software.&lt;/p&gt;

&lt;p&gt;Practically everything written to an iPhone is encrypted &amp;ndash; even the really old ones &amp;ndash; and you don&amp;rsquo;t see complaints about slow writes on &lt;em&gt;them&lt;/em&gt;.&lt;/p&gt;

&lt;h2 id=&#34;3-faster-cpus-are-worth-paying-for&#34;&gt;3. Faster CPUs are worth paying for&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;At best&lt;/em&gt;, you&amp;rsquo;ll get a 10% performance increase from a 10% clock speed bump. But computer performance is a &amp;lsquo;weakest link&amp;rsquo; type of affair; it&amp;rsquo;s no good having a 100GHz CPU if you&amp;rsquo;re bottlenecking on memory or I/O. This is why I&amp;rsquo;m so gung-ho about buying SSDs; a spinning hard drive is practically always the bottleneck these days.&lt;/p&gt;

&lt;p&gt;You&amp;rsquo;re paying, what, $300 for a 5% clock speed increase, which won&amp;rsquo;t amount to anything in reality? Buy an SSD first and more RAM second. Once you&amp;rsquo;ve got a Samsung 950 SSD, at least 16GB of RAM and a massive GPU, then &lt;em&gt;maybe&lt;/em&gt; consider a CPU bump. But probably just save your money for the next revision of hardware in a year.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Design</title>
      <link>https://ianhowson.com/bulkem/design/</link>
      <pubDate>Thu, 18 Jun 2015 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/bulkem/design/</guid>
      <description>

&lt;h2 id=&#34;the-em-algorithm-for-mixtures-of-inverse-gaussian-distributions&#34;&gt;The EM algorithm for mixtures of inverse Gaussian distributions&lt;/h2&gt;

&lt;p&gt;Literature review did not show any existing implementations of the EM algorithm for inverse Gaussian mixture models. Therefore, we must derive one from scratch.&lt;/p&gt;

&lt;p&gt;As a model, the method given in &lt;a href=&#34;#Bilmes1998&#34;&gt;Bilmes (1998, p. 3-7)&lt;/a&gt; is used. It shows the derivation of the EM equations for Normal mixture models.&lt;/p&gt;

&lt;p&gt;The following variables are used:&lt;/p&gt;

&lt;p&gt;\( \Theta \): the set of parameter estimates for the mixture model. \( \Theta^{g} \) refers to the &amp;lsquo;guessed&amp;rsquo; parameter set.&lt;/p&gt;

&lt;p&gt;\( \lambda_{\ell} \): the shape parameter for the \( \ell \)th mixture component&lt;/p&gt;

&lt;p&gt;\( \mu_{\ell} \): the mean parameter for the \( \ell \)th mixture component&lt;/p&gt;

&lt;p&gt;\( \alpha_{\ell} \): the mixture weight parameter for the \( \ell \)th mixture component&lt;/p&gt;

&lt;h3 id=&#34;e-step&#34;&gt;E-step&lt;/h3&gt;

&lt;p&gt;From &lt;a href=&#34;#Bilmes1998&#34;&gt;Bilmes (1998, p. 2)&lt;/a&gt;, we define:&lt;/p&gt;

&lt;p&gt;$$
Q(\Theta,\Theta^{(i-1)})=E\left[\log p(X,Y|\Theta)|X,\Theta^{(i-1)}\right]
$$&lt;/p&gt;

&lt;p&gt;For the inverse Gaussian distribution, we have the following expression for the density of an observation \( x \) under mixture component \( \ell \), given parameter guesses \( \lambda_{\ell} \) and \( \mu_{\ell} \) &lt;a href=&#34;#Wikipedia2015a&#34;&gt;(Wikipedia, 2015a)&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;$$
p_{\ell}(x|\lambda_{\ell},\mu_{\ell})=\left[\frac{\lambda_{\ell}}{2\pi x^{3}}\right]^{1/2}\exp\frac{-\lambda_{\ell}(x-\mu_{\ell})^{2}}{2\mu_{\ell}^{2}x}\label{eq:probfunc}\tag{1}
$$&lt;/p&gt;
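
&lt;p&gt;As a quick sanity check, equation (1) can be evaluated directly. The sketch below is illustrative only; the function name &lt;code&gt;inverse_gaussian_pdf&lt;/code&gt; is our own and is not taken from the &lt;code&gt;bulkem&lt;/code&gt; source.&lt;/p&gt;

```python
import math


def inverse_gaussian_pdf(x, mu, lam):
    """Density of the inverse Gaussian distribution at x (x, mu and lam all positive)."""
    coefficient = math.sqrt(lam / (2.0 * math.pi * x ** 3))
    exponent = -lam * (x - mu) ** 2 / (2.0 * mu ** 2 * x)
    return coefficient * math.exp(exponent)


# At x == mu the exponent vanishes, leaving only the normalising term.
print(inverse_gaussian_pdf(1.0, 1.0, 1.0))
```

&lt;p&gt;This form underflows to zero for observations far from \( \mu_{\ell} \) when \( \lambda_{\ell} \) is large, so a log-density is usually the safer thing to compute in practice.&lt;/p&gt;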

&lt;p&gt;We denote proportion of mixing components by \( \alpha_{\ell} \) in order to reduce confusion with the constant \( \pi \). We have the constraint that&lt;/p&gt;

&lt;p&gt;$$
\sum_{\ell=1}^{M}\alpha_{\ell}=1
$$&lt;/p&gt;

&lt;p&gt;The \( Q \) function is given by &lt;a href=&#34;#Bilmes1998&#34;&gt;Bilmes (1998, p. 4)&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;$$
Q(\Theta,\Theta^{g})=\sum_{\ell=1}^{M}\sum_{i=1}^{N}\log(\alpha_{\ell})p(\ell|x_{i},\Theta^{g})+\sum_{\ell=1}^{M}\sum_{i=1}^{N}\log(p_{\ell}(x_{i}|\theta_{\ell}))p(\ell|x_{i},\Theta^{g})\label{eq:bilmes-eqn-5}\tag{2}
$$&lt;/p&gt;
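
&lt;p&gt;Every term in \( Q \) is weighted by the membership probability \( p(\ell|x_{i},\Theta^{g}) \). The text does not spell it out; the sketch below assumes the usual Bayes&amp;rsquo; rule form for mixture models, \( p(\ell|x_{i},\Theta^{g}) = \alpha_{\ell}p_{\ell}(x_{i}|\theta_{\ell}) / \sum_{k}\alpha_{k}p_{k}(x_{i}|\theta_{k}) \), with all parameters taken from the current guess. Function names are hypothetical.&lt;/p&gt;

```python
import math


def inverse_gaussian_pdf(x, mu, lam):
    """Equation (1): inverse Gaussian density."""
    return math.sqrt(lam / (2.0 * math.pi * x ** 3)) * math.exp(
        -lam * (x - mu) ** 2 / (2.0 * mu ** 2 * x))


def responsibilities(xs, alphas, mus, lams):
    """Return resp[l][i] = p(l | x_i, Theta^g) using Bayes' rule (assumed form)."""
    resp = [[0.0] * len(xs) for _ in alphas]
    for i, x in enumerate(xs):
        # Weighted density of x under each component.
        weighted = [a * inverse_gaussian_pdf(x, m, lam)
                    for a, m, lam in zip(alphas, mus, lams)]
        total = sum(weighted)
        for component, w in enumerate(weighted):
            resp[component][i] = w / total
    return resp


resp = responsibilities([0.5, 1.0, 3.0], [0.3, 0.7], [1.0, 3.0], [2.0, 5.0])
print(resp)
```

&lt;p&gt;For each observation the values sum to one across components, which is what lets the \( \alpha \) update later reduce to a simple average.&lt;/p&gt;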

&lt;h3 id=&#34;m-step&#34;&gt;M-step&lt;/h3&gt;

&lt;p&gt;On each iteration, we need to improve the parameter estimates based on the previous estimates &lt;a href=&#34;#Bilmes1998&#34;&gt;(Bilmes, 1998, p. 2)&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;$$
\Theta^{(i)}=argmax_{\Theta}Q(\Theta,\Theta^{(i-1)})
$$&lt;/p&gt;

&lt;p&gt;The left-hand term of \( \eqref{eq:bilmes-eqn-5} \) is independent of the inverse Gaussian parameters and the right-hand term is independent of \( \alpha \). Therefore, we can reuse the result from &lt;a href=&#34;#Bilmes1998&#34;&gt;Bilmes (1998, p. 5)&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;$$
\alpha_{\ell}^{new}=\frac{1}{N}\sum_{i=1}^{N}p(\ell|x_{i},\Theta^{g})
$$&lt;/p&gt;

&lt;p&gt;We now need to maximise the following expression with respect to each of \( \lambda_{\ell} \) and \( \mu_{\ell} \)&lt;/p&gt;

&lt;p&gt;$$
\sum_{\ell=1}^{M}\sum_{i=1}^{N}\log(p_{\ell}(x_{i}|\theta_{\ell}))p(\ell|x_{i},\Theta^{g})\label{eq:sumsum_logp}\tag{3}
$$&lt;/p&gt;

&lt;p&gt;Taking the log of \( \eqref{eq:probfunc} \), we get:&lt;/p&gt;

&lt;p&gt;$$
\log p_{\ell}(x|\lambda_{\ell},\mu_{\ell})  =   \log\left(\left[\frac{\lambda_{\ell}}{2\pi x^{3}}\right]^{1/2}\exp\frac{-\lambda_{\ell}(x-\mu_{\ell})^{2}}{2\mu_{\ell}^{2}x}\right)
$$&lt;/p&gt;

&lt;p&gt;$$
    =   \frac{1}{2}\log\left(\frac{\lambda_{\ell}}{2\pi x^{3}}\right)-\frac{\lambda_{\ell}(x-\mu_{\ell})^{2}}{2\mu_{\ell}^{2}x}
$$&lt;/p&gt;

&lt;p&gt;$$
    =   \frac{1}{2}\log(\lambda_{\ell})-\frac{1}{2}\log(2\pi x^{3})-\frac{\lambda_{\ell}(x-\mu_{\ell})^{2}}{2\mu_{\ell}^{2}x}
$$&lt;/p&gt;

&lt;p&gt;Substituting this into \( \eqref{eq:sumsum_logp} \), we get:&lt;/p&gt;

&lt;p&gt;$$
\sum_{\ell=1}^{M}\sum_{i=1}^{N}(\frac{1}{2}\log(\lambda_{\ell})-\frac{1}{2}\log(2\pi x_{i}^{3})-\frac{\lambda_{\ell}(x_{i}-\mu_{\ell})^{2}}{2\mu_{\ell}^{2}x_{i}})p(\ell|x_{i},\Theta^{g})\label{eq:func-to-diff}\tag{4}
$$&lt;/p&gt;

&lt;p&gt;We wish to maximise \( \eqref{eq:func-to-diff} \) for \( \mu_{\ell} \), so we take its partial derivative with respect to \( \mu_{\ell} \) and set it equal to 0. Note that the first two terms inside the summations are independent of \( \mu_{\ell} \), so we can ignore them for the purpose of maximisation; we seek&lt;/p&gt;

&lt;p&gt;$$
        \frac{\partial}{\partial\mu_{\ell}}\sum_{i=1}^{N}\left[-\frac{\lambda_{\ell}(x_{i}-\mu_{\ell})^{2}}{2\mu_{\ell}^{2}x_{i}}\right]p(\ell|x_{i},\Theta^{g})
$$&lt;/p&gt;

&lt;p&gt;$$
    =   -\lambda_{\ell}\frac{\partial}{\partial\mu_{\ell}}\sum_{i=1}^{N}\left[\frac{(x_{i}-\mu_{\ell})^{2}p(\ell|x_{i},\Theta^{g})}{2\mu_{\ell}^{2}x_{i}}\right]\label{eq:partial}\tag{5}
$$&lt;/p&gt;

&lt;p&gt;Concentrating just on the inner term of the summation, we need&lt;/p&gt;

&lt;p&gt;$$
\frac{\partial}{\partial\mu_{\ell}}\frac{(x_{i}-\mu_{\ell})^{2}p(\ell|x_{i},\Theta^{g})}{2\mu_{\ell}^{2}x_{i}}
$$&lt;/p&gt;

&lt;p&gt;Apply the quotient rule:&lt;/p&gt;

&lt;p&gt;$$
    =   \frac{2\mu_{\ell}^{2}x_{i}\left(-2x_{i}p(\ell|x_{i},\Theta^{g})+2\mu_{\ell}p(\ell|x_{i},\Theta^{g})\right)-4(x_{i}-\mu_{\ell})^{2}p(\ell|x_{i},\Theta^{g})\mu_{\ell}x_{i}}{4\mu_{\ell}^{4}x_{i}^{2}}
$$&lt;/p&gt;

&lt;p&gt;$$
    =   \frac{p(\ell|x_{i},\Theta^{g})(\mu_{\ell}-x_{i})}{\mu_{\ell}^{3}}
$$&lt;/p&gt;
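
&lt;p&gt;The simplification from the quotient rule is easy to get wrong by a sign, so it is worth a numerical spot check. The sketch below compares the closed form \( p(\mu_{\ell}-x_{i})/\mu_{\ell}^{3} \) against a central finite difference of the inner term. The test values and function names are arbitrary choices of ours.&lt;/p&gt;

```python
def inner_term(mu, x, p):
    """The summand being differentiated: (x - mu)^2 p / (2 mu^2 x)."""
    return (x - mu) ** 2 * p / (2.0 * mu ** 2 * x)


def closed_form_derivative(mu, x, p):
    """Simplified result of the quotient rule: p (mu - x) / mu^3."""
    return p * (mu - x) / mu ** 3


def central_difference(f, mu, h=1e-6):
    """Numerical derivative of f at mu."""
    return (f(mu + h) - f(mu - h)) / (2.0 * h)


x, p, mu = 2.5, 0.3, 1.7  # arbitrary positive test values
numeric = central_difference(lambda m: inner_term(m, x, p), mu)
print(numeric, closed_form_derivative(mu, x, p))
```

&lt;p&gt;The two printed values should agree to many decimal places.&lt;/p&gt;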

&lt;p&gt;Substituting this back into \( \eqref{eq:partial} \) and setting it to zero, we get:&lt;/p&gt;

&lt;p&gt;$$
-\lambda_{\ell}\sum_{i=1}^{N}\frac{p(\ell|x_{i},\Theta^{g})(\mu_{\ell}-x_{i})}{\mu_{\ell}^{3}} =   0
$$&lt;/p&gt;

&lt;p&gt;$$
\frac{1}{\mu_{\ell}^{3}}\sum_{i=1}^{N}p(\ell|x_{i},\Theta^{g})(\mu_{\ell}-x_{i})    =   0
$$&lt;/p&gt;

&lt;p&gt;Since \( \mu_{\ell}&amp;gt;0 \) for the inverse Gaussian distribution, the factor \( 1/\mu_{\ell}^{3} \) is always defined and non-zero, so we can multiply it out:&lt;/p&gt;

&lt;p&gt;$$
\sum_{i=1}^{N}p(\ell|x_{i},\Theta^{g})(\mu_{\ell}-x_{i})    =   0
$$&lt;/p&gt;

&lt;p&gt;$$
\mu_{\ell}\sum_{i=1}^{N}p(\ell|x_{i},\Theta^{g})-\sum_{i=1}^{N}p(\ell|x_{i},\Theta^{g})x_{i}    =   0
$$&lt;/p&gt;

&lt;p&gt;$$
\mu_{\ell}  =   \frac{\sum_{i=1}^{N}p(\ell|x_{i},\Theta^{g})x_{i}}{\sum_{i=1}^{N}p(\ell|x_{i},\Theta^{g})}
$$&lt;/p&gt;

&lt;p&gt;This is identical to the \( \mu_{\ell} \) maximiser for the Normal distribution &lt;a href=&#34;#Bilmes1998&#34;&gt;(Bilmes, 1998, p. 7)&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In the same way, we maximise \( \eqref{eq:func-to-diff} \) for \( \lambda_{\ell} \):&lt;/p&gt;

&lt;p&gt;$$
\frac{\partial}{\partial\lambda_{\ell}}\sum_{i=1}^{N}(\frac{1}{2}\log(\lambda_{\ell})-\frac{\lambda_{\ell}(x_{i}-\mu_{\ell})^{2}}{2\mu_{\ell}^{2}x_{i}})p(\ell|x_{i},\Theta^{g})    =   0
$$&lt;/p&gt;

&lt;p&gt;$$
\frac{\partial}{\partial\lambda_{\ell}}\sum_{i=1}^{N}\frac{1}{2}\log(\lambda_{\ell})p(\ell|x_{i},\Theta^{g})-\frac{\partial}{\partial\lambda_{\ell}}\sum_{i=1}^{N}\frac{\lambda_{\ell}(x_{i}-\mu_{\ell})^{2}p(\ell|x_{i},\Theta^{g})}{2\mu_{\ell}^{2}x_{i}}    =   0
$$&lt;/p&gt;

&lt;p&gt;$$
\sum_{i=1}^{N}\frac{p(\ell|x_{i},\Theta^{g})}{2\lambda_{\ell}}-\sum_{i=1}^{N}\frac{(x_{i}-\mu_{\ell})^{2}p(\ell|x_{i},\Theta^{g})}{2\mu_{\ell}^{2}x_{i}}    =   0
$$&lt;/p&gt;

&lt;p&gt;$$
\sum_{i=1}^{N}\frac{p(\ell|x_{i},\Theta^{g})}{\lambda_{\ell}}   =   \sum_{i=1}^{N}\frac{(x_{i}-\mu_{\ell})^{2}p(\ell|x_{i},\Theta^{g})}{\mu_{\ell}^{2}x_{i}}
$$&lt;/p&gt;

&lt;p&gt;$$
\lambda_{\ell}  =   \frac{\sum_{i=1}^{N}p(\ell|x_{i},\Theta^{g})}{\sum_{i=1}^{N}\frac{(x_{i}-\mu_{\ell})^{2}p(\ell|x_{i},\Theta^{g})}{\mu_{\ell}^{2}x_{i}}}
$$&lt;/p&gt;

&lt;p&gt;In summary, the update equations are:&lt;/p&gt;

&lt;p&gt;$$
\alpha_{\ell}^{new} =   \frac{1}{N}\sum_{i=1}^{N}p(\ell|x_{i},\Theta^{g})
$$&lt;/p&gt;

&lt;p&gt;$$
\mu_{\ell}^{new}    =   \frac{\sum_{i=1}^{N}x_{i}p(\ell|x_{i},\Theta^{g})}{\sum_{i=1}^{N}p(\ell|x_{i},\Theta^{g})}
$$&lt;/p&gt;

&lt;p&gt;$$
\lambda_{\ell}^{new}    =   \frac{\sum_{i=1}^{N}p(\ell|x_{i},\Theta^{g})}{\sum_{i=1}^{N}\frac{(x_{i}-\mu_{\ell}^{old})^{2}p(\ell|x_{i},\Theta^{g})}{(\mu_{\ell}^{old})^{2}x_{i}}}
$$&lt;/p&gt;
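
&lt;p&gt;The three update equations translate almost line for line into code. The following is a minimal sketch, assuming the membership probabilities have already been computed and are stored as one list per component; &lt;code&gt;m_step&lt;/code&gt; and its argument names are hypothetical. As in the summary above, the \( \lambda_{\ell} \) update uses the previous value of \( \mu_{\ell} \).&lt;/p&gt;

```python
def m_step(xs, resp, mus_old):
    """Apply the update equations.

    xs       observations x_1..x_N
    resp     resp[l][i] = p(l | x_i, Theta^g)
    mus_old  mean parameters from the previous iteration
    """
    n = len(xs)
    alphas, mus, lams = [], [], []
    for column, mu_old in zip(resp, mus_old):
        weight = sum(column)                       # sum_i p(l | x_i)
        alphas.append(weight / n)
        mus.append(sum(x * p for x, p in zip(xs, column)) / weight)
        deviation = sum((x - mu_old) ** 2 * p / (mu_old ** 2 * x)
                        for x, p in zip(xs, column))
        lams.append(weight / deviation)            # fails if deviation is zero
    return alphas, mus, lams


# One component that owns every observation.
print(m_step([1.0, 2.0, 4.0], [[1.0, 1.0, 1.0]], [2.0]))
```

&lt;p&gt;Note that all three updates share the sum \( \sum_{i}p(\ell|x_{i},\Theta^{g}) \), so it only needs to be computed once per component.&lt;/p&gt;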

&lt;p&gt;We have converged when the log-likelihood changes by less than \( \epsilon \) between iterations, where the log-likelihood is given by:&lt;/p&gt;

&lt;p&gt;$$
L(\theta;\boldsymbol{x},\boldsymbol{z})=\sum_{n=1}^{N}\log\left(\sum_{m=1}^{M}\alpha_{m}p_{m}(x_{n})\right)
$$&lt;/p&gt;
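
&lt;p&gt;A minimal sketch of the convergence test follows. It assumes the criterion is an absolute tolerance on the log-likelihood between successive iterations; the function names and the default value of &lt;code&gt;eps&lt;/code&gt; are our own choices.&lt;/p&gt;

```python
import math


def inverse_gaussian_pdf(x, mu, lam):
    """Equation (1): inverse Gaussian density."""
    return math.sqrt(lam / (2.0 * math.pi * x ** 3)) * math.exp(
        -lam * (x - mu) ** 2 / (2.0 * mu ** 2 * x))


def log_likelihood(xs, alphas, mus, lams):
    """Sum over observations of the log of the mixture density."""
    total = 0.0
    for x in xs:
        mixture = sum(a * inverse_gaussian_pdf(x, m, lam)
                      for a, m, lam in zip(alphas, mus, lams))
        total += math.log(mixture)
    return total


def has_converged(new_ll, old_ll, eps=1e-6):
    """True when the log-likelihood moved by no more than eps (assumed criterion)."""
    return math.isclose(new_ll, old_ll, rel_tol=0.0, abs_tol=eps)


print(log_likelihood([0.8, 1.2, 3.0], [0.5, 0.5], [1.0, 3.0], [2.0, 2.0]))
```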

&lt;h2 id=&#34;initialisation-of-the-em-algorithm&#34;&gt;Initialisation of the EM algorithm&lt;/h2&gt;

&lt;p&gt;Originally, we attempted to set initialisation parameters the same on every run (i.e. \( \mu_{1}=0.99 \), \( \mu_{2}=1.01 \), \( \lambda_{1}=1 \), \( \lambda_{2}=1 \), \( \alpha_{1}=0.5 \), \( \alpha_{2}=0.5 \)). This yielded poor quality fits on many datasets. For example, on the BMI dataset included with the &lt;code&gt;mixsmsn&lt;/code&gt; package &lt;a href=&#34;#Prates2013&#34;&gt;(Prates et al., 2013)&lt;/a&gt;, the following model was generated:&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;https://ianhowson.com/bulkem/bmi-bad.png&#34; width=&#34;359&#34;&gt;&lt;/p&gt;

&lt;p&gt;This is clearly unsatisfactory as it does not adequately model the bimodal nature of the data.&lt;/p&gt;

&lt;p&gt;Additional research was conducted to find effective ways to initialise the EM algorithm. Many papers suggested clustering approaches such as k-means, but they require well-separated data.&lt;/p&gt;

&lt;p&gt;We elected to use a random initialisation strategy to improve the fit, roughly following the &amp;lsquo;alternative method&amp;rsquo; from &lt;a href=&#34;#McLachlan2000&#34;&gt;McLachlan et al. (2000, p. 55)&lt;/a&gt; or the &amp;lsquo;subset approach&amp;rsquo; algorithm described in &lt;a href=&#34;#Schepers2015&#34;&gt;Schepers (2015)&lt;/a&gt;. We chose this method as it is easy to implement and produces higher-quality fits than other methods at the expense of computation time &lt;a href=&#34;#Schepers2015&#34;&gt;(Schepers, 2015, p. 142)&lt;/a&gt;. The algorithm proceeds as follows:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;for each random start (e.g. 100 times):
    for each mixture component (e.g. two):
        sample p+1 observations from the dataset
        use the ML equations to estimate distribution parameters for those observations
        use those parameters as initial values for the component
    EM fit using those initial parameters and record the log-likelihood of the solution
choose the fit that achieves the highest log-likelihood&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;\( p \) is the number of parameters that characterise the distribution (2 for the inverse Gaussian distribution). We always set the mixing weight (\( \alpha \)) components to have equal weighting (i.e. 0.5 for a two-component mixture) as we have no information on the true weighting of the components.&lt;/p&gt;
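
&lt;p&gt;The procedure above can be sketched in plain Python. This is an illustration under several assumptions, not the &lt;code&gt;bulkem&lt;/code&gt; implementation: all function names are hypothetical, the &amp;lsquo;ML equations&amp;rsquo; are taken to be the standard closed-form inverse Gaussian estimates (sample mean for \( \mu \), and \( 1/\lambda \) equal to the mean of \( 1/x_{i}-1/\mu \)), the E-step works in log space to avoid underflow, and the small &lt;code&gt;TINY&lt;/code&gt; guard is our own addition.&lt;/p&gt;

```python
import math
import random

TINY = 1e-12  # our own guard against division by zero and log(0)


def log_ig_pdf(x, mu, lam):
    """Log of the inverse Gaussian density."""
    return (0.5 * math.log(lam / (2.0 * math.pi * x ** 3))
            - lam * (x - mu) ** 2 / (2.0 * mu ** 2 * x))


def e_step(xs, alphas, mus, lams):
    """Return membership probabilities resp[l][i] and the log-likelihood."""
    resp = [[] for _ in alphas]
    ll = 0.0
    for x in xs:
        logs = [math.log(max(a, TINY)) + log_ig_pdf(x, m, lam)
                for a, m, lam in zip(alphas, mus, lams)]
        top = max(logs)  # log-sum-exp trick
        log_total = top + math.log(sum(math.exp(v - top) for v in logs))
        ll += log_total
        for column, v in zip(resp, logs):
            column.append(math.exp(v - log_total))
    return resp, ll


def m_step(xs, resp, mus_old):
    """Apply the alpha, mu and lambda update equations."""
    alphas, mus, lams = [], [], []
    for column, mu_old in zip(resp, mus_old):
        weight = max(sum(column), TINY)
        alphas.append(weight / len(xs))
        mus.append(max(sum(x * p for x, p in zip(xs, column)) / weight, TINY))
        dev = sum((x - mu_old) ** 2 * p / (mu_old ** 2 * x)
                  for x, p in zip(xs, column))
        lams.append(weight / max(dev, TINY))
    return alphas, mus, lams


def em_fit(xs, alphas, mus, lams, eps=1e-8, max_iter=200):
    """Iterate until the log-likelihood stops moving; ll is from the last E-step."""
    old_ll = -math.inf
    for _ in range(max_iter):
        resp, ll = e_step(xs, alphas, mus, lams)
        if math.isclose(ll, old_ll, rel_tol=0.0, abs_tol=eps):
            break
        alphas, mus, lams = m_step(xs, resp, mus)
        old_ll = ll
    return ll, alphas, mus, lams


def ml_estimate(sample):
    """Closed-form ML estimates (mu, lambda) for a single inverse Gaussian."""
    mu = sum(sample) / len(sample)
    dev = sum(1.0 / x - 1.0 / mu for x in sample)
    return mu, len(sample) / max(dev, TINY)


def fit_with_random_starts(xs, n_components=2, n_starts=100, seed=0):
    rng = random.Random(seed)
    p = 2  # parameters per inverse Gaussian component
    fits = []
    for _ in range(n_starts):
        starts = [ml_estimate(rng.sample(xs, p + 1)) for _ in range(n_components)]
        mus = [mu for mu, _ in starts]
        lams = [lam for _, lam in starts]
        alphas = [1.0 / n_components] * n_components  # equal initial weights
        fits.append(em_fit(xs, alphas, mus, lams))
    return max(fits, key=lambda fit: fit[0])  # highest log-likelihood wins
```

&lt;p&gt;Calling &lt;code&gt;fit_with_random_starts(data)&lt;/code&gt; on a list of positive observations returns the log-likelihood and parameters of the best start. Fixing &lt;code&gt;seed&lt;/code&gt; makes the runs reproducible, which helps when comparing against another implementation.&lt;/p&gt;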

&lt;p&gt;Using this algorithm on the BMI dataset, we achieve the following fit:&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;https://ianhowson.com/bulkem/bmi-good.png&#34; width=&#34;368&#34;&gt;&lt;/p&gt;

&lt;p&gt;The new model better reflects the bimodal structure of the data.&lt;/p&gt;

&lt;h2 id=&#34;system-design&#34;&gt;System design&lt;/h2&gt;

&lt;p&gt;Traditionally, CUDA is used to make a single large task run very quickly. Each observation in the dataset is assigned to a separate CUDA thread; this is termed data parallelism &lt;a href=&#34;#Wikipedia2014&#34;&gt;(Wikipedia, 2014)&lt;/a&gt;. Larger CUDA hardware can process more observations simultaneously. &lt;a href=&#34;#Woolley2013&#34;&gt;Woolley (2013)&lt;/a&gt; states that “To get good performance &amp;hellip; You want to have 14K or more threads running concurrently.” To maximise performance, we must make each step run as quickly as possible, even if it is wasteful of machine resources. Most tasks proceed as follows:&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;https://ianhowson.com/bulkem/data-parallel.png&#34; width=&#34;250&#34;&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;bulkem&lt;/code&gt; is intended to work efficiently on relatively small datasets with fewer than 14,000 observations. It expects to see a large number of datasets (thousands) and/or random starts. This suggests that we will need to have multiple tasks running simultaneously on the GPU in order to achieve good performance; task parallelism &lt;a href=&#34;#Wikipedia2015c&#34;&gt;(Wikipedia, 2015c)&lt;/a&gt; is more appropriate. This is a relatively uncommon usage of CUDA hardware; &lt;a href=&#34;#Tzeng2012&#34;&gt;Tzeng et al. (2012)&lt;/a&gt; give a brief overview.&lt;/p&gt;

&lt;p&gt;Recent CUDA hardware supports a feature called “streams” &lt;a href=&#34;#Rennich2011&#34;&gt;(Rennich, 2011)&lt;/a&gt; which allows the GPU to perform a number of tasks simultaneously. Such hardware can execute up to 16 CUDA kernels concurrently while copying data back and forth between host RAM and GPU memory. We can use streams to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Overlap memory copies to and from the GPU&lt;/li&gt;
&lt;li&gt;Execute multiple kernels simultaneously. As our datasets are relatively small, this ensures that the GPU is not sitting idle.&lt;/li&gt;
&lt;li&gt;Use extra CPUs to queue more work for the GPU to perform, again ensuring that the GPU is kept busy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Assuming sufficient GPU resources are available, the execution flow might look like:&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;https://ianhowson.com/bulkem/task-parallel.png&#34; width=&#34;500&#34;&gt;&lt;/p&gt;

&lt;p&gt;The high-level strategy for bulkem&amp;rsquo;s CUDA path is therefore:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieve the list of datasets from R&lt;/li&gt;
&lt;li&gt;Assign each dataset to a different CPU thread. Each CPU thread has an associated CUDA stream.&lt;/li&gt;
&lt;li&gt;Each thread generates a number of initial parameters for the dataset. It uses the GPU to execute the EM algorithm for each set of initial parameters.&lt;/li&gt;
&lt;li&gt;The best fit is stored in a list&lt;/li&gt;
&lt;li&gt;When all threads have finished (i.e. all datasets have been fit) the list is transferred back to R&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&#34;https://ianhowson.com/bulkem/system-design.png&#34; width=&#34;400&#34;&gt;&lt;/p&gt;

&lt;h2 id=&#34;cpu-thread-design&#34;&gt;CPU thread design&lt;/h2&gt;

&lt;p&gt;Each CPU thread controls a single CUDA stream. It chooses a dataset to fit, generates many sets of initial parameters and fits each using EM. The bulk of the work of EM fitting is performed on the GPU. After fitting, the best fit is selected and stored.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;https://ianhowson.com/bulkem/thread-design.png&#34; width=&#34;335&#34;&gt;&lt;/p&gt;
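
&lt;p&gt;The control flow of this design can be mimicked on the CPU alone. The sketch below is only an analogy: it uses a Python thread pool in place of native CPU threads, and a cheap single-component log-likelihood in place of the GPU EM fit that each thread would issue on its CUDA stream. All names, the number of starts and the way starts are drawn are hypothetical.&lt;/p&gt;

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def log_likelihood(xs, mu, lam):
    """Single-component inverse Gaussian log-likelihood (stand-in for an EM fit)."""
    return sum(0.5 * math.log(lam / (2.0 * math.pi * x ** 3))
               - lam * (x - mu) ** 2 / (2.0 * mu ** 2 * x) for x in xs)


def fit_dataset(job):
    """Work done by one CPU thread: try several starts and keep the best fit."""
    index, xs = job
    rng = random.Random(index)  # one reproducible generator per dataset
    candidates = []
    for _ in range(10):  # number of random starts (arbitrary here)
        mu, lam = rng.choice(xs), rng.uniform(0.5, 5.0)
        # In the real design, this is where the GPU would run EM on the thread's stream.
        candidates.append((log_likelihood(xs, mu, lam), mu, lam))
    return max(candidates)  # tuple comparison: highest log-likelihood first


def fit_all(datasets, n_threads=4):
    """Assign datasets to worker threads and collect one best fit per dataset."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(fit_dataset, enumerate(datasets)))


datasets = [[0.8, 1.1, 1.9], [2.5, 3.0, 4.2, 5.1], [0.4, 0.6, 0.7]]
best_fits = fit_all(datasets)
print(len(best_fits))
```

&lt;p&gt;&lt;code&gt;pool.map&lt;/code&gt; returns results in input order, so the list of best fits lines up with the list of datasets when it is handed back to the caller.&lt;/p&gt;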

&lt;h2 id=&#34;em-kernel-design&#34;&gt;EM kernel design&lt;/h2&gt;

&lt;!-- TODO fix internal ref --&gt;

&lt;p&gt;The final kernel design is guided by the need to minimise the number of kernel launches. The reasons for this are explored in &lt;a href=&#34;#FailedKernelDesigns&#34;&gt;failed kernel designs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Recall the equations that we need to evaluate to perform a single iteration of the EM algorithm:&lt;/p&gt;

&lt;p&gt;$$
\alpha_{\ell}^{new} =   \frac{1}{N}\sum_{i=1}^{N}p(\ell|x_{i},\Theta^{g})
$$&lt;/p&gt;

&lt;p&gt;$$
\mu_{\ell}^{new}    =   \frac{\sum_{i=1}^{N}x_{i}p(\ell|x_{i},\Theta^{g})}{\sum_{i=1}^{N}p(\ell|x_{i},\Theta^{g})}
$$&lt;/p&gt;

&lt;p&gt;$$
\lambda_{\ell}^{new}    =   \frac{\sum_{i=1}^{N}p(\ell|x_{i},\Theta^{g})}{\sum_{i=1}^{N}\frac{(x_{i}-\mu_{\ell}^{old})^{2}p(\ell|x_{i},\Theta^{g})}{(\mu_{\ell}^{old})^{2}x_{i}}}
$$&lt;/p&gt;

&lt;p&gt;Each operation within the equations can be classified as either:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;performing an operation on each observation (e.g. evaluating \( p(\ell|x_{i},\Theta^{g}) \) or \( (x_{i}-\mu_{\ell}^{old})^{2} \)), or&lt;/li&gt;
  &lt;li&gt;summing combinations of these operations (the \( \sum_{i=1}^{N} \) operation appears multiple times)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each iteration of the EM algorithm requires \( 2+M \) kernel launches (where \( M \) is the number of mixture components being fit). The first performs the per-observation calculations. At the end of this launch, the operands to every summation are available. This stage is referred to as &lt;code&gt;member_prob_kernel&lt;/code&gt; in the source code.&lt;/p&gt;

&lt;p&gt;The second launch performs the summation required to evaluate the current log-likelihood. This is called &lt;code&gt;lp_sum_kernel&lt;/code&gt; and is described in more detail in &lt;a href=&#34;#FusedSingleLaunchSumKernel&#34;&gt;fused single launch sum kernel&lt;/a&gt;. After this launch, we stop if convergence has been achieved.&lt;/p&gt;

&lt;p&gt;The remaining launches perform the summations required to evaluate the new mixture parameter values, again using &lt;code&gt;lp_sum_kernel&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The process to perform a single iteration of the EM algorithm is therefore:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Perform per-observation operations using &lt;code&gt;member_prob_kernel&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Sum over the per-observation likelihoods to calculate the solution log-likelihood using &lt;code&gt;lp_sum_kernel&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;If we have converged (if the new log-likelihood is within \( \epsilon \) of the old log-likelihood) stop the process&lt;/li&gt;
&lt;li&gt;Otherwise, perform the remaining summations using &lt;code&gt;lp_sum_kernel&lt;/code&gt;. The update equations can then be evaluated to generate the next iteration&amp;rsquo;s initial parameter estimates.&lt;/li&gt;
&lt;/ol&gt;
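
&lt;p&gt;The four steps above can be mirrored in a CPU-only sketch that separates the per-observation work from the reductions and counts the &amp;lsquo;launches&amp;rsquo;. The Python functions are stand-ins for &lt;code&gt;member_prob_kernel&lt;/code&gt; and &lt;code&gt;lp_sum_kernel&lt;/code&gt;, not their actual implementation. In particular, we assume that each of the \( M \) remaining launches sums all three operands for one component in a single fused pass.&lt;/p&gt;

```python
import math


def ig_pdf(x, mu, lam):
    """Inverse Gaussian density."""
    return math.sqrt(lam / (2.0 * math.pi * x ** 3)) * math.exp(
        -lam * (x - mu) ** 2 / (2.0 * mu ** 2 * x))


def member_prob_step(xs, alphas, mus, lams):
    """Per-observation work: build every operand the later sums need."""
    log_mix = []                                # log mixture density at each x_i
    operands = [([], [], []) for _ in alphas]   # p, x*p, (x-mu)^2 p / (mu^2 x)
    for x in xs:
        weighted = [a * ig_pdf(x, m, lam) for a, m, lam in zip(alphas, mus, lams)]
        total = sum(weighted)
        log_mix.append(math.log(total))
        for (p_col, xp_col, dev_col), w, m in zip(operands, weighted, mus):
            p = w / total
            p_col.append(p)
            xp_col.append(x * p)
            dev_col.append((x - m) ** 2 * p / (m ** 2 * x))
    return log_mix, operands


def fused_sum(*columns):
    """Reduction step: sum several operand arrays in one pass."""
    return tuple(sum(column) for column in columns)


def em_iteration(xs, alphas, mus, lams, old_ll, eps=1e-6):
    """One EM iteration; returns (params, log-likelihood, converged, launches)."""
    launches = 1                                # step 1: per-observation operations
    log_mix, operands = member_prob_step(xs, alphas, mus, lams)
    (ll,) = fused_sum(log_mix)                  # step 2: log-likelihood sum
    launches += 1
    if math.isclose(ll, old_ll, rel_tol=0.0, abs_tol=eps):
        return (alphas, mus, lams), ll, True, launches   # step 3: converged
    new_alphas, new_mus, new_lams = [], [], []
    for p_col, xp_col, dev_col in operands:     # step 4: one fused sum per component
        sum_p, sum_xp, sum_dev = fused_sum(p_col, xp_col, dev_col)
        launches += 1
        new_alphas.append(sum_p / len(xs))
        new_mus.append(sum_xp / sum_p)
        new_lams.append(sum_p / sum_dev)
    return (new_alphas, new_mus, new_lams), ll, False, launches


params, ll, done, launches = em_iteration(
    [0.5, 1.0, 2.0, 4.0], [0.5, 0.5], [1.0, 3.0], [1.0, 1.0], -math.inf)
print(launches)
```

&lt;p&gt;With \( M=2 \) components, a non-converged iteration counts \( 2+M=4 \) launches, while an iteration that detects convergence stops after two.&lt;/p&gt;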

&lt;h3 id=&#34;FailedKernelDesigns&#34;&gt;Learning from failed kernel designs&lt;/h3&gt;

&lt;p&gt;It took a number of attempts to design a CUDA kernel that performed well. The following table summarises the failed attempts&lt;sup&gt;&lt;a href=&#34;#foot2&#34;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;For reference, the pure R implementation took around 80ms to produce each fit using a single CPU core. Each fit runs on the same problem with the same initial conditions, running 100 iterations of the EM algorithm to produce a result.&lt;/p&gt;

&lt;table class=&#39;ui celled table&#39;&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Description&lt;/th&gt;&lt;th&gt;Time to fit&lt;/th&gt;&lt;th&gt;Problems&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;

  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Basic CUDA C&lt;/td&gt;
      &lt;td&gt;2.3ms&lt;/td&gt;
      &lt;td&gt;Summations were implemented incorrectly, so results were incorrect&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td&gt;Using the Thrust API&lt;/td&gt;
      &lt;td&gt;80ms&lt;/td&gt;
      &lt;td&gt;Slow; roughly the same performance as R on CPU&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td&gt;Thrust with Streams&lt;/td&gt;
      &lt;td&gt;80-290ms&lt;/td&gt;
      &lt;td&gt;Multiple threads interacted poorly; often slower than before&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td&gt;CUB single-threaded&lt;/td&gt;
      &lt;td&gt;21ms&lt;/td&gt;
      &lt;td&gt;Not fast enough to justify use of GPU&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td&gt;CUB multi-threaded&lt;/td&gt;
      &lt;td&gt;100ms&lt;/td&gt;
      &lt;td&gt;cudaFuncGetAttributes call using a large amount of time&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td&gt;Modify CUB&lt;/td&gt;
      &lt;td&gt;15ms&lt;/td&gt;
      &lt;td&gt;Not fast enough to justify use of GPU&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td&gt;Replace CUB with single-launch sum kernel&lt;/td&gt;
      &lt;td&gt;3.5ms&lt;/td&gt;
      &lt;td&gt;Kernel launch time is now the chief bottleneck&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td&gt;Fuse multiple summations&lt;/td&gt;
      &lt;td&gt;2.5ms&lt;/td&gt;
      &lt;td&gt;Kernel launch time is now the chief bottleneck&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Broadly, we learned the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;With such small datasets, kernel launch overhead takes the vast majority of the time. This is explored further below.&lt;/li&gt;
&lt;li&gt;Libraries such as Thrust &lt;a href=&#34;#Hoberock2015&#34;&gt;(Hoberock and Bell, 2015)&lt;/a&gt; and CUB &lt;a href=&#34;#Corporation2015&#34;&gt;(NVIDIA Corporation, 2015)&lt;/a&gt;, while making it relatively easy to develop code that runs on CUDA hardware, assume that kernel launch overhead is relatively small. They perform a lot of independent kernel launches. This makes them unsuitable in this application. We must write CUDA C/C++ by hand.&lt;/li&gt;
&lt;li&gt;Support for CUDA streams is still relatively new to Thrust. A lot of operations &amp;ndash; particularly memory copies &amp;ndash; are performed without streams in mind, which costs performance.&lt;/li&gt;
&lt;li&gt;Both CUB and Thrust run poorly in multithreaded environments such as this one. The different threads interact, costing performance&lt;sup&gt;&lt;a href=&#34;#foot3&#34;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/li&gt;
&lt;li&gt;Performing a summation in CUB or Thrust launches multiple kernels, so every summation pays the launch overhead more than once.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&#34;kernel-launch-time&#34;&gt;Kernel launch time&lt;/h3&gt;

&lt;p&gt;Kernel launch time is the time that it takes the CPU to launch a kernel on the GPU. &lt;a href=&#34;#Boyer&#34;&gt;Boyer&lt;/a&gt; gives measurements showing that calling a CPU function takes about 3.3ns, but launching an asynchronous CUDA kernel takes between 3.0 and 3.9\(\mu\)s &amp;ndash; roughly a thousand times longer. &lt;a href=&#34;#Lee2010&#34;&gt;Lee et al. (2010)&lt;/a&gt; note that &amp;ldquo;For GPUs, we found that global inter-thread synchronization is very costly, because it involves a kernel termination and new kernel call overhead from the host.&amp;rdquo;&lt;/p&gt;

&lt;p&gt;This implies that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the time that the GPU spends executing the kernel must be greater than the time taken to launch the kernel or the GPU will be idle for some time&lt;/li&gt;
&lt;li&gt;if the CPU can execute the task in less time than the kernel takes to launch, we do not benefit from using the GPU at all&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using CUDA streams or CPU threads does not remove this restriction. If many threads try to launch a kernel at once, the launches are queued. Only one CPU-GPU operation (a memory copy or kernel launch) can be started at any time, even though many can be in progress simultaneously.&lt;/p&gt;

&lt;p&gt;With small datasets, the kernels do not take long to execute. Kernel launch time then becomes the main determinant of performance. The only way we can reduce launch time is to minimise the number of kernel launches.&lt;/p&gt;

&lt;h2 id=&#34;FusedSingleLaunchSumKernel&#34;&gt;Fused single launch sum kernel&lt;/h2&gt;

&lt;p&gt;The algorithm requires \( 2+M \) summations per iteration. Using the standard CUB or Thrust libraries, two kernel launches are required per summation, giving a total of nine kernel launches for each iteration on a two-component mixture.&lt;/p&gt;
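
&lt;p&gt;To put these launch counts in perspective, the time spent purely on launching kernels can be estimated with some back-of-envelope arithmetic. The sketch below assumes 100 EM iterations per fit and a launch time of 3.9\(\mu\)s (the upper end of the range reported by Boyer); the launch time on any particular machine may differ, and &lt;code&gt;launch_overhead_ms&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
def launch_overhead_ms(iterations, launches_per_iteration, launch_us):
    """CPU-side time spent launching kernels, in milliseconds."""
    return iterations * launches_per_iteration * launch_us / 1000.0

# Assumed figures: 100 EM iterations, 3.9 microseconds per launch.
library_ms = launch_overhead_ms(100, 9, 3.9)   # nine launches per iteration
reduced_ms = launch_overhead_ms(100, 4, 3.9)   # 2 + M launches with M = 2
```

&lt;p&gt;On these assumed figures, nine launches per iteration cost about 3.5ms per fit in launch time alone, against about 1.6ms for four launches per iteration.&lt;/p&gt;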

&lt;p&gt;To reduce the number of kernel launches required, a new kernel was developed with two important features: it can perform &lt;em&gt;multiple summations&lt;/em&gt; with a &lt;em&gt;single kernel launch&lt;/em&gt;.&lt;/p&gt;

&lt;h3 id=&#34;sum-reductions&#34;&gt;Sum reductions&lt;/h3&gt;

&lt;p&gt;Most sum kernels use a reduction tree, demonstrated on page 3 of &lt;a href=&#34;#Harris2010&#34;&gt;Harris (2010)&lt;/a&gt;. Rather than having a single thread step through each item and keep a running total, the many cores of CUDA hardware are used. A large number of threads are launched, proportional to the number of items to be summed&lt;sup&gt;&lt;a href=&#34;#foot4&#34;&gt;4&lt;/a&gt;&lt;/sup&gt;. Each thread sums two adjacent items, a task which can be performed extremely quickly. Then, the adjacent results of those summations are summed, and so on until the last two items are summed. The effect of this is that the summation is performed in roughly \( O(\frac{N}{T}\log(N)) \) time, where \( N \) is the number of items and \( T \) is the number of threads running in parallel. The traditional &amp;lsquo;running sum&amp;rsquo; algorithm operates in \( O(N) \) time, which is far slower on a GPU where \( T \) is large (hundreds or thousands).&lt;/p&gt;

&lt;p&gt;For 2 million items, the kernel might be launched across 1 million threads. No existing GPU hardware has this many hardware execution units available, so the launch is split into blocks. Each block runs on the hardware in turn. The blocks are not resident in the GPU at the same time and cannot synchronise with each other. Therefore, multiple kernel launches are needed to achieve synchronisation &lt;a href=&#34;#Harris2010&#34;&gt;(Harris, 2010, p. 4)&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;https://ianhowson.com/bulkem/sum-tree.png&#34; width=&#34;500&#34;&gt;&lt;/p&gt;

&lt;p&gt;The downside to this algorithm is that each stage must run completely before the next starts, and there is no way to do this except for waiting for the kernel launch to complete. This implies that many kernel launches are required.&lt;/p&gt;
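
&lt;p&gt;The reduction tree described above can be sketched serially in Python. Each pass of the loop stands in for one stage of the tree, which on the GPU would have to finish completely before the next stage begins. The function name &lt;code&gt;tree_sum&lt;/code&gt; is hypothetical and the code only illustrates the data flow, not the parallel execution.&lt;/p&gt;

```python
def tree_sum(values):
    """Sum by repeatedly adding neighbouring pairs.

    Returns (total, stages), where stages is the number of passes,
    each of which must complete before the next can start.
    """
    level = list(values)
    if not level:
        return 0.0, 0
    stages = 0
    while len(level) != 1:
        if len(level) % 2:
            level.append(0.0)          # pad odd-length levels with zero
        # On a GPU, every pair in this pass is added by a separate thread.
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        stages += 1
    return level[0], stages
```

&lt;p&gt;Eight items need three stages and a thousand need ten; the number of stages grows with the logarithm of the number of items.&lt;/p&gt;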

&lt;p&gt;The new kernel solves this problem by adding a stage before the reduction tree. If \( T \) threads are launched, the \( t \)th thread calculates&lt;/p&gt;

&lt;p&gt;$$
\sum_{i=0}^{\left\lceil N/T\right\rceil }\begin{cases}
x_{t+iT} &amp;amp; \text{for } t+iT &amp;lt; N\\
0 &amp;amp; \text{otherwise}
\end{cases}
$$&lt;/p&gt;

&lt;p&gt;In other words, it sums every \( T \)th element into a \( T \)-wide array. Then, the sum reduction can proceed as per normal. This is slower than the traditional algorithm, but it has the key advantage that global synchronisation is not required. The entire summation can execute with a single kernel launch.&lt;/p&gt;

&lt;p&gt;\( T \) can be any number no larger than the maximum thread block size supported by the CUDA hardware. It is critical that all threads are executed simultaneously (i.e. the kernel must be launched with a grid size of 1), so that every thread belongs to the same block and can synchronise with the others.&lt;/p&gt;
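
&lt;p&gt;A serial Python sketch of this two-phase summation is given below. The outer loop over &lt;code&gt;t&lt;/code&gt; plays the role of the \( T \) threads of a single block. The name &lt;code&gt;single_launch_sum&lt;/code&gt; is hypothetical, and the sketch assumes that \( T \) is a power of two so that the tree phase halves cleanly.&lt;/p&gt;

```python
def single_launch_sum(xs, T):
    """Sum xs using a T-wide strided pre-sum followed by a tree reduction.

    T is assumed to be a power of two.
    """
    # Phase 1: "thread" t accumulates elements t, t + T, t + 2T, ...
    partial = [0.0] * T
    for t in range(T):
        for i in range(t, len(xs), T):
            partial[t] += xs[i]

    # Phase 2: tree reduction over the T partial sums. On the GPU this
    # needs only block-level synchronisation between steps.
    stride = T // 2
    while stride:
        for t in range(stride):
            partial[t] += partial[t + stride]
        stride //= 2
    return partial[0]
```

&lt;p&gt;For example, summing the integers 0 to 9 with \( T = 4 \) gives partial sums of 12, 15, 8 and 10, which the tree phase reduces to 45.&lt;/p&gt;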

&lt;h3 id=&#34;kernel-fusion&#34;&gt;Kernel fusion&lt;/h3&gt;

&lt;p&gt;The M-step of EM requires three summations across \( N \)-sized arrays. These three summations can be performed in a single kernel launch by providing the sum kernel with the details of all three arrays in a single launch, rather than performing three launches. This technique is called Vectored I/O in other contexts &lt;a href=&#34;#Wikipedia2015d&#34;&gt;(Wikipedia, 2015d)&lt;/a&gt;.&lt;/p&gt;
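
&lt;p&gt;The fusion idea can be sketched by extending a strided, single-block sum so that one pass handles several arrays at once. As before, this is serial Python that only illustrates the data flow; &lt;code&gt;fused_sums&lt;/code&gt; is a hypothetical name, the arrays are assumed to be of equal length, and \( T \) is assumed to be a power of two.&lt;/p&gt;

```python
def fused_sums(arrays, T):
    """Sum several equal-length arrays in one simulated kernel launch.

    Returns one total per input array. T is assumed to be a power of two.
    """
    n = len(arrays[0])
    partial = [[0.0] * T for _ in arrays]

    # One strided pass: each "thread" t visits its elements once and
    # updates the partial sum of every array while it is there.
    for t in range(T):
        for i in range(t, n, T):
            for k, arr in enumerate(arrays):
                partial[k][t] += arr[i]

    # Tree reduction of each T-wide partial array.
    totals = []
    for row in partial:
        stride = T // 2
        while stride:
            for t in range(stride):
                row[t] += row[t + stride]
            stride //= 2
        totals.append(row[0])
    return totals
```

&lt;p&gt;The three summations of the M-step would then be requested as a single call with a list of three arrays, rather than as three separate calls.&lt;/p&gt;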

&lt;h2 id=&#34;footnotes&#34;&gt;Footnotes&lt;/h2&gt;

&lt;p id=&#34;foot2&#34;&gt;2 Source code for these can be browsed at &lt;a href=&#34;https://github.com/ihowson/CUDA-Task-Pipeline&#34;&gt;https://github.com/ihowson/CUDA-Task-Pipeline&lt;/a&gt;&lt;/p&gt;
&lt;p id=&#34;foot3&#34;&gt;3 Profiling revealed a significant bottleneck in CUB when used in multithreaded applications. This bottleneck has been patched in &lt;a href=&#34;https://github.com/ihowson/cub/commit/0c90360c9b9c397398a646d689ddd980aa5da811&#34;&gt;https://github.com/ihowson/cub/commit/0c90360c9b9c397398a646d689ddd980aa5da811&lt;/a&gt;&lt;/p&gt;
&lt;p id=&#34;foot4&#34;&gt;4 This is a simplified explanation; the curious reader is encouraged to read &lt;a href=&#34;#Harris2010&#34;&gt;Harris (2010)&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&#34;references&#34;&gt;References&lt;/h2&gt;

&lt;p id=&#34;Bilmes1998&#34;&gt;JA Bilmes. A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical Report TR-97-021, International Computer Science Institute, 1998.&lt;/p&gt;
&lt;p id=&#34;Boyer&#34;&gt;M Boyer. CUDA kernel overhead. URL &lt;a href=&#34;http://www.cs.virginia.edu/~mwb7w/cuda_support/kernel_overhead.html&#34;&gt;http://www.cs.virginia.edu/~mwb7w/cuda_support/kernel_overhead.html&lt;/a&gt;.&lt;/p&gt;
&lt;p id=&#34;Harris2010&#34;&gt;M Harris. Optimizing parallel reduction in CUDA, Mar 2010. URL &lt;a href=&#34;http://docs.nvidia.com/cuda/samples/6_Advanced/reduction/doc/reduction.pdf&#34;&gt;http://docs.nvidia.com/cuda/samples/6_Advanced/reduction/doc/reduction.pdf&lt;/a&gt;.&lt;/p&gt;
&lt;p id=&#34;McLachlan2000&#34;&gt;G McLachlan and D Peel. &lt;i&gt;Finite mixture models&lt;/i&gt;. John Wiley &amp; Sons, Inc, 2000.&lt;/p&gt;
&lt;p id=&#34;Hoberock2015&#34;&gt;J Hoberock and N Bell. Thrust - parallel algorithms library, Mar 2015. URL &lt;a href=&#34;http://thrust.github.io/&#34;&gt;http://thrust.github.io/&lt;/a&gt;.&lt;/p&gt;
&lt;p id=&#34;Prates2013&#34;&gt;MO Prates, CRB Cabral, and VH Lachos. mixsmsn: Fitting finite mixture of scale mixture of skew-normal distributions. &lt;i&gt;Journal of Statistical Software&lt;/i&gt;, 54(12), August 2013.&lt;/p&gt;
&lt;p id=&#34;Corporation2015&#34;&gt;NVIDIA Corporation. CUB, Apr 2015. URL &lt;a href=&#34;http://nvlabs.github.io/cub/&#34;&gt;http://nvlabs.github.io/cub/&lt;/a&gt;.&lt;/p&gt;
&lt;p id=&#34;Rennich2011&#34;&gt;S Rennich. CUDA C/C++ streams and concurrency, 2011. URL &lt;a href=&#34;http://on-demand.gputechconf.com/gtc-express/2011/presentations/StreamsAndConcurrencyWebinar.pdf&#34;&gt;http://on-demand.gputechconf.com/gtc-express/2011/presentations/StreamsAndConcurrencyWebinar.pdf&lt;/a&gt;.&lt;/p&gt;
&lt;p id=&#34;Schepers2015&#34;&gt;J Schepers. Improved random-starting method for the EM algorithm for finite mixtures of regressions. &lt;i&gt;Behavior Research Methods&lt;/i&gt;, 47(1):134–146, Mar 2015.&lt;/p&gt;
&lt;p id=&#34;Tzeng2012&#34;&gt;S Tzeng, A Patney, and JD Owens. GPU task-parallelism: primitives and applications, 2012. URL &lt;a href=&#34;http://on-demand.gputechconf.com/gtc/2012/presentations/S0138-GPU-Task-Parallelism-Primitives-and-Apps.pdf&#34;&gt;http://on-demand.gputechconf.com/gtc/2012/presentations/S0138-GPU-Task-Parallelism-Primitives-and-Apps.pdf&lt;/a&gt;.&lt;/p&gt;
&lt;p id=&#34;Wikipedia2014&#34;&gt;Wikipedia. Data parallelism, Dec 2014. URL &lt;a href=&#34;http://en.wikipedia.org/wiki/Data_parallelism&#34;&gt;http://en.wikipedia.org/wiki/Data_parallelism&lt;/a&gt;.&lt;/p&gt;
&lt;p id=&#34;Wikipedia2015a&#34;&gt;Wikipedia. Inverse gaussian distribution, February 2015a. URL &lt;a href=&#34;http://en.wikipedia.org/wiki/Inverse_Gaussian_distribution&#34;&gt;http://en.wikipedia.org/wiki/Inverse_Gaussian_distribution&lt;/a&gt;.&lt;/p&gt;
&lt;p id=&#34;Wikipedia2015c&#34;&gt;Wikipedia. Task parallelism, Mar 2015c. URL &lt;a href=&#34;http://en.wikipedia.org/wiki/Task_parallelism&#34;&gt;http://en.wikipedia.org/wiki/Task_parallelism&lt;/a&gt;.&lt;/p&gt;
&lt;p id=&#34;Wikipedia2015d&#34;&gt;Wikipedia. Vectored I/O, February 2015d. URL &lt;a href=&#34;http://en.wikipedia.org/wiki/Vectored_I/O&#34;&gt;http://en.wikipedia.org/wiki/Vectored_I/O&lt;/a&gt;.&lt;/p&gt;
&lt;p id=&#34;Woolley2013&#34;&gt;C Woolley. GPU optimization fundamentals, 2013. URL &lt;a href=&#34;https://www.olcf.ornl.gov/wp-content/uploads/2013/02/GPU_Opt_Fund-CW1.pdf&#34;&gt;https://www.olcf.ornl.gov/wp-content/uploads/2013/02/GPU_Opt_Fund-CW1.pdf&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Results</title>
      <link>https://ianhowson.com/bulkem/results/</link>
      <pubDate>Thu, 18 Jun 2015 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/bulkem/results/</guid>
      <description>

&lt;h2 id=&#34;time-to-fit-a-single-dataset&#34;&gt;Time to fit a single dataset&lt;/h2&gt;

&lt;p&gt;To test performance across varying dataset sizes, we sample from a two-component inverse Gaussian mixture model with known parameters. Only a single dataset is fit.&lt;/p&gt;

&lt;table class=&#39;ui celled table&#39;&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th&gt;Dataset size&lt;/th&gt;
            &lt;th&gt;CPU time (seconds)&lt;/th&gt;
            &lt;th&gt;GPU time (seconds)&lt;/th&gt;
            &lt;th&gt;GPU speedup&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;

    &lt;tbody&gt;
        &lt;tr&gt;
            &lt;td&gt;100&lt;/td&gt;
            &lt;td&gt;0.00620&lt;/td&gt;
            &lt;td&gt;0.01576&lt;/td&gt;
            &lt;td&gt;0.39&lt;/td&gt;
        &lt;/tr&gt;

        &lt;tr&gt;
            &lt;td&gt;1,000&lt;/td&gt;
            &lt;td&gt;0.06032&lt;/td&gt;
            &lt;td&gt;0.01572&lt;/td&gt;
            &lt;td&gt;3.84&lt;/td&gt;
        &lt;/tr&gt;

        &lt;tr&gt;
            &lt;td&gt;10,000&lt;/td&gt;
            &lt;td&gt;0.67876&lt;/td&gt;
            &lt;td&gt;0.03924&lt;/td&gt;
            &lt;td&gt;17.30&lt;/td&gt;
        &lt;/tr&gt;

        &lt;tr&gt;
            &lt;td&gt;100,000&lt;/td&gt;
            &lt;td&gt;6.35048&lt;/td&gt;
            &lt;td&gt;0.19740&lt;/td&gt;
            &lt;td&gt;32.17&lt;/td&gt;
        &lt;/tr&gt;

        &lt;tr&gt;
            &lt;td&gt;1,000,000&lt;/td&gt;
            &lt;td&gt;67.98868&lt;/td&gt;
            &lt;td&gt;1.87952&lt;/td&gt;
            &lt;td&gt;36.17&lt;/td&gt;
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;img src=&#34;https://ianhowson.com/bulkem/elapsed-absolute.png&#34; width=&#39;350&#39;&gt;
&lt;img src=&#34;https://ianhowson.com/bulkem/elapsed-absolute-log.png&#34; width=&#39;350&#39;&gt;
&lt;img src=&#34;https://ianhowson.com/bulkem/elapsed-speedup.png&#34; width=&#39;350&#39;&gt;&lt;/p&gt;

&lt;p&gt;On the test hardware, we see that the GPU is slower for small dataset sizes (100 samples) but outperforms the CPU for larger dataset sizes. For datasets with 1 million samples, the GPU runs around 36 times faster than the CPU.&lt;/p&gt;

&lt;h2 id=&#34;time-to-fit-many-datasets&#34;&gt;Time to fit many datasets&lt;/h2&gt;

&lt;p&gt;In this case, the dataset size is held constant (2000 samples) and we fit many datasets simultaneously, generating them in the same way as for the single dataset case.&lt;/p&gt;

&lt;table class=&#39;ui celled table&#39;&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th&gt;Number of datasets&lt;/th&gt;
            &lt;th&gt;CPU time (seconds)&lt;/th&gt;
            &lt;th&gt;GPU time (seconds)&lt;/th&gt;
            &lt;th&gt;GPU speedup&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;

    &lt;tbody&gt;
        &lt;tr&gt;
            &lt;td&gt;1&lt;/td&gt;
            &lt;td&gt;0.10452&lt;/td&gt;
            &lt;td&gt;0.02092&lt;/td&gt;
            &lt;td&gt;5.00&lt;/td&gt;
        &lt;/tr&gt;

        &lt;tr&gt;
            &lt;td&gt;10&lt;/td&gt;
            &lt;td&gt;1.12048&lt;/td&gt;
            &lt;td&gt;0.05364&lt;/td&gt;
            &lt;td&gt;20.89&lt;/td&gt;
        &lt;/tr&gt;

        &lt;tr&gt;
            &lt;td&gt;100&lt;/td&gt;
            &lt;td&gt;10.12788&lt;/td&gt;
            &lt;td&gt;0.35904&lt;/td&gt;
            &lt;td&gt;28.21&lt;/td&gt;
        &lt;/tr&gt;

        &lt;tr&gt;
            &lt;td&gt;1000&lt;/td&gt;
            &lt;td&gt;102.40840&lt;/td&gt;
            &lt;td&gt;3.42036&lt;/td&gt;
            &lt;td&gt;29.94&lt;/td&gt;
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;img src=&#34;https://ianhowson.com/bulkem/multi-absolute.png&#34; width=&#34;350&#34;&gt;
&lt;img src=&#34;https://ianhowson.com/bulkem/multi-absolute-log.png&#34; width=&#34;350&#34;&gt;
&lt;img src=&#34;https://ianhowson.com/bulkem/multi-speedup.png&#34; width=&#34;350&#34;&gt;&lt;/p&gt;

&lt;p&gt;We see similar results &amp;ndash; the GPU speedup over the CPU increases as the number of datasets increases. When 1000 datasets of 2000 samples are being fit simultaneously, the GPU runs around 30 times as fast as the CPU.&lt;/p&gt;

&lt;h2 id=&#34;multiple-datasets-on-ec2&#34;&gt;Multiple datasets on EC2&lt;/h2&gt;

&lt;p&gt;Comparing performance of CPUs vs. GPUs is somewhat unsound; there is no obvious way to say &amp;ldquo;this CPU is equivalent to this GPU&amp;rdquo;. Most papers, including this one, compare performance using whatever hardware the author had available at the time &lt;a href=&#34;#Gillespie2011&#34;&gt;(Gillespie, 2011)&lt;/a&gt;. No effort was made to optimise the CPU implementation, while significant time was spent optimising the GPU implementation, an issue discussed in depth in &lt;a href=&#34;#Lee2010&#34;&gt;(Lee et al., 2010)&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Fortunately, services such as Amazon EC2 &lt;a href=&#34;#Services&#34;&gt;(Amazon Web Services)&lt;/a&gt; provide an alternative way to compare the CPU and GPU approaches: cost of rental. For a given price, one can rent a certain amount of hardware, which will perform the desired computations in a certain amount of time. Both CPU and GPU time can be rented. A fairer way to compare the two technologies is the &lt;em&gt;cost to perform your computation&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;A summary of the machine configurations is available at &lt;a href=&#34;#Services2015&#34;&gt;Amazon Web Services (2015)&lt;/a&gt;. For reference, ECUs are a measure of allocated CPU capacity. The US East region was selected as it is generally the lowest priced.&lt;/p&gt;

&lt;p&gt;For the CPU implementation, we selected a c4.large instance, as this type provided the best price-performance ratio at the time of writing (eight ECUs and two CPU cores at USD$0.116/hour as of 2015-06-09). t2 instances are not suitable as they provide &amp;lsquo;burstable&amp;rsquo; CPU performance; they are not intended for long-running jobs. This machine has two CPU cores, but the R implementation will only use one. As there are no dependencies between datasets, we will assume that additional CPU cores will provide a linear speedup (that is, with appropriate software, we could obtain double the performance with double the CPU cores). The rationale for this is explored further in &lt;a href=&#34;https://ianhowson.com/bulkem/design/#LinearSpeedupAssumption&#34;&gt;the linear speedup assumption&lt;/a&gt;. Also note that pricing for c4 instances is close to constant per CPU core and ECU allocation; the cost-to-fit ought to remain constant regardless of instance choice.&lt;/p&gt;

&lt;table class=&#39;ui celled table&#39;&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th&gt;Name&lt;/th&gt;
            &lt;th&gt;Number of CPU cores&lt;/th&gt;
            &lt;th&gt;ECU allocation&lt;/th&gt;
            &lt;th&gt;Price per hour (USD)&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;

    &lt;tbody&gt;
        &lt;tr&gt;
            &lt;td&gt;c4.large&lt;/td&gt;
            &lt;td&gt;2&lt;/td&gt;
            &lt;td&gt;8&lt;/td&gt;
            &lt;td&gt;0.116&lt;/td&gt;
        &lt;/tr&gt;

        &lt;tr&gt;
            &lt;td&gt;c4.xlarge&lt;/td&gt;
            &lt;td&gt;4&lt;/td&gt;
            &lt;td&gt;16&lt;/td&gt;
            &lt;td&gt;0.232&lt;/td&gt;
        &lt;/tr&gt;

        &lt;tr&gt;
            &lt;td&gt;c4.2xlarge&lt;/td&gt;
            &lt;td&gt;8&lt;/td&gt;
            &lt;td&gt;31&lt;/td&gt;
            &lt;td&gt;0.464&lt;/td&gt;
        &lt;/tr&gt;

        &lt;tr&gt;
            &lt;td&gt;c4.4xlarge&lt;/td&gt;
            &lt;td&gt;16&lt;/td&gt;
            &lt;td&gt;62&lt;/td&gt;
            &lt;td&gt;0.928&lt;/td&gt;
        &lt;/tr&gt;

        &lt;tr&gt;
            &lt;td&gt;c4.8xlarge&lt;/td&gt;
            &lt;td&gt;36&lt;/td&gt;
            &lt;td&gt;132&lt;/td&gt;
            &lt;td&gt;1.856&lt;/td&gt;
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;For the GPU implementation, we chose a g2.2xlarge instance at USD$0.650/hour. Under the linear speedup assumption, one CPU core costs half the c4.large price. Rephrasing this in terms of speedup ratios, the GPU implementation must achieve a 0.65/(0.116/2) = 11.2x speedup in order to break even on cost.&lt;/p&gt;
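
&lt;p&gt;The break-even calculation can be written out as a short Python sketch. The prices are those quoted above; the function name &lt;code&gt;break_even_speedup&lt;/code&gt; is hypothetical, and the per-core CPU price relies on the linear speedup assumption.&lt;/p&gt;

```python
def break_even_speedup(gpu_price_per_hour, cpu_price_per_hour, cpu_cores):
    """Speedup the GPU needs over one CPU core to cost the same per fit."""
    cpu_price_per_core = cpu_price_per_hour / cpu_cores
    return gpu_price_per_hour / cpu_price_per_core

# GPU instance at 0.650 USD/hour against a c4.large (0.116 USD/hour, 2 cores).
ratio = break_even_speedup(0.650, 0.116, 2)
```

&lt;p&gt;Here &lt;code&gt;ratio&lt;/code&gt; evaluates to roughly 11.2. Any measured GPU speedup above this figure makes the GPU instance the cheaper option.&lt;/p&gt;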

&lt;p&gt;As before, all datasets contain 2000 randomly generated samples.&lt;/p&gt;

&lt;table class=&#39;ui celled table&#39;&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th&gt;Datasets (D)&lt;/th&gt;
            &lt;th&gt;CPU time&lt;/th&gt;
            &lt;th&gt;GPU time&lt;/th&gt;
            &lt;th&gt;CPU cost&lt;/th&gt;
            &lt;th&gt;GPU cost&lt;/th&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;th&gt;&lt;/th&gt;
            &lt;th&gt;(seconds)&lt;/th&gt;
            &lt;th&gt;(seconds)&lt;/th&gt;
            &lt;th&gt;(USDx10&lt;sup&gt;-6&lt;/sup&gt;)&lt;/th&gt;
            &lt;th&gt;(USDx10&lt;sup&gt;-6&lt;/sup&gt;)&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;

    &lt;tbody&gt;
        &lt;tr&gt;
            &lt;td&gt;1&lt;/td&gt;
            &lt;td&gt;0.08912&lt;/td&gt;
            &lt;td&gt;0.01708&lt;/td&gt;
            &lt;td&gt;1.44&lt;/td&gt;
            &lt;td&gt;3.08&lt;/td&gt;
        &lt;/tr&gt;

        &lt;tr&gt;
            &lt;td&gt;10&lt;/td&gt;
            &lt;td&gt;0.87360&lt;/td&gt;
            &lt;td&gt;0.06940&lt;/td&gt;
            &lt;td&gt;14.07&lt;/td&gt;
            &lt;td&gt;12.53&lt;/td&gt;
        &lt;/tr&gt;

        &lt;tr&gt;
            &lt;td&gt;100&lt;/td&gt;
            &lt;td&gt;9.17784&lt;/td&gt;
            &lt;td&gt;0.63868&lt;/td&gt;
            &lt;td&gt;147.86&lt;/td&gt;
            &lt;td&gt;115.32&lt;/td&gt;
        &lt;/tr&gt;

        &lt;tr&gt;
            &lt;td&gt;1000&lt;/td&gt;
            &lt;td&gt;84.44992&lt;/td&gt;
            &lt;td&gt;6.39072&lt;/td&gt;
            &lt;td&gt;1360.58&lt;/td&gt;
            &lt;td&gt;1153.88&lt;/td&gt;
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;From this, we can see that the GPU implementation is slightly more cost-effective than the CPU implementation for larger problems. The difference is not large and could probably be eliminated altogether with some optimisation work on the CPU implementation.&lt;/p&gt;

&lt;p&gt;These price differences may seem to be trivial (who cares about &lt;em&gt;microcents&lt;/em&gt;?) but recall that use cases may include many more datasets (tens of thousands of datasets is the intended use case) and require random initialisation to achieve a good fit (100 random initialisations means 100 times as much work, and therefore cost). For 40,000 datasets and 100 random initialisations, the cost is around USD$5.44 using the CPU implementation and USD$4.62 using the GPU implementation.&lt;/p&gt;
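
&lt;p&gt;These totals follow from the 1000-dataset row of the table, scaled linearly. The sketch below reproduces the arithmetic, assuming (as in the break-even calculation) that a CPU core costs half the c4.large price and that cost scales linearly with the number of datasets and initialisations. The helper names are hypothetical.&lt;/p&gt;

```python
CPU_PRICE_PER_CORE_HOUR = 0.116 / 2   # c4.large price split over two cores
GPU_PRICE_PER_HOUR = 0.650            # GPU instance price

def cost_usd(seconds, price_per_hour):
    """Rental cost of running for the given number of seconds."""
    return seconds * price_per_hour / 3600.0

def scale(cost_per_1000, datasets=40000, initialisations=100):
    """Scale the cost of 1000 fits to the full workload, assuming linearity."""
    return cost_per_1000 * (datasets / 1000) * initialisations

# Measured times for 1000 datasets of 2000 samples, taken from the table.
cpu_per_1000 = cost_usd(84.44992, CPU_PRICE_PER_CORE_HOUR)
gpu_per_1000 = cost_usd(6.39072, GPU_PRICE_PER_HOUR)

cpu_total = scale(cpu_per_1000)
gpu_total = scale(gpu_per_1000)
```

&lt;p&gt;Rounded to cents, &lt;code&gt;cpu_total&lt;/code&gt; is 5.44 and &lt;code&gt;gpu_total&lt;/code&gt; is 4.62, in USD.&lt;/p&gt;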

&lt;p&gt;For the large dataset test, we obtain the following results:&lt;/p&gt;

&lt;table class=&#39;ui celled table&#39;&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th&gt;Samples (N)&lt;/th&gt;
            &lt;th&gt;CPU time&lt;/th&gt;
            &lt;th&gt;GPU time&lt;/th&gt;
            &lt;th&gt;CPU cost&lt;/th&gt;
            &lt;th&gt;GPU cost&lt;/th&gt;
        &lt;/tr&gt;

        &lt;tr&gt;
            &lt;th&gt;&lt;/th&gt;  
            &lt;th&gt;(seconds)&lt;/th&gt;
            &lt;th&gt;(seconds)&lt;/th&gt;
            &lt;th&gt;(USDx10&lt;sup&gt;-6&lt;/sup&gt;)&lt;/th&gt;
            &lt;th&gt;(USDx10&lt;sup&gt;-6&lt;/sup&gt;)&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;

    &lt;tbody&gt;
        &lt;tr&gt;
            &lt;td&gt;100&lt;/td&gt;
            &lt;td&gt;0.00724&lt;/td&gt;
            &lt;td&gt;0.01616&lt;/td&gt;
            &lt;td&gt;0.12&lt;/td&gt;
            &lt;td&gt;2.92&lt;/td&gt;
        &lt;/tr&gt;

        &lt;tr&gt;
            &lt;td&gt;1,000&lt;/td&gt;
            &lt;td&gt;0.05264&lt;/td&gt;
            &lt;td&gt;0.02032&lt;/td&gt;
            &lt;td&gt;0.85&lt;/td&gt;
            &lt;td&gt;3.67&lt;/td&gt;
        &lt;/tr&gt;

        &lt;tr&gt;
            &lt;td&gt;10,000&lt;/td&gt;
            &lt;td&gt;0.57628&lt;/td&gt;
            &lt;td&gt;0.03264&lt;/td&gt;
            &lt;td&gt;9.28&lt;/td&gt;
            &lt;td&gt;5.89&lt;/td&gt;
        &lt;/tr&gt;

        &lt;tr&gt;
            &lt;td&gt;100,000&lt;/td&gt;
            &lt;td&gt;6.03700&lt;/td&gt;
            &lt;td&gt;0.22568&lt;/td&gt;
            &lt;td&gt;97.26&lt;/td&gt;
            &lt;td&gt;40.75&lt;/td&gt;
        &lt;/tr&gt;

        &lt;tr&gt;
            &lt;td&gt;1,000,000&lt;/td&gt;
            &lt;td&gt;50.11764&lt;/td&gt;
            &lt;td&gt;2.25400&lt;/td&gt;
            &lt;td&gt;807.45&lt;/td&gt;
            &lt;td&gt;406.97&lt;/td&gt;
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;For sufficiently large problems, the GPU instances can perform the model fits at roughly half the price.&lt;/p&gt;

&lt;h2 id=&#34;references&#34;&gt;References&lt;/h2&gt;

&lt;p id=&#34;Services&#34;&gt;Amazon Web Services. Amazon EC2. URL &lt;a href=&#34;http://aws.amazon.com/ec2/&#34;&gt;http://aws.amazon.com/ec2/&lt;/a&gt;.&lt;/p&gt;
&lt;p id=&#34;Services2015&#34;&gt;Amazon Web Services. Amazon EC2 pricing, 2015. URL &lt;a href=&#34;http://aws.amazon.com/ec2/pricing/&#34;&gt;http://aws.amazon.com/ec2/pricing/&lt;/a&gt;.&lt;/p&gt;
&lt;p id=&#34;Gillespie2011&#34;&gt;C Gillespie. Reviewing a paper that uses GPUs, July 2011. URL &lt;a href=&#34;https://csgillespie.wordpress.com/2011/07/12/how-to-review-a-gpu-statistics-paper/&#34;&gt;https://csgillespie.wordpress.com/2011/07/12/how-to-review-a-gpu-statistics-paper/&lt;/a&gt;.&lt;/p&gt;
&lt;p id=&#34;Lee2010&#34;&gt;VW Lee, C Kim, J Chhugani, M Deisher, D Kim, AD Nguyen, N Satish, M Smelyanskiy, S Chennupaty, P Hammarlund, R Singhal, and P Dubey. Debunking the 100x GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU. In &lt;i&gt;ISCA ’10 Proceedings of the 37th Annual International Symposium on Computer Architecture&lt;/i&gt;, pages 451–460. ACM, 2010.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Background</title>
      <link>https://ianhowson.com/bulkem/background/</link>
      <pubDate>Thu, 18 Jun 2015 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/bulkem/background/</guid>
      <description>

&lt;h2 id=&#34;the-inverse-gaussian-distribution&#34;&gt;The inverse Gaussian distribution&lt;/h2&gt;

&lt;p&gt;The inverse Gaussian distribution is an exponential-family probability distribution with the density function:&lt;/p&gt;

&lt;p&gt;$$ f(x)=\left[\frac{\lambda}{2\pi x^{3}}\right]^{1/2}\exp\frac{-\lambda(x-\mu)^{2}}{2\mu^{2}x} $$&lt;/p&gt;

&lt;p&gt;for \( x&amp;gt;0 \), mean \( \mu&amp;gt;0 \) and shape \( \lambda&amp;gt;0 \) &lt;a href=&#34;#Seshadri1993&#34;&gt;(Seshadri, 1993, p. 1)&lt;/a&gt;.&lt;/p&gt;
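
&lt;p&gt;As a minimal sketch, the density can be evaluated directly from this formula in Python. The function name &lt;code&gt;ig_pdf&lt;/code&gt; is hypothetical, and the sketch makes no attempt at numerical robustness for extreme parameter values.&lt;/p&gt;

```python
import math

def ig_pdf(x, mu, lam):
    """Inverse Gaussian density at x, with mean mu and shape lam.

    All three arguments must be strictly positive.
    """
    if not (min(x, mu, lam) > 0):
        raise ValueError("x, mu and lam must all be positive")
    scale = math.sqrt(lam / (2 * math.pi * x ** 3))
    exponent = -lam * (x - mu) ** 2 / (2 * mu ** 2 * x)
    return scale * math.exp(exponent)
```

&lt;p&gt;When \( x = \mu = \lambda = 1 \), the exponent vanishes and the density reduces to \( 1/\sqrt{2\pi} \), which gives a quick sanity check.&lt;/p&gt;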

&lt;h2 id=&#34;the-expectation-maximisation-algorithm&#34;&gt;The expectation-maximisation algorithm&lt;/h2&gt;

&lt;!-- TODO references throughout --&gt;

&lt;p&gt;The EM algorithm &lt;a href=&#34;#Dempster1977&#34;&gt;(Dempster et al., 1977)&lt;/a&gt; iteratively refines a maximum likelihood estimate in the presence of missing data. Here, we use it to fit mixture models, as described in &lt;a href=&#34;#Bilmes1998&#34;&gt;Bilmes (1998)&lt;/a&gt;. Two characteristics are of note:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;As it is an iterative algorithm, the execution time can be quite long. Hence our desire to find a faster way to perform model fitting.&lt;/li&gt;
&lt;li&gt;EM is sensitive to the choice of initial parameters &lt;a href=&#34;#Aitkin1980&#34;&gt;(Aitkin et al., 1980, p. 327)&lt;/a&gt;. Therefore, we must think about how best to select them.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;gpus-and-cuda&#34;&gt;GPUs and CUDA&lt;/h2&gt;

&lt;p&gt;A GPU is a component of a computer that accelerates graphics operations. All modern computers and mobile devices include a GPU. GPUs can be programmed to perform general-purpose computations in the same way as a CPU; this is referred to as &amp;lsquo;general purpose GPU computing&amp;rsquo;.&lt;/p&gt;

&lt;p&gt;GPUs are similar to CPUs in that they run user-defined software. The key difference is that while a CPU generally has a small number of execution units (two or four are common for consumer hardware), a GPU may have thousands of execution units operating in parallel. The CPU is optimised for &lt;em&gt;serial&lt;/em&gt; operations &amp;ndash; performing a sequence of instructions as quickly as possible. The GPU is optimised for &lt;em&gt;parallel&lt;/em&gt; operations where the same instructions are performed many times on different data. Each execution unit of the GPU is simple and more restricted, but there are many more of them. The peak computational output (measured in instructions per second) is far greater than for a CPU. Redesigning the problem to take advantage of this structure is the major challenge of the programmer working on a GPU computing problem.&lt;/p&gt;

&lt;p&gt;There are two major standards for GPU computing: OpenCL and CUDA. OpenCL is supported by most GPU vendors. CUDA is only supported by NVIDIA hardware but it is a mature standard with excellent tools and documentation. We will only consider CUDA from this point on.&lt;/p&gt;

&lt;p&gt;There are significant barriers to widespread adoption of GPU computing.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Not everyone has access to &amp;lsquo;large&amp;rsquo; GPU hardware. Most CPUs now ship with on-board GPUs which are sufficient for graphics, but do not provide significant performance advantages in the GPU computing context.&lt;/li&gt;
&lt;li&gt;The effort required to port a given algorithm to a GPU is large. Programming GPUs is significantly more difficult than programming CPUs.&lt;/li&gt;
&lt;li&gt;CPUs will always be the first to get any new algorithm.&lt;/li&gt;
&lt;li&gt;Most problems do not require a large amount of computing power. There is no incentive to speed up something that is already fast.&lt;/li&gt;
&lt;li&gt;Not all algorithms run faster when executed on a GPU. For an algorithm to be a good candidate for GPU execution, it must generally:

&lt;ul&gt;
&lt;li&gt;require many iterations (thousands), each of which is independent of the others&lt;/li&gt;
&lt;li&gt;require a large amount of computation time relative to the amount of memory access&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;For many problems and institutions, a cluster of general purpose PCs is a better fit. It has the advantages of running all available software, requiring minimal rework and being &amp;lsquo;familiar&amp;rsquo; (programming a cluster is very similar to programming a single PC).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;references&#34;&gt;References&lt;/h2&gt;

&lt;p id=&#34;Aitkin1980&#34;&gt;M Aitkin and GT Wilson. Mixture models, outliers, and the EM algorithm. &lt;i&gt;Technometrics&lt;/i&gt;, 22(3):325–331, Aug 1980.&lt;/p&gt;
&lt;p id=&#34;Bilmes1998&#34;&gt;JA Bilmes. A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical Report TR-97-021, International Computer Science Institute, 1998.&lt;/p&gt;
&lt;p id=&#34;Dempster1977&#34;&gt;AP Dempster, NM Laird, and DB Rubin. Maximum likelihood from incomplete data via the EM algorithm. &lt;i&gt;Journal of the Royal Statistical Society. Series B (Methodological)&lt;/i&gt;, 39(1):1–38, 1977.&lt;/p&gt;
&lt;p id=&#34;Seshadri1993&#34;&gt;V Seshadri. &lt;i&gt;The inverse gaussian distribution: a case study in exponential families.&lt;/i&gt; Oxford Science Publications, 1993.&lt;/p&gt;

</description>
    </item>
    
    <item>
      <title>Introduction</title>
      <link>https://ianhowson.com/bulkem/introduction/</link>
      <pubDate>Thu, 18 Jun 2015 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/bulkem/introduction/</guid>
      <description>

&lt;p&gt;We wish to fit mixture models to a large number of datasets. We assume that an appropriate model is a two-component mixture of inverse Gaussian distributions. The components of the data are not well separated. The use case is roughly 40,000 datasets of 2,000 observations each.&lt;/p&gt;

&lt;p&gt;A natural way to estimate the mixture parameters is to use the Expectation Maximisation algorithm. However, two problems need to be overcome:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Because of the large amount of data, most software implementations of the EM algorithm take a long time to execute&lt;/li&gt;
&lt;li&gt;Because the EM algorithm is sensitive to the selection of initial parameters, many attempts must be made to fit any given dataset. This makes the fitting process even slower.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One way to reduce the time needed to generate the models is to use CUDA hardware. CUDA uses graphics processing units (GPUs) found in many computers to perform general-purpose computation. The software used to perform model fitting must be customised to suit CUDA.&lt;/p&gt;

&lt;p&gt;This report describes the design and development of bulkem&lt;sup&gt;&lt;a href=&#34;#foot1&#34;&gt;1&lt;/a&gt;&lt;/sup&gt;, an R package which fits mixture models using CUDA hardware. Using CUDA hardware, bulkem can fit a large number of small datasets around 30 times faster than a conventional CPU. It can fit very large datasets around 36 times faster than a conventional CPU.&lt;/p&gt;

&lt;h2 id=&#34;conventions&#34;&gt;Conventions&lt;/h2&gt;

&lt;p&gt;The following variables are used throughout this report:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;\(N\) : the number of elements in an array or the number of observations in a dataset&lt;/li&gt;
&lt;li&gt;\(M\) : the number of components in the mixture model being fit&lt;/li&gt;
&lt;li&gt;\(T\) : the number of threads being launched&lt;/li&gt;
&lt;li&gt;\(D\) : the number of datasets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A number of performance measurements are quoted. Unless otherwise specified, those measurements were performed on an Intel i5-4460 (quad-core 3.2GHz) CPU with an NVIDIA GeForce GTX 660. The machine is running OS X Yosemite, CUDA Toolkit 6.5 and R 3.1.2.&lt;/p&gt;

&lt;h2 id=&#34;footnotes&#34;&gt;Footnotes&lt;/h2&gt;

&lt;p id=&#39;foot1&#39;&gt;1. The bulkem source code is available at &lt;a href=&#34;https://github.com/ihowson/bulkem&#34;&gt;https://github.com/ihowson/bulkem&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Discussion and conclusion</title>
      <link>https://ianhowson.com/bulkem/discussion/</link>
      <pubDate>Thu, 18 Jun 2015 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/bulkem/discussion/</guid>
      <description>

&lt;h2 id=&#34;LinearSpeedupAssumption&#34;&gt;The linear speedup assumption&lt;/h2&gt;

&lt;p&gt;Fitting models to independent datasets is an embarrassingly parallel &lt;a href=&#34;#Wikipedia2015b&#34;&gt;(Wikipedia, 2015)&lt;/a&gt; problem. The datasets have no dependence on each other and can be fitted separately.&lt;/p&gt;

&lt;p&gt;This implies that, in an ideal world:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If we have enough datasets, we can expect the performance figures quoted here to extrapolate linearly (i.e. if fitting 1000 datasets takes 10 seconds, we can expect that fitting 2000 datasets will take around 20 seconds)&lt;/li&gt;
&lt;li&gt;If we have more hardware to parallelise across (e.g. more CPU cores, more computers with or without GPUs) we can expect them to reduce the computation time proportionally to the amount of resources added&lt;/li&gt;
&lt;li&gt;Assuming that EC2 has an unlimited supply of hardware for us to rent, we can perform very large model fits in an arbitrarily small amount of time with the same total cost. Renting twice as much hardware will halve our model fit time, so the total expenditure remains the same.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of course, in practice things are not so ideal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EC2 bills per hour, so we cannot reduce execution time for extremely large jobs below an hour without incurring additional costs&lt;/li&gt;
&lt;li&gt;EC2 instances take some time to boot, so there is a cost to using large numbers of instances&lt;/li&gt;
&lt;li&gt;Large clusters of machines incur some overhead for communication&lt;/li&gt;
&lt;li&gt;Very small tasks have fixed overheads. We saw this in the GPU results, where there was almost no time difference between a 100-sample dataset and a 1000-sample dataset.&lt;/li&gt;
&lt;/ul&gt;
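
&lt;p&gt;To make the cost argument concrete, here is a rough sketch using symbols that are not part of this report&amp;rsquo;s conventions and are introduced only for illustration: \(W\) is the total work in machine-hours, \(k\) is the number of identical machines rented and \(p\) is the price per machine-hour. Under the linear speedup assumption, the elapsed time is \( t(k) = W / k \) and the total cost is \( C(k) = k \cdot p \cdot t(k) = pW \), which does not depend on \(k\).&lt;/p&gt;

&lt;p&gt;With per-hour billing, each machine is charged for a whole number of hours, so the cost becomes \( C(k) = k \cdot p \cdot \lceil W / k \rceil \). For example, with \( W = 10 \), renting \( k = 10 \) machines takes one hour and costs \( 10p \), while renting \( k = 20 \) machines takes half an hour but costs \( 20p \). This simplified model ignores boot time and communication overhead, which would push the real cost higher still.&lt;/p&gt;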

&lt;p&gt;Using R, the &lt;code&gt;foreach&lt;/code&gt; package &lt;a href=&#34;#Analytics2014&#34;&gt;(Analytics et al., 2014)&lt;/a&gt; makes it easy to parallelise code across cores within a computer. The &lt;code&gt;snow&lt;/code&gt; package &lt;a href=&#34;#Tierney2013&#34;&gt;(Tierney et al., 2013)&lt;/a&gt; is suitable for use across networked clusters of computers.&lt;/p&gt;

&lt;h2 id=&#34;future-work&#34;&gt;Future work&lt;/h2&gt;

&lt;h3 id=&#34;improving-gpu-performance&#34;&gt;Improving GPU performance&lt;/h3&gt;

&lt;p&gt;Kernel launch time is still the limiting factor, so further reducing the number of kernel launches is the natural way to improve performance. Ideally, the entire EM algorithm (across multiple iterations) could be moved to the GPU using similar techniques to &lt;code&gt;lp_sum&lt;/code&gt; to handle datasets larger than the thread block size.&lt;/p&gt;

&lt;p&gt;On the author&amp;rsquo;s hardware, compute occupancy is around 60%, so we could expect, at most, another \( \frac{1}{0.6} - 1 \approx \) 67% performance gain.&lt;/p&gt;

&lt;p&gt;At that stage, GPU performance would likely be the limiting factor. The current implementation of the &lt;code&gt;lp_sum&lt;/code&gt; summation is not very efficient, but it would be prudent to check for hotspots with a profiler before investing further development time.&lt;/p&gt;

&lt;p&gt;Finally, while running the GPU software, the CPUs have significant idle time. With additional software support, one could perform additional model fits using that idle CPU capacity, improving performance-per-dollar further.&lt;/p&gt;

&lt;h3 id=&#34;improving-cpu-performance&#34;&gt;Improving CPU performance&lt;/h3&gt;

&lt;p&gt;The obvious way to improve performance of the CPU implementation is to take advantage of additional CPU cores. The easiest way to achieve this is with the &lt;code&gt;foreach&lt;/code&gt; R package, which can run arbitrary R code across any number of CPU threads. Running the R code under a profiler ought to reveal hotspots which can guide optimisation of the R code. Finally, rewriting the R implementation in C might provide further improvement.&lt;/p&gt;

&lt;h3 id=&#34;new-functionality&#34;&gt;New functionality&lt;/h3&gt;

&lt;p&gt;Modifying &lt;code&gt;bulkem&lt;/code&gt; to fit Normal mixture models would be fairly straightforward and very useful; Normal models are far more common than inverse Gaussian.&lt;/p&gt;

&lt;h1 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h1&gt;

&lt;p&gt;This report describes &lt;code&gt;bulkem&lt;/code&gt;, an R package which fits inverse Gaussian mixture models using the EM algorithm. It has demonstrated that GPUs can provide significant performance and cost advantages over CPUs in this application. Unlike most GPU computing packages, &lt;code&gt;bulkem&lt;/code&gt; offers significant performance improvement even on small datasets. Directions for further development of the &lt;code&gt;bulkem&lt;/code&gt; algorithms have been identified which might improve performance further.&lt;/p&gt;

&lt;h1 id=&#34;references&#34;&gt;References&lt;/h1&gt;

&lt;p id=&#34;Analytics2014&#34;&gt;Revolution Analytics and S Weston. foreach: foreach looping construct for R, Apr 2014. URL &lt;a href=&#34;http://cran.r-project.org/web/packages/foreach/index.html&#34;&gt;http://cran.r-project.org/web/packages/foreach/index.html&lt;/a&gt;.&lt;/p&gt;
&lt;p id=&#34;Wikipedia2015b&#34;&gt;Wikipedia. Embarrassingly parallel, Mar 2015. URL &lt;a href=&#34;http://en.wikipedia.org/wiki/Embarrassingly_parallel&#34;&gt;http://en.wikipedia.org/wiki/Embarrassingly_parallel&lt;/a&gt;.&lt;/p&gt;
&lt;p id=&#34;Tierney2013&#34;&gt;L Tierney, AJ Rossini, N Li, and H Sevcikova. snow: simple network of workstations, Sep 2013. URL &lt;a href=&#34;http://cran.r-project.org/web/packages/snow/index.html&#34;&gt;http://cran.r-project.org/web/packages/snow/index.html&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Dell P2715Q review</title>
      <link>https://ianhowson.com/blog/dell-p2715q-4k-ips-monitor-review/</link>
      <pubDate>Mon, 05 Jan 2015 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/blog/dell-p2715q-4k-ips-monitor-review/</guid>
      <description>

&lt;h2 id=&#34;summary&#34;&gt;Summary&lt;/h2&gt;

&lt;p&gt;Buy it.&lt;/p&gt;

&lt;p&gt;Buy it and a new camera and the biggest graphics card you can find, because you’ll need them to take advantage of the beautiful panel.&lt;/p&gt;

&lt;p&gt;Do your research to make sure it will work with your computer. My MacBook didn’t work at full resolution, despite being on Apple’s 4K compatibility list. (&lt;em&gt;Wait, no, they&amp;rsquo;ve removed it. MacBook 13&amp;rdquo; Retina before 2015 definitely doesn&amp;rsquo;t work at 60Hz.&lt;/em&gt;)&lt;/p&gt;

&lt;h2 id=&#34;the-good&#34;&gt;The Good&lt;/h2&gt;

&lt;p&gt;It’s an affordable 4K IPS monitor. What’s not to like?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I paid AUD$747 with a 15% discount coupon &amp;ndash; easy to find if you Google a little bit. It’s cheaper again in the US.&lt;/li&gt;
&lt;li&gt;It’s 4K (3840x2160) and looks absolutely stunning. Once you go Retina, you never go back.

&lt;ul&gt;
&lt;li&gt;Now you can see all of the flaws in your photos. Time to buy a new camera.&lt;/li&gt;
&lt;li&gt;Practically all of the downloadable wallpaper labelled 4K is actually resized from something lower &amp;ndash; it’s noticeably blurry.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;IPS panel means that you don’t get colour or brightness shift as you change viewing angle. It always looks accurate.&lt;/li&gt;
&lt;li&gt;The display rotates 90&amp;deg; onto its side so you can easily adjust the cables.&lt;/li&gt;
&lt;li&gt;Built-in USB3 hub.&lt;/li&gt;
&lt;li&gt;Ships with a mini-DP to DisplayPort cable. This is a great choice. Laptop users can use mini-DP on the laptop to DisplayPort on the display; desktop users can use DisplayPort on the computer to mini-DP on the display.

&lt;ul&gt;
&lt;li&gt;Out of the box, this worked perfectly with OS X Mavericks and a GeForce GTX 660 at 60Hz.&lt;/li&gt;
&lt;li&gt;It &lt;em&gt;didn’t&lt;/em&gt; work with a 2014 Retina MacBook Pro.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;OS X&amp;rsquo;s displays thing lets you choose the native 3840x2160, which is very usable if you have good eyesight. I’m mostly running at scaled 2560x1440 equivalent. Scaling doesn’t seem to hurt performance at all on the GTX 660.&lt;/li&gt;
&lt;li&gt;Power consumption is about 30W, which is half of the monitor it’s replacing (a Dell 2407WFP).&lt;/li&gt;
&lt;li&gt;The panel has a matte finish, but it doesn&amp;rsquo;t get in the way &amp;ndash; some high-resolution matte displays look shimmery.&lt;/li&gt;
&lt;li&gt;Games look &lt;em&gt;amazing&lt;/em&gt;, but the GTX 660 isn’t really up to the task.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;the-bad&#34;&gt;The Bad&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Sometimes the display wakes up at 2560x1440 instead of 3840x2160. Turning it off and on again fixes this, but it also turns off any connected USB devices, so don’t do that if you have a hard drive connected.&lt;/li&gt;
&lt;li&gt;There&amp;rsquo;s no more speaker power connector, which I used to power my DAC.&lt;/li&gt;
&lt;li&gt;The display locks hard occasionally and needs to be power-cycled (unplug power cable).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&#34;the-macbook&#34;&gt;The MacBook&lt;/h3&gt;

&lt;p&gt;I have a mid-2014 13&amp;rdquo; Retina MacBook Pro. This machine &lt;em&gt;was&lt;/em&gt; on Apple’s 4K compatibility list but has since been removed.&lt;/p&gt;

&lt;p&gt;Using the DisplayPort interface, it syncs at up to 2560x1440.&lt;/p&gt;

&lt;p&gt;Using the HDMI interface, it syncs at 3840x2160, but only at 30Hz. This is mostly OK.&lt;/p&gt;

&lt;p&gt;Oddly, using HDMI, the list of offered scaling options is different to my Mac Pro using DisplayPort. Using the Mac Pro, I get a &amp;lsquo;looks like 2560x1440&amp;rsquo; option, which is my preference. Using the MacBook, the only options are 3840x2160 (native), 1920x1080 (native HiDPI), 1504x846 (scaled HiDPI) and 1152x648 (scaled HiDPI).&lt;/p&gt;

&lt;p&gt;Fortunately I didn’t buy this monitor to use with the MacBook, so I’m not at all bothered.&lt;/p&gt;

&lt;h2 id=&#34;the-imperfect-but-inconsequential&#34;&gt;The imperfect, but inconsequential&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The rotating stand needs a little fiddling to sit level.&lt;/li&gt;
&lt;li&gt;The OSD is still HD, not UHD (it pixel doubles)&lt;/li&gt;
&lt;li&gt;16:9 ratio, but we lost that battle a long time ago&lt;/li&gt;
&lt;li&gt;There are two DisplayPort ports but only one appears in the OSD and only one seems to work (the one near the power cable). No idea why.&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>Common Production Tasks</title>
      <link>https://ianhowson.com/openedx/common-production-tasks/</link>
      <pubDate>Mon, 10 Nov 2014 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/openedx/common-production-tasks/</guid>
      <description>

&lt;p&gt;Also see &lt;a href=&#34;https://github.com/edx/configuration/wiki/edX-Managing-the-Production-Stack#updating-versions-using-edx-repos&#34;&gt;https://github.com/edx/configuration/wiki/edX-Managing-the-Production-Stack#updating-versions-using-edx-repos&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&#34;install-updates&#34;&gt;Install updates&lt;/h2&gt;

&lt;p&gt;As &lt;code&gt;root&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;/edx/bin/update edx-platform release&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&#34;upgrade-procedure&#34;&gt;Upgrade procedure&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;put up a ‘down for maintenance’ message&lt;/li&gt;
&lt;li&gt;make sure the original server is accessible to you (public must not be able to make changes); might need to close any existing connections&lt;/li&gt;
&lt;li&gt;take a snapshot using LXC (have to take down the server to do this); also verify that the snapshots can be restored if things go badly&lt;/li&gt;
&lt;li&gt;perform the upgrades&lt;/li&gt;
&lt;li&gt;verify that upgrades are working correctly&lt;/li&gt;
&lt;li&gt;remove the ‘down for maintenance’ message&lt;/li&gt;
&lt;li&gt;LATER: remove any snapshots once you’re sure that they’re not needed

&lt;ul&gt;
&lt;li&gt;List LXC snapshots with &lt;code&gt;lxc-snapshot -L -n &amp;lt;container name&amp;gt;&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;update-database-tables&#34;&gt;Update database tables&lt;/h2&gt;

&lt;p&gt;On Devstack:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;paver update_db -s devstack&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In production:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;cd /edx/app/edxapp/edx-platform  # Run as the ubuntu user on the production host.
sudo -u www-data /edx/bin/python.edxapp ./manage.py lms migrate --settings=aws&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
    </item>
    
    <item>
      <title>Configuration</title>
      <link>https://ianhowson.com/openedx/configuration/</link>
      <pubDate>Mon, 10 Nov 2014 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/openedx/configuration/</guid>
      <description>

&lt;p&gt;After installation, there are a lot of settings that you&amp;rsquo;ll need to tweak to suit your situation. There&amp;rsquo;s a fast way to do this and a &amp;lsquo;correct&amp;rsquo; way to do this.&lt;/p&gt;

&lt;p&gt;Underneath, there&amp;rsquo;s one master config file (&lt;code&gt;server-vars.yml&lt;/code&gt;) which generates config files for each of the components.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s a list of potential config variables here: &lt;a href=&#34;http://iambusychangingtheworld.blogspot.com.au/2014/05/edx-platform-server-varsyaml-variables.html&#34;&gt;http://iambusychangingtheworld.blogspot.com.au/2014/05/edx-platform-server-varsyaml-variables.html&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&#34;the-quick-way&#34;&gt;The quick way&lt;/h2&gt;

&lt;p&gt;Modify the config files in &lt;code&gt;/edx/app/edxapp/&lt;/code&gt; directly. &lt;code&gt;lms.env.json&lt;/code&gt; and &lt;code&gt;cms.env.json&lt;/code&gt; contain the most useful variables.&lt;/p&gt;

&lt;p&gt;The issue with this method is that your changes could be overwritten during an upgrade, so you&amp;rsquo;ll need to reapply them manually. The upside is that you can try things out relatively quickly, which is nice when you&amp;rsquo;re experimenting.&lt;/p&gt;

&lt;p&gt;After you modify a config file, you&amp;rsquo;ll need to restart the relevant service using &lt;code&gt;supervisorctl&lt;/code&gt;. Usually, this means:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;/edx/bin/supervisorctl restart edxapp:  # Run as root. Note the trailing colon.&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&#34;the-right-way&#34;&gt;The right way&lt;/h2&gt;

&lt;p&gt;This method is more tolerant of server upgrades. Everything is stored in source control so it can be quickly deployed later.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create /edx/app/edx_ansible/server-vars.yml

&lt;ul&gt;
&lt;li&gt;But which script creates this file from source control? Is it from the &lt;code&gt;configuration&lt;/code&gt; repo?&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;ansible&lt;/code&gt; to generate the service config files&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;suggested-settings&#34;&gt;Suggested settings&lt;/h2&gt;

&lt;p&gt;You should almost certainly change the following settings:&lt;/p&gt;

&lt;p&gt;CODE_JAIL/python_bin: I make this something that isn&amp;rsquo;t real. I don&amp;rsquo;t run programming MOOCs and &lt;em&gt;do&lt;/em&gt; use LXC (which doesn&amp;rsquo;t have AppArmor support) and so want to hobble the sandbox as much as possible for security purposes.&lt;/p&gt;

&lt;p&gt;Various email addresses&lt;/p&gt;

&lt;p&gt;The Facebook address&lt;/p&gt;

&lt;p&gt;Twitter address&lt;/p&gt;

&lt;p&gt;SITE_NAME (most stuff works without it, but occasionally you&amp;rsquo;ll get broken links/IP addresses showing through)&lt;/p&gt;

&lt;p&gt;TIME_ZONE&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>BitTorrent Sync</title>
      <link>https://ianhowson.com/blog/bittorrent-sync/</link>
      <pubDate>Wed, 01 Oct 2014 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/blog/bittorrent-sync/</guid>
      <description>

&lt;!-- now &#39;resilio sync&#39; --&gt;

&lt;p&gt;Once upon a time, before Dropbox was a thing, I had a big desktop and a small laptop. I thought, wouldn&amp;rsquo;t it be wonderful if my files could be the same on both sides? And if it could do that automatically, without me asking. So I started writing a program called SyncDroid (no relation to the Android app) to do file-sync-over-LAN. I wrote &lt;a href=&#34;https://ianhowson.com/blog/file-synchronisation-algorithms/&#34;&gt;some blog posts&lt;/a&gt; to explain my thinking.&lt;/p&gt;

&lt;p&gt;Time passed, and I got busy with other things. I ceased having a consistent desk and became completely dependent on my laptop, which got much bigger to compensate for having to do &lt;em&gt;everything&lt;/em&gt; and at the same time only slightly physically heavier thanks to the wonders of technology.&lt;/p&gt;

&lt;p&gt;Time passed some more, and I find myself at the same desk for three days of the week, with a really nice desktop and another very nice but just-not-quite-as-beefy laptop. Dropbox is not a good option due to the proud tradition of Crap Australian Internet, and besides, security and cloud services do not mix. (Yes, I am aware of SpiderOak. No, I will not use it until I can audit it and compile it myself.)&lt;/p&gt;

&lt;p&gt;So BitTorrent Sync is a thing, which is basically what I dreamed of when I started SyncDroid. Zero-interaction LAN file sync between machines. No dependency on Internet services. Free. Sold.&lt;/p&gt;

&lt;p&gt;File sync is a really tricky problem. It cannot be fully and correctly solved without massively overhauling how applications deal with data. (The Cloud helps a lot. It is not the complete solution.) Therefore, I have some advice on how to make BitTorrent Sync work without too much pain or unexpected data loss.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When you&amp;rsquo;re starting out&lt;/strong&gt; with say, your Documents folder, don&amp;rsquo;t try to sync two complete versions of the folder. You&amp;rsquo;ll end up with all files from both sides &lt;em&gt;on&lt;/em&gt; both sides, and/or a bunch of conflicts, where you just expected nothing to happen. Much better to delete all of one side and sync it across. It is slow, sadly, but you only have to do it once.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BitTorrent Sync has a &amp;lsquo;relay service&amp;rsquo;&lt;/strong&gt;. If two machines are not on the same LAN but do have Internet access, they can talk through the relay service.&lt;/p&gt;

&lt;p&gt;I live in the Land Down Under with Slow Internet, so relaying through servers in the US is too slow to be useful. In each folder&amp;rsquo;s preferences (on &lt;em&gt;every single peer&lt;/em&gt;) you need to unclick &amp;lsquo;Use tracker server&amp;rsquo;, &amp;lsquo;Search DHT network&amp;rsquo; and &amp;lsquo;Use relay server when required&amp;rsquo;. (Try version 1.4.83 if the setting isn&amp;rsquo;t working.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Checksumming and transferring large files (like VM images) takes a long time&lt;/strong&gt;. There is still room for someone to make an efficient VM synchronisation system. It might be impossible to make it &amp;lsquo;nice&amp;rsquo;, but you could at least provide snapshots or something rather than leaving one side corrupted most of the time. Parallels might do this by accident, but I&amp;rsquo;m not willing to risk my data to find out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When you&amp;rsquo;re setting up to start with&lt;/strong&gt;, I found it easiest to copy the Share URLs into a file in Dropbox and copy-paste them into BTS on the receiving end.&lt;/p&gt;

&lt;p&gt;DO NOT copy the keys into Dropbox if you worry about the NSA reading your data. Those keys don&amp;rsquo;t expire and give access to your data. Dropbox keeps snapshots of everything and the NSA works with Dropbox. Of course, we can&amp;rsquo;t audit BTS anyway, so probably best to keep government secrets locked away a little more securely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pay attention to the &amp;lsquo;Store deleted files in folder archive&amp;rsquo; setting&lt;/strong&gt;. Definitely turn it off for VM folders.&lt;/p&gt;

&lt;p&gt;Deleted or modified files go into an archive folder (&lt;code&gt;.sync/Archive&lt;/code&gt; under the synced folder root). I&amp;rsquo;ve seen references to a 30 day cleanup period, but am yet to confirm this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don&amp;rsquo;t use BTS to sync your Dropbox folder between machines&lt;/strong&gt; unless the Dropbox client is only running on one of them. They&amp;rsquo;ll confuse each other. Dropbox already does LAN sync.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rather than having many shares&lt;/strong&gt;, you can store the canonical copy of each folder in a synced folder and then symlink it to where you want it to appear.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sync isn&amp;rsquo;t necessarily between two machines.&lt;/strong&gt; You can sync three or more machines to the same folder.&lt;/p&gt;

&lt;p&gt;This would be awesome if you had, say, office workers who need disconnected access to a shared folder. You can then disconnect a machine, keep the local copy, modify it and have your changes sync when you reconnect. This might cut down your need for corporate fileservers and VPNs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you&amp;rsquo;re using a Mac, you might want to prevent BTS from syncing .FinderInfo and .ResourceFork files&lt;/strong&gt;. As of October 2014 (version 1.4.83) they fail to sync but BTS can&amp;rsquo;t figure out why, causing your folders to perpetually be out of sync. Add the following to the end of .sync/IgnoreList in your folder:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;*.FinderInfo
*.ResourceFork&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Hopefully you don&amp;rsquo;t actually need the resource fork for any of your files. Does OS X actually use the resource fork these days?&lt;/p&gt;

&lt;p&gt;I had to disconnect the folder from each peer and reconnect it. The peers remember all of the FinderInfo files that they&amp;rsquo;re meant to be ignoring. Disconnecting forces BTS to start over without the FinderInfo files. Sometimes you can just disconnect and reconnect one peer (usually the one that started with all of the data).&lt;/p&gt;

&lt;p&gt;This also highlights a nuisance in the system: configuration is not synced. You need to do this on every single machine. Did I mention that file sync is nontrivial?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;.SyncIgnore is not a thing any more&lt;/strong&gt;. It is now called &lt;code&gt;.sync/IgnoreList&lt;/code&gt;. The format and use of the file is the same, but &lt;code&gt;.SyncIgnore&lt;/code&gt; no longer works.&lt;/p&gt;

&lt;p&gt;This is a bit of a shame, really, because (I never tested this, but&amp;hellip;) &lt;code&gt;.SyncIgnore&lt;/code&gt; could get synced automatically, saving you from manually making the same config changes on each host. Perhaps it caused flapping and conflicts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;There is useful logging&lt;/strong&gt; in &lt;code&gt;~/Library/Application Support/BitTorrent Sync/sync.log&lt;/code&gt;. You don&amp;rsquo;t need to turn on debug logging in the menu (it&amp;rsquo;s extremely verbose).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sleep mode might be affected on Macs?&lt;/strong&gt; My MacBook seems to run down the battery while it&amp;rsquo;s supposed to be sleeping. And it&amp;rsquo;s definitely awake some of the time &amp;ndash; sync continues while it&amp;rsquo;s asleep. Hopefully it doesn&amp;rsquo;t do this while it&amp;rsquo;s disconnected from the network (i.e. in my bag, away from home).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parallels virtual machines&lt;/strong&gt; take forever to sync after modification and burn a lot of CPU power.&lt;/p&gt;

&lt;p&gt;Hopefully you realised this already, but &lt;em&gt;never boot the same VM image on two different machines at once&lt;/em&gt;. BTS will faithfully propagate the changes to the other machines, which will be unaware that their disk images are changing underneath them, and you&amp;rsquo;ll probably end up with corrupt, unusable VMs everywhere.&lt;/p&gt;

&lt;p&gt;I still can&amp;rsquo;t recommend BTS for synchronising virtual machines. Hashing a 60GB image just takes too long. To reduce the time, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Split the VM image into 2GB chunks

&lt;ul&gt;
&lt;li&gt;These should take under 20 seconds to hash, though it&amp;rsquo;s still a long time for one file; many might change&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;Take snapshots when the VM is stable (i.e. will not change too much more)

&lt;ul&gt;
&lt;li&gt;The snapshots will take a while to sync, but they shouldn&amp;rsquo;t change much afterwards. The diff-since-snapshot should be relatively small and easy to synchronise&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;Put data files in the host filesystem (i.e. outside the VM) wherever possible

&lt;ul&gt;
&lt;li&gt;This is a good strategy if you use Time Machine or other backups, too. An entire VM image is difficult to back up efficiently for the same reason that it&amp;rsquo;s difficult to synchronise. Should the VM be corrupted, your data is still intact &lt;em&gt;if&lt;/em&gt; you kept it outside the VM. The data is also relatively small and easy to synchronise.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I also see relatively slow sync speeds (3-6MB/sec, even though the network will easily do ten times that). That&amp;rsquo;s a different avenue that I should explore.&lt;/p&gt;

&lt;p&gt;I am tempted to write a VM-specific sync application that solves these problems, but it&amp;rsquo;s very likely that I can do no better anyway. If you have some spare time and want to try it, I suggest:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When scanning for changes in a large image, skim across the whole at say, 1k or 4k intervals and just check a byte at a time. I&amp;rsquo;m hoping that this will let you compute a hash while still detecting the area that changes occur in fairly quickly. (This might not actually help, as your disk will still need to retrieve all of the data anyway; perhaps try larger intervals like 1M or 20M.)&lt;/li&gt;
&lt;li&gt;Transfer data faster. I don&amp;rsquo;t know why BTS is so slow for me. Perhaps it&amp;rsquo;s to keep the machine load low, or perhaps it&amp;rsquo;s a bug.&lt;/li&gt;
&lt;li&gt;Changes should be in-place as there&amp;rsquo;s usually not enough disk space (or time) to create a duplicate file and copy it atomically.&lt;/li&gt;
&lt;li&gt;Put in some application-specific knowledge, such as

&lt;ul&gt;
&lt;li&gt;taking advantage of Parallels&amp;rsquo; snapshots to transfer less data &lt;em&gt;and&lt;/em&gt; maintain correctness of data if the sync is not complete&lt;/li&gt;
&lt;li&gt;lock the VM image so it cannot be modified if it is inconsistent&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;
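
&lt;p&gt;As a rough sketch of the first idea (sampling one byte at fixed intervals and hashing the samples per region), something like the following Python could work. The function names, the 4k stride and the 1M region size are assumptions made for illustration, not anything BTS does, and the approach will miss any change that doesn&amp;rsquo;t touch a sampled byte.&lt;/p&gt;

```python
import hashlib


def sample_signature(path, stride=4096, region=1024 * 1024):
    """Return one digest per region, built from one byte every `stride` bytes."""
    digests = []
    with open(path, "rb") as f:
        size = f.seek(0, 2)  # seek to the end to learn the file size
        for start in range(0, size, region):
            h = hashlib.sha1()
            end = min(start + region, size)
            for offset in range(start, end, stride):
                f.seek(offset)
                h.update(f.read(1))
            digests.append(h.hexdigest())
    return digests


def changed_regions(old, new):
    """Return indices of regions whose sampled digest differs between two scans."""
    longest = max(len(old), len(new))
    return [
        i for i in range(longest)
        if i >= len(old) or i >= len(new) or old[i] != new[i]
    ]
```

&lt;p&gt;You would keep the signature from the previous scan, compute a new one, and only transfer the regions that &lt;code&gt;changed_regions&lt;/code&gt; reports. Whether the seeks are actually cheaper than reading the whole file depends on the disk, as noted above.&lt;/p&gt;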

&lt;blockquote&gt;
&lt;p&gt;Underneath it all, most filesystems (a) do not track changes within a file, and (b) do not checksum files. If you were to put your VM images on say, a ZFS volume, changes to them could be synchronised &lt;em&gt;very quickly and efficiently&lt;/em&gt; (seconds, instead of hours) simply because the filesystem already keeps the hashes and diffs that are needed for the synchronisation app to do its job. Without that information the app must scan through the (extremely large) VM images to find the (relatively small) changes that it should propagate.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;If you restore a peer from Time Machine&lt;/strong&gt;, things seem to go screwy. By &amp;lsquo;screwy&amp;rsquo;, I mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Files on &lt;em&gt;all peers&lt;/em&gt; reverted to the time of the Time Machine backup (very bad!)&lt;/li&gt;
&lt;li&gt;Failure to resync (FinderInfo issues resurface)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you&amp;rsquo;re going to restore a peer from Time Machine, I would suggest removing any synced folders from it altogether and resyncing them from the other peers.&lt;/p&gt;

&lt;h2 id=&#34;notes-on-specific-applications&#34;&gt;Notes on specific applications&lt;/h2&gt;

&lt;h3 id=&#34;office-for-mac-2011&#34;&gt;Office for Mac 2011&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Outlook for Mac 2011 stores its data files in Documents, so if you sync that, rename each machine&amp;rsquo;s identity so they don&amp;rsquo;t conflict. (You&amp;rsquo;ll get duplicate emails and error messages.)&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Shut down all Office applications&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Go to &lt;code&gt;Documents/Microsoft User Data/Office 2011 Identities&lt;/code&gt; and rename &amp;lsquo;Main Identity&amp;rsquo; (or whatever you use) to something else; I use &amp;lsquo;Main Identity &amp;lt;computer name&amp;gt;&amp;rsquo;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Open Microsoft Database Utility, click your renamed identity, click the gear icon and click &amp;lsquo;Set as Default&amp;rsquo;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>Custom Theme</title>
      <link>https://ianhowson.com/openedx/custom-theme/</link>
      <pubDate>Thu, 25 Sep 2014 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/openedx/custom-theme/</guid>
      <description>

&lt;h2 id=&#34;theming&#34;&gt;Theming&lt;/h2&gt;

&lt;p&gt;See here: &lt;a href=&#34;https://github.com/edx/edx-platform/wiki/Developing-on-the-edX-Developer-Stack#configuring-themes-in-devstack&#34;&gt;https://github.com/edx/edx-platform/wiki/Developing-on-the-edX-Developer-Stack#configuring-themes-in-devstack&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When you go to rename the .scss file, leave the leading underscore in place or you&amp;rsquo;ll get:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;error /edx/app/edxapp/themes/usbs/static/sass/example.scss (Line &lt;span style=&#34;color:#ff0;font-weight:bold&#34;&gt;47&lt;/span&gt;: Undefined variable: &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#34;&lt;/span&gt;$sans&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;-serif&amp;#34;&lt;/span&gt;.)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;i.e. rename it to &lt;code&gt;_example.scss&lt;/code&gt;, not &lt;code&gt;example.scss&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I also had issues if the LMS was running at the same time I was trying to update (&lt;code&gt;Line 47: Undefined variable: &amp;quot;$sans-serif&amp;quot;.&lt;/code&gt;). Things worked much better if I shut it down first.&lt;/p&gt;

&lt;p&gt;To run the custom theme, you need to start the LMS with&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;paver devstack lms&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This will recompile everything at startup. The easiest way I&amp;rsquo;ve found to do theme development is to just Ctrl-C the paver process and restart it when I change something.&lt;/p&gt;

&lt;p&gt;Then, to deploy the theme on your production server:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;TODO

/edx/bin/supervisorctl -c /edx/etc/supervisord.conf restart edxapp:
# Note the colon at the end!&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&#34;deploying-themes&#34;&gt;Deploying themes&lt;/h2&gt;

&lt;p&gt;The default reference is at &lt;a href=&#34;https://github.com/edx/edx-platform/wiki/Custom-Theming&#34;&gt;https://github.com/edx/edx-platform/wiki/Custom-Theming&lt;/a&gt;. It mostly worked, but I had issues with the SSH key; &lt;code&gt;/edx/app/edxapp/tmp_id_rsa&lt;/code&gt; was being zeroed.&lt;/p&gt;

&lt;p&gt;I could not find any reference to &lt;code&gt;EDXAPP_LOCAL_GIT_IDENTITY&lt;/code&gt; in the edX code. The relevant Ansible playbook uses the &lt;code&gt;content&lt;/code&gt; directive, so&amp;hellip; I guess we put the private key directly in &lt;code&gt;server-vars.yml&lt;/code&gt;. Don&amp;rsquo;t do this for a key with read-write access to anything!&lt;/p&gt;

&lt;p&gt;Here&amp;rsquo;s my working &lt;code&gt;server-vars.yml&lt;/code&gt;, sans key. Note the indentation before the contents of the key.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;edxapp_use_custom_theme: &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;true&lt;/span&gt;
edxapp_theme_name: &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39;themename&amp;#39;&lt;/span&gt;
edxapp_theme_source_repo: &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39;git@bitbucket.org:username/themename-edx-theme.git&amp;#39;&lt;/span&gt;
edxapp_theme_version: &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39;HEAD&amp;#39;&lt;/span&gt;
edxapp_git_identity: &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39;/edx/app/edxapp/tmp_id_rsa&amp;#39;&lt;/span&gt;
EDXAPP_GIT_IDENTITY: &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;|
&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;  -----BEGIN RSA PRIVATE KEY-----
&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;  MII...
&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;  ...
&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;  ...Wg
&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;  -----END RSA PRIVATE KEY-----&lt;/span&gt;
EDXAPP_USE_GIT_IDENTITY: &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;true&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
    </item>
    
    <item>
      <title>Getting Started</title>
      <link>https://ianhowson.com/openedx/getting-started/</link>
      <pubDate>Thu, 25 Sep 2014 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/openedx/getting-started/</guid>
      <description>&lt;p&gt;&lt;em&gt;I know nothing about edX and want an instance to start playing with right away. What&amp;rsquo;s the easiest thing to do?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Set up the &lt;a href=&#34;https://github.com/edx/configuration/wiki/Single-AWS-server-installation-using-Amazon-Machine-Image&#34;&gt;Amazon AMI&lt;/a&gt;. You can have an instance to work with inside an hour.&lt;/p&gt;

&lt;p&gt;Amazon&amp;rsquo;s hosting is pretty expensive, but you&amp;rsquo;ll be up and running fast.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;OK, so USD$1500/year for Amazon hosting is pretty insane. What do I do?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Install edX using the Ubuntu 12.04 instructions below. You have a few options when looking for a machine to run it on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Find a VPS host. In the US, I&amp;rsquo;ve had great experiences with RamNode. I rent a VPS from them with similar specs to the Amazon AMI and (as of Sep 2014) it&amp;rsquo;s costing USD$120/year.&lt;/li&gt;
&lt;li&gt;In Australia, I&amp;rsquo;m happy with &lt;a href=&#34;http://www.ransomit.com.au/vps&#34;&gt;Ransom IT&lt;/a&gt;, though you either have to buy their largest services or negotiate something with the owner. That comes to $480/year.

&lt;ul&gt;
&lt;li&gt;You can trim down edX&amp;rsquo;s memory usage without too much trouble. This lets you fit into smaller and cheaper VPSes.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;You could also buy a really nice desktop computer for $500-$1000 and run edX directly off it. This assumes that you have a fast Internet connection and your admins don&amp;rsquo;t mind or don&amp;rsquo;t know that you have a public-facing server within your office.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;(Obviously, these prices will change. USD$1500/year is what Australia-region Amazon hosting would cost us once bandwidth and storage are factored in. Both Amazon and VPS hosts are continually dropping their prices, so do your own research, slacker!)&lt;/p&gt;

&lt;p&gt;You can run the Production edX stack on your dev machine/laptop, but it uses 4GB of RAM and isn&amp;rsquo;t at all convenient for development. It runs tolerably with 2GB of RAM. You probably want the Developer Stack instead.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Hosting Options</title>
      <link>https://ianhowson.com/openedx/hosting-options/</link>
      <pubDate>Thu, 25 Sep 2014 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/openedx/hosting-options/</guid>
      <description>

&lt;p&gt;Assuming that you&amp;rsquo;re setting up a small instance (fewer than 1000 users), the hardest hosting requirement to meet is having enough RAM. Open edX is composed of many software components, all of which use RAM even when they&amp;rsquo;re idle.&lt;/p&gt;

&lt;!--
add to edx post: here is an example of a similar very large site and the hardware they need http://nickcraver.com/blog/2013/11/22/what-it-takes-to-run-stack-overflow/
in short, a frighteningly large site (#237 in the world) runs on only a few machines. I expect that edX is significantly less efficient, but still, you can serve a lot of people on not much hardware. You need to have a really large site before you need serious AWS-style scaling.
--&gt;

&lt;p&gt;Put simply, you need 4GB of RAM. More won&amp;rsquo;t hurt. You can get away with less, but you need to fiddle around; see my post on &lt;a href=&#34;https://ianhowson.com/openedx/reducing-memory-consumption/&#34;&gt;reducing edX memory consumption&lt;/a&gt; for more details.&lt;/p&gt;

&lt;p&gt;For a single machine instance, you&amp;rsquo;ve got a few options for hosting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Amazon Web Services&lt;/li&gt;
&lt;li&gt;A VPS&lt;/li&gt;
&lt;li&gt;Your own hardware&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;amazon-web-services-aws&#34;&gt;Amazon Web Services (AWS)&lt;/h2&gt;

&lt;p&gt;Pros:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Extremely easy to set up (minutes)&lt;/li&gt;
&lt;li&gt;Scales up as much as you like

&lt;ul&gt;
&lt;li&gt;Amazon&amp;rsquo;s tools make scaling extremely easy&lt;/li&gt;
&lt;li&gt;Open edX is designed to run on AWS and automatically scale, so you&amp;rsquo;ll save a lot of effort if your instance is big enough&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;If you&amp;rsquo;ve got a large/busy setup you might end up saving money by scaling up/down with demand&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Expensive

&lt;ul&gt;
&lt;li&gt;Amazon will bill you for instance usage, disk space, I/O and network usage separately.&lt;/li&gt;
&lt;li&gt;In Sep 2014 the instance costs $0.098/hour (in Sydney), but my actual running costs for an &lt;em&gt;empty&lt;/em&gt; edX setup were around $120/month due to the extras.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;Low performance disk and CPU

&lt;ul&gt;
&lt;li&gt;You have to worry about scaling issues sooner&lt;/li&gt;
&lt;li&gt;You have to buy more hardware to compensate for your low-performance hardware&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;Amazon doesn&amp;rsquo;t make any guarantees about uptime. If you only have one machine and it goes down, your whole edX instance will be down until you start up a new machine.

&lt;ul&gt;
&lt;li&gt;Amazon machines used to be quite unreliable and go down every few weeks. In the last year or two they&amp;rsquo;ve been much better and will usually last for months at a time.&lt;/li&gt;
&lt;li&gt;Amazon&amp;rsquo;s storage infrastructure (EBS) means that if a machine goes down, you can just start a new one; your data should remain intact.&lt;/li&gt;
&lt;li&gt;Starting up a new machine is pretty easy, but you&amp;rsquo;ll be out for a few minutes, assuming you find out about the outage quickly.&lt;/li&gt;
&lt;li&gt;Amazon encourage you to design your service to tolerate machine outages.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;
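
&lt;p&gt;To make the cost point concrete, here is the arithmetic behind the figures quoted above ($0.098/hour for the instance, around $120/month all up). It&amp;rsquo;s only a back-of-envelope sketch; the 30-day month and the variable names are assumptions for illustration.&lt;/p&gt;

```python
HOURS_PER_MONTH = 24 * 30  # a 30-day month, for a rough estimate


def monthly_instance_cost(hourly_rate):
    """Cost of leaving one instance running for a whole month."""
    return hourly_rate * HOURS_PER_MONTH


instance_only = monthly_instance_cost(0.098)  # Sydney rate quoted above
observed_total = 120.0                        # monthly bill for the empty setup
extras = observed_total - instance_only       # disk, I/O and network charges

print("instance only: ${:.2f}/month".format(instance_only))
print("extras:        ${:.2f}/month".format(extras))
print("yearly total:  ${:.0f}".format(observed_total * 12))
```

&lt;p&gt;In other words, the instance itself accounts for roughly $70 of the monthly bill, and the separately billed extras make up the remaining $50 or so.&lt;/p&gt;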

&lt;h3 id=&#34;moving-an-ami-to-your-region&#34;&gt;Moving an AMI to your region&lt;/h3&gt;

&lt;p&gt;The Open edX AMIs are only available in a few regions. I wanted one in Sydney. To transfer the AMI to ap-southeast-2 (Sydney):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Switch to the region that you&amp;rsquo;re going to copy the AMI from (e.g. us-east-1)&lt;/li&gt;
&lt;li&gt;Ideally, you&amp;rsquo;d find it in the directory, go to Actions, hit Copy AMI and choose Sydney. For whatever reason, Copy AMI is disabled, so&amp;hellip;&lt;/li&gt;
&lt;li&gt;Create a new Micro instance in us-east-1 using the openedx AMI&lt;/li&gt;
&lt;li&gt;Boot the instance&lt;/li&gt;
&lt;li&gt;In the instance&amp;rsquo;s Actions menu, click Create Image&lt;/li&gt;
&lt;li&gt;Wait a few minutes. Don&amp;rsquo;t shut down the instance yet.&lt;/li&gt;
&lt;li&gt;In the AMIs page, the new AMI will list as &amp;lsquo;pending&amp;rsquo;. Wait until it&amp;rsquo;s &amp;lsquo;available&amp;rsquo;.&lt;/li&gt;
&lt;li&gt;You can now shut down the new instance&lt;/li&gt;
&lt;li&gt;In the AMIs page, &lt;em&gt;now&lt;/em&gt; you can go to Actions-&amp;gt;Copy AMI to Sydney.&lt;/li&gt;
&lt;li&gt;Wait about half an hour&lt;/li&gt;
&lt;/ul&gt;
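
&lt;p&gt;The console steps above could probably be scripted too. This is a sketch only, assuming the &lt;code&gt;boto3&lt;/code&gt; library (which the steps above don&amp;rsquo;t use) and its EC2 client calls &lt;code&gt;create_image&lt;/code&gt;, &lt;code&gt;get_waiter&lt;/code&gt; and &lt;code&gt;copy_image&lt;/code&gt;. The function name and its arguments are hypothetical, and it assumes you have already launched and booted the instance from the openedx AMI.&lt;/p&gt;

```python
def copy_ami_via_instance(src_ec2, dst_ec2, instance_id, name, src_region):
    """Image a running instance, wait for the AMI, then copy it to another region.

    `src_ec2` and `dst_ec2` are EC2 clients for the source and destination
    regions, e.g. boto3.client("ec2", region_name="us-east-1") and
    boto3.client("ec2", region_name="ap-southeast-2").
    """
    # Create Image from the instance that was booted off the openedx AMI
    image_id = src_ec2.create_image(InstanceId=instance_id, Name=name)["ImageId"]

    # Wait until the new AMI goes from 'pending' to 'available'
    src_ec2.get_waiter("image_available").wait(ImageIds=[image_id])

    # Copy AMI; the request is made against the destination region
    copy = dst_ec2.copy_image(
        Name=name, SourceImageId=image_id, SourceRegion=src_region
    )
    return copy["ImageId"]
```

&lt;p&gt;The clients are passed in rather than created inside the function, so credentials and regions stay your decision. The copy itself still takes a while to finish after the call returns.&lt;/p&gt;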

&lt;h3 id=&#34;ebs-vs-instance-storage&#34;&gt;EBS vs. Instance Storage&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;EBS is persistent, highly reliable storage. It outlives the instance it&amp;rsquo;s attached to and lasts until you delete it.&lt;/li&gt;
&lt;li&gt;Instance Store is temporary, high-performance storage. Physically, it&amp;rsquo;s disks on the same VM server that your instance is running on. Once you shut down the instance, anything that you put in here is lost.

&lt;ul&gt;
&lt;li&gt;Obvious uses for this are &lt;code&gt;/tmp&lt;/code&gt; and swap&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;a-virtual-private-server-vps&#34;&gt;A Virtual Private Server (VPS)&lt;/h2&gt;

&lt;p&gt;Pros:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Affordable

&lt;ul&gt;
&lt;li&gt;Quality hosting in the US is about USD$120/year.&lt;/li&gt;
&lt;li&gt;In Australia, AUD$480/year.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;Per-machine, performance is higher than AWS (assuming you&amp;rsquo;ve chosen a reputable VPS host that doesn&amp;rsquo;t overprovision their machines). Fewer machines means less maintenance.&lt;/li&gt;
&lt;li&gt;You might get an uptime guarantee

&lt;ul&gt;
&lt;li&gt;This is probably not worth anything; your compensation if a machine fails will be an email apology, if anything&lt;/li&gt;
&lt;li&gt;You can check reviews and uptime reports to see if your host has a good record&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;You could rent a managed VPS and thereby outsource some of the administration hassle

&lt;ul&gt;
&lt;li&gt;Administering a Linux VPS is trivially easy compared with Open edX, so I wouldn&amp;rsquo;t bother spending the money&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scaling up/down with demand is your problem.

&lt;ul&gt;
&lt;li&gt;For a small enough instance (a few thousand users), you&amp;rsquo;ll fit on one machine, so this isn&amp;rsquo;t an issue&lt;/li&gt;
&lt;li&gt;You could build your own &lt;a href=&#34;http://www.openstack.org/&#34;&gt;OpenStack&lt;/a&gt; or &lt;a href=&#34;http://cloudstack.apache.org/&#34;&gt;CloudStack&lt;/a&gt; cluster, but this involves significant engineering effort; AWS probably works out cheaper once engineering cost is considered.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&#34;openvz-or-kvm&#34;&gt;OpenVZ or KVM?&lt;/h3&gt;

&lt;p&gt;Roughly, OpenVZ isolates multiple users of the same hardware and kernel from each other. KVM is a virtual machine, so it will behave more like a real computer.&lt;/p&gt;

&lt;p&gt;Both have some advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenVZ can change RAM, disk and CPU allocations instantly

&lt;ul&gt;
&lt;li&gt;There&amp;rsquo;s no rebooting or repartitioning if you need to scale up/down&lt;/li&gt;
&lt;li&gt;This is nice if you&amp;rsquo;re not sure how much hardware you need or you get unexpected traffic&lt;/li&gt;
&lt;li&gt;With KVM, you need to reboot and resize the partitions by hand; downtime could be significant&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;KVM can use &lt;code&gt;swapspace&lt;/code&gt; (file-based swap) or &lt;code&gt;zram&lt;/code&gt; (compressed RAM-backed swap) to squeeze more RAM out of the same hardware

&lt;ul&gt;
&lt;li&gt;OpenVZ usually blocks you from changing the swap config; you will need to pay for more RAM if you run out&lt;/li&gt;
&lt;li&gt;If you run out of RAM, the OOM killer will kill a process. Usually nothing bad will happen, but occasionally it&amp;rsquo;ll hit something important, like Postgres.&lt;/li&gt;
&lt;li&gt;You can&amp;rsquo;t load kernel modules in OpenVZ, so you can&amp;rsquo;t use &lt;code&gt;zram&lt;/code&gt; unless the host explicitly supports it (unlikely)&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;KVM can use LXC or OpenVZ containers within your VM

&lt;ul&gt;
&lt;li&gt;You can nest virtual machines inside your virtual machines. Confused yet?&lt;/li&gt;
&lt;li&gt;This is really handy for running production and staging systems on the same (virtual) hardware without paying for a second VPS&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My preference is KVM, but if I had more users, I&amp;rsquo;d be leaning toward OpenVZ or Amazon.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s some excellent discussion of different virtualisation options &lt;a href=&#34;http://www.reddit.com/r/webhosting/comments/1pf6t3/psa_know_what_vps_youre_buying/&#34;&gt;over here&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&#34;your-own-hardware&#34;&gt;Your own hardware&lt;/h2&gt;

&lt;p&gt;Pros:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cheap! A $400 desktop is plenty to get started and there are no ongoing rental fees.&lt;/li&gt;
&lt;li&gt;Fast&lt;/li&gt;
&lt;li&gt;You might have suitable hardware lying around already&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need a really fast Internet connection&lt;/li&gt;
&lt;li&gt;Any hardware failures are your problem&lt;/li&gt;
&lt;li&gt;You need to know how to build and maintain the hardware&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>Monitoring</title>
      <link>https://ianhowson.com/openedx/monitoring/</link>
      <pubDate>Thu, 25 Sep 2014 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/openedx/monitoring/</guid>
      <description>

&lt;h2 id=&#34;setting-up-sentry-crash-reporting&#34;&gt;Setting up Sentry crash reporting&lt;/h2&gt;

&lt;p&gt;If you want to provide a reliable service, it&amp;rsquo;s extremely important to be aware of when things are going wrong on the website.&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;http://getsentry.com&#34;&gt;Sentry&lt;/a&gt; is a wonderful free system to catch Python exceptions.&lt;/p&gt;

&lt;p&gt;For Django, we use &lt;a href=&#34;http://raven.readthedocs.org/en/latest/index.html&#34;&gt;raven&lt;/a&gt; to catch and report the errors back to Sentry.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Set up a Sentry server&lt;/p&gt;

&lt;p&gt;This is left as an exercise to the reader. Commercial hosting is available if you don&amp;rsquo;t want to administer another service (or &lt;a href=&#34;mailto:ian@mutexlabs.com&#34;&gt;email me&lt;/a&gt; and I&amp;rsquo;ll do it for a fixed fee).&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Set up two new services (LMS and Studio) in Sentry and get their DSN strings&lt;/p&gt;

&lt;p&gt;The DSN strings are inside your Sentry instance under the Settings-&amp;gt;Python-&amp;gt;Django page.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Modify &lt;code&gt;/edx/app/edxapp/edx-platform/lms/envs/common.py&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Add to the end of the file:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Sentry integration
INSTALLED_APPS += (&#39;raven.contrib.django.raven_compat&#39;,)

RAVEN_CONFIG = {
    &#39;dsn&#39;: &#39;&amp;lt;your-DSN-string&amp;gt;&#39;,
}
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Install the &lt;code&gt;raven&lt;/code&gt; module&lt;/p&gt;

&lt;p&gt;Ensure that you&amp;rsquo;re using the edxapp virtualenv. One easy way to do this is by typing &lt;code&gt;which python&lt;/code&gt;. If you get back &lt;code&gt;/usr/bin/python&lt;/code&gt;, you&amp;rsquo;re NOT in the virtualenv. If you get back &lt;code&gt;/edx/app/edxapp/venvs/edxapp/bin/python&lt;/code&gt;, you are.&lt;/p&gt;

&lt;p&gt;Then,&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pip install raven
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you&amp;rsquo;re using an &lt;code&gt;edx-platform&lt;/code&gt; fork, you might want to add &lt;code&gt;raven&lt;/code&gt; to &lt;code&gt;edx-platform/requirements/edx/base.txt&lt;/code&gt; so it gets installed automatically (e.g. when you bring up Devstack).&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Repeat for Studio&lt;/p&gt;

&lt;p&gt;This time, modify &lt;code&gt;/edx/app/edxapp/edx-platform/cms/envs/common.py&lt;/code&gt; and use the DSN string for Studio.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Restart everything&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/edx/bin/supervisorctl restart edxapp:
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Test&lt;/p&gt;

&lt;p&gt;Hopefully, your Open edX instance doesn&amp;rsquo;t regularly give 500 errors. If you want to verify that things are working, you need to induce some.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;TODO: describe how to modify some edx-platform code to break.&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
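
&lt;p&gt;The virtualenv check in step 4 can also be done from Python itself, which is handy inside a deploy script. A minimal sketch, assuming the virtualenv path quoted in that step; the function name is made up for illustration.&lt;/p&gt;

```python
import sys

EDXAPP_VENV = "/edx/app/edxapp/venvs/edxapp"  # path quoted in step 4


def in_edxapp_virtualenv(executable=None):
    """True if the given (or current) interpreter lives inside the edxapp virtualenv."""
    if executable is None:
        executable = sys.executable
    return executable.startswith(EDXAPP_VENV + "/")


if __name__ == "__main__":
    if in_edxapp_virtualenv():
        print("OK: running inside the edxapp virtualenv")
    else:
        print("NOT in the edxapp virtualenv: " + sys.executable)
```

&lt;p&gt;Run it with whatever &lt;code&gt;python&lt;/code&gt; is first on your path before you &lt;code&gt;pip install raven&lt;/code&gt;; if it reports that you&amp;rsquo;re not in the virtualenv, the module would end up in the wrong place.&lt;/p&gt;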

&lt;h2 id=&#34;server-uptime-monitoring&#34;&gt;Server/uptime monitoring&lt;/h2&gt;

&lt;p&gt;I strongly recommend setting up something for server monitoring; it will alert you when the server goes down, and it&amp;rsquo;ll warn you if you&amp;rsquo;re running out of memory.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m using Observium, primarily because it has a TurnKey Linux image and modern web interface.&lt;/p&gt;

&lt;p&gt;Go to RamNode, set up their cheapest VM (right now, $5/quarter) and load the Observium image.&lt;/p&gt;

&lt;p&gt;Cheap, reliable and will help you sleep at night.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Setting Up Devstack</title>
      <link>https://ianhowson.com/openedx/setting-up-devstack/</link>
      <pubDate>Thu, 25 Sep 2014 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/openedx/setting-up-devstack/</guid>
      <description>

&lt;p&gt;Development is much easier using Devstack instead of modifying a production instance. Significantly, it&amp;rsquo;s configured to only use 2GB of RAM, which makes it fit much better on your dev machine.&lt;/p&gt;

&lt;p&gt;The instructions on this are reasonable, but they&amp;rsquo;re scattered around a bit, so:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Download the base VM&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;There&amp;rsquo;s no resuming on the download, so I used the torrent (4GB is nontrivial in Australia).&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Set up VirtualBox&lt;/p&gt;

&lt;p&gt;Get the exact version mentioned on the edX wiki. Newer ones will cause headaches. I&amp;rsquo;ve had success with 4.3.12 and 4.3.20.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You might hit this error: &amp;ldquo;mount.nfs: requested NFS version or transport protocol is not supported&amp;rdquo;&lt;/p&gt;

&lt;p&gt;I never figured out exactly what went wrong here. The &lt;code&gt;nfsd&lt;/code&gt; port was being held open by something, but &lt;code&gt;lsof&lt;/code&gt; would not list a process. A reboot fixed it.&lt;/p&gt;

&lt;p&gt;Also, if you are prompted for a password, it means the admin/root password on the host machine, not the VM. The prompt doesn&amp;rsquo;t make this clear.&lt;/p&gt;

&lt;p&gt;You can log in to Devstack with&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;vagrant ssh&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You will need a particular user/virtualenv to do anything, so you&amp;rsquo;ll almost always want to run&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;sudo su edxapp&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;on the VM after you ssh in.&lt;/p&gt;

&lt;p&gt;Devstack doesn&amp;rsquo;t start the services automatically. To start the LMS, for instance, run the following on the VM:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;cd&lt;/span&gt; /edx/app/edxapp/edx-platform
paver devstack lms&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This will take a little while the first time around.&lt;/p&gt;

&lt;p&gt;You can then access the LMS at &lt;a href=&#34;http://localhost:9000&#34;&gt;http://localhost:9000&lt;/a&gt; (on the host).&lt;/p&gt;

&lt;h2 id=&#34;mongodb-won-t-start&#34;&gt;MongoDB won&amp;rsquo;t start&lt;/h2&gt;

&lt;p&gt;I see this error a lot:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;pymongo.errors.ConnectionFailure: could not connect to localhost:27017: [Errno &lt;span style=&#34;color:#ff0;font-weight:bold&#34;&gt;111&lt;/span&gt;] Connection refused&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If this is the first time installing/running Devstack, running &lt;code&gt;vagrant provision&lt;/code&gt; again is probably the right thing to do. It reinstalls everything, so don&amp;rsquo;t use it after you&amp;rsquo;ve made changes.&lt;/p&gt;

&lt;p&gt;Usually the error is caused by MongoDB shutting down unexpectedly. I put the following into a shell script and run it whenever I see the error:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;sudo rm /edx/var/mongo/mongodb/mongod.lock
sudo -u mongodb mongod --dbpath /edx/var/mongo/mongodb --repair --repairpath /edx/var/mongo/mongodb
sudo start mongodb&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If that doesn&amp;rsquo;t fix it for you, there is more information on repairing the Mongo database at &lt;a href=&#34;http://docs.mongodb.org/manual/tutorial/recover-data-following-unexpected-shutdown/&#34;&gt;http://docs.mongodb.org/manual/tutorial/recover-data-following-unexpected-shutdown/&lt;/a&gt;.&lt;/p&gt;
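
&lt;p&gt;Before reaching for the repair script, it can help to confirm that nothing is listening on the MongoDB port at all. This is a small sketch using only the Python 3 standard library; the function name is hypothetical and the port is the one from the error message above.&lt;/p&gt;

```python
import socket


def port_is_open(host="localhost", port=27017, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:  # covers "Connection refused" as well as timeouts
        return False
    conn.close()
    return True


if __name__ == "__main__":
    if port_is_open():
        print("MongoDB port is accepting connections")
    else:
        print("Nothing is listening on localhost:27017; try the repair script")
```

&lt;p&gt;If the port is open but the LMS still can&amp;rsquo;t connect, the problem is somewhere else and the repair steps probably won&amp;rsquo;t help.&lt;/p&gt;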

&lt;h2 id=&#34;more-errors&#34;&gt;More errors&lt;/h2&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;ImportError at /
No module named exceptions&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Per &lt;a href=&#34;https://groups.google.com/forum/#!topic/openedx-ops/bk4dvZRH1dk&#34;&gt;https://groups.google.com/forum/#!topic/openedx-ops/bk4dvZRH1dk&lt;/a&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;pip uninstall edx-analytics-api-client
pip install -e git+https://github.com/edx/edx-analytics-data-api-client.git@0.1.0#egg=edx-analytics-data-api-client&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&#34;unsorted-notes&#34;&gt;Unsorted notes&lt;/h2&gt;

&lt;p&gt;Devstack uses VirtualBox, which is really a lowest-common-denominator type of decision. My Mac freezes hard if I start up another VM (either Parallels or HAXM for Android), which is somewhat inconvenient, because all of my VMs are in Parallels and Android dev is very slow without HAXM.&lt;/p&gt;

&lt;p&gt;There are hooks in the Vagrantfile to use VMware Fusion, but I haven&amp;rsquo;t tried it.&lt;/p&gt;

&lt;p&gt;There is a Parallels target for Vagrant, but I haven&amp;rsquo;t tried it.&lt;/p&gt;

&lt;p&gt;The Developer Stack is configured with 2GB of RAM by default, but you can reduce it to 1GB through the VirtualBox GUI. Performance at 1GB is fine. This helps a lot if your dev machine only has 4GB of RAM.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Using LXC Containers</title>
      <link>https://ianhowson.com/openedx/using-lxc-containers/</link>
      <pubDate>Thu, 25 Sep 2014 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/openedx/using-lxc-containers/</guid>
      <description>

&lt;p&gt;&lt;strong&gt;This page is extremely rough. It&amp;rsquo;s just my notes with very little editing or checking. Be warned!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of running Open edX directly on your VPS, consider running it within an LXC container.&lt;/p&gt;

&lt;p&gt;Pros:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run production and staging systems on the same hardware

&lt;ul&gt;
&lt;li&gt;It&amp;rsquo;s usually cheaper to rent one big VPS instead of many small ones&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;Run other services on the same hardware. Open edX requires specific versions of everything which will conflict with your other services.&lt;/li&gt;
&lt;li&gt;Test security patches or custom code on a production-like system&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Some increase in complexity&lt;/li&gt;
&lt;li&gt;You have more machines to deploy security updates onto

&lt;ul&gt;
&lt;li&gt;You could automatically deploy updates&lt;/li&gt;
&lt;li&gt;Better yet, you could use your staging container to test the updates before you deploy them onto the production container&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;A trivial decrease in performance&lt;/li&gt;
&lt;li&gt;A very small increase in disk space usage

&lt;ul&gt;
&lt;li&gt;The LXC host needs its own copy of the Ubuntu base packages&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;You need enough RAM to run both systems simultaneously

&lt;ul&gt;
&lt;li&gt;A VM would use double the RAM, as you&amp;rsquo;d make a fixed allocation. With LXC, you&amp;rsquo;re just running two copies of the applications under the same kernel, so you&amp;rsquo;ll use the space much more efficiently.&lt;/li&gt;
&lt;li&gt;Therefore, swap is shared. If you have applications which aren&amp;rsquo;t used often &amp;ndash; like a staging site &amp;ndash; they might get swapped out and thus not impact your production site too much.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;Open edX uses AppArmor for a number of security features, and it&amp;rsquo;s not clear to me that that works within an LXC container

&lt;ul&gt;
&lt;li&gt;LXC is probably not a good idea if you&amp;rsquo;re running untrusted code (e.g. for a programming MOOC).&lt;/li&gt;
&lt;li&gt;LXC containers, right now, do not provide strong protection of the host against malicious clients (such as your students).&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;what-is-lxc&#34;&gt;What is LXC?&lt;/h2&gt;

&lt;p&gt;&lt;a href=&#34;https://linuxcontainers.org/&#34;&gt;LXC&lt;/a&gt; provides containers for Linux: you can run multiple Linux userspace instances under one kernel. They&amp;rsquo;re isolated but can still share resources efficiently.&lt;/p&gt;

&lt;p&gt;As LXC isn&amp;rsquo;t full machine virtualisation (like VMware, Parallels or VirtualBox), you can run it underneath another VM instance, such as one rented from a VPS host.&lt;/p&gt;

&lt;h2 id=&#34;how-do-i-set-up-the-lxc-host&#34;&gt;How do I set up the LXC host?&lt;/h2&gt;

&lt;p&gt;LXC features are built into most modern kernels. The tools are available back to Ubuntu 12.04 LTS (and probably further). I&amp;rsquo;ve had significantly better results under Ubuntu 14.04 than 12.04.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re running Ubuntu, just install the &lt;code&gt;lxc&lt;/code&gt; package.&lt;/p&gt;

&lt;h2 id=&#34;how-do-i-set-up-lxc-guests&#34;&gt;How do I set up LXC guests?&lt;/h2&gt;

&lt;p&gt;If you want to set up (say) an Ubuntu guest, you would run:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;lxc-create -n &amp;lt;container name&amp;gt; -t ubuntu&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&amp;lt;container name&amp;gt; can be anything; I use &amp;lsquo;edxstaging&amp;rsquo; and &amp;lsquo;edxprod&amp;rsquo;.&lt;/p&gt;

&lt;p&gt;The files for the guest will appear under &lt;code&gt;/var/lib/lxc/&amp;lt;container name&amp;gt;/rootfs&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The first time you set up a particular template/release combination, packages will probably be downloaded. Be prepared to wait a little. Subsequent creations of the same template/release should be very fast.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s useful to be able to specify exactly which release of Ubuntu will be installed. For edX, you probably want the 12.04 AMD64 release, so run:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;lxc-create -n &amp;lt;container name&amp;gt; -t ubuntu -- -r precise -a amd64&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;(I don&amp;rsquo;t know what happens if you try to run amd64 on a 32-bit host; it probably won&amp;rsquo;t work.)&lt;/p&gt;
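
&lt;p&gt;A quick way to check the host before creating an &lt;code&gt;amd64&lt;/code&gt; guest is to look at what &lt;code&gt;uname -m&lt;/code&gt; reports. This is only a sketch: &lt;code&gt;arch_supports_amd64&lt;/code&gt; is a made-up helper name, and it assumes that an &lt;code&gt;x86_64&lt;/code&gt; kernel is what an amd64 container needs.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;# Hypothetical helper: succeeds if the given machine name (as printed by uname -m) is x86_64.
arch_supports_amd64() {
    [ "$1" = x86_64 ]
}

if arch_supports_amd64 "$(uname -m)"; then
    echo "x86_64 kernel: amd64 guests should run"
else
    echo "not an x86_64 kernel: an amd64 guest probably will not start"
fi&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;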

&lt;p&gt;Start the guest with:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;lxc-start -d -n &amp;lt;container name&amp;gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;-d&lt;/code&gt; starts the container in the background. If you leave this off, the container will run in your terminal like a program and it&amp;rsquo;ll die when you close the terminal (unless you were already in a &lt;code&gt;tmux&lt;/code&gt; session).&lt;/p&gt;

&lt;p&gt;You can then get a console on the container with:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;lxc-console -n &amp;lt;container name&amp;gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For the &lt;code&gt;ubuntu&lt;/code&gt; template, you can log in at the console with username &lt;code&gt;ubuntu&lt;/code&gt; and password &lt;code&gt;ubuntu&lt;/code&gt;. You can then set up Open edX as you would a normal VM, per the &lt;a href=&#34;https://ianhowson.com/2014-09-25-open-edx-deployment-checklist.html&#34;&gt;deployment checklist&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Later on, you might like to stop the guest with:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;lxc-stop -n &amp;lt;container name&amp;gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;or delete the guest with:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;lxc-destroy -n &amp;lt;container name&amp;gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&#34;let-the-guest-access-the-network&#34;&gt;Let the guest access the network&lt;/h2&gt;

&lt;p&gt;Depending on your release and configuration, the guest might not be able to access the network by default. Assuming that you&amp;rsquo;re running Ubuntu 12.04, you&amp;rsquo;ll need to set up an IP address on the guest and then use &lt;code&gt;iptables&lt;/code&gt; to share the host&amp;rsquo;s network with the guest.&lt;/p&gt;

&lt;p&gt;On the guest, as root, edit &lt;code&gt;/etc/network/interfaces&lt;/code&gt; to look like:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
    address &amp;lt;guest IP address&amp;gt;
    netmask 255.255.255.0
    network 10.0.3.0
    gateway 10.0.3.1
    dns-nameservers 8.8.8.8 8.8.4.4&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Change &amp;lt;guest IP address&amp;gt; to something unique for that guest. The default uses 10.0.3.1 for the host, so use something like 10.0.3.100 for the guest. Each guest must have its own IP, obviously.&lt;/p&gt;

&lt;p&gt;On the guest, run &lt;code&gt;/etc/init.d/networking restart&lt;/code&gt; to apply these changes. Once the host rules below are in place too, you should be able to access the Internet from the guest.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I&amp;rsquo;m not sure why the default config uses a 10.x.x.x IP but only a /24 subnet; doesn&amp;rsquo;t hurt anything, though.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;On the host, run:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;/sbin/iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
/sbin/iptables -I FORWARD &lt;span style=&#34;color:#ff0;font-weight:bold&#34;&gt;1&lt;/span&gt; -i eth0 -o lxcbr0 -m state --state RELATED,ESTABLISHED -j ACCEPT
/sbin/iptables -I FORWARD &lt;span style=&#34;color:#ff0;font-weight:bold&#34;&gt;1&lt;/span&gt; -i lxcbr0 -o eth0 -j ACCEPT&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Check that the updated &lt;code&gt;iptables&lt;/code&gt; rules make sense with:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;/sbin/iptables -L -v&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To commit the rules for the next boot:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;iptables-save &amp;gt; /etc/iptables.up-rules&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
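&lt;p&gt;&lt;code&gt;iptables-save&lt;/code&gt; only writes the rules to a file; something still has to load them at boot. One common approach on Ubuntu is a &lt;code&gt;pre-up&lt;/code&gt; line in the host&amp;rsquo;s &lt;code&gt;/etc/network/interfaces&lt;/code&gt;. This is a sketch under assumptions: the &lt;code&gt;eth0&lt;/code&gt; stanza and &lt;code&gt;inet dhcp&lt;/code&gt; are placeholders for whatever your host already has there.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;auto eth0
iface eth0 inet dhcp
    pre-up iptables-restore &amp;lt; /etc/iptables.up-rules&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;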
&lt;h2 id=&#34;let-the-internet-see-your-edx-instance&#34;&gt;Let the Internet see your edX instance&lt;/h2&gt;

&lt;p&gt;You probably want to make the edX instances accessible to the Internet. If you want all of them (or other websites) accessible on port 80 but with different hostnames, use &lt;code&gt;nginx&lt;/code&gt; on the host per &lt;a href=&#34;http://nginx.org/en/docs/beginners_guide.html#proxy&#34;&gt;http://nginx.org/en/docs/beginners_guide.html#proxy&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You&amp;rsquo;ll end up with stacked &lt;code&gt;nginx&lt;/code&gt; proxies (the host &lt;code&gt;nginx&lt;/code&gt; talks to the guest &lt;code&gt;nginx&lt;/code&gt;, which talks to the application server) but this isn&amp;rsquo;t a big deal.&lt;/p&gt;

&lt;p&gt;On Ubuntu-like hosts, you can just stick a file in &lt;code&gt;/etc/nginx/sites-enabled/edx&lt;/code&gt; that looks like:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-nginx&#34; data-lang=&#34;nginx&#34;&gt;&lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;server&lt;/span&gt; {
    &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;server_name&lt;/span&gt; &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;edx.example.com&lt;/span&gt;;

    &lt;span style=&#34;color:#007f7f&#34;&gt;# Access logging is enabled by default (inherited from the http block).&lt;/span&gt;

    &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;location&lt;/span&gt; &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;/&lt;/span&gt; {
      &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;proxy_pass&lt;/span&gt;         &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;http://10.0.3.101:80&lt;/span&gt;;
      &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;proxy_redirect&lt;/span&gt;     &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;default&lt;/span&gt;;

      &lt;span style=&#34;color:#007f7f&#34;&gt;# These fix the headers for the guest&amp;#39;s server. Without these, you&amp;#39;ll get broken redirects and less useful logging.
&lt;/span&gt;&lt;span style=&#34;color:#007f7f&#34;&gt;&lt;/span&gt;      &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;proxy_set_header&lt;/span&gt;   &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;X-Real-IP&lt;/span&gt;  $remote_addr;
      &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;proxy_set_header&lt;/span&gt;   &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;X-Forwarded-For&lt;/span&gt; $remote_addr;
      &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;proxy_set_header&lt;/span&gt;   &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;Host&lt;/span&gt; $host;
      &lt;span style=&#34;color:#007f7f&#34;&gt;#proxy_set_header   X-Forwarded-Proto $scheme;
&lt;/span&gt;&lt;span style=&#34;color:#007f7f&#34;&gt;&lt;/span&gt;    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;At this point, the guest will log the IP address of the LXC host instead of the actual IP that requested the page. You can fix this by modifying the nginx config on the guest. For the LMS, edit &lt;code&gt;/edx/app/nginx/sites-available/lms&lt;/code&gt;. Where it says:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-nginx&#34; data-lang=&#34;nginx&#34;&gt;&lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;location&lt;/span&gt; &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;@proxy_to_lms_app&lt;/span&gt; {
    &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;proxy_set_header&lt;/span&gt; &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;X-Forwarded-Proto&lt;/span&gt; $scheme;
    &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;proxy_set_header&lt;/span&gt; &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;X-Forwarded-Port&lt;/span&gt; $server_port;
    &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;proxy_set_header&lt;/span&gt; &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;X-Forwarded-For&lt;/span&gt; $remote_addr;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;modify the X-Forwarded-For line like so:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-nginx&#34; data-lang=&#34;nginx&#34;&gt;&lt;span style=&#34;color:#007f7f&#34;&gt;# forward the correct IP from our upstream nginx
&lt;/span&gt;&lt;span style=&#34;color:#007f7f&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;proxy_set_header&lt;/span&gt; &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;X-Forwarded-For&lt;/span&gt; $http_X_Real_IP;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
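&lt;p&gt;A possible alternative (a sketch, not tested against edX) is to let the guest&amp;rsquo;s nginx trust the &lt;code&gt;X-Real-IP&lt;/code&gt; header from the host using the &lt;code&gt;realip&lt;/code&gt; module. It assumes that the guest&amp;rsquo;s nginx was built with &lt;code&gt;ngx_http_realip_module&lt;/code&gt; and that the host reaches the guest from 10.0.3.1.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-nginx&#34; data-lang=&#34;nginx&#34;&gt;# In the guest's server block. Only trust the header when it comes from the LXC host.
set_real_ip_from 10.0.3.1;
real_ip_header   X-Real-IP;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With this in place, &lt;code&gt;$remote_addr&lt;/code&gt; on the guest holds the client address, so the existing &lt;code&gt;X-Forwarded-For $remote_addr&lt;/code&gt; line would not need editing.&lt;/p&gt;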
&lt;h2 id=&#34;starting-lxc-containers-automatically&#34;&gt;Starting LXC containers automatically&lt;/h2&gt;

&lt;p&gt;You probably want your edX LXC containers to start automatically when you boot the machine.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s some conflicting information about how to do this. The way I&amp;rsquo;m doing it (on Ubuntu 14.04 LTS) is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Edit &lt;code&gt;/var/lib/lxc/&amp;lt;container name&amp;gt;/config&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Add &lt;code&gt;lxc.start.auto = 1&lt;/code&gt; somewhere&lt;/li&gt;
&lt;li&gt;Verify that it has taken effect with &lt;code&gt;lxc-ls -f&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Also check that &lt;code&gt;/etc/default/lxc&lt;/code&gt; has &lt;code&gt;LXC_AUTO=&amp;quot;true&amp;quot;&lt;/code&gt;.&lt;/p&gt;
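
&lt;p&gt;For reference, the relevant part of the container config might look like the following. &lt;code&gt;lxc.start.delay&lt;/code&gt; is an optional key from the LXC 1.0 config format (see &lt;code&gt;lxc.container.conf(5)&lt;/code&gt;); the value here is an arbitrary example.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;# /var/lib/lxc/edxprod/config (excerpt)
lxc.start.auto = 1
# Optional: seconds to wait after starting this container before starting the next one
lxc.start.delay = 5&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;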

&lt;h2 id=&#34;lxc-commands-very-slow-or-lxc-console-takes-minutes-to-return&#34;&gt;&lt;code&gt;lxc-*&lt;/code&gt; commands very slow, or &lt;code&gt;lxc-console&lt;/code&gt; takes minutes to return&lt;/h2&gt;

&lt;p&gt;Your guest network probably isn&amp;rsquo;t configured correctly.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;em&gt;Unsorted below this line. Beware!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You might also want to open up the SSH ports (depending on how paranoid you are about security). You can use &lt;code&gt;iptables&lt;/code&gt; again to forward ports on the host to the LXC guests:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;color:#007f7f&#34;&gt;# TODO: this doesn&amp;#39;t work yet
&lt;/span&gt;&lt;span style=&#34;color:#007f7f&#34;&gt;&lt;/span&gt;/sbin/iptables -t nat -A PREROUTING -p tcp --dport &amp;lt;new ssh port&amp;gt; -j DNAT --to-destination &amp;lt;guest ip&amp;gt;:22
&lt;span style=&#34;color:#007f7f&#34;&gt;# e.g. /sbin/iptables -t nat -I PREROUTING 1 -p tcp --dport 2222 -j DNAT --to-destination 10.0.3.102:22
&lt;/span&gt;&lt;span style=&#34;color:#007f7f&#34;&gt;&lt;/span&gt;/sbin/iptables -I INPUT &lt;span style=&#34;color:#ff0;font-weight:bold&#34;&gt;1&lt;/span&gt; -p tcp -m state --state NEW -m tcp --dport &amp;lt;new ssh port&amp;gt; -j ACCEPT&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
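&lt;p&gt;One possible reason the rules above fail: once &lt;code&gt;PREROUTING&lt;/code&gt; rewrites the destination, the packet is forwarded to the guest rather than delivered to the host, so it traverses the &lt;code&gt;FORWARD&lt;/code&gt; chain and the &lt;code&gt;INPUT&lt;/code&gt; rule never sees it. The following is an untested sketch that adds a matching &lt;code&gt;FORWARD&lt;/code&gt; rule. &lt;code&gt;forward_guest_ssh&lt;/code&gt; and the &lt;code&gt;IPTABLES&lt;/code&gt; override are hypothetical names; &lt;code&gt;eth0&lt;/code&gt; and &lt;code&gt;lxcbr0&lt;/code&gt; are the interface names used earlier on this page.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;# Hypothetical helper. Override IPTABLES (e.g. IPTABLES=echo) for a dry run.
IPTABLES=${IPTABLES:-/sbin/iptables}

forward_guest_ssh() {
    host_port=$1
    guest_ip=$2
    # Rewrite the destination of connections arriving on the host port.
    $IPTABLES -t nat -A PREROUTING -p tcp --dport "$host_port" -j DNAT --to-destination "${guest_ip}:22"
    # The rewritten packets are forwarded, so accept new connections in FORWARD.
    $IPTABLES -I FORWARD 1 -i eth0 -o lxcbr0 -p tcp -d "$guest_ip" --dport 22 -m state --state NEW -j ACCEPT
}

# e.g. forward_guest_ssh 2222 10.0.3.102&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Setting &lt;code&gt;IPTABLES=echo&lt;/code&gt; before calling the function prints the commands instead of running them.&lt;/p&gt;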
&lt;p&gt;Then, you&amp;rsquo;d access SSH on the guest with a command line like:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;ssh -p &amp;lt;port number&amp;gt; &amp;lt;hostname&amp;gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
    </item>
    
    <item>
      <title>Deployment Checklist</title>
      <link>https://ianhowson.com/openedx/deployment-checklist/</link>
      <pubDate>Thu, 25 Sep 2014 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/openedx/deployment-checklist/</guid>
      <description>

&lt;p&gt;&lt;strong&gt;This page is &lt;em&gt;extremely rough&lt;/em&gt;. It&amp;rsquo;s just my notes with very little editing or checking. Be warned!&lt;/strong&gt;&lt;/p&gt;

&lt;h2 id=&#34;deploying-a-production-service&#34;&gt;Deploying a production service&lt;/h2&gt;

&lt;p&gt;You probably want to use the &lt;code&gt;release&lt;/code&gt; branch of edx-platform. This is slightly more stable than the &lt;code&gt;main&lt;/code&gt; branch; apparently it is what runs on edx.org.&lt;/p&gt;

&lt;p&gt;Before updating your production server, it&amp;rsquo;s probably a good idea to run any updates against a staging server just to make sure things are sane. (A batch of unit testing wouldn&amp;rsquo;t hurt, either.)&lt;/p&gt;

&lt;h3 id=&#34;initial-user-and-security-setup&#34;&gt;Initial user and security setup&lt;/h3&gt;

&lt;p&gt;If you&amp;rsquo;re using a machine that is directly exposed to the Internet, the first thing to do is get basic account and network security in place. You can skip this if you&amp;rsquo;re in an LXC container on an isolated network.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a new user account, if the default install has you log in as &lt;code&gt;root&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Fix &lt;code&gt;/etc/sudoers&lt;/code&gt; so that the new account can &lt;code&gt;sudo&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Copy your SSH public key to the new account&amp;rsquo;s &lt;code&gt;~/.ssh/authorized_keys&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Configure SSH: change the default port (&lt;code&gt;Port &amp;lt;new port number&amp;gt;&lt;/code&gt;), only allow the new user account (&lt;code&gt;AllowUsers &amp;lt;username&amp;gt;&lt;/code&gt;) and disable password login (&lt;code&gt;PasswordAuthentication no&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;For some reason, &lt;code&gt;service ssh restart&lt;/code&gt; or &lt;code&gt;service ssh reload&lt;/code&gt; don&amp;rsquo;t actually do anything, so inside a &lt;code&gt;tmux&lt;/code&gt; session I do &lt;code&gt;/etc/init.d/ssh stop ; killall sshd ; /etc/init.d/ssh start&lt;/code&gt;. Obviously, this will kick you out of the SSH session, which is why we do it inside &lt;code&gt;tmux&lt;/code&gt;. If you mistype this or have done something wrong in the config, you will be locked out. Be warned. (This is also why we do this config right at the start, so we can nuke from orbit if necessary.)&lt;/li&gt;
&lt;li&gt;For paranoia&amp;rsquo;s sake, I set up the &lt;code&gt;ufw&lt;/code&gt; firewall. Yes, it&amp;rsquo;s fiddly and annoying. But remember, this is a public-facing web service. Randoms will poke and prod it. You probably want (as root):&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;ufw allow &amp;lt;ssh port&amp;gt;/tcp  &lt;span style=&#34;color:#007f7f&#34;&gt;# permit SSH
&lt;/span&gt;&lt;span style=&#34;color:#007f7f&#34;&gt;&lt;/span&gt;ufw allow &lt;span style=&#34;color:#ff0;font-weight:bold&#34;&gt;80&lt;/span&gt;/tcp          &lt;span style=&#34;color:#007f7f&#34;&gt;# permit edX LMS
&lt;/span&gt;&lt;span style=&#34;color:#007f7f&#34;&gt;&lt;/span&gt;ufw allow &lt;span style=&#34;color:#ff0;font-weight:bold&#34;&gt;18010&lt;/span&gt;/tcp       &lt;span style=&#34;color:#007f7f&#34;&gt;# permit edX Studio
&lt;/span&gt;&lt;span style=&#34;color:#007f7f&#34;&gt;&lt;/span&gt;ufw default deny          &lt;span style=&#34;color:#007f7f&#34;&gt;# drop anything else&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Again, setting these rules might lock you out of the system. Be careful.&lt;/p&gt;
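
&lt;p&gt;As a sketch, the SSH settings from step 4 would look something like this in &lt;code&gt;/etc/ssh/sshd_config&lt;/code&gt;. The port number and username are placeholders, not recommendations.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;Port 2222
AllowUsers alice
PasswordAuthentication no&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Running &lt;code&gt;/usr/sbin/sshd -t&lt;/code&gt; as root checks the config file for errors without restarting the daemon, which is worth doing before the stop/start step.&lt;/p&gt;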

&lt;p&gt;Side note 1: I don&amp;rsquo;t believe that firewalls actually achieve much in reality, but it&amp;rsquo;s cheap insurance. Notably, if you forget to make a service internal-only and accidentally bind it to a public IP, the firewall will still protect you.&lt;/p&gt;

&lt;p&gt;Side note 2: The edX codebase is huge and undoubtedly contains security problems. The firewall will not protect you against these. You will need to stay up-to-date with security alerts and patch your edX instance regularly.&lt;/p&gt;

&lt;h3 id=&#34;set-up-the-host-machine&#34;&gt;Set up the host machine&lt;/h3&gt;

&lt;p&gt;There are a few tweaks that I like to make to all new Ubuntu machines.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Install the following packages on all machines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;wget&lt;/code&gt;: downloads files through the command line. Needed for edX installation and not always installed by default.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;aptitude&lt;/code&gt;: like &lt;code&gt;apt-get&lt;/code&gt;, but better&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tmux&lt;/code&gt;: detach and resume terminal sessions&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Install the following packages on anything that isn&amp;rsquo;t an LXC or OpenVZ guest:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;swapspace&lt;/code&gt;: automatically scaling swap files&lt;/li&gt;
&lt;li&gt;&lt;code&gt;zram-config&lt;/code&gt;: automatically compresses memory (like swap)&lt;/li&gt;
&lt;li&gt;You can see how your swap is allocated with &lt;code&gt;cat /proc/swaps&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Obviously, it&amp;rsquo;s best for performance if you don&amp;rsquo;t need swap at all, but running out of memory and invoking the OOM killer can be dangerous. You don&amp;rsquo;t control which process is killed (usually, it&amp;rsquo;s the largest one). This works for a while but eventually something important (like a database) will be killed and Bad Things will happen.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;iotop&lt;/code&gt;: tells you which processes are hammering the disk&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&#34;setting-up-edx&#34;&gt;Setting up edX&lt;/h3&gt;

&lt;p&gt;Follow the instructions at &lt;a href=&#34;https://github.com/edx/configuration/wiki/edX-Ubuntu-12.04-64-bit-Installation&#34;&gt;https://github.com/edx/configuration/wiki/edX-Ubuntu-12.04-64-bit-Installation&lt;/a&gt;. If you&amp;rsquo;re in a rush, you can skip to &amp;lsquo;One step installation&amp;rsquo;, which I find works pretty well.&lt;/p&gt;

&lt;h4 id=&#34;lxc-apparmor-issues&#34;&gt;LXC: apparmor issues&lt;/h4&gt;

&lt;p&gt;While running &lt;code&gt;vagrant.sh&lt;/code&gt;, you&amp;rsquo;ll get an error like:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;stderr: apparmor_parser: Unable to replace &amp;#34;/edx/app/edxapp/venvs/edxapp-sandbox/bin/python&amp;#34;.  Permission denied; attempted to load a profile while confined?&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I spent a while trying to get this to work correctly but was not successful. It&amp;rsquo;s related to a Python sandbox, used for programming MOOCs (to ensure that students can&amp;rsquo;t run malicious code on the server). I&amp;rsquo;m not running a programming MOOC, so I disabled it.&lt;/p&gt;

&lt;p&gt;Edit &lt;code&gt;/var/tmp/configuration/playbooks/roles/edxapp/defaults/main.yml&lt;/code&gt;. Change:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;EDXAPP_PYTHON_SANDBOX: &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;true&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;to&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;EDXAPP_PYTHON_SANDBOX: &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;false&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Re-run the deployment script with&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;cd&lt;/span&gt; /var/tmp/configuration/playbooks &amp;amp;&amp;amp; sudo ansible-playbook -c &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;local&lt;/span&gt; ./edx_sandbox.yml -i &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#34;localhost,&amp;#34;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This is the same as the last line of &lt;code&gt;vagrant.sh&lt;/code&gt;. Ideally, you would check that config change into a local branch of the edX Configuration repository.&lt;/p&gt;

&lt;p&gt;The slightly nicer way to do this is to add the &lt;code&gt;EDXAPP_PYTHON_SANDBOX&lt;/code&gt; line to your &lt;code&gt;server-vars.yml&lt;/code&gt;, as described &lt;a href=&#34;https://ianhowson.com/open-edx-configuration.html&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h4 id=&#34;lxc-rabbitmq-issues&#34;&gt;LXC: rabbitmq issues&lt;/h4&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;TASK: [rabbitmq | remove guest user]
stderr: Error: unable to connect to node rabbit@localhost: nodedown&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I didn&amp;rsquo;t solve this completely, but a functional (if horrible) workaround is to edit &lt;code&gt;/etc/hosts&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;127.0.0.1 &amp;lt;hostname&amp;gt;
127.0.0.1 localhost&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3 id=&#34;installing-your-theme&#34;&gt;Installing your theme&lt;/h3&gt;

&lt;h3 id=&#34;setting-up-user-accounts&#34;&gt;Setting up user accounts&lt;/h3&gt;

&lt;h3 id=&#34;setting-configuration-variables&#34;&gt;Setting configuration variables&lt;/h3&gt;

&lt;p&gt;In the configuration repo, modify &lt;code&gt;/playbooks/roles/edxapp/defaults/main.yml&lt;/code&gt;.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&#34;tasks-to-complete-before-live-deployment&#34;&gt;Tasks to complete before live deployment&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Set up SSH access with public keys (preferably not on the default port 22)&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Disable the default accounts:&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://github.com/edx/edx-platform/wiki/Frequently-Asked-Questions&#34;&gt;https://github.com/edx/edx-platform/wiki/Frequently-Asked-Questions&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User: honor, password: edx&lt;/li&gt;
&lt;li&gt;User: audit, password: edx&lt;/li&gt;
&lt;li&gt;User: verified, password: edx&lt;/li&gt;
&lt;li&gt;User: staff, password: edx&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Verify that only your LMS, CMS and SSH ports are visible through the firewall. There are a lot of TCP-enabled services running; while they are probably configured to allow connections to localhost only, why take the chance?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run &lt;code&gt;netstat -al&lt;/code&gt; to check&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Review the settings in &lt;code&gt;/edx/app/edxapp/*.json&lt;/code&gt;, especially things like &lt;code&gt;cms.env.json&lt;/code&gt; which define contact details and titles for your instance.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Or maybe you&amp;rsquo;re not supposed to touch those &amp;ndash; &lt;a href=&#34;https://groups.google.com/d/msg/edx-code/VjVFT4-Etjw/UrpzDbpazo0J&#34;&gt;https://groups.google.com/d/msg/edx-code/VjVFT4-Etjw/UrpzDbpazo0J&lt;/a&gt; says that they get overwritten during an ansible update&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Add Google Analytics API key&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Set up your DNS to point to your instance.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Talk about how to use different DNS names to give Studio vs. LMS instead of different port numbers&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Think about backups and disaster recovery&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Set up authentication (Shibboleth, LDAP)&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Important URLs&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Configure the instance&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Adding users&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Creating a course&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Setting start and end dates&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Uploading SCORM zip files&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
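
&lt;p&gt;For the firewall check in the list above, a small filter makes it easier to spot services bound to something other than localhost. This is a sketch: &lt;code&gt;public_listeners&lt;/code&gt; is a made-up helper, and it assumes the column layout printed by the Linux net-tools &lt;code&gt;netstat -tln&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;# Hypothetical helper: print the local addresses of listening TCP sockets
# that are not bound to a loopback address.
public_listeners() {
    awk '$1 ~ /^tcp/ { if ($6 == "LISTEN") if ($4 !~ /^(127\.|::1:)/) print $4 }'
}

# Usage, on the machine being checked:
# netstat -tln | public_listeners&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Anything it prints other than your LMS, CMS and SSH ports deserves a closer look.&lt;/p&gt;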
</description>
    </item>
    
    <item>
      <title>Reducing Memory Consumption</title>
      <link>https://ianhowson.com/openedx/reducing-memory-consumption/</link>
      <pubDate>Thu, 25 Sep 2014 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/openedx/reducing-memory-consumption/</guid>
      <description>&lt;p&gt;&lt;strong&gt;This page is &lt;em&gt;extremely rough&lt;/em&gt;. It&amp;rsquo;s just my rough notes with very little editing or checking. Be warned!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re just developing on your laptop/desktop, run Devstack. It is a lot easier to develop on than the full production server and &amp;ldquo;only&amp;rdquo; uses 2GB of RAM.&lt;/p&gt;

&lt;p&gt;The stock edX Ubuntu deployment is set up to give you good performance, but it assumes that you have a lot of hardware available.&lt;/p&gt;

&lt;p&gt;The recommended config for the edX Ubuntu deployment is an Amazon instance with 4GB of RAM. There are two problems with this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;4GB of RAM is a lot to allocate to a VM in development. If you want to demo or develop on your laptop&amp;hellip; not a lot of laptops have 16GB of RAM yet, and spending half of your RAM on a VM is annoying.

&lt;ul&gt;
&lt;li&gt;For development, 2GB is sufficient, but it&amp;rsquo;s still chugworthy&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;4GB Amazon instances are &lt;em&gt;expensive&lt;/em&gt;. As of August 2014, they&amp;rsquo;re about $100/month, so $1200/year &lt;em&gt;just in hosting&lt;/em&gt;.

&lt;ul&gt;
&lt;li&gt;To put that in perspective, you could buy a really nice desktop computer or server, put it under your desk and use your university&amp;rsquo;s Internet connection. And have no ongoing costs.&lt;/li&gt;
&lt;li&gt;If you&amp;rsquo;re in the sort of institution whose IT department charges $15k/year for a server or even $100k (hello, Australian banking sector), then&amp;hellip; sucks to be you. I guess Amazon works out cheaper, then.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you&amp;rsquo;re running a big edX instance (tens of thousands of students), then yeah, you probably need some bigger hardware and a lot of RAM. If you&amp;rsquo;re just doing a closed course, 4GB instances are vast overkill.&lt;/p&gt;

&lt;p&gt;You can reduce the memory usage to something sane with the following steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Reduce the number of workers for a bunch of services (lms, cms, xqueue). By default, lms uses 8, and each uses ~80MB of RAM (so 640MB &lt;em&gt;just for the LMS&lt;/em&gt;). I use 3 both in dev and production.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The optimal number here is highly debated. If you&amp;rsquo;re CPU-bound then the number of CPUs plus one is good, but unless you&amp;rsquo;re using SSDs, you&amp;rsquo;re probably not CPU-bound. Best to test it and adjust accordingly. Keep in mind that if you&amp;rsquo;re using regular spinning disks, you&amp;rsquo;ll probably never peg the CPUs even with many workers; adding workers will just make the disks thrash more. I/O is pretty much always the bottleneck.&lt;/li&gt;
&lt;li&gt;Also, timeout=300? That seems crazy. Who has a 5 minute request? Better to kill it early rather than block all the new requests coming in. Make it 30 seconds, tops (and even then, you&amp;rsquo;re still boned).&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Restart the LMS with &lt;code&gt;/edx/bin/supervisorctl restart edxapp:lms&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Restart the CMS with &lt;code&gt;/edx/bin/supervisorctl restart edxapp:cms&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Restart xqueue with &lt;code&gt;/edx/bin/supervisorctl restart xqueue&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Restart ora with &lt;code&gt;/edx/bin/supervisorctl restart ora&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Restart ora_celery with &lt;code&gt;/edx/bin/supervisorctl restart ora_celery&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You might not use it at all, so you could just turn it off&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Restarting through supervisor doesn&amp;rsquo;t seem to change the number of workers; as a stop-gap solution, just reboot the machine (yuck!)&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;You could also turn off some services, like the grader, forums or java&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;You could also use zram&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Try KSM. Only for KVM VMs right now, but there are some attempts to make it work for all processes, which would be excellent with LXC:
&lt;a href=&#34;https://plus.google.com/+MaksimMelnikau/posts/QfhAchyzYva&#34;&gt;https://plus.google.com/+MaksimMelnikau/posts/QfhAchyzYva&lt;/a&gt;
&lt;a href=&#34;http://vleu.net/ksm_preload/&#34;&gt;http://vleu.net/ksm_preload/&lt;/a&gt;
&lt;a href=&#34;http://kerneldedup.org/en&#34;&gt;http://kerneldedup.org/en&lt;/a&gt;
&lt;a href=&#34;http://kerneldedup.org/en/projects/uksm/introduction/&#34;&gt;http://kerneldedup.org/en/projects/uksm/introduction/&lt;/a&gt;
&lt;a href=&#34;https://github.com/prashmohan/lxc-fork/blob/master/Documentation/vm/ksm.txt&#34;&gt;https://github.com/prashmohan/lxc-fork/blob/master/Documentation/vm/ksm.txt&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
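
&lt;p&gt;As a rough sketch of the worker and timeout changes above: assuming the LMS workers are gunicorn processes configured from a Python settings file (the file name &lt;code&gt;lms_gunicorn.py&lt;/code&gt; and its location are assumptions; check your own install), the relevant lines might look something like this. &lt;code&gt;workers&lt;/code&gt; and &lt;code&gt;timeout&lt;/code&gt; are standard gunicorn setting names; the values are just the ones suggested above.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;# Hypothetical excerpt from a gunicorn settings file for the LMS.
workers = 3   # default 8; at roughly 80MB each, 240MB instead of 640MB
timeout = 30  # seconds; default 300&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;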
</description>
    </item>
    
    <item>
      <title>Common Errors</title>
      <link>https://ianhowson.com/openedx/common-errors/</link>
      <pubDate>Thu, 25 Sep 2014 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/openedx/common-errors/</guid>
      <description>

&lt;h2 id=&#34;i-ve-got-internal-server-error-when-accessing-the-production-server-lms&#34;&gt;I&amp;rsquo;ve got Internal Server Error when accessing the production server (LMS)&lt;/h2&gt;

&lt;p&gt;Check &lt;code&gt;/edx/var/log/lms/edx.log&lt;/code&gt; for the reason.&lt;/p&gt;

&lt;h2 id=&#34;fixing-common-error-messages&#34;&gt;Fixing common error messages&lt;/h2&gt;

&lt;p&gt;(c/o &lt;a href=&#34;https://groups.google.com/forum/#!topic/openedx-ops/bk4dvZRH1dk&#34;&gt;https://groups.google.com/forum/#!topic/openedx-ops/bk4dvZRH1dk&lt;/a&gt;)&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;from&lt;/span&gt; analyticsclient.exceptions &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;import&lt;/span&gt; ClientError
ImportError: No module named exceptions&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To fix, roll back to an old release version (in this case, v0.1.0):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;sudo -u edxapp bash
&lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;source&lt;/span&gt; /edx/app/edxapp/venvs/edxapp/bin/activate
pip uninstall edx-analytics-api-client
pip install -e git+https://github.com/edx/edx-analytics-data-api-client.git@0.1.0#egg=edx-analytics-data-api-client&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
    </item>
    
    <item>
      <title>Implementing XKCD-style passwords on a real website: lessons learned</title>
      <link>https://ianhowson.com/blog/implementing-xkcd-style-passwords-on-a-real-website/</link>
      <pubDate>Wed, 09 Jul 2014 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/blog/implementing-xkcd-style-passwords-on-a-real-website/</guid>
      <description>

&lt;p&gt;&lt;a href=&#34;http://xkcd.com/936/&#34;&gt;&lt;img src=&#34;http://imgs.xkcd.com/comics/password_strength.png&#34; alt=&#34;XKCD comic on passwords using random words&#34;&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I recently completed a project requiring a few thousand pre-set-up user accounts. For their passwords, I decided to implement XKCD-style passwords instead of the usual collection of random characters.&lt;/p&gt;

&lt;p&gt;I specifically wanted to avoid letting the users choose their own passwords. They would probably use the same password as a related, sensitive system. Keeping the two systems isolated was desirable.&lt;/p&gt;

&lt;h2 id=&#34;getting-the-dictionary-file-right-is-difficult&#34;&gt;Getting the dictionary file right is difficult&lt;/h2&gt;

&lt;p&gt;I used a fairly complete dictionary that I pulled from a mailing list (the exact URL eludes me, unfortunately).&lt;/p&gt;

&lt;p&gt;Dictionaries contain a lot of offensive words; we&amp;rsquo;d prefer not to use them for passwords. There&amp;rsquo;s a fuzzy line for what dictates &amp;lsquo;offensive&amp;rsquo;, though. The plural of &amp;lsquo;ball&amp;rsquo;, &amp;lsquo;balls&amp;rsquo;, can be offensive when combined with the right (or wrong) modifiers.&lt;/p&gt;

&lt;p&gt;What is offensive varies across cultures. For example, the Brits have a lot of words which are sort of funny and inoffensive if you&amp;rsquo;re a native English speaker (e.g. &amp;lsquo;spatchcock&amp;rsquo;). A large proportion of my users were not native English speakers. Many of the funny British-isms had to go.&lt;/p&gt;

&lt;p&gt;Even after screening for offensive individual words, it&amp;rsquo;s possible to get weird combinations of words that have meaning. After stripping out the obvious swear words and other dangerous (but non-sweary) words, I generated a bunch of random passwords and skimmed through them by hand. This turned up some other dangerous combinations, like &amp;lsquo;hate indian&amp;rsquo;. Not good.&lt;/p&gt;

&lt;p&gt;Once, I encountered this problem with a random character password; somehow the string &amp;lsquo;cute&amp;rsquo; snuck into a female user&amp;rsquo;s password.&lt;/p&gt;

&lt;p&gt;Dictionaries contain lots of obscure words and non-words, like &amp;lsquo;re&amp;rsquo; or &amp;lsquo;b&amp;rsquo;. They have no meaning to me and thus no recall value.&lt;/p&gt;

&lt;h2 id=&#34;performance&#34;&gt;Performance&lt;/h2&gt;

&lt;p&gt;Dictionary files are big. They take a long time to read from disk and use a lot of memory. I elected to read the dictionary every time I created a user, which took a good fraction of a second each time.&lt;/p&gt;

&lt;p&gt;It would have been smarter to keep it in memory and put some effort into freeing that memory when done. Better yet, generate the passwords offline.&lt;/p&gt;
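
&lt;p&gt;As a rough sketch of the offline approach (not the code I actually ran): load the word list once, screen it, and draw words with a cryptographic random source. The tiny inline word list, the blocked words and the names &lt;code&gt;ALLOWED&lt;/code&gt; and &lt;code&gt;make_passphrase&lt;/code&gt; are all placeholders, and the &lt;code&gt;secrets&lt;/code&gt; module assumes Python 3.6 or later.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;import secrets

# Stand-in word list; a real one would be read from disk once, not per user.
WORDS = &amp;#34;correct horse battery staple balls spatchcock re b anchor violin&amp;#34;.split()

# Words screened out by hand (placeholder examples).
BLOCKED = set(&amp;#34;balls spatchcock&amp;#34;.split())

# Also drop very short non-words (two letters or fewer).
ALLOWED = [w for w in WORDS if w not in BLOCKED and len(w) &amp;gt;= 3]


def make_passphrase(n_words=4):
    # secrets.choice draws from the OS secure random source.
    return &amp;#34; &amp;#34;.join(secrets.choice(ALLOWED) for _ in range(n_words))


passphrase = make_passphrase()&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With a real dictionary, &lt;code&gt;ALLOWED&lt;/code&gt; would hold thousands of words. Skimming the generated output by hand for bad combinations is still necessary.&lt;/p&gt;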

&lt;h2 id=&#34;user-acceptance&#34;&gt;User acceptance&lt;/h2&gt;

&lt;p&gt;Random word passwords look different to normal passwords, and most users have never encountered a passphrase before.&lt;/p&gt;

&lt;p&gt;Many users didn&amp;rsquo;t recognise the string as a password at all; they emailed saying that they hadn&amp;rsquo;t received a password, or thought that it was part of a sentence that had been mistyped, or asked what the words meant. One thought that it was a cryptic word puzzle that they had to solve and that once solved, that answer would be the password.&lt;/p&gt;

&lt;p&gt;Other users didn&amp;rsquo;t know where to put it. I had to draw a diagram showing that the passphrase should be typed in just like a regular password. It seems almost comical, in hindsight, but this genuinely reduced the number of support emails that I received.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#39;https://ianhowson.com/images/xkcd-password-example.png&#39; width=&#39;381&#39; height=&#39;311&#39;&gt;&lt;/p&gt;

&lt;h2 id=&#34;usability&#34;&gt;Usability&lt;/h2&gt;

&lt;p&gt;Long passwords are easy to mistype. I elected to not show the user&amp;rsquo;s password as they typed, but I think that might have been a mistake: with no visual feedback, a typo in such a long password goes unnoticed.&lt;/p&gt;

&lt;p&gt;A lot of users don&amp;rsquo;t type the spaces. Some users will type in all caps.&lt;/p&gt;

&lt;p&gt;If the password is long and difficult to enter, users will just copy and paste it. This somewhat defeats the purpose of providing a memorable password.&lt;/p&gt;

&lt;p&gt;Even copy and paste has its problems. A lot of users will select too many or too few characters at the start and end of the password string, and if they can&amp;rsquo;t see the password when they paste it, they can&amp;rsquo;t see the error.&lt;/p&gt;

&lt;p&gt;I added a new Django auth backend to strip spaces and lowercase everything. It almost eliminated the &amp;ldquo;my password isn&amp;rsquo;t working&amp;rdquo; emails. I strongly recommend it.&lt;/p&gt;
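
&lt;p&gt;The normalisation itself is tiny. A minimal sketch, with &lt;code&gt;normalise_passphrase&lt;/code&gt; as a made-up name; the Django backend wiring is left out:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;def normalise_passphrase(raw):
    # split() with no arguments discards all whitespace, including stray
    # tabs or newlines picked up by copy and paste; lower() folds the caps.
    return &amp;#34;&amp;#34;.join(raw.split()).lower()


normalised = normalise_passphrase(&amp;#34;  Correct HORSE battery staple &amp;#34;)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The same function has to be applied when the password is first set, otherwise the stored hash will never match what the user types.&lt;/p&gt;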

&lt;p&gt;Logging your failed password attempts (securely!) will help a lot with diagnosing these problems.&lt;/p&gt;

&lt;h2 id=&#34;was-it-worth-it&#34;&gt;Was it worth it?&lt;/h2&gt;

&lt;p&gt;Quantifying security differences is tricky at the best of times.&lt;/p&gt;

&lt;p&gt;In this case, probably not. It wasn&amp;rsquo;t a system that required a high level of security. Once everyone had logged in at least once, there were no more complaints &amp;ndash; but a lot of people (0.5%?) had trouble entering that password correctly once.&lt;/p&gt;

&lt;p&gt;In hindsight, perhaps this is a policy best restricted to your own personal password security and not enforced on other people.&lt;/p&gt;

&lt;h2 id=&#34;links&#34;&gt;Links&lt;/h2&gt;

&lt;p&gt;&lt;a href=&#34;http://preshing.com/20110811/xkcd-password-generator/&#34;&gt;Jeff Preshing&amp;rsquo;s xkcd Password Generator&lt;/a&gt;. I recommend that you go and mash the &amp;lsquo;Generate Another&amp;rsquo; button. The wordlist is dangerously small (a few hundred, I&amp;rsquo;m guessing), but it&amp;rsquo;s still easy to generate offensive passwords.&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;http://correcthorsebatterystaple.net/&#34;&gt;Correct Horse Battery Staple&lt;/a&gt;: Another, slightly more paranoid option.&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://www.xkpasswd.net/c/index.cgi&#34;&gt;xkpasswd&lt;/a&gt;: More paranoia again.&lt;/p&gt;

&lt;p&gt;I find the addition of numbers and punctuation to be a bit odd; the whole point of using words is that you get sufficient entropy for your password &lt;em&gt;without&lt;/em&gt; having to resort to difficult-to-memorise features such as numbers and punctuation.&lt;/p&gt;
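
&lt;p&gt;To put rough numbers on that, assume every word or character is picked uniformly and independently. The 2048-word list is the comic&amp;rsquo;s figure (11 bits per word); the 62-symbol alphabet (a-z, A-Z, 0-9) for a random-character password is an assumption for comparison.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;import math


def entropy_bits(choices, length):
    # Each independent, uniform pick adds log2(choices) bits.
    return length * math.log2(choices)


words = entropy_bits(2048, 4)  # four words from a 2048-word list: 44.0 bits
chars = entropy_bits(62, 8)    # eight random letters and digits: about 47.6 bits&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Four plain words land in the same ballpark as eight random characters, with no punctuation needed.&lt;/p&gt;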

&lt;p&gt;&lt;a href=&#34;http://boingboing.net/2014/02/25/choosing-a-secure-password.html&#34;&gt;Choosing A Secure Password&lt;/a&gt; by Bruce Schneier. Long, but an excellent read.&lt;/p&gt;

&lt;h2 id=&#34;afterword&#34;&gt;Afterword&lt;/h2&gt;

&lt;p&gt;For a later group of users, I used standard Django random passwords (a sequence of 8 random numbers and letters).&lt;/p&gt;

&lt;p&gt;Not one user complained that their password was not being accepted. A few couldn&amp;rsquo;t figure out where to type it in (even with the helpful image!) but they could all readily identify the password.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Do you really need ECC RAM with ZFS?</title>
      <link>https://ianhowson.com/zfs/ecc-ram/</link>
      <pubDate>Thu, 27 Feb 2014 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/zfs/ecc-ram/</guid>
      <description>&lt;p&gt;In short, not really. But your life will be better if you do.&lt;/p&gt;

&lt;p&gt;The thing to remember is that ZFS will absolutely refuse to give you data that it thinks is incorrect. If it detects an error, it will give you an error. It will never ever (to a vanishingly tiny probability) give you wrong data.&lt;/p&gt;

&lt;p&gt;So, if your non-ECC RAM is already perfect, great! You won&amp;rsquo;t gain anything by getting ECC RAM.&lt;/p&gt;

&lt;p&gt;The thing is, all hardware is imperfect. Modern hard drives are specced with an unrecoverable error rate of 1 bit in 10^15 or so. And while this number should be taken with a large grain of salt, it&amp;rsquo;s worth mentioning that the capacity of hard drives is approaching this number. That is, if you merely &lt;em&gt;fill&lt;/em&gt; a modern hard drive with data, you can no longer assume that the drive itself hasn&amp;rsquo;t introduced an error into your data.&lt;/p&gt;

&lt;p&gt;Most filesystems trust the data that the hardware gives them, and they in turn will pass that data to you, the user. And if there&amp;rsquo;s an imperfection, you&amp;rsquo;ll get that imperfection. You almost certainly won&amp;rsquo;t notice; nowadays, most data is highly-compressed video or audio or pictures, and humans are mostly forgiving of small flaws.&lt;/p&gt;

&lt;p&gt;The thing that makes ZFS difficult to use with non-ECC RAM is that it won&amp;rsquo;t give you flawed data; it&amp;rsquo;ll give you no data. If you have a 20GB VM image on a ZFS volume and it develops a single uncorrectable bit out of place, the whole thing is marked &amp;lsquo;broken&amp;rsquo; and ZFS won&amp;rsquo;t give it to you. Over a single bit. Which probably wasn&amp;rsquo;t important anyway.&lt;/p&gt;

&lt;p&gt;Note that I said &amp;lsquo;uncorrectable&amp;rsquo;. If your data hits the disk intact, that error will almost certainly be correctable by one of the other volume members.&lt;/p&gt;

&lt;p&gt;If your data hits the disk &lt;em&gt;incorrectly&lt;/em&gt;, such as if you have not-quite-perfect non-ECC RAM and it was written to all of the mirrors incorrectly, you&amp;rsquo;re in trouble. You now have redundantly incorrect data that ZFS won&amp;rsquo;t serve to you. Hope you have a backup.&lt;/p&gt;

&lt;p&gt;You don&amp;rsquo;t &lt;em&gt;need&lt;/em&gt; ECC memory for ZFS. It won&amp;rsquo;t run any better or faster or clean your bathroom. What it will do is reduce the chance that your data becomes inaccessible because it&amp;rsquo;s slightly wrong; something which you didn&amp;rsquo;t know happened before, but which ZFS makes obvious.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>How to set up a private IPython parallel cluster</title>
      <link>https://ianhowson.com/blog/how-to-set-up-a-private-ipython-cluster/</link>
      <pubDate>Mon, 03 Jun 2013 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/blog/how-to-set-up-a-private-ipython-cluster/</guid>
      <description>

&lt;p&gt;IPython Notebook (now Jupyter Notebooks) is frickin&amp;rsquo; awesome. With the parallel extensions, it&amp;rsquo;s even awesomer.&lt;/p&gt;

&lt;p&gt;I want to use spare desktops around my house to speed up my parallel jobs. There is lots of &lt;a href=&#34;http://ipython.org/ipython-doc/stable/parallel/parallel_intro.html&#34;&gt;documentation&lt;/a&gt; on how to do this. It is very long.&lt;/p&gt;

&lt;p&gt;My setup is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MacBook running OS X 10.8&lt;/li&gt;
&lt;li&gt;Two desktops running Ubuntu&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;step-1-install-the-software&#34;&gt;Step 1: Install the software&lt;/h2&gt;

&lt;p&gt;On the MacBook, you need ipython+notebook+parallel. I use MacPorts, so you can install this with:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;sudo port install py-ipython +notebook +parallel +scientific&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;On the Ubuntu machines, you just need the &lt;code&gt;ipython-notebook&lt;/code&gt; package, and the rest of the dependencies will install automatically:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;sudo aptitude install ipython-notebook&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&#34;step-2-test-it-out&#34;&gt;Step 2: Test it out&lt;/h2&gt;

&lt;p&gt;To start some workers (&amp;lsquo;engines&amp;rsquo;) on your local machine, run:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;ipcluster start --n=&lt;span style=&#34;color:#ff0;font-weight:bold&#34;&gt;4&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To start a Notebook instance, run:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;ipython notebook&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You should get a Notebook instance in your web browser. Fire it up and run:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;from&lt;/span&gt; IPython.parallel &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;import&lt;/span&gt; Client

c = Client()
c.ids
c[:].apply_sync(&lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;lambda&lt;/span&gt;: &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#34;Hello, world!&amp;#34;&lt;/span&gt;)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You should get back:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;[&lt;span style=&#34;color:#ff0;font-weight:bold&#34;&gt;0&lt;/span&gt;, &lt;span style=&#34;color:#ff0;font-weight:bold&#34;&gt;1&lt;/span&gt;, &lt;span style=&#34;color:#ff0;font-weight:bold&#34;&gt;2&lt;/span&gt;, &lt;span style=&#34;color:#ff0;font-weight:bold&#34;&gt;3&lt;/span&gt;]
[&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39;Hello, world!&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39;Hello, world!&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39;Hello, world!&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39;Hello, world!&amp;#39;&lt;/span&gt;]&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You get one for each engine.&lt;/p&gt;

&lt;p&gt;Shut down the cluster on your local computer by hitting Ctrl-C on the terminal window running it.&lt;/p&gt;

&lt;h2 id=&#34;step-3-connect-more-cluster-nodes&#34;&gt;Step 3: Connect more cluster nodes&lt;/h2&gt;

&lt;h3 id=&#34;set-up-your-laptop&#34;&gt;Set up your laptop&lt;/h3&gt;

&lt;p&gt;Cluster configuration is described in a &amp;lsquo;profile&amp;rsquo;. On your local machine, run:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;ipython profile create --parallel --profile=home&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This creates a profile called &amp;lsquo;home&amp;rsquo;. Modify &lt;code&gt;~/.ipython/profile_home/ipcluster_config.py&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;c = get_config()

c.IPClusterEngines.engine_launcher_class = &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39;SSH&amp;#39;&lt;/span&gt;
c.LocalControllerLauncher.controller_args = [&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#34;--ip=&amp;#39;*&amp;#39;&amp;#34;&lt;/span&gt;]

c.SSHEngineSetLauncher.engines = {
    &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39;localhost&amp;#39;&lt;/span&gt;: &lt;span style=&#34;color:#ff0;font-weight:bold&#34;&gt;4&lt;/span&gt;,
    &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39;tyler&amp;#39;&lt;/span&gt;: &lt;span style=&#34;color:#ff0;font-weight:bold&#34;&gt;4&lt;/span&gt;,
    &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39;par&amp;#39;&lt;/span&gt;: &lt;span style=&#34;color:#ff0;font-weight:bold&#34;&gt;4&lt;/span&gt;,
}

&lt;span style=&#34;color:#007f7f&#34;&gt;# FIXME NASTY HACK We need to use non-system-default Python on the Mac&lt;/span&gt;
&lt;span style=&#34;color:#007f7f&#34;&gt;# (i.e. the /opt path in the default config below) but we want default&lt;/span&gt;
&lt;span style=&#34;color:#007f7f&#34;&gt;# Python on the Linux machine. I couldn&amp;#39;t figure out a way to specify it&lt;/span&gt;
&lt;span style=&#34;color:#007f7f&#34;&gt;# on a per-host basis, and profile/bashrc/whatever are not executed for&lt;/span&gt;
&lt;span style=&#34;color:#007f7f&#34;&gt;# ssh login, and the ~ alias doesn&amp;#39;t seem to work, so... I created a&lt;/span&gt;
&lt;span style=&#34;color:#007f7f&#34;&gt;# symlink in / for ipengine (i.e. `ln -s /opt/local/bin/ipengine&lt;/span&gt;
&lt;span style=&#34;color:#007f7f&#34;&gt;# /ipengine`). Horrible, but it works!&lt;/span&gt;
c.SSHEngineSetLauncher.engine_cmd = [&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39;/ipengine&amp;#39;&lt;/span&gt;] &lt;span style=&#34;color:#007f7f&#34;&gt;# works on Linux (thought nothing necessary)&lt;/span&gt;
&lt;span style=&#34;color:#007f7f&#34;&gt;#c.SSHEngineSetLauncher.engine_cmd = [&amp;#39;/opt/local/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python&amp;#39;, &amp;#39;-c&amp;#39;, &amp;#39;from IPython.parallel.apps.ipengineapp import launch_new_instance; launch_new_instance()&amp;#39;] # works on Mac&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Note that this will open up these ports on your MacBook to the whole local network. If you&amp;rsquo;re on an untrusted network segment, don&amp;rsquo;t do this! A future revision of this guide might deal with doing everything across SSH port forwarding.&lt;/p&gt;

&lt;h3 id=&#34;set-up-the-cluster-nodes&#34;&gt;Set up the cluster nodes&lt;/h3&gt;

&lt;p&gt;You need to be able to log in to the remote Linux machines automatically via SSH, ideally using your username on the frontend machine. Check &lt;a href=&#34;http://serverfault.com/a/241593/82278&#34;&gt;here&lt;/a&gt; if you&amp;rsquo;re not sure how.&lt;/p&gt;

&lt;p&gt;Per the above dodgy hack, you also need to link &lt;code&gt;/ipengine&lt;/code&gt; to the relevant &lt;code&gt;ipengine&lt;/code&gt; binary on that machine. For the Macs:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;sudo ln -s /opt/local/bin/ipengine /ipengine&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Also make sure you enable SSH (System Preferences -&amp;gt; Sharing -&amp;gt; Remote Login). You also need passwordless login on your local machine; test with &lt;code&gt;ssh localhost&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For the Ubuntu machines:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;sudo ln -s /usr/bin/ipengine /ipengine&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;ipcluster&lt;/code&gt; will try to share config files with the engines. By default, the directories do not exist on the Linux hosts. On each of them, run:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;mkdir -p .ipython/profile_home/security/&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3 id=&#34;running-the-thing&#34;&gt;Running the thing&lt;/h3&gt;

&lt;p&gt;To start the cluster, on your local machine, run:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;ipcluster start --profile=home&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In your notebook, connect with &lt;code&gt;Client(profile=&#39;home&#39;)&lt;/code&gt; instead of the bare &lt;code&gt;Client()&lt;/code&gt; used in Step 2. If you add new machines to the cluster, you need to re-run that line to get access to them. This also fixes things if you break the cluster (e.g. by exhausting RAM).&lt;/p&gt;

&lt;!--
{# TODO describe number of engine selection. Usually you want CPUcores+1 (to ensure they&#39;re all maxed out), but CPUcores is also a good choice. If per-thread performance is important, count each pair of Hyperthreaded cores as a single core. On your local machine, you might want to use CPUcores-1 to improve responsiveness. #}

{# TODO: distributed FS that hosts on your macbook #}

{# TODO: and then, with all of that done, you can go to the Clusters tab in ipython notebook, click &#39;Start&#39;, and parallel computer happily! #}

{# TODO: do it all on a usb stick! #}

--&gt;
</description>
    </item>
    
    <item>
      <title>Attacks on Proximity Card Systems</title>
      <link>https://ianhowson.com/blog/attacks-on-proximity-card-systems/</link>
      <pubDate>Tue, 28 May 2013 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/blog/attacks-on-proximity-card-systems/</guid>
      <description>

&lt;!--
  can you provide numbers for how often wiegand 26 systems are installed, or how many systems exist? point out that they are installed in buildings and tend to have a very long lifespan (10+ years --&gt;

&lt;p&gt;&lt;strong&gt;DRAFT&lt;/strong&gt;&lt;/p&gt;

&lt;h2 id=&#34;introduction&#34;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;Many attacks have been described against low-frequency (125kHz) proximity card systems. Nobody should be surprised to learn that they&amp;rsquo;re not considered very secure &amp;ndash; and yet, most new installations use these cards.&lt;/p&gt;

&lt;p&gt;I want to raise awareness of the limitations of these systems, demonstrate the many simple ways in which they can be defeated, and discourage new installations.
This article lists the most practical ways in which to attack these systems. As I become aware of new attacks, I will update the article.&lt;/p&gt;

&lt;p&gt;I will mention specific product names, but these attacks can be applied to many different products which use the same operating principles.&lt;/p&gt;

&lt;h2 id=&#34;system-design&#34;&gt;System design&lt;/h2&gt;

&lt;p&gt;We will be dealing with the &amp;lsquo;common case&amp;rsquo; when proximity cards are used for access control: HID Proximity cards, an HID reader such as the ProxPoint, transmitting Wiegand back to the controller. Other designs are possible (and desirable), but in my experience, the vast majority use this setup.&lt;/p&gt;

&lt;p&gt;An access control system using proximity cards is usually laid out like so:&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;https://ianhowson.com/images/aopcs-system.png&#34; alt=&#34;Typical physical access control system&#34; /&gt;&lt;/p&gt;

&lt;p&gt;For a user to authenticate against the system, the following steps take place:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The user brings the card within range of the reader&lt;/li&gt;
&lt;li&gt;The field around the reader powers up the card&lt;/li&gt;
&lt;li&gt;The card transmits its pre-programmed code to the reader&lt;/li&gt;
&lt;li&gt;The reader transmits the code to the door controller&lt;/li&gt;
&lt;li&gt;The door controller either makes the allow/deny decision itself or defers to the management backend; on &amp;lsquo;allow&amp;rsquo;, it powers the door strike&lt;/li&gt;
&lt;li&gt;With the door strike powered (the solenoid activates and unlocks the door), the user can push the door open and enter&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&#34;card&#34;&gt;Card&lt;/h3&gt;

&lt;p&gt;All proximity cards contain a coil and some circuitry. Some long-range types contain a battery, but we will not be dealing with them here. The coil picks up an RF field from the reader and uses it to provide power to the internal circuitry.&lt;/p&gt;

&lt;p&gt;The most common coils are tuned to 125kHz or 13.56MHz. We will refer to cards running at about 125kHz as Low Frequency (LF) cards. We will refer to cards running at 13.56MHz as High Frequency (HF) cards.&lt;/p&gt;

&lt;p&gt;There are many different types of card on the market. The most common cards run at 125kHz and are made by HID (for example, the &lt;a href=&#34;http://www.hidglobal.com/main/id-cards/hid-proximity/1326-proxcard-ii-clamshell-card.html&#34;&gt;ProxCard II&lt;/a&gt;). There are very similar cards running at 128kHz or 134kHz from manufacturers such as Indala.&lt;/p&gt;

&lt;p&gt;These cards all behave in the same way. When the card is placed in the reader&amp;rsquo;s field, the circuitry derives a power supply and clock. It then transmits a preprogrammed code to the reader and powers down.&lt;/p&gt;

&lt;p&gt;The method of transmission is interesting. Instead of the card actively transmitting RF energy, it manipulates its own power draw in a way that can be detected by the reader. Because the card and reader are inductively coupled together, an increase in load causes a decrease in the output voltage on the reader side. This change in voltage can be measured by the reader and is used to transfer information from card to reader. (For more information, see &lt;a href=&#34;http://www.rfid-handbook.de/rfid/types_of_rfid.html&#34;&gt;here&lt;/a&gt;.)&lt;/p&gt;

&lt;h3 id=&#34;card-reader&#34;&gt;Card reader&lt;/h3&gt;

&lt;p&gt;The reader generates an RF carrier to power and clock the card. When it receives a valid transmission from a card, it will transmit the card number out of its Wiegand interface.&lt;/p&gt;

&lt;p&gt;The reader does not make the access control decision (allow or deny). This is good design; the reader is physically accessible to the insecure side of the door and can easily be tampered with. Instead, it transmits the card&amp;rsquo;s code to the door controller using its Wiegand output.&lt;/p&gt;

&lt;h3 id=&#34;the-wiegand-interface&#34;&gt;The Wiegand interface&lt;/h3&gt;

&lt;p&gt;The Wiegand interface uses three wires: GND, D0 and D1. Both data lines idle at 5V. To transmit a &amp;lsquo;0&amp;rsquo; bit, the D0 line is briefly pulled low. To transmit a &amp;lsquo;1&amp;rsquo; bit, the D1 line is briefly pulled low. There are no formal timing requirements, but most devices transmit and receive pulses around 50&amp;micro;s wide with a gap of around 5000&amp;micro;s between pulses.&lt;/p&gt;
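
&lt;p&gt;As a rough illustration of the signalling described above, this sketch turns a string of bits into a schedule of pulses, one per bit, on either D0 or D1. The name &lt;code&gt;wiegand_pulses&lt;/code&gt; and both constants are made up for this example, and the 50/5000 microsecond figures are only the typical values quoted above, not a specification.&lt;/p&gt;

```python
PULSE_US = 50    # typical pulse width (assumed, taken from the text above)
GAP_US = 5000    # typical gap between pulses (assumed, taken from the text above)


def wiegand_pulses(bits):
    """Return one (line, start_us, end_us) tuple per bit in the string `bits`."""
    events = []
    start = 0
    for bit in bits:
        if bit not in ("0", "1"):
            raise ValueError("bits must contain only '0' and '1'")
        # A '0' is signalled on D0 and a '1' on D1; the other line stays idle.
        line = "D1" if bit == "1" else "D0"
        events.append((line, start, start + PULSE_US))
        start += PULSE_US + GAP_US
    return events
```

&lt;p&gt;With these figures a 26-bit transmission takes roughly 130 milliseconds, which is slow enough to capture with very modest hardware.&lt;/p&gt;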

&lt;!--
 TODO: diagram. Also check your timing statement.
http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&amp;Sect2=HITOFF&amp;d=PG01&amp;p=1&amp;u=/netahtml/PTO/srchnum.html&amp;r=1&amp;f=G&amp;l=50&amp;s1=%2220100034375%22.PGNR.&amp;OS=DN/20100034375&amp;RS=DN/20100034375

http://www.pacom.com/card-reader-interfaces.php - door controllers - allow offline behaviour, rex switches, door status and the like, connect back to control panel with multidrop rs485
http://www.ibtechnology.co.uk/pdf/magswipe_dec.PDF - has some stuff on timing and parity
#}
--&gt;

&lt;h3 id=&#34;wiegand-formats&#34;&gt;Wiegand formats&lt;/h3&gt;

&lt;!--
TODO: http://www.hidglobal.com/documents/understandCardDataFormats_wp_en.pdf
--&gt;

&lt;p&gt;The most common format for card numbers is as follows:&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;https://ianhowson.com/images/aopcs-wiegand-26.png&#34; alt=&#34;Wiegand 26 format&#34; /&gt;&lt;/p&gt;

&lt;p&gt;(image from &lt;a href=&#34;http://www.hidglobal.com/documents/understandCardDataFormats_wp_en.pdf&#34;&gt;HID&amp;rsquo;s Understanding Card Data Formats document&lt;/a&gt;)&lt;/p&gt;

&lt;!--

what&#39;s important here
8-bit site code (implying up to 256 facilities
16-bit user code (implying up to 65536 users)
2 bites of parity (to

 TODO diagram

8 bit site code, 16-bit user code, two parity bits, show how they&#39;re calculated


a card number and the RF transmission can be generated from each other; they&#39;re just different representations of the same number

--&gt;

&lt;p&gt;This is often referred to as Wiegand 26.&lt;/p&gt;
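
&lt;p&gt;To make the relationship between the card number and the transmitted bits concrete, here is a minimal sketch of encoding and decoding a 26-bit frame. It assumes the commonly documented layout: a leading even-parity bit covering the first 12 data bits, an 8-bit facility code, a 16-bit user code, and a trailing odd-parity bit covering the last 12 data bits, all sent most significant bit first. The function names are mine, and the bit order in particular is an assumption you should check against your own hardware.&lt;/p&gt;

```python
def encode_wiegand26(facility, card):
    """Return the 26-bit frame for a facility/user code pair as a '0'/'1' string."""
    if facility not in range(256) or card not in range(65536):
        raise ValueError("facility must fit in 8 bits and card in 16 bits")
    data = format(facility, "08b") + format(card, "016b")  # 24 data bits
    # Leading bit: even parity over the first 12 data bits.
    lead = str(data[:12].count("1") % 2)
    # Trailing bit: odd parity over the last 12 data bits.
    trail = str((data[12:].count("1") + 1) % 2)
    return lead + data + trail


def decode_wiegand26(bits):
    """Return (facility, card) or raise ValueError if the frame is malformed."""
    if len(bits) != 26 or set(bits) - {"0", "1"}:
        raise ValueError("expected 26 characters, each '0' or '1'")
    facility = int(bits[1:9], 2)
    card = int(bits[9:25], 2)
    # Re-encoding checks both parity bits in one step.
    if encode_wiegand26(facility, card) != bits:
        raise ValueError("parity check failed")
    return facility, card
```

&lt;p&gt;The two functions are inverses of each other, which is the point: the printed number and the transmitted frame are just different representations of the same value.&lt;/p&gt;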

&lt;p&gt;Cards almost always support other formats, but Wiegand 26 is the de facto standard. Almost all system components default to Wiegand 26 without further configuration.&lt;/p&gt;

&lt;!--
TODO: transmitted MSB or LSB first?
--&gt;

&lt;p&gt;User codes should be different for every card. They&amp;rsquo;re often printed on the card itself:&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;https://ianhowson.com/images/aopcs-card-with-number.jpeg&#34; alt=&#34;HID proximity card with user ID printed on it&#34; /&gt;&lt;/p&gt;

&lt;p&gt;Facility codes are the same for an entire &amp;lsquo;facility&amp;rsquo;, which is usually the domain controlled by a single access control system. This can be as small as a single office or can span multiple buildings. It has the obvious purpose that if two people from different companies have the same user code, they can&amp;rsquo;t open doors at each other&amp;rsquo;s buildings. It partitions the code space to prevent collisions.&lt;/p&gt;

&lt;h3 id=&#34;door-controller&#34;&gt;Door controller&lt;/h3&gt;

&lt;p&gt;The door controller receives and decodes the Wiegand signal from the reader. Depending on how the controller is configured, it can make a decision (allow or deny) based on that signal, or it can forward it on to the management backend. Usually, the door controller communicates with the backend through an RS485 bus. This is a multidrop bus, so many door controllers can communicate using a single set of cabling.&lt;/p&gt;

&lt;p&gt;If the door controller or management backend elects to unlock the door, a relay on the door controller is energised.&lt;/p&gt;

&lt;h3 id=&#34;door-strike&#34;&gt;Door strike&lt;/h3&gt;

&lt;p&gt;The door strike is wired to the relay on the door controller. Its purpose is to physically lock or unlock the door. Normally-open and normally-closed types are available, so:&lt;/p&gt;

&lt;!--
i&#39;ve also seen &#39;fail locked&#39; and &#39;fail unlocked&#39;
--&gt;

&lt;ul&gt;
&lt;li&gt;With an NC type, the strike is locked by default. When powered, the strike is unlocked. This is also called fail-secure.&lt;/li&gt;
&lt;li&gt;With an NO type, the strike is unlocked by default. When powered, the strike is locked. This is also called fail-safe.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The strike can also be wired to the relay in normally-open or normally-closed configurations. In this way, default behaviour for the system (when unpowered) can be specified. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you want the door to unlock when the power fails (improving safety but reducing security), use an NO strike wired to the relay in NC configuration.&lt;/li&gt;
&lt;li&gt;If you want the door to absolutely positively not unlock unless the management backend asks for it, use an NC strike wired to the relay in NO configuration.&lt;/li&gt;
&lt;/ul&gt;
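
&lt;p&gt;The combinations above are easy to get backwards, so here is a small sketch that models them. It treats the relay contact as a switch in series with the strike&amp;rsquo;s power supply. The function &lt;code&gt;door_locked&lt;/code&gt; and its argument names are invented for this example; it is a simplified model, not a description of any particular controller.&lt;/p&gt;

```python
def door_locked(strike, relay_contact, relay_energised, power_ok=True):
    """Return True if the door is locked.

    strike:          "fail-secure" (locked when unpowered) or "fail-safe"
    relay_contact:   "NO" or "NC", the relay contact the strike is wired through
    relay_energised: True while the controller is asking for an unlock
    power_ok:        False models a power failure
    """
    if strike not in ("fail-secure", "fail-safe"):
        raise ValueError("unknown strike type")
    if relay_contact not in ("NO", "NC"):
        raise ValueError("unknown relay contact")
    # An NO contact conducts only while the relay is energised; NC is the opposite.
    circuit_closed = relay_energised if relay_contact == "NO" else not relay_energised
    strike_powered = power_ok and circuit_closed
    if strike == "fail-secure":
        return not strike_powered
    return strike_powered
```

&lt;p&gt;Calling it with &lt;code&gt;power_ok=False&lt;/code&gt; for each combination shows which installations unlock in a power failure and which stay locked.&lt;/p&gt;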

&lt;p&gt;Usually, there is a fire safety requirement that says that there must be an exit path in the event of a fire. This may require you to choose fail-safe strikes and have them continuously energised in some cases.&lt;/p&gt;

&lt;p&gt;Door strikes usually require 12V at a few hundred milliamps to trigger.&lt;/p&gt;

&lt;h2 id=&#34;weaknesses-and-attacks&#34;&gt;Weaknesses and attacks&lt;/h2&gt;

&lt;p&gt;Most attacks occur in two stages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Obtaining a card number (facility and user code)&lt;/li&gt;
&lt;li&gt;Using the card number to gain access to the facility&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&#34;wiring&#34;&gt;Wiring&lt;/h3&gt;

&lt;p&gt;The wiring between components &lt;strong&gt;must be physically secure&lt;/strong&gt;. Communication between card reader, door controller and door strike is completely unencrypted and unauthenticated.&lt;/p&gt;

&lt;p&gt;If an attacker can access the wiring carrying the Wiegand codes between reader and door controller, they can sniff the Wiegand codes. They can then replay the codes directly into the wiring or use them to create a new card.&lt;/p&gt;

&lt;p&gt;The same cards may be used for unrelated systems (e.g. printers, vending machines) which may not have secured cabling. The same Wiegand codes can be sniffed from there and used to clone a card.&lt;/p&gt;

&lt;p&gt;If an attacker can access the door strike wiring, they can manually energise the strike (by applying a voltage across the wiring) or de-energise it (by cutting one of the wires).&lt;/p&gt;

&lt;p&gt;If an attacker can access the door controller itself, the preceding attacks are also possible.&lt;/p&gt;

&lt;p&gt;Most RS485 wiring between door controllers and the management backend can also be sniffed and replayed, though the exact formats are not standardised. Some door controllers and backends do use encrypted communication over the RS485 lines.&lt;/p&gt;

&lt;p&gt;One big problem is that the card reader itself must be able to withstand physical attacks. The card reader usually has a &amp;lsquo;pigtail&amp;rsquo; collection of wires coming out of the back, including the critical Wiegand lines. If an attacker removes the reader from the wall, the Wiegand lines are exposed. This is usually easy to do. Some readers have their mounting screws well secured, such as these:&lt;/p&gt;

&lt;!--
the screws are in the insecure zone! HID ProxPoint includes a snap-on cover to make this less obvious, but the point remains: you can remove the cover, unscrew the reader and tamper with the wiring, which needs to be secured!

 The biggest flaw with the system is that it is completely dependent on the security of the transmission cable. Transmissions are not encrypted or authenticated. Transmissions are identical every time and so are vulnerable to replay attacks. Transmissions are low-speed and can be captured and replayed with very simple hardware.
 TODO: find an example
--&gt;

&lt;p&gt;Some have their mounting screws exposed, so a few moments with an electric drill is all that&amp;rsquo;s necessary:&lt;/p&gt;

&lt;!--
TODO: pick on Smart Innovations reader?
--&gt;

&lt;p&gt;Some are physically quite robust. Potting of the electronics is common, and is a good precaution.&lt;/p&gt;

&lt;!--
TODO: find a physically robust reader

TODO: find one with the mounting screws in the secure zone
--&gt;

&lt;p&gt;One factor that makes physically securing a reader difficult is that any metals in or around the reader will affect the read range. As a result, almost all readers are made of plastic and can be broken off the wall with a hammer.&lt;/p&gt;

&lt;!--
TODO move up

TODO: can you find an example of a Wiegand sniffer/replayer? There was a ruxcon talk, perhaps?
capture and replay the wiegand traffic - the thing you saw on hackaday/defcon

--&gt;

&lt;p&gt;The same is true of door strikes, but even low-end door strikes are solidly built; they just need to be at least as strong as the door that they lock.&lt;/p&gt;

&lt;h3 id=&#34;attacks-on-the-rf-transmission&#34;&gt;Attacks on the RF transmission&lt;/h3&gt;

&lt;h4 id=&#34;use-an-off-the-shelf-reader&#34;&gt;Use an off-the-shelf reader&lt;/h4&gt;

&lt;p&gt;One obvious way to read people&amp;rsquo;s cards is with an off-the-shelf reader, like one that might be installed on a building. They can be powered from batteries and the beep can be disabled. The attacker might get near someone in a lift and swipe the reader over their pockets. Some companies mandate visible security passes, making it even easier as you can see the pass and know where to swipe the reader.&lt;/p&gt;

&lt;p&gt;The reader will output the Wiegand code of the card, so you need a sniffer/replayer and some way to use that code (either replay it into the Wiegand wiring or clone a card). Slightly simpler might be to use a reader with RS232 output and connect that to a laptop.&lt;/p&gt;

&lt;p&gt;This is a design flaw in the system &amp;ndash; there is nothing the card can do to know that it&amp;rsquo;s talking to a legitimate reader. As soon as it&amp;rsquo;s powered, it transmits its code, not knowing if the receiver is friendly or malicious.&lt;/p&gt;

&lt;p&gt;A less practical attack, but using the same design flaw, is to put a fake reader on a wall near an entry point. Users will swipe their cards on it thinking that it will open the door. The card numbers can be collected and exploited as above.&lt;/p&gt;

&lt;h4 id=&#34;rf-sniffing&#34;&gt;RF sniffing&lt;/h4&gt;

&lt;p&gt;The reader and cards transmit on a known frequency. If you can get close to a reader while it&amp;rsquo;s communicating with a card, you can capture the card&amp;rsquo;s transmission. It may be possible to do this at long range, since the reader operates at a very high power level (necessary to power the card). The attacker only needs to observe the card&amp;rsquo;s transmission, not power the card.&lt;/p&gt;

&lt;p&gt;Again, the card does not make any attempt to hide its transmission. Every transmission is identical, permitting replay attacks.&lt;/p&gt;

&lt;!--
TODO: find examples of this attack
http://proxmark.org/forum/viewtopic.php?id=1110
http://proxmark.org/forum/viewforum.php?id=11

TODO: find examples of this attack
TODO: how long is the effective range?
--&gt;

&lt;p&gt;Once you&amp;rsquo;ve captured the card&amp;rsquo;s transmission, you can either replay it directly at a reader or extract the user/site code.&lt;/p&gt;

&lt;!-- TODO examples for above --&gt;

&lt;p&gt;With the user/site code, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;produce your own card&lt;/li&gt;
&lt;li&gt;program a programmable card with the code&lt;/li&gt;
&lt;li&gt;purchase a card from a vendor&lt;/li&gt;
&lt;li&gt;conduct Wiegand wiring-level attacks&lt;/li&gt;
&lt;/ul&gt;

&lt;!--
TODO: are there any online vendors that will let you specify a number?

Related is the guy who did an RFID coil around a whole doorway and was able to read people&#39;s cards as they walked through.  

The undisputed master of tools to do this is the ProxMark 3. You can buy it from a website or from eBay for about AUD$200. You can sniff and replay RF.
http://cq.cx/proxmark3.pl - captures of RF traffic

http://proxclone.com/index.html
http://proxclone.com/reader_cloner.html
https://courses.cit.cornell.edu/ee476/FinalProjects/s2006/cjr37/Website/index.htm
http://proxclone.com/spoofer.html
http://ww1.microchip.com/downloads/en/DeviceDoc/51115F.pdf
http://www.youtube.com/watch?v=4jpRFgDPWVA
http://www.youtube.com/watch?v=fDimlEdeGjM&amp;feature=related
http://proxclone.com/Long_Range_Cloner.html

--&gt;

&lt;h3 id=&#34;attacks-on-the-card-numbers&#34;&gt;Attacks on the card numbers&lt;/h3&gt;

&lt;p&gt;The numbers themselves provide opportunities for attacks.&lt;/p&gt;

&lt;h4 id=&#34;brute-force&#34;&gt;Brute force&lt;/h4&gt;

&lt;p&gt;The total number of codes available is relatively small (by cryptographic standards). With Wiegand 26, there are 256 site codes and 65536 user codes, for a total of 16,777,216 card numbers.&lt;/p&gt;

&lt;p&gt;Assume a random distribution of card numbers and that you can make one swipe attempt every three seconds. If there is exactly one valid card in the system, it will take (on average) 291 days to find it; not very useful.&lt;/p&gt;

&lt;p&gt;If you know the site code (say, you cozy up to someone at their security vendor), you can guess a single valid card in about 1.1 days, on average.&lt;/p&gt;

&lt;p&gt;If there are more cards set up in the system, guessing a valid user code becomes roughly proportionally easier. If you have 100 cards set up, a valid card can be guessed in about half an hour, on average.&lt;/p&gt;

&lt;!--

TODO: birthday paradox; this is wrong

TODO for attacks: birthday attack on user code within a company. Mitigation is to use a longer or non-standard Wiegand code - it may not be &#39;secure&#39;, but it&#39;s &#39;not being the slowest runner away from the bear&#39;

--&gt;

&lt;p&gt;If you know someone&amp;rsquo;s user code (not many people know that the number printed on their card is important!) you can brute-force the site code in about 6 minutes on average, or 13 minutes at worst.&lt;/p&gt;
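
&lt;p&gt;Estimates like these are easy to check. The sketch below assumes one attempt every three seconds, valid codes scattered at random through the code space, and an attacker who tries each candidate once in random order. Under that model the expected number of attempts before the first hit is (N + 1) / (K + 1) for K valid codes among N candidates. The function name is my own.&lt;/p&gt;

```python
from fractions import Fraction

SECONDS_PER_ATTEMPT = 3  # assumed swipe rate, as in the text above


def expected_seconds(code_space, valid_codes=1):
    """Mean time in seconds until the first valid code is found.

    Assumes each candidate is tried once, in random order, so the expected
    number of attempts is (code_space + 1) / (valid_codes + 1).
    """
    if valid_codes not in range(1, code_space + 1):
        raise ValueError("valid_codes must be between 1 and code_space")
    attempts = Fraction(code_space + 1, valid_codes + 1)
    return float(attempts * SECONDS_PER_ATTEMPT)
```

&lt;p&gt;For example, &lt;code&gt;expected_seconds(2 ** 24)&lt;/code&gt; covers the whole Wiegand 26 space, &lt;code&gt;expected_seconds(65536, 100)&lt;/code&gt; covers a known site code with 100 cards on issue, and &lt;code&gt;expected_seconds(256)&lt;/code&gt; covers a known user code.&lt;/p&gt;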

&lt;!--

TODO recalc; this is wrong

TODO probably good to recalc all of these when you&#39;re less sleepy
remind that a human need not be present for these attacks - proxmark could be scripted to perform them
http://dl.packetstormsecurity.net/papers/general/proxbrute-proxcard.pdf

--&gt;

&lt;h4 id=&#34;guessing-more-user-numbers&#34;&gt;Guessing more user numbers&lt;/h4&gt;

&lt;p&gt;Usually, cards are sold in sequential order. If you know one card number (perhaps your own, if you&amp;rsquo;re an inside attacker or corporate spy) it&amp;rsquo;s very likely that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;there are other card numbers near your own&lt;/li&gt;
&lt;li&gt;lower-numbered cards belong to longer-serving employees, potentially with more access rights&lt;/li&gt;
&lt;li&gt;fewer lower-numbered cards will work, due to employees moving on&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It also means that you really shouldn&amp;rsquo;t reissue old cards to new employees, as the old employee may be able to produce a new card with the same site/user code. They&amp;rsquo;re cheap; destroy them (securely!).&lt;/p&gt;

&lt;!--

- simply purchase cards with the same site code but different user code from a vendor - are there any online vendors?
cards are often sold in a box of 100, so if you know one number, you can guess others pretty quickly

http://www.hidglobal.com/documents/125khz_htog_en.pdf page 4 - you do have the option to get random internal numbering without matching numbers on the outside. i have never seen such a card, though i expect they do exist. this is the best option for security.

--&gt;

&lt;p&gt;Some companies require that people display their security passes on their body (often with additional conditions like &amp;ldquo;it must be above your waist&amp;rdquo; and &amp;ldquo;you must challenge anyone who isn&amp;rsquo;t wearing a pass&amp;rdquo;). Some subset of those companies &lt;strong&gt;also&lt;/strong&gt; print the user number on the card. Obtaining a user code then becomes a simple matter of &lt;em&gt;reading the number&lt;/em&gt; off people outside the building.&lt;/p&gt;

&lt;h2 id=&#34;attack-tools&#34;&gt;Attack tools&lt;/h2&gt;

&lt;h3 id=&#34;wiegand-sniffer-replayer&#34;&gt;Wiegand sniffer/replayer&lt;/h3&gt;

&lt;p&gt;This is a device which connects to the D0, D1 and GND lines of the Wiegand interface. When it detects a Wiegand transmission, it captures it. Under user control, it can replay that same transmission onto the Wiegand lines. Each transmission is identical, so that will appear the same as a legitimate card swipe.&lt;/p&gt;

&lt;p&gt;Examples of these devices are:&lt;/p&gt;

&lt;!-- TODO --&gt;

&lt;h3 id=&#34;rf-sniffer-replayer&#34;&gt;RF sniffer/replayer&lt;/h3&gt;

&lt;p&gt;The RF sniffer/replayer works in the same way as the Wiegand sniffer/replayer, but operates on the RF transmissions instead. An antenna or coil picks up transmissions from legitimate cards, stores them and later replays them, impersonating the original card.&lt;/p&gt;

&lt;p&gt;Examples of these devices include:&lt;/p&gt;

&lt;!-- ProxMark, proxclone --&gt;

&lt;h3 id=&#34;or-just-purchase-cards-online&#34;&gt;Or just purchase cards online&lt;/h3&gt;

&lt;p&gt;If you get someone&amp;rsquo;s card number and facility code, the low-tech approach is to simply order an identical card from the manufacturer.&lt;/p&gt;

&lt;!--

http://www.cardquest.com/access-cards_control-cards_id-cards_card-readers/hid/proximity-cards~1.cfm - you can order any facility code and start number - even a free sample!

--&gt;

&lt;h2 id=&#34;mitigations&#34;&gt;Mitigations&lt;/h2&gt;

&lt;p&gt;In a perfect world, you&amp;rsquo;d use a different access control system. If you must use 125kHz prox cards, there are some things you can do to make life more difficult for attackers.&lt;/p&gt;

&lt;!--

** Pot the wiring to make it harder (but still not impossible) to tamper with it from outside. Many readers are potted, but not the wiring.

How to fix this?
Best:
- Use cards with unique cryptographic keys (solves duplicate-card problem)
- Use cards with challenge-response authentication (solves replay)
- Reader needs to be able to authenticate itself to the controller

Better than nothing:
- Encrypt/authenticate the wire
- Ensure that the installation and device make it possible to have a secured (unencrypted) cable while keeping the entire cable run secure. big caveat: most of the time, these cables are run through a ceiling. ceiling cavities are often easily accessible from adjacent offices, corridors and other areas. Of course, ensure that you can&#39;t enter the secured area through the ceiling cavity as well - this would be even sillier.

- Trick someone into reading you their card number

Go through each of the attacks and write down how to prevent them

--&gt;

&lt;h2 id=&#34;security-context&#34;&gt;Security context&lt;/h2&gt;

&lt;p&gt;Does any of this matter?&lt;/p&gt;

&lt;p&gt;It depends entirely on your situation.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What are your security requirements? Do you just need to keep out random passers-by? Do you have valuables or secrets? What is the impact of a successful attack?&lt;/li&gt;
&lt;li&gt;Is it worthwhile to use a more expensive but secure system?&lt;/li&gt;
&lt;li&gt;Can you add additional authenticators (biometrics or PINs/passwords)?&lt;/li&gt;
&lt;li&gt;Can you mitigate the above threats through other means (cameras, security checkpoints)?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most physical access control systems are subject to the following attacks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bash the door down&lt;/li&gt;
&lt;li&gt;Smash a window and climb in&lt;/li&gt;
&lt;li&gt;Steal an access card&lt;/li&gt;
&lt;li&gt;Coerce someone with access into letting you in (bribes or &amp;lsquo;rubber hose attack&amp;rsquo;)&lt;/li&gt;
&lt;li&gt;Tailgate in after someone&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As with all security systems, you must weigh up the costs and risks for your own situation.&lt;/p&gt;

&lt;h2 id=&#34;future-work&#34;&gt;Future work&lt;/h2&gt;

&lt;p&gt;For me, the most interesting future work here is the ability to clone a person&amp;rsquo;s card from a distance. Physical attacks on the reader and wiring make your entry obvious and traceable. Trying to excite a person&amp;rsquo;s card while it&amp;rsquo;s on their body carries risk that you may later be identified as &amp;ldquo;that guy who stood too close to me&amp;rdquo;. RF capture and replay offers a low-risk option.&lt;/p&gt;

&lt;p&gt;To this end, I intend to replicate the work on proxclone.com, focusing on long-range capture of card transmissions. I will document the work and provide costings to make it easier to evaluate the costs of an attack on these systems.&lt;/p&gt;

&lt;!--
Relay Attacks on Passive Keyless Entry Systems in Modern Cars: http://eprint.iacr.org/2010/332.pdf  . I&#39;m waiting for this to be used on PayPass/PayWave.

before release
make sure you use &#39;wiegand&#39;, not &#39;weigand&#39;
never use the term &#39;enrolled&#39;
fix the filename once you decide on a title
check spelling of whole doc

proxmark is probably only popular in australia - maybe US - probably not europe. say something like &#39;where i live, proxmark is the most popular&#39;

followup: r-prox-card-remote-cloning.md is your pocket capture/replay device
followup: r-a-better-prox-system.md
LATER: do an &#39;air interface between card and reader&#39; section between &#39;card&#39; and &#39;reader&#39;; document the HID on-air format

TODO when you write up the wiegand article, note that your apartment and your office use the same facility code (but don&#39;t specify the facility code, as your employer is known from your web page)

TODO can you use chinese prox cards or programmable prox cards in place of hid cards?

--&gt;
</description>
    </item>
    
    <item>
      <title>A quick guide to using MySQL in Python</title>
      <link>https://ianhowson.com/blog/a-quick-guide-to-using-mysql-in-python/</link>
      <pubDate>Sun, 03 Jul 2011 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/blog/a-quick-guide-to-using-mysql-in-python/</guid>
      <description>

&lt;p&gt;Need to access some MySQL databases in Python &lt;em&gt;right now&lt;/em&gt;? As in &lt;em&gt;now, really, I don&amp;rsquo;t have time to read stuff, and please stop rambling because you&amp;rsquo;re wasting my time&lt;/em&gt; now? Read on!&lt;/p&gt;

&lt;h2 id=&#34;getting-started&#34;&gt;Getting started&lt;/h2&gt;

&lt;p&gt;Access to MySQL databases is through the MySQLdb module. It&amp;rsquo;s available in the python-mysqldb package for Debian/Ubuntu users.&lt;/p&gt;

&lt;p&gt;Your first step in any Python code is:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;import&lt;/span&gt; MySQLdb&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Python database access modules all have similar interfaces, described by the &lt;a href=&#34;http://www.python.org/dev/peps/pep-0249/&#34;&gt;Python DB-API&lt;/a&gt;. Most database modules use the same interface, thus maintaining the illusion that you can substitute your database at any time without changing your code. I suspect that anyone doing this in reality has failed &lt;i&gt;with hilarious consequences&lt;/i&gt;, but nonetheless&amp;hellip;&lt;/p&gt;

&lt;p&gt;Create the connection with:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;db = MySQLdb.connect(host=&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#34;localhost&amp;#34;&lt;/span&gt;, port=&lt;span style=&#34;color:#ff0;font-weight:bold&#34;&gt;3306&lt;/span&gt;, user=&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#34;foo&amp;#34;&lt;/span&gt;, passwd=&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#34;bar&amp;#34;&lt;/span&gt;, db=&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#34;qoz&amp;#34;&lt;/span&gt;)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;substituting appropriate local values for each argument.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;db&lt;/code&gt; is now a handle to the database. Normally, you&amp;rsquo;ll create a cursor on this handle like so:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;cursor = db.cursor()&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;MySQL doesn&amp;rsquo;t really support cursors in any sense that&amp;rsquo;s useful to us here, but the DB-API requires that you interface to them that way. So just copy and paste the line into your code.&lt;/p&gt;

&lt;h2 id=&#34;queries&#34;&gt;Queries&lt;/h2&gt;

&lt;p&gt;To execute queries:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;cursor.execute(&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#34;SELECT name, phone_number FROM coworkers WHERE name=&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;%s&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt; AND clue &amp;gt; &lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;%s&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt; LIMIT 5&amp;#34;&lt;/span&gt;, (name, clue_threshold))&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;String interpolation is a bit different here. You can still use Python&amp;rsquo;s built-in interpolation and write something like:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;cursor.execute(&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#34;SELECT name, phone_number FROM coworkers WHERE name=&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;%s&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39; AND clue &amp;gt; &lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;%d&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt; LIMIT 5&amp;#34;&lt;/span&gt; % (name, clue_threshold))&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;but the DB-API interpolation will automatically quote things and guard you from SQL injection attacks, to some extent. If you had a name value of &lt;code&gt;&amp;quot;&#39;; DELETE FROM coworkers;&amp;quot;&lt;/code&gt; in the first case, you&amp;rsquo;d be fine (as the single-quote character would be auto-quoted), but you might run into some slight data loss in the second case.&lt;/p&gt;
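
&lt;p&gt;If you want to see the safe form in action without a MySQL server to hand, here is a small sketch using Python&amp;rsquo;s built-in &lt;code&gt;sqlite3&lt;/code&gt; module instead of MySQLdb. It follows the same DB-API, but note that it uses &lt;code&gt;?&lt;/code&gt; placeholders where MySQLdb uses &lt;code&gt;%s&lt;/code&gt;. The table contents are made up for the example.&lt;/p&gt;

```python
import sqlite3

# An in-memory database standing in for a real MySQL server.
db = sqlite3.connect(":memory:")
cursor = db.cursor()
cursor.execute("CREATE TABLE coworkers (name TEXT, phone_number TEXT)")
cursor.execute("INSERT INTO coworkers VALUES (?, ?)", ("Bob", "9123 4567"))

hostile = "'; DELETE FROM coworkers;"

# The value is passed separately from the SQL, so it is treated purely as data.
cursor.execute("SELECT name FROM coworkers WHERE name = ?", (hostile,))
matches = cursor.fetchall()

cursor.execute("SELECT COUNT(*) FROM coworkers")
remaining = cursor.fetchone()[0]
db.close()
```

&lt;p&gt;The hostile name simply fails to match anything, and the row that was in the table beforehand is still there afterwards.&lt;/p&gt;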

&lt;p&gt;SQL queries are a good place to use Python&amp;rsquo;s multi-line strings, so you can write something like:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;cursor.execute(&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#34;&amp;#34;&amp;#34;SELECT name, phone_number 
&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;                  FROM coworkers 
&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;                  WHERE name=&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;%s&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt; 
&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;                  AND clue &amp;gt; &lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;%s&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt; 
&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;                  LIMIT 5&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;,
               (name, clue_threshold))&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;if you want to get fancy about it.&lt;/p&gt;

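&lt;p&gt;The DB-API quoting only works reliably when using %s placeholders exclusively (even for numbers). MySQLdb converts every parameter to a quoted SQL literal, which is a string, before substituting it into the query, so %d has nothing numeric to format.&lt;/p&gt;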

&lt;p&gt;&lt;code&gt;cursor.execute()&lt;/code&gt; will return the number of rows modified or retrieved, just like in PHP.&lt;/p&gt;

&lt;p&gt;When performing a SELECT query, each row is represented in Python by a tuple. For the above SELECT query with columns &amp;lsquo;name&amp;rsquo; and &amp;lsquo;phone_number&amp;rsquo;, you&amp;rsquo;ll end up with something like:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;(&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39;Bob&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39;9123 4567&amp;#39;&lt;/span&gt;)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;cursor.fetchall()&lt;/code&gt; will return you a tuple containing each row in your query results. That is, you get a tuple of tuples. So the above SELECT query might give you:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;((&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39;Bob&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39;9123 4567&amp;#39;&lt;/span&gt;), (&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39;Janet&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39;8888 8888&amp;#39;&lt;/span&gt;))&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The easiest thing to do with this is to iterate with something like:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;data = cursor.fetchall()
&lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;for&lt;/span&gt; row in data:
    print(row)  # do stuff with each row&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can also use &lt;code&gt;cursor.fetchone()&lt;/code&gt; if you want to retrieve one row at a time. This is handy when you&amp;rsquo;re doing queries like &lt;code&gt;&amp;quot;SELECT COUNT(*) ...&amp;quot;&lt;/code&gt; which only return a single row.&lt;/p&gt;

&lt;h2 id=&#34;cleanup&#34;&gt;Cleanup&lt;/h2&gt;

&lt;p&gt;Finally, &lt;code&gt;db.close()&lt;/code&gt; will close a database handle. I only mention this because some versions of MySQLdb don&amp;rsquo;t garbage collect correctly, so you can run out of database connections if you&amp;rsquo;re not careful.&lt;/p&gt;

&lt;p&gt;My own experience has been that exceptions make it extremely difficult to clean up fully by hand; you always end up leaking a connection here or there. I get around this by manually invoking the Python garbage collector:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;import&lt;/span&gt; gc 
gc.collect()&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;which will close off any old MySQL connections. You could do it just before creating a new connection.&lt;/p&gt;
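&lt;p&gt;One alternative to relying on the garbage collector is to close the connection in a way that survives exceptions. The sketch below uses &lt;code&gt;contextlib.closing&lt;/code&gt; from the standard library; &lt;code&gt;sqlite3&lt;/code&gt; again stands in for MySQLdb (an assumption), and the &lt;code&gt;run&lt;/code&gt; helper and &lt;code&gt;opened&lt;/code&gt; list are hypothetical names used only for illustration:&lt;/p&gt;

```python
import sqlite3
from contextlib import closing

opened = []  # kept only so the example can inspect the connections later

def run(sql):
    # closing() calls db.close() on the way out, even if the query raises.
    with closing(sqlite3.connect(":memory:")) as db:
        opened.append(db)
        cursor = db.cursor()
        cursor.execute(sql)
        return cursor.fetchall()

rows = run("SELECT 1")

try:
    run("SELECT * FROM no_such_table")
except sqlite3.OperationalError:
    pass  # the query failed, but the connection was still closed
```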

&lt;h2 id=&#34;getting-your-results-as-a-dictionary&#34;&gt;Getting your results as a dictionary&lt;/h2&gt;

&lt;p&gt;The Python DB-API doesn&amp;rsquo;t have a mysql_fetch_assoc() function like PHP. mysql_fetch_assoc() would return an associative array/dictionary containing the results of a SELECT query, like so:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;{&lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39;name&amp;#39;&lt;/span&gt;: &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39;Bob&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39;phone_number&amp;#39;&lt;/span&gt;: &lt;span style=&#34;color:#0ff;font-weight:bold&#34;&gt;&amp;#39;9123 4567&amp;#39;&lt;/span&gt;}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The nice thing about this is that you can write code like &lt;code&gt;if row[&#39;name&#39;] == &#39;blah&#39;:&lt;/code&gt;, instead of being dependent on the column ordering in the query.&lt;/p&gt;

&lt;p&gt;I wrote this little function to do the same in Python. It only relies on &lt;code&gt;cursor.fetchone()&lt;/code&gt; and &lt;code&gt;cursor.description&lt;/code&gt;, both of which are part of the DB-API:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;def&lt;/span&gt; FetchOneAssoc(cursor):
    data = cursor.fetchone()
    &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;if&lt;/span&gt; data is None:
        # no more rows in the result set
        &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;return&lt;/span&gt; None

    # cursor.description has one entry per column; item 0 is the column name
    row = {}
    &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;for&lt;/span&gt; (column, value) in &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;zip&lt;/span&gt;(cursor.description, data):
        row[column[&lt;span style=&#34;color:#ff0;font-weight:bold&#34;&gt;0&lt;/span&gt;]] = value

    &lt;span style=&#34;color:#fff;font-weight:bold&#34;&gt;return&lt;/span&gt; row&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
    </item>
    
    <item>
      <title>A few notes on the Lenovo X220</title>
      <link>https://ianhowson.com/blog/a-few-notes-on-the-lenovo-x220/</link>
      <pubDate>Mon, 13 Jun 2011 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/blog/a-few-notes-on-the-lenovo-x220/</guid>
      <description>&lt;p&gt;I ordered mine from an eBay seller in the US. Lenovo Australia doesn&amp;rsquo;t even &lt;em&gt;list&lt;/em&gt; the X220 yet (and they charge almost double what eBay sellers do). I&amp;rsquo;ve now bought one laptop from Lenovo directly and two from eBay sellers. So far, eBay is much cheaper and a little faster, despite this one getting stuck in Customs for about a month.&lt;/p&gt;

&lt;p&gt;It&amp;rsquo;s damn fast, and I can&amp;rsquo;t explain why. My T410 had a first-gen i5 and NVIDIA graphics. This has a second-gen i5 and Intel 3000, but once I stick in my LUKS password, it takes about a &lt;em&gt;second&lt;/em&gt; to reach the login screen.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m running Ubuntu Natty. All of the hardware just works. Suspend doesn&amp;rsquo;t, despite what&amp;rsquo;s listed on the Ubuntu Wiki, but installing PPA kernel 2.6.39rc4 fixes things. VMware doesn&amp;rsquo;t work with this kernel, but the patch on &lt;a href=&#34;http://weltall.heliohost.org/wordpress/2011/05/14/running-vmware-workstation-player-on-linux-2-6-39-updated/&#34;&gt;this page&lt;/a&gt; fixes &lt;em&gt;that&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;I still get the occasional hard lock or failure to wake from suspend. The graphics driver seems to be the cause of most problems. There are also display glitches: when opening the lid, you sometimes get random patterns on the screen (though the mouse pointer looks sane). Switching to the console and back sometimes fixes it; closing the lid and opening sometimes fixes it; suspending and resuming sometimes fixes it, but just &lt;em&gt;occasionally&lt;/em&gt;, I have to reboot. I&amp;rsquo;ve never had random patterns on the DisplayPort, only the internal screen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update, 04 April 2012&lt;/strong&gt;: I compiled a 3.0.22 kernel for Ubuntu Oneiric which has the RC6 fix. It&amp;rsquo;s been rock-solid and power consumption is 7-12W most of the time. Very happy. Ubuntu Precise should have the same fix, but I haven&amp;rsquo;t tested it yet.&lt;/p&gt;

&lt;p&gt;The IPS screen (&amp;ldquo;HD Premium&amp;rdquo;) looks &lt;em&gt;amazing&lt;/em&gt;, even better than my MacBook Pro thanks to the matte filter. There isn&amp;rsquo;t great mechanical isolation between the frame and the screen, so you get shimmering effects if you twist the screen or press the edges. The 16:9 ratio isn&amp;rsquo;t ideal, but it fits side-by-side 80 column terminal/gvim with Terminus 12, and that covers 90% of my usage.&lt;/p&gt;

&lt;p&gt;The backlight&amp;rsquo;s LED PWM controller runs at a sometimes-visible frequency. Seriously, people, 500Hz plus. There&amp;rsquo;s no good reason for LEDs to visibly flicker, EVER. Protip: if you run them at a constant current, you&amp;rsquo;ll achieve even better efficiency, and that means free battery life.&lt;/p&gt;

&lt;p&gt;The keyboard feels a bit better than the T410; less mushy. They must be changing the keyswitches or something between models, because it looks almost identical physically.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s no eSATA port on the machine, but it does work through the dock.&lt;/p&gt;

&lt;p&gt;DisplayPort works through the dock, too, unlike the T410. DisplayPort works happily with the 2560x1440 monitor at work.&lt;/p&gt;

&lt;p&gt;When you plug in a DisplayPort monitor, it shows up instantly (and potentially switches it on.) This is a massive improvement over mouse clicking through the NVIDIA control panel. &lt;code&gt;xrandr --auto&lt;/code&gt; works, as it should.&lt;/p&gt;

&lt;p&gt;The VGA output is reported to not work, but I had no issues. I finally have full-screen Flash videos. They didn&amp;rsquo;t work on NVIDIA. I &lt;strong&gt;don&amp;rsquo;t&lt;/strong&gt; have 3D acceleration in VMware, apparently, but who cares?&lt;/p&gt;

&lt;p&gt;The ThinkLight is brighter than before, but again, one must ask the question: who cares? You have a screen illuminating the keyboard or reading material or whatever. I suppose you could turn off the screen and use the ThinkLight while reading a book, but a $5 book light will achieve the same function and not run down your laptop battery. Remove it and add an ambient light sensor; they&amp;rsquo;re useful &lt;em&gt;and&lt;/em&gt; save battery power.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m hitting the touchpad a bit with my palm. Will probably disable it.&lt;/p&gt;

&lt;p&gt;The touchpad-with-integrated-buttons thing doesn&amp;rsquo;t really work. I mean, yeah, there&amp;rsquo;s clearly not enough room there for buttons. But the moment you touch the button like you&amp;rsquo;re going to press it, the pointer jitters all over the place. This makes it tough to actually click anything, which is sort of the point of having buttons. I recommend just using the upper buttons and ignoring the integrated buttons. Having the integrated buttons there is no worse than leaving them off, but this shouldn&amp;rsquo;t have made it into the product. The HP Mini puts the buttons on the sides of the touchpad, and that works pretty well; I think that Lenovo should do the same thing on the X220++.&lt;/p&gt;

&lt;p&gt;One &lt;em&gt;baffling&lt;/em&gt; bug that I&amp;rsquo;ve experienced is that my VMware virtual machines won&amp;rsquo;t start (&amp;lsquo;Unable to change virtual machine power state: Cannot find a valid peer process to connect to&amp;rsquo;) if the machine is in the dock. I have to undock, start the VMs and plug it back in. Virtualbox is fine.&lt;/p&gt;

&lt;p&gt;I had a &lt;strong&gt;lot&lt;/strong&gt; of trouble getting the thing to boot. For a few weeks, I carried around a USB stick with the System Rescue CD on it, just in case I needed to reboot. (Combined with not working out the suspend problem for a while, I was just leaving the machine running in my backpack for extended periods.) The X220 uses the newer EFI firmware standard, and it appears that Lenovo&amp;rsquo;s implementation won&amp;rsquo;t legacy boot from a GPT-partitioned disk. No idea why &amp;ndash; the T410 is perfectly happy with this arrangement. Once I worked that out, I converted back to MBR (gdisk makes this fairly safe), reinstalled GRUB, and things started working.&lt;/p&gt;

&lt;p&gt;I did spend a lot of time trying to make it boot in EFI mode. I could get both GRUB2 and ELILO to start up, but the moment they tried to execute the kernel, nothing. ELILO would reboot and GRUB2 would just hang. Natty is not really set up for EFI booting. There&amp;rsquo;s no consistent mount point for the EFI System Partition, so kernel updates are likely to fail, and building a startup disk yields an EFI-style startup disk that doesn&amp;rsquo;t work, either.&lt;/p&gt;

&lt;p&gt;Battery life is fantastic &amp;ndash; with the 90Wh battery, 11 hours is quite achievable, more if you dim the screen and turn off WiFi. Of course, with the 90Wh battery, it feels like you&amp;rsquo;re carrying just the battery and there happens to be a screen hanging off it. Following the suggestions in Powertop helps a lot. There&amp;rsquo;s a &lt;a href=&#34;http://code.google.com/p/chromium/issues/detail?id=77625&#34;&gt;nasty bug in Chrome&lt;/a&gt; which ruins battery life &amp;ndash; it increases power consumption from about 8W to 16W (i.e. you get &lt;em&gt;half&lt;/em&gt; of your battery life; &lt;em&gt;just running Chrome&lt;/em&gt; is &lt;em&gt;doubling&lt;/em&gt; the machine&amp;rsquo;s power consumption). Firefox doesn&amp;rsquo;t have this problem, but Firefox renders a lot of stuff strangely on Linux, so I guess I&amp;rsquo;ll live with it for now.&lt;/p&gt;

&lt;p&gt;I had a bit of trouble finding out what the wireless card was from the eBay seller &amp;ndash; he just said &amp;lsquo;wireless N&amp;rsquo; when I asked (repeatedly.) It is a:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;03:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8188CE 802.11b/g/n WiFi Adapter (rev 01)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;It continues Realtek&amp;rsquo;s proud tradition of making really fucking awful network cards. When copying a large file, it reads less than 1MB/sec and jumps around a lot. Sending a file gets peaks of 3MB/sec then drops to near-zero for a while. This is noticeable during &lt;em&gt;web browsing&lt;/em&gt;, where if you load a bunch of pages at once the whole lot will stop.&lt;/p&gt;

&lt;p&gt;I swapped in the wireless card (Intel Ultimate-6300) from my T410 and things are much better &amp;ndash; 5MB/sec down, no pauses. I realised that I don&amp;rsquo;t have the white MIMO antenna cable &amp;ndash; I didn&amp;rsquo;t think to order a 3x3 WiFi antenna. I&amp;rsquo;m not sure why it&amp;rsquo;s an optional extra, especially as the WWAN antennae are always installed. I connected one of the WWAN antennae to the MIMO socket and it bumped throughput to about 7MB/sec, which I&amp;rsquo;m happy with. Lesson learned: don&amp;rsquo;t cheap out on the wireless card or antennae. I&amp;rsquo;m pretty sure that the eBay seller removed the original card and sold it separately &amp;ndash; the same happened with my X61s, and I&amp;rsquo;m doing the same with the T410 when &lt;em&gt;I&lt;/em&gt; sell it.&lt;/p&gt;

&lt;p&gt;When I reassembled the machine after messing with the WiFi cards, I had trouble getting the right-hand edge of the palmrest to sit flat. It turns out that the antenna cables taped just underneath had shifted. There&amp;rsquo;s no hook or plastic to hold them in position; you just have to tape them in the right spot to line up with the channel in the palmrest.&lt;/p&gt;

&lt;p&gt;Interestingly, the WLAN LED works with the Intel card, where it didn&amp;rsquo;t on the Realtek. It is a bit distracting when watching movies in the dark. It can be disabled with the &lt;code&gt;led_mode&lt;/code&gt; parameter to the &lt;code&gt;iwlagn&lt;/code&gt; module.&lt;/p&gt;

&lt;p&gt;The 7mm high drive bay (instead of the usual 9mm) could be a problem for some. It doesn&amp;rsquo;t bother me so much as I use Intel SSDs. With the 80GB MicroSSD option, though, I can see the sense in installing a huge spinning disk and using the 80GB to boot from. Except that there aren&amp;rsquo;t really any huge 7mm drives. So an SSD is your best bet, both on capacity and performance grounds (the 300 and 600GB G3 models both cost less than my 160GB G2!) The main place where I expect it to bite me is if I have a drive failure and need to buy/install something &lt;strong&gt;fast&lt;/strong&gt;; I can&amp;rsquo;t just buy a drive from any old computer store, whack it in and restore from backups. I need to have the drive that shipped with the machine.&lt;/p&gt;

&lt;p&gt;To handle this, I installed the shipping drive in an eSATA drive box. I have my root partition (LUKS-encrypted, including /home) set up as a RAID1. When I get to work, I hot-add the external drive to the RAID. It background syncs; when it&amp;rsquo;s complete, I can (theoretically) remove the drive from the box and plug it directly into my laptop to replace the failed SSD. The external drive is a lot slower than the SSD (due to the drive itself, not the interface) so I use the &lt;code&gt;--write-mostly&lt;/code&gt; parameter to &lt;code&gt;mdadm&lt;/code&gt;. It still slows things down a little, but it&amp;rsquo;s rarely an issue.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>File synchronisation algorithms</title>
      <link>https://ianhowson.com/blog/file-synchronisation-algorithms/</link>
      <pubDate>Wed, 18 Jun 2008 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/blog/file-synchronisation-algorithms/</guid>
      <description>

&lt;p&gt;You have two filesystem trees, A and B. You want the files on both sides to be the same.&lt;/p&gt;
&lt;p&gt;Cases that you need to handle:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;File exists on A but not on B (and vice-versa)&lt;/li&gt;
&lt;li&gt;File exists on both and is identical&lt;/li&gt;
&lt;li&gt;File exists on both and is different&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Right about this point in time, you&amp;#8217;re in trouble. (That was fast!) Only one of those situations can be handled automatically, and that&amp;#8217;s if the file is identical on both sides. You need a lot of user input to figure out what the directories should look like, and users tend to say &amp;#8220;too hard!&amp;#8221; Unison assumes that if a file is present on one side and not on the other, it has just been created. So it copies it across. Already we&amp;#8217;re in dangerous territory because this is frequently &lt;strong&gt;not&lt;/strong&gt; what you want to do.&lt;/p&gt;
&lt;p&gt;If the file exists and is different, you have to ask the user how to merge them or which one to pick. Asking regular users how to merge files is a bad idea. (Asking &lt;strong&gt;developers&lt;/strong&gt; how to merge files is usually a bad idea.)&lt;/p&gt;
&lt;p&gt;Sigh.&lt;/p&gt;
&lt;p&gt;This algorithm is not going to work very well. It doesn&amp;#8217;t handle any common cases, makes a lot of mistakes in its assumptions, and asks users for too much information (which will probably be wrong anyway). Anyone using this algorithm in their synchronization product (&lt;em&gt;*cough* Microsoft *cough*&lt;/em&gt;) is going to have a lousy product.&lt;/p&gt;
&lt;p&gt;(Don&amp;#8217;t get me wrong. I like Office. I like many Microsoft games. I&amp;#8217;m not anti-Microsoft at all. It&amp;#8217;s just Sturgeon&amp;#8217;s Law: 90% of everything is crap.)&lt;/p&gt;
&lt;p&gt;Unfortunately, this case is unavoidable on the very first synchronization of a pair of trees. We have no history data -- even disconnected history data -- and so cannot make informed decisions about what&amp;#8217;s new, deleted or changed. The files just &lt;em&gt;are&lt;/em&gt; or they &lt;em&gt;are not&lt;/em&gt; and we can&amp;#8217;t say which of the two trees is correct.&lt;/p&gt;
&lt;p&gt;The next refinement is to store &lt;em&gt;history data&lt;/em&gt; when you look at the file trees. Every time you perform a synchronization you record some metadata for each file. You want to store the filename and the modification time. That way, when you do the &lt;strong&gt;next &lt;/strong&gt;synchronization, you look at what changed between time X and time Y and apply &lt;strong&gt;those&lt;/strong&gt; changes to the remote file tree, somewhat like generating a diff and then patching a tree. You do this twice -- once for each direction (A to B and B to A). You can get conflicts, of course.&lt;/p&gt;
&lt;p&gt;Conceptually, this looks like:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://ianhowson.com/images/fsa-algo2.png&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Compare this with the first algorithm, which looks like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://ianhowson.com/images/fsa-algo1.png&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Note that if you have no history data, Algorithm 2 works exactly like Algorithm 1. Badly.&lt;/p&gt;
&lt;p&gt;This all operates much like a version control system and has similar problems and implications. A VCS usually can&amp;#8217;t detect renames of files or directories -- you have to explicitly tell the VCS what you&amp;#8217;ve done. When you want to perform a synchronization you have to traverse the entire directory tree to find out what&amp;#8217;s changed -- and this can be very time-consuming. The metadata has to be stored somewhere. Merges almost always require manual intervention and will often be unresolvable (either the user won&amp;#8217;t know what to do and will just overwrite one side, or the file format won&amp;#8217;t support lines-of-text style merging).&lt;/p&gt;
&lt;p&gt;Also note the similar distinction between traditional client-server VCS (e.g. CVS, Perforce) and modern distributed VCS (Mercurial, git). Client-server VCS propagates the nodes (or the actual files being worked on). Distributed VCS propagates the edges (or diffs). Algorithm 1 is looking purely at the file data and attempting to match it on both sides; algorithm 2 is looking at the changes between the &amp;#8216;sync points&amp;#8217; (or nodes) and propagating the changes.&lt;/p&gt;
&lt;p&gt;The actions table for each file looks something like:&lt;/p&gt;
&lt;table class=&#39;ui celled table&#39;&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;File A change&lt;/th&gt;
      &lt;th&gt;File B change&lt;/th&gt;
      &lt;th&gt;Action&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td valign=&#34;top&#34;&gt;Created (checksum P)&lt;/td&gt;
      &lt;td&gt;Created (checksum P)&lt;/td&gt;
      &lt;td&gt;Nothing&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td valign=&#34;top&#34;&gt;Created (checksum P)&lt;/td&gt;
      &lt;td valign=&#34;top&#34;&gt;Created (checksum Q)&lt;/td&gt;
      &lt;td valign=&#34;top&#34;&gt;Merge&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td valign=&#34;top&#34;&gt;Deleted&lt;/td&gt;
      &lt;td valign=&#34;top&#34;&gt;No change&lt;/td&gt;
      &lt;td valign=&#34;top&#34;&gt;Delete&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td valign=&#34;top&#34;&gt;Deleted&lt;/td&gt;
      &lt;td valign=&#34;top&#34;&gt;Deleted&lt;/td&gt;
      &lt;td valign=&#34;top&#34;&gt;Nothing&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td valign=&#34;top&#34;&gt;No change&lt;/td&gt;
      &lt;td valign=&#34;top&#34;&gt;No change&lt;/td&gt;
      &lt;td valign=&#34;top&#34;&gt;Nothing&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td valign=&#34;top&#34;&gt;Modified&lt;/td&gt;
      &lt;td valign=&#34;top&#34;&gt;No change&lt;/td&gt;
      &lt;td valign=&#34;top&#34;&gt;Use file A&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td valign=&#34;top&#34;&gt;Modified&lt;/td&gt;
      &lt;td valign=&#34;top&#34;&gt;Modified&lt;/td&gt;
      &lt;td valign=&#34;top&#34;&gt;Merge&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;(The actions for File A and File B can be interchanged -- I didn&amp;#8217;t feel like writing out those cases twice.)&lt;/p&gt;
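&lt;p&gt;As a rough sketch, the actions table can be written as a small lookup function. The change labels, the &lt;code&gt;action_for&lt;/code&gt; name and the &amp;#8216;ask user&amp;#8217; fallback are assumptions made for illustration; so is treating &amp;#8216;created on one side, untouched on the other&amp;#8217; as a plain copy, which the table does not list.&lt;/p&gt;

```python
# Change labels mirror the table: "created", "deleted", "modified", "none".
def action_for(change_a, change_b, checksum_a=None, checksum_b=None):
    """Return the action for one file given the change seen on each side."""
    if change_a == "none" and change_b == "none":
        return "nothing"
    if change_a == "deleted" and change_b == "deleted":
        return "nothing"
    if change_a == "created" and change_b == "created":
        # Both sides created the file: identical content needs no work.
        return "nothing" if checksum_a == checksum_b else "merge"
    if change_a == "modified" and change_b == "modified":
        return "merge"
    if change_b == "none":
        # Only A changed. Copying a newly created file is an assumption.
        return "delete on B" if change_a == "deleted" else "use file A"
    if change_a == "none":
        # The mirrored rows that the table leaves out.
        return "delete on A" if change_b == "deleted" else "use file B"
    # Combinations the table does not cover (e.g. deleted vs modified).
    return "ask user"
```

&lt;p&gt;Writing it out this way makes the gaps visible: anything that falls through to the last line is a conflict that needs a human.&lt;/p&gt;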
&lt;p&gt;If you include the possibility of renames (and horror of horrors, renames with modifies) then you can get a whole lot more combinations and it gets really nasty. I must give kudos to SourceGear for Vault for this: it does handle all of those nasty cases, a headache which I can do without.&lt;/p&gt;
&lt;p&gt;Detecting what&amp;#8217;s happened between time X and time Y is similarly mechanical. For a given file:&lt;/p&gt;
&lt;table class=&#39;ui celled table&#39;&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Time X&lt;/th&gt;
      &lt;th&gt;Time Y&lt;/th&gt;
      &lt;th&gt;Change&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Does not exist&lt;/td&gt;
      &lt;td&gt;Exists&lt;/td&gt;
      &lt;td&gt;Created&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td valign=&#34;top&#34;&gt;Exists&lt;/td&gt;
      &lt;td valign=&#34;top&#34;&gt;Does not exist&lt;/td&gt;
      &lt;td valign=&#34;top&#34;&gt;Deleted&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td valign=&#34;top&#34;&gt;Checksum P&lt;/td&gt;
      &lt;td valign=&#34;top&#34;&gt;Checksum P&lt;/td&gt;
      &lt;td valign=&#34;top&#34;&gt;Nothing&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td valign=&#34;top&#34;&gt;Checksum P&lt;/td&gt;
      &lt;td valign=&#34;top&#34;&gt;Checksum Q&lt;/td&gt;
      &lt;td valign=&#34;top&#34;&gt;Modified&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Without having looked at the source code, I’d say this is the algorithm that Unison uses. I’d also guess that most ‘proper’ synchronization programs use this. It’s the simplest thing that works in most cases.&lt;/p&gt;
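&lt;p&gt;The change-detection table translates almost directly into code. This is only a sketch: it assumes each snapshot is a dictionary mapping a path to the checksum recorded at that time, and the function names are made up for illustration.&lt;/p&gt;

```python
def detect_change(checksum_x, checksum_y):
    """Classify what happened to one file between time X and time Y.

    Each argument is the checksum recorded at that time, or None if the
    file did not exist.
    """
    if checksum_x is None and checksum_y is None:
        return "none"  # never existed on this side
    if checksum_x is None:
        return "created"
    if checksum_y is None:
        return "deleted"
    if checksum_x == checksum_y:
        return "none"
    return "modified"


def diff_snapshots(snapshot_x, snapshot_y):
    """Map each changed path to its change, given two {path: checksum} dicts."""
    changes = {}
    for path in sorted(set(snapshot_x) | set(snapshot_y)):
        change = detect_change(snapshot_x.get(path), snapshot_y.get(path))
        if change != "none":
            changes[path] = change
    return changes
```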
&lt;p&gt;Note that you also need to be able to reliably detect a change in a file. The (almost) infallible way to do this is to hash the file. I say almost because hash collisions do happen -- they’re just extremely rare. ‘Extremely rare’ becomes a lot more common when you’re talking about a million files (32 bits of hash is not enough).&lt;/p&gt;
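&lt;p&gt;To put a rough number on that, the birthday bound estimates the chance that at least two files out of a set share a hash. The sketch below assumes the hash output is uniformly distributed, and &lt;code&gt;collision_probability&lt;/code&gt; is a made-up helper name. For a million files it gives a near-certain collision at 32 bits and roughly a 3-in-100-million chance at 64 bits.&lt;/p&gt;

```python
import math

def collision_probability(num_files, hash_bits):
    """Approximate chance that at least two of num_files distinct files
    share a hash, using the birthday bound 1 - exp(-n(n-1) / 2^(bits+1))."""
    pairs = num_files * (num_files - 1) / 2
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-pairs / 2 ** hash_bits)

p32 = collision_probability(1000000, 32)
p64 = collision_probability(1000000, 64)
p128 = collision_probability(1000000, 128)
```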
&lt;p&gt;The other option is to look at the modification time of the file. Software can and does manipulate the modtime, however, and you might miss changes. Users might change the system time and confuse your sync program (if a change was made a long time ago). You might not be syncing to a device that has a real-time clock (some mobile phones, notably). You also have to sync the times between the two systems, but that’s not too hard.&lt;/p&gt;
&lt;p&gt;Aaaanyway, the gist of it is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Checksums: reliable, slow (you have to read the entire contents of every file)&lt;/li&gt;
&lt;li&gt;Modification time: less reliable, much faster&lt;/li&gt;
&lt;/ul&gt;
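&lt;p&gt;A sketch of the two approaches side by side. The choice of SHA-256 and the helper names are assumptions for illustration; any sufficiently wide hash would do.&lt;/p&gt;

```python
import hashlib
import os

def file_checksum(path, chunk_size=65536):
    """Reliable but slow: reads every byte of the file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def file_signature(path):
    """Fast but less reliable: only asks the filesystem for metadata."""
    info = os.stat(path)
    return (info.st_size, info.st_mtime_ns)
```

&lt;p&gt;A reasonable compromise is to compare signatures first and only checksum the files whose signature has changed, accepting that a same-size edit with a forged modification time will slip through.&lt;/p&gt;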
&lt;p&gt;Some filesystems such as JFFS2 keep a revision number on each block (roughly). If the revision number goes up, you can be assured that a write has happened regardless of what the modification time says. This is not a common feature, however, and probably not accessible to userspace programs anyway. There’s no easy solution here.&lt;/p&gt;

&lt;h2 id=&#34;it-still-sucks-how-to-make-it-usable&#34;&gt;It still sucks. How to make it usable&lt;/h2&gt;

&lt;p&gt;Algorithm 2 (a.k.a. &amp;#8216;what everyone is using&amp;#8217;) has some shortcomings:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Detecting changes takes a long time&lt;/li&gt;
&lt;li&gt;It won&amp;#8217;t detect renames or directory moves&lt;/li&gt;
&lt;li&gt;There are still some cases where you need to resolve conflicts and/or merge files&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There are also some usability issues:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You need to manually initiate a sync. You can&amp;#8217;t just pick up your laptop and go anytime.&lt;/li&gt;
&lt;li&gt;Performance sucks. I may have mentioned that a dozen or so times.&lt;/li&gt;
&lt;li&gt;There&amp;#8217;s nothing to stop you modifying a file on both sides; you have to remember which is the most recent and remember to sync before working on the other machine.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here&amp;#8217;s how I&amp;#8217;ll fix these problems.&lt;/p&gt;
&lt;h3&gt;Constantly monitor for changes&lt;/h3&gt;
&lt;p&gt;The existing tools require you to manually initiate a sync, at which point you&amp;#8217;ll have a few minutes of disk grinding. I&amp;#8217;d rather have the program running constantly and being notified of changes as they happen. The common case is that only a few files will change between syncs -- reading &lt;strong&gt;all&lt;/strong&gt; of the files is inefficient.&lt;/p&gt;
&lt;p&gt;What I want is an API that notifies me when files change (or are created or deleted). I think &lt;a href=&#34;http://en.wikipedia.org/wiki/Inotify&#34;&gt;inotify&lt;/a&gt; will do this, perhaps &lt;a href=&#34;http://savannah.nongnu.org/projects/fam/&#34;&gt;FAM&lt;/a&gt;. I have no idea what to use on Windows or OSX yet. On a technical level, this is an unsolved problem.&lt;/p&gt;
&lt;p&gt;There is a risk here that if files are modified while the application is not running (and hence not receiving notifications) the modifications could be lost.&lt;/p&gt;
&lt;p&gt;The fallback option is to scan the file trees while the machine is idle. If you&amp;#8217;re checksumming files to detect changes, this can happen during idle time as well.&lt;/p&gt;
&lt;p&gt;I think idle time is a grossly underutilized resource right now -- we could be doing virus scanning, file indexing, backups and the like &lt;strong&gt;constantly&lt;/strong&gt; instead of at intervals (3am cronjob) or while the user is trying to use the system (like most on-demand virus scanners).&lt;/p&gt;
&lt;h3&gt;Constantly synchronize changes&lt;/h3&gt;
&lt;p&gt;If you&amp;#8217;re going to scan all of the time, you might as well copy files straight away rather than waiting until the user requests a sync. This will cut down the odds of a merge conflict somewhat, since the files are less likely to be modified simultaneously on both sides. This introduces the idea of a pair of machines being &lt;em&gt;connected&lt;/em&gt;; while they are connected, their files are always synchronized. Since you&amp;#8217;re probably modifying small amounts of data at a time, this will work reasonably well over a slow network connection.&lt;/p&gt;
&lt;h3&gt;Lock in-use files&lt;/h3&gt;
&lt;p&gt;Another way to prevent merge conflicts is to lock a file on machine A if it&amp;#8217;s being written to on machine B. This prevents an application on machine A from modifying it at the same time.&lt;/p&gt;
&lt;h3&gt;Identify machines by a UUID rather than IP address&lt;/h3&gt;
&lt;p&gt;A common situation is to have a laptop and a desktop that you want synchronized together. You have the laptop at home and sync the files. You take the laptop to work, but because you&amp;#8217;re on a different IP the sync program thinks it&amp;#8217;s a different machine. If you give each machine a UUID or name, you can be (reasonably) sure of its identity and hence use the right indexes or file trees.&lt;/p&gt;
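&lt;p&gt;A minimal sketch of the idea: generate a random UUID the first time the program runs and store it in a file, then reuse it forever after. The &lt;code&gt;machine_id&lt;/code&gt; helper and the idea of keeping the ID in a plain text file are assumptions for illustration.&lt;/p&gt;

```python
import os
import uuid

def machine_id(id_path):
    """Return this machine's persistent ID, creating it on first use."""
    if os.path.exists(id_path):
        with open(id_path) as handle:
            return handle.read().strip()
    new_id = str(uuid.uuid4())  # random, so no dependence on IP or hostname
    with open(id_path, "w") as handle:
        handle.write(new_id)
    return new_id
```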
&lt;h3&gt;Checksum files or their metadata in order to detect renames&lt;/h3&gt;
&lt;p&gt;If you&amp;#8217;ve got a checksum of each file (or just the modification time and size) and you detect a file deletion, you can look through any new files and see if they&amp;#8217;re actually the same file. You can then infer that a file was moved or renamed rather than deleted and a new file created, saving time and bandwidth during the synchronization. It may be possible to optimize this further by looking at inode numbers or their equivalent on whatever filesystem is in use.&lt;/p&gt;
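&lt;p&gt;A sketch of the matching step, assuming you already have a dictionary of deleted paths and a dictionary of created paths, each mapping a path to its checksum. The function name and return shape are made up for illustration.&lt;/p&gt;

```python
def detect_renames(deleted, created):
    """Pair deleted and created paths that share a checksum.

    deleted and created map path to checksum. Returns a tuple of
    (renames, still_deleted, still_created), where renames maps each
    old path to its new path and the other two hold the leftovers.
    """
    by_checksum = {}
    for path, checksum in sorted(created.items()):
        by_checksum.setdefault(checksum, []).append(path)

    renames = {}
    still_deleted = {}
    for old_path, checksum in sorted(deleted.items()):
        candidates = by_checksum.get(checksum)
        if candidates:
            # Same content appeared elsewhere: treat it as a move.
            renames[old_path] = candidates.pop(0)
        else:
            still_deleted[old_path] = checksum

    paired = set(renames.values())
    still_created = {p: c for p, c in created.items() if p not in paired}
    return renames, still_deleted, still_created
```

&lt;p&gt;Identical files (empty files, say) make the pairing ambiguous; this sketch just takes the first candidate in sorted order.&lt;/p&gt;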

&lt;h2 id=&#34;later-reflections&#34;&gt;Later reflections&lt;/h2&gt;

&lt;p&gt;In my classic inability to actually focus on a single task for any length of time, I&amp;#8217;ve been working on SyncDroid.&lt;/p&gt;
&lt;p&gt;I&amp;#8217;ve been attacking the tricky areas of data storage and what I refer to as the &amp;#8216;datapath&amp;#8217; -- the chain of events that takes place between a change occurring on a computer and it propagating (across physical space and time) to another computer. I can partly explain why nobody has done this before: it&amp;#8217;s really tricky.&lt;/p&gt;
&lt;p&gt;Unison (and most other synchronizers) make some simplifying assumptions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;There is always a master computer and a slave computer&lt;/li&gt;
&lt;li&gt;We only care about what is happening at this exact moment in time&lt;/li&gt;
&lt;li&gt;We can synchronize the times on the two computers when the synchronization occurs&lt;/li&gt;
&lt;li&gt;We can suck up as much CPU and IO time as we like while synchronization takes place&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Unfortunately, none of these are true for SyncDroid. They have interesting consequences.&lt;/p&gt;
&lt;h3&gt;There is always a master and a slave&lt;/h3&gt;
&lt;p&gt;This makes configuration management really easy: you always look to the master computer. With network-connected hosts, the master is (by definition) contactable, so you can just tell it to update its configuration with any changes made on the slave end.&lt;/p&gt;
&lt;p&gt;SyncDroid doesn&amp;#8217;t have this luxury. In the case of USB-drive synchronization, the two computers cannot just tell each other about changes. So there&amp;#8217;s an interesting sub-synchronization problem: in order to know what &lt;em&gt;data&lt;/em&gt; we need to synchronize, we need to synchronize the &lt;em&gt;configuration &lt;/em&gt;first.&lt;/p&gt;
&lt;h3&gt;We only care about a single moment in time&lt;/h3&gt;
&lt;p&gt;There&amp;#8217;s really only one trap if you use this assumption: files might change between the time you detect a change and when you actually synchronize it. This is easy to solve if you take out an exclusive lock on the file-being-synchronized and ensure that it still looks like it did when you scanned it.&lt;/p&gt;
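&lt;p&gt;The &amp;#8216;does it still look like it did when scanned&amp;#8217; check might be sketched as below. The exclusive lock is deliberately left out, since file locking is platform-specific; without it there is still a small window between the check and the copy. The helper names and the use of size plus modification time as the &amp;#8216;look&amp;#8217; of a file are assumptions for illustration.&lt;/p&gt;

```python
import os
import shutil

def signature(path):
    """What the scan recorded for a file: its size and modification time."""
    info = os.stat(path)
    return (info.st_size, info.st_mtime_ns)

def copy_if_unchanged(source, destination, scanned_signature):
    """Copy source only if it still matches what the scan recorded."""
    if signature(source) != scanned_signature:
        return False  # changed since the scan; rescan before syncing
    shutil.copy2(source, destination)
    return True
```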
&lt;p&gt;SyncDroid cares about lots of points in time. Because it syncs constantly, we have to be very careful about what state we &lt;em&gt;think &lt;/em&gt;a file is in versus what state it &lt;em&gt;actually&lt;/em&gt; is in. If you&amp;#8217;re doing syncs to multiple partners, you have to keep track of all relevant metadata for all partners. If a partner goes away -- say the user loses the USB drive -- we shouldn&amp;#8217;t waste time and resources tracking data that will never be used. And we can&amp;#8217;t just rescan things constantly or lock files because that would hurt performance (or make it impossible for users to actually do work). I&amp;#8217;m a user of this thing, too, and if it doesn&amp;#8217;t perform acceptably, I won&amp;#8217;t use it!&lt;/p&gt;
&lt;h3&gt;We can synchronize computer times easily&lt;/h3&gt;
&lt;p&gt;On a network-connected synchronizer, this is easy. You run some variation of the NTP protocol between the two hosts and calculate an offset so that you don&amp;#8217;t disturb the user&amp;#8217;s clock. You can then work out relative change timings and the best course of action.&lt;/p&gt;
&lt;p&gt;Because this version of SyncDroid works over USB drives, it can&amp;#8217;t synchronize times easily. I get around that with a &amp;#8216;mountcount&amp;#8217; -- it&amp;#8217;s just a number that is incremented every time the metadata on a drive is loaded. RAID arrays use the same idea to detect drives that were unplugged from an array and are now out-of-sync with the rest of the array. Each computer using a USB drive can then use the mountcount to determine relative change times without being dependent on the computer&amp;#8217;s clock, which will probably be wrong.&lt;/p&gt;
&lt;p&gt;The consequence of the mountcount is that multiple access to the metadata is strictly forbidden. This is reasonably easy to ensure and shouldn&amp;#8217;t be visible to the user.&lt;/p&gt;
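&lt;p&gt;A minimal sketch of the mountcount idea, assuming the drive metadata simply records which paths changed during each mount. The class and method names are hypothetical and are not taken from SyncDroid itself.&lt;/p&gt;

```python
class DriveMetadata:
    """Metadata kept on the USB drive (held in memory here, for illustration)."""

    def __init__(self):
        self.mountcount = 0
        self.changes = {}  # mountcount -> set of paths changed during that mount

    def load(self):
        # Every load of the metadata bumps the counter, like a logical clock tick.
        self.mountcount += 1
        return self.mountcount

    def record_change(self, path):
        # Stamp the change with the current mountcount instead of a wall-clock time.
        self.changes.setdefault(self.mountcount, set()).add(path)

    def changed_since(self, last_seen):
        # Everything recorded in mounts after the one this computer last saw.
        paths = set()
        for count in range(last_seen + 1, self.mountcount + 1):
            paths |= self.changes.get(count, set())
        return paths
```

&lt;p&gt;Each computer only has to remember the mountcount it saw last time; no clock comparison is involved. The sketch relies on exactly one computer loading the metadata at a time.&lt;/p&gt;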
&lt;h3&gt;We can suck up as much CPU and IO time as we like&lt;/h3&gt;
&lt;p&gt;This is a big one, and it&amp;#8217;s one of the major reasons I started this project. None of the current synchronizers are sensitive to the user. Perhaps I&amp;#8217;m a dreamer, but I would like my files to be synchronized without taking a massive hit in PC performance (or battery life).&lt;/p&gt;
&lt;p&gt;Unison (as well as most synchronizers) will do exactly what you tell them to. If you say &amp;#8216;scan for changes&amp;#8217;, they will scan &lt;em&gt;right now&lt;/em&gt;. If you say &amp;#8216;propagate changes&amp;#8217;, they will propagate right now. While they are working, the computer is struggling under massive IO load, and if you have large amounts of data (like I do) that could lead to several minutes where the disk is spinning and you can&amp;#8217;t use the computer and you have to sync &lt;em&gt;right now because your plane is leaving but it&amp;#8217;s still running and argh I&amp;#8217;m going to be late&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;SyncDroid has a fairly involved set of priorities to determine under what circumstances it should scan and sync and bookkeep. For example, it has two scanner types: a notification scanner (which uses the OS to determine when files have changed) and a comprehensive scanner (in case SyncDroid wasn&amp;#8217;t running and you changed a file). The notification scanner runs all of the time, but if you&amp;#8217;re on battery or using the computer, it just remembers the changes in RAM and gets out of the way as quickly as possible. The comprehensive scanner only runs when the computer is connected to power and you&amp;#8217;re not using it.  In this way, you get the effect of non-stop change scanning without any perceptible difference to your computer&amp;#8217;s responsiveness.&lt;/p&gt;
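&lt;p&gt;The two-scanner rule described above can be sketched as a small decision function. This is only an illustration of the rule as stated; the function name, argument names and result strings are hypothetical, and the real set of priorities is more involved.&lt;/p&gt;

```python
def scan_policy(on_battery, user_active):
    """Decide what each scanner may do right now."""
    # Stay out of the way whenever the machine is on battery or in use.
    deferred = on_battery or user_active
    return {
        # The notification scanner always runs, but only buffers changes in RAM
        # while the user needs the machine.
        "notification": "buffer_in_ram" if deferred else "process_now",
        # The comprehensive scanner only runs on mains power with the user away.
        "comprehensive": "wait" if deferred else "run",
    }
```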
&lt;p&gt;There is a big &amp;#8216;but&amp;#8217; here, and it&amp;#8217;s one of those annoying engineering tradeoffs: if you are not aggressive enough about scanning, you will miss changes (say, the user disconnects their laptop without warning). If you are too aggressive, you&amp;#8217;ll slow down the computer. The trick is to find a set of tradeoffs that works well in most circumstances. In the cases where it &lt;em&gt;doesn&amp;#8217;t&lt;/em&gt; work, you can warn the user and give them an opportunity to fix the problem (by plugging the laptop back into the network for a minute, for example).&lt;/p&gt;
&lt;h3&gt;Data Storage&lt;/h3&gt;
&lt;p&gt;And then, there&amp;#8217;s the hairy issue of where to put all of this data that we&amp;#8217;re collecting. What we have is roughly a parallel filesystem to the one on the disk: for a file, we want to store some metadata. The best way to store this, from a design point of view, would be to store it in the filesystem itself, but this is impractical for a number of reasons (don&amp;#8217;t want to change the user-visible view of their data, no filesystem support, differing semantics between systems, and so on).&lt;/p&gt;
&lt;p&gt;So we have to create a filesystem within a filesystem. It&amp;#8217;s another meta-problem like the sub-synchronization problem in configuration management. I considered doing this in the literal fashion -- creating an image on disk with a virtual ext2 filesystem. Instead of files, there would be structs of metadata that I had collected. Licensing issues were, well, issues here, and it would require me to maintain a fairly complicated data access layer. The big technical problem is that contemporary filesystems assume a constant-sized disk, while I wanted to be able to expand and shrink the image size dynamically.&lt;/p&gt;
&lt;p&gt;My stopgap solution (while this is all stubbed out in my code) is to use a YAML file. I adore YAML. It is not a high-performance data access layer, however, and it was not designed as such. It&amp;#8217;s just very easy to use.&lt;/p&gt;
&lt;p&gt;Another option was a custom C data type -- or, phrased another way, &amp;#8216;write my own filesystem&amp;#8217;. Lots of effort. Transaction management is a big hairy problem that I don&amp;#8217;t want to get into.&lt;/p&gt;
&lt;p&gt;Finally, SQLite. I love SQLite -- it&amp;#8217;s very easy to use and gives you very powerful query functionality. It handles on-disk consistency well and -- used sensibly -- can be very high-performance.&lt;/p&gt;
&lt;p&gt;Many applications, sadly, do not use SQLite in a sensible fashion (I&amp;#8217;m looking at you, Meta-Tracker). Like any SQL database, you can do silly things to it that will absolutely destroy its performance characteristics. A classic in this situation is if you want a directory listing and your rows look like { filename | data }; the database needs to do a &amp;#8216;starts-with&amp;#8217; check on each row in the database because there&amp;#8217;s no easy way to index efficiently by filename &lt;strong&gt;and&lt;/strong&gt; retain simplistic tree-searching operations. This is Really Really Slow.&lt;/p&gt;
&lt;p&gt;My current plan is to solve this by implementing a more traditional inode/parent structure within my database schema. I have the big advantage of knowing exactly which operations are necessary (read record by path+name, write record by id, create record by path+name, list children by path) and so can optimise specifically for them.&lt;/p&gt;
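&lt;p&gt;A rough sketch of what such an inode/parent layout could look like, using Python and its built-in &lt;code&gt;sqlite3&lt;/code&gt; module. The table name, column names and helper functions are all hypothetical, and an in-memory database stands in for the real file. The unique index on (parent_id, name) serves both the lookup by path and the listing of children, so neither needs a scan over every row.&lt;/p&gt;

```python
import sqlite3

ROOT_ID = 1  # hypothetical fixed id for the root directory row

def open_store():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE node ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER NOT NULL,"
        " name TEXT NOT NULL,"
        " meta TEXT,"
        " UNIQUE (parent_id, name))"  # doubles as the index for every lookup below
    )
    db.execute("INSERT INTO node (id, parent_id, name) VALUES (?, 0, '')", (ROOT_ID,))
    return db

def _child_id(db, parent_id, name):
    row = db.execute(
        "SELECT id FROM node WHERE parent_id = ? AND name = ?", (parent_id, name)
    ).fetchone()
    return None if row is None else row[0]

def lookup(db, path):
    """Read a record id by path, walking one indexed step per component."""
    node_id = ROOT_ID
    for part in [p for p in path.split("/") if p]:
        node_id = _child_id(db, node_id, part)
        if node_id is None:
            return None
    return node_id

def create(db, path, meta=None):
    """Create a record by path, adding any missing parent directories."""
    node_id = ROOT_ID
    for part in [p for p in path.split("/") if p]:
        child = _child_id(db, node_id, part)
        if child is None:
            cur = db.execute(
                "INSERT INTO node (parent_id, name) VALUES (?, ?)", (node_id, part)
            )
            child = cur.lastrowid
        node_id = child
    db.execute("UPDATE node SET meta = ? WHERE id = ?", (meta, node_id))
    return node_id

def list_children(db, path):
    """List the names directly under a path using the (parent_id, name) index."""
    node_id = lookup(db, path)
    if node_id is None:
        return []
    rows = db.execute(
        "SELECT name FROM node WHERE parent_id = ? ORDER BY name", (node_id,)
    )
    return [row[0] for row in rows]
```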
</description>
    </item>
    
    <item>
      <title>Getting Started with the BlueSMiRF Silver V2 Bluetooth Module</title>
      <link>https://ianhowson.com/blog/bluesmirf-silver-intro/</link>
      <pubDate>Fri, 09 May 2008 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/blog/bluesmirf-silver-intro/</guid>
      <description>

&lt;p&gt;&lt;img src=&#34;https://ianhowson.com/images/r-bluesmirf-front.jpeg&#34; alt=&#34;BlueSMiRF Silver V2&#34; /&gt;&lt;/p&gt;

&lt;h2 id=&#34;quick-make-it-do-something&#34;&gt;Quick! Make it do something!&lt;/h2&gt;

&lt;p&gt;The board has the pinout printed on it. Connect 3.3V to the VCC pin and GND to the GND pin. The red LED should turn on. Hooray!&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;https://ianhowson.com/images/r-bluesmirf-back.jpeg&#34; alt=&#34;Back of the BlueSMiRF Silver V2&#34; /&gt;&lt;/p&gt;

&lt;p&gt;At this point, you won&amp;rsquo;t be able to see the module over Bluetooth. It starts up with the Bluetooth interface disabled. You need to send it some commands to get it started.&lt;/p&gt;

&lt;p&gt;I connected my module to my PC with a &lt;a href=&#34;http://www.sparkfun.com/commerce/product_info.php?products_id=449&#34;&gt;SparkFun RS232 Shifter board&lt;/a&gt;. You can then use HyperTerminal (Windows) or minicom (Linux) to type commands directly to the module. Link the RTS/CTS pins on the module together while you&amp;rsquo;re at it. I connected all of this up with little IC test clips.&lt;/p&gt;

&lt;p&gt;The terminal settings are 9600bps, 8N1, no flow control. Type &amp;lsquo;+++&amp;rsquo; quickly to get to command mode on the module.&lt;/p&gt;

&lt;p&gt;As a quick sanity check, type&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;ATI&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The module should reply with:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;1SPP - Ver: 1.2.5
OK&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Send:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;AT+BTSRV=1&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;and you should be able to pair with the module and send serial data through it.&lt;/p&gt;

&lt;p&gt;Read on for more advanced usage.&lt;/p&gt;

&lt;h2 id=&#34;at-commands&#34;&gt;AT Commands&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;The wonderful thing about standards is that there are so many to choose from.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In an interesting tip of the hat to history, the module uses AT commands for control, just like a serial modem. After you send an AT command to a modem, it will always end its reply with OK or ERROR. You can send an empty command (&amp;lsquo;AT&amp;rsquo;) to confirm that the modem is responding as expected.&lt;/p&gt;

&lt;p&gt;Every Bluetooth serial modem has a different command set. The BlueSMiRF Silver V2 uses the one for the Philips/NXP BGB203. NXP appears to &lt;em&gt;dislike money&lt;/em&gt; and won&amp;rsquo;t give you programming info for their chips, even if you beg for it. To save us the hassle, SparkFun has dug up the &lt;a href=&#34;http://www.sparkfun.com/datasheets/Wireless/Bluetooth/BGB203_SPP_UserGuide.pdf&#34;&gt;programming info&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You can also get the BGB203 &amp;lsquo;datasheet&amp;rsquo; from NXP, but it&amp;rsquo;s useless. Even after a few hours on the phone and offers to buy a lot of silicon, they wouldn&amp;rsquo;t give me anything better.&lt;/p&gt;

&lt;p&gt;You want to read the User Guide backwards. It has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the Table of Contents on the LAST PAGE&lt;/li&gt;
&lt;li&gt;a tutorial in chapter 9, near the end&lt;/li&gt;
&lt;li&gt;all of the parameters you need to get started at the back in &amp;lsquo;Default Configuration Parameters&amp;rsquo;&lt;/li&gt;
&lt;li&gt;all of the AT commands in the middle&lt;/li&gt;
&lt;li&gt;some useless stuff at the front, where it&amp;rsquo;s easy to find&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The interesting commands are:&lt;/p&gt;

&lt;p&gt;Get information on the firmware in the Bluetooth module:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;ATI
1SPP - Ver: 1.2.5
OK&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Get the Bluetooth display name:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;AT+BTLNM
+BTLNM: &amp;#34;SparkFun-BT&amp;#34;
OK&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Get the Bluetooth MAC address:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;AT+BTBDA
+BTBDA: 031F08071729
OK&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Get the UART parameters:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;AT+BTURT
+BTURT: 9600, 8, 0, 1, 0
OK&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Start the Bluetooth server on channel 1:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;AT+BTSRV=1&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&#34;automating-startup&#34;&gt;Automating startup&lt;/h2&gt;

&lt;p&gt;Ultimately, you&amp;rsquo;re probably going to want to write a program to set all of this up automatically. The script that I use goes something like:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#e5e5e5;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;+++
AT&amp;amp;F
AT+BTLNM=&amp;#34;somename&amp;#34;
AT+BTAUT=1, 0
AT+BTURT=115200, 8, 0, 1, 0
AT+BTSEC=0
AT+BTFLS
AT+BTSRV=1&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;with appropriate checks to make sure commands are actually executing properly.&lt;/p&gt;

&lt;p&gt;This script:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;resets to factory settings (so we know what state we&amp;rsquo;re in)&lt;/li&gt;
&lt;li&gt;changes the Bluetooth display name to &amp;lsquo;somename&amp;rsquo;&lt;/li&gt;
&lt;li&gt;allows automatic Bluetooth connections to the module&lt;/li&gt;
&lt;li&gt;sets the module to 115200bps (at which point you will probably have to change the bit rate on your UART as well)&lt;/li&gt;
&lt;li&gt;disables security so you don&amp;rsquo;t need a complex pairing process (naughty, but it makes prototyping a whole lot easier)&lt;/li&gt;
&lt;li&gt;writes all of this to the Flash on the module&lt;/li&gt;
&lt;li&gt;starts the Bluetooth server&lt;/li&gt;
&lt;/ul&gt;
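
&lt;p&gt;The &amp;lsquo;appropriate checks&amp;rsquo; could look something like the Python sketch below. It is not the script I actually use. It assumes a port object with &lt;code&gt;write()&lt;/code&gt; and &lt;code&gt;readline()&lt;/code&gt; methods (for example, a pySerial port opened with a timeout) and a carriage return as the command terminator. The names &lt;code&gt;send_at&lt;/code&gt; and &lt;code&gt;ATError&lt;/code&gt; are made up for illustration.&lt;/p&gt;

```python
class ATError(Exception):
    """Raised when the module answers ERROR or never finishes its reply."""

def send_at(port, command, max_lines=20):
    # Assumption: commands are terminated with a carriage return.
    port.write((command + "\r").encode("ascii"))
    reply = []
    for _ in range(max_lines):
        line = port.readline().decode("ascii", "replace").strip()
        if line == "OK":
            return reply  # every reply ends with OK on success
        if line == "ERROR":
            raise ATError(command)
        if line:
            reply.append(line)  # informational lines such as '+BTLNM: ...'
    raise ATError("no final result for " + command)
```

&lt;p&gt;The &amp;lsquo;+++&amp;rsquo; escape and the bit rate change after AT+BTURT are not handled here; the caller would need to deal with those separately.&lt;/p&gt;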

&lt;h2 id=&#34;tidbits&#34;&gt;Tidbits&lt;/h2&gt;

&lt;p&gt;Any Bluetooth activity (querying, scanning, etc) seems to block the AT command interface.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;re typing commands by hand, don&amp;rsquo;t hit backspace. The command will be rejected. I recommend typing commands into a text editor and copy/pasting them into the terminal window so that you don&amp;rsquo;t make mistakes.&lt;/p&gt;

&lt;p&gt;The module doesn&amp;rsquo;t appear to be case-sensitive to the AT commands, so you can type in lowercase and probably eliminate some errors that way.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Kinesis Advantage keyboard and learning Dvorak</title>
      <link>https://ianhowson.com/blog/kinesis-advantage-keyboard-and-learning-dvorak/</link>
      <pubDate>Sun, 27 Jan 2008 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/blog/kinesis-advantage-keyboard-and-learning-dvorak/</guid>
      <description>&lt;p&gt;I bought a Kinesis Advantage keyboard with the intention of reducing my finger pain associated with typing. Obviously, I spend a good portion of every day typing, and my livelihood basically depends on my being able to continue typing.&lt;/p&gt;

&lt;p&gt;I also decided to learn the Dvorak layout while I was learning the Kinesis keyboard. I tried learning Dvorak a few years back but gave it up because I was working with a lot of other people&amp;#8217;s keyboards as well &amp;ndash; it was too inconvenient to keep switching layouts.&lt;/p&gt;

&lt;h2&gt;On the Kinesis/Dvorak learning process&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Muscle memory is a &lt;strong&gt;huge&lt;/strong&gt; factor in switching to a different keyboard or layout. Even now, when I&amp;#8217;m typing on my laptop I instinctively reach for the Enter key with my right thumb, because that&amp;#8217;s where it is on the Kinesis.&lt;/li&gt;
&lt;li&gt;The Kinesis makes a lot of bad habits difficult, whether by accident or by design. You can&amp;#8217;t really rest your hands on the pads while typing because then you can&amp;#8217;t reach all of the keys. You can&amp;#8217;t twist your hands around to move them around the layout because the keys are aligned to suit your hands in the home position. When you move your hands around the layout, suddenly they&amp;#8217;re un-aligned and awkward.&lt;/li&gt;
&lt;li&gt;I started out using a lot of &amp;#8216;mental CPU&amp;#8217; time to handle the conversion. In the beginning, it took all of my concentration just to hit the right keys -- I had to separate my thinking from my typing.&lt;/li&gt;
&lt;li&gt;While learning Dvorak, I noticed an interesting progression; I started out pressing just single keys at a time. Gradually, I started combining strings of keys into single motions (something I call &amp;#8216;chording&amp;#8217;, which I&amp;#8217;ll come back to). This is similar to how a child learns to read -- they recognize single letters, expand out to sounds and eventually can string together words.&lt;/li&gt;
&lt;li&gt;I made rapid progress for about three weeks. The first few days were difficult. I only used the Kinesis+Dvorak for a couple of hours each day because it was very frustrating to learn. I&amp;#8217;ve had 20 years with a nice solid brain-keyboard link via the keyboard, and suddenly it&amp;#8217;s horribly slow and error-prone. After the first few days things settled down a bit and I could manage an entire day&amp;#8217;s work on the Kinesis.&lt;/li&gt;
&lt;li&gt;After the first three weeks, progress slowed. I was still improving, but more in the areas of chording and accuracy. Some keys still gave me consistent problems on the Dvorak layout, particularly G and P.&lt;/li&gt;
&lt;li&gt;You need to sit a little higher than with a normal keyboard. This is a problem for me -- I already find standard office chairs too short (even the Aeron!). When time allows, I&amp;#8217;ll be buying a nicer chair and getting the gas-lift swapped for a taller one.&lt;/li&gt;
&lt;li&gt;Why the A and S keys -- two of the most common characters in English text -- are on my two &lt;strong&gt;weakest&lt;/strong&gt; fingers, I&amp;#8217;ll never know. The pinkies get more of a workout than usual because they&amp;#8217;re handling all of the keys on the edge of the keyboard, too. Placing these two common letters on already weak and overutilized fingers is probably the biggest flaw in the Dvorak keyboard that I&amp;#8217;ve found.&lt;/li&gt;
&lt;li&gt;Dvorak is lovely for English text. It&amp;#8217;s just a great feeling to feel the letters whiz by with so little effort. However, a lot of my typing load is &lt;strong&gt;not&lt;/strong&gt; English text. It&amp;#8217;s Linux terminal navigation, C and Python source code, all of which intentionally discard vowels in exchange for brevity. This makes Dvorak&amp;#8217;s plan to alternate hands work very poorly -- you&amp;#8217;re just not typing anything on the left hand.&lt;/li&gt;
&lt;li&gt;Some common Unix commands are absolutely worst-case scenarios on Dvorak. Take the command &amp;#8216;ls -l&amp;#8217;, which I would type dozens of times per day. On Dvorak, L, S and hyphen are all on the right pinkie. If you&amp;#8217;re using a standard keyboard, so is the Enter key. It&amp;#8217;s really, really unpleasant to type &amp;#8216;correctly&amp;#8217;.&lt;/li&gt;
&lt;li&gt;Naturally, position-based key bindings don&amp;#8217;t work: Vim&amp;#8217;s HJKL, WASD for games, Ctrl-Z/X/C/V for word processing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I really tried to make Dvorak work. I gave it a month. At the end of the month, I switched back to QWERTY. It&amp;#8217;s a fantastic layout for cranking out lots of English text, but that&amp;#8217;s not my use case.&lt;/p&gt;

&lt;h2&gt;On the Kinesis keyboard&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The up and down arrows are backwards compared with the Vim convention. That&amp;#8217;s relatively easy to remap on the keyboard.&lt;/li&gt;
&lt;li&gt;The macros are rather buggy. Actually, scratch that. The keyboard&amp;#8217;s firmware is rubbish. I suspect that either the authors had never programmed on a microcontroller before or were not trained as programmers to begin with. I can &lt;strong&gt;crash&lt;/strong&gt; the keyboard in two fairly common scenarios. That said, the crap firmware is not a reason to not buy the keyboard. It&amp;#8217;s perfectly usable despite its flaws.&lt;/li&gt;
&lt;li&gt;I opened it up to look inside. It&amp;#8217;s pretty typical of small-scale electronics manufacturing: no surface-mount parts, revisions hot-glued to the case, off-the-shelf components. It&amp;#8217;s not badly made -- it feels very solid for its weight -- but it does have a few rough edges which might surprise you if you&amp;#8217;re expecting a mass-produced product.&lt;/li&gt;
&lt;li&gt;It&amp;#8217;s ripe for hacking. The main controller is an Atmel AT89S series microcontroller. The macro RAM is on standard serial EEPROMs. There&amp;#8217;s even a socket for a second one (the Advantage Pro upgrade). Apparently the firmware can be changed over the PS/2 or USB port, but Kinesis didn&amp;#8217;t seem too willing to send it to me when I mentioned I wanted to modify it.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Where to from here?&lt;/h2&gt;

&lt;p&gt;I still want to improve the keyboard layout. The Kinesis makes typing &lt;em&gt;less&lt;/em&gt; painful, but some of my pains appear to be linked to the QWERTY layout. And the feeling of effortlessly flying through English text with Dvorak was just amazing.&lt;/p&gt;

&lt;p&gt;To come up with a better keyboard layout, I want to log my keystrokes for a month. Each keystroke will be tagged with the time and the active process. The process lets me figure out whether the keystroke intent was a letter or a position. I can also detect errors by tracking Backspace presses. With that information, I can determine exactly which keystrokes or combinations are the most common for me.&lt;/p&gt;
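
&lt;p&gt;As a rough sketch of that analysis, assume the log is a list of (time, process, key) tuples. The function name and the tuple layout are hypothetical; nothing here has been written yet.&lt;/p&gt;

```python
from collections import Counter

def summarise(log):
    """Count single keys and consecutive key pairs in a keystroke log."""
    keys = Counter(key for _, _, key in log)
    pairs = Counter()
    for (_, proc_a, a), (_, proc_b, b) in zip(log, log[1:]):
        # Only pair keystrokes typed into the same process.
        if proc_a == proc_b:
            pairs[(a, b)] += 1
    return keys, pairs
```

&lt;p&gt;The Backspace count in &lt;code&gt;keys&lt;/code&gt; gives a crude error rate, and &lt;code&gt;pairs.most_common()&lt;/code&gt; lists the combinations worth optimising for.&lt;/p&gt;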

&lt;p&gt;In addition, I want a &amp;#8216;trainer&amp;#8217; &amp;ndash; a program that will prompt me with an arbitrary series of keystrokes and time how long it takes me to hit them. This will give me information on how strong and fast my fingers are and if any of them are particularly error-prone. From that, I can generate a map of the keyboard, each key associated with a &amp;#8216;performance&amp;#8217; score. Combining the two datasets, I can then come up with an ideal keymap for me, given my typical usage patterns and my own brain-keyboard performance data.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;d also like to integrate information on common digraphs, but I&amp;#8217;m not sure how best to use them. I&amp;#8217;m not convinced by Dvorak&amp;#8217;s assertion that alternating hands is the best thing to do. A common case for me is the &amp;#8216;chording&amp;#8217; I mentioned previously, where a single hand can hit a sequence of keys very rapidly. The timing is simpler &amp;ndash; I arrange my hand correctly on the keys, then use the individual fingers to press them in sequence. Of course, this sounds like the sort of thing that might cause tendon damage. But it&amp;#8217;s fast.&lt;/p&gt;

&lt;p&gt;There&amp;#8217;s more discussion on chording and performance &lt;a href=&#34;http://slashdot.org/comments.pl?sid=35481&amp;amp;cid=3832754&#34; target=&#34;_blank&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Migrating your data between todo list programs</title>
      <link>https://ianhowson.com/blog/migrate-between-todo-list-programs/</link>
      <pubDate>Sun, 27 Jan 2008 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/blog/migrate-between-todo-list-programs/</guid>
      <description>

&lt;p&gt;There are lots of great options for todo-list tracking these days. Unfortunately, most don&amp;rsquo;t make it easy to export or import your data. Here&amp;rsquo;s a list of scripts that I&amp;rsquo;ve found to perform conversions. Please comment if you know of more!&lt;/p&gt;

&lt;h2 id=&#34;from-things&#34;&gt;From Things&lt;/h2&gt;

&lt;p&gt;There isn&amp;rsquo;t currently any way to export repeating tasks automatically via AppleScript.&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;http://downloads.culturedcode.com/things/download/ThingsAppleScriptGuide.pdf&#34;&gt;The Things AppleScript guide&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&#34;to-omnifocus&#34;&gt;To OmniFocus&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://gist.github.com/hvolkmer/4020468&#34;&gt;https://gist.github.com/hvolkmer/4020468&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://snipplr.com/view/40156/&#34;&gt;http://snipplr.com/view/40156/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&#34;to-the-hit-list&#34;&gt;To The Hit List&lt;/h3&gt;

&lt;p&gt;&lt;a href=&#34;https://github.com/ihowson/ThingsToTheHitList&#34;&gt;My Python script to convert from Things to The Hit List&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&#34;to-plain-text&#34;&gt;To plain text&lt;/h3&gt;

&lt;p&gt;&lt;a href=&#34;https://github.com/thepoch/ExportThings&#34;&gt;https://github.com/thepoch/ExportThings&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&#34;from-the-hit-list&#34;&gt;From The Hit List&lt;/h2&gt;

&lt;h3 id=&#34;to-omnifocus-1&#34;&gt;To OmniFocus&lt;/h3&gt;

&lt;p&gt;&lt;a href=&#34;http://forums.omnigroup.com/archive/index.php/t-13565.html&#34;&gt;http://forums.omnigroup.com/archive/index.php/t-13565.html&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Introduction</title>
      <link>https://ianhowson.com/fpga/introduction/</link>
      <pubDate>Mon, 20 Oct 2003 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/fpga/introduction/</guid>
      <description>

&lt;h2 id=&#34;motivation&#34;&gt;Motivation&lt;/h2&gt;

&lt;p&gt;The use of cryptography is growing rapidly with the adoption of computer
technology. The design of cryptographic ciphers is still not well
understood; we cannot prove the security of an algorithm. Currently,
the only way to be sure of the security of an algorithm is to study
it for a long period of time and use the absence of attacks as evidence
confirming its security.&lt;/p&gt;

&lt;p&gt;All ciphers are vulnerable to an &lt;em&gt;exhaustive key search&lt;/em&gt; attack.
An attacker can try every single possible key to check its correctness.
This is time consuming, but feasible for several widely deployed ciphers.&lt;/p&gt;

&lt;p&gt;An obvious way to conduct an exhaustive key search attack is to write
software that will check each key in turn. Current microprocessors
have clock rates in the gigahertz range and can execute several instructions
per clock cycle. They are also cheap, highly available and easy to
program.&lt;/p&gt;

&lt;p&gt;Another possibility is to use a Field Programmable Gate Array (FPGA)
device to conduct an exhaustive key search. FPGAs provide the functionality
of a custom chip without the high up-front cost and lead time. They
have much lower clock rates than general-purpose CPUs, but can be
designed to perform one task exceptionally well. Parallelism can also
be exploited to increase the overall search rate.&lt;/p&gt;

&lt;p&gt;We thus have several questions requiring investigation with regard
to FPGA technology in cryptanalysis:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which cryptanalytic tasks can FPGAs complete more quickly than CPUs?&lt;/li&gt;
&lt;li&gt;What are the price/performance benefits of FPGAs over CPUs?&lt;/li&gt;
&lt;li&gt;What other technologies are there that might allow us to complete these tasks faster or cheaper?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;why-exhaustive-key-search&#34;&gt;Why exhaustive key search?&lt;/h2&gt;

&lt;p&gt;Exhaustive key search is guaranteed to be a possible attack for any
cipher, but not necessarily feasible. Most new ciphers that are being
deployed have key lengths of 128 bits or greater. A cipher with such
a key length cannot be feasibly attacked with current technology.
Nevertheless, there are many reasons why conducting research into
exhaustive key search attacks is worthwhile.&lt;/p&gt;
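
&lt;p&gt;To give a rough sense of scale, the expected effort can be sketched as
follows. On average, half of the key space must be searched before the
correct key is found. The search rate used below is a hypothetical figure
chosen purely for illustration; it is not a measurement from this thesis.&lt;/p&gt;

```python
def expected_search_seconds(key_bits, keys_per_second):
    """Average time to find a key: half the key space divided by the search rate."""
    return 2 ** (key_bits - 1) / keys_per_second

SECONDS_PER_YEAR = 365.25 * 24 * 3600
HYPOTHETICAL_RATE = 1e9  # keys per second; illustrative only

for bits in (40, 56, 128):
    years = expected_search_seconds(bits, HYPOTHETICAL_RATE) / SECONDS_PER_YEAR
    print(f"{bits}-bit key: about {years:.3g} years")
```

&lt;p&gt;Each additional key bit doubles the expected search time, which is why
key length dominates the feasibility of the attack.&lt;/p&gt;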

&lt;p&gt;&lt;strong&gt;A lot of currently deployed encryption is vulnerable to key
search attacks.&lt;/strong&gt; The default encryption used by GSM mobile phones
and 802.11b wireless networks uses a key which is short enough to
facilitate exhaustive key search. The DES cipher was widely deployed
in the banking industry (amongst others) and is vulnerable. Many websites
using SSL encryption are also vulnerable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Export restrictions in many areas prevent the use of ciphers
with long key lengths.&lt;/strong&gt; The United States has a history of restricting
the export of strong cryptography, often using key length as a deciding
factor. The Wassenaar agreement stipulates similar limitations and
is enforced by 33 countries around the world, including Australia.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Small embedded devices may not be able to support ciphers
with long key lengths.&lt;/strong&gt; Cheap smart card devices containing encryption
software are becoming more widespread. In order to meet cost or size
constraints, many of these devices use very short key lengths or known
weak ciphers. The encrypted data transmitted from many of these devices
can be attacked using exhaustive key search.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Many other attacks employ an exhaustive key search.&lt;/strong&gt; Many
attacks work by reducing the key space to an amount which can be feasibly
searched or by removing large sections of the key space that can be
proven to not contain the target key. Time/memory tradeoff attacks
usually require a large preprocessing step which resembles key search.
Both of these attack types require a key search to be conducted as
part of their operation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key search machines can be useful research tools.&lt;/strong&gt; Research
into other attacks may require a cipher to perform particular operations
or to generate plaintext or ciphertext with certain characteristics.
Exhaustive key search can be used to achieve this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Weak encryption has been used extensively in the past.&lt;/strong&gt; Significant
amounts of information have been encrypted with ciphers that are vulnerable
to exhaustive key search or other attacks. Encrypted data could be
stored until the technology or techniques to reveal that data become
available. Key search machines may still be able to reveal valuable
information that was encrypted in the past. Similarly, future technology
may be able to reveal even today&amp;rsquo;s strongly encrypted data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exhaustive key search is highly parallelisable.&lt;/strong&gt; This makes
it a valuable application with which to experiment with parallel computing
techniques.&lt;/p&gt;

&lt;h2 id=&#34;approach&#34;&gt;Approach&lt;/h2&gt;

&lt;p&gt;In order to determine the utility of FPGAs when conducting exhaustive
key search attacks, we need to consider their potential price and
performance benefits over other technologies such as ASICs and CPUs.
Pricing data can be obtained from suppliers, while performance data
can be gathered from implementations. Performing implementations should
also provide useful insights into the issues involved with cipher
and key search machine design.&lt;/p&gt;

&lt;p&gt;CPU pricing can be obtained from suppliers and performance measured
with benchmark software. ASIC price and performance estimates can
be obtained from suppliers.&lt;/p&gt;

&lt;p&gt;The optimal family and device within each technology can be determined
by computing the price for a certain search rate. Comparing price/performance
ratios between technologies for different ciphers will help to determine
which technology is best under what conditions.&lt;/p&gt;

&lt;p&gt;From these analyses, it should be possible to recognise situations
where FPGAs can be beneficial in key search applications.&lt;/p&gt;

&lt;h2 id=&#34;thesis-organisation&#34;&gt;Thesis organisation&lt;/h2&gt;

&lt;p&gt;&lt;a href=&#34;https://ianhowson.com/fpga/background/&#34;&gt;Chapter 2&lt;/a&gt; describes the past work, theory and knowledge
needed to understand the remainder of the thesis. It also
sets the context for the new developments made by this thesis. &lt;a href=&#34;https://ianhowson.com/fpga/design/&#34;&gt;Chapter 3&lt;/a&gt; describes the design and implementation work that was performed
to gather meaningful data, so that the analysis can
use real-world measurements. &lt;a href=&#34;https://ianhowson.com/fpga/analysis/&#34;&gt;Chapter 4&lt;/a&gt; analyses the gathered data to draw
conclusions across a wide variety of areas, and forms the bulk of this
thesis. &lt;a href=&#34;https://ianhowson.com/fpga/conclusion/&#34;&gt;Chapter 5&lt;/a&gt; summarises the conclusions and provides directions
for future work.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>FPGA price/performance tables</title>
      <link>https://ianhowson.com/fpga/fpga-price-performance/</link>
      <pubDate>Mon, 20 Oct 2003 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/fpga/fpga-price-performance/</guid>
      <description>

&lt;p&gt;FPGA pricing is specified in USD and was obtained from Avnet &lt;a href=&#34;#ref45&#34;&gt;[45]&lt;/a&gt;
on October 15th, 2003. In all cases, the cheapest package available
was used; this was usually also the smallest package.&lt;/p&gt;

&lt;p&gt;Spartan 3 devices only started shipping recently, and pricing is still
highly unstable. No price could be obtained for the XC3S200 device.&lt;/p&gt;

&lt;p&gt;RC4 performance figures use the same relative performance ratios as
RC5; both are RAM-based cipher implementations. Resource figures were
inferred from &lt;a href=&#34;#ref21&#34;&gt;[21]&lt;/a&gt;.&lt;/p&gt;

&lt;table class=&#39;ui compact attached celled table&#39;&gt;
  &lt;thead&gt;
    &lt;tr&gt;&lt;th colspan=&#39;2&#39;&gt;&lt;/th&gt;&lt;th colspan=&#39;3&#39;&gt;RC5&lt;/th&gt;&lt;th colspan=&#39;3&#39;&gt;DES&lt;/th&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt;Family&lt;/th&gt;&lt;th&gt;Speed&lt;/th&gt;&lt;th&gt;MHz&lt;/th&gt;&lt;th&gt;SU slices&lt;/th&gt;&lt;th&gt;SU RAM&lt;/th&gt;&lt;th&gt;MHz&lt;/th&gt;&lt;th&gt;SU slices&lt;/th&gt;&lt;th&gt;SU RAM&lt;/th&gt;&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;td&gt;Virtex II Pro&lt;/td&gt;&lt;td&gt;-5&lt;/td&gt;&lt;td&gt;138&lt;/td&gt;&lt;td&gt;666&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;238&lt;/td&gt;&lt;td&gt;1774&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Virtex II&lt;/td&gt;&lt;td&gt;-4&lt;/td&gt;&lt;td&gt;120&lt;/td&gt;&lt;td&gt;663&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;209&lt;/td&gt;&lt;td&gt;1774&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Spartan 3&lt;/td&gt;&lt;td&gt;-4&lt;/td&gt;&lt;td&gt;156&lt;/td&gt;&lt;td&gt;657&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;280&lt;/td&gt;&lt;td&gt;1774&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Spartan IIE&lt;/td&gt;&lt;td&gt;-6&lt;/td&gt;&lt;td&gt;103&lt;/td&gt;&lt;td&gt;700&lt;/td&gt;&lt;td&gt;2&lt;/td&gt;&lt;td&gt;149&lt;/td&gt;&lt;td&gt;1806&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Virtex E&lt;/td&gt;&lt;td&gt;-6&lt;/td&gt;&lt;td&gt;103&lt;/td&gt;&lt;td&gt;700&lt;/td&gt;&lt;td&gt;2&lt;/td&gt;&lt;td&gt;149&lt;/td&gt;&lt;td&gt;1806&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;div class=&#39;ui centered bottom attached header&#39;&gt;Relative FPGA family performance&lt;/div&gt;

&lt;p&gt;&lt;/p&gt;

&lt;table class=&#39;ui small compact attached celled table&#39;&gt;
  &lt;thead&gt;
    &lt;tr&gt;&lt;th colspan=&#39;4&#39;&gt;&lt;/th&gt;&lt;th colspan=&#39;2&#39;&gt;DES&lt;/th&gt;&lt;th colspan=&#39;2&#39;&gt;RC5&lt;/th&gt;&lt;th colspan=&#39;2&#39;&gt;RC4&lt;/th&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt;FPGA&lt;/th&gt;&lt;th&gt;Speed&lt;/th&gt;&lt;th&gt;Package&lt;/th&gt;&lt;th&gt;Price&lt;/th&gt;&lt;th&gt;Mk/s&lt;/th&gt;&lt;th&gt;$/Mk/s&lt;/th&gt;&lt;th&gt;Mk/s&lt;/th&gt;&lt;th&gt;$/Mk/s&lt;/th&gt;&lt;th&gt;Mk/s&lt;/th&gt;&lt;th&gt;$/Mk/s&lt;/th&gt;&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;td&gt;XC2VP2&lt;/td&gt;&lt;td&gt;-5&lt;/td&gt;&lt;td&gt;FG256C&lt;/td&gt;&lt;td&gt;$62&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;&amp;ndash;&lt;/td&gt;&lt;td&gt;0.59&lt;/td&gt;&lt;td&gt;$104.59&lt;/td&gt;&lt;td&gt;1.02&lt;/td&gt;&lt;td&gt;$60.63&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC2VP4&lt;/td&gt;&lt;td&gt;-5&lt;/td&gt;&lt;td&gt;FG256C&lt;/td&gt;&lt;td&gt;$113&lt;/td&gt;&lt;td&gt;238&lt;/td&gt;&lt;td&gt;$0.48&lt;/td&gt;&lt;td&gt;1.18&lt;/td&gt;&lt;td&gt;$96.26&lt;/td&gt;&lt;td&gt;2.37&lt;/td&gt;&lt;td&gt;$47.83&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC2VP7&lt;/td&gt;&lt;td&gt;-5&lt;/td&gt;&lt;td&gt;FG456C&lt;/td&gt;&lt;td&gt;$176&lt;/td&gt;&lt;td&gt;476&lt;/td&gt;&lt;td&gt;$0.37&lt;/td&gt;&lt;td&gt;2.06&lt;/td&gt;&lt;td&gt;$85.45&lt;/td&gt;&lt;td&gt;3.72&lt;/td&gt;&lt;td&gt;$47.28&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC2VP20&lt;/td&gt;&lt;td&gt;-5&lt;/td&gt;&lt;td&gt;FG676C&lt;/td&gt;&lt;td&gt;$299&lt;/td&gt;&lt;td&gt;1190&lt;/td&gt;&lt;td&gt;$0.25&lt;/td&gt;&lt;td&gt;4.12&lt;/td&gt;&lt;td&gt;$72.58&lt;/td&gt;&lt;td&gt;7.44&lt;/td&gt;&lt;td&gt;$40.16&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC2VP30&lt;/td&gt;&lt;td&gt;-5&lt;/td&gt;&lt;td&gt;FG676C&lt;/td&gt;&lt;td&gt;$508&lt;/td&gt;&lt;td&gt;1666&lt;/td&gt;&lt;td&gt;$0.31&lt;/td&gt;&lt;td&gt;5.88&lt;/td&gt;&lt;td&gt;$86.36&lt;/td&gt;&lt;td&gt;11.51&lt;/td&gt;&lt;td&gt;$44.17&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC2VP40&lt;/td&gt;&lt;td&gt;-5&lt;/td&gt;&lt;td&gt;FG676C&lt;/td&gt;&lt;td&gt;$790&lt;/td&gt;&lt;td&gt;2380&lt;/td&gt;&lt;td&gt;$0.33&lt;/td&gt;&lt;td&gt;8.53&lt;/td&gt;&lt;td&gt;$92.56&lt;/td&gt;&lt;td&gt;16.24&lt;/td&gt;&lt;td&gt;$48.63&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC2VP50&lt;/td&gt;&lt;td&gt;-5&lt;/td&gt;&lt;td&gt;FF1152C&lt;/td&gt;&lt;td&gt;$1477&lt;/td&gt;&lt;td&gt;3094&lt;/td&gt;&lt;td&gt;$0.48&lt;/td&gt;&lt;td&gt;10.30&lt;/td&gt;&lt;td&gt;$143.45&lt;/td&gt;&lt;td&gt;19.63&lt;/td&gt;&lt;td&gt;$75.27&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC2VP70&lt;/td&gt;&lt;td&gt;-5&lt;/td&gt;&lt;td&gt;FF1517C&lt;/td&gt;&lt;td&gt;$2256&lt;/td&gt;&lt;td&gt;4284&lt;/td&gt;&lt;td&gt;$0.53&lt;/td&gt;&lt;td&gt;14.71&lt;/td&gt;&lt;td&gt;$153.35&lt;/td&gt;&lt;td&gt;27.75&lt;/td&gt;&lt;td&gt;$81.31&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC2VP100&lt;/td&gt;&lt;td&gt;-5&lt;/td&gt;&lt;td&gt;FF1696C&lt;/td&gt;&lt;td&gt;$5579&lt;/td&gt;&lt;td&gt;5712&lt;/td&gt;&lt;td&gt;$0.98&lt;/td&gt;&lt;td&gt;19.42&lt;/td&gt;&lt;td&gt;$287.29&lt;/td&gt;&lt;td&gt;37.56&lt;/td&gt;&lt;td&gt;$148.54&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC2V40&lt;/td&gt;&lt;td&gt;-4&lt;/td&gt;&lt;td&gt;CS144C&lt;/td&gt;&lt;td&gt;$22&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;&amp;ndash;&lt;/td&gt;&lt;td&gt;0.00&lt;/td&gt;&lt;td&gt;&amp;ndash;&lt;/td&gt;&lt;td&gt;0.29&lt;/td&gt;&lt;td&gt;$76.04&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC2V80&lt;/td&gt;&lt;td&gt;-4&lt;/td&gt;&lt;td&gt;CS144C&lt;/td&gt;&lt;td&gt;$28&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;&amp;ndash;&lt;/td&gt;&lt;td&gt;0.00&lt;/td&gt;&lt;td&gt;&amp;ndash;&lt;/td&gt;&lt;td&gt;0.59&lt;/td&gt;&lt;td&gt;$48.35&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC2V250&lt;/td&gt;&lt;td&gt;-4&lt;/td&gt;&lt;td&gt;FG256C&lt;/td&gt;&lt;td&gt;$79&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;&amp;ndash;&lt;/td&gt;&lt;td&gt;0.51&lt;/td&gt;&lt;td&gt;$155.09&lt;/td&gt;&lt;td&gt;1.76&lt;/td&gt;&lt;td&gt;$45.16&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC2V500&lt;/td&gt;&lt;td&gt;-4&lt;/td&gt;&lt;td&gt;FG256C&lt;/td&gt;&lt;td&gt;$134&lt;/td&gt;&lt;td&gt;209&lt;/td&gt;&lt;td&gt;$0.64&lt;/td&gt;&lt;td&gt;1.02&lt;/td&gt;&lt;td&gt;$131.12&lt;/td&gt;&lt;td&gt;2.34&lt;/td&gt;&lt;td&gt;$57.27&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC2V1000&lt;/td&gt;&lt;td&gt;-4&lt;/td&gt;&lt;td&gt;FG256C&lt;/td&gt;&lt;td&gt;$195&lt;/td&gt;&lt;td&gt;418&lt;/td&gt;&lt;td&gt;$0.47&lt;/td&gt;&lt;td&gt;1.79&lt;/td&gt;&lt;td&gt;$108.71&lt;/td&gt;&lt;td&gt;2.93&lt;/td&gt;&lt;td&gt;$66.47&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC2V1500&lt;/td&gt;&lt;td&gt;-4&lt;/td&gt;&lt;td&gt;FG676C&lt;/td&gt;&lt;td&gt;$301&lt;/td&gt;&lt;td&gt;836&lt;/td&gt;&lt;td&gt;$0.36&lt;/td&gt;&lt;td&gt;2.81&lt;/td&gt;&lt;td&gt;$107.09&lt;/td&gt;&lt;td&gt;3.52&lt;/td&gt;&lt;td&gt;$85.74&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC2V2000&lt;/td&gt;&lt;td&gt;-4&lt;/td&gt;&lt;td&gt;FG676C&lt;/td&gt;&lt;td&gt;$428&lt;/td&gt;&lt;td&gt;1254&lt;/td&gt;&lt;td&gt;$0.34&lt;/td&gt;&lt;td&gt;4.09&lt;/td&gt;&lt;td&gt;$104.52&lt;/td&gt;&lt;td&gt;4.10&lt;/td&gt;&lt;td&gt;$104.34&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC2V3000&lt;/td&gt;&lt;td&gt;-4&lt;/td&gt;&lt;td&gt;FG676C&lt;/td&gt;&lt;td&gt;$658&lt;/td&gt;&lt;td&gt;1672&lt;/td&gt;&lt;td&gt;$0.39&lt;/td&gt;&lt;td&gt;5.37&lt;/td&gt;&lt;td&gt;$122.42&lt;/td&gt;&lt;td&gt;7.03&lt;/td&gt;&lt;td&gt;$93.57&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC2V4000&lt;/td&gt;&lt;td&gt;-4&lt;/td&gt;&lt;td&gt;FF1152C&lt;/td&gt;&lt;td&gt;$1552&lt;/td&gt;&lt;td&gt;2508&lt;/td&gt;&lt;td&gt;$0.62&lt;/td&gt;&lt;td&gt;8.70&lt;/td&gt;&lt;td&gt;$178.42&lt;/td&gt;&lt;td&gt;8.79&lt;/td&gt;&lt;td&gt;$176.62&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC2V6000&lt;/td&gt;&lt;td&gt;-4&lt;/td&gt;&lt;td&gt;FF1152C&lt;/td&gt;&lt;td&gt;$2936&lt;/td&gt;&lt;td&gt;3971&lt;/td&gt;&lt;td&gt;$0.74&lt;/td&gt;&lt;td&gt;13.05&lt;/td&gt;&lt;td&gt;$224.99&lt;/td&gt;&lt;td&gt;10.55&lt;/td&gt;&lt;td&gt;$278.40&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC2V8000&lt;/td&gt;&lt;td&gt;-4&lt;/td&gt;&lt;td&gt;FF1152C&lt;/td&gt;&lt;td&gt;$7446&lt;/td&gt;&lt;td&gt;5434&lt;/td&gt;&lt;td&gt;$1.37&lt;/td&gt;&lt;td&gt;17.91&lt;/td&gt;&lt;td&gt;$415.73&lt;/td&gt;&lt;td&gt;12.30&lt;/td&gt;&lt;td&gt;$605.21&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC3S50&lt;/td&gt;&lt;td&gt;-4&lt;/td&gt;&lt;td&gt;VQ100&lt;/td&gt;&lt;td&gt;$9&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;&amp;ndash;&lt;/td&gt;&lt;td&gt;0.33&lt;/td&gt;&lt;td&gt;$27.06&lt;/td&gt;&lt;td&gt;0.38&lt;/td&gt;&lt;td&gt;$23.45&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC3S200&lt;/td&gt;&lt;td&gt;-4&lt;/td&gt;&lt;td&gt;VQ100&lt;/td&gt;&lt;td&gt;$16&lt;/td&gt;&lt;td&gt;280&lt;/td&gt;&lt;td&gt;$0.06&lt;/td&gt;&lt;td&gt;0.67&lt;/td&gt;&lt;td&gt;$24.05&lt;/td&gt;&lt;td&gt;1.15&lt;/td&gt;&lt;td&gt;$13.89&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC3S400&lt;/td&gt;&lt;td&gt;-4&lt;/td&gt;&lt;td&gt;TQ144&lt;/td&gt;&lt;td&gt;$24&lt;/td&gt;&lt;td&gt;560&lt;/td&gt;&lt;td&gt;$0.04&lt;/td&gt;&lt;td&gt;1.66&lt;/td&gt;&lt;td&gt;$14.43&lt;/td&gt;&lt;td&gt;1.54&lt;/td&gt;&lt;td&gt;$15.63&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC3S1000&lt;/td&gt;&lt;td&gt;-4&lt;/td&gt;&lt;td&gt;FT256&lt;/td&gt;&lt;td&gt;$67&lt;/td&gt;&lt;td&gt;1120&lt;/td&gt;&lt;td&gt;$0.06&lt;/td&gt;&lt;td&gt;3.66&lt;/td&gt;&lt;td&gt;$18.31&lt;/td&gt;&lt;td&gt;2.30&lt;/td&gt;&lt;td&gt;$29.09&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC2S50E&lt;/td&gt;&lt;td&gt;-6&lt;/td&gt;&lt;td&gt;TQ144C&lt;/td&gt;&lt;td&gt;$12&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;&amp;ndash;&lt;/td&gt;&lt;td&gt;0.22&lt;/td&gt;&lt;td&gt;$56.35&lt;/td&gt;&lt;td&gt;0.51&lt;/td&gt;&lt;td&gt;$24.50&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC2S100E&lt;/td&gt;&lt;td&gt;-6&lt;/td&gt;&lt;td&gt;TQ144C&lt;/td&gt;&lt;td&gt;$15&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;&amp;ndash;&lt;/td&gt;&lt;td&gt;0.22&lt;/td&gt;&lt;td&gt;$69.12&lt;/td&gt;&lt;td&gt;0.63&lt;/td&gt;&lt;td&gt;$24.05&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC2S150E&lt;/td&gt;&lt;td&gt;-6&lt;/td&gt;&lt;td&gt;PQ208C&lt;/td&gt;&lt;td&gt;$21&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;&amp;ndash;&lt;/td&gt;&lt;td&gt;0.44&lt;/td&gt;&lt;td&gt;$46.96&lt;/td&gt;&lt;td&gt;0.76&lt;/td&gt;&lt;td&gt;$27.23&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC2S200E&lt;/td&gt;&lt;td&gt;-6&lt;/td&gt;&lt;td&gt;PQ208C&lt;/td&gt;&lt;td&gt;$25&lt;/td&gt;&lt;td&gt;149&lt;/td&gt;&lt;td&gt;$0.17&lt;/td&gt;&lt;td&gt;0.66&lt;/td&gt;&lt;td&gt;$37.65&lt;/td&gt;&lt;td&gt;0.88&lt;/td&gt;&lt;td&gt;$28.07&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC2S300E&lt;/td&gt;&lt;td&gt;-6&lt;/td&gt;&lt;td&gt;PQ208C&lt;/td&gt;&lt;td&gt;$39&lt;/td&gt;&lt;td&gt;149&lt;/td&gt;&lt;td&gt;$0.26&lt;/td&gt;&lt;td&gt;0.88&lt;/td&gt;&lt;td&gt;$44.45&lt;/td&gt;&lt;td&gt;1.01&lt;/td&gt;&lt;td&gt;$38.66&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC2S400E&lt;/td&gt;&lt;td&gt;-6&lt;/td&gt;&lt;td&gt;FT256C&lt;/td&gt;&lt;td&gt;$61&lt;/td&gt;&lt;td&gt;298&lt;/td&gt;&lt;td&gt;$0.20&lt;/td&gt;&lt;td&gt;1.32&lt;/td&gt;&lt;td&gt;$46.29&lt;/td&gt;&lt;td&gt;2.53&lt;/td&gt;&lt;td&gt;$24.15&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XC2S600E&lt;/td&gt;&lt;td&gt;-6&lt;/td&gt;&lt;td&gt;FG456C&lt;/td&gt;&lt;td&gt;$153&lt;/td&gt;&lt;td&gt;447&lt;/td&gt;&lt;td&gt;$0.34&lt;/td&gt;&lt;td&gt;1.98&lt;/td&gt;&lt;td&gt;$77.36&lt;/td&gt;&lt;td&gt;4.55&lt;/td&gt;&lt;td&gt;$33.64&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XCV50E&lt;/td&gt;&lt;td&gt;-6&lt;/td&gt;&lt;td&gt;CS144C&lt;/td&gt;&lt;td&gt;$33&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;&amp;ndash;&lt;/td&gt;&lt;td&gt;0.22&lt;/td&gt;&lt;td&gt;$150.76&lt;/td&gt;&lt;td&gt;1.01&lt;/td&gt;&lt;td&gt;$32.78&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XCV100E&lt;/td&gt;&lt;td&gt;-6&lt;/td&gt;&lt;td&gt;CS144C&lt;/td&gt;&lt;td&gt;$49&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;&amp;ndash;&lt;/td&gt;&lt;td&gt;0.22&lt;/td&gt;&lt;td&gt;$224.62&lt;/td&gt;&lt;td&gt;1.26&lt;/td&gt;&lt;td&gt;$39.07&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XCV200E&lt;/td&gt;&lt;td&gt;-6&lt;/td&gt;&lt;td&gt;CS144C&lt;/td&gt;&lt;td&gt;$87&lt;/td&gt;&lt;td&gt;149&lt;/td&gt;&lt;td&gt;$0.58&lt;/td&gt;&lt;td&gt;0.66&lt;/td&gt;&lt;td&gt;$131.98&lt;/td&gt;&lt;td&gt;1.77&lt;/td&gt;&lt;td&gt;$49.19&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XCV300E&lt;/td&gt;&lt;td&gt;-6&lt;/td&gt;&lt;td&gt;PQ240C&lt;/td&gt;&lt;td&gt;$144&lt;/td&gt;&lt;td&gt;149&lt;/td&gt;&lt;td&gt;$0.97&lt;/td&gt;&lt;td&gt;0.88&lt;/td&gt;&lt;td&gt;$164.04&lt;/td&gt;&lt;td&gt;2.02&lt;/td&gt;&lt;td&gt;$71.33&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XCV400E&lt;/td&gt;&lt;td&gt;-6&lt;/td&gt;&lt;td&gt;PQ240C&lt;/td&gt;&lt;td&gt;$222&lt;/td&gt;&lt;td&gt;298&lt;/td&gt;&lt;td&gt;$0.75&lt;/td&gt;&lt;td&gt;1.32&lt;/td&gt;&lt;td&gt;$168.63&lt;/td&gt;&lt;td&gt;2.53&lt;/td&gt;&lt;td&gt;$87.99&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XCV600E&lt;/td&gt;&lt;td&gt;-6&lt;/td&gt;&lt;td&gt;HQ240C&lt;/td&gt;&lt;td&gt;$376&lt;/td&gt;&lt;td&gt;447&lt;/td&gt;&lt;td&gt;$0.84&lt;/td&gt;&lt;td&gt;1.98&lt;/td&gt;&lt;td&gt;$190.23&lt;/td&gt;&lt;td&gt;4.55&lt;/td&gt;&lt;td&gt;$82.72&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XCV1000E&lt;/td&gt;&lt;td&gt;-6&lt;/td&gt;&lt;td&gt;HQ240C&lt;/td&gt;&lt;td&gt;$938&lt;/td&gt;&lt;td&gt;894&lt;/td&gt;&lt;td&gt;$1.05&lt;/td&gt;&lt;td&gt;3.73&lt;/td&gt;&lt;td&gt;$251.32&lt;/td&gt;&lt;td&gt;6.06&lt;/td&gt;&lt;td&gt;$154.82&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XCV1600E&lt;/td&gt;&lt;td&gt;-6&lt;/td&gt;&lt;td&gt;BG560C&lt;/td&gt;&lt;td&gt;$1522&lt;/td&gt;&lt;td&gt;1192&lt;/td&gt;&lt;td&gt;$1.28&lt;/td&gt;&lt;td&gt;4.83&lt;/td&gt;&lt;td&gt;$315.10&lt;/td&gt;&lt;td&gt;9.09&lt;/td&gt;&lt;td&gt;$167.46&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XCV2000E&lt;/td&gt;&lt;td&gt;-6&lt;/td&gt;&lt;td&gt;BG560C&lt;/td&gt;&lt;td&gt;$2142&lt;/td&gt;&lt;td&gt;1490&lt;/td&gt;&lt;td&gt;$1.44&lt;/td&gt;&lt;td&gt;5.93&lt;/td&gt;&lt;td&gt;$361.19&lt;/td&gt;&lt;td&gt;10.10&lt;/td&gt;&lt;td&gt;$212.03&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XCV2600E&lt;/td&gt;&lt;td&gt;-6&lt;/td&gt;&lt;td&gt;FG1156C&lt;/td&gt;&lt;td&gt;$4620&lt;/td&gt;&lt;td&gt;2086&lt;/td&gt;&lt;td&gt;$2.21&lt;/td&gt;&lt;td&gt;7.91&lt;/td&gt;&lt;td&gt;$584.35&lt;/td&gt;&lt;td&gt;11.62&lt;/td&gt;&lt;td&gt;$397.72&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XCV3200E&lt;/td&gt;&lt;td&gt;-6&lt;/td&gt;&lt;td&gt;CG1156CES&lt;/td&gt;&lt;td&gt;$6155&lt;/td&gt;&lt;td&gt;2533&lt;/td&gt;&lt;td&gt;$2.43&lt;/td&gt;&lt;td&gt;10.10&lt;/td&gt;&lt;td&gt;$609.21&lt;/td&gt;&lt;td&gt;13.13&lt;/td&gt;&lt;td&gt;$468.69&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;div class=&#39;ui centered bottom attached header&#39;&gt;FPGA price/performance&lt;/div&gt;
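
&lt;p&gt;The $/Mk/s columns are the device price divided by its search rate. A minimal Python sketch of that arithmetic follows, using three DES rows from the table above; the function name price_per_mkeys is illustrative only. The listed prices and rates appear to be rounded, so ratios recomputed this way can differ from the printed ones by a small amount.&lt;/p&gt;

```python
def price_per_mkeys(price_usd, mkeys_per_sec):
    """Return the cost in USD of one Mkey/s of search rate."""
    if mkeys_per_sec == 0:
        # A rate of 0 has no meaningful ratio; the table prints a dash.
        return None
    return price_usd / mkeys_per_sec


# (device, price in USD, DES search rate in Mk/s), copied from the table.
des_rows = [
    ("XC2VP7", 176, 476),
    ("XC2VP20", 299, 1190),
    ("XC3S400", 24, 560),
]

for device, price, rate in des_rows:
    print(device, round(price_per_mkeys(price, rate), 2))
```

&lt;p&gt;Sorting devices by this ratio within a family gives the cheapest way to reach a given search rate for one cipher.&lt;/p&gt;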

&lt;h2 id=&#34;references&#34;&gt;References&lt;/h2&gt;

&lt;p&gt;&lt;a name=&#39;ref21&#39;&gt;&lt;/a&gt;[21] K. H. Tsoi, K. H. Lee, and P. H. W. Leong, &amp;ldquo;A massively parallel RC4 key search engine,&amp;rdquo; in &lt;em&gt;Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM)&lt;/em&gt;, 2002, pp. 13&amp;ndash;21. [Online]. Available: &lt;a href=&#34;http://www.cse.cuhk.edu.hk/~phwl/papers/vrvw_fccm02.pdf&#34;&gt;http://www.cse.cuhk.edu.hk/~phwl/papers/vrvw_fccm02.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref45&#39;&gt;&lt;/a&gt;[45] (2003, October) Avnet electronics marketing. [Online]. Available: &lt;a href=&#34;http://em.avnet.com/&#34;&gt;http://em.avnet.com/&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>CPU benchmark results</title>
      <link>https://ianhowson.com/fpga/cpu-benchmarks/</link>
      <pubDate>Mon, 20 Oct 2003 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/fpga/cpu-benchmarks/</guid>
      <description>&lt;table class=&#39;ui compact attached celled table&#39;&gt;
  &lt;thead&gt;&lt;tr&gt;&lt;th&gt;Processor&lt;/th&gt;&lt;th&gt;Clock rate (MHz)&lt;/th&gt;&lt;th&gt;Core&lt;/th&gt;&lt;th&gt;Speed (Mkeys/sec)&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;td&gt;Pentium IV&lt;/td&gt;&lt;td&gt;2533&lt;/td&gt;&lt;td&gt;SolNET (BrydDES)&lt;/td&gt;&lt;td&gt;10.3&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Athlon XP 2500+ (Barton)&lt;/td&gt;&lt;td&gt;1833&lt;/td&gt;&lt;td&gt;d.net (Byte Bryd)&lt;/td&gt;&lt;td&gt;10.2&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Athlon XP 1900+&lt;/td&gt;&lt;td&gt;1600&lt;/td&gt;&lt;td&gt;SolNET (BrydDES)&lt;/td&gt;&lt;td&gt;9.0&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Pentium IV&lt;/td&gt;&lt;td&gt;1800&lt;/td&gt;&lt;td&gt;SolNET (BrydDES)&lt;/td&gt;&lt;td&gt;7.5&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Pentium IV-M&lt;/td&gt;&lt;td&gt;1700&lt;/td&gt;&lt;td&gt;SolNET (BrydDES)&lt;/td&gt;&lt;td&gt;7.5&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Duron&lt;/td&gt;&lt;td&gt;1000&lt;/td&gt;&lt;td&gt;SolNET (BrydDES)&lt;/td&gt;&lt;td&gt;5.4&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Pentium III-M&lt;/td&gt;&lt;td&gt;1000&lt;/td&gt;&lt;td&gt;SolNET (BrydDES)&lt;/td&gt;&lt;td&gt;4.3&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Pentium MMX&lt;/td&gt;&lt;td&gt;200&lt;/td&gt;&lt;td&gt;d.net (MMX bitslice)&lt;/td&gt;&lt;td&gt;2.9&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Pentium II&lt;/td&gt;&lt;td&gt;233&lt;/td&gt;&lt;td&gt;d.net (MMX bitslice)&lt;/td&gt;&lt;td&gt;2.6&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Celeron-A&lt;/td&gt;&lt;td&gt;450&lt;/td&gt;&lt;td&gt;SolNET (BrydDES)&lt;/td&gt;&lt;td&gt;2.0&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Pentium II&lt;/td&gt;&lt;td&gt;233&lt;/td&gt;&lt;td&gt;SolNET (BrydDES)&lt;/td&gt;&lt;td&gt;1.1&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Pentium MMX&lt;/td&gt;&lt;td&gt;200&lt;/td&gt;&lt;td&gt;SolNET (BrydDES)&lt;/td&gt;&lt;td&gt;1.0&lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;div class=&#39;ui centered bottom attached header&#39;&gt;DES software benchmark results&lt;/div&gt;

&lt;p&gt;&lt;/p&gt;

&lt;table class=&#39;ui compact attached celled table&#39;&gt;
  &lt;thead&gt;&lt;tr&gt;&lt;th&gt;Processor&lt;/th&gt;&lt;th&gt;Clock rate (MHz)&lt;/th&gt;&lt;th&gt;Core&lt;/th&gt;&lt;th&gt;Speed (Mkeys/sec)&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;td&gt;Athlon XP 2500+ (Barton)&lt;/td&gt;&lt;td&gt;1833&lt;/td&gt;&lt;td&gt;SS 2-pipe&lt;/td&gt;&lt;td&gt;6.0&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Athlon XP 1900+&lt;/td&gt;&lt;td&gt;1600&lt;/td&gt;&lt;td&gt;SS 2-pipe&lt;/td&gt;&lt;td&gt;5.3&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Pentium IV HT&lt;/td&gt;&lt;td&gt;3060&lt;/td&gt;&lt;td&gt;DG 3-pipe&lt;/td&gt;&lt;td&gt;4.3&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Pentium IV&lt;/td&gt;&lt;td&gt;2533&lt;/td&gt;&lt;td&gt;DG 3-pipe&lt;/td&gt;&lt;td&gt;3.5&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Duron&lt;/td&gt;&lt;td&gt;1000&lt;/td&gt;&lt;td&gt;SS 2-pipe&lt;/td&gt;&lt;td&gt;3.1&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Pentium IV-M&lt;/td&gt;&lt;td&gt;1700&lt;/td&gt;&lt;td&gt;DG 3-pipe&lt;/td&gt;&lt;td&gt;2.4&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;PowerPC 740/750 G3&lt;/td&gt;&lt;td&gt;900&lt;/td&gt;&lt;td&gt;MH 1-pipe&lt;/td&gt;&lt;td&gt;2.3&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Pentium III-M&lt;/td&gt;&lt;td&gt;1000&lt;/td&gt;&lt;td&gt;SES 2-pipe&lt;/td&gt;&lt;td&gt;2.1&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Pentium III&lt;/td&gt;&lt;td&gt;533&lt;/td&gt;&lt;td&gt;SES 2-pipe&lt;/td&gt;&lt;td&gt;1.1&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Celeron-A&lt;/td&gt;&lt;td&gt;450&lt;/td&gt;&lt;td&gt;SES 2-pipe&lt;/td&gt;&lt;td&gt;0.9&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Pentium II&lt;/td&gt;&lt;td&gt;233&lt;/td&gt;&lt;td&gt;SES 2-pipe&lt;/td&gt;&lt;td&gt;0.5&lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;div class=&#39;ui centered bottom attached header&#39;&gt;RC5 software benchmark results&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>CPU price/performance tables</title>
      <link>https://ianhowson.com/fpga/cpu-price-performance/</link>
      <pubDate>Mon, 20 Oct 2003 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/fpga/cpu-price-performance/</guid>
      <description>&lt;table class=&#39;ui small compact attached celled table&#39;&gt;
  &lt;thead&gt;
    &lt;tr&gt;&lt;th colspan=&#39;4&#39;&gt;&lt;/th&gt;&lt;th colspan=&#39;2&#39;&gt;RC5&lt;/th&gt;&lt;th colspan=&#39;2&#39;&gt;DES&lt;/th&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;th&gt;Family&lt;/th&gt;&lt;th&gt;Rating&lt;/th&gt;&lt;th&gt;MHz&lt;/th&gt;&lt;th&gt;Price&lt;/th&gt;&lt;th&gt;Mk/s&lt;/th&gt;&lt;th&gt;$/Mk/s&lt;/th&gt;&lt;th&gt;Mk/s&lt;/th&gt;&lt;th&gt;$/Mk/s&lt;/th&gt;&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;td&gt;Athlon XP&lt;/td&gt;&lt;td&gt;1900+&lt;/td&gt;&lt;td&gt;1600&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;5.3&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;9.0&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Athlon XP&lt;/td&gt;&lt;td&gt;2000+&lt;/td&gt;&lt;td&gt;1667&lt;/td&gt;&lt;td&gt;$101&lt;/td&gt;&lt;td&gt;5.5&lt;/td&gt;&lt;td&gt;$18.27&lt;/td&gt;&lt;td&gt;9.4&lt;/td&gt;&lt;td&gt;$10.76&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Athlon XP&lt;/td&gt;&lt;td&gt;2200+&lt;/td&gt;&lt;td&gt;1800&lt;/td&gt;&lt;td&gt;$108&lt;/td&gt;&lt;td&gt;6.0&lt;/td&gt;&lt;td&gt;$18.14&lt;/td&gt;&lt;td&gt;10.1&lt;/td&gt;&lt;td&gt;$10.68&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Athlon XP&lt;/td&gt;&lt;td&gt;2400+&lt;/td&gt;&lt;td&gt;2000&lt;/td&gt;&lt;td&gt;$125&lt;/td&gt;&lt;td&gt;6.6&lt;/td&gt;&lt;td&gt;$18.80&lt;/td&gt;&lt;td&gt;11.3&lt;/td&gt;&lt;td&gt;$11.07&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Athlon XP (Barton)&lt;/td&gt;&lt;td&gt;2500+&lt;/td&gt;&lt;td&gt;1833&lt;/td&gt;&lt;td&gt;$135&lt;/td&gt;&lt;td&gt;&lt;strong&gt;6.0&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;$22.58&lt;/td&gt;&lt;td&gt;10.3&lt;/td&gt;&lt;td&gt;$13.14&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Athlon XP (Barton)&lt;/td&gt;&lt;td&gt;2600+&lt;/td&gt;&lt;td&gt;2083&lt;/td&gt;&lt;td&gt;$156&lt;/td&gt;&lt;td&gt;6.8&lt;/td&gt;&lt;td&gt;$22.93&lt;/td&gt;&lt;td&gt;11.7&lt;/td&gt;&lt;td&gt;$13.34&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Athlon XP (Barton)&lt;/td&gt;&lt;td&gt;2700+&lt;/td&gt;&lt;td&gt;2167&lt;/td&gt;&lt;td&gt;$212&lt;/td&gt;&lt;td&gt;7.1&lt;/td&gt;&lt;td&gt;$29.86&lt;/td&gt;&lt;td&gt;12.2&lt;/td&gt;&lt;td&gt;$17.38&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Athlon XP (Barton)&lt;/td&gt;&lt;td&gt;2800+&lt;/td&gt;&lt;td&gt;2086&lt;/td&gt;&lt;td&gt;$275&lt;/td&gt;&lt;td&gt;6.8&lt;/td&gt;&lt;td&gt;$40.34&lt;/td&gt;&lt;td&gt;11.7&lt;/td&gt;&lt;td&gt;$23.48&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Athlon XP (Barton)&lt;/td&gt;&lt;td&gt;3000+&lt;/td&gt;&lt;td&gt;2167&lt;/td&gt;&lt;td&gt;$397&lt;/td&gt;&lt;td&gt;7.1&lt;/td&gt;&lt;td&gt;$56.01&lt;/td&gt;&lt;td&gt;12.2&lt;/td&gt;&lt;td&gt;$32.59&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Athlon XP (Barton)&lt;/td&gt;&lt;td&gt;3200+&lt;/td&gt;&lt;td&gt;2250&lt;/td&gt;&lt;td&gt;$687&lt;/td&gt;&lt;td&gt;7.4&lt;/td&gt;&lt;td&gt;$93.32&lt;/td&gt;&lt;td&gt;12.7&lt;/td&gt;&lt;td&gt;$54.30&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Duron&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;1000&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;3.1&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;5.4&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Duron&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;1400&lt;/td&gt;&lt;td&gt;$51&lt;/td&gt;&lt;td&gt;4.3&lt;/td&gt;&lt;td&gt;$11.73&lt;/td&gt;&lt;td&gt;7.6&lt;/td&gt;&lt;td&gt;$6.73&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Duron&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;1600&lt;/td&gt;&lt;td&gt;$58&lt;/td&gt;&lt;td&gt;5.0&lt;/td&gt;&lt;td&gt;$11.73&lt;/td&gt;&lt;td&gt;8.6&lt;/td&gt;&lt;td&gt;$6.73&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Celeron&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;2000&lt;/td&gt;&lt;td&gt;$94&lt;/td&gt;&lt;td&gt;&lt;i&gt;2.8&lt;/i&gt;&lt;/td&gt;&lt;td&gt;$33.44&lt;/td&gt;&lt;td&gt;8.1&lt;/td&gt;&lt;td&gt;$11.51&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Celeron&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;2200&lt;/td&gt;&lt;td&gt;$102&lt;/td&gt;&lt;td&gt;&lt;i&gt;3.1&lt;/i&gt;&lt;/td&gt;&lt;td&gt;$32.85&lt;/td&gt;&lt;td&gt;8.9&lt;/td&gt;&lt;td&gt;$11.38&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Celeron&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;2400&lt;/td&gt;&lt;td&gt;$115&lt;/td&gt;&lt;td&gt;3.4&lt;/td&gt;&lt;td&gt;$33.96&lt;/td&gt;&lt;td&gt;9.8&lt;/td&gt;&lt;td&gt;$11.83&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Celeron&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;2500&lt;/td&gt;&lt;td&gt;$123&lt;/td&gt;&lt;td&gt;3.5&lt;/td&gt;&lt;td&gt;$35.07&lt;/td&gt;&lt;td&gt;10.2&lt;/td&gt;&lt;td&gt;$12.07&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Celeron&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;2600&lt;/td&gt;&lt;td&gt;$130&lt;/td&gt;&lt;td&gt;&lt;i&gt;3.6&lt;/i&gt;&lt;/td&gt;&lt;td&gt;$36.11&lt;/td&gt;&lt;td&gt;10.6&lt;/td&gt;&lt;td&gt;$12.30&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Pentium 4&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;1800&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;7.5&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Pentium 4&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;2400&lt;/td&gt;&lt;td&gt;$248&lt;/td&gt;&lt;td&gt;3.3&lt;/td&gt;&lt;td&gt;$74.84&lt;/td&gt;&lt;td&gt;9.8&lt;/td&gt;&lt;td&gt;$25.43&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Pentium 4&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;2533&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;3.5&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;10.3&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Pentium 4&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;2667&lt;/td&gt;&lt;td&gt;$287&lt;/td&gt;&lt;td&gt;3.7&lt;/td&gt;&lt;td&gt;$77.95&lt;/td&gt;&lt;td&gt;10.8&lt;/td&gt;&lt;td&gt;$26.49&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Pentium 4&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;2800&lt;/td&gt;&lt;td&gt;$391&lt;/td&gt;&lt;td&gt;3.9&lt;/td&gt;&lt;td&gt;$101.04&lt;/td&gt;&lt;td&gt;11.4&lt;/td&gt;&lt;td&gt;$34.33&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Pentium 4&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;3060&lt;/td&gt;&lt;td&gt;$586&lt;/td&gt;&lt;td&gt;4.2&lt;/td&gt;&lt;td&gt;$138.68&lt;/td&gt;&lt;td&gt;12.4&lt;/td&gt;&lt;td&gt;$47.12&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Pentium 4 HT&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;2400&lt;/td&gt;&lt;td&gt;$265&lt;/td&gt;&lt;td&gt;3.4&lt;/td&gt;&lt;td&gt;$78.44&lt;/td&gt;&lt;td&gt;9.8&lt;/td&gt;&lt;td&gt;$27.11&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Pentium 4 HT&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;2667&lt;/td&gt;&lt;td&gt;$320&lt;/td&gt;&lt;td&gt;3.7&lt;/td&gt;&lt;td&gt;$85.38&lt;/td&gt;&lt;td&gt;10.8&lt;/td&gt;&lt;td&gt;$29.51&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Pentium 4 HT&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;2800&lt;/td&gt;&lt;td&gt;$405&lt;/td&gt;&lt;td&gt;3.9&lt;/td&gt;&lt;td&gt;$102.82&lt;/td&gt;&lt;td&gt;11.4&lt;/td&gt;&lt;td&gt;$35.53&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Pentium 4 HT&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;3060&lt;/td&gt;&lt;td&gt;$603&lt;/td&gt;&lt;td&gt;&lt;strong&gt;4.3&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;$140.17&lt;/td&gt;&lt;td&gt;12.4&lt;/td&gt;&lt;td&gt;$48.44&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Pentium 4 HT&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;3200&lt;/td&gt;&lt;td&gt;$920&lt;/td&gt;&lt;td&gt;4.5&lt;/td&gt;&lt;td&gt;$204.59&lt;/td&gt;&lt;td&gt;13.0&lt;/td&gt;&lt;td&gt;$70.70&lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;div class=&#39;ui centered bottom attached header&#39;&gt;CPU price/performance&lt;/div&gt;

&lt;p&gt;Benchmark results that were directly gathered are shown in &lt;strong&gt;bold type&lt;/strong&gt;. Benchmark results that were obtained from the distributed.net database are shown in &lt;em&gt;italics&lt;/em&gt;.&lt;/p&gt;
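
&lt;p&gt;The remaining figures are neither bold nor italic, and the text does not say how they were derived. One reading that matches several rows is linear scaling of a measured result by clock rate within a processor family; this is an assumption, not a method stated here. A short Python sketch under that assumption, with a hypothetical helper named scale_by_clock:&lt;/p&gt;

```python
def scale_by_clock(measured_mkeys, measured_mhz, target_mhz):
    """Scale a measured key rate linearly with clock rate (assumed model)."""
    return measured_mkeys * target_mhz / measured_mhz


# Measured Athlon XP 1900+ results (bold in the table), taken at 1600 MHz.
rc5_measured, des_measured, base_mhz = 5.3, 9.0, 1600

# Estimate the Athlon XP 2000+ at 1667 MHz; the table lists 5.5 and 9.4.
print(round(scale_by_clock(rc5_measured, base_mhz, 1667), 1))
print(round(scale_by_clock(des_measured, base_mhz, 1667), 1))
```

&lt;p&gt;Linear scaling ignores cache and memory effects, so it is best treated as a rough estimate for parts that were not benchmarked directly.&lt;/p&gt;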
</description>
    </item>
    
    <item>
      <title>Key search machine 2 interface</title>
      <link>https://ianhowson.com/fpga/ks2-interface/</link>
      <pubDate>Mon, 20 Oct 2003 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/fpga/ks2-interface/</guid>
      <description>

&lt;h2 id=&#34;registers&#34;&gt;Registers&lt;/h2&gt;

&lt;p&gt;The registers available to the programmer are:&lt;/p&gt;

&lt;table class=&#39;ui attached celled table&#39;&gt;
  &lt;thead&gt;
    &lt;tr&gt;&lt;th&gt;Address&lt;/th&gt;&lt;th&gt;Name&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;&lt;th&gt;Read/write&lt;/th&gt;&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;BUFFER&lt;/td&gt;&lt;td&gt;See text&lt;/td&gt;&lt;td&gt;Read/write&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;CTEXT&lt;/td&gt;&lt;td&gt;Sets the ciphertext to use&lt;/td&gt;&lt;td&gt;Write only&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;2&lt;/td&gt;&lt;td&gt;PTEXT&lt;/td&gt;&lt;td&gt;Sets the plaintext to use&lt;/td&gt;&lt;td&gt;Write only&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;3&lt;/td&gt;&lt;td&gt;IV&lt;/td&gt;&lt;td&gt;Sets the initialisation vector to use&lt;/td&gt;&lt;td&gt;Write only&lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;div class=&#39;ui centered bottom attached header&#39;&gt;Revised key search machine registers&lt;/div&gt;

&lt;p&gt;The BUFFER register has the following format:&lt;/p&gt;

&lt;table class=&#39;ui attached celled table&#39;&gt;
  &lt;thead&gt;&lt;tr&gt;&lt;th&gt;Bits&lt;/th&gt;&lt;th&gt;Name&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;td&gt;0&amp;ndash;3&lt;/td&gt;&lt;td&gt;VERSION&lt;/td&gt;&lt;td&gt;Protocol version (2)&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;4&lt;/td&gt;&lt;td&gt;DVALID&lt;/td&gt;&lt;td&gt;Set when the machine is ready for a new command&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;5&lt;/td&gt;&lt;td&gt;unused&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;6&lt;/td&gt;&lt;td&gt;RW&lt;/td&gt;&lt;td&gt;Specifies whether this command describes a read or write operation&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;7&lt;/td&gt;&lt;td&gt;unused&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;8&amp;ndash;15&lt;/td&gt;&lt;td&gt;ADDR&lt;/td&gt;&lt;td&gt;Specifies the target address for the write or read&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;16&amp;ndash;63&lt;/td&gt;&lt;td&gt;DATA&lt;/td&gt;&lt;td&gt;See text&lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;div class=&#39;ui centered bottom attached header&#39;&gt;Revised key search machine BUFFER register format&lt;/div&gt;

&lt;p&gt;VERSION and DVALID function identically to the &lt;a href=&#34;https://ianhowson.com/fpga/ks1-interface/&#34;&gt;original key search
machine&lt;/a&gt;. In this version of
the machine, the BUFFER register indirectly controls the search bus.
A write is performed by setting RW to 1 and supplying the data in
the DATA field. A read is performed by writing a word with RW set
to 0 and then polling the BUFFER register until DVALID goes high.
The result is returned in the DATA field.&lt;/p&gt;
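&lt;p&gt;The BUFFER layout above can be sketched in Python as a pair of pack and unpack helpers. This is a minimal illustration, not the original driver: it assumes bit 0 is the least significant bit of the 64 bit word, and the names &lt;code&gt;pack_command&lt;/code&gt; and &lt;code&gt;unpack_buffer&lt;/code&gt; are hypothetical.&lt;/p&gt;

<![CDATA[
```python
VERSION_MASK = 0xF            # bits 0-3
DVALID_BIT = 1 << 4           # bit 4
RW_BIT = 1 << 6               # bit 6: 1 = write, 0 = read
ADDR_SHIFT = 8                # bits 8-15
DATA_SHIFT = 16               # bits 16-63
DATA_MASK = (1 << 48) - 1     # DATA is 48 bits wide


def pack_command(addr, data=0, write=False):
    """Build the 64 bit word written to BUFFER for a read or write command."""
    if not 0 <= addr <= 0xFF:
        raise ValueError("ADDR is an 8 bit field")
    if not 0 <= data <= DATA_MASK:
        raise ValueError("DATA is a 48 bit field")
    word = (addr << ADDR_SHIFT) | (data << DATA_SHIFT)
    if write:
        word |= RW_BIT
    return word


def unpack_buffer(word):
    """Split a 64 bit word read from BUFFER into its named fields."""
    return {
        "version": word & VERSION_MASK,
        "dvalid": bool(word & DVALID_BIT),
        "rw": bool(word & RW_BIT),
        "addr": (word >> ADDR_SHIFT) & 0xFF,
        "data": (word >> DATA_SHIFT) & DATA_MASK,
    }
```
]]>

&lt;p&gt;A read would then be issued by writing &lt;code&gt;pack_command(addr)&lt;/code&gt; and calling &lt;code&gt;unpack_buffer&lt;/code&gt; on each polled word until its DVALID field is set.&lt;/p&gt;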

&lt;p&gt;A read or write through the BUFFER register always sets or retrieves
the key in use by a search unit. The exact interpretation of the DATA
field depends on the key generator in use. The intended purpose is
for DATA to be interpreted as a block number during a write, and treated
as the key number (least significant 32 bits) during a read.&lt;/p&gt;

&lt;p&gt;When reading through the BUFFER register, 32 bits of the DATA field
carry the key value. A further 8 bits are used by the search unit to report
status information:&lt;/p&gt;

&lt;table class=&#39;ui attached celled table&#39;&gt;
  &lt;thead&gt;&lt;tr&gt;&lt;th&gt;Bits&lt;/th&gt;&lt;th&gt;Name&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;KEYVALID&lt;/td&gt;&lt;td&gt;Set when the search unit has a key block to search through&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;RUNNING&lt;/td&gt;&lt;td&gt;Set when the search unit is searching its key block&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;2&amp;ndash;7&lt;/td&gt;&lt;td&gt;unused&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;8&amp;ndash;47&lt;/td&gt;&lt;td&gt;KEY&lt;/td&gt;&lt;td&gt;The least significant 32 bits of the key value&lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;div class=&#39;ui centered bottom attached header&#39;&gt;Revised key search machine search unit read format&lt;/div&gt;
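&lt;p&gt;A hedged sketch of decoding the search unit read format follows. It assumes the bit positions in the table are relative to the DATA field and that the key occupies 32 bits starting at bit 8 (the text gives 32 key bits, while the table lists bits 8 to 47). The function name is hypothetical.&lt;/p&gt;

<![CDATA[
```python
def decode_search_unit_read(data):
    """Decode the DATA field returned by a read-through-BUFFER operation."""
    return {
        "keyvalid": bool(data & 0x1),          # bit 0
        "running": bool(data & 0x2),           # bit 1
        "key_low": (data >> 8) & 0xFFFFFFFF,   # assumed: 32 key bits from bit 8
    }
```
]]>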

&lt;h2 id=&#34;operation&#34;&gt;Operation&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Software checks presence and version of board by reading BUFFER register&lt;/li&gt;
&lt;li&gt;If VERSION is 0, program complains that FPGA has not been programmed&lt;/li&gt;
&lt;li&gt;If VERSION is not 2, program complains that software version does
not match or FPGA is incorrectly programmed&lt;/li&gt;

&lt;li&gt;&lt;p&gt;For each address where a search unit is believed to exist&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Software performs a read-through-BUFFER operation on the appropriate
address&lt;/li&gt;
&lt;li&gt;If the key returned is 1, a search unit exists at that address&lt;/li&gt;
&lt;/ol&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Software writes CTEXT, PTEXT and IV registers&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;For each search unit:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Software writes initial key into search unit with a write-through-BUFFER operation&lt;/li&gt;
&lt;/ol&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Until correct key is located:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Software polls the RUNNING bit of each known search unit in turn.&lt;/li&gt;

&lt;li&gt;&lt;p&gt;If RUNNING on a search unit is 0:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Record the key value as a potential key&lt;/li&gt;
&lt;li&gt;Write the block number to the search unit so that it continues searching from the same point&lt;/li&gt;
&lt;/ol&gt;&lt;/li&gt;
&lt;/ol&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key that is read from the key buffer is the value that was in
the key generator at the time the search unit was halted, not the
key that caused the search unit to halt. The software must be aware
of the number of clock cycles required to process a single key and
subtract that value from the retrieved value. This value is algorithm
dependent.&lt;/p&gt;
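&lt;p&gt;The correction described above amounts to a modular subtraction. The sketch below assumes a linear counter key generator that advances by one key per step and a 32 bit key value; &lt;code&gt;latency_keys&lt;/code&gt; is a hypothetical parameter standing for the algorithm-dependent offset.&lt;/p&gt;

<![CDATA[
```python
def corrected_key(read_key, latency_keys, width=32):
    """Return the key that was under test when the search unit halted.

    read_key is the generator value read back from the search unit.
    latency_keys is how far the generator ran ahead before the halt.
    The result wraps within the counter width.
    """
    return (read_key - latency_keys) % (1 << width)
```
]]>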
</description>
    </item>
    
    <item>
      <title>Key search engine 1 interface</title>
      <link>https://ianhowson.com/fpga/ks1-interface/</link>
      <pubDate>Mon, 20 Oct 2003 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/fpga/ks1-interface/</guid>
      <description>

&lt;h2 id=&#34;description&#34;&gt;Description&lt;/h2&gt;

&lt;p&gt;The interface allows the following operations to be performed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retrieve the status of the board&lt;/li&gt;
&lt;li&gt;retrieve potential keys&lt;/li&gt;
&lt;li&gt;set the ciphertext, plaintext and IV to be used by all search units&lt;/li&gt;
&lt;li&gt;access and detect all search units controlled by the machine&lt;/li&gt;
&lt;li&gt;obtain the status of a single search unit&lt;/li&gt;
&lt;li&gt;set or retrieve the next key that will be processed by a search unit&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;registers&#34;&gt;Registers&lt;/h2&gt;

&lt;p&gt;The controller provides a number of registers which allow the computer
to access the machine&amp;rsquo;s resources.&lt;/p&gt;

&lt;table class=&#39;ui celled attached table&#39;&gt;
  &lt;thead&gt;
    &lt;tr&gt;&lt;th&gt;Address&lt;/th&gt;&lt;th&gt;Name&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;&lt;th&gt;Read/write&lt;/th&gt;&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;STATUS&lt;/td&gt;&lt;td&gt;Status register&lt;/td&gt;&lt;td&gt;Read only&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;NEXTKEY&lt;/td&gt;&lt;td&gt;Retrieves the next potential key from the buffer&lt;/td&gt;&lt;td&gt;Read only&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;2&lt;/td&gt;&lt;td&gt;CTEXT&lt;/td&gt;&lt;td&gt;Sets the known ciphertext&lt;/td&gt;&lt;td&gt;Write only&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;3&lt;/td&gt;&lt;td&gt;PTEXT&lt;/td&gt;&lt;td&gt;Sets the known plaintext&lt;/td&gt;&lt;td&gt;Write only&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;4&lt;/td&gt;&lt;td&gt;SUSEL&lt;/td&gt;&lt;td&gt;Select a search unit&lt;/td&gt;&lt;td&gt;Write only&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;5&lt;/td&gt;&lt;td&gt;SUKEY&lt;/td&gt;&lt;td&gt;Set or retrieve the current value of the key generator&lt;/td&gt;&lt;td&gt;Read/write&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;6&lt;/td&gt;&lt;td&gt;IV&lt;/td&gt;&lt;td&gt;Sets the initialisation vector&lt;/td&gt;&lt;td&gt;Write only&lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;div class=&#39;ui centered bottom attached header&#39;&gt;Initial key search machine registers&lt;/div&gt;

&lt;p&gt;Only the least significant 3 bits of the address are decoded. All transfers
are 64 bits wide. This makes supporting key lengths greater than 64 bits
difficult.&lt;/p&gt;

&lt;p&gt;When reading the STATUS register, the least significant word contains
the following bits:&lt;/p&gt;

&lt;table class=&#39;ui celled attached table&#39;&gt;
  &lt;thead&gt;&lt;tr&gt;&lt;th&gt;Bit&lt;/th&gt;&lt;th&gt;Name&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;td&gt;0&amp;ndash;3&lt;/td&gt;&lt;td&gt;VERSION&lt;/td&gt;&lt;td&gt;Described below&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;4&lt;/td&gt;&lt;td&gt;BUFFER_FULL&lt;/td&gt;&lt;td&gt;Set when the key buffer is full&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;5&lt;/td&gt;&lt;td&gt;DVALID&lt;/td&gt;&lt;td&gt;Set when the machine is ready for a new command&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;6&lt;/td&gt;&lt;td&gt;SU_PRESENT&lt;/td&gt;&lt;td&gt;Set when the currently selected search unit exists&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;7&lt;/td&gt;&lt;td&gt;SU_RUNNING&lt;/td&gt;&lt;td&gt;Set when the currently selected search unit is running&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;8&lt;/td&gt;&lt;td&gt;BUFFER_EMPTY&lt;/td&gt;&lt;td&gt;Set when the key buffer is empty&lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;div class=&#39;ui centered bottom attached header&#39;&gt;Initial key search machine STATUS register&lt;/div&gt;

&lt;p&gt;VERSION specifies the version of the communication protocol. For this
iteration of the design, the version is &amp;ldquo;0001&amp;rdquo;.
It is also used to detect whether the board is programmed and operating
properly. As such, it should never be &amp;ldquo;0000&amp;rdquo;. This
catches the case where the FPGA has not been correctly programmed.&lt;/p&gt;

&lt;p&gt;DVALID is set when the machine is ready for a new command, and cleared
when a command is currently executing. Results from a write command
should not be read until DVALID is set.&lt;/p&gt;

&lt;p&gt;When written to, the SUSEL register selects a search unit. Any subsequent
commands that operate on a specific search unit operate on the search
unit specified in SUSEL. Writing to SUSEL also updates the value of
the SUKEY register and the SU_PRESENT and SU_RUNNING bits in the
STATUS register. The SUSEL register must be repeatedly written to
in order to keep this data up to date.&lt;/p&gt;

&lt;p&gt;SU_RUNNING is set when the last selected search unit is running,
and cleared when the search unit is halted. A search unit might be
halted if it has found a key and is waiting to have the key read,
or if no initial key has been set.&lt;/p&gt;
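&lt;p&gt;The STATUS bits can be decoded as in the sketch below. It assumes bit 0 is the least significant bit of the word; the &lt;code&gt;Status&lt;/code&gt; type and function names are illustrative only and are not taken from the original software.&lt;/p&gt;

<![CDATA[
```python
from typing import NamedTuple


class Status(NamedTuple):
    version: int
    buffer_full: bool
    dvalid: bool
    su_present: bool
    su_running: bool
    buffer_empty: bool


def decode_status(word):
    """Decode the low bits of a 64 bit STATUS read."""
    return Status(
        version=word & 0xF,                    # bits 0-3
        buffer_full=bool(word & (1 << 4)),
        dvalid=bool(word & (1 << 5)),
        su_present=bool(word & (1 << 6)),
        su_running=bool(word & (1 << 7)),
        buffer_empty=bool(word & (1 << 8)),
    )


def check_version(status, expected=1):
    """Return an error message for a bad VERSION field, or None if it is fine."""
    if status.version == 0:
        return "FPGA has not been programmed"
    if status.version != expected:
        return "software version does not match or FPGA is incorrectly programmed"
    return None
```
]]>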

&lt;h2 id=&#34;operation&#34;&gt;Operation&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Software checks presence and version of board by reading STATUS register&lt;/li&gt;
&lt;li&gt;If VERSION is 0, program complains that FPGA has not been programmed&lt;/li&gt;
&lt;li&gt;If VERSION is not 1, program complains that software version does
not match or FPGA is incorrectly programmed&lt;/li&gt;

&lt;li&gt;&lt;p&gt;For each address where a search unit is believed to exist&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Software writes the address to SUSEL&lt;/li&gt;
&lt;li&gt;Software polls STATUS until DVALID goes high&lt;/li&gt;
&lt;li&gt;If SU_PRESENT is 1, the search unit exists and can be used; if 0,
search unit does not exist&lt;/li&gt;
&lt;/ol&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Software writes CTEXT, PTEXT and IV registers&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;For each search unit:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Software waits for DVALID flag to go high&lt;/li&gt;
&lt;li&gt;Software selects a search unit (sets SUSEL register)&lt;/li&gt;
&lt;li&gt;Software writes initial key into search unit (writes SUKEY register)&lt;/li&gt;
&lt;/ol&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Until correct key is located:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Software polls STATUS register to determine if any potential keys
have been located&lt;/li&gt;
&lt;li&gt;If there is a pending key, read it out of the buffer&lt;/li&gt;
&lt;/ol&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key that is written into the key buffer is always the key that
was in the key generator at the time the search unit was halted, not
the key that caused the search unit to halt. The software must be
aware of the number of clock cycles required to process a single key,
and subtract that value from the value stored in the key buffer. This
value is algorithm dependent.&lt;/p&gt;
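&lt;p&gt;The detection and key loading steps above can be sketched against a stand-in for the hardware. Everything here is an assumption made for illustration: &lt;code&gt;FakeBoard&lt;/code&gt; is a hypothetical model that is always ready, and the real driver would replace its &lt;code&gt;read&lt;/code&gt; and &lt;code&gt;write&lt;/code&gt; methods with accesses to the board.&lt;/p&gt;

<![CDATA[
```python
# Register addresses from the register table.
STATUS, NEXTKEY, CTEXT, PTEXT, SUSEL, SUKEY, IV = range(7)
DVALID = 1 << 5
SU_PRESENT = 1 << 6


class FakeBoard:
    """Hypothetical stand-in: search units exist at the given addresses."""

    def __init__(self, present):
        self.present = set(present)
        self.selected = None
        self.keys = {}

    def write(self, reg, value):
        if reg == SUSEL:
            self.selected = value
        elif reg == SUKEY and self.selected in self.present:
            self.keys[self.selected] = value

    def read(self, reg):
        if reg == STATUS:
            word = 1 | DVALID  # VERSION = 1, always ready in this model
            if self.selected in self.present:
                word |= SU_PRESENT
            return word
        return 0


def wait_dvalid(board, max_polls=1000):
    """Poll STATUS until DVALID is set and return that STATUS word."""
    for _ in range(max_polls):
        status = board.read(STATUS)
        if status & DVALID:
            return status
    raise TimeoutError("DVALID never went high")


def discover_units(board, addresses):
    """Return the addresses at which SU_PRESENT reads back as 1."""
    found = []
    for addr in addresses:
        board.write(SUSEL, addr)
        if wait_dvalid(board) & SU_PRESENT:
            found.append(addr)
    return found


def load_initial_keys(board, units, first_key, keys_per_unit):
    """Wait for DVALID, select each unit, then write its starting key."""
    for i, addr in enumerate(units):
        wait_dvalid(board)
        board.write(SUSEL, addr)
        board.write(SUKEY, first_key + i * keys_per_unit)
```
]]>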
</description>
    </item>
    
    <item>
      <title>Design</title>
      <link>https://ianhowson.com/fpga/design/</link>
      <pubDate>Mon, 20 Oct 2003 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/fpga/design/</guid>
      <description>

&lt;h2 id=&#34;conceptual-design&#34;&gt;Conceptual design&lt;/h2&gt;

&lt;p&gt;Most key search machines are designed around similar ideas. A controller
operates a number of independent search units. This controller usually
interfaces with a general purpose computer. Each search unit contains
a key generator, decryptor (or encryptor) and comparator. The key
generator produces trial keys that need to be checked. Some designs
combine the key generator and decryptor modules to improve performance.
The decryptor decrypts the known ciphertext with the trial key. An
encryptor can also be used with some limitations. The comparator checks
the plaintext that is generated by the decryptor to see if it is correct.
If it is, the controller is signalled.&lt;/p&gt;

&lt;p&gt;Due to its complexity, the cipher module is usually considered to
be the bottleneck in the system. All other modules must be able to
operate at least as quickly.&lt;/p&gt;

&lt;div class=&#39;ui center aligned segment&#39;&gt;
  &lt;img class=&#34;ui centered image&#34; src=&#34;https://ianhowson.com/fpga/keysearch-conceptual.svg&#34;&gt;
  &lt;p&gt;Conceptual key search machine design&lt;/p&gt;
&lt;/div&gt;

&lt;h3 id=&#34;key-generator&#34;&gt;Key generator&lt;/h3&gt;

&lt;p&gt;A counter is an obvious choice for a key generator. The count sequence
is predictable, but performance is not adequate on all devices. Xilinx
FPGAs provide a dedicated carry chain which improves performance significantly.&lt;/p&gt;

&lt;p&gt;The EFF DES cracker &lt;a href=&#34;#ref10&#34;&gt;[10]&lt;/a&gt; uses a counter where
the 24 most significant bits are held constant and the 32 least significant
bits counted. This technique is useful to reduce a counter&amp;rsquo;s propagation
delay. The most significant bits must be counted and loaded externally.
This scheme introduces the idea of a &amp;ldquo;block&amp;rdquo; of keys &amp;ndash; a subset of the key space which can be searched in a short period
of time. The 24 constant bits can be viewed as the block number. There
must be a mechanism with which the controller can detect the end of
block condition and start the key generator on a new block.&lt;/p&gt;
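&lt;p&gt;As a rough sketch of the block scheme, a 56 bit trial key can be formed by placing the 24 bit block number above the 32 bit count. The function names are hypothetical, and the end of block test assumes the counter starts each block at zero.&lt;/p&gt;

<![CDATA[
```python
COUNT_BITS = 32   # counted inside the search unit
BLOCK_BITS = 24   # held constant, loaded externally


def trial_key(block, count):
    """Concatenate a 24 bit block number and a 32 bit count into a 56 bit key."""
    if not 0 <= block < (1 << BLOCK_BITS):
        raise ValueError("block number out of range")
    if not 0 <= count < (1 << COUNT_BITS):
        raise ValueError("count out of range")
    return (block << COUNT_BITS) | count


def end_of_block(count):
    """True when the counter has reached the last key of its block."""
    return count == (1 << COUNT_BITS) - 1
```
]]>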

&lt;p&gt;One design &lt;a href=&#34;#ref12&#34;&gt;[12]&lt;/a&gt; uses a single counter that is shared
between all of the available search units. Each unit adds or concatenates
a unique ID to the counter value to obtain its trial key. This scheme
works well when the number of search units is a power of two since
the ID can simply be concatenated, saving resources.&lt;/p&gt;
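&lt;p&gt;The concatenation case can be illustrated as follows. This sketch assumes the unit ID is placed in the low bits of the key; the cited design may place it elsewhere, and the function name is hypothetical.&lt;/p&gt;

<![CDATA[
```python
def unit_trial_key(counter, unit_id, num_units):
    """Trial key for one search unit when num_units is a power of two.

    The unit ID fills the low log2(num_units) bits and the shared counter
    supplies the remaining bits, so no adder is required.
    """
    if num_units < 1 or num_units & (num_units - 1):
        raise ValueError("num_units must be a power of two")
    if not 0 <= unit_id < num_units:
        raise ValueError("unit_id out of range")
    id_bits = num_units.bit_length() - 1
    return (counter << id_bits) | unit_id
```
]]>

&lt;p&gt;With four units and counter values 0 to 3, the sixteen generated keys cover 0 to 15 exactly once, so no key is tested twice.&lt;/p&gt;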

&lt;p&gt;A number of designs &lt;a href=&#34;#ref16&#34;&gt;[16]&lt;/a&gt;, &lt;a href=&#34;#ref19&#34;&gt;[19]&lt;/a&gt;, &lt;a href=&#34;#ref15&#34;&gt;[15]&lt;/a&gt;
use Linear Feedback Shift Registers (LFSRs) to generate trial keys.
The main advantage of an LFSR over a counter is its high speed; propagation
delays remain constant regardless of the length of the LFSR. One disadvantage
of LFSRs is that their count sequence is nonlinear. Evenly breaking
up a large key space between search units requires more effort than
with a linear counter. One simple scheme is to use a shorter LFSR
than usual and set the remainder of the key bits to a constant value.
This works similarly to the block scheme for linear counters described
above; the LFSR can be 32 bits long, and the remaining 24 bits set
by the controller.&lt;/p&gt;
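&lt;p&gt;A small software model of the shortened LFSR scheme is given below. To keep it quick to check, it uses a 16 bit Fibonacci LFSR with a commonly published maximal-length tap set (16, 14, 13, 11) rather than the 32 bit register described above; the taps, seed and function names are assumptions and do not come from the cited designs.&lt;/p&gt;

<![CDATA[
```python
def lfsr16_step(state):
    """Advance a 16 bit Fibonacci LFSR (taps 16, 14, 13, 11) by one step."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def lfsr_trial_key(block, state):
    """Place the controller-supplied block number above the LFSR state."""
    return (block << 16) | state


def lfsr_period(seed=0xACE1):
    """Count the steps taken before the LFSR returns to its seed."""
    state, steps = lfsr16_step(seed), 1
    while state != seed:
        state = lfsr16_step(state)
        steps += 1
    return steps
```
]]>

&lt;p&gt;A maximal-length LFSR visits every nonzero state once per period, so the all-zero value within each block has to be tested separately.&lt;/p&gt;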

&lt;h3 id=&#34;cipher-module&#34;&gt;Cipher module&lt;/h3&gt;

&lt;h4 id=&#34;encryptor-vs-decryptor&#34;&gt;Encryptor vs. decryptor&lt;/h4&gt;

&lt;p&gt;When performing a known plaintext attack, the choice of encryptor
or decryptor is dependent on which has higher performance. Most ciphers
(including all stream ciphers) have identical performance regardless
of their mode. Some may have a more efficient implementation when
implemented in one way or another. RC5 &lt;a href=&#34;#ref6&#34;&gt;[6]&lt;/a&gt; is an example
of this. An RC5 encryptor can operate more efficiently than a decryptor
because the order that the S array is used in during key setup matches
that used in the encryption stage, allowing the phases to overlap.&lt;/p&gt;

&lt;p&gt;Most key search machines will use a decryptor. Ciphertext-only attacks
require a decryptor. Known-plaintext attacks will also require a decryptor
under some conditions. This is to allow a more flexible comparator
scheme that can detect correct plaintext regardless of imperfect knowledge.&lt;/p&gt;

&lt;h4 id=&#34;iterated-vs-pipelined&#34;&gt;Iterated vs. pipelined&lt;/h4&gt;

&lt;p&gt;Two major approaches are used when implementing the cipher module:
a long pipeline or a small iterative module. Most ciphers are composed
of a number of round functions, making an iterative implementation
natural. Pipelined approaches can achieve much greater speeds at the
expense of FPGA resources. DES is frequently implemented as either
a small iterated module or a long pipeline. The iterated version takes
a multiple of 16 cycles (one for each application of the round function)
to produce one block of output, while the pipelined version can produce
one unit of output every clock cycle. The resource gains made by using
an iterated cipher implementation almost never outweigh the loss of
speed. Resource constraints may force a cipher to be implemented in
iterative form.&lt;/p&gt;
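&lt;p&gt;The throughput difference can be made concrete with a short calculation. The sketch assumes a single search unit, an iterated DES core taking exactly 16 cycles per key, and the same clock rate for both styles (in practice the achievable clock rates differ). The function names are illustrative.&lt;/p&gt;

<![CDATA[
```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def keys_per_second(clock_hz, cycles_per_key=1):
    """Keys tested per second by one search unit."""
    return clock_hz / cycles_per_key


def search_time_years(key_bits, clock_hz, cycles_per_key=1, units=1):
    """Worst-case time to exhaust a key space, in years."""
    rate = keys_per_second(clock_hz, cycles_per_key) * units
    return (2 ** key_bits) / rate / SECONDS_PER_YEAR
```
]]>

&lt;p&gt;At 100MHz, one pipelined unit would need roughly 22.8 years to exhaust a 56 bit key space, and an iterated unit sixteen times as long.&lt;/p&gt;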

&lt;p&gt;&lt;em&gt;An FPGA-Based Performance Evaluation of the AES Block Cipher
Candidate Algorithm Finalists&lt;/em&gt; &lt;a href=&#34;#ref32&#34;&gt;[32]&lt;/a&gt; explores
these issues in depth. It presents FPGA performance figures for the
MARS, RC6, Rijndael, Serpent and Twofish ciphers. Loop unrolling,
pipelining and sub-pipelining are investigated as architectural choices.
In most cases, a pipelined implementation was fastest. &lt;em&gt;Fast
DES Implementation for FPGAs and Its Application to a Universal Key-Search
Machine&lt;/em&gt; &lt;a href=&#34;#ref18&#34;&gt;[18]&lt;/a&gt; explores pipelined, combinatorial and
iterative approaches for the DES cipher.&lt;/p&gt;

&lt;h4 id=&#34;partially-evaluated-circuits&#34;&gt;Partially evaluated circuits&lt;/h4&gt;

&lt;p&gt;Some ciphers may benefit from having values precomputed during compilation
time. This is usually used to achieve higher performance in systems
that have very infrequent key changes. FPGAs fit well with this approach,
allowing the programmed-in key to be changed with only a small period
of downtime. The speed efficiency of an FPGA DES implementation was
improved significantly using this technique &lt;a href=&#34;#ref33&#34;&gt;[33]&lt;/a&gt;.
The utility of this technique in a key search machine is dependent
on the cipher. The plaintext or ciphertext would be compiled into
the design instead of the key. This may yield improvements for some
ciphers, but exploratory experiments only showed very small resource
savings.&lt;/p&gt;

&lt;h3 id=&#34;comparator&#34;&gt;Comparator&lt;/h3&gt;

&lt;p&gt;The environment that the key search machine operates in determines
the choice of comparator. If a perfect ciphertext/plaintext pair is
known, simply checking for bit equality will be adequate. Ignoring
certain bits in the trial plaintext may be a useful extension when
only a portion of the sought plaintext is known.&lt;/p&gt;
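&lt;p&gt;A bit equality comparator with ignored bits reduces to an XOR followed by a mask. The following is a minimal sketch for 64 bit blocks; &lt;code&gt;care_mask&lt;/code&gt; is a hypothetical parameter with a 1 in every bit position that must match.&lt;/p&gt;

<![CDATA[
```python
ALL_BITS = 0xFFFFFFFFFFFFFFFF


def masked_match(trial, known, care_mask=ALL_BITS):
    """True when trial and known agree on every bit set in care_mask."""
    return ((trial ^ known) & care_mask) == 0
```
]]>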

&lt;p&gt;If a ciphertext-only attack will be attempted or the plaintext is
not precisely known, it may be necessary to implement a heuristic
matching scheme. Such a scheme will generally flag a number of keys
as potential matches and allow humans or software to check them further
for correctness.&lt;/p&gt;

&lt;p&gt;A simple scheme to detect ASCII text is to require that the most significant
bit of each plaintext byte be 0. This can be further generalised into
a statistical approach that scores each byte of the trial plaintext
according to its probability of occurrence. &lt;em&gt;A Programmable Plaintext
Recognizer&lt;/em&gt; &lt;a href=&#34;#ref34&#34;&gt;[34]&lt;/a&gt; uses similar ideas to extend
Wiener&amp;rsquo;s theoretical key search machine &lt;a href=&#34;#ref15&#34;&gt;[15]&lt;/a&gt;.
Applying compression to a message before encrypting it causes their
heuristics to fail. This is an effective countermeasure against any
statistical comparator, since the compression makes the message &amp;ldquo;look
like&amp;rdquo; random data.&lt;/p&gt;
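&lt;p&gt;The simple ASCII test mentioned above needs only one mask. This sketch assumes a 64 bit block treated as eight bytes; the function name is hypothetical.&lt;/p&gt;

<![CDATA[
```python
HIGH_BITS = 0x8080808080808080  # most significant bit of each of the 8 bytes


def looks_like_ascii(block):
    """True when every byte of a 64 bit block has its top bit clear."""
    return (block & HIGH_BITS) == 0
```
]]>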

&lt;p&gt;Some applications may also benefit from a specialist comparator. A
machine designed to solve the Blaze Challenge &lt;a href=&#34;#ref23&#34;&gt;[23]&lt;/a&gt;
would need a specialist comparator that will find a match on any block
that fits the form of the solution (in this case, when the plaintext
is composed only of a single repeated byte.)&lt;/p&gt;
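&lt;p&gt;A repeated-byte check of the kind described can be sketched as below, assuming a 64 bit block. It is an illustration of the idea only, not a full solver for the challenge.&lt;/p&gt;

<![CDATA[
```python
def is_repeated_byte(block, width_bytes=8):
    """True when all bytes of the block hold the same value."""
    data = block.to_bytes(width_bytes, "big")
    return len(set(data)) == 1
```
]]>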

&lt;h3 id=&#34;returning-matches&#34;&gt;Returning matches&lt;/h3&gt;

&lt;p&gt;At some point in a key search machine&amp;rsquo;s operation it will be necessary
to return potential keys to the host computer. Several schemes have
been used to achieve this goal.&lt;/p&gt;

&lt;p&gt;Most key search machines simply stop running when a match is found
and wait for the computer to read out the key value. This is simple
and flexible, but inefficient when many keys need to be returned &amp;ndash;
while a search unit is waiting to release the key it halted on, it
cannot be used to search the key space.&lt;/p&gt;

&lt;p&gt;A hardware buffer can be used to reduce the waiting time. When a key
needs to be returned it is read into the hardware buffer, and the
controller can read the keys out. This has the advantage of improved
efficiency, but costs hardware resources.&lt;/p&gt;

&lt;p&gt;One novel approach is to measure the amount of time needed to find
the key. Using knowledge of how quickly the key space can be searched,
an approximate trial key can be found. A number of keys need to be
checked to account for timer inaccuracies. This method removes the
need for key storage and retrieval hardware.&lt;/p&gt;
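&lt;p&gt;The timing approach can be sketched as computing a window of candidate keys from the elapsed time. The sketch assumes a linear counter starting at &lt;code&gt;start_key&lt;/code&gt;, a constant search rate, and integer microsecond timestamps; all parameter names are hypothetical.&lt;/p&gt;

<![CDATA[
```python
def candidate_keys(start_key, keys_per_us, elapsed_us, timer_error_us):
    """Range of keys that could have been under test when the unit halted.

    keys_per_us is the search rate in keys per microsecond, and
    timer_error_us bounds the timer inaccuracy in either direction.
    """
    earliest = max(elapsed_us - timer_error_us, 0)
    latest = elapsed_us + timer_error_us
    first = start_key + keys_per_us * earliest
    last = start_key + keys_per_us * latest
    return range(first, last + 1)
```
]]>

&lt;p&gt;At 100 keys per microsecond, a timer error of 10 microseconds leaves 2001 candidate keys for the host to check in software.&lt;/p&gt;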

&lt;h2 id=&#34;a-id-a-generic-hardware-a-a-generic-fpga-key-search-machine&#34;&gt;&lt;a id=&#39;A-generic-hardware&#39;&gt;&lt;/a&gt;A generic FPGA key search machine&lt;/h2&gt;

&lt;p&gt;The programming interface for this design is supplied in &lt;a href=&#34;https://ianhowson.com/fpga/ks1-interface/&#34;&gt;Key search engine 1 interface&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&#34;goals&#34;&gt;Goals&lt;/h3&gt;

&lt;p&gt;The goal is to produce an FPGA-based key search machine which can operate
independently of the cipher algorithm. It should communicate with a computer for instructions
and data. It should be reasonably scalable for large key cracks, and
be easily modifiable for ciphertext-only attacks. It should allow
rapid prototyping of key search machines for different ciphers.&lt;/p&gt;

&lt;h3 id=&#34;top-level-design&#34;&gt;Top-level design&lt;/h3&gt;

&lt;div class=&#39;ui center aligned segment&#39;&gt;
  &lt;img class=&#34;ui centered image&#34; src=&#34;https://ianhowson.com/fpga/ks1-toplevel.svg&#34;&gt;
  &lt;p&gt;Initial key search machine top level design&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;The bus provided by the Pilchard interface runs synchronously at 100
or 133MHz. The remainder of the key search machine must operate at
this speed.&lt;/p&gt;

&lt;p&gt;The top-level Status register provides general status information
for the entire key search machine.&lt;/p&gt;

&lt;p&gt;The key buffer stores potentially correct keys for the computer to
read out and check further. This prevents search units from being
paused for very long when a potential key is located. It is particularly
useful for ciphertext-only attacks, where there may be a large number
of potentially correct keys. 256 keys can be stored; this figure fully
utilises the four Block SelectRAM units that are needed to store a
64 bit word.&lt;/p&gt;

&lt;p&gt;The controller operates the search bus. It relays commands from the
computer to individual search units, stores the ciphertext and plaintext
registers, and polls each search unit on the bus to see if there are
any keys waiting. It uses a simple state machine. While there are
no commands waiting to execute, it polls search units to see if there
are any keys waiting. If a key is found, it is read into the key buffer.
If a command arrives, it temporarily stops polling and executes the
command.&lt;/p&gt;

&lt;h3 id=&#34;search-unit-design&#34;&gt;Search unit design&lt;/h3&gt;

&lt;div class=&#39;ui center aligned segment&#39;&gt;
  &lt;img class=&#34;ui centered image&#34; src=&#34;https://ianhowson.com/fpga/ks1-searchunit.svg&#34;&gt;
  &lt;p&gt;Initial key search machine search unit design&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;Each search unit has its own status register which the controller
uses to determine if a key has been located. The key generator provides
a trial key to the decryptor, which uses the key and the supplied
ciphertext to produce trial plaintext. This trial plaintext is compared
with the known plaintext or has a set of heuristics applied to determine
if it appears to be valid. If it is, the search unit is halted until
it is instructed to restart by the controller.&lt;/p&gt;

&lt;h2 id=&#34;another-generic-fpga-key-search-machine&#34;&gt;Another generic FPGA key search machine&lt;/h2&gt;

&lt;h3 id=&#34;motivation&#34;&gt;Motivation&lt;/h3&gt;

&lt;p&gt;Several problems were identified with the original key search machine
that justified the design of a new one.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It was overly complicated. Modules within the design such as the key
buffer provided very low returns on their cost. Removing the key buffer
allowed the controller to be simplified, since it no longer needed
to poll search units for keys. Other simplifications were made within
the search units.&lt;/li&gt;
&lt;li&gt;Timing closure was never achieved. The original design was implemented
before a proper understanding of high-speed digital logic had been
attained. The fact that it actually worked can be put down to luck
and favourable operating conditions.&lt;/li&gt;
&lt;li&gt;Minor bugs remained that complicated the software design.&lt;/li&gt;
&lt;li&gt;The programming interface was more sophisticated than it needed to
be, which complicated the software further.&lt;/li&gt;
&lt;li&gt;The clock speed for search units was locked to that of the memory
bus. This turned out to be a larger handicap than was originally predicted.
The DES search unit could run at almost 180MHz according to the synthesis
tools, but was still locked to the 100MHz of the memory bus. This
could be increased to 133MHz by adjusting the motherboard jumpers,
but this was not a satisfactory solution. Better support for slow
ciphers was also needed.&lt;/li&gt;
&lt;li&gt;It could not easily support key lengths over 64 bits.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The new design and its driver software took approximately four days
to implement and debug. The programming interface is described in &lt;a href=&#34;https://ianhowson.com/fpga/ks2-interface/&#34;&gt;Key search machine 2 interface&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&#34;design&#34;&gt;Design&lt;/h3&gt;

&lt;div class=&#39;ui center aligned segment&#39;&gt;
  &lt;img class=&#34;ui centered image&#34; src=&#34;https://ianhowson.com/fpga/ks2-design.svg&#34;&gt;
  &lt;p&gt;Revised key search machine design&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;There are two controllers in this design: a master and a slave. The
master handles all communication with the host computer and links
to the slave with an asynchronous bus modelled closely on VME. The
slave controller&amp;rsquo;s only purpose is to link the asynchronous bus and
the search bus, which can run synchronously at any speed. This allows
search units to run at any speed, simplifying cipher implementation.&lt;/p&gt;

&lt;h3 id=&#34;search-unit-design-1&#34;&gt;Search unit design&lt;/h3&gt;

&lt;p&gt;The search unit is designed to use a block system for key allocation.
Only the block number is transmitted to the search unit. When retrieving
the key from the search unit, the least significant 32 bits of the
key are returned. The software is expected to track which search unit
is searching which block.&lt;/p&gt;

&lt;h3 id=&#34;search-bus&#34;&gt;Search bus&lt;/h3&gt;

&lt;p&gt;The search bus runs at the same speed as the search units. The two
clock domains (SDRAM clock and search unit clock) are linked with
an asynchronous bus using similar protocols to VME.&lt;/p&gt;

&lt;p&gt;High speed search units were later identified as a problem; it was
found to be difficult to route a wide high speed bus over the entire
FPGA and still meet timing constraints. A potential improvement to
the machine would be to decouple the clock rate of the search bus
from that of the search units or make the bus completely asynchronous.
The latter was the original intent of the asynchronous bus, but the
resources required to implement it made it unwieldy to use on every
search unit. It remains the best solution for a large-scale machine
(at least between FPGA devices).&lt;/p&gt;

&lt;h2 id=&#34;modules&#34;&gt;Modules&lt;/h2&gt;

&lt;h3 id=&#34;counters&#34;&gt;Counters&lt;/h3&gt;

&lt;p&gt;Two linear counters were implemented. The first was a simple 64 bit
counter. The second was designed to work with block schemes and added
functionality to allow counting to be inhibited for ciphers that do
not need a new key every clock cycle. It counts through 32 bits of
range and has a further 48 bits of range that is set externally.&lt;/p&gt;

&lt;h3 id=&#34;bit-equality-comparator&#34;&gt;Bit equality comparator&lt;/h3&gt;

&lt;p&gt;A comparator that checks for exact bit equality was implemented. It
was 64 bits wide. It flags a match when its two inputs are identical.
To ensure that it runs quickly enough with high clock speeds, it was
implemented as a short pipeline. On the first clock cycle, four 16
bit segments of the trial plaintext are compared individually. On
the second cycle, a match is flagged if all four of these comparisons
succeeded.&lt;/p&gt;
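&lt;p&gt;A behavioural model of the two-stage comparator is sketched below in Python. It is not the VHDL used in the design; the class and method names are hypothetical, and each call to &lt;code&gt;clock&lt;/code&gt; stands for one clock edge.&lt;/p&gt;

<![CDATA[
```python
def segment_compare(trial, known):
    """Stage 1: compare four 16 bit segments independently."""
    return [
        ((trial >> shift) & 0xFFFF) == ((known >> shift) & 0xFFFF)
        for shift in (0, 16, 32, 48)
    ]


class PipelinedComparator:
    """Two-stage comparator with a register between the stages."""

    def __init__(self, known):
        self.known = known
        self.stage1 = [False] * 4   # segment results from the previous clock
        self.match = False          # registered output

    def clock(self, trial):
        """Advance one clock and return the match flag for the previous input."""
        self.match = all(self.stage1)                      # stage 2
        self.stage1 = segment_compare(trial, self.known)   # stage 1
        return self.match
```
]]>

&lt;p&gt;Because the result is registered, the flag for a given trial plaintext appears one call later than the input that caused it.&lt;/p&gt;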

&lt;h3 id=&#34;statistical-comparator&#34;&gt;Statistical comparator&lt;/h3&gt;

&lt;p&gt;A simple statistical comparator was implemented using some of the
ideas within &lt;a href=&#34;#ref34&#34;&gt;[34]&lt;/a&gt;. Its purpose is to use the
probabilities of different bytes within the produced plaintext to
determine if the plaintext &amp;ldquo;looks right&amp;rdquo;. The definition of &amp;ldquo;looks
right&amp;rdquo; varies depending on the attack scenario; English text would
have different statistical properties to an executable file, for example.&lt;/p&gt;

&lt;p&gt;The algorithm used is fairly simple. The comparator takes a 64 bit
input and splits it into 8 bit bytes. Each byte value has an assigned
&amp;ldquo;score&amp;rdquo; &amp;ndash; higher scores correspond with more frequently
occurring byte values. The scores for each byte are added and compared
against a threshold value. If the threshold is exceeded, a match is
flagged.&lt;/p&gt;
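&lt;p&gt;The scoring algorithm can be written in a few lines of Python. This is a software sketch of the idea rather than the hardware implementation; &lt;code&gt;scores&lt;/code&gt; is assumed to be a 256 entry table indexed by byte value.&lt;/p&gt;

<![CDATA[
```python
def score_block(block, scores):
    """Sum the per-byte scores of a 64 bit block."""
    return sum(scores[b] for b in block.to_bytes(8, "big"))


def statistical_match(block, scores, threshold):
    """Flag a match when the total score exceeds the threshold."""
    return score_block(block, scores) > threshold
```
]]>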

&lt;h4 id=&#34;implementation&#34;&gt;Implementation&lt;/h4&gt;

&lt;p&gt;Implementation of the algorithm was more challenging, but still straightforward.
The main design constraint was that the comparator be no slower than
any decryption module &amp;ndash; in this case, the DES module running
at over 149MHz and producing one word of plaintext per cycle. In order
to meet this timing requirement, steps of the algorithm were split
up as much as possible.&lt;/p&gt;

&lt;p&gt;The figure below shows the steps
performed by the implementation. Four RAM blocks were used to store
the byte value scores (8 bits each). Each RAM block has two ports,
allowing a total of eight memory lookups every cycle. The scores are
added in parallel in pairs to minimise delays on each cycle. Finally,
the total score is compared with the threshold (which is set by the
plaintext value). Splitting up the steps in this way produces a deep
pipeline, but allows very high clock rates.&lt;/p&gt;

&lt;div class=&#39;ui center aligned segment&#39;&gt;
  &lt;img class=&#34;ui centered image&#34; src=&#34;https://ianhowson.com/fpga/stat-compare.svg&#34;&gt;
  &lt;p&gt;Statistical comparator design&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;The threshold comparison stage is the main timing bottleneck. Speed
improvements can be made by reducing the comparison resolution. By
only comparing the most significant four bits, synthesis reports a
maximum speed of 181MHz. The required resolution depends on the statistical
properties of the text being attacked.&lt;/p&gt;

&lt;p&gt;A small C program was written to generate character scores from files.
It counts the frequency of each character in the file. The scores
are then normalised down to 8 bits and output in a format suitable
for direct entry into the VHDL RAM initialisation code. This data
could also be used to modify the bitstream after compilation if desired.
This program allows the comparator to be &amp;ldquo;trained&amp;rdquo; on input
data similar to what is expected.&lt;/p&gt;
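
&lt;p&gt;The training step can be illustrated with a short sketch. The original tool was written in C and its exact normalisation is not given here, so the function below is an assumption: it scales the counts so that the most frequent byte value receives the maximum 8 bit score of 255. The name &lt;code&gt;train_scores&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
from collections import Counter


def train_scores(sample):
    """Return 256 scores (0..255), one per byte value, from sample data.

    Assumed normalisation: the most frequent byte value scores 255 and
    all other byte values are scaled proportionally (integer division).
    """
    counts = Counter(sample)
    peak = max(counts.values(), default=0)
    if peak == 0:
        return [0] * 256
    return [counts[value] * 255 // peak for value in range(256)]


scores = train_scores(b"aab")
print(scores[ord("a")], scores[ord("b")], scores[ord("z")])
```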

&lt;h3 id=&#34;a-id-des-cipher-module-a-des-cipher-module&#34;&gt;&lt;a id=&#39;DES-cipher-module&#39;&gt;&lt;/a&gt;DES cipher module&lt;/h3&gt;

&lt;p&gt;The DES implementation is a modified copy of the DES demonstration
provided with the Pilchard board, which is itself a modified version
of the Xilinx optimised DES implementation in &lt;a href=&#34;#ref35&#34;&gt;[35]&lt;/a&gt;.
The order of the round functions was reversed to convert the encryptor
into a decryptor.&lt;/p&gt;

&lt;p&gt;Registers were added to the key schedule logic, but later removed
when the efficient keying system described in &lt;a href=&#34;#ref19&#34;&gt;[19]&lt;/a&gt;
was implemented. This scheme integrated an LFSR key generator with
the DES key schedule logic. A 72 bit LFSR with taps suitable for a
56 bit LFSR was used. As previous keys are shifted through the
LFSR, they remain available to the key schedule logic, which can generate
the necessary subkeys with rotations. This saved approximately 500
slices that were previously used for subkey registers. Subkey generation
was essentially free, although fanout on the LFSR bits did reduce
the speed slightly.&lt;/p&gt;

&lt;p&gt;The possibility of attacking a key and its complement simultaneously
was considered. This halves the search space, but not the search time.
The decryption portion of DES comprises the bulk of the area requirement
in a hardware implementation, and this improvement only saves key
schedule logic. With the LFSR keying scheme above already in place, the
performance improvement would be negligible.&lt;/p&gt;

&lt;p&gt;Using the XCV1000E, a key search machine containing controllers and
five search units was operated at 100MHz, giving a total search rate
of 500Mkeys/sec.&lt;/p&gt;

&lt;h3 id=&#34;a5-1-cipher-module&#34;&gt;A5/1 cipher module&lt;/h3&gt;

&lt;p&gt;The A5/1 implementation was produced from scratch using the algorithm
description given in &lt;a href=&#34;#ref7&#34;&gt;[7]&lt;/a&gt;. It aims to find the
initial key state rather than the key itself. Time constraints did
not allow the more efficient stream cipher attack in &lt;a href=&#34;https://ianhowson.com/fpga/background/#stream-ciphers&#34;&gt;Stream Ciphers&lt;/a&gt;
to be implemented, and so no further work was performed using this
module.&lt;/p&gt;

&lt;p&gt;The (already small) resources needed to implement the A5/1 module
could be further reduced by configuring the Xilinx LUTs as shift registers
&lt;a href=&#34;#ref36&#34;&gt;[36]&lt;/a&gt;. This would complicate key loading; the entire
key state could no longer be loaded in a single cycle.&lt;/p&gt;

&lt;h3 id=&#34;a-id-rc5-cipher-module-a-rc5-cipher-module&#34;&gt;&lt;a id=&#39;RC5-cipher-module&#39;&gt;&lt;/a&gt;RC5 cipher module&lt;/h3&gt;

&lt;h4 id=&#34;introduction&#34;&gt;Introduction&lt;/h4&gt;

&lt;p&gt;The RC5 implementation was produced completely from scratch using
the algorithm description given by Rivest &lt;a href=&#34;#ref6&#34;&gt;[6]&lt;/a&gt;. It implemented
RC5-32/12/9. It was intended to be used to complete the RSA Secret-Key
Challenge contests &lt;a href=&#34;#ref29&#34;&gt;[29]&lt;/a&gt;. The possibility
of connecting the complete key search machine to distributed.net &lt;a href=&#34;#ref26&#34;&gt;[26]&lt;/a&gt;
was considered as an extension.&lt;/p&gt;

&lt;p&gt;Few prior works in this area could be located. &lt;a href=&#34;#ref37&#34;&gt;[37]&lt;/a&gt; claims
to have schematics for a functional RC5 implementation on Xilinx FPGAs,
but they are no longer available. The author could not be contacted.
&lt;a href=&#34;#ref37&#34;&gt;[37]&lt;/a&gt; contains a Verilog model which was not found to
be useful.&lt;/p&gt;

&lt;h4 id=&#34;a-id-rc5-pipelined-design-a-pipelined-design&#34;&gt;&lt;a id=&#39;rc5-Pipelined-design&#39;&gt;&lt;/a&gt;Pipelined design&lt;/h4&gt;

&lt;p&gt;A fully pipelined design similar to that used for DES was investigated.
This possibility was considered to be impractical due to the large
number of registers needed for the S array.&lt;/p&gt;

&lt;p&gt;After implementing the iterative version, the possibility of implementing
a pipelined version was considered again. This time, the number of
LUTs required was identified as being excessive. A prototype implementation
determined that each stage of the key mixing phase would require 256
LUTs, and each half-round of the decryption phase would require 192
LUTs. Given 78 mixing steps and 24 decryption half-rounds, the number
of LUTs required is 24576 &amp;ndash; coincidentally, the exact
number of LUTs available on the Virtex 1000E. Many more would be required
for state decoding, communication, key generation, comparisons, routing
overhead and so on. This possibility was not investigated further,
but would almost certainly be feasible given more hardware resources
to work with. Such an implementation would be able to provide very
high search rates on sufficiently large FPGA devices.&lt;/p&gt;
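
&lt;p&gt;The LUT estimate above can be checked with a few lines of arithmetic. All figures are taken from the text; only the variable names are introduced here.&lt;/p&gt;

```python
# Figures quoted in the text above.
MIX_STEPS = 78             # key mixing stages
LUTS_PER_MIX_STAGE = 256
HALF_ROUNDS = 24           # decryption half-rounds
LUTS_PER_HALF_ROUND = 192
LUTS_AVAILABLE = 24576     # Virtex 1000E, as stated in the text

mix_luts = MIX_STEPS * LUTS_PER_MIX_STAGE
decrypt_luts = HALF_ROUNDS * LUTS_PER_HALF_ROUND
datapath_luts = mix_luts + decrypt_luts
spare_luts = LUTS_AVAILABLE - datapath_luts

print(mix_luts, decrypt_luts, datapath_luts, spare_luts)
```

&lt;p&gt;The datapath alone accounts for every LUT on the device, leaving nothing for the supporting logic listed above.&lt;/p&gt;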

&lt;h4 id=&#34;iterative-design&#34;&gt;Iterative design&lt;/h4&gt;

&lt;p&gt;An iterative design for the RC5 implementation was used. Block SelectRAM
memories within the FPGA were used to store the S array. The number
of RAM blocks was anticipated to be the limiting factor, similar to
the RC4 key search engine described in &lt;a href=&#34;#ref21&#34;&gt;[21]&lt;/a&gt;. The L array
was stored in three rotating registers; this eased timing constraints
and prevented reads and writes to the RAM becoming a bottleneck.&lt;/p&gt;

&lt;p&gt;The key mixing phase of RC5 took the bulk of the time needed to check
a key. It required 78 iterations, each consisting of a read
and a write to both the S and L arrays. To minimise the time required per
cycle, the key mixing stage of the algorithm was set up to operate
continuously on two separate regions of RAM. The initialisation and
decryption stages were arranged to work on the opposite region of
RAM. When a key mix phase completes, the decryption and initialisation
phases begin on that region of RAM. In this way, the average time
required to check a key would effectively be the time required to
perform the key mixing phase.&lt;/p&gt;

&lt;div class=&#39;ui center aligned segment&#39;&gt;
  &lt;img class=&#34;ui centered image&#34; src=&#34;https://ianhowson.com/fpga/rc5-ram.svg&#34;&gt;
  &lt;p&gt;RC5 RAM timing&lt;/p&gt;
&lt;/div&gt;
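
&lt;p&gt;For reference, the key mixing phase can be sketched in software. This follows the published RC5 key expansion for 32 bit words and 12 rounds; the constants &lt;code&gt;P32&lt;/code&gt; and &lt;code&gt;Q32&lt;/code&gt; come from Rivest&amp;rsquo;s definition of RC5 rather than from this text. It is a sketch of the algorithm only and does not model the RAM regions or timing of the hardware design. The function name &lt;code&gt;key_mix&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
P32 = 0xB7E15163   # RC5 magic constants for 32-bit words
Q32 = 0x9E3779B9
MASK = 2**32


def rotl(x, n):
    """Rotate a 32-bit value left by n (taken modulo 32)."""
    n %= 32
    return ((x * 2**n) % MASK) | (x >> (32 - n))


def key_mix(key, rounds=12):
    """Expand a byte-string key into the S array; also return the step count."""
    c = max(1, (len(key) + 3) // 4)   # words in the L array (3 for 9 bytes)
    t = 2 * (rounds + 1)              # words in the S array (26 for 12 rounds)
    L = [0] * c
    for i in range(len(key) - 1, -1, -1):   # little-endian key load
        L[i // 4] = (L[i // 4] * 256 + key[i]) % MASK
    S = [(P32 + i * Q32) % MASK for i in range(t)]   # initialisation phase
    A = B = i = j = 0
    steps = 3 * max(t, c)             # 78 for RC5-32/12/9
    for _ in range(steps):
        # Each step reads and writes one S word and one L word.
        A = S[i] = rotl((S[i] + A + B) % MASK, 3)
        B = L[j] = rotl((L[j] + A + B) % MASK, (A + B) % 32)
        i = (i + 1) % t
        j = (j + 1) % c
    return S, steps


S, steps = key_mix(bytes(9))
print(len(S), steps)
```

&lt;p&gt;With a 9 byte key the L array has three words, which corresponds to the three rotating registers described above.&lt;/p&gt;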

&lt;p&gt;The key mix phase needs to be completed as quickly as possible. The
decryption and initialisation phases are not timing critical, and
can be completed more slowly in order to save FPGA resources. The
decryption module takes advantage of this by performing twice as many
rounds and interchanging the A and B registers at the end of each
round. In this way the subtract, shift, XOR and RAM lookup resources
can be reused. The initialisation module actually performs the additions
required to initialise the S array, even though these results could
be trivially precomputed. This saves FPGA resources.&lt;/p&gt;
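
&lt;p&gt;The half-round trick can be illustrated in software. The sketch below compares a conventional 12 round RC5 decryption with a version that performs 24 identical half-rounds and swaps the A and B registers after each one. The S array used is an arbitrary, hypothetical set of 26 words rather than a real expanded key; the equivalence holds for any S array. Function names are assumptions made for this example.&lt;/p&gt;

```python
MASK = 2**32
ROUNDS = 12


def rotl(x, n):
    n %= 32
    return ((x * 2**n) % MASK) | (x >> (32 - n))


def rotr(x, n):
    n %= 32
    return (x >> n) | ((x * 2**(32 - n)) % MASK)


def encrypt(a, b, s):
    a = (a + s[0]) % MASK
    b = (b + s[1]) % MASK
    for i in range(1, ROUNDS + 1):
        a = (rotl(a ^ b, b) + s[2 * i]) % MASK
        b = (rotl(b ^ a, a) + s[2 * i + 1]) % MASK
    return a, b


def decrypt_full_rounds(a, b, s):
    """Conventional decryption: two different operations per round."""
    for i in range(ROUNDS, 0, -1):
        b = rotr((b - s[2 * i + 1]) % MASK, a) ^ a
        a = rotr((a - s[2 * i]) % MASK, b) ^ b
    return (a - s[0]) % MASK, (b - s[1]) % MASK


def decrypt_half_rounds(a, b, s):
    """One operation reused 24 times, swapping A and B after each use."""
    for k in range(2 * ROUNDS + 1, 1, -1):   # S indices 25 down to 2
        b = rotr((b - s[k]) % MASK, a) ^ a   # subtract, rotate, XOR
        a, b = b, a                          # interchange the registers
    return (a - s[0]) % MASK, (b - s[1]) % MASK


# Arbitrary illustrative S array (not a real key schedule).
EXAMPLE_S = [(0x9E3779B9 * (i + 1)) % MASK for i in range(2 * ROUNDS + 2)]
PLAIN = (0x01234567, 0x89ABCDEF)

cipher = encrypt(PLAIN[0], PLAIN[1], EXAMPLE_S)
print(decrypt_full_rounds(cipher[0], cipher[1], EXAMPLE_S) == PLAIN)
print(decrypt_half_rounds(cipher[0], cipher[1], EXAMPLE_S) == PLAIN)
```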

&lt;p&gt;In summary, the goal for the key mix operation is to complete as quickly
as possible. The goal for the decryption and initialisation
operations is to use as few resources as possible, so long as the
time taken for these two operations does not exceed that needed by
the key mix operation.&lt;/p&gt;

&lt;h4 id=&#34;implementation-1&#34;&gt;Implementation&lt;/h4&gt;

&lt;p&gt;One problematic area in the implementation was the 32 bit barrel shifter
required by RC5. The initial naïve implementation required 352 slices;
with the help of &lt;a href=&#34;#ref39&#34;&gt;[39]&lt;/a&gt; this was improved to
80 slices. One shifter is required for each of the key mix and
decryption stages. These account for a significant amount of the
resource usage. Some research and experimentation was conducted to
find smaller or faster shifter designs, without success. Shrinking
or speeding up the barrel shifters would provide large benefits to
the overall performance of the design.&lt;/p&gt;

&lt;p&gt;Running the module at 100MHz proved difficult. Routing delays introduced
after the place and route stage were the cause of the problem; congestion
was present at one of the RAM blocks. The delay at this point increased
when the number of search units was increased, suggesting that floorplanning
may be useful to reduce the delay or at least make it consistent.
A brief unsuccessful attempt at floorplanning was made.&lt;/p&gt;

&lt;p&gt;Two approaches were used to solve this problem. The first was to
duplicate the RAM. Originally, two RAM blocks were used to provide a
32 bit wide RAM, with one port used by the key schedule module and the
other by the decryptor and initialisation module. The number of RAM
blocks was doubled and writes were made to both pairs. Reads could be
made from either pair of RAM blocks, allowing unrelated logic to be
moved to different areas of the FPGA by the place and route tools. This
helped to reduce delays, and the extra RAM blocks were not otherwise
being used. The second was to add a wait state after RAM access, which
allowed the module to meet its timing requirements at the cost of
reduced performance.&lt;/p&gt;

&lt;p&gt;The total time required to check an RC5 key is 469 clock cycles. Each
iteration needs 6 clock cycles, and 78 iterations are required. One
cycle is needed for initialisation. At the target clock speed of 100MHz,
this gives a search rate of 213,220 keys/sec. 16 search units could
be fit into an XCV1000E device, giving an aggregate search rate of
3.4Mkeys/sec.&lt;/p&gt;
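
&lt;p&gt;The search rate figures above follow directly from the cycle counts. The short calculation below reproduces them; all numbers come from the text and only the variable names are new.&lt;/p&gt;

```python
CLOCK_HZ = 100_000_000      # target clock speed
CYCLES_PER_ITERATION = 6
ITERATIONS = 78             # key mixing iterations
INIT_CYCLES = 1
SEARCH_UNITS = 16           # units fitted into one XCV1000E

cycles_per_key = ITERATIONS * CYCLES_PER_ITERATION + INIT_CYCLES
keys_per_second = CLOCK_HZ / cycles_per_key
aggregate_keys_per_second = keys_per_second * SEARCH_UNITS

print(cycles_per_key)                              # 469
print(round(keys_per_second))                      # 213220
print(round(aggregate_keys_per_second / 1e6, 1))   # 3.4
```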

&lt;p&gt;The RC5 cipher module consumed 595 slices. The implementation in &lt;a href=&#34;#ref37&#34;&gt;[37]&lt;/a&gt;
required 510 XC4000 CLBs; each XC4000 CLB &lt;a href=&#34;#ref40&#34;&gt;[40]&lt;/a&gt; is
roughly equivalent to a Virtex slice.&lt;/p&gt;

&lt;h4 id=&#34;optimisations&#34;&gt;Optimisations&lt;/h4&gt;

&lt;p&gt;The possibility of increasing the clock speed of the RC5 module was
investigated, but found to be counterproductive. The intent was to
balance the time spent in each pipeline stage better, hopefully overcoming
the increase in resource usage and number of stages required. Registers
were inserted at locations responsible for timing limitations. These
registers did not increase resource usage significantly due to the
structure of the Virtex slice &lt;a href=&#34;#ref41&#34;&gt;[41]&lt;/a&gt;. The number
of cycles per round increased from 5 to 8 and the synthesis clock
speed from 102MHz to 142MHz, which was not an effective tradeoff.
Many previously trivial operations such as the comparison needed to
be split into stages instead of being simple combinatorial operations,
which greatly increased the complexity of the source code. The overall
resource usage also increased.&lt;/p&gt;
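
&lt;p&gt;A rough comparison shows why the tradeoff was not effective. The sketch below assumes that throughput is simply proportional to clock speed divided by cycles per round, and ignores any other overheads; the helper name is hypothetical.&lt;/p&gt;

```python
def relative_rate(clock_mhz, cycles_per_round):
    """Rounds completed per microsecond, assuming no other overheads."""
    return clock_mhz / cycles_per_round


baseline = relative_rate(102, 5)    # before the extra registers
pipelined = relative_rate(142, 8)   # after the extra registers

print(baseline, pipelined, round(pipelined / baseline, 2))
```

&lt;p&gt;Under this assumption the faster clock delivers only about 87% of the original throughput, before the additional resource usage is taken into account.&lt;/p&gt;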

&lt;p&gt;Replacing each bit in the three registers used to implement the L
array with a short LUT shift register would reduce the resources allocated
and potentially ease routing.&lt;/p&gt;

&lt;p&gt;Some work was conducted to see if it was possible to take shortcuts
in the key mixing operation; this was unsuccessful.&lt;/p&gt;

&lt;p&gt;Including the ciphertext and IV at synthesis time reduced resource
usage for the search unit to 539 slices. This would be a worthwhile
approach for an attack where the ciphertext and IV are known in advance.
It would generally not be suitable for an ASIC implementation.&lt;/p&gt;

&lt;p&gt;This module was implemented before the second key search machine.
Performance could be improved by running at a lower clock speed with
fewer pipeline stages.&lt;/p&gt;

&lt;h2 id=&#34;a-name-software-benchmarks-a-software-benchmarks&#34;&gt;&lt;a name=&#39;Software-benchmarks&#39;&gt;&lt;/a&gt;Software benchmarks&lt;/h2&gt;

&lt;h3 id=&#34;methodology&#34;&gt;Methodology&lt;/h3&gt;

&lt;p&gt;Benchmarks were conducted on a number of different CPUs to measure
how quickly they could perform key searches. Setting up and running
the benchmarks was very rapid, so many different CPUs were tested
to determine if any would provide significant price/performance advantages.&lt;/p&gt;

&lt;p&gt;Pre-written benchmarks were used. These benchmarks were faster and
more thoroughly tested than what could otherwise be produced in the
available time.&lt;/p&gt;

&lt;h3 id=&#34;a-name-software-benchmark-results-a-results&#34;&gt;&lt;a name=&#39;Software-benchmark-results&#39;&gt;&lt;/a&gt;Results&lt;/h3&gt;

&lt;p&gt;Each benchmark was run at least three times until consistent results
were achieved. Linux benchmarks were run as the root user, prefixing
the benchmark command with &lt;code&gt;nice -20&lt;/code&gt; to ensure that the benchmark
ran with the highest priority.&lt;/p&gt;

&lt;p&gt;Tables containing the gathered results are given in &lt;a href=&#34;https://ianhowson.com/fpga/cpu-benchmarks/&#34;&gt;CPU benchmark results&lt;/a&gt;.
distributed.net maintains an online database &lt;a href=&#34;#ref42&#34;&gt;[42]&lt;/a&gt;
of search rates for each CPU, allowing some of the benchmark results
to be verified.&lt;/p&gt;

&lt;h4 id=&#34;des&#34;&gt;DES&lt;/h4&gt;

&lt;p&gt;Two benchmark programs were used: the distributed.net client version
19991117 (which had to be compiled from source), and the SolNET DES
client &lt;a href=&#34;#ref43&#34;&gt;[43]&lt;/a&gt;. The distributed.net client gave
far better benchmark results, but could only be run on Linux machines
with appropriate compiler versions. Neither DES client had been optimised
for modern CPUs.&lt;/p&gt;

&lt;p&gt;The command &lt;code&gt;dnetc -benchmark des&lt;/code&gt; was used to run the distributed.net
benchmarks, and &lt;code&gt;desclient-x86-linux -m&lt;/code&gt; for the SolNET benchmarks.
The SolNET client&amp;rsquo;s benchmark results were unstable on faster CPUs,
requiring them to be run a large number of times.&lt;/p&gt;

&lt;p&gt;The DES benchmarks for newer CPUs could not be verified against the
distributed.net database &lt;a href=&#34;#ref42&#34;&gt;[42]&lt;/a&gt; because the CPUs
did not exist at the time that the online benchmarks were gathered.
The results for older CPUs were far higher than those in the online
database.&lt;/p&gt;

&lt;p&gt;Benchmarks for Celeron, P4HT and Athlon XP (Barton) CPUs had to be
inferred from others based on the same core. The Mkeys/sec/MHz ratios
obtained for RC5 remained fairly constant under this assumption, and
this is assumed to remain true for DES.&lt;/p&gt;

&lt;h4 id=&#34;rc5-72&#34;&gt;RC5-72&lt;/h4&gt;

&lt;p&gt;The distributed.net client version 03033120 was used to conduct RC5-72
benchmarks. Binaries from the distributed.net website were downloaded
for the relevant platform, unpacked, and the benchmark executed from
the command line with &lt;code&gt;dnetc -benchmark rc5-72&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The RC5 benchmark results were verified against those in the distributed.net
database. The Athlon speed ratings in the database are ambiguous; it
is not obvious whether an entry marked &amp;ldquo;1900&amp;rdquo; refers to a 1900+
or a 1900MHz Athlon. Nevertheless, the RC5 benchmark results gathered
were found to agree well with those in the database.&lt;/p&gt;

&lt;p&gt;No Celeron machines based on the Pentium 4 core were available to
run benchmarks on, so the online benchmark results were used for analysis.
These appeared internally consistent, so a Mkeys/sec/MHz rating was
determined and averaged across the available benchmark results to
reduce error. This rating was used to infer the missing benchmark
results.&lt;/p&gt;

&lt;h2 id=&#34;references&#34;&gt;References&lt;/h2&gt;

&lt;p&gt;&lt;a name=&#39;ref6&#39;&gt;&lt;/a&gt;[6] R. L. Rivest, &amp;ldquo;The RC5 encryption algorithm,&amp;rdquo; in &lt;em&gt;Practical Cryptography for Data Internetworks&lt;/em&gt;, W. Stallings, Ed. IEEE Computer Society Press, 1996.&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref7&#39;&gt;&lt;/a&gt;[7] J. Keller and B. Seitz, &amp;ldquo;A hardware-based attack on the A5/1 stream cipher,&amp;rdquo; in &lt;em&gt;APC 2001&lt;/em&gt;. VDE Verlag, 2001, pp. 155&amp;ndash;158. [Online]. Available: &lt;a href=&#34;http://www.informatik.fernuni-hagen.de/ti2/papers/apc2001-nal.pdf&#34;&gt;http://www.informatik.fernuni-hagen.de/ti2/papers/apc2001-nal.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref10&#39;&gt;&lt;/a&gt;[10] Electronic Frontier Foundation, &lt;em&gt;Cracking DES&lt;/em&gt;. O&amp;rsquo;Reilly, 1998.&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref12&#39;&gt;&lt;/a&gt;[12] P. Leong, M. Leong, O. Cheung, T. Tung, C. Kwok, M. Wong, and K. Lee, &amp;ldquo;Pilchard - a reconfigurable computing platform with memory slot interface,&amp;rdquo; in &lt;em&gt;Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM)&lt;/em&gt;, April 2001. [Online]. Available: &lt;a href=&#34;http://www.cse.cuhk.edu.hk/~phwl/papers/pilchard_fccm01.pdf&#34;&gt;http://www.cse.cuhk.edu.hk/~phwl/papers/pilchard_fccm01.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref15&#39;&gt;&lt;/a&gt;[15] M. J. Wiener, &amp;ldquo;Efficient DES key search,&amp;rdquo; in &lt;em&gt;Practical Cryptography for Data Internetworks&lt;/em&gt;, W. Stallings, Ed. IEEE Computer Society Press, 1996, pp. 31&amp;ndash;79.&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref16&#39;&gt;&lt;/a&gt;[16] I. Goldberg and D. Wagner, &amp;ldquo;Architectural considerations for cryptanalytic hardware,&amp;rdquo; CS252 Report, 1996. [Online]. Available: &lt;a href=&#34;http://www.cs.berkeley.edu/~iang/isaac/hardware/paper.ps&#34;&gt;http://www.cs.berkeley.edu/~iang/isaac/hardware/paper.ps&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref18&#39;&gt;&lt;/a&gt;[18] J.-P. Kaps and C. Paar, &amp;ldquo;Fast DES implementation for FPGAs and its application to a universal key-search machine,&amp;rdquo; in &lt;em&gt;Selected Areas in Cryptography&lt;/em&gt;, 1998, pp. 234&amp;ndash;247.&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref19&#39;&gt;&lt;/a&gt;[19] I. Hamer and P. Chow, &amp;ldquo;DES cracking on the Transmogrifier 2a,&amp;rdquo; in &lt;em&gt;Lecture Notes in Computer Science&lt;/em&gt;, ser. Cryptographic Hardware and Embedded Systems. Springer-Verlag, 1999, no. 1717, pp. 13&amp;ndash;24. [Online]. Available: &lt;a href=&#34;http://www.eecg.toronto.edu/~pc/research/publications/des.ches99.ps.gz&#34;&gt;http://www.eecg.toronto.edu/~pc/research/publications/des.ches99.ps.gz&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref21&#39;&gt;&lt;/a&gt;[21] K. H. Tsoi, K. H. Lee, and P. Leong, &amp;ldquo;A massively parallel RC4 key search engine,&amp;rdquo; in &lt;em&gt;Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM)&lt;/em&gt;, 2002, pp. 13&amp;ndash;21. [Online]. Available: &lt;a href=&#34;http://www.cse.cuhk.edu.hk/~phwl/papers/vrvw_fccm02.pdf&#34;&gt;http://www.cse.cuhk.edu.hk/~phwl/papers/vrvw_fccm02.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref23&#39;&gt;&lt;/a&gt;[23] M. Blaze. (1997, June) A better DES challenge. [Online]. Available: &lt;a href=&#34;http://www.privacy.nb.ca/cryptography/archives/cryptography/html/1997-06/0127.html&#34;&gt;http://www.privacy.nb.ca/cryptography/archives/cryptography/html/1997-06/0127.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref26&#39;&gt;&lt;/a&gt;[26] (2003, October) distributed.net: Node Zero. [Online]. Available: &lt;a href=&#34;http://www.distributed.net/&#34;&gt;http://www.distributed.net/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref29&#39;&gt;&lt;/a&gt;[29] The RSA Laboratories Secret-Key Challenge. RSA Security. [Online]. Available: &lt;a href=&#34;http://www.rsasecurity.com/rsalabs/challenges/secretkey/index.html&#34;&gt;http://www.rsasecurity.com/rsalabs/challenges/secretkey/index.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref32&#39;&gt;&lt;/a&gt;[32] A. Elbirt, W. Yip, B. Chetwynd, and C. Paar, &amp;ldquo;An FPGA-based performance evaluation of the AES block cipher candidate algorithm finalists,&amp;rdquo; in &lt;em&gt;IEEE Transactions on VLSI Systems&lt;/em&gt;, August 2001, vol. 9, no. 4.&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref33&#39;&gt;&lt;/a&gt;[33] J. Leonard and W. H. Mangione-Smith, &amp;ldquo;A case study of partially evaluated hardware circuits: Key-specific DES,&amp;rdquo; in &lt;em&gt;Field-Programmable Logic and Applications. 7th International Workshop&lt;/em&gt;, W. Luk, P. Y. K. Cheung, and M. Glesner, Eds., vol. 1304. London, U.K.: Springer-Verlag, 1997, pp. 151&amp;ndash;160. [Online]. Available: &lt;a href=&#34;http://citeseer.nj.nec.com/leonard97case.html&#34;&gt;http://citeseer.nj.nec.com/leonard97case.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref34&#39;&gt;&lt;/a&gt;[34] D. Wagner and S. M. Bellovin, &amp;ldquo;A programmable plaintext recognizer,&amp;rdquo; 1994. [Online]. Available: &lt;a href=&#34;ftp://ftp.research.att.com/dist/smb/recog.ps&#34;&gt;ftp://ftp.research.att.com/dist/smb/recog.ps&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref35&#39;&gt;&lt;/a&gt;[35] C. Eilbeck. My crypto page. [Online]. Available: &lt;a href=&#34;http://www.yordas.demon.co.uk/crypto/&#34;&gt;http://www.yordas.demon.co.uk/crypto/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref36&#39;&gt;&lt;/a&gt;[36] Xilinx, Inc. SRL16 16-bit shift register look-up-table (LUT). [Online]. Available: &lt;a href=&#34;http://toolbox.xilinx.com/docsan/xilinx5/data/docs/lib/lib0393_377.html&#34;&gt;http://toolbox.xilinx.com/docsan/xilinx5/data/docs/lib/lib0393_377.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref37&#39;&gt;&lt;/a&gt;[37] E. Soha. (1998, May) RC5 on FPGAs. No longer available from original source. [Online]. Available: &lt;a href=&#34;http://web.archive.org/web/19981205053422/http://www-inst.eecs.berkeley.edu/~barrel/rc5.html&#34;&gt;http://web.archive.org/web/19981205053422/http://www-inst.eecs.berkeley.edu/~barrel/rc5.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref40&#39;&gt;&lt;/a&gt;[40] Xilinx, Inc., &amp;ldquo;XC4000E and XC4000X Series Field Programmable Gate Arrays,&amp;rdquo; May 1999. [Online]. Available: &lt;a href=&#34;http://www.xilinx.com/bvdocs/publications/4000.pdf&#34;&gt;http://www.xilinx.com/bvdocs/publications/4000.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref41&#39;&gt;&lt;/a&gt;[41] Xilinx Inc., &amp;ldquo;Virtex-E 1.8V Field Programmable Gate Arrays,&amp;rdquo; July 2002. [Online]. Available: &lt;a href=&#34;http://direct.xilinx.com/bvdocs/publications/ds022.pdf&#34;&gt;http://direct.xilinx.com/bvdocs/publications/ds022.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref42&#39;&gt;&lt;/a&gt;[42] distributed.net. (2003, October) distributed.net: Client Speed Comparisons. [Online]. Available: &lt;a href=&#34;http://n0cgi.distributed.net/speed/&#34;&gt;http://n0cgi.distributed.net/speed/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref43&#39;&gt;&lt;/a&gt;[43] (1997, May) SolNET DES Challenge Attack: Download Page. [Online]. Available: &lt;a href=&#34;http://www.des.sollentuna.se/download.html&#34;&gt;http://www.des.sollentuna.se/download.html&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Background</title>
      <link>https://ianhowson.com/fpga/background/</link>
      <pubDate>Mon, 20 Oct 2003 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/fpga/background/</guid>
      <description>

&lt;h2 id=&#34;theory&#34;&gt;Theory&lt;/h2&gt;

&lt;p&gt;The theory in this section is only covered briefly. The reader is
encouraged to refer to &lt;a href=&#34;#ref1&#34;&gt;Bruce Schneier&amp;rsquo;s &lt;em&gt;Applied Cryptography&lt;/em&gt;&lt;/a&gt; for more details.&lt;/p&gt;

&lt;h3 id=&#34;symmetric-ciphers&#34;&gt;Symmetric ciphers&lt;/h3&gt;

&lt;p&gt;A symmetric cipher is characterised by the functions&lt;/p&gt;

&lt;p&gt;$$ciphertext=E(plaintext,key)$$&lt;/p&gt;

&lt;p&gt;and&lt;/p&gt;

&lt;p&gt;$$plaintext=E^{-1}(ciphertext,key)$$&lt;/p&gt;

&lt;p&gt;The intent of a cipher is that the function E be non-invertible without
the key. This ensures that the plaintext remains secret to people
without the key.&lt;/p&gt;

&lt;p&gt;A &lt;em&gt;block cipher&lt;/em&gt; is one where data is processed in discrete blocks.
The input plaintext or ciphertext is broken up into blocks of the
appropriate size. An example is DES, which processes data in 64 bit
blocks. A &lt;em&gt;stream cipher&lt;/em&gt; is one which works with much smaller
units of data &amp;ndash; often a single bit at a time. A5/1 is
a common stream cipher. Stream ciphers are used to generate a &lt;em&gt;key stream&lt;/em&gt;, which is then XORed with the plaintext to produce the ciphertext.
XORing the ciphertext with the key stream again will decrypt the data.&lt;/p&gt;
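
&lt;p&gt;The XOR relationship can be shown in a few lines. The key stream bytes below are hypothetical fixed values chosen for the example; a real stream cipher would generate them from the key.&lt;/p&gt;

```python
def xor_with_key_stream(data, key_stream):
    """XOR each data byte with the matching key stream byte."""
    if len(data) != len(key_stream):
        raise ValueError("key stream must be as long as the data")
    return bytes(d ^ k for d, k in zip(data, key_stream))


PLAINTEXT = b"attack"
KEY_STREAM = bytes([0x3A, 0xC5, 0x19, 0x7E, 0x02, 0xB4])  # hypothetical values

ciphertext = xor_with_key_stream(PLAINTEXT, KEY_STREAM)
recovered = xor_with_key_stream(ciphertext, KEY_STREAM)
print(recovered == PLAINTEXT)
```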

&lt;h3 id=&#34;attacks-and-security&#34;&gt;Attacks and security&lt;/h3&gt;

&lt;p&gt;In a &lt;em&gt;known plaintext attack&lt;/em&gt;, the attacker possesses some ciphertext
and the matching plaintext. The goal is to find the key. This is the
attack method usually used in research; possessing or being able to
infer part of the plaintext is a reasonably safe assumption. E-mail
headers and IP packets always begin in the same way, for example.&lt;/p&gt;

&lt;p&gt;In a &lt;em&gt;ciphertext only attack&lt;/em&gt;, the attacker possesses only the ciphertext.
These attacks are more difficult to perform. Usually, the attacker
relies on some properties of the plaintext to determine when they
are successful (such as character distributions or language statistics).&lt;/p&gt;

&lt;p&gt;There are two criteria for a symmetric cipher to be considered secure &lt;a href=&#34;#ref2&#34;&gt;[2]&lt;/a&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;There must be no shortcuts to attack a cipher; exhaustive key search
must be the most feasible attack&lt;/li&gt;
&lt;li&gt;The number of possible keys is large enough to make an exhaustive
key search attack infeasible&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&#34;key-search-attacks&#34;&gt;Key search attacks&lt;/h2&gt;

&lt;p&gt;In a key search attack, the attacker tries every possible key as input
to the cipher. The known piece of ciphertext is also used as an input.
In a known plaintext attack, the trial plaintext from the cipher output
is compared to the known plaintext. In a ciphertext only attack, heuristics
are used to determine if the output is valid plaintext.&lt;/p&gt;

&lt;h3 id=&#34;normal-cipher-usage&#34;&gt;Normal cipher usage&lt;/h3&gt;

&lt;p&gt;Most ciphers consist of a key setup phase and an operation phase.
During key setup, the internal state is initialised. During operation,
input ciphertext or plaintext is encrypted or decrypted. Key setup
only needs to be conducted once for each key that is used.&lt;/p&gt;

&lt;p&gt;When a cipher is used in practice, the key is usually kept constant
for a long period while the plaintext or ciphertext input is varied
frequently. Key setup is performed only once, and the cipher is designed
to handle the rapid change in input.&lt;/p&gt;

&lt;p&gt;Exhaustive key search reverses this by keeping the input constant
while changing the key frequently. The main implication from this
is that key setup must be performed very frequently. Many ciphers
exploit this to improve resistance to key search attacks by having
a very long key setup period. Key setup often consists of running
the encryption algorithm itself. This greatly increases the time
and resources needed to conduct a successful exhaustive key search.&lt;/p&gt;

&lt;p&gt;Commercial chips that perform encryption or decryption may not be
suitable for use in a key search machine if they are not designed
to have the key changed frequently. Conversely, it may be possible
to optimise a custom key search design by precomputing (partially
evaluating) parts of the algorithm, since the ciphertext is known
in advance. This technique has already been used to produce very fast
and efficient cipher implementations by including the key in the design
itself.&lt;/p&gt;

&lt;h3 id=&#34;block-ciphers&#34;&gt;Block ciphers&lt;/h3&gt;

&lt;p&gt;When attacking a block cipher, one output block is usually tested
for each key. If the output block matches the known plaintext, tests
with more blocks are conducted to verify that the key is correct.
The further checking step is important. There may be several keys
that give the same plaintext output if the key size is longer than
the block size and only a single block is checked.&lt;/p&gt;
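
&lt;p&gt;The two-stage check can be sketched with a deliberately weak toy cipher. The cipher below (8 bit blocks, 16 bit keys) is a hypothetical construction invented for this example and has no cryptographic value; it is used only because its key is longer than its block, so a single block cannot identify the key.&lt;/p&gt;

```python
def toy_encrypt(block, key):
    """Hypothetical toy cipher: 8-bit block, 16-bit key."""
    low, high = key % 256, key >> 8
    return ((block ^ low) + high) % 256


def toy_decrypt(block, key):
    low, high = key % 256, key >> 8
    return ((block - high) % 256) ^ low


def key_search(cipher_blocks, plain_blocks, key_bits=16):
    """Test one block per key, then verify survivors on the other blocks."""
    first_pass = [
        key for key in range(2**key_bits)
        if toy_decrypt(cipher_blocks[0], key) == plain_blocks[0]
    ]
    verified = [
        key for key in first_pass
        if all(
            toy_decrypt(c, key) == p
            for c, p in zip(cipher_blocks[1:], plain_blocks[1:])
        )
    ]
    return first_pass, verified


SECRET_KEY = 0xBEEF
KNOWN_PLAINTEXT = list(b"Hi!")
CIPHERTEXT = [toy_encrypt(p, SECRET_KEY) for p in KNOWN_PLAINTEXT]

first_pass, verified = key_search(CIPHERTEXT, KNOWN_PLAINTEXT)
print(len(first_pass), len(verified), SECRET_KEY in verified)
```

&lt;p&gt;Checking a single block leaves one candidate for every value of the upper key byte. The further blocks can only remove candidates, and the correct key always survives.&lt;/p&gt;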

&lt;h3 id=&#34;a-name-stream-ciphers-a-stream-ciphers&#34;&gt;&lt;a name=&#39;stream-ciphers&#39;&gt;&lt;/a&gt;Stream ciphers&lt;/h3&gt;

&lt;p&gt;Stream ciphers are often faster to conduct brute-force attacks against
because incorrect keys can be quickly eliminated. A simple approach
would be to generate a quantity of the key stream and XOR that with
the ciphertext to generate the plaintext. The stream cipher can then
be treated in exactly the same way as a block cipher. Efficiency can
be slightly improved by ignoring the XOR stage and simply searching
for the correct key stream. The amount of key stream to be generated
must balance out the number of false alarms with the amount of time
taken to check each key. Generating more of the key stream will cut
down on false alarms, but take more time.&lt;/p&gt;

&lt;p&gt;The main problem with this approach is that it is very inefficient.
The entire block of key stream must be generated before it is checked
for correctness. The first few bits to be generated may be enough
to determine that a key is incorrect. A more efficient algorithm would
then be:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Generate a single unit of the key stream (the smallest amount possible).&lt;/li&gt;
&lt;li&gt;Check whether this unit matches the first unit of the desired key
stream.&lt;/li&gt;
&lt;li&gt;If it matches, continue checking with the next unit of the same key.
If it doesn&amp;rsquo;t, start again with the next key.&lt;/li&gt;
&lt;li&gt;If a sufficient number of units match, return the key as a potentially
correct key.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With this algorithm, an average of two units of key stream need to
be generated for each trial key. This is far more efficient than the
simple algorithm, which may need to generate a large amount of key
stream to avoid returning an excessive number of potential keys.&lt;/p&gt;
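
&lt;p&gt;The early-abort algorithm can be sketched as follows, assuming a unit is a single bit. The key stream generator is a hypothetical stand-in built on SHA-256, chosen only because it produces well-mixed bits; it is not A5/1 or any cipher discussed here, and it computes its output in one step rather than bit by bit. The sketch counts how many units would need to be generated for each trial key.&lt;/p&gt;

```python
import hashlib
from itertools import islice


def key_stream_units(key):
    """Yield one-bit units of a hypothetical toy key stream for a 32-bit key."""
    digest = hashlib.sha256(key.to_bytes(4, "big")).digest()
    for byte in digest:
        for shift in range(8):
            yield (byte >> shift) % 2


def check_key(key, wanted_units):
    """Compare unit by unit, stopping at the first mismatch."""
    generated = 0
    stream = key_stream_units(key)
    for wanted in wanted_units:
        generated += 1
        if next(stream) != wanted:
            return False, generated
    return True, generated


def search(wanted_units, key_count):
    """Try keys 0..key_count-1; return candidates and mean units per key."""
    candidates = []
    total_units = 0
    for key in range(key_count):
        matched, generated = check_key(key, wanted_units)
        total_units += generated
        if matched:
            candidates.append(key)
    return candidates, total_units / key_count


SECRET_KEY = 1234
WANTED = list(islice(key_stream_units(SECRET_KEY), 32))

candidates, average_units = search(WANTED, 4096)
print(candidates, round(average_units, 2))
```

&lt;p&gt;For an incorrect key, each bit matches with probability one half, so the expected number of units generated is 1/2 + 2/4 + 3/8 + ... = 2, which agrees with the average measured by the sketch to within sampling error.&lt;/p&gt;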

&lt;p&gt;As with a block cipher, generating the first unit of key stream may
require a lengthy key setup phase to be carried out. Many key search
attacks avoid this by searching for the initial state of the cipher
after the key setup has been completed. This is not always feasible;
some stream ciphers such as RC4 have very large internal states.&lt;/p&gt;

&lt;h2 id=&#34;common-ciphers&#34;&gt;Common ciphers&lt;/h2&gt;

&lt;p&gt;Most ciphers make use of a number of common operations. These operations
typically retain entropy to ensure random-looking output. They may
also introduce nonlinearities in the output. Ciphers generally combine operations
from a number of algebraic groups to improve their strength.&lt;/p&gt;

&lt;table class=&#39;ui attached celled table&#39;&gt;
  &lt;thead&gt;
    &lt;tr&gt;&lt;th&gt;Name&lt;/th&gt;&lt;th&gt;Key length&lt;/th&gt;&lt;th&gt;Type&lt;/th&gt;&lt;th&gt;Operations&lt;/th&gt;&lt;th&gt;Ref&lt;/th&gt;&lt;/tr&gt;
  &lt;/thead&gt;

  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;DES&lt;td&gt;56&lt;td&gt;Block&lt;td&gt;Bit permute, rotate, XOR, table lookup (6&amp;times;4)&lt;td&gt;&lt;a href=&#39;#ref3&#39;&gt;[3]&lt;/a&gt;
    &lt;/tr&gt;
    &lt;tr&gt; 
       &lt;td&gt;RC4&lt;td&gt;64&lt;td&gt;Stream&lt;td&gt;Add, table read and write (8&amp;times;8), XOR&lt;td&gt;&lt;a href=&#39;#ref1&#39;&gt;[1]&lt;/a&gt;
    &lt;/tr&gt;
    &lt;tr&gt; 
      &lt;td&gt;Rijndael&lt;td&gt;128/192/256&lt;td&gt;Block&lt;td&gt;Table lookup (8&amp;times;8), rotate, multiply (GF(2&lt;sup&gt;8&lt;/sup&gt;)), XOR&lt;td&gt;&lt;a href=&#39;#ref4&#39;&gt;[4]&lt;/a&gt;
    &lt;/tr&gt;
    &lt;tr&gt; 
      &lt;td&gt;3DES&lt;td&gt;112/168&lt;td&gt;Block&lt;td&gt;Bit permute, rotate, XOR, table lookup (6&amp;times;4)&lt;td&gt;&lt;a href=&#39;#ref1&#39;&gt;[1]&lt;/a&gt;
    &lt;/tr&gt;
    &lt;tr&gt; 
      &lt;td&gt;IDEA&lt;td&gt;128&lt;td&gt;Block&lt;td&gt;XOR, add, multiply (16 bits), rotate&lt;td&gt;&lt;a href=&#39;#ref4&#39;&gt;[4]&lt;/a&gt;
    &lt;/tr&gt;
    &lt;tr&gt; 
      &lt;td&gt;Blowfish&lt;td&gt;32-448&lt;td&gt;Block&lt;td&gt;XOR, add, table read and write (8&amp;times;32)&lt;td&gt;&lt;a href=&#39;#ref5&#39;&gt;[5]&lt;/a&gt;
    &lt;/tr&gt;
    &lt;tr&gt; 
      &lt;td&gt;RC5&lt;td&gt;0-2040&lt;td&gt;Block&lt;td&gt;XOR, add, variable rotate&lt;td&gt;&lt;a href=&#39;#ref6&#39;&gt;[6]&lt;/a&gt;
    &lt;/tr&gt;
    &lt;tr&gt; 
      &lt;td&gt;A5/1&lt;td&gt;64&lt;td&gt;Stream&lt;td&gt;XOR, shift&lt;td&gt;&lt;a href=&#39;#ref7&#39;&gt;[7]&lt;/a&gt;
    &lt;/tr&gt;
    &lt;tr&gt; 
      &lt;td&gt;Skipjack&lt;td&gt;80&lt;td&gt;Block&lt;td&gt;XOR, shift, add, table lookup (8&amp;times;8)&lt;td&gt;&lt;a href=&#39;#ref8&#39;&gt;[8]&lt;/a&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;div class=&#39;ui centered bottom attached header&#39;&gt;Cipher operations&lt;/div&gt;

&lt;p&gt;Table sizes are specified by (x&amp;times;y), where x is the
number of bits in the index and y is the number of bits in the output.
For example, 6&amp;times;4 can be modelled by a RAM with 6 address bits
and 4 data bits. Addition is considered to include subtraction as
a trivial extension.&lt;/p&gt;
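
&lt;p&gt;The (x&amp;times;y) notation translates directly into a storage requirement of 2&lt;sup&gt;x&lt;/sup&gt; entries of y bits each. A minimal Python sketch (the function name and labels are made up for this example) computes it for the table shapes listed above:&lt;/p&gt;

```python
def table_bits(index_bits, output_bits):
    """Total storage, in bits, of an (index_bits x output_bits) lookup table."""
    return (2 ** index_bits) * output_bits


sizes = {
    "6x4 (DES)": table_bits(6, 4),
    "8x8 (RC4, Rijndael, Skipjack)": table_bits(8, 8),
    "8x32 (Blowfish)": table_bits(8, 32),
}
for shape, bits in sizes.items():
    print(shape, bits, "bits")
```

&lt;p&gt;A single 6&amp;times;4 table therefore needs 256 bits, an 8&amp;times;8 table 2048 bits and an 8&amp;times;32 table 8192 bits. These figures are per table; a cipher may use several tables of the same shape.&lt;/p&gt;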

&lt;p&gt;By testing the relative speed of each operation on each implementation
technology, it should be possible to gain insights into which ciphers
will run quickly on which technology.&lt;/p&gt;

&lt;h3 id=&#34;des&#34;&gt;DES&lt;/h3&gt;

&lt;p&gt;The Data Encryption Standard (DES) has been extensively used and studied
for decades. Several linear and differential attacks against it have
been discovered, but the most effective attack remains exhaustive
key search. It works with 64 bit blocks and was originally designed
for fast hardware implementations. It has been the subject of several
contests.&lt;/p&gt;

&lt;p&gt;DES has enjoyed widespread military, government and commercial use
in the past, notably within the banking and finance sectors. Its key
length is only 56 bits, which is considered far too weak nowadays.
Many attacks on the key length of DES have been performed, some of
which are described below.&lt;/p&gt;

&lt;p&gt;DES remains in use through a variant called Triple DES (or 3DES).
In 3DES, the DES cipher is applied three times with two or three different
keys. This is an effective method of increasing the strength of the
DES cipher, but care must be taken during implementation to ensure
that all of the keys are different. Clayton and Bond successfully
attacked a secure processor (formerly used in ATMs) that utilises
3DES &lt;a href=&#34;#ref24&#34;&gt;[24]&lt;/a&gt;. They exploit protocol flaws to force
the processor to use duplicate keys. They can then perform a key search
attack on the reduced key space to determine a &amp;ldquo;master key&amp;rdquo;.&lt;/p&gt;

&lt;h3 id=&#34;rc5&#34;&gt;RC5&lt;/h3&gt;

&lt;p&gt;RC5 is a simple block cipher designed by Ronald Rivest &lt;a href=&#34;#ref6&#34;&gt;[6]&lt;/a&gt;.
Despite its simple structure, very few attacks better than exhaustive
key search have been discovered. It is fully parameterised, so the
block length, number of rounds and key length can be selected to suit
the application. It is best known as the cipher being attacked by
the RSA Secret Key Challenges &lt;a href=&#34;#ref29&#34;&gt;[29]&lt;/a&gt;. The parameters
for the challenges are selected to be efficient on modern CPUs. RC5
is not widely used for commercial applications and has been patented
by RSA Data Security.&lt;/p&gt;

&lt;h2 id=&#34;implementation-technologies&#34;&gt;Implementation technologies&lt;/h2&gt;

&lt;h3 id=&#34;software-on-general-purpose-cpus&#34;&gt;Software on general-purpose CPUs&lt;/h3&gt;

&lt;p&gt;Software is usually the first platform on which a cipher is implemented.
General-purpose CPUs are cheap, plentiful, and fast for most
tasks. Many ciphers are designed for an efficient software implementation.&lt;/p&gt;

&lt;h4 id=&#34;parallelism&#34;&gt;Parallelism&lt;/h4&gt;

&lt;p&gt;CPUs are designed to perform serial operations very quickly. Regardless
of the amount of available chip area, the need to operate serially
remains. This has led to modern CPUs expending a large amount of
area on prediction and caching circuitry. A doubling in chip area
for a CPU will not result in a doubling of performance.&lt;/p&gt;

&lt;p&gt;Exhaustive key search is highly parallelisable; the task can be split
perfectly between any number of processing units. This means that
using multiple specialised processing units instead of a single CPU
may give higher performance.&lt;/p&gt;

&lt;p&gt;Recent CPUs have demonstrated a move towards increasing parallelism.
Multiple execution units, deep pipelines, Symmetric Multiprocessing
(SMP) and techniques such as Intel&amp;rsquo;s HyperThreading all serve to increase
the level of parallelism.&lt;/p&gt;

&lt;h4 id=&#34;bitslicing&#34;&gt;Bitslicing&lt;/h4&gt;

&lt;p&gt;Eli Biham pioneered a technique which later became known as bitslicing
&lt;a href=&#34;#ref9&#34;&gt;[9]&lt;/a&gt;. The paper deals primarily with its application
to the DES cipher, but it is applicable to many other algorithms.
In it, each register of a CPU is viewed as a large number of single-bit
registers. This allows a large number of single-bit operations to
be performed in parallel. For an algorithm such as DES which is composed
largely of single-bit operations, this provides a very large performance
gain.&lt;/p&gt;
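
&lt;p&gt;The idea can be sketched in Python, using arbitrary-precision integers as stand-ins for 64-bit registers. The round function below is a made-up 3-bit mixing step, not part of DES, and all of the names are hypothetical; the point is only that one XOR or OR on a &amp;ldquo;slice&amp;rdquo; processes the same bit position of 64 independent inputs at once.&lt;/p&gt;

```python
LANES = 64  # one lane per independent cipher instance (e.g. per trial key)


def to_slices(values, width):
    """Transpose: slice j holds bit j of every value, one value per lane."""
    return [
        sum(((v >> j) % 2) * 2 ** lane for lane, v in enumerate(values))
        for j in range(width)
    ]


def from_slices(slices, lanes):
    """Inverse transpose: rebuild one integer per lane."""
    return [
        sum(((s >> lane) % 2) * 2 ** j for j, s in enumerate(slices))
        for lane in range(lanes)
    ]


def toy_round_bitsliced(slices):
    """A made-up 3-bit mixing step built only from single-bit gates."""
    a, b, c = slices
    return [a ^ b, b ^ c, (a | b) ^ c]


def toy_round_scalar(value):
    """The same step applied to one 3-bit value at a time."""
    a, b, c = value % 2, (value >> 1) % 2, (value >> 2) % 2
    return (a ^ b) + 2 * (b ^ c) + 4 * ((a | b) ^ c)


inputs = [i % 8 for i in range(LANES)]
parallel = from_slices(toy_round_bitsliced(to_slices(inputs, 3)), LANES)
print(parallel == [toy_round_scalar(v) for v in inputs])
```

&lt;p&gt;The bitsliced version uses five logic operations for all 64 inputs, where the scalar version repeats its work for every input. The transposition into and out of slices is overhead, which is why the technique pays off mainly when many rounds are computed between transposes.&lt;/p&gt;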

&lt;h3 id=&#34;asics&#34;&gt;ASICs&lt;/h3&gt;

&lt;p&gt;An ASIC (Application Specific Integrated Circuit) is a chip that has
been designed for a particular purpose. They are usually very fast
for whatever application they have been designed for, but cannot be
modified after fabrication. Initial fabrication costs are very high,
but can be amortized over many chips. The unit price per chip is usually
quite low. There is a far greater development effort required than
for software, and more than for FPGAs. Gate array designs reduce the
high cost of development significantly, but reduce achievable density.
They work by placing gates over the entire silicon area of a device
during fabrication and linking the gates with metal layers later.&lt;/p&gt;

&lt;p&gt;FPGA designs can be converted to equivalent gate array ASIC designs
at a relatively low cost. This technique was used for the machine
described in &lt;a href=&#34;#ref10&#34;&gt;[10]&lt;/a&gt;. Designs implemented in this
way tend to be faster and cheaper than those on FPGAs, but not as
fast as a dedicated ASIC design. It is an attractive option where
development time and cost are important and the number of FPGAs needed
makes the implementation cost prohibitive. Many of the issues affecting
FPGA designs (such as timing) also apply to ASICs. Routing tends to
be much less problematic on an ASIC compared with an FPGA.&lt;/p&gt;

&lt;h3 id=&#34;fpgas&#34;&gt;FPGAs&lt;/h3&gt;

&lt;p&gt;FPGAs (Field Programmable Gate Arrays) combine software and hardware
approaches. They are chips that can have their internal layout reconfigured
at any time. This is usually achieved by loading a &lt;em&gt;bitstream&lt;/em&gt;
from a ROM or controller. An FPGA typically contains a large number
of logic blocks. &amp;ldquo;Programming&amp;rdquo; an FPGA determines how the logic
blocks are wired together. Modern FPGAs may contain tens of thousands
of logic blocks, each of which contains latches, combinational functions
and other logic. High-end FPGAs such as the Virtex II Pro &lt;a href=&#34;#ref11&#34;&gt;[11]&lt;/a&gt;
even integrate CPU cores within the normal FPGA fabric. There is a
move to providing dedicated hardware within FPGAs, such as RAM and
communication controllers.&lt;/p&gt;

&lt;p&gt;FPGA performance approaches that of an ASIC, but their general structure
makes them slower and less efficient. Much less can be done within
the same amount of silicon area. They also generate more heat and
use more power than an equivalent ASIC. FPGAs do not have the high
up-front cost of an ASIC, but cost more per unit. They are ideal for
prototyping and development, since they can be reprogrammed quickly
at no cost.&lt;/p&gt;

&lt;p&gt;Developing a design for an FPGA is generally more time consuming than
writing normal software due to the low-level nature of the design.
There are also far fewer competent FPGA designers than software programmers,
increasing development cost and time.&lt;/p&gt;

&lt;h4 id=&#34;pilchard&#34;&gt;Pilchard&lt;/h4&gt;

&lt;div class=&#39;ui center aligned segment&#39;&gt;
  &lt;img class=&#34;ui centered image&#34; src=&#34;https://ianhowson.com/fpga/pilchard.jpeg&#34;&gt;
  &lt;p&gt;The Pilchard development board&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;Pilchard is a low-cost FPGA development board &lt;a href=&#34;#ref12&#34;&gt;[12]&lt;/a&gt;.
It contains a Xilinx Virtex E device; the device used for this thesis
was an XCV1000E-6HQ240. The Pilchard board and FPGA device used are
shown above.&lt;/p&gt;

&lt;p&gt;Pilchard interfaces to the RAM bus of certain motherboards. From the
perspective of the FPGA designer, it provides a simple synchronous
100MHz or 133MHz bus with no interrupts or DMA. It appears as a memory
range to the programmer, so registers within the FPGA can map directly
to variables in the driver software. This combination allows easy
system development, high performance and low FPGA resource requirements.&lt;/p&gt;

&lt;p&gt;A number of external I/O pins are available that can be interfaced
to other hardware. Space is also available on the circuit board for
ROM chips that can load a bitstream into the FPGA device on power
up.&lt;/p&gt;

&lt;h3 id=&#34;combining-technologies&#34;&gt;Combining technologies&lt;/h3&gt;

&lt;p&gt;Software, FPGA and ASIC components can be profitably combined. All
known FPGA and ASIC-based key search machines are controlled by a
general purpose computer.&lt;/p&gt;

&lt;p&gt;One useful idea is to use each technology for the task it excels at.
For example, a machine using all three technologies might use a computer
as a primary controller, FPGAs as lower-level controllers, and ASICs
for each search unit. The computer is useful for human interface and
easy reconfigurability. The FPGAs are useful for their high speed,
I/O capabilities and reconfigurability. ASICs have the advantages
of very high speed and low price in very large quantities. This scheme
is particularly desirable for ciphers where suitable ASICs are available
commercially, reducing fabrication costs.&lt;/p&gt;

&lt;p&gt;Another possibility is to perform hardware-fast operations on ASICs
and FPGAs, and software-fast operations on computers. This is generally
infeasible due to the &amp;ldquo;I/O gap&amp;rdquo; &amp;ndash; latencies between
the ASIC/FPGA and the CPU far outweigh any speed benefit. Pilchard
interfaces with the memory bus of a computer and can thus provide
a very high bandwidth and low latency connection to the CPU. The integrated
CPUs in Virtex II Pro FPGAs also provide similar benefits.&lt;/p&gt;

&lt;p&gt;The integrated CPUs on Virtex II Pro FPGAs provide another option
for ciphers favouring software implementations. Each Virtex II Pro
FPGA contains up to four PowerPC 405 cores &lt;a href=&#34;#ref11&#34;&gt;[11]&lt;/a&gt;.
These CPUs could be used to perform the bulk of the cipher operations
while the surrounding FPGA fabric handles control, communication and
testing of results. A very large number of CPUs could be integrated
into a small space using this technique.&lt;/p&gt;

&lt;h2 id=&#34;previous-work&#34;&gt;Previous work&lt;/h2&gt;

&lt;p&gt;Most previous hardware key search machines have been designed to locate
DES keys. This is because DES is very fast in hardware, widely deployed
and has a dangerously short key length. There are also political issues
involved with DES and its selection as a standard.&lt;/p&gt;

&lt;h3 id=&#34;hardware-key-search-machines&#34;&gt;Hardware key search machines&lt;/h3&gt;

&lt;p&gt;Many hardware key search machines have been produced in the past.
Most of these are not practical machines. They are used to gather
performance estimates with a certain technology or technique. More
key search machines are known to exist; only those with notable features
have been presented in this section.&lt;/p&gt;

&lt;table class=&#39;ui small celled table&#39;&gt;
  &lt;thead&gt;
    &lt;tr&gt;&lt;th&gt;Name&lt;th&gt;Cipher&lt;th&gt;Year&lt;th&gt;Level&lt;th&gt;Technology&lt;th&gt;Keys/sec/chip&lt;th&gt;Ref.&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;td&gt;Diffie/Hellman&lt;td&gt;DES&lt;td&gt;1977&lt;td&gt;Theoretical&lt;td&gt;ASIC&lt;td&gt;1M&lt;td&gt;&lt;a href=&#39;#ref13&#39;&gt;[13]&lt;/a&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;McLaughlin&lt;td&gt;DES&lt;td&gt;1992&lt;td&gt;Theoretical&lt;td&gt;ASIC&lt;td&gt;2k&lt;td&gt;&lt;a href=&#39;#ref14&#39;&gt;[14]&lt;/a&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Wiener&lt;td&gt;DES&lt;td&gt;1993&lt;td&gt;Designed&lt;td&gt;ASIC&lt;td&gt;50M&lt;td&gt;&lt;a href=&#39;#ref15&#39;&gt;[15]&lt;/a&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Goldberg/Wagner&lt;td&gt;DES&lt;td&gt;1996&lt;td&gt;Built&lt;td&gt;CPLD&lt;td&gt;0.5M&lt;td&gt;&lt;a href=&#39;#ref16&#39;&gt;[16]&lt;/a&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Various&lt;td&gt;DES&lt;td&gt;1996&lt;td&gt;Estimated&lt;td&gt;FPGA&lt;td&gt;30M&lt;td&gt;&lt;a href=&#39;#ref2&#39;&gt;[2]&lt;/a&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Various&lt;td&gt;DES&lt;td&gt;1996&lt;td&gt;Estimated&lt;td&gt;ASIC&lt;td&gt;200M&lt;td&gt;&lt;a href=&#39;#ref2&#39;&gt;[2]&lt;/a&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Wiener&lt;td&gt;DES&lt;td&gt;1997&lt;td&gt;Estimated&lt;td&gt;ASIC&lt;td&gt;300M&lt;td&gt;&lt;a href=&#39;#ref17&#39;&gt;[17]&lt;/a&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Kaps/Paar&lt;td&gt;DES&lt;td&gt;1998&lt;td&gt;Built&lt;td&gt;FPGA&lt;td&gt;6.29M&lt;td&gt;&lt;a href=&#39;#ref18&#39;&gt;[18]&lt;/a&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;EFF&lt;td&gt;DES&lt;td&gt;1998&lt;td&gt;Built&lt;td&gt;ASIC&lt;td&gt;60M&lt;td&gt;&lt;a href=&#39;#ref10&#39;&gt;[10]&lt;/a&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Hamer/Chow&lt;td&gt;DES&lt;td&gt;1999&lt;td&gt;Built&lt;td&gt;FPGA&lt;td&gt;25M&lt;td&gt;&lt;a href=&#39;#ref19&#39;&gt;[19]&lt;/a&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Goldberg/Wagner&lt;td&gt;RC4&lt;td&gt;1996&lt;td&gt;Built&lt;td&gt;CPLD&lt;td&gt;8.4k&lt;td&gt;&lt;a href=&#39;#ref16&#39;&gt;[16]&lt;/a&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Kundarewich/Wilton/Hu&lt;td&gt;RC4&lt;td&gt;1999&lt;td&gt;Built&lt;td&gt;CPLD&lt;td&gt;40k&lt;td&gt;&lt;a href=&#39;#ref20&#39;&gt;[20]&lt;/a&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Tsoi/Lee/Leong&lt;td&gt;RC4&lt;td&gt;2002&lt;td&gt;Built&lt;td&gt;FPGA&lt;td&gt;6.06M&lt;td&gt;&lt;a href=&#39;#ref21&#39;&gt;[21]&lt;/a&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Goldberg/Wagner&lt;td&gt;A5&lt;td&gt;1996&lt;td&gt;Built&lt;td&gt;CPLD&lt;td&gt;4M&lt;td&gt;&lt;a href=&#39;#ref16&#39;&gt;[16]&lt;/a&gt;&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;div class=&#39;ui bottom attached header&#39;&gt;Previous hardware key search machines&lt;/div&gt;

&lt;p&gt;Most of these performance figures have caveats; they may be estimates,
approximations, or based on implementations which were not completely
carried out.&lt;/p&gt;

&lt;h4 id=&#34;performance-estimates&#34;&gt;Performance estimates&lt;/h4&gt;

&lt;p&gt;A number of papers provide theoretical estimates of the cost of breaking
ciphers with a hardware key search engine. &lt;em&gt;Minimal Key Lengths
for Symmetric Ciphers to Provide Adequate Commercial Security&lt;/em&gt; &lt;a href=&#34;#ref2&#34;&gt;[2]&lt;/a&gt;
is a prime example of this. It lacks practical grounding, but still
contains useful estimates and background. Written in 1996, it predicts that
a $200 FPGA (AT&amp;amp;T ORCA) can test 30 million DES keys per second,
and that a $10 ASIC can test 200 million DES keys per second. For
$300,000, an FPGA-based machine should be able to crack a DES key
every 19 days, and an ASIC-based machine every three hours.&lt;/p&gt;

&lt;p&gt;Wiener&amp;rsquo;s &lt;em&gt;Efficient DES Key Search&lt;/em&gt; &lt;a href=&#34;#ref15&#34;&gt;[15]&lt;/a&gt;
describes a theoretical hardware DES key search machine based around
a custom ASIC. The machine was designed, but not built. Just about
every detail of the machine was described, including the schematics,
interfaces and physical requirements. Each ASIC in the design is estimated
to be able to check 50 million keys per second. Pipelined search units
and an LFSR key generator are used. In 1993, a machine costing $100,000
is estimated to be able to crack a DES key every 35 hours, on average.
These estimates were updated in 1997 to take newer technology and
further analysis into account &lt;a href=&#34;#ref17&#34;&gt;[17]&lt;/a&gt;. In this paper,
a $100,000 machine should be capable of cracking a DES key in six
hours. The speed estimates given in this paper are the basis of those
presented in &lt;a href=&#34;#ref2&#34;&gt;[2]&lt;/a&gt;.&lt;/p&gt;
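
&lt;p&gt;One commonly cited reason for generating trial keys with an LFSR rather than a binary counter is that an LFSR needs only a shift and a few XOR gates per step, with no carry chain. The Python sketch below is an illustration under that assumption, not Wiener&amp;rsquo;s actual design: it uses a hypothetical 4-bit register with feedback polynomial x&lt;sup&gt;4&lt;/sup&gt; + x&lt;sup&gt;3&lt;/sup&gt; + 1 to show how a maximal-length LFSR steps through a key space.&lt;/p&gt;

```python
def lfsr_states(seed, width=4, taps=(3, 2)):
    """Yield the states of a small Fibonacci LFSR until it returns to `seed`.

    `taps` are the bit positions XORed together to form the feedback bit;
    the default (3, 2) corresponds to the polynomial x^4 + x^3 + 1.
    """
    state = seed
    while True:
        yield state
        feedback = 0
        for tap in taps:
            feedback ^= (state >> tap) % 2
        # Shift left by one, drop the top bit, insert the feedback bit.
        state = (state * 2) % (2 ** width) + feedback
        if state == seed:
            return


keys = list(lfsr_states(seed=1))
print(len(keys), sorted(keys) == list(range(1, 16)))
```

&lt;p&gt;The register visits all 15 non-zero 4-bit values exactly once before repeating. The all-zero state is never reached, so a key generator built this way has to test the zero key separately.&lt;/p&gt;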

&lt;p&gt;McLaughlin presented a high-level design for a DES key search machine
&lt;a href=&#34;#ref14&#34;&gt;[14]&lt;/a&gt;. The paper ignores
low-level issues and focuses on the high-level functionality of the
machine. Its main features are the use of a fuzzy comparer and specialist
key generators.&lt;/p&gt;

&lt;p&gt;Diffie and Hellman produced a paper in 1977 that counters objections
to the possibility of a key search machine &lt;a href=&#34;#ref13&#34;&gt;[13]&lt;/a&gt;. Objections
to the reliability, size, speed, power and cost of a key search machine
were countered, and a system architecture based around a million search
chips was presented. Despite the (comparatively) primitive technology
available at the time, the authors concluded that a key search machine was
feasible.&lt;/p&gt;

&lt;h4 id=&#34;a-id-the-eff-des-a-the-eff-des-cracker&#34;&gt;&lt;a id=&#39;The-EFF-DES&#39;&gt;&lt;/a&gt;The EFF DES Cracker&lt;/h4&gt;

&lt;p&gt;In 1998, the Electronic Frontier Foundation (EFF) published a book
&lt;a href=&#34;#ref10&#34;&gt;[10]&lt;/a&gt; describing a large scale DES cracker that
they built. Paul Kocher later elaborated on the book in &lt;a href=&#34;#ref22&#34;&gt;[22]&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The machine was based around a very large number of search units.
Each search unit takes 16 clock cycles to check a DES key. 24 search
units were built into a custom ASIC design that ran at 40MHz. 64 ASICs
were placed on each circuit board and 27 boards constructed. Taking
into account faulty search units, the entire machine was capable of
a search rate of 92.6 billion keys per second, or an average search
time of 4.5 days. The machine was built with a budget of $250,000.&lt;/p&gt;
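
&lt;p&gt;These figures can be cross-checked with some simple arithmetic. The Python sketch below uses only the numbers quoted above; the variable names are chosen for this example, and the average search is assumed to cover half of the 2&lt;sup&gt;56&lt;/sup&gt; possible DES keys.&lt;/p&gt;

```python
CLOCK_HZ = 40_000_000
CYCLES_PER_KEY = 16
UNITS_PER_CHIP = 24
CHIPS_PER_BOARD = 64
BOARDS = 27

unit_rate = CLOCK_HZ // CYCLES_PER_KEY      # keys/sec per search unit
chip_rate = unit_rate * UNITS_PER_CHIP      # keys/sec per ASIC
nominal_rate = chip_rate * CHIPS_PER_BOARD * BOARDS

REPORTED_RATE = 92.6e9                      # whole-machine figure quoted above
average_keys = 2 ** 55                      # half of the 2^56 DES keys
average_days = average_keys / REPORTED_RATE / 86_400

print(chip_rate, nominal_rate, round(average_days, 1))
```

&lt;p&gt;Each ASIC works out to 60 million keys per second, matching the table entry for the EFF machine. The nominal rate with every search unit working would be about 103.7 billion keys per second, so the quoted 92.6 billion is consistent with a fraction of the units being faulty, and it gives an average search time of roughly 4.5 days.&lt;/p&gt;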

&lt;p&gt;A flexible plaintext recognition scheme was used that allows selective
matching against certain characters, as well as specialised modes
for the Blaze Challenge &lt;a href=&#34;#ref23&#34;&gt;[23]&lt;/a&gt;. This allowed
the machine to conduct ciphertext-only attacks.&lt;/p&gt;

&lt;p&gt;Kocher further elaborated on the technical problems inherent in
building such a large machine, chiefly power and heat dissipation.
The ASICs used had to be produced successfully with
a single attempt, leading to a number of design compromises. Had this
requirement been lifted, both the performance and correctness of the
design could have been improved significantly.&lt;/p&gt;

&lt;p&gt;Many of the political issues involved with DES were also covered.
These focused primarily on the disparity between what government officials
report and what the DES cracking machine is capable of.&lt;/p&gt;

&lt;h4 id=&#34;small-scale-key-search-machines&#34;&gt;Small-scale key search machines&lt;/h4&gt;

&lt;p&gt;Many small key search engines have been produced in an attempt to
gauge how processing power has changed with time. These are all based
on reconfigurable hardware (FPGAs or CPLDs).&lt;/p&gt;

&lt;p&gt;Hamer and Chow implemented a DES key search machine on the Transmogrifier
2a, a system containing 32 linked Altera FPGAs &lt;a href=&#34;#ref19&#34;&gt;[19]&lt;/a&gt;.
Their design features a long DES pipeline and an LFSR key generator
design that minimises the need for key schedule logic. Each FPGA ran
at 25MHz, giving an aggregate search rate of 800Mkeys/sec.&lt;/p&gt;

&lt;p&gt;Tsoi, Lee and Leong implemented an RC4 key search machine &lt;a href=&#34;#ref21&#34;&gt;[21]&lt;/a&gt;
using a Pilchard board. Their design used 96 search units running
at 50MHz to achieve a total rate of 6.06Mkeys/sec. Not all of the
FPGA resources were used; the number of search units was limited by
the number of RAM blocks available. The FPGA implementation ran approximately
58 times faster than a software implementation running on a Pentium
4 1500MHz. Kundarewich, Wilton and Hu also implemented an RC4 key
search machine using Altera CPLDs, and obtained a search rate of 40kkeys/sec
&lt;a href=&#34;#ref20&#34;&gt;[20]&lt;/a&gt;. Brief cost and performance comparisons were
carried out.&lt;/p&gt;

&lt;p&gt;Deeper investigation into architectural decisions was made by Kaps
and Paar &lt;a href=&#34;#ref18&#34;&gt;[18]&lt;/a&gt;. They explored the idea of an
algorithm-independent key search machine on an FPGA, focusing on DES. Several
architectural options for DES were investigated and implemented on
Xilinx FPGAs. Their key search design would be capable of 6.29Mkeys/sec.&lt;/p&gt;

&lt;p&gt;Goldberg and Wagner performed an analysis of RC4, A5, DES and CDMF
implementations on a CPLD board &lt;a href=&#34;#ref16&#34;&gt;[16]&lt;/a&gt;. The
performance of a variety of CPLDs was compared with their cost, noting
that low end CPLDs generally give the best price/performance ratio.
Comparisons were also made against equivalent software implementations.
The RC4 cipher was found to be faster in software than hardware, the
opposite result to that of &lt;a href=&#34;#ref21&#34;&gt;[21]&lt;/a&gt; and &lt;a href=&#34;#ref20&#34;&gt;[20]&lt;/a&gt;.&lt;/p&gt;

&lt;h4 id=&#34;specialised-key-search-machines&#34;&gt;Specialised key search machines&lt;/h4&gt;

&lt;p&gt;Clayton and Bond exploited a variety of protocol flaws to successfully
attack a security module that was previously used in ATMs &lt;a href=&#34;#ref24&#34;&gt;[24]&lt;/a&gt;.
They were able to successfully recover 3DES keys from the device with
the assistance of a cheap FPGA board. By implementing a more practical
attack they were able to learn more about the difficulties and benefits
of working within a real environment.&lt;/p&gt;

&lt;p&gt;Pornin and Stern attacked A5/1 using a combination of software and
hardware approaches &lt;a href=&#34;#ref25&#34;&gt;[25]&lt;/a&gt;. Software was used to reduce
the search space of initial states, while hardware was used to conduct
an exhaustive search over this subspace. A board containing four Xilinx
4010E FPGAs was used in conjunction with a 500MHz Alpha workstation.
Each FPGA contains 12 search units, each checking one initial state
every 65 cycles. The FPGAs were clocked at 50MHz, giving a total search
rate of 37 million initial states per second. Using two boards with
one workstation allowed an initial state to be determined in 2.5 days
on average, far faster than exhaustive key search alone. Keller and
Seitz took a more analytical approach by using backtracking to reduce
the search space &lt;a href=&#34;#ref7&#34;&gt;[7]&lt;/a&gt;. Their implementation
was performed on a Xilinx XC4062 FPGA.&lt;/p&gt;

&lt;h3 id=&#34;software-key-search&#34;&gt;Software key search&lt;/h3&gt;

&lt;h4 id=&#34;distributed-computing&#34;&gt;Distributed computing&lt;/h4&gt;

&lt;p&gt;Several organisations have implemented software to conduct distributed
key search attacks against ciphers using network-connected hosts.
distributed.net &lt;a href=&#34;#ref26&#34;&gt;[26]&lt;/a&gt; is the largest and most well-known
of these. Others include DESCHALL &lt;a href=&#34;#ref27&#34;&gt;[27]&lt;/a&gt; and SolNET &lt;a href=&#34;#ref28&#34;&gt;[28]&lt;/a&gt;.
The basic idea is the same: run a piece of software on many hosts
and coordinate their efforts with a central server. The software is
configured to run during idle time on the hosts. Buffering schemes
allow hosts to continue working on their part of the task when not
connected to a network. Regardless of the precise task being performed,
work is usually divided into &amp;ldquo;blocks&amp;rdquo; which take a (relatively)
short period of time to complete. A client connects to the server
to be allocated a number of blocks and does not communicate again
until those blocks are complete.&lt;/p&gt;
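
&lt;p&gt;The block allocation scheme can be sketched as follows. This is a simplified illustration, not the protocol of any of the projects named above: the class and method names are hypothetical, the key space is tiny, and real systems must also deal with authentication and with blocks that are never returned.&lt;/p&gt;

```python
class BlockServer:
    """Hands out fixed-size blocks of a key space and tracks completion."""

    def __init__(self, key_bits, block_bits):
        self.block_size = 2 ** block_bits
        self.total_blocks = 2 ** (key_bits - block_bits)
        self.next_block = 0
        self.completed = set()

    def allocate(self, count):
        """Give a client up to `count` unassigned block numbers."""
        end = min(self.next_block + count, self.total_blocks)
        blocks = list(range(self.next_block, end))
        self.next_block = end
        return blocks

    def report(self, blocks):
        """Record blocks that a client has finished searching."""
        self.completed.update(blocks)

    def key_range(self, block):
        """Keys covered by one block, as a Python range."""
        start = block * self.block_size
        return range(start, start + self.block_size)


server = BlockServer(key_bits=16, block_bits=12)  # 16 blocks of 4096 keys
work = server.allocate(4)                         # fetched while connected
for block in work:                                # searched while offline
    for key in server.key_range(block):
        pass  # a real client would test `key` here
server.report(work)                               # one message when finished
```

&lt;p&gt;The client talks to the server only twice, once to fetch work and once to report it, which is what lets hosts keep searching while disconnected.&lt;/p&gt;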

&lt;p&gt;These efforts have been quite successful so far. distributed.net has
successfully completed RSA Data Security&amp;rsquo;s RC5-64, RC5-56 and DES II-1
challenges &lt;a href=&#34;#ref29&#34;&gt;[29]&lt;/a&gt;, as well as a similar challenge
from CS Communications &amp;amp; Systems &lt;a href=&#34;#ref30&#34;&gt;[30]&lt;/a&gt;. They completed
the DES-III challenge with the help of the EFF DES cracker. DESCHALL
completed the DES-I challenge. A group headed by Germano Caronni and
containing 3500 computers completed the RC5-48 challenge. Ian Goldberg
used 250 computers to complete RC5-40 &lt;a href=&#34;#ref31&#34;&gt;[31]&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&#34;references&#34;&gt;References&lt;/h2&gt;

&lt;p&gt;&lt;a name=&#39;ref1&#39;&gt;&lt;/a&gt;[1] B. Schneier, &lt;em&gt;Applied Cryptography: Protocols, Algorithms, and Source Code in C, 2nd ed&lt;/em&gt;. John Wiley &amp;amp; Sons, Inc., January 1996.&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref2&#39;&gt;&lt;/a&gt;[2] M. Blaze, W. Diffie, R. L. Rivest, B. Schneier, T. Shimomura, E. Thompson, and M. Wiener, &amp;ldquo;Minimal key lengths for symmetric ciphers to provide adequate commercial security,&amp;rdquo; A Report by an Ad Hoc Group of Cryptographers and Computer Scientists, January 1996. [Online]. Available: &lt;a href=&#34;http://www.schneier.com/paper-keylength.pdf&#34;&gt;http://www.schneier.com/paper-keylength.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref4&#39;&gt;&lt;/a&gt;[4] J. J. G. Savard. A cryptographic compendium. [Online]. Available: &lt;a href=&#34;http://home.ecn.ab.ca/~jsavard/crypto/intro.htm&#34;&gt;http://home.ecn.ab.ca/~jsavard/crypto/intro.htm&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref5&#39;&gt;&lt;/a&gt;[5] B. Schneier, &amp;ldquo;Description of a new variable-length key, 64-bit block cipher (Blowfish),&amp;rdquo; in &lt;em&gt;Lecture Notes in Computer Science&lt;/em&gt;, no. 809. Springer-Verlag, 1994, pp. 191-204. [Online]. Available: &lt;a href=&#34;http://www.schneier.com/paper-blowfish-fse.html&#34;&gt;http://www.schneier.com/paper-blowfish-fse.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref6&#39;&gt;&lt;/a&gt;[6] R. L. Rivest, &amp;ldquo;The RC5 encryption algorithm,&amp;rdquo; in &lt;em&gt;Practical Cryptography for Data Internetworks&lt;/em&gt;, W. Stallings, Ed. IEEE Computer Society Press, 1996.&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref7&#39;&gt;&lt;/a&gt;[7] J. Keller and B. Seitz, &amp;ldquo;A hardware-based attack on the A5/1 stream cipher,&amp;rdquo; in &lt;em&gt;APC 2001&lt;/em&gt;. VDE Verlag, 2001, pp. 155-158. [Online]. Available: &lt;a href=&#34;http://www.informatik.fernuni-hagen.de/ti2/papers/apc2001-final.pdf&#34;&gt;http://www.informatik.fernuni-hagen.de/ti2/papers/apc2001-final.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref8&#39;&gt;&lt;/a&gt;[8] National Security Agency. (1998, May) Skipjack and KEA algorithm specifications. [Online]. Available: &lt;a href=&#34;http://csrc.nist.gov/encryption/skipjack/skipjack.pdf&#34;&gt;http://csrc.nist.gov/encryption/skipjack/skipjack.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref9&#39;&gt;&lt;/a&gt;[9] E. Biham, &amp;ldquo;A fast new DES implementation in software,&amp;rdquo; &lt;em&gt;Lecture Notes in Computer Science&lt;/em&gt;, vol. 1267, pp. 260 ??, 1997. [Online]. Available: &lt;a href=&#34;http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/1997/CS/CS08%91.ps.gz&#34;&gt;http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/1997/CS/CS08%91.ps.gz&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref10&#39;&gt;&lt;/a&gt;[10] Electronic Frontier Foundation, &lt;em&gt;Cracking DES&lt;/em&gt;. O&amp;rsquo;Reilly, 1998.&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref11&#39;&gt;&lt;/a&gt;[11] Xilinx, Inc., &amp;ldquo;Virtex-II Pro complete data sheet,&amp;rdquo; September 2003, &lt;a href=&#34;http://direct.xilinx.com/bvdocs/publications/ds083.pdf&#34;&gt;http://direct.xilinx.com/bvdocs/publications/ds083.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref12&#39;&gt;&lt;/a&gt;[12] P. Leong, M. Leong, O. Cheung, T. Tung, C. Kwok, M. Wong, and K. Lee, &amp;ldquo;Pilchard - a reconfigurable computing platform with memory slot interface,&amp;rdquo; in Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), April 2001. [Online]. Available: &lt;a href=&#34;http://www.cse.cuhk.edu.hk/~phwl/papers/pilchard_fccm01.pdf&#34;&gt;http://www.cse.cuhk.edu.hk/~phwl/papers/pilchard_fccm01.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref13&#39;&gt;&lt;/a&gt;[13] W. Diffie and M. E. Hellman, &amp;ldquo;Exhaustive cryptanalysis of the NBS data encryption standard,&amp;rdquo; in &lt;em&gt;Computer&lt;/em&gt;, June 1977, vol. 10, no. 6, pp. 74-84.&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref14&#39;&gt;&lt;/a&gt;[14] R. McLaughlin, &amp;ldquo;Yet another machine to break DES,&amp;rdquo; &lt;em&gt;Cryptologia&lt;/em&gt;, vol. 16, no. 2, pp. 136-144, April 1992.&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref15&#39;&gt;&lt;/a&gt;[15] M. J. Wiener, &amp;ldquo;Efficient DES key search,&amp;rdquo; in &lt;em&gt;Practical Cryptography for Data Internetworks&lt;/em&gt;, W. Stallings, Ed. IEEE Computer Society Press, 1996, pp. 31-79.&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref16&#39;&gt;&lt;/a&gt;[16] I. Goldberg and D. Wagner, &amp;ldquo;Architectural considerations for cryptanalytic hardware,&amp;rdquo; CS252 Report, 1996. [Online]. Available: &lt;a href=&#34;http://www.cs.berkeley.edu/~iang/isaac/hardware/paper.ps&#34;&gt;http://www.cs.berkeley.edu/~iang/isaac/hardware/paper.ps&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref17&#39;&gt;&lt;/a&gt;[17] M. J. Wiener, &amp;ldquo;Efficient DES key search - an update,&amp;rdquo; in &lt;em&gt;Cryptobytes&lt;/em&gt;, RSA Laboratories, Ed., 1997, vol. 3, no. 2, pp. 6-8. [Online]. Available: &lt;a href=&#34;ftp://ftp.rsasecurity.com/pub/cryptobytes/crypto3n2.pdf&#34;&gt;ftp://ftp.rsasecurity.com/pub/cryptobytes/crypto3n2.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref18&#39;&gt;&lt;/a&gt;[18] J.-P. Kaps and C. Paar, &amp;ldquo;Fast DES implementation for FPGAs and its application to a universal key-search machine,&amp;rdquo; in &lt;em&gt;Selected Areas in Cryptography&lt;/em&gt;, 1998, pp. 234-247.&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref19&#39;&gt;&lt;/a&gt;[19] I. Hamer and P. Chow, &amp;ldquo;DES cracking on the Transmogrifier 2a,&amp;rdquo; in &lt;em&gt;Lecture Notes in Computer Science&lt;/em&gt;, ser. Cryptographic Hardware and Embedded Systems. Springer-Verlag, 1999, no. 1717, pp. 13-24. [Online]. Available: &lt;a href=&#34;http://www.eecg.toronto.edu/~pc/research/publications/des.ches99.ps.gz&#34;&gt;http://www.eecg.toronto.edu/~pc/research/publications/des.ches99.ps.gz&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref20&#39;&gt;&lt;/a&gt;[20] P. D. Kundarewich, S. J. Wilton, and A. J. Hu, &amp;ldquo;A CPLD-based RC4 cracking system,&amp;rdquo; in &lt;em&gt;Canadian Conference on Electrical and Computer Engineering&lt;/em&gt;, 1999. [Online]. Available: &lt;a href=&#34;http://www.ee.ubc.ca/~stevew/papers/pdf/ccece99.pdf&#34;&gt;http://www.ee.ubc.ca/~stevew/papers/pdf/ccece99.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref21&#39;&gt;&lt;/a&gt;[21] K. L. K.H. Tsoi and P. Leong, &amp;ldquo;A massively parallel RC4 key search engine,&amp;rdquo; in &lt;em&gt;Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM)&lt;/em&gt;, 2002, pp. 13&amp;ndash;21. [Online]. Available: &lt;a href=&#34;http://www.cse.cuhk.edu.hk/~phwl/papers/vrvw_fccm02.pdf&#34;&gt;http://www.cse.cuhk.edu.hk/~phwl/papers/vrvw_fccm02.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref22&#39;&gt;&lt;/a&gt;[22] P. Kocher, &amp;ldquo;Breaking DES,&amp;rdquo; in &lt;em&gt;Cryptobytes&lt;/em&gt;, RSA Laboratories, Ed., 1999, vol. 4, no. 2, pp. 1&amp;ndash;5. [Online]. Available: &lt;a href=&#34;ftp://ftp.rsasecurity.com/pub/cryptobytes/crypto4n2.pdf&#34;&gt;ftp://ftp.rsasecurity.com/pub/cryptobytes/crypto4n2.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref23&#39;&gt;&lt;/a&gt;[23] M. Blaze. (1997, June) A better DES challenge. [Online]. Available: &lt;a href=&#34;http://www.privacy.nb.ca/cryptography/archives/cryptography/html/1997-06/0127.html&#34;&gt;http://www.privacy.nb.ca/cryptography/archives/cryptography/html/1997-06/0127.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref24&#39;&gt;&lt;/a&gt;[24] R. Clayton and M. Bond. Experience using a low-cost FPGA design to crack DES keys. [Online]. Available: &lt;a href=&#34;http://www.cl.cam.ac.uk/users/rnc1/descrack/DEScracker.html&#34;&gt;http://www.cl.cam.ac.uk/users/rnc1/descrack/DEScracker.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref25&#39;&gt;&lt;/a&gt;[25] T. Pornin and J. Stern, &amp;ldquo;Software-hardware trade-offs; application to A5/1 cryptanalysis,&amp;rdquo; in &lt;em&gt;Lecture Notes in Computer Science&lt;/em&gt;, ser. CHES 99. Springer-Verlag, 2000, pp. 318&amp;ndash;327. [Online]. Available: &lt;a href=&#34;http://www.di.ens.fr/~stern/data/St91.pdf&#34;&gt;http://www.di.ens.fr/~stern/data/St91.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref26&#39;&gt;&lt;/a&gt;[26] (2003, October) distributed.net: Node Zero. [Online]. Available: &lt;a href=&#34;http://www.distributed.net/&#34;&gt;http://www.distributed.net/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref27&#39;&gt;&lt;/a&gt;[27] C. M. Curtin. (1998, June) DESCHALL. [Online]. Available: &lt;a href=&#34;http://www.interhack.net/projects/deschall/&#34;&gt;http://www.interhack.net/projects/deschall/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref29&#39;&gt;&lt;/a&gt;[29] The RSA Laboratories Secret-Key Challenge. RSA Security. [Online]. Available: &lt;a href=&#34;http://www.rsasecurity.com/rsalabs/challenges/secretkey/index.html&#34;&gt;http://www.rsasecurity.com/rsalabs/challenges/secretkey/index.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref30&#39;&gt;&lt;/a&gt;[30] (2003, October) distributed.net: Project CSC. [Online]. Available: &lt;a href=&#34;http://www.distributed.net/csc/&#34;&gt;http://www.distributed.net/csc/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref31&#39;&gt;&lt;/a&gt;[31] (1997, January) 40-bit crypto proves no problem. [Online]. Available: &lt;a href=&#34;http://news.com.com/2100-1017-266268.html?legacy=cnet&#34;&gt;http://news.com.com/2100-1017-266268.html?legacy=cnet&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Analysis</title>
      <link>https://ianhowson.com/fpga/analysis/</link>
      <pubDate>Mon, 20 Oct 2003 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/fpga/analysis/</guid>
      <description>

&lt;h2 id=&#34;factors-affecting-exhaustive-key-search&#34;&gt;Factors affecting exhaustive key search&lt;/h2&gt;

&lt;p&gt;Many factors affect the time taken to conduct an exhaustive key search
attack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Key length.&lt;/strong&gt; Increasing the key length used by a cipher will
dramatically increase the size of the key space and hence the time
required. Using long keys is by far the most effective countermeasure
to an exhaustive key search attack.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Available resources.&lt;/strong&gt; An attacker with more resources (money,
computational power, people) can sweep the key space more rapidly.&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cipher design.&lt;/strong&gt; The design of a cipher has a strong influence
on how long an exhaustive key search will take. Several factors contribute
to this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Key setup time.&lt;/strong&gt; In normal use, a cipher&amp;rsquo;s key setup time
is usually negligible. When conducting a key search attack, the key
setup phase must be performed for every key. Long key setup times
can frustrate key search attacks.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cipher operations.&lt;/strong&gt; Some operations will take longer to perform
than others. Some ciphers are designed to use operations that are
fast on one particular technology and slow on another.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cipher speed.&lt;/strong&gt; On identical implementation technologies, some
ciphers can encrypt or decrypt more quickly than others.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Frequency of register access.&lt;/strong&gt; Pipelined designs can achieve
significant resource savings if registers are accessed infrequently
(discussed in &lt;a href=&#39;#Estimating-FPGA-resource&#39;&gt;Estimating FPGA resource usage for pipelined cipher implementations&lt;/a&gt;). Iterative
designs often have fewer constraints on their RAM or register access.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Availability of accurate plaintext and ciphertext pairs.&lt;/strong&gt; Possessing only ciphertext or incomplete ciphertext and plaintext
may reduce the speed and accuracy of the exhaustive key search attack.
Ideally, several perfect pairs of ciphertext and plaintext should
be available. If many pairs are available, time-memory tradeoff attacks
may be more appropriate for some ciphers.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
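
&lt;p&gt;As a rough illustration of why key length dominates the other factors, the sketch below estimates the expected search time for several key lengths. The rate of one billion keys per second is an assumed figure chosen purely for illustration; it is not a measurement from this work.&lt;/p&gt;

```python
# Expected duration of an exhaustive key search at a fixed search rate.
# ASSUMPTION: the rate used in the demo is illustrative, not a measured figure.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def expected_search_years(key_bits, keys_per_second):
    """On average, half of the key space is swept before the key is found."""
    expected_trials = 2 ** (key_bits - 1)
    return expected_trials / keys_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    rate = 1e9  # hypothetical: one billion keys per second
    for bits in (40, 56, 64, 72, 128):
        years = expected_search_years(bits, rate)
        print(f"{bits:3d}-bit key: {years:.3g} years")
```

&lt;p&gt;Each additional key bit doubles the expected time, whereas doubling the attacker&amp;rsquo;s resources only halves it.&lt;/p&gt;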

&lt;h3 id=&#34;software-factors&#34;&gt;Software factors&lt;/h3&gt;

&lt;p&gt;The most significant factor that affects the time required to conduct
a key search using CPUs is the word length of the cipher. The speed
of the cipher tends to be highest on a CPU whose word length matches
that of the cipher. All processing must be performed in units of this
word length.&lt;/p&gt;

&lt;p&gt;Ciphers that have very small states may be able to operate completely
within CPU registers, which will improve performance. Smaller speed
benefits will arise if the state is small enough to fit inside the
fastest (first-level) cache. Very few ciphers have state sizes over a few
thousand bytes.&lt;/p&gt;

&lt;p&gt;Some cipher implementations may exchange storage space for processing
power by precomputing values. One software implementation of A5/1 &lt;a href=&#34;#ref44&#34;&gt;[44]&lt;/a&gt; exploits this by precomputing all possible
values of the individual LFSRs. Instead of performing the normal shift
and XOR operations, output is generated by modifying pointers to the
precomputed values. This requires almost 128MB of memory, but greatly
improves the performance of A5/1 in software.&lt;/p&gt;

&lt;h3 id=&#34;fpga-factors&#34;&gt;FPGA factors&lt;/h3&gt;

&lt;p&gt;The most significant factor affecting the speed of exhaustive key
search on FPGAs is whether the cipher can be implemented as a long
pipeline as opposed to an iterative structure. Pipelined structures
make far more efficient use of the FPGA resources and can achieve
higher clock speeds. Any cipher can be implemented as a long pipeline,
but the required FPGA resources are often prohibitive. Generally,
a long pipeline will check one key every clock cycle.&lt;/p&gt;

&lt;h2 id=&#34;operation-performance&#34;&gt;Operation performance&lt;/h2&gt;

&lt;h3 id=&#34;a-name-fpgasoperationperformance-a-fpgas&#34;&gt;&lt;a name=&#39;FPGAsOperationPerformance&#39;&gt;&lt;/a&gt;FPGAs&lt;/h3&gt;

&lt;p&gt;Operation performance on FPGAs is influenced by the time required
to complete the operation and the resource usage. Many operations
can be completed more quickly by using more resources. Conversely,
using fewer resources allows a greater level of parallelism, increasing
overall performance. Finding the correct balance is difficult and
may require several attempts at implementation.&lt;/p&gt;

&lt;p&gt;Results giving the time and space requirements of a number of operations
are given in &lt;a href=&#34;#LUTs-required-per&#34;&gt;LUTs and time required per bit for cipher operations&lt;/a&gt;. Not all operations
are covered, and optimisations using RAM or other FPGA features are
ignored. In particular, large data-dependent table lookups will be
more efficiently performed using the RAM blocks contained within most
Xilinx FPGAs. Similarly, multiplications may be better performed using
onboard Virtex II, Virtex II Pro or Spartan 3 multiplier resources.&lt;/p&gt;

&lt;p&gt;The time taken by addition and subtraction depends on the width of the data being
added, because the carry must propagate along the Xilinx carry chain. Variable rotation is achieved
using the scheme in &lt;a href=&#34;#ref39&#34;&gt;[39]&lt;/a&gt;, and assumes the use
of Xilinx slice multiplexers. Data-dependent table lookup does not
assume this because each Xilinx family has different multiplexer resources
available, which will affect the structures used for implementation.&lt;/p&gt;

&lt;table class=&#39;ui attached celled table&#39;&gt;
  &lt;thead&gt;&lt;tr&gt;&lt;th&gt;Operation&lt;th&gt;LUTs&lt;th&gt;Time (LUT depth)&lt;/tr&gt;&lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;td&gt;Bit permute&lt;td&gt;0&lt;td&gt;0&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Fixed rotate or shift&lt;td&gt;0&lt;td&gt;0&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;XOR (up to 4 inputs)&lt;td&gt;1&lt;td&gt;1&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Data-dependent table read or write (1-4 address bits)&lt;td&gt;1&lt;td&gt;1&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Data-dependent table lookup (\(w\) address bits, \(w&gt;4\))&lt;td&gt;\(2^{w-3}-1\)&lt;td&gt;\(w-3\)&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Add or subtract (two inputs)&lt;td&gt;1&lt;td&gt;Depends on data width&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Variable rotate or shift over \(w\) bits&lt;td&gt;\(\log_{2}w\)&lt;/td&gt;&lt;td&gt;\(\frac{\log_{2}w}{2}\)&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;div class=&#39;ui centered bottom attached header&#39;&gt;&lt;a name=&#39;LUTs-required-per&#39;&gt;&lt;/a&gt;LUTs and time required per bit for cipher operations&lt;/div&gt;
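
&lt;p&gt;The table can be transcribed into a small lookup function for use in later estimates. This is only a sketch: the function name and the operation labels are hypothetical and were chosen for this example.&lt;/p&gt;

```python
import math

# Per-bit LUT cost and LUT depth, transcribed from the table above.
# ASSUMPTION: the function name and operation labels are made up for this sketch.


def lut_cost_per_bit(operation, w=None):
    """Return (LUTs per bit, LUT depth).

    Depth is None for add/subtract, where it depends on the data width.
    """
    if operation in ("bit_permute", "fixed_rotate"):
        return (0, 0)
    if operation == "xor":  # up to 4 inputs
        return (1, 1)
    if operation == "add_sub":  # two inputs, delay set by the carry chain
        return (1, None)
    if operation == "table_lookup":  # w address bits
        if w is None or min(w, 1) != 1:  # rejects a missing w and any w below 1
            raise ValueError("table_lookup needs w of at least 1")
        if w in (1, 2, 3, 4):
            return (1, 1)
        return (2 ** (w - 3) - 1, w - 3)
    if operation == "variable_rotate":  # rotate or shift over w bits
        if w is None:
            raise ValueError("variable_rotate needs the word width w")
        return (math.log2(w), math.log2(w) / 2)
    raise ValueError(f"unknown operation: {operation}")


if __name__ == "__main__":
    print(lut_cost_per_bit("variable_rotate", 32))
    print(lut_cost_per_bit("table_lookup", 8))
```

&lt;p&gt;Multiplying the per-bit LUT count by the operand width gives the contribution of one operation to a round.&lt;/p&gt;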

&lt;h3 id=&#34;cpus&#34;&gt;CPUs&lt;/h3&gt;

&lt;p&gt;In contrast with FPGAs, storage space is not a concern for software
implementations. When the word size of the operation inputs is equal
to or less than that of the CPU, most modern CPUs can complete any
operation very quickly. The operations being performed thus become
less important.&lt;/p&gt;

&lt;p&gt;Word size is the major factor when determining operation speed on
CPUs. If the word size of the operation is greater than that of the
CPU, each operation must be split into several native-width operations, and
performance will generally be halved (or slightly worse).&lt;/p&gt;

&lt;p&gt;If the word size of the operation is less than that of the CPU, processing
resources are being wasted. We then need to examine the algorithm
to determine if any parallelism can be applied to make the best use
of the resources that are available. For example, it may be possible
to pack a number of short-word operands into one CPU word and operate
on all of them simultaneously. Bitslicing &lt;a href=&#34;#ref9&#34;&gt;[9]&lt;/a&gt;
is another approach that is used to accelerate single-bit operations
on CPUs with a wide native word length.&lt;/p&gt;

&lt;h2 id=&#34;a-name-estimating-fpga-resource-a-estimating-fpga-resource-usage-for-pipelined-cipher-implementations&#34;&gt;&lt;a name=&#39;Estimating-FPGA-resource&#39;&gt;&lt;/a&gt;Estimating FPGA resource usage for pipelined cipher implementations&lt;/h2&gt;

&lt;h3 id=&#34;introduction&#34;&gt;Introduction&lt;/h3&gt;

&lt;p&gt;It is useful to determine whether a pipelined cipher implementation
is feasible before beginning work on the implementation itself. This
section describes a method that can be used to estimate the resource
usage of a cipher given details of its algorithm.&lt;/p&gt;

&lt;h3 id=&#34;assumptions&#34;&gt;Assumptions&lt;/h3&gt;

&lt;p&gt;A number of assumptions are made to facilitate this analysis:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FPGAs are composed of many logic cells (LCs). A logic cell consists
of a four-input look-up table (LUT) and a flip-flop (FF). This is
similar to the LC layout used in most Xilinx and Altera FPGAs, and
is shown in &lt;a href=&#34;#Simplified-FPGA-logic&#34;&gt;Simplified FPGA logic cell&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The state size remains constant throughout the key setup and decryption
phases. This is rarely true, but does not significantly affect the
results. The analysis can be extended to handle changing state sizes.&lt;/li&gt;
&lt;li&gt;The key setup and decryption phases are composed of a number of rounds.
Each round performs the same operations. The logic used to perform
each round maps the state from round \(r\) to round \(r+1\).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a name=&#39;Simplified-FPGA-logic&#39;&gt;&lt;/a&gt;
&lt;div class=&#39;ui center aligned segment&#39;&gt;
  &lt;img class=&#34;ui centered image&#34; src=&#34;https://ianhowson.com/fpga/logic-cell-layout.svg&#34;&gt;
  &lt;p&gt;Simplified FPGA logic cell&lt;/p&gt;
&lt;/div&gt;&lt;/p&gt;

&lt;h3 id=&#34;factors&#34;&gt;Factors&lt;/h3&gt;

&lt;p&gt;To estimate the quantity of resources required, we consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The size of the state during key setup and decryption&lt;/li&gt;
&lt;li&gt;The operations that are performed during key setup and decryption&lt;/li&gt;
&lt;li&gt;What quantity of the state is modified during each step or round of
key setup and decryption&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The final result will specify the number of LCs required for a fully
pipelined implementation of the cipher. The mapping from LC to real
FPGA resources is generally trivial. A Virtex, Spartan IIE, Virtex
II or Virtex II Pro slice maps to two LCs. Spartan 3 devices have
fewer effective LCs because half of the LUTs on the device have reduced
functionality and do not support usage as a RAM or a shift register.&lt;/p&gt;

&lt;h3 id=&#34;method&#34;&gt;Method&lt;/h3&gt;

&lt;p&gt;We define a number of variables, shown in the table below.&lt;/p&gt;

&lt;table class=&#39;ui attached celled table&#39;&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;td&gt;\(s\)&lt;/td&gt;&lt;td&gt;State size in bits&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;\(n\)&lt;/td&gt;&lt;td&gt;Number of rounds needed to complete a phase&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;\(r\)&lt;/td&gt;&lt;td&gt;Number of LUTs required to perform a round&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;\(m\)&lt;/td&gt;&lt;td&gt;Number of bits of state modified during a round&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;\(c\)&lt;/td&gt;&lt;td&gt;Number of LCs required to perform a phase&lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;div class=&#39;ui centered attached header&#39;&gt;FPGA resource estimation variables&lt;/div&gt;

&lt;p&gt;The state size \(s\) is the number of bits that need to be stored between
rounds. This includes all registers and modifiable lookup tables used
in a phase. Usually this figure can be determined by summing the total
number of bits of storage used by the cipher.&lt;/p&gt;

&lt;p&gt;The number of rounds \(n\) is usually specified by the cipher. Modelling
the cipher in this way forces each round to complete in one clock
cycle, which may result in very long combinational delays for some
ciphers. This will be discussed further below.&lt;/p&gt;

&lt;p&gt;The number of LUTs \(r\) required to perform a round is determined
by examining the operations performed during that round and the
number of bits that these operations are performed on. These results
are summarised in &lt;a href=&#39;#LUTs-required-per&#39;&gt;LUTs and time required per bit for cipher operations&lt;/a&gt;. Another
strategy would be to implement the round function and use synthesis
results to estimate the resource usage.&lt;/p&gt;

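&lt;p&gt;If a state bit is modified during a round, its result can be stored
in the flip-flop of the same LC as the LUT that modified it. Every unmodified
state bit needs a dedicated LC to carry it to the next round. The number of
modified bits is \(m\), so \(s-m\) dedicated storage LCs are needed per round.&lt;/p&gt;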

&lt;p&gt;The number of LCs needed to complete a cipher phase is thus:&lt;/p&gt;

&lt;p&gt;$$c=n(r+s-m) \text{ (equation 4.1)}$$&lt;/p&gt;

&lt;p&gt;We can then compare the number of LCs in the result to the number
of LCs in a device to determine whether a pipelined implementation
is feasible. Alternatively, we can use the LC estimate to determine
what the smallest device needed is. The LC figure can also be used
as a &amp;ldquo;difficulty rating&amp;rdquo;; ciphers with high LC counts will generally
take more FPGA resources and time to attack.&lt;/p&gt;
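
&lt;p&gt;Equation 4.1 is simple enough to express directly in code. The sketch below is a minimal illustration; the function names and the toy cipher parameters are hypothetical, and the two-LCs-per-slice mapping is the one assumed for the Virtex-style families in this analysis.&lt;/p&gt;

```python
# Equation 4.1 as a function, plus a conversion from LCs to slices.
# ASSUMPTION: function names and the toy cipher parameters are illustrative.


def lc_count(n, r, s, m):
    """Equation 4.1: LCs for one fully pipelined phase, c = n(r + s - m)."""
    return n * (r + s - m)


def lcs_to_slices(lcs, lcs_per_slice=2):
    """Round up to whole slices (two LCs per slice by default)."""
    return -(-lcs // lcs_per_slice)


if __name__ == "__main__":
    # Hypothetical toy cipher: 16 rounds, 64 LUTs per round,
    # 128 bits of state, of which 32 are modified in each round.
    c = lc_count(n=16, r=64, s=128, m=32)
    print(c, "LCs, about", lcs_to_slices(c), "slices")
```

&lt;p&gt;Comparing the resulting slice count with the capacity listed in a device data sheet indicates whether a fully pipelined design is plausible.&lt;/p&gt;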

&lt;h3 id=&#34;optimisations&#34;&gt;Optimisations&lt;/h3&gt;

&lt;h4 id=&#34;multiplexers&#34;&gt;Multiplexers&lt;/h4&gt;

&lt;p&gt;Each Xilinx slice contains a number of additional multiplexers which
can reduce the number of LCs needed for large table lookups.&lt;/p&gt;

&lt;h4 id=&#34;shift-registers&#34;&gt;Shift registers&lt;/h4&gt;

&lt;p&gt;Xilinx LUTs can also operate as shift registers &lt;a href=&#34;#ref36&#34;&gt;[36]&lt;/a&gt;.
This is very useful when a state bit is not modified for one or more
clock cycles. Instead of chaining FFs together, a single LUT can replace
up to 16 FFs. Analysis of the data dependencies in the cipher algorithm
can provide justification to reduce the effective \(s\) value significantly.
Synthesis tools will sometimes perform this optimisation automatically.
We can then obtain an alternate equation to obtain the number of LCs
required:&lt;/p&gt;

&lt;p&gt;$$c=S+nr \text{ (equation 4.2)}$$&lt;/p&gt;

&lt;p&gt;where \(S\) is the number of LCs used to store the state over the entire
pipeline. Implementations using shift registers for storage cannot
pack the shift registers into the same LC as the LUT performing the
calculation, since the shift register uses the LUT. All state that
is modified can be latched in the same LC as the LUT that performed
the calculation. We thus do not require variable \(m\).&lt;/p&gt;

&lt;h4 id=&#34;short-pipeline-stages&#34;&gt;Short pipeline stages&lt;/h4&gt;

&lt;p&gt;Pipelined implementations on FPGAs can be sped up by making each pipeline
stage as short as possible. No additional cost is incurred when using
the latches included in LCs, making very deep pipelines at high speeds
a good design approach. Instead of completing an entire round in a
single clock cycle, it is usually more efficient to break up the round
and complete it over more clock cycles (at a higher clock rate). The
design will still check an average of one key per clock cycle at the
higher clock rate.&lt;/p&gt;

&lt;p&gt;Splitting up the pipeline stages requires analysis of the data dependencies
within each round, and would significantly complicate this analysis.&lt;/p&gt;

&lt;p&gt;Using short pipeline stages will increase the number of LUTs that
will need to be used as shift registers, and hence increase overall
resource usage slightly. At worst, doubling the clock rate of a design
will double the number of LCs used only for data storage. Smaller
penalties will be incurred if shift registers are less than half full.&lt;/p&gt;

&lt;p&gt;Some operations (particularly boolean operations like XOR) can often
be collapsed into other operations, particularly if there are spare
LUT input lines. This is highly dependent on the cipher algorithm.&lt;/p&gt;

&lt;h3 id=&#34;rc5-example&#34;&gt;RC5 example&lt;/h3&gt;

&lt;p&gt;Rivest describes the RC5 algorithm in &lt;a href=&#34;#ref6&#34;&gt;[6]&lt;/a&gt;. It consists
of three phases: initialisation, key mixing and decryption. The initialisation
phase is ignored because it can be trivially precomputed and will
not consume FPGA resources when implemented in this way.&lt;/p&gt;

&lt;h4 id=&#34;key-mixing&#34;&gt;Key mixing&lt;/h4&gt;

&lt;p&gt;The key mixing phase of RC5-32/12/9 uses an S array of 26 words and
an L array of 3 words. Each word is 32 bits wide, giving a total state
size (\(s\)) of 928 bits. 78 rounds (\(n\)) are needed. Each round modifies
64 bits of state (\(m\)).&lt;/p&gt;

&lt;p&gt;To determine \(r\), we examine the operations being performed. All
of the table lookups operate in a predictable order, removing the
need for additional multiplexers. Each round contains five adds, a
fixed rotation and a variable rotation, all over 32 bits. One of the
adds (\(A+B\) in the second line) is computed twice, so the second instance
can be ignored and only four adds are counted.
This gives an \(r\) value of \(32(4\times1+0+\log_{2}32)=288\) and a
final \(c\) value of \(78(288+928-64)=89856\). At two LCs per slice this is
44928 slices, a count that can only be achieved in very large FPGAs. The estimated
\(r\) value is close to the value of 256 obtained in the trial implementation.&lt;/p&gt;
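
&lt;p&gt;The key mixing arithmetic can be reproduced with a few lines of code. This is a sketch using only the figures quoted in the text; the variable names are chosen for illustration.&lt;/p&gt;

```python
import math

# Key mixing phase of RC5-32/12/9, using the figures quoted in the text.
WORD_BITS = 32
ADDS = 4              # five adds, less the duplicated A+B
FIXED_ROTATES = 1     # costs no LUTs
VARIABLE_ROTATES = 1  # log2(w) LUTs per bit

luts_per_bit = ADDS * 1 + FIXED_ROTATES * 0 + VARIABLE_ROTATES * math.log2(WORD_BITS)
r = int(WORD_BITS * luts_per_bit)  # LUTs per round
s = (26 + 3) * WORD_BITS           # S array plus L array, in bits
n = 78                             # rounds
m = 64                             # state bits modified per round

c = n * (r + s - m)                # equation 4.1

if __name__ == "__main__":
    print("r =", r, " s =", s, " c =", c, " slices =", c // 2)
```

&lt;p&gt;Running it gives \(r=288\), \(s=928\) and \(c=89856\), matching the values above.&lt;/p&gt;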

&lt;p&gt;Significant resource savings can be achieved by recognising that each
bit in the S array is only accessed once every 26 rounds, and each
bit in the L array is only accessed once every 3 rounds. We can thus
collapse a significant number of the LCs used purely for storage into
shift register LUTs. 7488 LCs are used to store the L array as it
travels through the pipeline; this can be reduced to 2496. 64896 total
FFs are used to store the S array. Two shift registers are needed
to provide the 25 cycle delay on each bit, and each bit is accessed
three times. The total number of LCs needed is thus \(26\times32\times2\times3=4992\).
This demonstrates that the frequency of register access has a significant
bearing on the efficiency of cipher implementations on FPGAs. The
S array stores almost 9 times as much data as the L array, but requires
only twice the resources.&lt;/p&gt;
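
&lt;p&gt;The storage figures above can be checked with a short calculation. The sketch below is one interpretation of the reasoning, not the method used to produce the original figures: it assumes that each state bit needs enough 16-deep shift register LUTs to bridge the gap between successive accesses, once per access. The function name is hypothetical.&lt;/p&gt;

```python
import math

# Storage cost of the RC5 key mixing state, with and without shift registers.
# ASSUMPTION: one interpretation of the text; names are illustrative.
WORD_BITS = 32
ROUNDS = 78
SRL_DEPTH = 16  # a LUT used as a shift register replaces up to 16 FFs


def srl_storage_lcs(bits, access_interval):
    """LCs needed when each bit is delayed in shift registers between accesses."""
    accesses = ROUNDS // access_interval
    delay = access_interval - 1
    srls_per_gap = math.ceil(delay / SRL_DEPTH)
    return bits * accesses * srls_per_gap


l_bits = 3 * WORD_BITS
s_bits = 26 * WORD_BITS

l_ffs = l_bits * ROUNDS               # plain FF storage for the L array
s_ffs = s_bits * ROUNDS               # plain FF storage for the S array
l_lcs = srl_storage_lcs(l_bits, 3)    # L words are accessed every 3 rounds
s_lcs = srl_storage_lcs(s_bits, 26)   # S words are accessed every 26 rounds

if __name__ == "__main__":
    print("L array:", l_ffs, "FFs reduced to", l_lcs, "LCs")
    print("S array:", s_ffs, "FFs reduced to", s_lcs, "LCs")
    print("data ratio:", round(s_bits / l_bits, 2), " LC ratio:", s_lcs / l_lcs)
```

&lt;p&gt;Under these assumptions the script reproduces the quoted values of 7488, 2496, 64896 and 4992.&lt;/p&gt;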

&lt;p&gt;Using Equation 4.2, we obtain an LC count of 22464
for the key mixing phase of RC5.&lt;/p&gt;

&lt;p&gt;The resource usage can be further optimised by noticing that elements
of the S array are initialised sequentially, and so not all values
need to be stored until 26 rounds have been completed.&lt;/p&gt;

&lt;h4 id=&#34;decryption&#34;&gt;Decryption&lt;/h4&gt;

&lt;p&gt;The decryption phase of RC5 uses the same S array as the key mixing
phase, but does not require the L array. Each round consists of two
half-rounds which are identical except for the source and destination
of the results. We can use this property to double the number of rounds
and halve the number of LCs required per round. The phase consists of 22
half-rounds, each of which contains a subtraction, a variable rotation
and an XOR. All operations are performed on 32 bit words. Two additional
subtractions are performed after the half-rounds. We thus have \(s=26\times32=832\),
\(n=22\), \(r=32(1+5+0)=192\) (taking the XOR to be collapsed into spare LUT inputs; this matches the trial implementation)
and \(m=64\). This gives \(c=22(192+832-64)=21120\).&lt;/p&gt;

&lt;p&gt;Again, significant resource savings can be achieved by using shift
registers for storage. Each word in the S array is only accessed once
and is never written to, so we need only provide storage for the period
between the start of the decryption phase and the point where it is
accessed. At worst, this will be 26 rounds, requiring two shift registers
per bit. There are 26 words storing 32 bits each, giving an LC count
of \(26\times32\times2=1664\). Applying Equation 4.2
again gives \(c=1664+24\times192=6272\). The final subtractions require
an additional 64 LCs, giving a total of 6336 LCs.&lt;/p&gt;

&lt;h4 id=&#34;results&#34;&gt;Results&lt;/h4&gt;

&lt;p&gt;Using the figures determined from Equation 4.1, we
obtain a total LC count of 110976. This translates to 55488 slices &amp;ndash; barely fitting within the largest Virtex II Pro device
that is planned for production (and which is not yet shipping).&lt;/p&gt;

&lt;p&gt;Using the Xilinx-optimised figures from Equation 4.2 gives
a total LC count of 28800. This converts to a far more practical 14400
slices, within the capacities of many larger devices.&lt;/p&gt;
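
&lt;p&gt;The totals follow directly from the per-phase figures derived earlier. The short sketch below simply sums them and converts to slices at two LCs per slice; the dictionary layout is an illustrative choice.&lt;/p&gt;

```python
# Totals for RC5, summed from the per-phase LC counts quoted in the text.
LCS_PER_SLICE = 2

eq41_lcs = {"key mixing": 89856, "decryption": 21120}
eq42_lcs = {"key mixing": 22464, "decryption": 6336}

eq41_total = sum(eq41_lcs.values())
eq42_total = sum(eq42_lcs.values())

if __name__ == "__main__":
    for label, total in (("Equation 4.1", eq41_total), ("Equation 4.2", eq42_total)):
        print(f"{label}: {total} LCs, {total // LCS_PER_SLICE} slices")
    print(f"saving factor: {eq41_total / eq42_total:.2f}")
```

&lt;p&gt;The shift register optimisation reduces the estimate by a factor of roughly 3.9.&lt;/p&gt;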

&lt;p&gt;It should be noted that the figures generated by this analysis technique
tend to be conservative and ignore many potential resource optimisations.
It also ignores issues of pipelining within the cipher rounds which
are difficult to deal with in such a general sense. Both of these
areas can provide significant resource and speed advantages in an
actual implementation.&lt;/p&gt;

&lt;h2 id=&#34;fpga-price-performance-comparison&#34;&gt;FPGA price/performance comparison&lt;/h2&gt;

&lt;p&gt;Pricing data for Virtex E, Spartan IIE, Virtex II and Virtex II Pro
FPGAs in quantities of 25-99 was obtained from Avnet&amp;rsquo;s website &lt;a href=&#34;#ref45&#34;&gt;[45]&lt;/a&gt;,
and is current as of 15 October, 2003. The XC2V40 and XC2V80 device
pricing is for a quantity of 100 or more.&lt;/p&gt;

&lt;p&gt;Pricing data for Spartan 3 devices was obtained from Ernest Peltzer
&lt;a href=&#34;#ref46&#34;&gt;[46]&lt;/a&gt; of Sensory Networks, and consists of projected prices
for Q1 2004 in quantities of 100 or more. Spartan 3 devices only started
shipping very recently and so pricing data is both difficult to obtain
and very likely to change.&lt;/p&gt;

&lt;p&gt;This analysis assumes that the cipher module is the slowest part of
the total key search machine. It ignores resources that would be dedicated
to the PC interface, but includes those that are required for each
search unit. Interface overheads are ignored because in a large-scale
design each FPGA does not need its own PC interface and controller;
the search bus can be connected directly to FPGA pads.&lt;/p&gt;

&lt;h3 id=&#34;speed-grades-and-packaging&#34;&gt;Speed grades and packaging&lt;/h3&gt;

&lt;p&gt;Each FPGA is available in a variety of speed grades and several packaging
options. In general, the slowest speed grade and the smallest package
give the best price/performance ratio. Low FPGA speeds can be compensated
for by using more FPGAs, and packaging is not important since only
a few I/O lines are needed. This greatly simplifies the analysis by
allowing a large percentage of the FPGA devices available to be ignored.&lt;/p&gt;

&lt;h3 id=&#34;families&#34;&gt;Families&lt;/h3&gt;

&lt;p&gt;The maximum attainable speed and resource usage are the same within
each FPGA family. This allows performance estimates to be generated
with far less effort. Synthesis estimates for the DES and RC5 search units
are used. These are less accurate than those obtained after place
and route, but remain valid across different capacity FPGAs within
a family. These results are summarised in &lt;a href=&#34;https://ianhowson.com/fpga/fpga-price-performance/&#34;&gt;FPGA price/performance tables&lt;/a&gt;.
Search unit resource costs are estimated in the same way.&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;Simplified-FPGA-logic&#39;&gt;&lt;/a&gt;
&lt;div class=&#39;ui center aligned segment&#39;&gt;
  &lt;img class=&#34;ui centered image&#34; src=&#34;https://ianhowson.com/fpga/fpga-families.svg&#34;&gt;
  &lt;p&gt;Relative clock speed across FPGA families&lt;/p&gt;
&lt;/div&gt;&lt;/p&gt;

&lt;p&gt;The maximum clock speed for a given design varies greatly depending
on the FPGA family used. Interestingly, the current &amp;ldquo;budget&amp;rdquo; family
(Spartan 3) achieves the highest clock rates. This can be explained
by their 90nm manufacturing process; the Virtex II and Virtex II Pro
use 150nm and 120nm processes.&lt;/p&gt;

&lt;p&gt;RC5 requires half as many RAM blocks on a Virtex II, Virtex II Pro
or Spartan 3 as it does on a Virtex E or Spartan IIE. This is because
the RC5 implementation needs a 32 bit wide RAM, and the Virtex E and
Spartan IIE RAM is only 16 bits wide.&lt;/p&gt;

&lt;h3 id=&#34;des&#34;&gt;DES&lt;/h3&gt;

&lt;p&gt;&lt;a name=&#39;des-pp&#39;&gt;&lt;/a&gt;
&lt;div class=&#39;ui center aligned segment&#39;&gt;
  &lt;img class=&#34;ui centered image&#34; src=&#34;https://ianhowson.com/fpga/des-price-performance.svg&#34;&gt;
  &lt;p&gt;FPGA price/performance for DES&lt;/p&gt;
&lt;/div&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;des-pp-zoom&#39;&gt;&lt;/a&gt;
&lt;div class=&#39;ui center aligned segment&#39;&gt;
  &lt;img class=&#34;ui centered image&#34; src=&#34;https://ianhowson.com/fpga/des-price-performance-zoomed.svg&#34;&gt;
  &lt;p&gt;FPGA price/performance for DES, showing low-end detail&lt;/p&gt;
&lt;/div&gt;&lt;/p&gt;

&lt;p&gt;The above figures show the price and performance of
each device in each Xilinx FPGA family for the DES cipher. The second shows the same data, but zoomed to show
detail for the low-end FPGAs. &amp;ldquo;Kinks&amp;rdquo; in the graph appear where
moving to the next largest device does not improve performance (since
there are not enough available resources to add another search unit).
Smaller kinks are visible when moving to the next largest package.&lt;/p&gt;
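
&lt;p&gt;The origin of the kinks can be seen in a small model. The sketch below assumes that performance is the number of complete search units that fit on a device multiplied by the clock rate. All device names, slice counts, prices and clock rates are made-up values for illustration and are not taken from the price/performance tables.&lt;/p&gt;

```python
# Why kinks appear: a larger device is only faster when it has room
# for one more complete search unit.
# ASSUMPTION: every number and device name below is hypothetical.


def search_units(device_slices, unit_slices):
    """Number of complete search units that fit on a device."""
    return device_slices // unit_slices


def keys_per_second(device_slices, unit_slices, clock_hz):
    """Each pipelined search unit is assumed to check one key per clock cycle."""
    return search_units(device_slices, unit_slices) * clock_hz


if __name__ == "__main__":
    unit_slices = 800   # hypothetical cost of one search unit
    clock_hz = 100e6    # hypothetical clock rate for the whole family
    devices = [("small", 1000, 20.0), ("medium", 1500, 30.0), ("large", 2000, 45.0)]
    for name, slices, price in devices:
        units = search_units(slices, unit_slices)
        rate = keys_per_second(slices, unit_slices, clock_hz)
        print(f"{name:6s} {units} unit(s), {rate / 1e6:.0f} Mkeys/s, "
              f"{rate / price / 1e6:.2f} Mkeys/s per unit price")
```

&lt;p&gt;In this toy family the medium device costs more than the small one but holds the same number of search units, which is exactly the situation that produces a kink.&lt;/p&gt;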

&lt;p&gt;We can see that Spartan 3 FPGAs give by far the best performance for
a given price. This is not surprising; they are positioned as a budget
FPGA and can achieve higher clock rates than the other families being
considered. Again, it should be pointed out that they have only recently
started shipping and pricing will likely be volatile for some time.
The pricing figures used for the analysis were also projected figures
for Q1 2004 and for a larger quantity than the other families.&lt;/p&gt;

&lt;p&gt;Of the mature families, Spartan IIE devices give the best price/performance
ratio. Their performance is quite limited, however. Virtex II Pro
devices can achieve spectacularly high search rates, but at a high
cost per device.&lt;/p&gt;

&lt;p&gt;The figure below shows the price/performance ratio
achieved by each device in the Virtex II Pro family. The device with
the best ratio is the XC2VP20, with the XC2VP30 and XC2VP40 close
behind. In a real system where PCB costs and assembly have to be taken
into account, it may be worthwhile purchasing a smaller number of
faster FPGAs with a worse price/performance ratio. These three devices
use the FG676 package; the jump in price to the XC2VP50 can be explained
by the larger package (FF1152). The XC2VP100 has a far worse ratio
than the others, and a much larger package (FF1696). Like the Spartan
3 devices, it has only recently started shipping and may still have
unstable pricing.&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;v2pro-des&#39;&gt;&lt;/a&gt;
&lt;div class=&#39;ui center aligned segment&#39;&gt;
  &lt;img class=&#34;ui centered image&#34; src=&#34;https://ianhowson.com/fpga/des-v2pro-pp.svg&#34;&gt;
  &lt;p&gt;Virtex II Pro price/performance ratio by device for DES&lt;/p&gt;
&lt;/div&gt;&lt;/p&gt;

&lt;h3 id=&#34;rc5&#34;&gt;RC5&lt;/h3&gt;

&lt;p&gt;&lt;a name=&#39;rc5-family-pp&#39;&gt;&lt;/a&gt;
&lt;div class=&#39;ui center aligned segment&#39;&gt;
  &lt;img class=&#34;ui centered image&#34; src=&#34;https://ianhowson.com/fpga/rc5-pp.svg&#34;&gt;
  &lt;p&gt;FPGA price/performance for RC5&lt;/p&gt;
&lt;/div&gt;&lt;/p&gt;

&lt;p&gt;Relative FPGA price and performance for RC5 is shown above.
It is similar to that for DES, but with a smaller gap between the Virtex
E and Virtex II/Virtex II Pro families. Detail near the low end is
also very similar. Relative pricing and performance within the family
remains the same as for DES.&lt;/p&gt;

&lt;h3 id=&#34;rc4&#34;&gt;RC4&lt;/h3&gt;

&lt;p&gt;&lt;a name=&#39;rc4-family-pp&#39;&gt;&lt;/a&gt;
&lt;div class=&#39;ui center aligned segment&#39;&gt;
  &lt;img class=&#34;ui centered image&#34; src=&#34;https://ianhowson.com/fpga/rc4-pp.svg&#34;&gt;
  &lt;p&gt;FPGA price/performance for RC4&lt;/p&gt;
&lt;/div&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;rc4-family-pp-zoomed&#39;&gt;&lt;/a&gt;
&lt;div class=&#39;ui center aligned segment&#39;&gt;
  &lt;img class=&#34;ui centered image&#34; src=&#34;https://ianhowson.com/fpga/rc4-pp-zoomed.svg&#34;&gt;
  &lt;p&gt;FPGA price/performance for RC4, showing low-end detail&lt;/p&gt;
&lt;/div&gt;&lt;/p&gt;

&lt;p&gt;RC4&amp;rsquo;s performance is entirely constrained by the number of RAM blocks
available on the FPGA. This gives quite different price/performance
results. From the figures, we can see that
the Virtex E and Virtex II families are quite similarly placed. Virtex
II Pro devices perform much better when cost is taken into account.
Examining the low-end detail shows that the Spartan IIE family remains competitive for far longer
than the Spartan 3, in contrast with the other results. Again, the
XC2VP20 remains the most cost-effective choice in the Virtex II Pro
family.&lt;/p&gt;

&lt;h2 id=&#34;cpu-price-performance-comparison&#34;&gt;CPU price/performance comparison&lt;/h2&gt;

&lt;p&gt;CPU pricing data was obtained from Sastradi Satria of OnLine Centre
&lt;a href=&#34;#ref47&#34;&gt;[47]&lt;/a&gt;. This is for quantities of 10 and is specified
in AUD without GST. This is useful for comparing CPUs, but makes comparing
price/performance ratios between CPUs and FPGAs difficult. Several
other pricing sources were located but not used due to accuracy or
quantity issues.&lt;/p&gt;

&lt;p&gt;Benchmark results are listed in &lt;a href=&#34;https://ianhowson.com/fpga/cpu-benchmarks/&#34;&gt;CPU benchmark results&lt;/a&gt;,
and should be interpreted taking into account the problems noted in &lt;a href=&#34;https://ianhowson.com/fpga/design/#Software-benchmark-results&#34;&gt;Software benchmark results&lt;/a&gt;. These results were scaled by
the clock speed to obtain performance estimates for each CPU that
is currently being sold. This assumes that performance will scale
linearly with CPU speed, which is generally true for exhaustive key
search.&lt;/p&gt;

&lt;p&gt;When comparing performance between CPUs, the SolNET benchmarks were
used because they have been performed on a wider variety of CPUs.
They are not directly applicable to CPU to FPGA comparisons.&lt;/p&gt;

&lt;p&gt;No pricing data was available for mobile Intel CPUs (PIII-M, P4-M,
Centrino). This would be useful when considering a very large-scale
key search machine based on CPUs; these CPUs use much less power and
generate far less heat. PIII-M, Centrino and G3 are particularly interesting
due to their high benchmark results at comparatively low clock rates.&lt;/p&gt;

&lt;p&gt;These comparisons ignore the cost of support hardware, which can be
expected to be several times that of the CPU device itself in some
cases.&lt;/p&gt;

&lt;h3 id=&#34;des-1&#34;&gt;DES&lt;/h3&gt;

&lt;p&gt;The figures below show the price and performance
of each CPU family for the DES cipher. We can see that the CPUs within
a family that achieve the highest search rates are disproportionately
expensive. It is scarcely worth trying to achieve a search rate over
12Mkeys/sec with an Athlon XP or 11Mkeys/sec with a Pentium 4 because
the price increases so steeply.&lt;/p&gt;

&lt;div class=&#39;ui center aligned segment&#39;&gt;
  &lt;img class=&#34;ui centered image&#34; src=&#34;https://ianhowson.com/fpga/des-cpu-families.svg&#34;&gt;
  &lt;p&gt;CPU price/performance by family for DES&lt;/p&gt;
&lt;/div&gt;

&lt;div class=&#39;ui center aligned segment&#39;&gt;
  &lt;img class=&#34;ui centered image&#34; src=&#34;https://ianhowson.com/fpga/des-cpu-pp.svg&#34;&gt;
  &lt;p&gt;CPU price/performance by device for DES&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;The Athlon XP curve contains a number of kinks; these occur because
pricing increases with their performance rating. This performance
rating is not in line with actual performance, however &amp;ndash; Barton core Athlons have a higher performance rating than their clock
speed (and measured performance) would suggest. We can see that the
Duron line appears to fit reasonably well with the Athlon XP line.
Celeron CPUs achieve higher performance than their price would otherwise
suggest.&lt;/p&gt;

&lt;p&gt;Examination of the data shows that the two Duron data points have
a linear price/performance relationship. In most practical systems,
it would thus be best to choose the faster of the two in order to
save on auxiliary costs (support hardware, space, etc.).&lt;/p&gt;

&lt;p&gt;The second figure above shows the price/performance
ratio for each device under consideration. We can see that the slowest
device in each family generally gives the best ratio. The Durons have
exactly the same price/performance ratio. The Athlon XP 2200+ and
Celeron 2200 provide a marginally better ratio than their neighbours.
All Pentium 4 devices are quite expensive for the performance that
they give. The exact ratio needed for a large-scale machine will depend
on the price of the support hardware, but in general any Duron, Celeron
or Athlon XP up to 2600+ will provide a good price/performance ratio.&lt;/p&gt;

&lt;h3 id=&#34;rc5-1&#34;&gt;RC5&lt;/h3&gt;

&lt;div class=&#39;ui center aligned segment&#39;&gt;
  &lt;img class=&#34;ui centered image&#34; src=&#34;https://ianhowson.com/fpga/rc5-cpu-families.svg&#34;&gt;
  &lt;p&gt;CPU price/performance by family for RC5&lt;/p&gt;
&lt;/div&gt;

&lt;div class=&#39;ui center aligned segment&#39;&gt;
  &lt;img class=&#34;ui centered image&#34; src=&#34;https://ianhowson.com/fpga/rc5-cpu-pp.svg&#34;&gt;
  &lt;p&gt;CPU price/performance by device for RC5&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;The top figure shows the price/performance
ratios of each CPU family for RC5. We can see that the Celeron and
Pentium 4 families are far less competitive for RC5; the most expensive
Pentium 4 HT device barely outperforms the cheapest Duron! The Duron
and Athlon XP families remain similarly positioned relative to each
other. The same kinks in the Athlon XP curve are apparent.&lt;/p&gt;

&lt;p&gt;The bottom figure shows the price/performance
ratios of each device for RC5. As with DES, the cheapest device in
each family provides the best ratio (with the minor exceptions noted
for DES). Unlike DES, however, the Celeron family is no longer as
competitive. Key search machine designers would do best to select
the Duron 1600 or a low-end Athlon XP. Both Pentium 4 varieties remain
very expensive for their performance.&lt;/p&gt;

&lt;h2 id=&#34;technology-comparison&#34;&gt;Technology comparison&lt;/h2&gt;

&lt;h3 id=&#34;cpus-and-fpgas&#34;&gt;CPUs and FPGAs&lt;/h3&gt;

&lt;h4 id=&#34;ciphers&#34;&gt;Ciphers&lt;/h4&gt;

&lt;p&gt;The pricing and performance data for CPUs and FPGAs is not directly
comparable. CPU prices are given in AUD for quantities of 10; FPGA
prices are given in USD for quantities of 24-99. The CPU performance
is based on benchmark results, while the FPGA performance is based
on synthesis estimates.&lt;/p&gt;

&lt;p&gt;Nonetheless, we can scale the CPU pricing based on the current exchange
rate, and scale the FPGA performance based on measured performance.
At the time of writing, one Australian dollar is worth 0.700639 U.S.
dollars. The predicted performance for the XCV1000E running DES was
894Mkeys/sec; achieved performance was 500Mkeys/sec. The DES CPU performance
also needs to be scaled up by approximately 2.5 to account for the
low speeds achieved by the SolNET client compared with the distributed.net
client. The predicted FPGA performance for RC5 matched quite closely
with the achieved performance (predictions were 1.0625 times faster.)
Scaling with these figures ignores many factors but will suffice for
this analysis.&lt;/p&gt;
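&lt;p&gt;The scaling described above can be sketched in a few lines of Python. This is an illustration only: the constants are the figures quoted in the text, while the function names and the sample inputs are assumptions introduced here and do not come from the original analysis.&lt;/p&gt;

```python
# Scaling factors quoted in the text above.
AUD_TO_USD = 0.700639            # exchange rate at the time of writing
DES_FPGA_PREDICTED = 894.0       # Mkeys/sec, XCV1000E synthesis estimate
DES_FPGA_ACHIEVED = 500.0        # Mkeys/sec, XCV1000E measured
DES_CPU_CLIENT_SCALE = 2.5       # SolNET result to distributed.net scale
RC5_FPGA_OVERESTIMATE = 1.0625   # predicted / achieved for RC5


def cpu_price_usd(price_aud):
    """Convert a CPU price from AUD to USD."""
    return price_aud * AUD_TO_USD


def des_cpu_rate(solnet_mkeys):
    """Scale a SolNET DES benchmark up to the distributed.net level."""
    return solnet_mkeys * DES_CPU_CLIENT_SCALE


def des_fpga_rate(predicted_mkeys):
    """Derate a DES synthesis estimate by the measured XCV1000E ratio."""
    return predicted_mkeys * DES_FPGA_ACHIEVED / DES_FPGA_PREDICTED


def rc5_fpga_rate(predicted_mkeys):
    """Derate an RC5 synthesis estimate by the measured ratio."""
    return predicted_mkeys / RC5_FPGA_OVERESTIMATE


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to exercise the functions.
    print(round(cpu_price_usd(100.0), 2))
    print(round(des_fpga_rate(894.0), 1))
```

&lt;p&gt;Applying the DES derating to every FPGA family assumes that the gap between predicted and achieved performance on the XCV1000E carries over to the other devices, which is one of the factors this simple scaling ignores.&lt;/p&gt;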

&lt;div class=&#39;ui center aligned segment&#39;&gt;
  &lt;img class=&#34;ui centered image&#34; src=&#34;https://ianhowson.com/fpga/des-cpu-vs-fpga.svg&#34;&gt;
  &lt;p&gt;CPU and FPGA family comparison for DES&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;The figure above shows the price/performance
ratios for each CPU and FPGA family. The entire CPU range is compressed
into the left-hand side of the graph; even at the high end, they do
not come anywhere near the search rate of a low-end FPGA. It can be
seen that searching DES on general purpose CPUs is very costly compared
to searching with FPGAs.&lt;/p&gt;

&lt;div class=&#39;ui center aligned segment&#39;&gt;
  &lt;img class=&#34;ui centered image&#34; src=&#34;https://ianhowson.com/fpga/rc5-cpu-vs-fpga.svg&#34;&gt;
  &lt;p&gt;CPU and FPGA family comparison for RC5&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;The figure above shows the same comparison
for the RC5 cipher with the less competitive FPGA families removed.
CPUs now perform better than FPGAs at the same price. They still cannot
match the performance offered by high-end FPGAs.&lt;/p&gt;

&lt;p&gt;These two comparisons show that the choice of implementation technology
can greatly affect the time and cost to perform an exhaustive key
search. In a practical key search machine, the technology must be
selected to match the cipher being attacked.&lt;/p&gt;

&lt;h4 id=&#34;in-general&#34;&gt;In general&lt;/h4&gt;

&lt;p&gt;Over time, FPGAs will most likely become more efficient for key search
than CPUs. This is because CPU performance does not scale linearly
with available silicon area; it is limited by bus speeds, interactions
between instructions and limited parallelism. FPGA performance for
key search will scale linearly; if twice as much silicon area is available,
twice as many search units can be implemented. FPGAs will thus become
more important in future cryptanalysis. Already, improved CPU performance
is becoming dependent on increasing parallelism; SIMD techniques and
HyperThreading are examples of this.&lt;/p&gt;

&lt;p&gt;CPUs are, of course, far easier to obtain than FPGAs. Many organisations
already have a large computing infrastructure that could be used to
perform key searches. distributed.net and other software RC5 efforts
have demonstrated the feasibility of this approach. FPGA hardware
is very rare in comparison, especially in the quantities that would
be needed to conduct key searches.&lt;/p&gt;

&lt;p&gt;CPUs need a large amount of support hardware (heatsinks, RAM, chipsets,
multi-voltage power supplies and so on) which drives up the cost of
a CPU-based key search machine. All of this hardware is cheap
and readily available, so the space needed to house it becomes the larger concern. FPGAs
have a clear advantage here; many FPGAs can be mounted on a card that
will fit within a computer case. It may be possible to design a motherboard
for commodity CPUs that has some of these advantages.&lt;/p&gt;

&lt;p&gt;Ultimately, neither CPUs nor FPGAs are very efficient for conducting
key searches compared with ASICs. CPUs are inefficient due to their
support hardware and program-based operation; FPGAs are inefficient
because of their generic hardware structure. ASICs have custom hardware
and a low unit price, but a very high initial cost.&lt;/p&gt;

&lt;h3 id=&#34;extrapolation-for-asics&#34;&gt;Extrapolation for ASICs&lt;/h3&gt;

&lt;p&gt;In &lt;a href=&#34;#ref48&#34;&gt;[48]&lt;/a&gt;, Craig Ulmer reports that ASIC implementations
can achieve three times the speed of an FPGA implementation, and ten
times the density. This is useful as a general guide, but not in this
analysis since the area required by FPGA dice is not easily obtained.&lt;/p&gt;

&lt;p&gt;Instead, we can estimate ASIC costs based on the gate count required
and infer the cost of a gate array device. During the Map phase of
FPGA compilation, ISE reports the &amp;lsquo;equivalent gate count&amp;rsquo; for an ASIC
implementation of the design. This is based partly on the data contained
within &lt;a href=&#34;#ref49&#34;&gt;[49]&lt;/a&gt;, and can be used to determine an approximate
ASIC cost. The DES implementation on the XCV1000E uses 453,968 gates,
and the RC5 implementation uses 1,353,397 gates. RC5&amp;rsquo;s large gate
count is due to the amount of RAM used, including the additional RAM
blocks used to reduce routing delays on the FPGA implementation. Both
of these figures include interface and controller logic. It would
also be possible to find a tradeoff between die size and final cost.&lt;/p&gt;

&lt;p&gt;According to &lt;a href=&#34;#ref50&#34;&gt;[50]&lt;/a&gt;, a Virtex E design
should be implementable with a CMOS-10HD gate array. This is designed
around a 250nm process, which seems reasonable for a 1:1 speed and
density conversion; the Virtex E family uses a 180nm process. No measurements
of the physical size of a CMOS-10HD die were available, but &lt;a href=&#34;#ref51&#34;&gt;[51]&lt;/a&gt;
claims 15k gates/mm&lt;sup&gt;2&lt;/sup&gt; for a CMOS-9HD die. Assuming that the number
of gates per mm&lt;sup&gt;2&lt;/sup&gt; scales linearly with feature size, we get approximately
30k gates/mm&lt;sup&gt;2&lt;/sup&gt;. Assuming a 50% gate utilisation ratio gives
us approximately 900k gates for DES, or a die area of approximately 30mm&lt;sup&gt;2&lt;/sup&gt;.&lt;/p&gt;
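&lt;p&gt;The die area estimate can be reproduced with a short calculation. The sketch below uses the DES gate count, utilisation ratio and gate density quoted above; the function name is hypothetical and was chosen for this illustration.&lt;/p&gt;

```python
def asic_die_estimate(used_gates, utilisation, gates_per_mm2):
    """Return (total gates on the array, die area in mm^2)."""
    total_gates = used_gates / utilisation
    return total_gates, total_gates / gates_per_mm2


# DES on the XCV1000E: 453,968 gates, 50% utilisation, 30k gates/mm^2.
total, area = asic_die_estimate(453968, 0.5, 30000)
print(f"{total:,.0f} gates, {area:.1f} mm^2")   # 907,936 gates, 30.3 mm^2
```

&lt;p&gt;The result is only as good as the assumed gate density, which was itself extrapolated from the CMOS-9HD figure.&lt;/p&gt;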

&lt;p&gt;MOSIS &lt;a href=&#34;#ref52&#34;&gt;[52]&lt;/a&gt; provides small-quantity ASIC fabrication.
They also provide an online price list &lt;a href=&#34;#ref53&#34;&gt;[53]&lt;/a&gt;. We select
the TSMC 250nm process (CL025) as one that should be suitable; other
250nm processes are approximately the same price. This gives a fabrication
cost of $44,200 for 40 parts. $2,500 more will be required for packaging
&lt;a href=&#34;#ref54&#34;&gt;[54]&lt;/a&gt;. This is a high average price per part,
but not a great deal more than the price of the XCV1000E. No pricing
data was available for larger quantities.&lt;/p&gt;
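&lt;p&gt;As a rough check on the average price per part, the fabrication and packaging costs can be spread over the 40 parts. This sketch assumes that the $2,500 packaging figure covers the whole lot; the $938 XCV1000E price is the one quoted later in this chapter.&lt;/p&gt;

```python
FAB_COST_USD = 44200      # MOSIS fabrication cost for the lot
PACKAGING_USD = 2500      # assumed to cover all 40 parts
PARTS = 40
XCV1000E_USD = 938        # FPGA price quoted later in this chapter

asic_per_part = (FAB_COST_USD + PACKAGING_USD) / PARTS
ratio = asic_per_part / XCV1000E_USD

print(f"ASIC: ${asic_per_part:,.2f} per part")
print(f"{ratio:.2f} times the XCV1000E price")
```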

&lt;div class=&#39;ui center aligned segment&#39;&gt;
  &lt;img class=&#34;ui centered image&#34; src=&#34;https://ianhowson.com/fpga/cpu-asic-fpga-comparison.svg&#34;&gt;
  &lt;p&gt;Comparison of CPUs, FPGAs and ASICs for DES&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;Without further pricing data, it is difficult to perform an intelligent
cost comparison involving ASICs. We can, however, use the price of
packaging as a bare minimum cost per device to determine a price point
at which ASICs become a viable option. The figure above
shows ASICs, CPUs and FPGAs compared at their best price points for
the DES cipher. We can see that to assemble a machine equivalent in
power to the EFF DES Cracker, FPGAs remain the most cost-effective
choice. ASICs do not become price-effective until the machine performance
reaches almost 400Gkeys/sec &amp;ndash; over four times the performance
of the EFF machine.&lt;/p&gt;

&lt;p&gt;Spartan 3 FPGAs were not included in this comparison due to their
uncertain pricing and performance. In addition, their $/Mkeys/sec
ratio is below that of the ASIC design given, meaning that the two
curves would never converge (as they would if all fabrication options
had been considered.) It will be interesting to update this analysis
when Spartan 3 pricing stabilises.&lt;/p&gt;

&lt;h2 id=&#34;comparison-with-other-des-fpga-results&#34;&gt;Comparison with other DES FPGA results&lt;/h2&gt;

&lt;p&gt;The figure below shows known FPGA DES key
search machines and the performance that was predicted by Blaze et
al. in &lt;a href=&#34;#ref2&#34;&gt;[2]&lt;/a&gt;. Extrapolating their estimates
with Moore&amp;rsquo;s Law gives an estimate of 1120Mkeys/sec for a $200 FPGA
today. Performance estimates for the FPGAs priced around $200 are
also shown.&lt;/p&gt;

&lt;div class=&#39;ui center aligned segment&#39;&gt;
  &lt;img class=&#34;ui centered image&#34; src=&#34;https://ianhowson.com/fpga/fpga-des-machines.svg&#34;&gt;
  &lt;p&gt;Previous FPGA DES key search machines and performance estimates&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;The graph shows that no implementation has matched the performance
predicted in 1996 for FPGA devices, regardless of the price of FPGA
device used. The implementation presented in this thesis moves closer
to predictions (as a percentage of expected performance) but still
falls short. It also uses an FPGA that costs $938 today, well in
excess of the $200 quote given. Other $200 FPGA devices are predicted
to achieve similar performance.&lt;/p&gt;

&lt;p&gt;The XC3S1000 is interesting; it has a very high capacity for its price.
It still falls short of the estimate, but not by much. Its predicted
price is also well under $200. It will be interesting to see if this
price remains accurate in Q1 2004. No larger Spartan 3 devices are
shipping yet, so a device with a price closer to $200 could not be
selected.&lt;/p&gt;

&lt;h2 id=&#34;large-scale-key-search-machines&#34;&gt;Large-scale key search machines&lt;/h2&gt;

&lt;h3 id=&#34;cpus-1&#34;&gt;CPUs&lt;/h3&gt;

&lt;p&gt;CPUs are more suited to RC5 key search than FPGAs. A large-scale machine
to complete the RSA RC5-72 challenge in one year might be considered.
This requires an average search rate of almost 150Tkeys/sec to conduct
a complete sweep of the key space (half a year on average to find
the key). The most cost-effective CPU is the Duron 1600, achieving
5.0Mkeys/sec at a price of $58. To reach the target search rate,
almost 30 million CPUs will be needed, costing over $1.7 billion.
This is before considering extras such as RAM, motherboards, power,
heat removal and storage space.&lt;/p&gt;

&lt;p&gt;At present, distributed.net is achieving a key rate of approximately
120Gkeys/sec &lt;a href=&#34;#ref55&#34;&gt;[55]&lt;/a&gt;. At the current rate, the RC5-72
challenge will probably be solved in 624 years.&lt;/p&gt;
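&lt;p&gt;The RC5-72 figures in the two paragraphs above follow from the size of the key space. The sketch below reproduces them using the quoted Duron 1600 rate and price and the quoted distributed.net rate; the function names are assumptions made for this illustration.&lt;/p&gt;

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def sweep_rate(key_bits, years):
    """Keys/sec needed to sweep the whole key space in the given time."""
    return 2 ** key_bits / (years * SECONDS_PER_YEAR)


def average_years(key_bits, keys_per_sec):
    """Expected years to find a key (half the key space on average)."""
    return 2 ** (key_bits - 1) / keys_per_sec / SECONDS_PER_YEAR


rate = sweep_rate(72, 1)                 # roughly 1.5e14 keys/sec
cpus = rate / 5.0e6                      # Duron 1600 at 5.0 Mkeys/sec
cost_usd = cpus * 58                     # USD 58 per CPU
dnet_years = average_years(72, 120e9)    # distributed.net at 120 Gkeys/sec

print(f"{rate / 1e12:.0f} Tkeys/sec, {cpus / 1e6:.1f} million CPUs")
print(f"${cost_usd / 1e9:.2f} billion, {dnet_years:.0f} years")
```

&lt;p&gt;Note that the 624 year figure is the expected time to find the key; a complete sweep at the same rate would take twice as long.&lt;/p&gt;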

&lt;p&gt;A more feasible machine might attempt to match the performance of
the EFF DES Cracker, which achieved 92.6Gkeys/sec using a large number
of gate array ASICs. The most cost-effective CPU is again the Duron
1600, achieving 21.5Mkeys/sec (distributed.net scale) for $58. Over
4300 CPUs would be needed at a cost of over $250,000. This is not
excessively expensive, but again ignores support hardware and other
extras.&lt;/p&gt;

&lt;h3 id=&#34;fpgas&#34;&gt;FPGAs&lt;/h3&gt;

&lt;p&gt;FPGAs perform extremely well for DES key search. A machine matching
the speed of the EFF DES Cracker could be constructed from XC2S200E
devices ($25, 149Mkeys/sec). Spartan 3 devices were not considered
due to their unstable pricing. 622 devices would be needed, at a total
cost of $15,540. The EFF machine spent $130,000 on materials; it
is not clear how much of this was spent on ASIC fabrication.&lt;/p&gt;

&lt;p&gt;Alternatively, XC2VP20 devices could be used. They are slightly more
expensive at a given search rate, but far fewer devices would be needed.
This would reduce auxiliary costs significantly. To match the performance
of the EFF DES Cracker, a mere 78 devices would be needed. Each device
costs $299, giving a total component cost of $23,322. In contrast,
the EFF machine used 1536 devices spanning many circuit boards and
several physical cabinets.&lt;/p&gt;
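&lt;p&gt;The device counts above are a ceiling division of the target search rate by the per-device rate. A minimal sketch, using the rates and prices quoted in the text (the helper name is hypothetical):&lt;/p&gt;

```python
EFF_RATE_MKEYS = 92600      # EFF DES Cracker: 92.6 Gkeys/sec


def devices_needed(target_mkeys, device_mkeys):
    """Smallest whole number of devices that reaches the target rate."""
    return -(-target_mkeys // device_mkeys)   # ceiling division on integers


# XC2S200E: 149 Mkeys/sec each, price rounded to USD 25 in the text.
xc2s200e_count = devices_needed(EFF_RATE_MKEYS, 149)
xc2s200e_cost = xc2s200e_count * 25

# XC2VP20: the text gives 78 devices at USD 299 each.
xc2vp20_cost = 78 * 299

print(xc2s200e_count, xc2s200e_cost, xc2vp20_cost)   # 622 15550 23322
```

&lt;p&gt;The XC2S200E total computed from the rounded $25 price is slightly higher than the $15,540 quoted above, which presumably reflects an unrounded unit price.&lt;/p&gt;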

&lt;p&gt;At the top end of the FPGA spectrum, XC2VP100 devices could be used.
These are the largest Xilinx devices that are currently shipping.
Only 16 devices would be required. The total device cost would be
$89,264, but the physical space consumed by the machine would be
very small &amp;ndash; less than that of a single board in the EFF
DES Cracker.&lt;/p&gt;

&lt;p&gt;FPGAs are generally more expensive than CPUs when performing RC5 key
searches, and so will not be considered. Using the theoretical pipelined
design may be profitable; a single (expensive) FPGA could search 100&amp;ndash;200Mkeys/sec.&lt;/p&gt;

&lt;h3 id=&#34;accounting-for-hardware-costs&#34;&gt;Accounting for hardware costs&lt;/h3&gt;

&lt;p&gt;The device cost of a large-scale key search machine is not the only
factor affecting a machine&amp;rsquo;s cost. All technologies require circuit
boards, controllers, assembly, testing, power, storage and cooling
considerations to be addressed.&lt;/p&gt;

&lt;h4 id=&#34;asics-and-fpgas&#34;&gt;ASICs and FPGAs&lt;/h4&gt;

&lt;p&gt;ASIC and FPGA implementations can use the estimates provided by Wiener
&lt;a href=&#34;#ref15&#34;&gt;[15]&lt;/a&gt;. A circuit board that can support 120 small
package ICs is reported to cost $300. The devices used by Wiener
are 18mm square. FG676 packages (as used on the XC2VP20) are 27mm
square, fitting approximately 35 devices per board. PQ208 packages
(as used on the XC2S200E) are approximately the same size (28mm). The
microcontrollers used by Wiener are not needed since controllers can
be integrated into the FPGAs. Assuming that only one FG676 or PQ208
can fit into the space occupied by four of Wiener&amp;rsquo;s ASICs allows 35
devices per board.&lt;/p&gt;

&lt;p&gt;From this we can see the value of high-density devices. One XC2VP20
has eight times the performance of an XC2S200E. A machine capable of
100Gkeys/sec would take just over four days on average to find
a DES key. Taking into account circuit board costs, a machine using XC2S200E
devices would span 671 devices, 20 boards and cost $22,641. A similar
machine using XC2VP20 devices would span 84 devices, three boards
and cost $26,016. Factoring in controllers, power supplies and mechanical
concerns according to Wiener&amp;rsquo;s figures gives a total of $33,741 for
the XC2S200E machine and $33,816 for the XC2VP20 machine. Power consumption,
heat generation and storage space for the XC2VP20 machine would be
significantly lower at only a very small increase in total price.&lt;/p&gt;
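&lt;p&gt;The board counts and the XC2VP20 device-plus-board cost can be checked with a short sketch. It assumes 35 devices per $300 board as stated above; the controller, power supply and mechanical costs taken from Wiener are not modelled, and the function names are hypothetical.&lt;/p&gt;

```python
BOARD_COST_USD = 300
DEVICES_PER_BOARD = 35


def boards_needed(devices):
    """Whole boards required to carry the given number of devices."""
    return -(-devices // DEVICES_PER_BOARD)   # ceiling division


def device_and_board_cost(devices, unit_price_usd):
    """Device cost plus circuit board cost, ignoring all other hardware."""
    return devices * unit_price_usd + boards_needed(devices) * BOARD_COST_USD


def average_search_days(key_bits, keys_per_sec):
    """Expected days to find a key (half the key space on average)."""
    return 2 ** (key_bits - 1) / keys_per_sec / 86400


print(boards_needed(671), boards_needed(84))          # 20 3
print(device_and_board_cost(84, 299))                 # 26016
print(round(average_search_days(56, 100e9), 2))       # 4.17
```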

&lt;p&gt;Using the preliminary pricing for XC3S1000 devices gives 89 devices,
three boards and $13,663. Most of the cost in this machine is devoted
to auxiliary hardware, meaning that higher speed machines would cost
less in relation to the key rate achieved.&lt;/p&gt;

&lt;div class=&#39;ui center aligned segment&#39;&gt;
  &lt;img class=&#34;ui centered image&#34; src=&#34;https://ianhowson.com/fpga/cost-vs-speed.svg&#34;&gt;
  &lt;p&gt;Cost of key search machines and their expected search time&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;The figure above shows the estimated cost to find a DES
key in a given amount of time. Wiener&amp;rsquo;s estimates, the EFF DES Cracker
and the estimates proposed by this work for XC2VP20 devices are shown.
We can see that to achieve very short search times a lot of money
must be spent, and vice-versa.&lt;/p&gt;

&lt;h4 id=&#34;cpus-2&#34;&gt;CPUs&lt;/h4&gt;

&lt;p&gt;Commodity computer hardware pricing is needed to infer the remainder
of the hardware costs. Assuming that computers can boot from a network,
each CPU will need a heatsink, motherboard, case, power supply, network
adapter and a small amount of RAM. Using an all-in-one motherboard
and low quality case reduces costs significantly. Each CPU will need
approximately AUD$175 in support hardware.&lt;/p&gt;

&lt;h2 id=&#34;key-lengths&#34;&gt;Key lengths&lt;/h2&gt;

&lt;p&gt;It has been shown time and time again that using a long key is the
best way to protect a cipher from an exhaustive key search attack.
Assuming a cipher with a 90 bit key length (as recommended in &lt;a href=&#34;#ref2&#34;&gt;[2]&lt;/a&gt;)
that can be attacked at the same speed as DES, 132 billion XC2S200E
devices would be needed to cover the key space in a year, at a price
of $3.3 trillion. Even using the best available Spartan 3 device
(XC3S400) will cost $850 billion and need over 35 billion devices.
Of course, attacks become even more infeasible if the key length is
increased only slightly.&lt;/p&gt;

&lt;h3 id=&#34;capabilities&#34;&gt;Capabilities&lt;/h3&gt;

&lt;p&gt;Well-resourced entities can feasibly attack ciphers that use long
key lengths. Assuming the performance of the XC2VP20 machine is maintained
regardless of key length, we can see the cost to obtain a key within
a year in the table below. If an attacker is
prepared to wait for a year, 56 bit ciphers like DES are trivial to
defeat using current technology. Each additional bit added to the
key doubles the cost to break it within one year.&lt;/p&gt;

&lt;table class=&#39;ui celled attached table&#39;&gt;
  &lt;thead&gt;
    &lt;tr&gt;&lt;th&gt;Key length&lt;/th&gt;&lt;th&gt;Cost&lt;/th&gt;&lt;th&gt;Potential attacker&lt;/th&gt;&lt;/tr&gt;
  &lt;/thead&gt;

  &lt;tbody&gt;
    &lt;tr&gt;&lt;td&gt;56&lt;/td&gt;&lt;td&gt;$305&lt;/td&gt;&lt;td&gt;Bored teenager&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;60&lt;/td&gt;&lt;td&gt;$4,900&lt;/td&gt;&lt;td&gt;Employed adult&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;64&lt;/td&gt;&lt;td&gt;$78,000&lt;/td&gt;&lt;td&gt;Business department&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;68&lt;/td&gt;&lt;td&gt;$1.25 million&lt;/td&gt;&lt;td&gt;Large business&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;72&lt;/td&gt;&lt;td&gt;$20 million&lt;/td&gt;&lt;td&gt;Small government&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;76&lt;/td&gt;&lt;td&gt;$320 million&lt;/td&gt;&lt;td&gt;Large government entity&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;80&lt;/td&gt;&lt;td&gt;$5.1 billion&lt;/td&gt;&lt;td&gt;Significant inter-government collaboration&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;92&lt;/td&gt;&lt;td&gt;$21 trillion&lt;/td&gt;&lt;td&gt;Infeasible?&lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;div class=&#39;ui centered attached header&#39;&gt;Cost to obtain a key in one year and potential attackers&lt;/div&gt;
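&lt;p&gt;Because each additional key bit doubles the cost, the table can be regenerated from its first row. The sketch below takes the $305 figure for a 56 bit key as given and is only an illustration of the doubling rule; the function name is hypothetical.&lt;/p&gt;

```python
BASE_BITS = 56
BASE_COST_USD = 305   # one-year cost for a 56 bit key, from the table


def one_year_cost(key_bits):
    """Cost to obtain a key within one year, doubling per extra bit."""
    return BASE_COST_USD * 2 ** (key_bits - BASE_BITS)


for bits in (56, 60, 64, 68, 72, 76, 80, 92):
    print(bits, f"${one_year_cost(bits):,}")
```

&lt;p&gt;The printed values round to the figures in the table; for example, 92 bits gives approximately $21 trillion.&lt;/p&gt;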

&lt;h3 id=&#34;minimum-key-lengths&#34;&gt;Minimum key lengths&lt;/h3&gt;

&lt;p&gt;The minimum key length required for a system depends on who the potential
attackers will be. Assuming that we want messages encrypted today
to be completely undecipherable to all known attackers, a 92 bit minimum
key length seems to be appropriate.&lt;/p&gt;

&lt;p&gt;Future data security must also be considered. Moore&amp;rsquo;s Law is currently
the de facto method of predicting future computing capabilities. Over
an 18 month period, Moore&amp;rsquo;s Law states that transistors per IC (or
computing power) will double. If we apply this to a 20 year period,
messages must be encrypted with 14 bits of additional key length to
remain secure against all known potential attackers. 106 bits appears
to be a suitable minimum key length to keep data secure for the next
20 years.&lt;/p&gt;

&lt;p&gt;Data that needs to be kept secure over a longer period of time (such
as census data) will need an even longer key. To keep data secure
for the next hundred years, a 159 bit key seems appropriate.&lt;/p&gt;
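&lt;p&gt;The additional key length follows from counting Moore&amp;rsquo;s Law doublings and rounding up. A minimal sketch, assuming the 92 bit baseline and the 18 month doubling period used above (the function name is hypothetical):&lt;/p&gt;

```python
BASE_BITS = 92   # minimum key length judged appropriate today


def extra_bits(years, doubling_months=18):
    """Whole key bits needed to offset the doublings over the period."""
    months = years * 12
    return -(-months // doubling_months)   # ceiling division


print(BASE_BITS + extra_bits(20))    # 106
print(BASE_BITS + extra_bits(100))   # 159
```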

&lt;p&gt;All of these estimates ignore future computing technologies that may
become available or new cryptanalytic techniques which may decrease
the strength of a given cipher. Predicting suitable key lengths can
be likened to telling the future. Given the low additional processing
cost, using a key length of 192 or 256 bits should protect data from
all known potential attackers in the foreseeable future.&lt;/p&gt;

&lt;h3 id=&#34;alternatives&#34;&gt;Alternatives&lt;/h3&gt;

&lt;p&gt;When attacking a well-designed system that uses cryptography, it is
rarely profitable to attack the ciphers themselves. Normally there
are weaker areas of the system, such as software bugs, inadequate
security policies, weak passwords and the people involved in the system.
It will almost certainly be easier to exploit one of these areas (particularly
people) to complete an attack against a system.&lt;/p&gt;

&lt;h2 id=&#34;regulatory-issues&#34;&gt;Regulatory issues&lt;/h2&gt;

&lt;p&gt;One of the strongest reasons that this research is valuable is in
the context of legal restrictions on the use and export of strong
cryptography. With it we can evaluate different ciphers in the face
of key length restrictions, as well as the availability of the technology
required to perform an exhaustive key search attack.&lt;/p&gt;

&lt;h3 id=&#34;cryptography-controls&#34;&gt;Cryptography controls&lt;/h3&gt;

&lt;p&gt;Unless otherwise referenced, information in this section was drawn
from &lt;a href=&#34;#ref56&#34;&gt;[56]&lt;/a&gt;, &lt;a href=&#34;#ref57&#34;&gt;[57]&lt;/a&gt; and &lt;a href=&#34;#ref58&#34;&gt;[58]&lt;/a&gt;.
They should be consulted for more details.&lt;/p&gt;

&lt;p&gt;Local regulations on cryptography have changed significantly in recent
years. Australia is a party to the Wassenaar Arrangement, which restricts
the export of &amp;ldquo;dual-use goods&amp;rdquo;, including encryption. It is vague
in parts (particularly as to what constitutes an &amp;ldquo;export&amp;rdquo;), but
sufficiently restrictive to raise concerns. Australia&amp;rsquo;s regulations
are more restrictive in that any export requires approval from the
Defence Signals Directorate (DSD), which deals with Australia&amp;rsquo;s signals
intelligence and information security &lt;a href=&#34;#ref59&#34;&gt;[59]&lt;/a&gt;. The &lt;em&gt;Defence and Strategic Goods List&lt;/em&gt; &lt;a href=&#34;#ref60&#34;&gt;[60]&lt;/a&gt; describes goods which
may be subject to export controls, including encryption. Many different
products are covered by the legislation, including nuclear, biological,
optical, semiconductor and other technology goods. It is frequently
updated.&lt;/p&gt;

&lt;p&gt;Obtaining export approval usually involves submitting an export application
&lt;a href=&#34;#ref61&#34;&gt;[61]&lt;/a&gt; with the DSD. To date no applications have
been rejected, although some companies have been informed that their
applications will be rejected without having applied. An early assessment
of cryptographic goods can be performed to determine if export approval
will be required for that good &lt;a href=&#34;#ref62&#34;&gt;[62]&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;There are no restrictions on the use of cryptography within Australia
or the importation of cryptography into Australia.&lt;/p&gt;

&lt;p&gt;The United States has comparatively tight controls on cryptographic
exports. Currently, symmetric cryptography using up to 56 bit keys
may be exported once it has undergone a one-time review. Export
of any cryptography is not permitted to seven &amp;ldquo;terrorist countries&amp;rdquo;
(also known as &amp;ldquo;Tier 4&amp;rdquo;) &lt;a href=&#34;#ref63&#34;&gt;[63]&lt;/a&gt;. As can be seen in
the results from this thesis, 56 bit symmetric cryptography is not
very resistant to brute force attacks. Someone operating under these
constraints would be advised to select a cipher that is relatively
slow and expensive to attack, such as RC5.&lt;/p&gt;

&lt;p&gt;The legal export status of this thesis and its accompanying CD can
be questioned. The thesis itself is probably safe to export without
approval, since it does not contain any cryptographic algorithms.
The CD can almost certainly not be exported without approval, since
it contains cryptographic source code.&lt;/p&gt;

&lt;h3 id=&#34;computing-controls&#34;&gt;Computing controls&lt;/h3&gt;

&lt;p&gt;Regulatory issues exist for exports of high performance computers
to certain countries. This was most visible when exports of the Playstation
II gaming console to China were denied &lt;a href=&#34;#ref64&#34;&gt;[64]&lt;/a&gt; for fears
that they may enhance China&amp;rsquo;s military capability. The regulations
have recently been updated to increase the allowed performance of
exported devices &lt;a href=&#34;#ref65&#34;&gt;[65]&lt;/a&gt;. Computer exports are controlled
for &amp;ldquo;Tier 3&amp;rdquo; countries, which generally includes any countries
that are not allied with the United States. Exemptions can be obtained
to bypass these controls.&lt;/p&gt;

&lt;p&gt;It is not clear whether FPGA devices are subject to export controls,
but it would almost certainly be easy to force designs to fall under
various performance classifications.&lt;/p&gt;

&lt;h2 id=&#34;references&#34;&gt;References&lt;/h2&gt;

&lt;p&gt;&lt;a name=&#39;ref2&#39;&gt;&lt;/a&gt;[2] M. Blaze, W. Diffie, R. L. Rivest, B. Schneier, T. Shimomura, E. Thompson, and M. Wiener, &amp;ldquo;Minimal key lengths for symmetric ciphers to provide adequate commercial security,&amp;rdquo; A Report by an Ad Hoc Group of Cryptographers and Computer Scientists, January 1996. [Online]. Available: &lt;a href=&#34;http://www.schneier.com/paper-keylength.pdf&#34;&gt;http://www.schneier.com/paper-keylength.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref9&#39;&gt;&lt;/a&gt;[9] E. Biham, &amp;ldquo;A fast new DES implementation in software,&amp;rdquo; &lt;em&gt;Lecture Notes in Computer Science&lt;/em&gt;, vol. 1267, pp. 260 ??, 1997. [Online]. Available: &lt;a href=&#34;http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/1997/CS/CS08%91.ps.gz&#34;&gt;http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/1997/CS/CS08%91.ps.gz&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref36&#39;&gt;&lt;/a&gt;[36] Xilinx, Inc. SRL16 16-bit shift register look-up-table (LUT). [Online]. Available: &lt;a href=&#34;http://toolbox.xilinx.com/docsan/xilinx5/data/docs/lib/lib0393_377.html&#34;&gt;http://toolbox.xilinx.com/docsan/xilinx5/data/docs/lib/lib0393_377.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref39&#39;&gt;&lt;/a&gt;[39] P. Alfke and B. New, &amp;ldquo;Multiplexers and barrel shifters in XC3000/XC3100,&amp;rdquo; Xilinx, Inc., Tech. Rep. [Online]. Available: &lt;a href=&#34;http://direct.xilinx.com/bvdocs/appnotes/xapp026.pdf&#34;&gt;http://direct.xilinx.com/bvdocs/appnotes/xapp026.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref44&#39;&gt;&lt;/a&gt;[44] A. Biryukov, A. Shamir, and D. Wagner, &amp;ldquo;Real time cryptanalysis of A5/1 on a PC,&amp;rdquo; &lt;em&gt;Lecture Notes in Computer Science&lt;/em&gt;, vol. 1978, pp. 1 ff., 2001.&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref45&#39;&gt;&lt;/a&gt;[45] (2003, October) Avnet electronics marketing. [Online]. Available: &lt;a href=&#34;http://em.avnet.com/&#34;&gt;http://em.avnet.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref46&#39;&gt;&lt;/a&gt;[46] E. Peltzer, October 2003, private communication.&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref47&#39;&gt;&lt;/a&gt;[47] S. Satria, October 2003, private communication.&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref48&#39;&gt;&lt;/a&gt;[48] C. D. Ulmer, &amp;ldquo;Configurable Computing: Practical Use of Field Programmable Gate Arrays,&amp;rdquo; Ph.D. dissertation, School of Electrical and Computer Engineering, Georgia Institute of Technology, January 1999. [Online]. Available: &lt;a href=&#34;http://users.ece.gatech.edu/~grimace/research/reports/qual_report.pdf&#34;&gt;http://users.ece.gatech.edu/~grimace/research/reports/qual_report.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref49&#39;&gt;&lt;/a&gt;[49] Gate count capacity metrics for FPGAs, Feb 1997. [Online]. Available: &lt;a href=&#34;http://www.xilinx.com/bvdocs/appnotes/xapp059.pdf&#34;&gt;http://www.xilinx.com/bvdocs/appnotes/xapp059.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref50&#39;&gt;&lt;/a&gt;[50] NEC Electronics America, Inc. FPGA to ASIC Conversion. [Online]. Available: &lt;a href=&#34;http://www.necelam.com/asic/conversion.cfm&#34;&gt;http://www.necelam.com/asic/conversion.cfm&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref51&#39;&gt;&lt;/a&gt;[51] NEC Electronics. NEC: Gate array information. [Online]. Available: &lt;a href=&#34;http://www.necgatearray.com/content.nsf/webpages/gatearrayinfo&#34;&gt;http://www.necgatearray.com/content.nsf/webpages/gatearrayinfo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref52&#39;&gt;&lt;/a&gt;[52] The MOSIS Service. MOSIS Integrated Circuit Fabrication Service. [Online]. Available: &lt;a href=&#34;http://www.mosis.org/&#34;&gt;http://www.mosis.org/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref53&#39;&gt;&lt;/a&gt;[53] The MOSIS Service. Domestic Price List for MOSIS IC Prototyping Service. [Online]. Available: &lt;a href=&#34;http://www.mosis.org/Orders/Prices/price-list-domestic.html#tsmc25_logi%c&#34;&gt;http://www.mosis.org/Orders/Prices/price-list-domestic.html#tsmc25_logi%c&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref54&#39;&gt;&lt;/a&gt;[54] The MOSIS Service. MOSIS Domestic Price List for ASAT Plastic Packages. [Online]. Available: &lt;a href=&#34;http://www.mosis.org/products/assembly/plastic/price_domestic_asat.html&#34;&gt;http://www.mosis.org/products/assembly/plastic/price_domestic_asat.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref55&#39;&gt;&lt;/a&gt;[55] distributed.net. RC5-72 Live Stats. [Online]. Available: &lt;a href=&#34;http://www1.distributed.net/~pstadt/rc5-72/&#34;&gt;http://www1.distributed.net/~pstadt/rc5-72/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref56&#39;&gt;&lt;/a&gt;[56] G. Pure and G. Taylor. The Australian cryptography FAQ. [Online]. Available: &lt;a href=&#34;http://www.efa.org.au/Issues/Crypto/cryptfaq.html&#34;&gt;http://www.efa.org.au/Issues/Crypto/cryptfaq.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref57&#39;&gt;&lt;/a&gt;[57] B.-J. Koops. Crypto law survey. [Online]. Available: &lt;a href=&#34;http://rechten.uvt.nl/koops/cryptolaw/cls2.htm&#34;&gt;http://rechten.uvt.nl/koops/cryptolaw/cls2.htm&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref58&#39;&gt;&lt;/a&gt;[58] Electronic Frontiers Australia Inc. Crypto politics. [Online]. Available: &lt;a href=&#34;http://www.efa.org.au/Issues/Crypto/crypto2.html&#34;&gt;http://www.efa.org.au/Issues/Crypto/crypto2.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref59&#39;&gt;&lt;/a&gt;[59] Defence Signals Directorate. Defence Signals Directorate. [Online]. Available: &lt;a href=&#34;http://www.dsd.gov.au/&#34;&gt;http://www.dsd.gov.au/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref60&#39;&gt;&lt;/a&gt;[60] Department of Defence. Defence and strategic goods list. [Online]. Available: &lt;a href=&#34;http://www.defence.gov.au/dmo/id/export/DSGL_2003.pdf&#34;&gt;http://www.defence.gov.au/dmo/id/export/DSGL_2003.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref61&#39;&gt;&lt;/a&gt;[61] Department of Defence. Export application. [Online]. Available: &lt;a href=&#34;http://www.defence.gov.au/dmo/id/export/dsec/AC717_Oct_03.pdf&#34;&gt;http://www.defence.gov.au/dmo/id/export/dsec/AC717_Oct_03.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref62&#39;&gt;&lt;/a&gt;[62] Department of Defence. Application for one time review. [Online]. Available: &lt;a href=&#34;http://www.defence.gov.au/dmo/id/export/dsec/Onetime.pdf&#34;&gt;http://www.defence.gov.au/dmo/id/export/dsec/Onetime.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref63&#39;&gt;&lt;/a&gt;[63] U. S. Bureau of Industry and Security. HPC - CTP Chart. [Online]. Available: &lt;a href=&#34;http://www.bxa.doc.gov/HPCs/ctpchart.htm&#34;&gt;http://www.bxa.doc.gov/HPCs/ctpchart.htm&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref64&#39;&gt;&lt;/a&gt;[64] D. E. Sanger. Letting the Chips Fall Where They May. [Online]. Available: &lt;a href=&#34;http://www.nytimes.com/library/review/061399china-chips-review.html&#34;&gt;http://www.nytimes.com/library/review/061399china-chips-review.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref65&#39;&gt;&lt;/a&gt;[65] Deputy Press Secretary. President changes export controls on computers. [Online]. Available: &lt;a href=&#34;http://www.whitehouse.gov/news/releases/2002/01/20020102-3.html&#34;&gt;http://www.whitehouse.gov/news/releases/2002/01/20020102-3.html&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Conclusion</title>
      <link>https://ianhowson.com/fpga/conclusion/</link>
      <pubDate>Mon, 20 Oct 2003 00:00:00 +0000</pubDate>
      <author>ian@mutexlabs.com (Ian Howson)</author>
      <guid>https://ianhowson.com/fpga/conclusion/</guid>
      <description>

&lt;p&gt;From the analyses presented in &lt;a href=&#34;https://ianhowson.com/fpga/analysis/&#34;&gt;Chapter 4&lt;/a&gt;, we can see that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Key length, available resources and cipher design are the three main
factors that influence the time taken to conduct an exhaustive key
search attack.&lt;/li&gt;
&lt;li&gt;Different implementation technologies favour different ciphers. DES
key searches are best performed with FPGAs, while RC5 key searches
are best performed with CPUs.&lt;/li&gt;
&lt;li&gt;The resource usage of a pipelined FPGA cipher implementation is dependent
on the frequency of register access, state size, number of rounds
and complexity of the round function.&lt;/li&gt;
&lt;li&gt;Frequency of register access is one of the biggest factors affecting
resource usage for pipelined FPGA cipher implementations.&lt;/li&gt;
&lt;li&gt;If sufficient FPGA resources are available, pipelined cipher implementations
will perform far better than iterative cipher implementations.&lt;/li&gt;
&lt;li&gt;Based on preliminary pricing, Spartan 3 FPGAs (particularly the XC3S400)
have the best price/performance ratio.&lt;/li&gt;
&lt;li&gt;Based on stable pricing, Spartan IIE FPGAs (particularly the XC2S200E)
have the best price/performance ratio, followed closely by the XC2VP20
(which has a much higher density).&lt;/li&gt;
&lt;li&gt;Duron and low-end Athlon XP CPUs provide the best price/performance
ratio.&lt;/li&gt;
&lt;li&gt;The performance estimates in &lt;a href=&#34;#ref2&#34;&gt;[2]&lt;/a&gt; are
most likely optimistic. This view is shared by Goldberg and Wagner
&lt;a href=&#34;#ref16&#34;&gt;[16]&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The cost of conducting a ciphertext-only attack with FPGAs depends
on the cipher. The additional resource cost is quite small for DES,
but significant for RC5. Ciphertext-only attacks favour large, fast
search units over small, slow search units.&lt;/li&gt;
&lt;li&gt;A machine similar to the EFF DES cracker &lt;a href=&#34;#ref10&#34;&gt;[10]&lt;/a&gt;
could be built from FPGAs for approximately $34,000, a fraction of
the price of the original machine.&lt;/li&gt;
&lt;li&gt;CPUs are more cost-effective for RC5 key searches than FPGAs, although
it remains to be seen whether this holds for a pipelined RC5
implementation.&lt;/li&gt;
&lt;li&gt;Cryptography that is restricted to a 56-bit key length by export controls
provides little protection against a well-funded or patient adversary.&lt;/li&gt;
&lt;li&gt;FPGAs will play a greater part in cryptanalysis in the future.&lt;/li&gt;
&lt;/ul&gt;
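&lt;p&gt;The first conclusion, that key length and available resources drive the search time, follows from simple arithmetic: on average, half of the keyspace is searched before the correct key is found. The sketch below is a minimal illustration of that relationship. The function names, the number of search units and the per-unit search rate are placeholders chosen for the example, not figures from the analysis.&lt;/p&gt;

```python
def expected_search_seconds(key_bits, units, keys_per_second_per_unit):
    """Average time to find a key by exhaustive search.

    On average, half of the 2**key_bits keyspace is searched before the
    correct key turns up.
    """
    keyspace = 2 ** key_bits
    aggregate_rate = units * keys_per_second_per_unit
    return (keyspace / 2) / aggregate_rate


def seconds_to_days(seconds):
    return seconds / 86400.0


# Hypothetical machine: 100 search units at 1e9 keys/s each.
# Both figures are placeholders, not measurements.
days_56 = seconds_to_days(expected_search_seconds(56, 100, 1e9))
days_64 = seconds_to_days(expected_search_seconds(64, 100, 1e9))
print(f"56-bit: {days_56:.1f} days, 64-bit: {days_64:.1f} days")
```

&lt;p&gt;Each additional key bit doubles the expected search time, so the same hypothetical machine takes 256 times longer on a 64-bit key than on a 56-bit key.&lt;/p&gt;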

&lt;p&gt;From these conclusions, we can see that in the right situations FPGAs
are very useful cryptanalytic tools. Their low price and high performance
allow key search attacks to be conducted at very low cost. If physical
space is a concern, they can achieve much higher search rates
per device than CPUs, even for ciphers that are designed for CPUs.&lt;/p&gt;
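&lt;p&gt;The difference between pipelined and iterative implementations noted in the conclusions can be illustrated with an idealised per-unit model. This is a sketch only: it assumes both designs run at the same clock rate and ignores control overhead, and the 100 MHz clock is a placeholder rather than a measured figure. A full pipeline also has to replicate the round logic for every stage, which is why it needs sufficient FPGA resources.&lt;/p&gt;

```python
DES_ROUNDS = 16  # DES applies 16 rounds per block


def keys_per_second(clock_hz, rounds, pipelined):
    """Idealised search rate of one search unit.

    A full pipeline holds one key per stage and finishes one key every
    clock cycle; an iterative unit reuses a single round circuit and
    needs `rounds` cycles per key.
    """
    if pipelined:
        return clock_hz
    return clock_hz / rounds


clock_hz = 100e6  # hypothetical 100 MHz clock, not a measured figure
pipelined_rate = keys_per_second(clock_hz, DES_ROUNDS, pipelined=True)
iterative_rate = keys_per_second(clock_hz, DES_ROUNDS, pipelined=False)
print(pipelined_rate / iterative_rate)
```

&lt;p&gt;Under these assumptions, one pipelined DES unit tests keys 16 times faster than one iterative unit at the same clock rate.&lt;/p&gt;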

&lt;p&gt;The EFF DES cracker can be reproduced now using FPGAs at a cost of
about $34,000. At a price this low, DES should not be used for anything
remotely secure. Government concessions to allow the export of 56-bit
cryptography completely defeat the purpose of using cryptography.&lt;/p&gt;
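&lt;p&gt;The link between a machine&amp;rsquo;s cost and its search time can be sketched in the other direction: given a target average search time, how many devices are needed and what do they cost? The device search rate and price below are placeholders chosen for illustration. They are not the figures behind the $34,000 estimate, and the function name is hypothetical.&lt;/p&gt;

```python
import math


def machine_cost(key_bits, target_days, keys_per_second_per_device, price_per_device):
    """Number of devices, and their total price, needed to meet an
    average exhaustive search time of `target_days`."""
    keys_to_try = 2 ** (key_bits - 1)  # half the keyspace on average
    required_rate = keys_to_try / (target_days * 86400)
    devices = math.ceil(required_rate / keys_per_second_per_device)
    return devices, devices * price_per_device


# Placeholder device: 1e9 keys/s for $50. Neither figure comes from the thesis.
devices, cost = machine_cost(56, 5, 1e9, 50)
print(devices, cost)
```

&lt;p&gt;The total counts device prices only. Boards, power supplies and cooling would add to it.&lt;/p&gt;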

&lt;p&gt;FPGAs will play an increasing role in future cryptanalysis as the
gap between CPU and FPGA performance for a given price widens.&lt;/p&gt;

&lt;h2 id=&#34;future-work&#34;&gt;Future work&lt;/h2&gt;

&lt;p&gt;Possible extensions to this work include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Update the price/performance analyses as time progresses. This would
allow the security of ciphers to be continually tracked and give a
general idea of the rate of improvement in FPGA and CPU technology.&lt;/li&gt;
&lt;li&gt;Analyse more ciphers and determine their resistance to exhaustive
key search using various technologies.&lt;/li&gt;
&lt;li&gt;Examine DSPs and CPLDs as possible low-priced technology alternatives.&lt;/li&gt;
&lt;li&gt;Improve the DES benchmark software. The programs used for benchmarks
were not designed with modern CPUs in mind, and may be able to achieve
very high performance by taking advantage of available features. In
particular, SIMD instruction sets such as AltiVec and SSE2 may prove
useful.&lt;/li&gt;
&lt;li&gt;Implement RC5 as a long pipeline. Estimates show that this may result
in very high search rates. High capacity FPGA devices would be needed
to attempt this.&lt;/li&gt;
&lt;li&gt;Examine different FPGA families. No Altera devices were considered
for this thesis. Actel produces a gate array family called the Axcelerator
&lt;a href=&#34;#ref66&#34;&gt;[66]&lt;/a&gt; which is one-time programmable and is reported to have
very low routing overheads and a low price. Spartan 3 devices should
also be re-examined once better pricing data becomes available.&lt;/li&gt;
&lt;li&gt;Improve the accuracy of the price/performance estimates for ASIC devices.
Different fabrication processes may provide better price/performance
ratios.&lt;/li&gt;
&lt;li&gt;Extend the FPGA resource estimation techniques to include timing data.
With careful analysis, it should be possible to approximate overall
performance given a cipher algorithm.&lt;/li&gt;
&lt;li&gt;Consider heat generation and power usage for FPGAs. One of the problems
encountered with the EFF machine was its high power and cooling
requirements. FPGA devices are reported to be inefficient in this
regard, which may prove a stumbling block for large-scale key search
machines.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;references&#34;&gt;References&lt;/h2&gt;

&lt;p&gt;&lt;a name=&#39;ref2&#39;&gt;&lt;/a&gt;[2] M. Blaze, W. Diffie, R. L. Rivest, B. Schneier, T. Shimomura, E. Thompson, and M. Wiener, &amp;ldquo;Minimal key lengths for symmetric ciphers to provide adequate commercial security,&amp;rdquo; A Report by an Ad Hoc Group of Cryptographers and Computer Scientists, January 1996. [Online]. Available: &lt;a href=&#34;http://www.schneier.com/paper-keylength.pdf&#34;&gt;http://www.schneier.com/paper-keylength.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref10&#39;&gt;&lt;/a&gt;[10] Electronic Frontier Foundation, &lt;em&gt;Cracking DES&lt;/em&gt;. O&amp;rsquo;Reilly, 1998.&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref16&#39;&gt;&lt;/a&gt;[16] I. Goldberg and D. Wagner, &amp;ldquo;Architectural considerations for cryptanalytic hardware,&amp;rdquo; CS252 Report, 1996. [Online]. Available: &lt;a href=&#34;http://www.cs.berkeley.edu/~iang/isaac/hardware/paper.ps&#34;&gt;http://www.cs.berkeley.edu/~iang/isaac/hardware/paper.ps&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a name=&#39;ref66&#39;&gt;&lt;/a&gt;[66] Actel Corporation. Actel: Products &amp;amp; Services: Antifuse Devices: Axcelerator. [Online]. Available: &lt;a href=&#34;http://www.actel.com/products/axcelerator/index.html&#34;&gt;http://www.actel.com/products/axcelerator/index.html&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
  </channel>
</rss>

