<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Aman Madaan</title>
    <description></description>
    <link>https://madaan.github.io</link>
    <atom:link href="https://madaan.github.io/feed.xml" rel="self" type="application/rss+xml" />
    
      <item>
        <title>The case for inference-time compute</title>
        <description>&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML&quot;&gt;&lt;/script&gt;

&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;The next generation of AI systems for reasoning will likely rely on a powerful base language model at their core, augmented with a number of inference-time techniques that make them more useful.&lt;/em&gt;&lt;/p&gt;

&lt;h3 id=&quot;next-token-prediction-is-awesome&quot;&gt;Next token prediction is awesome&lt;/h3&gt;
&lt;p&gt;I’m in the &lt;em&gt;next-token prediction is awesome&lt;/em&gt; camp. Despite what the naysayers say, the ability to coherently complete a series of tokens grounded in some context is highly non-trivial, and I think humans do it all the time as well. While ChatGPT helped make the debates around this idea mainstream, the idea itself has been around for a while now.&lt;/p&gt;

&lt;h3 id=&quot;next-token-prediction-is-not-enough&quot;&gt;Next token prediction is not enough&lt;/h3&gt;

&lt;p&gt;As awesome as it is, I think the ability to do good next-token prediction is only a necessary condition to build systems that tackle difficult reasoning problems. It is also true that humans don’t just do next-token prediction for all but the most mundane tasks. Even if we focus on tasks that involve a single human generating some textual output, it is clear that the next token prediction exercise goes beyond just generation. For more challenging, creative tasks, we do next-token prediction + &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;X&lt;/code&gt;. Here, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;X&lt;/code&gt; is a number of things, including:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Generate many candidates:&lt;/strong&gt; We generate many candidates and then pick the best one. Example, when suggesting a movie to watch that fits certain criteria.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Iterate:&lt;/strong&gt; Generate some output, then think about it, then generate some more output. Example, when writing a blog post.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Go beyond context:&lt;/strong&gt; We don’t just use the context, but use the broader world knowledge to generate the output. Often, this involves learnings from the past (a &lt;em&gt;memory&lt;/em&gt; of mistakes or cultural conditioning). Example, answering a question within our domain of expertise.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Use tools:&lt;/strong&gt; we realize when we reach a point where doing things manually doesn’t make sense. At those points, we invoke a tool, and use the output from the tool to continue with the task. Example, we use spell checkers, calculators, etc.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Rephrase the question:&lt;/strong&gt; Sometimes when we are stuck, we rephrase the question to make it easier to answer. Example, when we are stuck on a math problem, we try to rephrase it in terms of a problem we know how to solve.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Solve the easy bits first:&lt;/strong&gt; We solve the easy bits first, and then focus on the hard bits. Example, solving a programming puzzle is often best done by coming up with a brute force solution first, and then optimizing it.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The examples above make a couple of things pretty clear:&lt;/p&gt;

&lt;p&gt;A. Next-token prediction is not the only thing we do.
B. Next-token prediction machinery &lt;em&gt;better&lt;/em&gt; be good, because it is the base on which all of the above is built. If the base is not good, then the rest is not going to be good either.&lt;/p&gt;

&lt;p&gt;There are now LLM techniques for realizing all of the above, more commonly known as search, memory, retrieval and alignment (going beyond context), tool usage, prompting, and question decomposition. I think these techniques will be the next frontier of research in language models, and the key to building systems that can reason. Together, they amount to “throwing more compute” at the problem at inference time, on top of a good base model.&lt;/p&gt;
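&lt;p&gt;The “generate many candidates” idea above can be sketched as best-of-n sampling. A minimal sketch (here &lt;code&gt;propose_candidates&lt;/code&gt; and &lt;code&gt;score&lt;/code&gt; are hypothetical stand-ins for sampling completions from an LLM and for a verifier or reward model):&lt;/p&gt;

```python
import random

# Hypothetical stand-in for a base LM: in a real system, these would be
# n completions sampled from the model for the same query.
def propose_candidates(query, n=5):
    return [f"{query} :: draft {i} ({random.randint(0, 99)})" for i in range(n)]

# Hypothetical scorer: a verifier, reward model, or simple heuristic.
def score(candidate):
    return len(candidate)  # placeholder heuristic: prefer longer drafts

# Best-of-n: spend more inference-time compute, keep the best candidate.
def best_of_n(query, n=5):
    return max(propose_candidates(query, n), key=score)

print(best_of_n("suggest a movie"))
```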

&lt;h3 id=&quot;distribution-transformation-perspective&quot;&gt;Distribution Transformation Perspective&lt;/h3&gt;

&lt;p&gt;My favorite way of looking at this is in terms of a distribution transformation. The base LM gives you a strong but simple distribution over the text on the internet. This is powerful, because you already have a great prior on how to complete the text for a wide range of contexts. Let’s say this is the distribution \(P_{lm}(x)\). Now we &lt;em&gt;can&lt;/em&gt; draw samples from this distribution, but they are likely not going to be very useful, as anyone who has played with models that have not been instruction-tuned will tell you.
It is more interesting to parameterize a different distribution \(P_{lm + X}(x)\), where \(X\) is some transformation that we apply to the base distribution.&lt;/p&gt;
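&lt;p&gt;One way to make the transformation concrete is rejection sampling: draw from \(P_{lm}(x)\) and keep only samples that satisfy some predicate, which induces a different distribution \(P_{lm + X}(x)\). A minimal sketch with a toy base distribution (the weights and the &lt;code&gt;acceptable&lt;/code&gt; check are purely illustrative):&lt;/p&gt;

```python
import random

# Toy base distribution P_lm over three completions (weights illustrative).
P_LM = {"ok answer": 0.6, "great answer": 0.3, "off-topic": 0.1}

def sample_base():
    r, acc = random.random(), 0.0
    for text, p in P_LM.items():
        acc += p
        if acc >= r:
            return text
    return text  # guard against float rounding

# The transformation X: a predicate the sample must satisfy.
def acceptable(text):
    return text != "off-topic"

# Sampling from P_{lm + X} by rejection: redraw until X is satisfied.
def sample_transformed():
    while True:
        candidate = sample_base()
        if acceptable(candidate):
            return candidate

print(sample_transformed())
```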

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Acknowledgements:&lt;/strong&gt; Thanks to &lt;a href=&quot;https://omarkhattab.com/&quot;&gt;Omar Khattab&lt;/a&gt; for the helpful brainstorming meetings. We were planning to write a paper on this back in December 2023 (after meeting at Neurips), but couldn’t find the time. I believe &lt;a href=&quot;https://github.com/stanfordnlp/dspy&quot;&gt;DSPy&lt;/a&gt; represents one approach to creating the transformations discussed in this post.&lt;/p&gt;

</description>
        <pubDate>Fri, 22 Dec 2023 00:00:00 +0000</pubDate>
        <link>https://madaan.github.io/inference-compute/</link>
        <guid isPermaLink="true">https://madaan.github.io/inference-compute/</guid>
      </item>
    
      <item>
        <title>LLMs Are Stochastic Compilers</title>
        <description>&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML&quot;&gt;&lt;/script&gt;

&lt;hr /&gt;

&lt;h4 id=&quot;or-how-to-think-about-prompting-with-an-imprecise-but-hopefully-helpful-analogy&quot;&gt;&lt;em&gt;Or How to Think About Prompting with an Imprecise but Hopefully Helpful Analogy&lt;/em&gt;&lt;/h4&gt;

&lt;p&gt;&lt;em&gt;Originally presented as a tutorial at the &lt;a href=&quot;https://www.cmu-lti-llm.org/&quot;&gt;CMU-LTI Seminar&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;center&gt;
&lt;img src=&quot;https://raw.githubusercontent.com/madaan/madaan.github.io/master/images/llm_compiler/header.jpg&quot; alt=&quot;LLM Compiler&quot; width=&quot;300&quot; height=&quot;300&quot; /&gt;
&lt;/center&gt;

&lt;h4 class=&quot;no_toc&quot; id=&quot;tldr-large-language-models-can-be-thought-of-as-compilers-with-prompts-being-programs-written-in-a-high-level-language-like-programs-prompts-can-be-written-in-different-styles-such-as-by-specifying-instructions-or-a-few-examples-or-using-alternative-formats-like-code-language-models-are-stochastic-and-programs-ie-prompts-may-require-trial-and-error-to-produce-the-desired-output&quot;&gt;TLDR: Large Language models can be thought of as compilers, with prompts being programs written in a high-level language. Like programs, prompts can be written in different styles, such as by specifying instructions or a few examples or using alternative formats like code. Language models are stochastic, and programs (i.e., prompts) may require trial and error to produce the desired output.&lt;/h4&gt;

&lt;hr /&gt;

&lt;h6 class=&quot;no_toc&quot; id=&quot;contents&quot;&gt;Contents&lt;/h6&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
  &lt;li&gt;&lt;a href=&quot;#or-how-to-think-about-prompting-with-an-imprecise-but-hopefully-helpful-analogy&quot; id=&quot;markdown-toc-or-how-to-think-about-prompting-with-an-imprecise-but-hopefully-helpful-analogy&quot;&gt;&lt;em&gt;Or How to Think About Prompting with an Imprecise but Hopefully Helpful Analogy&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#the-evolution-of-abstractions-in-programming&quot; id=&quot;markdown-toc-the-evolution-of-abstractions-in-programming&quot;&gt;The Evolution of Abstractions in Programming&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#common-thread&quot; id=&quot;markdown-toc-common-thread&quot;&gt;Common thread&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#you-are-not-teaching-the-task-to-the-processoryou-are-just-specifying-the-task-differently-the-processor-already-knows-how-to-add-two-numbers&quot; id=&quot;markdown-toc-you-are-not-teaching-the-task-to-the-processoryou-are-just-specifying-the-task-differently-the-processor-already-knows-how-to-add-two-numbers&quot;&gt;You are not teaching the task to the processor–you are just specifying the task differently. The processor already knows how to add two numbers.&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#language-models-are-compilers&quot; id=&quot;markdown-toc-language-models-are-compilers&quot;&gt;Language Models Are Compilers&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#fleshing-out-the-analogy&quot; id=&quot;markdown-toc-fleshing-out-the-analogy&quot;&gt;Fleshing out the analogy&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#the-expressiveness-of-prompting&quot; id=&quot;markdown-toc-the-expressiveness-of-prompting&quot;&gt;The Expressiveness of Prompting&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#specification-with-instructions&quot; id=&quot;markdown-toc-specification-with-instructions&quot;&gt;Specification with instructions&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#llms-are-stochastic-compilers&quot; id=&quot;markdown-toc-llms-are-stochastic-compilers&quot;&gt;LLMs are &lt;em&gt;Stochastic&lt;/em&gt; Compilers&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#different-prompting-programming-styles&quot; id=&quot;markdown-toc-different-prompting-programming-styles&quot;&gt;Different prompting (programming) styles&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#lets-go-back-to-our-example-of-math-reasoning&quot; id=&quot;markdown-toc-lets-go-back-to-our-example-of-math-reasoning&quot;&gt;Let’s go back to our example of math reasoning&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#summary-and-key-takeaways&quot; id=&quot;markdown-toc-summary-and-key-takeaways&quot;&gt;Summary and Key Takeaways&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#interactive-demos-and-examples&quot; id=&quot;markdown-toc-interactive-demos-and-examples&quot;&gt;Interactive Demos and Examples&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#advanced-prompting-techniques&quot; id=&quot;markdown-toc-advanced-prompting-techniques&quot;&gt;Advanced Prompting Techniques&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#libraries&quot; id=&quot;markdown-toc-libraries&quot;&gt;Libraries&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#our-recent-work-on-llms&quot; id=&quot;markdown-toc-our-recent-work-on-llms&quot;&gt;Our recent work on LLMs&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#acknowledgements&quot; id=&quot;markdown-toc-acknowledgements&quot;&gt;Acknowledgements&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;the-evolution-of-abstractions-in-programming&quot;&gt;The Evolution of Abstractions in Programming&lt;/h2&gt;

&lt;p&gt;Here is a simple example of how abstractions evolve. Consider the problem of adding two numbers. Given a specification like “Given two numbers, return their sum,” we can write a program that solves this problem at different levels of abstraction.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Machine code: The lowest level of abstraction. The program is a sequence of instructions that are executed by the CPU. The instructions are represented as a sequence of bytes.
    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;00101010 00000000 00000001
00101010 00000001 00000010
10010001 00000010 00000000
00111101 00000000 00000011
00111010 00000000 00000011
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;Assembly code: A higher level of abstraction. The program is a sequence of instructions that are executed by the CPU. The instructions are represented as a sequence of mnemonics.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;MOV AX, 1     ; Load the first number (1) into register AX
MOV BX, 2     ; Load the second number (2) into register BX
ADD AX, BX    ; Add the numbers in AX and BX, store the result in AX
MOV CX, AX    ; Move the result to register CX
; The following code is platform-dependent and prints the value in CX
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;ul&gt;
  &lt;li&gt;C and Python code: A higher level of abstraction still. The program is a sequence of statements in a human-readable syntax, translated for the CPU by a compiler (C) or an interpreter (Python).&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;cp&quot;&gt;#include&lt;/span&gt; &lt;span class=&quot;cpf&quot;&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span class=&quot;cp&quot;&gt;
&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;The sum is: %d&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;sum&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;The sum is: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;common-thread&quot;&gt;Common thread&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;There is a task we want to solve, with some input and output.&lt;/p&gt;

    &lt;ul&gt;
      &lt;li&gt;&lt;strong&gt;Task:&lt;/strong&gt; Add two numbers&lt;/li&gt;
      &lt;li&gt;&lt;strong&gt;Input:&lt;/strong&gt; Two numbers&lt;/li&gt;
      &lt;li&gt;&lt;strong&gt;Output:&lt;/strong&gt; Their sum&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Each level of abstraction is a different representation of the same task.&lt;/p&gt;

    &lt;ul&gt;
      &lt;li&gt;&lt;strong&gt;Machine code:&lt;/strong&gt; A sequence of bytes, no translation required&lt;/li&gt;
      &lt;li&gt;&lt;strong&gt;Assembly code:&lt;/strong&gt; A sequence of mnemonics, assembler&lt;/li&gt;
      &lt;li&gt;&lt;strong&gt;C code:&lt;/strong&gt; A sequence of keywords, compiler&lt;/li&gt;
      &lt;li&gt;&lt;strong&gt;Python code:&lt;/strong&gt; A sequence of keywords, interpreter&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Yet, there &lt;span style=&quot;color: red;&quot;&gt;is&lt;/span&gt; a common thread:&lt;/em&gt;&lt;/p&gt;

    &lt;ul&gt;
      &lt;li&gt;
        &lt;p&gt;A program is a way to communicate the task to the processor.&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;Just different ways to represent the same task.&lt;/p&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;you-are-not-teaching-the-task-to-the-processoryou-are-just-specifying-the-task-differently-the-processor-already-knows-how-to-add-two-numbers&quot;&gt;You are not teaching the task to the processor–you are just specifying the task differently. The processor already knows how to add two numbers.&lt;/h4&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;language-models-are-compilers&quot;&gt;Language Models Are Compilers&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;A useful way to think about prompting is as another programming language.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The language model is the compiler:&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;center&gt;
&lt;img src=&quot;https://raw.githubusercontent.com/madaan/madaan.github.io/master/images/llm_compiler/llmcompiler.jpg&quot; alt=&quot;LLM Compiler&quot; width=&quot;450&quot; height=&quot;200&quot; /&gt;
&lt;/center&gt;

&lt;ul&gt;
  &lt;li&gt;Input:
    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Task: Add two numbers.
Input: 4, 6
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;Output:
    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Output: The output is 10.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;You wrote code in natural language, which the language model compiles into a sequence of instructions a processor can execute:
    &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Task: Add two numbers.
Input: 4, 6
Output: The output is
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;fleshing-out-the-analogy&quot;&gt;Fleshing out the analogy&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;If language models are compilers, what’s the processor?
    &lt;ul&gt;
      &lt;li&gt;
        &lt;p&gt;The GPU! The language model converts natural language into a sequence of matrix operations and non-linearities that the GPU can execute.&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;If you are not convinced, you can perhaps think about &lt;a href=&quot;https://en.wikipedia.org/wiki/Field-programmable_gate_array&quot;&gt;FPGAs&lt;/a&gt;.&lt;/p&gt;

        &lt;ul&gt;
          &lt;li&gt;FPGAs are programmable processors that can be programmed to execute arbitrary logic. When you write code for an FPGA, you’re essentially changing the connections between its logic gates, thus dictating the flow of information through the processor. The FPGA is the processor and you write code to change the connections between the logic gates.&lt;/li&gt;
          &lt;li&gt;Similarly, the “prompts” you write affect the activations in the language model, and similarly change the flow of information through the language model. The language model is the processor and you write code to change the connections between the neurons.&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;If you are still not convinced, I told you the analogy is imprecise.&lt;/p&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;the-expressiveness-of-prompting&quot;&gt;The Expressiveness of Prompting&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Recall: the goal is to &lt;em&gt;specify&lt;/em&gt; the task, not to &lt;em&gt;teach&lt;/em&gt; the task.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;specification-with-instructions&quot;&gt;Specification with instructions&lt;/h3&gt;

&lt;h4 style=&quot;color: red;&quot;&gt;&quot;Generate a poem that rhymes with &apos;cat&apos; and has 4 lines.&quot;&lt;/h4&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Bat, sat, chat, fat
A furry friend that&apos;s really neat,
It purrs and meows with its tiny feet,
It&apos;s the cutest thing when it&apos;s on your mat.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 style=&quot;color: red;&quot;&gt;&quot;Write a program in Python to test if `n` is prime&quot;&lt;/h4&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;#Input an integer
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&apos;Enter an integer: &apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;#If n is less than 2, it is not prime
&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&apos;is not a prime number&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;#If n is greater than 2, check if it is prime
&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;is_prime&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;
    
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;is_prime&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;False&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;break&lt;/span&gt;
            
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;is_prime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&apos;is a prime number&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&apos;is not a prime number&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;h3 style=&quot;color: blue;&quot;&gt; Specification with examples &lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;AKA autocomplete: In addition to or instead of instructions, we can provide examples&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 style=&quot;color: red;&quot;&gt;Prompt:&lt;/h4&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    &lt;span class=&quot;n&quot;&gt;input_1&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;output_1&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;input_2&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;output_2&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;input_3&lt;/span&gt;

    &lt;span class=&quot;err&quot;&gt;???&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3&gt;Feed the prompt to the model:&lt;/h3&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    &lt;span class=&quot;n&quot;&gt;output_3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
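&lt;p&gt;Assembling such a few-shot “program” is just string formatting; the model then autocompletes the missing &lt;code&gt;output_3&lt;/code&gt;. A minimal sketch (&lt;code&gt;build_prompt&lt;/code&gt; is a hypothetical helper, not a library function):&lt;/p&gt;

```python
# Build a few-shot prompt from input/output pairs, leaving the last
# output empty so the model completes it.
def build_prompt(examples, new_input):
    parts = []
    for inp, out in examples:
        parts.append(f"Input: {inp}")
        parts.append(f"Output: {out}")
    parts.append(f"Input: {new_input}")
    parts.append("Output:")
    return "\n".join(parts)

prompt = build_prompt([("2 + 2", "4"), ("3 + 5", "8")], "4 + 6")
print(prompt)
```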

&lt;h3&gt;Example: math reasoning&lt;/h3&gt;

&lt;h4 style=&quot;color: red;&quot;&gt;Prompt&lt;/h4&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;Q: Jason had 20 lollipops. He gave Denny some lollipops. 
    Now Jason has 12 lollipops. How many lollipops did Jason
    give to Denny?
    A: The answer is 8


    Q: There were nine computers in the server room. 
    Five more computers were installed each day, from monday
    to thursday. How many computers are now in the server room?&lt;/code&gt;&lt;/pre&gt;

&lt;h4&gt;Model completion:&lt;/h4&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;A: The answer is 29&lt;/code&gt;&lt;/pre&gt;


&lt;ul&gt;
  &lt;li&gt;&lt;b&gt;Start playing with the model:&lt;/b&gt; &lt;a href=&quot;https://platform.openai.com/playground&quot;&gt;https://platform.openai.com/playground&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;llms-are-stochastic-compilers&quot;&gt;LLMs are &lt;em&gt;Stochastic&lt;/em&gt; Compilers&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Yes, the LLMs can &lt;em&gt;compile&lt;/em&gt; your instructions and solve the task. BUT…
    &lt;ul&gt;
      &lt;li&gt;They are not deterministic.&lt;/li&gt;
      &lt;li&gt;They can fail.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
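&lt;p&gt;One common mitigation for this stochasticity is to sample several completions and take a majority vote over the answers (the idea behind self-consistency decoding). A minimal sketch (&lt;code&gt;sample_answer&lt;/code&gt; is a hypothetical stand-in for a stochastic LLM call):&lt;/p&gt;

```python
import random
from collections import Counter

# Hypothetical stand-in for a stochastic LLM call: same prompt,
# mostly-correct but varying answers.
def sample_answer(prompt):
    return random.choice(["29", "29", "29", "24"])

# Sample n completions and return the most common answer.
def majority_vote(prompt, n=15):
    counts = Counter(sample_answer(prompt) for _ in range(n))
    return counts.most_common(1)[0][0]

print(majority_vote("How many computers are now in the server room?"))
```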

&lt;p&gt;&lt;img src=&quot;https://raw.githubusercontent.com/madaan/madaan.github.io/master/images/llm_compiler/instr_fail.jpg&quot; alt=&quot;LLM Compiler&quot; width=&quot;600&quot; height=&quot;800&quot; /&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;b&gt;But they listen if you talk nicely to them:&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;https://raw.githubusercontent.com/madaan/madaan.github.io/master/images/llm_compiler/instr_guided_v2.jpg&quot; alt=&quot;LLM Compiler&quot; width=&quot;600&quot; height=&quot;800&quot; /&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;different-prompting-programming-styles&quot;&gt;Different prompting (programming) styles&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;So far, we have seen two different programming styles:
    &lt;ul&gt;
      &lt;li&gt;Specification with instructions&lt;/li&gt;
      &lt;li&gt;Specification with examples&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;We also saw that LLMs are stochastic, so we may have to try several “variants” of the program to get the right one.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Regular programs also come in various flavors:&lt;/p&gt;

    &lt;ul&gt;
      &lt;li&gt;Stylistic differences&lt;/li&gt;
    &lt;/ul&gt;

    &lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  &lt;span class=&quot;c1&quot;&gt;# Good naming and formatting
&lt;/span&gt;  &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;calculate_area&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;height&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;width&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;height&lt;/span&gt;

  &lt;span class=&quot;c1&quot;&gt;# Poor naming and formatting
&lt;/span&gt;  &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;calc_a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;w&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;h&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;h&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;

    &lt;ul&gt;
      &lt;li&gt;Implementation differences&lt;/li&gt;
    &lt;/ul&gt;

    &lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  &lt;span class=&quot;c1&quot;&gt;# Using a set to remove duplicates, more efficient and concise
&lt;/span&gt;  &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;remove_duplicates&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;numbers&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;numbers&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;

  &lt;span class=&quot;c1&quot;&gt;# Using a loop to remove duplicates, less efficient and more complex
&lt;/span&gt;  &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;remove_duplicates_using_loop&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;numbers&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;unique_numbers&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;number&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;numbers&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
          &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;number&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;unique_numbers&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
              &lt;span class=&quot;n&quot;&gt;unique_numbers&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;number&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;unique_numbers&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
&lt;/ul&gt;
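Since LLM outputs are stochastic, one practical pattern is to sample several program variants and keep the first one that passes a quick check. A minimal sketch of that loop, with `CANDIDATES` and `first_passing_variant` as hypothetical stand-ins (no real LLM API is assumed; the candidate list simulates varying completions):

```python
import random

# Hypothetical stand-in for repeated stochastic LLM calls: in reality, each
# call may return a different completion for the same prompt.
CANDIDATES = [
    "def area(w, h): return w + h",  # plausible-but-wrong variant
    "def area(w, h): return w * h",  # correct variant
    "def area(w, h): return w * w",  # plausible-but-wrong variant
]

def first_passing_variant(seed=0):
    """Try sampled program variants until one passes a small check."""
    rng = random.Random(seed)
    candidates = CANDIDATES[:]
    rng.shuffle(candidates)  # simulate the varying order of sampled outputs
    for code in candidates:
        namespace = {}
        exec(code, namespace)
        if namespace["area"](3, 4) == 12:  # quick correctness check
            return code
    return None

print(first_passing_variant())
```

The same skeleton works with a real model: replace the candidate list with repeated API calls, and replace the check with unit tests or a verifier.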

&lt;h3 id=&quot;lets-go-back-to-our-example-of-math-reasoning&quot;&gt;Let’s go back to our example of math reasoning&lt;/h3&gt;

&lt;h4 style=&quot;color: red;&quot;&gt;Text prompt&lt;/h4&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;Q&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Jason&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;had&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lollipops&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;He&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gave&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Denny&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;some&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lollipops&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt; 
    &lt;span class=&quot;n&quot;&gt;Now&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Jason&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;has&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;12&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lollipops&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;How&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;many&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lollipops&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;did&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Jason&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;give&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Denny&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;?&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;A&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;The&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;answer&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;is&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;


    &lt;span class=&quot;n&quot;&gt;Q&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;There&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;were&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nine&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;computers&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;the&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;server&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;room&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt; 
    &lt;span class=&quot;n&quot;&gt;Five&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;more&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;computers&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;were&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;installed&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;each&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;day&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;monday&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;thursday&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;How&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;many&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;computers&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;are&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;now&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;the&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;server&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;room&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;?&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4&gt;Model completion:&lt;/h4&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;A: The answer is 29&lt;/code&gt;&lt;/pre&gt;

&lt;center&gt;&lt;h1&gt;But we don&apos;t always have to use plain, boring text!&lt;/h1&gt;&lt;/center&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;We can supply examples of text → Python program.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The LLM is prompted to generate code (Python). You can then run the generated script with a Python runtime!&lt;/p&gt;

    &lt;ul&gt;
      &lt;li&gt;Opens up tons of possibilities.&lt;/li&gt;
      &lt;li&gt;The Python program can call sympy, matplotlib, sklearn…&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
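Concretely, once the model emits a program as text, Python's own runtime can execute it. A minimal sketch (the `generated_code` string below is a hand-written stand-in for a model completion, not real model output):

```python
# A program as it might be emitted by the model (hand-written stand-in).
generated_code = '''
def solution():
    """Olivia has $23. She bought five bagels for $3 each. How much money does she have left?"""
    money_initial = 23
    bagels = 5
    bagel_cost = 3
    money_spent = bagels * bagel_cost
    return money_initial - money_spent
'''

namespace = {}
exec(generated_code, namespace)   # compile and load the generated function
answer = namespace["solution"]()  # run it with the Python runtime
print(answer)  # 8
```

In a real pipeline you would sandbox this execution; `exec` on untrusted model output is shown here only to illustrate the idea.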

&lt;h4&gt;&lt;span style=&quot;color: red;&quot;&gt;Code prompt!&lt;/span&gt;&lt;/h4&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;# Q: Olivia has $23. She bought five bagels for $3 each. How much money does she have left?
# solution using Python:
&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;solution&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;():&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&quot;&quot;&quot;Olivia has $23. She bought five bagels for $3 each. How much money does she have left?&quot;&quot;&quot;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;money_initial&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;23&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;bagels&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;bagel_cost&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;money_spent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bagels&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bagel_cost&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;money_left&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;money_initial&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;money_spent&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;money_left&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;



&lt;span class=&quot;c1&quot;&gt;# Q: There were nine computers in the server room. Five more computers were installed each day, from monday to thursday. How many computers are now in the server room?
# solution using Python:
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;solution&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;():&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&quot;&quot;&quot;There were nine computers in the server room. Five more computers were installed each day, from monday to thursday. How many computers are now in the server room?&quot;&quot;&quot;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;computers_initial&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;computers_per_day&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;num_days&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# 4 days between monday and thursday
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;computers_added&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;computers_per_day&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;num_days&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;computers_total&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;computers_initial&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;computers_added&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;computers_total&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
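For readers who want to run the completion above, here is the same generated program transcribed as plain Python; executing it gives 29, matching the text-prompt answer:

```python
def solution():
    """There were nine computers in the server room. Five more computers were
    installed each day, from monday to thursday. How many computers are now in
    the server room?"""
    computers_initial = 9
    computers_per_day = 5
    num_days = 4  # monday, tuesday, wednesday, thursday
    return computers_initial + computers_per_day * num_days

print(solution())  # 29
```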

&lt;hr /&gt;

&lt;h2 id=&quot;summary-and-key-takeaways&quot;&gt;Summary and Key Takeaways&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Language models like GPT-3.5/4/ChatGPT can be thought of as compilers that interpret prompts at various levels of abstraction.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Abstractions in programming languages evolve, with examples ranging from machine code to Python.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Prompts can be specified using different programming styles, such as instructions or examples.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Language models are stochastic compilers, requiring trial and error to produce the desired output.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Alternative forms of input, like code, can be used to achieve more precise results from the model.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;interactive-demos-and-examples&quot;&gt;Interactive Demos and Examples&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://github.com/reasoning-machines/prompt-lib/blob/main/notebooks/YoavsPythonPrompts.ipynb&quot;&gt;Your LLM has a “virtual machine”&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://github.com/madaan/memprompt/blob/main/CompletionAndChat.ipynb&quot;&gt;Standard interface for completion and conversation&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;advanced-prompting-techniques&quot;&gt;Advanced Prompting Techniques&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;A nice &lt;a href=&quot;https://arxiv.org/abs/2107.13586&quot;&gt;survey&lt;/a&gt; of prompting. A great read, especially if you are interested in understanding where NLP was before prompting.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Another great &lt;a href=&quot;https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/&quot;&gt;blog post&lt;/a&gt; from Lilian Weng on recent prompt-engineering techniques.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;libraries&quot;&gt;Libraries&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://python.langchain.com/en/latest/index.html&quot;&gt;LangChain&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://platform.openai.com/docs/api-reference/introduction&quot;&gt;OpenAI API&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://github.com/reasoning-machines/prompt-lib&quot;&gt;prompt-lib&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;our-recent-work-on-llms&quot;&gt;Our recent work on LLMs&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://selfrefine.info&quot;&gt;Self-Refine: Iteratively Enhancing Language Model Outputs through Self-Feedback&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://pie4perf.com/&quot;&gt;Optimizing Programs by making Targeted Algorithmic Changes&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://reasonwithpal.com/&quot;&gt;Leveraging Python to Assist Language Models&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/2209.07686&quot;&gt;What makes chain-of-thought prompting work?&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://cocogen.structgen.com/&quot;&gt;Generating Structured Plans using LLMs of Code&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://memprompt.com&quot;&gt;Utilizing Memory to Prevent LLMs from Repeating Mistakes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;acknowledgements&quot;&gt;Acknowledgements&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;The blog was originally written for a tutorial conducted at the &lt;a href=&quot;https://www.cmu-lti-llm.org/&quot;&gt;CMU-LTI Seminar&lt;/a&gt;. Thanks to the organizers for the opportunity!&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Thanks to &lt;a href=&quot;https://adithya7.github.io/&quot;&gt;Adithya Pratapa&lt;/a&gt; for proofreading the first draft.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Thanks to GPT-4 for generating some examples for this blog.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
</description>
        <pubDate>Sat, 01 Apr 2023 00:00:00 +0000</pubDate>
        <link>https://madaan.github.io/prompting/</link>
        <guid isPermaLink="true">https://madaan.github.io/prompting/</guid>
      </item>
    
      <item>
        <title>Notes on Weight Initialization for Deep Neural Networks</title>
        <description>&lt;style&gt;
.tablelines table, .tablelines td, .tablelines th {

  padding: 0; }
  table tr {
    border-top: 1px solid #cccccc;
    background-color: white;
    margin: 0;
    padding: 0; }
    table tr:nth-child(2n) {
      background-color: #f8f8f8; }
    table tr th {
      font-weight: bold;
      border: 1px solid #cccccc;
      text-align: left;
      margin: 0;
      padding: 6px 13px; }
    table tr td {
      border: 1px solid #cccccc;
      text-align: left;
      margin: 0;
      padding: 6px 13px; }
    table tr th :first-child, table tr td :first-child {
      margin-top: 0; }
    table tr th :last-child, table tr td :last-child {
      margin-bottom: 0; }
.highlight pre { background-color: #272822; }
.highlight .hll { background-color: #272822; }
.highlight .c { color: #75715e } /* Comment */
.highlight .err { color: #960050; background-color: #1e0010 } /* Error */
.highlight .k { color: #66d9ef } /* Keyword */
.highlight .l { color: #ae81ff } /* Literal */
.highlight .n { color: #f8f8f2 } /* Name */
.highlight .o { color: #f92672 } /* Operator */
.highlight .p { color: #f8f8f2 } /* Punctuation */
.highlight .cm { color: #75715e } /* Comment.Multiline */
.highlight .cp { color: #75715e } /* Comment.Preproc */
.highlight .c1 { color: #75715e } /* Comment.Single */
.highlight .cs { color: #75715e } /* Comment.Special */
.highlight .ge { font-style: italic } /* Generic.Emph */
.highlight .gs { font-weight: bold } /* Generic.Strong */
.highlight .kc { color: #66d9ef } /* Keyword.Constant */
.highlight .kd { color: #66d9ef } /* Keyword.Declaration */
.highlight .kn { color: #f92672 } /* Keyword.Namespace */
.highlight .kp { color: #66d9ef } /* Keyword.Pseudo */
.highlight .kr { color: #66d9ef } /* Keyword.Reserved */
.highlight .kt { color: #66d9ef } /* Keyword.Type */
.highlight .ld { color: #e6db74 } /* Literal.Date */
.highlight .m { color: #ae81ff } /* Literal.Number */
.highlight .s { color: #e6db74 } /* Literal.String */
.highlight .na { color: #a6e22e } /* Name.Attribute */
.highlight .nb { color: #f8f8f2 } /* Name.Builtin */
.highlight .nc { color: #a6e22e } /* Name.Class */
.highlight .no { color: #66d9ef } /* Name.Constant */
.highlight .nd { color: #a6e22e } /* Name.Decorator */
.highlight .ni { color: #f8f8f2 } /* Name.Entity */
.highlight .ne { color: #a6e22e } /* Name.Exception */
.highlight .nf { color: #a6e22e } /* Name.Function */
.highlight .nl { color: #f8f8f2 } /* Name.Label */
.highlight .nn { color: #f8f8f2 } /* Name.Namespace */
.highlight .nx { color: #a6e22e } /* Name.Other */
.highlight .py { color: #f8f8f2 } /* Name.Property */
.highlight .nt { color: #f92672 } /* Name.Tag */
.highlight .nv { color: #f8f8f2 } /* Name.Variable */
.highlight .ow { color: #f92672 } /* Operator.Word */
.highlight .w { color: #f8f8f2 } /* Text.Whitespace */
.highlight .mf { color: #ae81ff } /* Literal.Number.Float */
.highlight .mh { color: #ae81ff } /* Literal.Number.Hex */
.highlight .mi { color: #ae81ff } /* Literal.Number.Integer */
.highlight .mo { color: #ae81ff } /* Literal.Number.Oct */
.highlight .sb { color: #e6db74 } /* Literal.String.Backtick */
.highlight .sc { color: #e6db74 } /* Literal.String.Char */
.highlight .sd { color: #e6db74 } /* Literal.String.Doc */
.highlight .s2 { color: #e6db74 } /* Literal.String.Double */
.highlight .se { color: #ae81ff } /* Literal.String.Escape */
.highlight .sh { color: #e6db74 } /* Literal.String.Heredoc */
.highlight .si { color: #e6db74 } /* Literal.String.Interpol */
.highlight .sx { color: #e6db74 } /* Literal.String.Other */
.highlight .sr { color: #e6db74 } /* Literal.String.Regex */
.highlight .s1 { color: #e6db74 } /* Literal.String.Single */
.highlight .ss { color: #e6db74 } /* Literal.String.Symbol */
.highlight .bp { color: #f8f8f2 } /* Name.Builtin.Pseudo */
.highlight .vc { color: #f8f8f2 } /* Name.Variable.Class */
.highlight .vg { color: #f8f8f2 } /* Name.Variable.Global */
.highlight .vi { color: #f8f8f2 } /* Name.Variable.Instance */
.highlight .il { color: #ae81ff } /* Literal.Number.Integer.Long */

.highlight .gh { } /* Generic Heading &amp; Diff Header */
.highlight .gu { color: #75715e; } /* Generic.Subheading &amp; Diff Unified/Comment? */
.highlight .gd { color: #f92672; } /* Generic.Deleted &amp; Diff Deleted */
.highlight .gi { color: #a6e22e; } /* Generic.Inserted &amp; Diff Inserted */
&lt;/style&gt;

&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML&quot;&gt;&lt;/script&gt;

&lt;ul class=&quot;no_toc&quot; id=&quot;markdown-toc&quot;&gt;
  &lt;li&gt;&lt;a href=&quot;#tl-dr&quot; id=&quot;markdown-toc-tl-dr&quot;&gt;Tl; dr&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#outline&quot; id=&quot;markdown-toc-outline&quot;&gt;Outline&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#motivation&quot; id=&quot;markdown-toc-motivation&quot;&gt;Motivation&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#investigating-the-sequence-of-multiplications&quot; id=&quot;markdown-toc-investigating-the-sequence-of-multiplications&quot;&gt;Investigating the Sequence of Multiplications&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#a-first-intuitive-solution&quot; id=&quot;markdown-toc-a-first-intuitive-solution&quot;&gt;A first intuitive solution&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#xavier-initialization&quot; id=&quot;markdown-toc-xavier-initialization&quot;&gt;Xavier Initialization&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#why-sqrt512--intuition&quot; id=&quot;markdown-toc-why-sqrt512--intuition&quot;&gt;Why \(\sqrt{512}\)? | Intuition&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#why-sqrt512---proofs&quot; id=&quot;markdown-toc-why-sqrt512---proofs&quot;&gt;Why \(\sqrt{512}\)? | Proofs&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#1-proof-that-y-sim-mathcaln0-512&quot; id=&quot;markdown-toc-1-proof-that-y-sim-mathcaln0-512&quot;&gt;1. Proof that \(Y \sim \mathcal{N}(0, 512)\)&lt;/a&gt;        &lt;ul&gt;
          &lt;li&gt;&lt;a href=&quot;#11-expectation-mean-of-y&quot; id=&quot;markdown-toc-11-expectation-mean-of-y&quot;&gt;1.1 Expectation (Mean) of Y&lt;/a&gt;&lt;/li&gt;
          &lt;li&gt;&lt;a href=&quot;#12-variance-of-y&quot; id=&quot;markdown-toc-12-variance-of-y&quot;&gt;1.2 Variance of Y&lt;/a&gt;&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#2-proof-that-y-is-sim--mathcaln0-1--when-a-sim--mathcaln0-1--512&quot; id=&quot;markdown-toc-2-proof-that-y-is-sim--mathcaln0-1--when-a-sim--mathcaln0-1--512&quot;&gt;2. Proof that Y is \(\sim  \mathcal{N}(0, 1)\)  when A \(\sim  \mathcal{N}(0, 1 / 512)\)&lt;/a&gt;        &lt;ul&gt;
          &lt;li&gt;&lt;a href=&quot;#21-expectation-mean-of-y&quot; id=&quot;markdown-toc-21-expectation-mean-of-y&quot;&gt;2.1 Expectation (Mean) of Y&lt;/a&gt;&lt;/li&gt;
          &lt;li&gt;&lt;a href=&quot;#22-variance-of-y&quot; id=&quot;markdown-toc-22-variance-of-y&quot;&gt;2.2 Variance of Y&lt;/a&gt;&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#summary&quot; id=&quot;markdown-toc-summary&quot;&gt;Summary&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;tl-dr&quot;&gt;Tl; dr&lt;/h1&gt;
&lt;p&gt;Neural networks involve long sequences of multiplications, usually between a matrix and a vector, say \(a*x\). The result of such a sequence will either blow up in magnitude or shrink to 0. We can divide \(a\) by a number (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scaling_factor&lt;/code&gt;) to scale its magnitude down to the right level. Proper initialization strategies help us find a good &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scaling_factor&lt;/code&gt;.&lt;/p&gt;
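As a quick numerical preview (a sketch, not part of the simulations below): for a single output unit, the dot product of two 512-dimensional vectors with independent N(0, 1) entries has variance 512, and dividing the weights by sqrt(512) brings it back to roughly unit variance:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 512, 2000

# y = a . x with entries of a, x ~ N(0, 1): variance of y is n = 512
y = np.array([rng.standard_normal(n) @ rng.standard_normal(n)
              for _ in range(trials)])
print(y.std() ** 2)  # close to 512

# scaling a by 1 / sqrt(n) restores unit variance
y_scaled = np.array([(rng.standard_normal(n) / np.sqrt(n)) @ rng.standard_normal(n)
                     for _ in range(trials)])
print(y_scaled.std() ** 2)  # close to 1.0
```

This is exactly the scaling factor the rest of the write-up derives.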

&lt;h1 id=&quot;outline&quot;&gt;Outline&lt;/h1&gt;

&lt;p&gt;We motivate the problem of weight initialization with a simulation, identify the real cause of the ill-behaved multiplication output, and present scaling the weight matrix via the Xavier initialization as a solution. The rest of the write-up provides experiments and proofs to explain why such an initialization works. The blog originated from a discussion during part 2 of the &lt;a href=&quot;https://course.fast.ai/index.html&quot;&gt;Fast AI course&lt;/a&gt; (Spring 2019 session), and parts of the simulations are taken from the course notebooks.&lt;/p&gt;

&lt;h1 id=&quot;motivation&quot;&gt;Motivation&lt;/h1&gt;

&lt;p&gt;Training (and inference) of a neural network involves a bunch of operations, and one of the most common of these operations is multiplication. Typically, the multiplication happens between matrices. In the case of &lt;em&gt;deep&lt;/em&gt; neural networks, we end up with a long sequence of such multiplications.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;input&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;output&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;input&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;layer&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;network_layers&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;output&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;activation&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;output&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;layer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;weights&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;layer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bias&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;investigating-the-sequence-of-multiplications&quot;&gt;Investigating the Sequence of Multiplications&lt;/h2&gt;
&lt;p&gt;To begin our investigation, let’s take a random input vector &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; and a random matrix &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a&lt;/code&gt;. Note that the numbers are sampled from a &lt;a href=&quot;https://pytorch.org/docs/stable/torch.html#torch.randn&quot;&gt;normal distribution&lt;/a&gt; with mean 0 and variance 1, or, as it is popularly known, \(\mathcal{N}(0, 1)\).&lt;/p&gt;

&lt;p&gt;We’ll multiply the vector &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; by the matrix &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a&lt;/code&gt; 100 times (as if the network had 100 layers) and see what comes out on the other side. &lt;strong&gt;Note that we don’t use any activation function, for the sake of simplicity&lt;/strong&gt;.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;randn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;512&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;randn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;512&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;512&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;@&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;output:&lt;/p&gt;
&lt;div class=&quot;language-js highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;nan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; has blown up! The multiplication snowballed: the magnitude of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; grew with each step (note that we feed &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; back into &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a @ x&lt;/code&gt;), until it overflowed the range of floating-point numbers, leaving us with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nan&lt;/code&gt;s.&lt;/p&gt;

&lt;h2 id=&quot;a-first-intuitive-solution&quot;&gt;A first intuitive solution&lt;/h2&gt;

&lt;p&gt;Intuitively, since the product of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; is becoming large, we may start by reducing the magnitude of the matrix &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a&lt;/code&gt;. The hope is that because of a smaller &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a&lt;/code&gt;, the product &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a * x&lt;/code&gt; won’t shoot up in magnitude.  Thus, we divide our matrix (i.e., each element of the matrix &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a&lt;/code&gt;) by a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scaling_factor&lt;/code&gt; of 100, and repeat the process.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;scaling_factor&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;randn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;512&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;randn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;512&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;512&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;scaling_factor&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt; 
	&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;@&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-js highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.),&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So we did solve the problem of magnitude &lt;em&gt;explosion&lt;/em&gt;, only to create another: the output now &lt;em&gt;vanishes&lt;/em&gt; to 0.&lt;/p&gt;

&lt;h1 id=&quot;xavier-initialization&quot;&gt;Xavier Initialization&lt;/h1&gt;

&lt;p&gt;We saw that using a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scaling_factor&lt;/code&gt; of 100 didn’t quite work: it drove the product down to 0. We started with magnitudes exploding to infinity, and the scaling crushed them to 0. Surely, the right solution lies somewhere in the middle. That’s exactly what the &lt;a href=&quot;http://proceedings.mlr.press/v9/glorot10a.html&quot;&gt;Xavier initialization&lt;/a&gt; does: it helps us find a scaling factor that gets it right.&lt;/p&gt;

&lt;p&gt;The Xavier initialization suggests using a scaling factor of \(\sqrt{n\_in}\), where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n_in&lt;/code&gt; is the number of inputs to the matrix (i.e., the dimension shared with the vector the matrix is being multiplied with).&lt;/p&gt;

&lt;p&gt;In our case, the number of inputs to the matrix \(a\) is 512. Thus, the scaling factor should be \(\sqrt{512}\). In other words, if we divide our matrix &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a&lt;/code&gt; by \(\sqrt{512}\), we should see neither vanishing nor exploding magnitudes. Let’s see if the Xavier init helps:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;math&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;scaling_factor&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;math&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sqrt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;512&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;randn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;512&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;randn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;512&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;512&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;scaling_factor&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;@&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-js highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.0429&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.9888&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The magnitude of the product hasn’t exploded or vanished. In fact, the output has a nice mean (close to 0) and standard deviation (close to 1). Recall that the input was actually sampled from such a distribution. In a way, our solution managed to &lt;em&gt;retain&lt;/em&gt; the distribution of the inputs. That’s a really nice thing, because now we can perform a large number of such multiplications.&lt;/p&gt;

&lt;p&gt;Putting things in context, this translates to being able to train &lt;strong&gt;really&lt;/strong&gt; deep neural networks. Note that Xavier initialization is sufficient to solve the problem in this case because we did not use any activation function. If we had used, say, a ReLU, the more recent &lt;a href=&quot;https://arxiv.org/abs/1502.01852&quot;&gt;Kaiming Initialization&lt;/a&gt; would have been more effective. So why did this work? What is so special about \(\sqrt{512}\) as a scaling factor?&lt;/p&gt;
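&lt;p&gt;To see why the activation function matters, here is a minimal sketch (an illustration, not one of the experiments above) that adds a ReLU between multiplications. With Xavier scaling the signal dies, because the ReLU zeroes roughly half of each output; the Kaiming scheme compensates with a scaling of \(\sqrt{2/n}\) instead:&lt;/p&gt;

```python
import math
import torch

torch.manual_seed(0)
n = 512
x_xavier = torch.randn(n)
x_kaiming = x_xavier.clone()
a_xavier = torch.randn(n, n) / math.sqrt(n)       # Xavier scaling: std 1/sqrt(n)
a_kaiming = torch.randn(n, n) * math.sqrt(2 / n)  # Kaiming scaling: std sqrt(2/n)
for _ in range(100):
    x_xavier = torch.relu(a_xavier @ x_xavier)
    x_kaiming = torch.relu(a_kaiming @ x_kaiming)
print(x_xavier.std().item())   # vanishes toward 0
print(x_kaiming.std().item())  # stays a reasonable size
```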

&lt;h1 id=&quot;why-sqrt512--intuition&quot;&gt;Why \(\sqrt{512}\)? | Intuition&lt;/h1&gt;

&lt;p&gt;Before we start, let us look closely at our simulation, particularly the following line:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;@&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Note that we are not changing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a&lt;/code&gt; at all. Thus, the only element that can cause trouble is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt;, since it’s the one being updated. It seems that at some point in the multiplication sequence, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; starts taking on large values, and the subsequent multiplications keep making things worse. To examine this phenomenon closely, let us denote the product of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt;.&lt;/p&gt;

\[y = a * x\]

&lt;p&gt;As in our running examples, if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a&lt;/code&gt; is a matrix of size 512 x 512 and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; is a vector of size 512, then the output &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; is a vector of size 512.&lt;/p&gt;

&lt;p&gt;To be more explicit, &lt;em&gt;one element of y&lt;/em&gt; is calculated as follows:&lt;/p&gt;

\[y_{i} = a_{i,0} x_{0} + a_{i,1} x_{1} + \cdots + a_{i,n-1} x_{n-1} = \sum_{k=0}^{n-1} a_{i,k} x_{k}\]

&lt;p&gt;As we saw above, &lt;em&gt;something&lt;/em&gt; goes wrong with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; values. That something is the following:&lt;/p&gt;

&lt;p&gt;To compute one element of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt;, we add 512 products of one element of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a&lt;/code&gt; with one element of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt;.  What are the mean and the variance of such a sum? As we show later, as long as the elements in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a&lt;/code&gt; and the elements in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; are &lt;a href=&quot;https://en.wikipedia.org/wiki/Independence_(probability_theory)&quot;&gt;independent&lt;/a&gt; (which they are in this case; one doesn’t affect the other), the mean is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0&lt;/code&gt; and the variance is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;512&lt;/code&gt;. That is, each element of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; now behaves &lt;em&gt;as if&lt;/em&gt; it was picked from \(\mathcal{N}(0, 512)\)! This can also be seen experimentally, as in the following code snippet. To average out random fluctuations, we repeat the experiment for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;10000&lt;/code&gt; iterations.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.0&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;n_iter&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10000&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;n_dim&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;512&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ys&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_iter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;randn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_dim&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;randn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_dim&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;#just like one row of a
&lt;/span&gt;	&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;@&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;item&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;ys&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;item&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_iter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ys&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;var&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ys&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;language-js highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.13198307995796205&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;513.4638&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;22.6597&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In other words, each element of y is now picked from a much wider distribution, and that happens because we are adding 512 products of elements, each picked from \(\mathcal{N}(0, 1)\).  We keep feeding these &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; elements back into the loop as the input, and thus things soon go haywire.&lt;/p&gt;

&lt;p&gt;Now, if we scale the weights of the matrix \(a\) by dividing them by \(\sqrt{512}\), we will be picking elements of \(a\) from a normal distribution with mean \(0\) and variance \(1/512\), i.e. \(\mathcal{N}(0, 1/512)\) (see the next section for a proof).&lt;/p&gt;

&lt;p&gt;This scaling will, in turn, give us a distribution of \(y\) in which each element has mean 0 and std 1, thus allowing us to repeat the product as many times as we want. This is &lt;strong&gt;NOT&lt;/strong&gt; different from the intuitive solution we discussed earlier. We were right in guessing that scaling one of the participants in the product might help, and it did. Xavier init helped us find the &lt;em&gt;exact magnitude&lt;/em&gt; of the scaling factor: \(\sqrt{512}\), instead of what we had initially used: 100.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.0&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;n_iter&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10000&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;n_dim&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;512&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ys&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_iter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;randn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_dim&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;randn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_dim&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;math&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sqrt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_dim&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;#just like one row of a
&lt;/span&gt;	&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;@&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;item&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;ys&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;item&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_iter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ys&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;var&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ys&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-js highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.00671036799326539&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;1.0186&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;1.0092&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It works, and each element of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; (and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; as a whole) now has mean 0 and variance/std 1. We can thus keep multiplying the output &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a&lt;/code&gt; repeatedly, without the magnitudes exploding or vanishing.&lt;/p&gt;
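&lt;p&gt;As an aside, PyTorch ships this scheme as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;torch.nn.init.xavier_normal_&lt;/code&gt;. For a square 512 x 512 matrix, its std of \(\sqrt{2/(n\_in + n\_out)}\) reduces to exactly our \(1/\sqrt{512}\) (a quick check, assuming the default gain of 1):&lt;/p&gt;

```python
import math
import torch

a = torch.empty(512, 512)
torch.nn.init.xavier_normal_(a)  # fills a with N(0, 2 / (fan_in + fan_out))
print(a.std().item())            # close to 1/sqrt(512) ~ 0.0442
print(1 / math.sqrt(512))
```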

&lt;h1 id=&quot;why-sqrt512---proofs&quot;&gt;Why \(\sqrt{512}\)? | Proofs&lt;/h1&gt;

&lt;p&gt;We are given that \(x\) and \(a\) are from a normal distribution with mean = 0 and variance = 1 (or \(\mathcal{N}(0, 1)\)). That is, to create \(x\), we pick 512 random numbers from \(\mathcal{N}(0, 1)\) (&lt;a href=&quot;https://pytorch.org/docs/stable/torch.html#torch.randn&quot;&gt;see&lt;/a&gt;). Similarly, to create \(a\), we pick (512 * 512) random numbers from \(\mathcal{N}(0, 1)\). Then, the \(i^{th}\) element of \(y\) is calculated by multiplying the 512 elements of row \(i\) of \(a\) (i.e., \(a[i]\)) with the 512 elements of \(x\) elementwise and summing the products.&lt;/p&gt;

\[y_{i} = a_{i,0} x_{0} + a_{i,1} x_{1} + \cdots + a_{i,511} x_{511} = \sum_{k=0}^{511} a_{i,k} x_{k}\]

&lt;h2 id=&quot;1-proof-that-y-sim-mathcaln0-512&quot;&gt;1. Proof that \(Y \sim \mathcal{N}(0, 512)\)&lt;/h2&gt;

&lt;p&gt;Let \(A\), \(X\) and \(Y\) be the random variables from which \(a\), \(x\) and \(y\) are sampled respectively. We know that one element of \(Y\) is created by multiplying 512 elements from \(A\) and \(X\) with each other. That is, we sample 512 elements from \(A\), 512 elements from \(X\), multiply them element by element, and add them. Thus, so far we have:&lt;/p&gt;

\[\begin{aligned}
&amp;amp; A \sim \mathcal{N}(0, 1) \\
&amp;amp; X \sim \mathcal{N}(0, 1) \\ 
&amp;amp; E[A] = 0 \\
&amp;amp; E[X] = 0 \\ 
&amp;amp; Var[A] = Std[A] = 1 \\ 
&amp;amp; Var[X] = Std[X] = 1 \\ 
\end{aligned}\]

&lt;p&gt;and 
\(\begin{aligned}
Y = \sum_{k=0}^{511} A*X
\end{aligned}\)&lt;/p&gt;

&lt;p&gt;Let’s start by calculating the mean of Y&lt;/p&gt;

&lt;h3 id=&quot;11-expectation-mean-of-y&quot;&gt;1.1 Expectation (Mean) of Y&lt;/h3&gt;

\[\begin{aligned}
E[Y] &amp;amp; = E[AX] \\
&amp;amp; = E[A] * E[X] = 0 &amp;amp; (\text{A and X are independent, and E[A] = E[X] = 0})
\end{aligned}\]

&lt;p&gt;(See &lt;a href=&quot;https://en.wikipedia.org/wiki/Expected_value#Basic_properties&quot;&gt;properties&lt;/a&gt; of expectation)&lt;/p&gt;

&lt;h3 id=&quot;12-variance-of-y&quot;&gt;1.2 Variance of Y&lt;/h3&gt;

&lt;p&gt;We know that \(Y\) is created by adding 512 elements sampled from \(A*X\). Thus, let’s first calculate the variance of \(A*X\). That is, what would be the variance if we pick one element randomly from \(A\) and \(X\) and then multiply them?&lt;/p&gt;

\[\begin{aligned}
Var[AX] &amp;amp; = Var(A)*(E(X))^2 + Var(X)*(E(A))^2 + Var(A)*Var(X) &amp;amp; (\text{A and X are independent}) \\
&amp;amp; = Var(A) * Var(X)\\
&amp;amp; = 1
\end{aligned}\]

&lt;p&gt;(&lt;a href=&quot;https://stats.stackexchange.com/questions/52646/variance-of-product-of-multiple-random-variables&quot;&gt;Reference for the variance property&lt;/a&gt;)&lt;/p&gt;
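&lt;p&gt;We can also sanity-check \(Var[AX] = 1\) empirically by sampling a large number of single products (a quick simulation; the estimates will wobble slightly around 0 and 1):&lt;/p&gt;

```python
import torch

torch.manual_seed(0)
a = torch.randn(1_000_000)  # one element of A per sample
x = torch.randn(1_000_000)  # one element of X per sample
prod = a * x                # elementwise products, one per sample
print(prod.mean().item(), prod.var().item())  # ~0 and ~1
```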

&lt;p&gt;We know that Y is formed by summing 512 such elements or&lt;/p&gt;

\[\begin{aligned}
Y = \sum_{k=0}^{511} A*X
\end{aligned}\]

&lt;p&gt;Thus&lt;/p&gt;

\[\begin{aligned}
Var[Y] &amp;amp; = Var[\sum_{k=0}^{511}A * X] \\
&amp;amp; = \sum_{k=0}^{511} Var[AX] &amp;amp;(\text{A and X are independent}) \\
&amp;amp; = \sum_{k=0}^{511} 1  &amp;amp;(\text{Var[AX] = 1 from above}) \\\\
&amp;amp; = 512
\end{aligned}\]

&lt;p&gt;(&lt;a href=&quot;https://en.wikipedia.org/wiki/Covariance#Properties&quot;&gt;Reference&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;In other words, \(Y \sim \mathcal{N}(0, 512)\), which is terrible, since Y now varies a lot! The experiment is reproduced below for ready reference. &lt;em&gt;Each of the ys has a large variance!&lt;/em&gt; As they are fed to the subsequent layers, the product only gets worse, as we’ve seen.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.0&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;n_iter&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10000&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;n_dim&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;512&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ys&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_iter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;randn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_dim&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;randn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_dim&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;#just like one row of a
&lt;/span&gt;	&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;@&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;item&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;ys&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;item&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_iter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ys&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;var&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-js highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.10872242888212204&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;514.2963&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;2-proof-that-y-is-sim--mathcaln0-1--when-a-sim--mathcaln0-1--512&quot;&gt;2. Proof that Y is \(\sim  \mathcal{N}(0, 1)\)  when A \(\sim  \mathcal{N}(0, 1 / 512)\)&lt;/h2&gt;

&lt;p&gt;If we scale the weights of the matrix \(a\) by dividing them by \(\sqrt{512}\), we will be picking the elements of \(a\) from a distribution with mean \(0\) and variance \(1 / 512\), i.e. \(\mathcal{N}(0, 1 / 512)\). This will in turn give us a distribution of \(y\) in which each element has mean 0 and std 1, thus allowing us to repeat the product as many times as we want (or in other words, make our network deeper).&lt;/p&gt;

&lt;p&gt;We will now prove that dividing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a&lt;/code&gt; by \(\sqrt{512}\) gives \(Y\) a well-behaved distribution.&lt;/p&gt;

&lt;p&gt;Now we have&lt;/p&gt;

\[\begin{aligned}
&amp;amp; A \sim  \mathcal{N}(0, 1 / 512) \\
&amp;amp; X \sim  \mathcal{N}(0, 1) \\
&amp;amp; E[A] = 0 \\
&amp;amp; E[X] = 0 \\
&amp;amp; Var[A] = 1 / 512, Std[A] = 1 / \sqrt{512} \\
&amp;amp; Var[X] = Std[X] = 1 \\
\end{aligned}\]

&lt;h3 id=&quot;21-expectation-mean-of-y&quot;&gt;2.1 Expectation (Mean) of Y&lt;/h3&gt;

\[\begin{aligned}
E[Y] &amp;amp; = E[AX] \\
&amp;amp; = E[A] * E[X] &amp;amp;(\text{A and X are independent}) \\
&amp;amp; = 0 &amp;amp;(\because E[A] = E[X] = 0)
\end{aligned}\]

&lt;h3 id=&quot;22-variance-of-y&quot;&gt;2.2 Variance of Y&lt;/h3&gt;

&lt;p&gt;As before, let’s first calculate the variance of \(AX\). That is, what would be the variance if we pick one element randomly from \(A\) and \(X\) and then multiply them?&lt;/p&gt;

\[\begin{aligned}
Var[AX] &amp;amp; = Var(A)*(E(X))^2 + Var(X)*(E(A))^2 + Var(A)*Var(X) \\
&amp;amp; = Var(A) * Var(X) = 1 / 512
\end{aligned}\]

&lt;p&gt;Now,&lt;/p&gt;

\[\begin{aligned}
Y = \sum_{k=0}^{511} A_k X_k
\end{aligned}\]

&lt;p&gt;Thus,&lt;/p&gt;

\[\begin{aligned}
Var[Y] &amp;amp; = Var\Big[\sum_{k=0}^{511} A_k X_k\Big] \\
&amp;amp; = \sum_{k=0}^{511} Var[A_k X_k] &amp;amp;(\text{the terms are independent}) \\
&amp;amp; = \sum_{k=0}^{511} 1 / 512  &amp;amp;(\text{Var[AX] = 1 / 512 from above}) \\
&amp;amp; = 1
\end{aligned}\]

&lt;p&gt;In other words, \(Y \sim \mathcal{N}(0, 1)\) which is what we wanted! Let’s do an experiment to make sure this holds:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.0&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;n_iter&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10000&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;n_dim&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;512&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ys&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_iter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;randn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_dim&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;randn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_dim&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;math&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sqrt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_dim&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;#just like one row of a
&lt;/span&gt;	&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;@&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;item&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;ys&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;item&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_iter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ys&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;var&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ys&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-js highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.008042885749042035&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.9856&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.9928&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Works! Each element of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; will thus be sampled from a well-behaved distribution. Here is the original simulation with the fix, for quick reference:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;scaling_factor&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;math&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sqrt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;512&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;randn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;512&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;randn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;512&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;512&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;scaling_factor&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt; 
	&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;@&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;language-js highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.0121&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;1.1693&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;
&lt;p&gt;Repeated matrix multiplication lies at the core of neural networks. We saw that without proper initialization, inputs sampled from a well-behaved distribution \((\mathcal{N}(0, 1))\) will vanish (over-scaling) or explode (under-scaling). Dividing the weight matrix by \(\sqrt{num\_inputs}\) (num_inputs = 512 in the running example), known as &lt;a href=&quot;http://proceedings.mlr.press/v9/glorot10a.html&quot;&gt;Xavier Initialization&lt;/a&gt;, ensures that the output of each multiplication is well-behaved, so that the sequence of multiplications yields a reasonable output at each step. While Xavier initialization puts us on the right track, &lt;a href=&quot;https://arxiv.org/abs/1502.01852&quot;&gt;Kaiming Initialization&lt;/a&gt; provides the optimal scaling factor when ReLU is used as the activation (non-linearity) between multiplications in the network.&lt;/p&gt;
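&lt;p&gt;As a quick sanity check of that last claim (not part of the original post): with a ReLU between the multiplications, scaling the weights by \(\sqrt{2 / num\_inputs}\) keeps the activations well-behaved. A minimal sketch, using NumPy rather than PyTorch to keep it self-contained:&lt;/p&gt;

```python
import math
import numpy as np

n_dim, n_layers = 512, 100
rng = np.random.default_rng(0)

x = rng.standard_normal(n_dim)
for _ in range(n_layers):
    # Kaiming scaling for ReLU: weights drawn from N(0, 2 / n_dim)
    a = rng.standard_normal((n_dim, n_dim)) * math.sqrt(2.0 / n_dim)
    x = np.maximum(a @ x, 0.0)  # ReLU between multiplications

print(x.mean(), x.std())  # both stay on the order of 1 after 100 layers
```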

</description>
        <pubDate>Mon, 01 Apr 2019 00:00:00 +0000</pubDate>
        <link>https://madaan.github.io/init/</link>
        <guid isPermaLink="true">https://madaan.github.io/init/</guid>
      </item>
    
      <item>
        <title>The Curious Case of the Missing US 101 Exits</title>
        <description>&lt;h2 id=&quot;tl-dr&quot;&gt;TL; DR&lt;/h2&gt;

&lt;p&gt;Exits on the &lt;a href=&quot;https://en.wikipedia.org/wiki/U.S._Route_101&quot;&gt;US 101&lt;/a&gt; are not numbered sequentially; they are numbered by distance. Two consecutive exits that are 5 miles apart will have exit numbers that differ by 5.
Various visualizations show that the exits are densely packed in the cities and become increasingly sparse as we move north.&lt;/p&gt;

&lt;hr /&gt;

&lt;h6 class=&quot;no_toc&quot; id=&quot;contents&quot;&gt;Contents&lt;/h6&gt;

&lt;ul id=&quot;markdown-toc&quot;&gt;
  &lt;li&gt;&lt;a href=&quot;#tl-dr&quot; id=&quot;markdown-toc-tl-dr&quot;&gt;TL; DR&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#background&quot; id=&quot;markdown-toc-background&quot;&gt;Background&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#collecting-exit-numbers&quot; id=&quot;markdown-toc-collecting-exit-numbers&quot;&gt;Collecting Exit Numbers&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#visualizing-the-gaps&quot; id=&quot;markdown-toc-visualizing-the-gaps&quot;&gt;Visualizing the Gaps&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#transformations&quot; id=&quot;markdown-toc-transformations&quot;&gt;Transformations&lt;/a&gt;        &lt;ul&gt;
          &lt;li&gt;&lt;a href=&quot;#1-presentabsent-array&quot; id=&quot;markdown-toc-1-presentabsent-array&quot;&gt;1. Present/Absent Array&lt;/a&gt;&lt;/li&gt;
          &lt;li&gt;&lt;a href=&quot;#2-running-sum&quot; id=&quot;markdown-toc-2-running-sum&quot;&gt;2. Running Sum&lt;/a&gt;&lt;/li&gt;
          &lt;li&gt;&lt;a href=&quot;#3-binning&quot; id=&quot;markdown-toc-3-binning&quot;&gt;3. Binning&lt;/a&gt;&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#spectrogram--with-presentabsent-array&quot; id=&quot;markdown-toc-spectrogram--with-presentabsent-array&quot;&gt;Spectrogram  with Present/Absent Array&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#spectrogram--with-running-sum&quot; id=&quot;markdown-toc-spectrogram--with-running-sum&quot;&gt;Spectrogram  with Running Sum&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#trendline&quot; id=&quot;markdown-toc-trendline&quot;&gt;Trendline&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#so-whats-happening&quot; id=&quot;markdown-toc-so-whats-happening&quot;&gt;So, What’s Happening?&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#the-large-missing-patch&quot; id=&quot;markdown-toc-the-large-missing-patch&quot;&gt;The Large Missing Patch&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#the-missing-exit-413&quot; id=&quot;markdown-toc-the-missing-exit-413&quot;&gt;The Missing Exit 413&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;background&quot;&gt;Background&lt;/h2&gt;

&lt;p&gt;I’ve known about the existence of the US 101 for about 2 years now, hopping on it every now and then to get around the Bay Area. I had never paid close attention to the exit numbers and always assumed that exits are numbered in serial order: exit 400 would be between exits 399 and 401, and so on. I was wrong.&lt;/p&gt;

&lt;p&gt;The truth dawned upon me as I started noticing the exit numbers while driving from Sunnyvale to Foster City a couple of nights ago. To kill the monotony, I started trying to calculate the square of each exit number before reaching the next exit (&lt;em&gt;the exits on this particular stretch are centered around 400, so it’s easier; 405² = (400 + 5)² etc.&lt;/em&gt;). It didn’t take long before it became clear that not all the exits were present. Particularly shocking was the fact that the anomaly (?) existed quite close to home as well. I passed exit 412 (Ralston/Oracle) and proceeded to take exit 414 (Hillsdale) for Foster City; there was no exit 413. I’ll be damned.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;collecting-exit-numbers&quot;&gt;Collecting Exit Numbers&lt;/h2&gt;

&lt;p&gt;The list of exits was easy to obtain. I scraped &lt;a href=&quot;https://iexitapp.com/exits/California/US%20101/North/654&quot;&gt;these&lt;/a&gt;  &lt;a href=&quot;https://iexitapp.com/exits/California/US%20101/South/654&quot;&gt;links&lt;/a&gt; and obtained a list of the northbound and southbound exits (I’ve posted a cleaned list of exits &lt;a href=&quot;https://gist.github.com/madaan/3523e3dfac22b1fd2b184eea5ab09175&quot;&gt;here&lt;/a&gt;). For this analysis, I used only the exit numbers and removed the alphabetical suffixes (414A and 414B become a single entry, 414). The exit numbers range from 1 to 734 and cover California only (the US 101 extends to Oregon and Washington as well). I verified the correctness of the lists by spot-checking a few exits using Google Maps. It’s worth mentioning that there are a few instances where an exit exists only northbound or southbound. For example, exit 415 (Kehoe Avenue), between 92 and 3rd Avenue San Mateo, exists only on the northbound 101. However, 95% of the exits have both a north and a south instance, so in the interest of brevity I’ll focus only on the northbound exits in this post. Finally, note that the exit numbers start from the south and increase as we move north: from exit 1 (4th Street in LA) to exit 734 (Patricks Point Drive, Trinidad).&lt;/p&gt;

&lt;hr /&gt;
&lt;h2 id=&quot;visualizing-the-gaps&quot;&gt;Visualizing the Gaps&lt;/h2&gt;

&lt;p&gt;The list of exits begins at 1 and ends at 734, and has 342 elements. Thus, there are 392 exits (53.40%) missing in the sequence. Here is a random snippet from the list of exits:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[381, 382, 383, 384, 385, 386, 388, 389, 391, 392, 393]&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;From the sample, we can see that exits 387 and 390 don’t exist.
I’ll apply some basic transformations to the list of exits to make it easier to see what’s going on. To illustrate the transformations, I’ll use the running example of a smaller list, [1, 2, 6] (exits 1, 2 and 6 are present; exits 3, 4 and 5 are absent). The transformations are discussed next.&lt;/p&gt;

&lt;h3 id=&quot;transformations&quot;&gt;Transformations&lt;/h3&gt;

&lt;h4 id=&quot;1-presentabsent-array&quot;&gt;1. Present/Absent Array&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;Create an array by placing a “1” for every exit that’s present, and a “0” for every exit that’s absent.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;E.g. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[1, 2, 6] =&amp;gt; [1, 1, 0, 0, 0, 1]&lt;/code&gt;.&lt;/p&gt;
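&lt;p&gt;The post doesn’t include code for this step, so here is a minimal Python sketch (the helper name is mine):&lt;/p&gt;

```python
def present_absent(exits):
    # place a 1 at every exit number that is present, 0 at every absent one
    present = set(exits)
    return [1 if i in present else 0 for i in range(1, max(exits) + 1)]

present_absent([1, 2, 6])  # [1, 1, 0, 0, 0, 1]
```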

&lt;h4 id=&quot;2-running-sum&quot;&gt;2. Running Sum&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;Create the present/absent array, but put a “1” for the missing exits and a “0” for the present ones.&lt;/li&gt;
  &lt;li&gt;Record a running sum of the elements in the new array, but reset the running sum every time a “0” is encountered.&lt;/li&gt;
  &lt;li&gt;The running sum thus captures the &lt;em&gt;intensity&lt;/em&gt; of gaps. Longer gaps in the exits will lead to larger values of the running sum.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;E.g.  &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[1, 2, 6] =&amp;gt; [0, 0, 1, 1, 1, 0] =&amp;gt; [0, 0, 1, 2, 3, 0]&lt;/code&gt;&lt;/p&gt;
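&lt;p&gt;A minimal Python sketch of the running-sum transformation (the helper name is mine, not from the post):&lt;/p&gt;

```python
def running_sum(exits):
    # 1 for each missing exit, 0 for each present one; the running sum
    # resets to 0 every time a present exit is encountered
    present = set(exits)
    out, run = [], 0
    for i in range(1, max(exits) + 1):
        run = 0 if i in present else run + 1
        out.append(run)
    return out

running_sum([1, 2, 6])  # [0, 0, 1, 2, 3, 0]
```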

&lt;h4 id=&quot;3-binning&quot;&gt;3. Binning&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;Create the present/absent array.&lt;/li&gt;
  &lt;li&gt;Decide on a bin size, say N.&lt;/li&gt;
  &lt;li&gt;Create another array from the present/absent array, by summing every N elements.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;E.g. for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;N = 2&lt;/code&gt;, we get:
 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[1, 2, 6] =&amp;gt; [1, 1, 0, 0, 0, 1] =&amp;gt; [2, 0, 1]&lt;/code&gt;&lt;/p&gt;
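&lt;p&gt;And a sketch of the binning step (again, the helper name is mine):&lt;/p&gt;

```python
def binned(exits, n):
    # build the present/absent array, then sum it in chunks of n
    present = set(exits)
    arr = [1 if i in present else 0 for i in range(1, max(exits) + 1)]
    return [sum(arr[i:i + n]) for i in range(0, len(arr), n)]

binned([1, 2, 6], 2)  # [2, 0, 1]
```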

&lt;p&gt;The visualizations are presented next. All the visualizations were created using standard matplotlib functions (particularly, &lt;a href=&quot;https://matplotlib.org/api/_as_gen/matplotlib.pyplot.imshow.html?highlight=matplotlib%20pyplot%20imshow#matplotlib.pyplot.imshow&quot;&gt;imshow&lt;/a&gt;). The trendline was generated using &lt;a href=&quot;https://docs.scipy.org/doc/numpy/reference/generated/numpy.polyfit.html&quot;&gt;numpy polyfit&lt;/a&gt;&lt;/p&gt;
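&lt;p&gt;For reference, the trendline step can be sketched as follows; the binned counts here are made up for illustration, not the real exit data:&lt;/p&gt;

```python
import numpy as np

# Hypothetical binned exit counts, thinning out as we move north
binned_counts = np.array([24, 22, 19, 18, 14, 12, 9, 7])
x = np.arange(len(binned_counts))

# Fit a degree-1 polynomial: numpy.polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x, binned_counts, 1)
trend = slope * x + intercept  # y-values of the fitted trendline
```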

&lt;h3 id=&quot;spectrogram--with-presentabsent-array&quot;&gt;Spectrogram  with Present/Absent Array&lt;/h3&gt;
&lt;p&gt;The following plot was generated using the present/absent array. There is a black line for every exit that’s present (i.e., for every 1 in the present/absent array).
&lt;img src=&quot;https://i.imgur.com/417NwQc.png&quot; alt=&quot;Spectrogram&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As the plot clearly shows, the exits become sparser as we move north. There seems to be a large gap close to exit 550, with about 50 exits missing! Exits closer to the starting point in LA seem to be densely packed.&lt;/p&gt;

&lt;h3 id=&quot;spectrogram--with-running-sum&quot;&gt;Spectrogram  with Running Sum&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;https://i.imgur.com/CTwmH5c.png&quot; alt=&quot;Running Sum&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The above plot uses the value of the running sum to decide the color of each line. Stretches with no gaps (running sum = 0) are black, and the colors change from red to blue, yellow and ultimately white as the value of the running sum increases. The plot reinforces the message about sparsity towards the north.
A neat insight that this plot makes more obvious is the distinction between cities and wilderness. The black patches are the densely populated localities, and the red, blue and yellow patches are the gaps. The large gap of around 50 missing exits near exit 550 stands out very distinctly. We also see that exits closer to the starting point (LA) and around exit number 400 (Bay Area!) are densely allocated, as opposed to the rest of the freeway.&lt;/p&gt;

&lt;h3 id=&quot;trendline&quot;&gt;Trendline&lt;/h3&gt;
&lt;p&gt;The next plot shows the binned present/absent array with a bin size of 25, along with a trendline. It reinforces the observation from the sections above: the exits become sparser and sparser as we move north.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://i.imgur.com/Zap7gFC.png&quot; alt=&quot;Trendline&quot; /&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;so-whats-happening&quot;&gt;So, What’s Happening?&lt;/h2&gt;
&lt;p&gt;It turns out that these exit numbers are not sequential; they are distance-based. From the &lt;a href=&quot;https://en.wikipedia.org/wiki/Exit_numbers_in_the_United_States&quot;&gt;relevant wiki&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;&lt;em&gt;Freeway exits in the United States are usually numbered in two formats: distance-based and sequential.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So, that’s the &lt;em&gt;mystery&lt;/em&gt;. The exit numbers are distance-based: consecutive exits that are 5 miles apart have numbers that differ by 5. Finally, let’s see if the findings are backed by maps.&lt;/p&gt;

&lt;h3 id=&quot;the-large-missing-patch&quot;&gt;The Large Missing Patch&lt;/h3&gt;

&lt;p&gt;All of the plots revealed a large gap near the northern end of the 101. I looked up the exact exits from the list of northbound exits, and it turns out that exit 557 is followed by exit 609. That’s a jump of 52 exit numbers, which should mean a distance of around 52 miles. That seems to be the case, as indicated by this snippet taken from &lt;a href=&quot;https://www.google.com/maps/dir/37.5269594,-122.2707093/37.5530477,-122.295662/@37.5394901,-122.2979977,14.2z/data=!4m2!4m1!3e0&quot;&gt;this route&lt;/a&gt;.
&lt;img src=&quot;https://i.imgur.com/xJ5DZtw.png&quot; alt=&quot;The Gaps&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;the-missing-exit-413&quot;&gt;The Missing Exit 413&lt;/h3&gt;

&lt;p&gt;Similarly, it seems that the distance between the Hillsdale and Oracle exits is about 1.6 miles, and the distance between the Oracle and San Mateo exits is 2.3 miles, which explains the missing exit 413!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://i.imgur.com/GwSHW62.png&quot; alt=&quot;enter image description here&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Thanks for making it this far, and please don’t hesitate to share your questions/comments. Thanks to Saugata Chowdhury for the helpful brainstorming session.&lt;/p&gt;
</description>
        <pubDate>Wed, 31 Jan 2018 00:00:00 +0000</pubDate>
        <link>https://madaan.github.io/exits/</link>
        <guid isPermaLink="true">https://madaan.github.io/exits/</guid>
      </item>
    
      <item>
        <title>Training Char-RNNs for Transferring Name Styles</title>
        <description>&lt;style&gt;
.tablelines table, .tablelines td, .tablelines th {

  padding: 0; }
  table tr {
    border-top: 1px solid #cccccc;
    background-color: white;
    margin: 0;
    padding: 0; }
    table tr:nth-child(2n) {
      background-color: #f8f8f8; }
    table tr th {
      font-weight: bold;
      border: 1px solid #cccccc;
      text-align: left;
      margin: 0;
      padding: 6px 13px; }
    table tr td {
      border: 1px solid #cccccc;
      text-align: left;
      margin: 0;
      padding: 6px 13px; }
    table tr th :first-child, table tr td :first-child {
      margin-top: 0; }
    table tr th :last-child, table tr td :last-child {
      margin-bottom: 0; }
      
.highlight pre { background-color: #272822; }
.highlight .hll { background-color: #272822; }
.highlight .c { color: #75715e } /* Comment */
.highlight .err { color: #960050; background-color: #1e0010 } /* Error */
.highlight .k { color: #66d9ef } /* Keyword */
.highlight .l { color: #ae81ff } /* Literal */
.highlight .n { color: #f8f8f2 } /* Name */
.highlight .o { color: #f92672 } /* Operator */
.highlight .p { color: #f8f8f2 } /* Punctuation */
.highlight .cm { color: #75715e } /* Comment.Multiline */
.highlight .cp { color: #75715e } /* Comment.Preproc */
.highlight .c1 { color: #75715e } /* Comment.Single */
.highlight .cs { color: #75715e } /* Comment.Special */
.highlight .ge { font-style: italic } /* Generic.Emph */
.highlight .gs { font-weight: bold } /* Generic.Strong */
.highlight .kc { color: #66d9ef } /* Keyword.Constant */
.highlight .kd { color: #66d9ef } /* Keyword.Declaration */
.highlight .kn { color: #f92672 } /* Keyword.Namespace */
.highlight .kp { color: #66d9ef } /* Keyword.Pseudo */
.highlight .kr { color: #66d9ef } /* Keyword.Reserved */
.highlight .kt { color: #66d9ef } /* Keyword.Type */
.highlight .ld { color: #e6db74 } /* Literal.Date */
.highlight .m { color: #ae81ff } /* Literal.Number */
.highlight .s { color: #e6db74 } /* Literal.String */
.highlight .na { color: #a6e22e } /* Name.Attribute */
.highlight .nb { color: #f8f8f2 } /* Name.Builtin */
.highlight .nc { color: #a6e22e } /* Name.Class */
.highlight .no { color: #66d9ef } /* Name.Constant */
.highlight .nd { color: #a6e22e } /* Name.Decorator */
.highlight .ni { color: #f8f8f2 } /* Name.Entity */
.highlight .ne { color: #a6e22e } /* Name.Exception */
.highlight .nf { color: #a6e22e } /* Name.Function */
.highlight .nl { color: #f8f8f2 } /* Name.Label */
.highlight .nn { color: #f8f8f2 } /* Name.Namespace */
.highlight .nx { color: #a6e22e } /* Name.Other */
.highlight .py { color: #f8f8f2 } /* Name.Property */
.highlight .nt { color: #f92672 } /* Name.Tag */
.highlight .nv { color: #f8f8f2 } /* Name.Variable */
.highlight .ow { color: #f92672 } /* Operator.Word */
.highlight .w { color: #f8f8f2 } /* Text.Whitespace */
.highlight .mf { color: #ae81ff } /* Literal.Number.Float */
.highlight .mh { color: #ae81ff } /* Literal.Number.Hex */
.highlight .mi { color: #ae81ff } /* Literal.Number.Integer */
.highlight .mo { color: #ae81ff } /* Literal.Number.Oct */
.highlight .sb { color: #e6db74 } /* Literal.String.Backtick */
.highlight .sc { color: #e6db74 } /* Literal.String.Char */
.highlight .sd { color: #e6db74 } /* Literal.String.Doc */
.highlight .s2 { color: #e6db74 } /* Literal.String.Double */
.highlight .se { color: #ae81ff } /* Literal.String.Escape */
.highlight .sh { color: #e6db74 } /* Literal.String.Heredoc */
.highlight .si { color: #e6db74 } /* Literal.String.Interpol */
.highlight .sx { color: #e6db74 } /* Literal.String.Other */
.highlight .sr { color: #e6db74 } /* Literal.String.Regex */
.highlight .s1 { color: #e6db74 } /* Literal.String.Single */
.highlight .ss { color: #e6db74 } /* Literal.String.Symbol */
.highlight .bp { color: #f8f8f2 } /* Name.Builtin.Pseudo */
.highlight .vc { color: #f8f8f2 } /* Name.Variable.Class */
.highlight .vg { color: #f8f8f2 } /* Name.Variable.Global */
.highlight .vi { color: #f8f8f2 } /* Name.Variable.Instance */
.highlight .il { color: #ae81ff } /* Literal.Number.Integer.Long */

.highlight .gh { } /* Generic Heading &amp; Diff Header */
.highlight .gu { color: #75715e; } /* Generic.Subheading &amp; Diff Unified/Comment? */
.highlight .gd { color: #f92672; } /* Generic.Deleted &amp; Diff Deleted */
.highlight .gi { color: #a6e22e; } /* Generic.Inserted &amp; Diff Inserted */
&lt;/style&gt;

&lt;h6 class=&quot;no_toc&quot; id=&quot;contents&quot;&gt;Contents&lt;/h6&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
  &lt;li&gt;&lt;a href=&quot;#introduction&quot; id=&quot;markdown-toc-introduction&quot;&gt;Introduction&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#dataset&quot; id=&quot;markdown-toc-dataset&quot;&gt;Dataset&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#rnns&quot; id=&quot;markdown-toc-rnns&quot;&gt;RNNs&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#cross-seeding&quot; id=&quot;markdown-toc-cross-seeding&quot;&gt;Cross Seeding&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#looking-at-the-data&quot; id=&quot;markdown-toc-looking-at-the-data&quot;&gt;Looking at the Data&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#top-5-most-popular-names&quot; id=&quot;markdown-toc-top-5-most-popular-names&quot;&gt;Top 5 Most Popular Names&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#name-length-distributions&quot; id=&quot;markdown-toc-name-length-distributions&quot;&gt;Name Length Distributions&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#input-representation&quot; id=&quot;markdown-toc-input-representation&quot;&gt;Input Representation&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#a-encoding&quot; id=&quot;markdown-toc-a-encoding&quot;&gt;a) Encoding&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#b-standardization&quot; id=&quot;markdown-toc-b-standardization&quot;&gt;b) Standardization&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#c-embeddings&quot; id=&quot;markdown-toc-c-embeddings&quot;&gt;c) Embeddings&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#model&quot; id=&quot;markdown-toc-model&quot;&gt;Model&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#transferring-name-styles&quot; id=&quot;markdown-toc-transferring-name-styles&quot;&gt;Transferring Name Styles&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#results&quot; id=&quot;markdown-toc-results&quot;&gt;Results&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;h5 id=&quot;dataset&quot;&gt;Dataset&lt;/h5&gt;

&lt;p&gt;A few days ago I found myself wondering whether there is an app for &lt;em&gt;generating&lt;/em&gt; names (&lt;strike&gt;no, not a startup idea&lt;/strike&gt;).
You know, something along the lines of “generate 10 Indian names”. I didn’t know the answer, but it got me thinking about names.
Before the flame of this random musing could die down, I found a &lt;a href=&quot;https://mbejda.github.io/&quot;&gt;dataset of names&lt;/a&gt;, thanks to
&lt;a href=&quot;https://github.com/mbejda&quot;&gt;mbejda&lt;/a&gt;. The dataset consists of Hispanic, Indian, Caucasian and African American names. I
had the dataset, so something had to be done.&lt;/p&gt;

&lt;h5 id=&quot;rnns&quot;&gt;RNNs&lt;/h5&gt;

&lt;p&gt;Now perhaps you have read &lt;a href=&quot;http://karpathy.github.io/2015/05/21/rnn-effectiveness/&quot;&gt;Karpathy’s fantastic blog&lt;/a&gt; on RNNs and 
what they do.&lt;/p&gt;

&lt;p&gt;Here is one of the quotes from the blog:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;We’ll train RNNs to generate text character by character and ponder the question “how is that even possible?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Yes, it’s pretty mind-blowing. The blog goes into the details of &lt;em&gt;how&lt;/em&gt; it’s done. Here’s a short summary of &lt;em&gt;what&lt;/em&gt; RNNs
do: an RNN can take a bunch of text, say T, and learn to generate new text that would &lt;em&gt;seem&lt;/em&gt; to be taken from T.
For example, we can train an RNN on all the works of William Shakespeare, and the RNN can, in turn, generate &lt;strong&gt;new text&lt;/strong&gt; that
would seem to be written by Shakespeare. The blog I’ve linked to contains this and many other interesting examples. The
way RNNs do this is by learning to “predict the next character” in the sequence given the characters seen so far.&lt;/p&gt;
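&lt;p&gt;As a toy illustration of what “predict the next character” means (a made-up bigram counter, far simpler than any RNN), consider:&lt;/p&gt;

```python
from collections import Counter, defaultdict

# A toy "next character" predictor: for each character, count which
# character follows it in the training text, and propose the most
# frequent follower. An RNN learns a much richer, context-aware
# version of this mapping.
def train_bigrams(text):
    followers = defaultdict(Counter)
    for cur, nxt in zip(text, text[1:]):
        followers[cur][nxt] += 1
    return followers

model = train_bigrams("deepak sharma deepa")
next_char = model["d"].most_common(1)[0][0]  # "e" follows "d" most often
```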

&lt;p&gt;Putting it together, I had:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;A model that can learn to generate new samples from a piece of text.&lt;/li&gt;
  &lt;li&gt;A corpus of names.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So it would not be very inspiring if I said that we can use RNNs to generate new names. Yes, we can, and yes, I did. I got
an “Indian Name Generator” generating &lt;em&gt;Deepaks&lt;/em&gt; and &lt;em&gt;Nehas&lt;/em&gt;, a “Caucasian Name Generator” generating &lt;em&gt;Michaels&lt;/em&gt; and
&lt;em&gt;Jennifers&lt;/em&gt;, and so on. It was &lt;em&gt;something&lt;/em&gt;, but as I said, not very interesting.&lt;/p&gt;

&lt;h5 id=&quot;cross-seeding&quot;&gt;Cross Seeding&lt;/h5&gt;

&lt;p&gt;Now, I didn’t feel like throwing all these RNNs away, so I wondered what would happen if I fed, say, the “Indian Name Generator”
the first few characters of a Caucasian name, and let it generate the rest. Would it try to &lt;strong&gt;create&lt;/strong&gt; a name that
sounds Indian? So I ran a bunch of these experiments, and present the somewhat more interesting results in this post.
I also used seeds from unconventional names, like those of Pokémons and wrestlers. It was fun to see all these different
RNNs take a stab at creating a name that sounds like it belongs to their domain:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;name&lt;/th&gt;
      &lt;th&gt;seed&lt;/th&gt;
      &lt;th&gt;african_american&lt;/th&gt;
      &lt;th&gt;caucasian&lt;/th&gt;
      &lt;th&gt;hispanic&lt;/th&gt;
      &lt;th&gt;indian&lt;/th&gt;
      &lt;th&gt;all_races&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;undertaker&lt;/td&gt;
      &lt;td&gt;underta&lt;/td&gt;
      &lt;td&gt;undertall nix#&lt;/td&gt;
      &lt;td&gt;undertan starlir#&lt;/td&gt;
      &lt;td&gt;underta romero#&lt;/td&gt;
      &lt;td&gt;undertala#&lt;/td&gt;
      &lt;td&gt;undertayshawn king#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;aman madaan&lt;/td&gt;
      &lt;td&gt;aman mad&lt;/td&gt;
      &lt;td&gt;aman madadenis#&lt;/td&gt;
      &lt;td&gt;aman madich#&lt;/td&gt;
      &lt;td&gt;aman madro l gonzalez#&lt;/td&gt;
      &lt;td&gt;aman madhkaran#&lt;/td&gt;
      &lt;td&gt;aman madha#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;jose luis&lt;/td&gt;
      &lt;td&gt;jose l&lt;/td&gt;
      &lt;td&gt;jose l graham#&lt;/td&gt;
      &lt;td&gt;jose l ramirez#&lt;/td&gt;
      &lt;td&gt;jose l morales#&lt;/td&gt;
      &lt;td&gt;jose lal sharma#&lt;/td&gt;
      &lt;td&gt;jose l rodriguez#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;hideyoshi&lt;/td&gt;
      &lt;td&gt;hideyo&lt;/td&gt;
      &lt;td&gt;hideyon u bennett#&lt;/td&gt;
      &lt;td&gt;hideyo g morio#&lt;/td&gt;
      &lt;td&gt;hideyordo rodriguez#&lt;/td&gt;
      &lt;td&gt;hideyohar sharma#&lt;/td&gt;
      &lt;td&gt;hideyon d brown#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;dan fineman&lt;/td&gt;
      &lt;td&gt;dan f&lt;/td&gt;
      &lt;td&gt;dan f briggs#&lt;/td&gt;
      &lt;td&gt;dan f witharr#&lt;/td&gt;
      &lt;td&gt;dan flekrez#&lt;/td&gt;
      &lt;td&gt;dan farjat saini#&lt;/td&gt;
      &lt;td&gt;dan francersiii#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;hulk hogan&lt;/td&gt;
      &lt;td&gt;hulk ho&lt;/td&gt;
      &lt;td&gt;hulk hornes#&lt;/td&gt;
      &lt;td&gt;hulk howstie#&lt;/td&gt;
      &lt;td&gt;hulk hoelles.maldonado#&lt;/td&gt;
      &lt;td&gt;hulk holoo chand singh#&lt;/td&gt;
      &lt;td&gt;hulk holu#&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;This post has six sections, and the introduction just ended. We’ll now take a quick look at the data, followed by a discussion of
how the input is converted to a representation that can be used for training these name generators. We’ll then look at
some details of the model, how the predictions are made and the name styles transferred, and finally present the results.&lt;/p&gt;

&lt;p&gt;The code, with cleaned + processed dataset, and notes on how to run the training and scoring processes, is located &lt;a href=&quot;https://github.com/madaan/char-rnn-names&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;looking-at-the-data&quot;&gt;Looking at the Data&lt;/h2&gt;
&lt;p&gt;Although there is no limit to the number of different analyses we can run, we’ll present only two here in the interest of
space (and the attention span): i) the most popular names and ii) the name length distributions. Looking at the most popular names
will give us a feel for the dataset, and the name length distributions were added to include more plots and make this post look
fancier (and they’re used somewhere down the line, too).&lt;/p&gt;

&lt;h5 id=&quot;top-5-most-popular-names&quot;&gt;Top 5 Most Popular Names&lt;/h5&gt;

&lt;p&gt;The following tables list the top 5 most popular names for each of the races. Please note that the first and the last
names are listed separately (e.g., Latoya Williams is &lt;strong&gt;not&lt;/strong&gt; the most popular African-American female name; Latoya is the
most popular first name for African-American females, and Williams is the most popular last name for African-American 
females).&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;African American&lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Female  First Names&lt;/td&gt;
      &lt;td&gt;Female Last names&lt;/td&gt;
      &lt;td&gt;Male First Names&lt;/td&gt;
      &lt;td&gt;Male Last Names&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;latoya&lt;/td&gt;
      &lt;td&gt;williams&lt;/td&gt;
      &lt;td&gt;michael&lt;/td&gt;
      &lt;td&gt;johnson&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ashley&lt;/td&gt;
      &lt;td&gt;johnson&lt;/td&gt;
      &lt;td&gt;james&lt;/td&gt;
      &lt;td&gt;brown&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;patricia&lt;/td&gt;
      &lt;td&gt;brown&lt;/td&gt;
      &lt;td&gt;anthony&lt;/td&gt;
      &lt;td&gt;jones&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;angela&lt;/td&gt;
      &lt;td&gt;smith&lt;/td&gt;
      &lt;td&gt;willie&lt;/td&gt;
      &lt;td&gt;jackson&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;mary&lt;/td&gt;
      &lt;td&gt;jackson&lt;/td&gt;
      &lt;td&gt;robert&lt;/td&gt;
      &lt;td&gt;davis&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Caucasian&lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Female First Names&lt;/td&gt;
      &lt;td&gt;Female Last Names&lt;/td&gt;
      &lt;td&gt;Male  First Names&lt;/td&gt;
      &lt;td&gt;Male Last Names&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;jennifer&lt;/td&gt;
      &lt;td&gt;smith&lt;/td&gt;
      &lt;td&gt;michael&lt;/td&gt;
      &lt;td&gt;johnson&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;amanda&lt;/td&gt;
      &lt;td&gt;brown&lt;/td&gt;
      &lt;td&gt;james&lt;/td&gt;
      &lt;td&gt;rodriguez&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;kimberly&lt;/td&gt;
      &lt;td&gt;williams&lt;/td&gt;
      &lt;td&gt;robert&lt;/td&gt;
      &lt;td&gt;davis&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;jessica&lt;/td&gt;
      &lt;td&gt;miller&lt;/td&gt;
      &lt;td&gt;david&lt;/td&gt;
      &lt;td&gt;jones&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ashley&lt;/td&gt;
      &lt;td&gt;johnson&lt;/td&gt;
      &lt;td&gt;john&lt;/td&gt;
      &lt;td&gt;brown&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Hispanic&lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Female  First Names&lt;/td&gt;
      &lt;td&gt;Female Last Names&lt;/td&gt;
      &lt;td&gt;Male First Names&lt;/td&gt;
      &lt;td&gt;Male  Last Names&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;maria&lt;/td&gt;
      &lt;td&gt;rodriguez&lt;/td&gt;
      &lt;td&gt;jose&lt;/td&gt;
      &lt;td&gt;rodriguez&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;melissa&lt;/td&gt;
      &lt;td&gt;gonzalez&lt;/td&gt;
      &lt;td&gt;juan&lt;/td&gt;
      &lt;td&gt;garcia&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;jennifer&lt;/td&gt;
      &lt;td&gt;rivera&lt;/td&gt;
      &lt;td&gt;luis&lt;/td&gt;
      &lt;td&gt;martinez&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;gloria&lt;/td&gt;
      &lt;td&gt;perez&lt;/td&gt;
      &lt;td&gt;carlos&lt;/td&gt;
      &lt;td&gt;rivera&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;elizabeth&lt;/td&gt;
      &lt;td&gt;garcia&lt;/td&gt;
      &lt;td&gt;jorge&lt;/td&gt;
      &lt;td&gt;hernandez&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Indian&lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Female First Names&lt;/td&gt;
      &lt;td&gt;Female Last Names&lt;/td&gt;
      &lt;td&gt;Male First Names&lt;/td&gt;
      &lt;td&gt;Male Last Names&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;smt&lt;/td&gt;
      &lt;td&gt;devi&lt;/td&gt;
      &lt;td&gt;deepak&lt;/td&gt;
      &lt;td&gt;kumar&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;pooja&lt;/td&gt;
      &lt;td&gt;pooja&lt;/td&gt;
      &lt;td&gt;rahul&lt;/td&gt;
      &lt;td&gt;singh&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;smt.&lt;/td&gt;
      &lt;td&gt;kumari&lt;/td&gt;
      &lt;td&gt;amit&lt;/td&gt;
      &lt;td&gt;sharma&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;jyoti&lt;/td&gt;
      &lt;td&gt;jyoti&lt;/td&gt;
      &lt;td&gt;ram&lt;/td&gt;
      &lt;td&gt;lal&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;kumari&lt;/td&gt;
      &lt;td&gt;bai&lt;/td&gt;
      &lt;td&gt;sanjay&lt;/td&gt;
      &lt;td&gt;ram&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h5 id=&quot;name-length-distributions&quot;&gt;Name Length Distributions&lt;/h5&gt;

&lt;p&gt;Next, the name length distributions are plotted for each of the races. It seems like short names (perhaps without a surname)
are popular among Indians, giving rise to the minor mode in the distribution. Hispanic names tend to be longer, as indicated
by the fat tail following the mean. 15-ish seems to be the most popular name length across the races (here’s one thing
you can take away from the post).&lt;/p&gt;
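&lt;p&gt;Computing such a length distribution is a one-liner; here is a sketch with a made-up handful of names standing in for the real dataset:&lt;/p&gt;

```python
from collections import Counter

# Made-up sample names standing in for the real dataset.
names = ["latoya williams", "deepak kumar", "maria rodriguez", "ram lal"]

# Count how many names there are of each length.
length_counts = Counter(len(name) for name in names)
```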

&lt;p&gt;&lt;img src=&quot;https://raw.githubusercontent.com/madaan/char-rnn-names/master/docs/african_american_name_len_dist.png&quot; alt=&quot;African American Names&quot; /&gt; &lt;img src=&quot;https://raw.githubusercontent.com/madaan/char-rnn-names/master/docs/caucasian_name_len_dist.png&quot; alt=&quot;Caucasian&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://raw.githubusercontent.com/madaan/char-rnn-names/master/docs/hispanic_name_len_dist.png&quot; alt=&quot;Hispanic&quot; /&gt; &lt;img src=&quot;https://raw.githubusercontent.com/madaan/char-rnn-names/master/docs/indian_name_len_dist.png&quot; alt=&quot;Indian&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;input-representation&quot;&gt;Input Representation&lt;/h2&gt;

&lt;p&gt;In this section, we will spend some time looking at how we take a bunch of these names and convert them into a form
that can be fed to an RNN. We will get to that representation in three steps: encoding, standardization, and
embeddings.&lt;/p&gt;

&lt;h4 id=&quot;a-encoding&quot;&gt;a) Encoding&lt;/h4&gt;

&lt;p&gt;We convert each character in a (string) name to a number using the following mapping:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Character&lt;/th&gt;
      &lt;th&gt;Encoding&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;a-z&lt;/td&gt;
      &lt;td&gt;0-25&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;” “ (Space)&lt;/td&gt;
      &lt;td&gt;26&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;”#” (End of Name)&lt;/td&gt;
      &lt;td&gt;27&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;. (Invalid Character)&lt;/td&gt;
      &lt;td&gt;28&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;All the names are converted to lowercase English letters, with a space separating the different components of a name. Every
name ends with a special &lt;em&gt;name end character (“#”)&lt;/em&gt;. Every other character is mapped to “.”, an invalid-character substitute.
For example, “joe” would be converted to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[9 14 4 27]&lt;/code&gt;.&lt;/p&gt;
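&lt;p&gt;The mapping above is easy to reproduce; here is a sketch (not the actual &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CharCodec&lt;/code&gt;, just the same table in plain Python):&lt;/p&gt;

```python
# A sketch of the encoding table: a-z -> 0-25, space -> 26,
# "#" (end of name) -> 27, everything else -> 28 (invalid).
def encode(name):
    classes = []
    for ch in name.lower():
        if "a" <= ch <= "z":
            classes.append(ord(ch) - ord("a"))
        elif ch == " ":
            classes.append(26)
        elif ch == "#":
            classes.append(27)
        else:
            classes.append(28)
    return classes

encoded = encode("joe#")  # [9, 14, 4, 27]
```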
&lt;h4 id=&quot;b-standardization&quot;&gt;b) Standardization&lt;/h4&gt;

&lt;p&gt;Name lengths are anything but a constant, as seen from the length distributions. However, we are using &lt;a href=&quot;http://colah.github.io/posts/2015-08-Understanding-LSTMs/&quot;&gt;“unrolled” RNNs&lt;/a&gt;,
and thus we need to fix on a maximum name length. Names longer than the maximum name length will be truncated, and names
shorter than the maximum name length will be padded with “.” (invalid character). As discussed, we assume that every name ends
with a “#,” the name end character.  All of this happens in the following piece of code:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;encode_and_standardize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CharCodec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NAME_END&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;#add the end-of-name symbol to every name
&lt;/span&gt;   &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CharCodec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;encode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;#encode
&lt;/span&gt;   &lt;span class=&quot;n&quot;&gt;name_len&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;#length of the encoded name
&lt;/span&gt;   &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name_len&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CharCodec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;max_name_length&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
       &lt;span class=&quot;n&quot;&gt;truncated_name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[:(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CharCodec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;max_name_length&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]&lt;/span&gt;
       &lt;span class=&quot;n&quot;&gt;truncated_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CharCodec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;char_to_class&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CharCodec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NAME_END&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;#must attach the name end
&lt;/span&gt;       &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;truncated_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dtype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;int32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
       &lt;span class=&quot;n&quot;&gt;padded_name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;empty&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CharCodec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;max_name_length&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dtype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;int32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
       &lt;span class=&quot;n&quot;&gt;padded_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fill&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CharCodec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;INVALID_CHAR_CLASS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
       &lt;span class=&quot;n&quot;&gt;padded_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name_len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;
       &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;padded_name&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Note that we retain the name end character (#) even after the truncation.&lt;/p&gt;

&lt;p&gt;So how do we arrive at the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;max_name_length&lt;/code&gt;? A simple way to do that is to fix on a large number, like 100. However, that
would mean that our network will be wider than we perhaps want (most of the names will be much shorter than 100 characters). This
will lead to lots of wasted computation and slower training times (&lt;a href=&quot;https://github.com/madaan/char-rnn-names/blob/master/src/features/char_codec.py#L13&quot;&gt;give it a try!&lt;/a&gt;).
Or, we could plot a distribution of the name lengths and pick something sensible. We’ve already done that, and as you
can see, it seems like 25 will cover most of the cases; we add one extra position to accommodate the name end marker, “#.” Thus,
all the names are standardized to length 26. At this point, the string name “joe” has become an array of 26 numbers, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[ 9 14 4 27 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28]&lt;/code&gt;.&lt;/p&gt;
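&lt;p&gt;Stripped of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CharCodec&lt;/code&gt; plumbing, the standardization step boils down to the following sketch (the constants are made up to match the text):&lt;/p&gt;

```python
MAX_NAME_LENGTH = 26   # 25 characters plus one slot for the "#" marker
NAME_END_CLASS = 27    # encoding of "#"
INVALID_CLASS = 28     # encoding of "." (padding)

def standardize(encoded):
    # Truncate long names, always keeping the end-of-name marker;
    # pad short names with the invalid-character class.
    if len(encoded) >= MAX_NAME_LENGTH:
        return encoded[:MAX_NAME_LENGTH - 1] + [NAME_END_CLASS]
    return encoded + [INVALID_CLASS] * (MAX_NAME_LENGTH - len(encoded))

standardized_joe = standardize([9, 14, 4, 27])  # length 26
```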

&lt;h4 id=&quot;c-embeddings&quot;&gt;c) Embeddings&lt;/h4&gt;

&lt;p&gt;So far, we have standardized each name to a fixed length, added a character to mark the end of the name, and encoded the
name from an array of chars to an array of integers.&lt;/p&gt;

&lt;p&gt;If you don’t know what embeddings are at all, I recommend checking out &lt;a href=&quot;https://www.tensorflow.org/programmers_guide/embedding&quot;&gt;this&lt;/a&gt; or &lt;a href=&quot;https://deeplearning4j.org/word2vec.html&quot;&gt;this&lt;/a&gt; link.&lt;/p&gt;

&lt;p&gt;The tl;dr of the technique is that each of the characters is mapped to an array of numbers. The array is called an
embedding, and the length of the array is the dimensionality of the embedding. For example, if we choose to use 5-dimensional
embeddings, it just means that every character is mapped to a 5-dimensional real vector. We can start with pre-trained
embeddings, or learn them as part of the training process, which is what our model does. In our setting, that means that
we will hopefully learn embeddings that make it easier for us to predict the next character in the name.
With 5-dimensional embeddings, “joe” will now be a matrix of 26 rows and 5 columns.&lt;/p&gt;
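&lt;p&gt;The lookup itself is just row indexing; here is a NumPy sketch of the shapes involved (the embedding values are random and purely illustrative):&lt;/p&gt;

```python
import numpy as np

vocab_size, n_embeddings = 29, 5   # 29 character classes, 5-dim embeddings
rng = np.random.default_rng(0)
embeddings = rng.uniform(-1.0, 1.0, size=(vocab_size, n_embeddings))

# Standardized encoding of "joe": [9, 14, 4, 27] followed by padding (28).
encoded_joe = np.array([9, 14, 4, 27] + [28] * 22)
embedded_joe = embeddings[encoded_joe]  # row lookup, shape (26, 5)
```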

&lt;p&gt;The following three lines of code are all that’s needed to add embeddings. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;embeddings&lt;/code&gt; is a matrix, which has one row for 
each character in our vocabulary. Concretely, if we had used 5-dimensional embeddings, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;embeddings&lt;/code&gt; matrix would
have the dimensions &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;29 x 5&lt;/code&gt;; each of the characters would have a corresponding array of length 5. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tf.nn.embedding_lookup&lt;/code&gt;
takes in the input, which is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;batch_size x max_name_length&lt;/code&gt;, and maps (&lt;em&gt;looks up&lt;/em&gt;) each character in the standardized
name to an embedding, to yield an input with dimensions &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;batch_size x max_name_length x n_embeddings&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-py highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;names&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;placeholder&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;int32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;max_name_length&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;input&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;embeddings&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Variable&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;random_uniform&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vocab_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n_embeddings&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;1.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;embedded_names&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;embedding_lookup&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;embeddings&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;names&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;#(?, max_name_length, n_embeddings)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Tensorboard can help in &lt;a href=&quot;https://www.tensorflow.org/versions/r1.1/get_started/embedding_viz&quot;&gt;visualizing embeddings&lt;/a&gt; using
PCA and t-SNE. The t-SNE visualization of the embedding matrix from the Indian name generator is shown below. As you can
see, the vowels are all close to each other, which hints at the fact that the learned embeddings capture some linguistic
properties of the names pertaining to the race, and are perhaps useful in generating new ones.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;&lt;img src=&quot;https://raw.githubusercontent.com/madaan/char-rnn-names/master/docs/embedding.png&quot; alt=&quot;Embeddings&quot; /&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;&lt;em&gt;Embeddings learned by the Indian Name Generator visualized using Tensorboard&lt;/em&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h2 id=&quot;model&quot;&gt;Model&lt;/h2&gt;

&lt;p&gt;The particular form of RNN that we use for this exercise is an LSTM. &lt;a href=&quot;http://shop.oreilly.com/product/0636920052289.do&quot;&gt;This book&lt;/a&gt;
has a pretty good explanation of LSTMs, and &lt;a href=&quot;http://colah.github.io/posts/2015-08-Understanding-LSTMs/&quot;&gt;this neat blog post&lt;/a&gt;
is another standard reference for the topic. A high-level overview of the model follows. Each character in the normalized
name is converted to the corresponding embedding vector, and the entire input name becomes a matrix of name embeddings.
The name embeddings are then fed to the first LSTM in the stack. The output from this first LSTM is fed to a second LSTM.
The second LSTM is connected to a dense layer, which emits a logits vector of length 29. The logits are used to calculate
the loss during training, and an argmax over them picks the next character during generation. A diagram of the computation graph generated by Tensorboard is as follows:&lt;/p&gt;
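&lt;p&gt;To make the last step concrete: with 29 classes (a–z, space, “#”, “.”), picking the next character from a logits vector is just an argmax. A sketch with made-up logits:&lt;/p&gt;

```python
import numpy as np

# Index-to-character table for the 29 classes: a-z, space, "#", ".".
idx_to_char = [chr(ord("a") + i) for i in range(26)] + [" ", "#", "."]

# Made-up logits pretending the model strongly favours "k" (class 10).
logits = np.full(29, -1.0)
logits[10] = 3.0
next_char = idx_to_char[int(np.argmax(logits))]  # "k"
```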

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;&lt;img src=&quot;https://raw.githubusercontent.com/madaan/char-rnn-names/master/docs/model.png&quot; alt=&quot;Model Architecture&quot; /&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;&lt;em&gt;The overall model setup generated by tensorboard. The input Embeddings are fed to stacked LSTMs, which in turn feed to   a dense layer&lt;/em&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;An instance of the model with made-up numbers is shown below. It’s a replica of &lt;a href=&quot;http://karpathy.github.io/assets/rnn/charseq.jpeg&quot;&gt;this diagram&lt;/a&gt; from &lt;a href=&quot;http://karpathy.github.io/2015/05/21/rnn-effectiveness/&quot;&gt;the blog&lt;/a&gt; I’ve already linked to.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;&lt;img src=&quot;https://raw.githubusercontent.com/madaan/char-rnn-names/master/docs/model_example.png&quot; alt=&quot;Model Example&quot; /&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;&lt;em&gt;A sample instance of the model. The input name is “Amy#” (# being the end of the name character). At each step, the network is expected to predict the next character. Thus, at step 1, the expected output is “M”, the 2nd character. At step 2, the expected output becomes the 3rd character, Y. As explained in the encoding section, the characters after the “#”, “.”, are invalid characters added for padding.&lt;/em&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h2 id=&quot;transferring-name-styles&quot;&gt;Transferring Name Styles&lt;/h2&gt;

&lt;p&gt;Since we have discussed a lot, let’s quickly recap before moving ahead. Some of the following may seem to be a repeat from
the introduction because it sort of is.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Dataset: We have a dataset of names from different races.&lt;/li&gt;
  &lt;li&gt;Generator: We have discussed a model that can be trained to predict the next character given a sequence of characters.
Let’s call this model the &lt;strong&gt;generator&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We train one generator per dataset. Thus, we have a model that has learned to predict the next few characters in an
Indian name given the first few, and so on. We then seed each of these generators with a few characters from a name,
say “Der” from “Derek,” and compare the results.&lt;/p&gt;

&lt;p&gt;The prediction process is illustrated in the figure below, followed by the code.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://raw.githubusercontent.com/madaan/char-rnn-names/master/docs/prediction_process.png&quot; alt=&quot;Prediction Process&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Code:&lt;/p&gt;

&lt;div class=&quot;language-py highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;res&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;seed&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;initial_seed_offset&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;seed&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CharCodec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;max_name_length&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;seed&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;feats&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CharCodec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;encode_and_standardize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;res&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;reshape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CharCodec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;max_name_length&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;prediction&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sess&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prediction&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;feed_dict&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;names&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;feats&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;})&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;res&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot; &quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CharCodec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;decode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prediction&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;initial_seed_offset&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;res&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CharCodec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;NAME_END&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;break&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
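
&lt;p&gt;For readers who want the shape of the loop without the TensorFlow 1.x session plumbing, here is a minimal, framework-free sketch. The &lt;code&gt;toy_next_char&lt;/code&gt; function is a hypothetical stand-in for the trained generator (the &lt;code&gt;sess.run(model.prediction, ...)&lt;/code&gt; call above); everything else mirrors the loop: extend the running string one character at a time and stop at the end-of-name marker “#”.&lt;/p&gt;

```python
MAX_NAME_LENGTH = 25   # stand-in for CharCodec.max_name_length
NAME_END = "#"         # stand-in for CharCodec.NAME_END


def toy_next_char(prefix):
    """Hypothetical stand-in for the trained generator.

    The real model predicts a distribution over characters at every
    position; here we simply spell out a fixed target name.
    """
    target = "derek" + NAME_END
    if target.startswith(prefix) and prefix != target:
        return target[len(prefix)]
    return NAME_END


def generate(seed):
    """Greedy decoding: re-query the predictor on the full string so far,
    append its next character, and stop at the end-of-name marker."""
    res = seed
    for _ in range(MAX_NAME_LENGTH - len(seed)):
        res += toy_next_char(res)
        if res[-1] == NAME_END:
            break
    return res
```

&lt;p&gt;Note that, like the loop above, this re-runs the predictor on the entire string at every step rather than carrying hidden state across steps; that keeps the decoding code simple at the cost of redundant computation.&lt;/p&gt;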

&lt;h2 id=&quot;results&quot;&gt;Results&lt;/h2&gt;
&lt;p&gt;The results are compiled in the following table. The first column is the name, the second is the seed used as the initial
input to the model. The subsequent columns list the names generated by the different generators from the given seed.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;name&lt;/th&gt;
      &lt;th&gt;seed&lt;/th&gt;
      &lt;th&gt;african_american&lt;/th&gt;
      &lt;th&gt;caucasian&lt;/th&gt;
      &lt;th&gt;hispanic&lt;/th&gt;
      &lt;th&gt;indian&lt;/th&gt;
      &lt;th&gt;all_races&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;zhang wei&lt;/td&gt;
      &lt;td&gt;zhan&lt;/td&gt;
      &lt;td&gt;zhankhea l stencor#&lt;/td&gt;
      &lt;td&gt;zhane a nelson#&lt;/td&gt;
      &lt;td&gt;zhanole s estrada#&lt;/td&gt;
      &lt;td&gt;zhanna sankar#&lt;/td&gt;
      &lt;td&gt;zhanson e martin#&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;In the above example, the seed is “zhan,” from the Chinese name “zhang wei.” Each network takes this tricky seed
and completes it into something that &lt;em&gt;looks&lt;/em&gt; like a name from its own dataset. For example, the African-American generator 
yields “zhankhea l stencor” (the “#” is the end-of-name marker), the Caucasian generator yields “zhane a nelson,” and
so on.&lt;/p&gt;

&lt;p&gt;Results from &lt;a href=&quot;https://github.com/madaan/char-rnn-names/blob/master/data/test.txt&quot;&gt;this test file&lt;/a&gt; are listed below. I hope 
the post was useful; please feel free to contact me with any questions or comments at amn.madaan@gmail.com.&lt;/p&gt;

&lt;p&gt;Thanks!&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;name&lt;/th&gt;
      &lt;th&gt;seed&lt;/th&gt;
      &lt;th&gt;african_american&lt;/th&gt;
      &lt;th&gt;caucasian&lt;/th&gt;
      &lt;th&gt;hispanic&lt;/th&gt;
      &lt;th&gt;indian&lt;/th&gt;
      &lt;th&gt;all_races&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;derek vroom&lt;/td&gt;
      &lt;td&gt;de&lt;/td&gt;
      &lt;td&gt;derrick l curtis#&lt;/td&gt;
      &lt;td&gt;dennis r coleman#&lt;/td&gt;
      &lt;td&gt;dennis r suro#&lt;/td&gt;
      &lt;td&gt;deepak . pardeep#&lt;/td&gt;
      &lt;td&gt;derrick l brown#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;derek vroom&lt;/td&gt;
      &lt;td&gt;derek&lt;/td&gt;
      &lt;td&gt;derek a clark#&lt;/td&gt;
      &lt;td&gt;derek a hardellin#&lt;/td&gt;
      &lt;td&gt;derek m sanchez#&lt;/td&gt;
      &lt;td&gt;derekha  bhai#&lt;/td&gt;
      &lt;td&gt;derek a carter#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;derek vroom&lt;/td&gt;
      &lt;td&gt;derek vr&lt;/td&gt;
      &lt;td&gt;derek vright#&lt;/td&gt;
      &lt;td&gt;derek vraworit#&lt;/td&gt;
      &lt;td&gt;derek vrkilla.carcial#&lt;/td&gt;
      &lt;td&gt;derek vrasha#&lt;/td&gt;
      &lt;td&gt;derek vrow#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;dan fineman&lt;/td&gt;
      &lt;td&gt;da&lt;/td&gt;
      &lt;td&gt;david l morgan#&lt;/td&gt;
      &lt;td&gt;david l mccoy#&lt;/td&gt;
      &lt;td&gt;daniel carachure#&lt;/td&gt;
      &lt;td&gt;dalip shah#&lt;/td&gt;
      &lt;td&gt;david l brown#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;dan fineman&lt;/td&gt;
      &lt;td&gt;dan f&lt;/td&gt;
      &lt;td&gt;dan f briggs#&lt;/td&gt;
      &lt;td&gt;dan f witharr#&lt;/td&gt;
      &lt;td&gt;dan flekrez#&lt;/td&gt;
      &lt;td&gt;dan farjat saini#&lt;/td&gt;
      &lt;td&gt;dan francersiii#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;dan fineman&lt;/td&gt;
      &lt;td&gt;dan fine&lt;/td&gt;
      &lt;td&gt;dan finette tyrel#&lt;/td&gt;
      &lt;td&gt;dan fines#&lt;/td&gt;
      &lt;td&gt;dan finera.garcia#&lt;/td&gt;
      &lt;td&gt;dan fine#&lt;/td&gt;
      &lt;td&gt;dan fine#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;carolyn kennedy&lt;/td&gt;
      &lt;td&gt;car&lt;/td&gt;
      &lt;td&gt;carlos e jr robinson#&lt;/td&gt;
      &lt;td&gt;carlos r reyes#&lt;/td&gt;
      &lt;td&gt;carlos a guerra#&lt;/td&gt;
      &lt;td&gt;carta#&lt;/td&gt;
      &lt;td&gt;carlos a cardona#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;carolyn kennedy&lt;/td&gt;
      &lt;td&gt;carolyn&lt;/td&gt;
      &lt;td&gt;carolyn r miller#&lt;/td&gt;
      &lt;td&gt;carolyn a roe#&lt;/td&gt;
      &lt;td&gt;carolyno  jr rodriguez#&lt;/td&gt;
      &lt;td&gt;carolynir#&lt;/td&gt;
      &lt;td&gt;carolyn r martinez#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;carolyn kennedy&lt;/td&gt;
      &lt;td&gt;carolyn ken&lt;/td&gt;
      &lt;td&gt;carolyn kendry#&lt;/td&gt;
      &lt;td&gt;carolyn kendrick#&lt;/td&gt;
      &lt;td&gt;carolyn kendeles.rodriguez&lt;/td&gt;
      &lt;td&gt;carolyn kent#&lt;/td&gt;
      &lt;td&gt;carolyn kent#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;michael zuckerberg&lt;/td&gt;
      &lt;td&gt;mich&lt;/td&gt;
      &lt;td&gt;michael a chapman#&lt;/td&gt;
      &lt;td&gt;michael a mckinney#&lt;/td&gt;
      &lt;td&gt;michael a martinez#&lt;/td&gt;
      &lt;td&gt;michael#&lt;/td&gt;
      &lt;td&gt;michael a mccormick#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;michael zuckerberg&lt;/td&gt;
      &lt;td&gt;michael z&lt;/td&gt;
      &lt;td&gt;michael z chatman#&lt;/td&gt;
      &lt;td&gt;michael z baker#&lt;/td&gt;
      &lt;td&gt;michael z martinez#&lt;/td&gt;
      &lt;td&gt;michael zah#&lt;/td&gt;
      &lt;td&gt;michael z russell#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;michael zuckerberg&lt;/td&gt;
      &lt;td&gt;michael zucke&lt;/td&gt;
      &lt;td&gt;michael zuckenne#&lt;/td&gt;
      &lt;td&gt;michael zuckellan#&lt;/td&gt;
      &lt;td&gt;michael zucke#&lt;/td&gt;
      &lt;td&gt;michael zuckena#&lt;/td&gt;
      &lt;td&gt;michael zuckelli#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;gary yao&lt;/td&gt;
      &lt;td&gt;ga&lt;/td&gt;
      &lt;td&gt;gary l hayes#&lt;/td&gt;
      &lt;td&gt;gary w jr davis#&lt;/td&gt;
      &lt;td&gt;gabriel a romero#&lt;/td&gt;
      &lt;td&gt;gaurav sharam#&lt;/td&gt;
      &lt;td&gt;gary l baker#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;gary yao&lt;/td&gt;
      &lt;td&gt;gary&lt;/td&gt;
      &lt;td&gt;gary l hayes#&lt;/td&gt;
      &lt;td&gt;gary w jr davis#&lt;/td&gt;
      &lt;td&gt;gary avila#&lt;/td&gt;
      &lt;td&gt;gary#&lt;/td&gt;
      &lt;td&gt;gary l baker#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;gary yao&lt;/td&gt;
      &lt;td&gt;gary y&lt;/td&gt;
      &lt;td&gt;gary y jr brown#&lt;/td&gt;
      &lt;td&gt;gary yancey#&lt;/td&gt;
      &lt;td&gt;gary y avares#&lt;/td&gt;
      &lt;td&gt;gary yadav#&lt;/td&gt;
      &lt;td&gt;gary yadav#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;pranay chopra&lt;/td&gt;
      &lt;td&gt;pra&lt;/td&gt;
      &lt;td&gt;pradel lucas#&lt;/td&gt;
      &lt;td&gt;praie r saliska#&lt;/td&gt;
      &lt;td&gt;pramie rivera#&lt;/td&gt;
      &lt;td&gt;pramod kumar tiwari#&lt;/td&gt;
      &lt;td&gt;pramod kumar tiwari#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;pranay chopra&lt;/td&gt;
      &lt;td&gt;pranay&lt;/td&gt;
      &lt;td&gt;pranay s mccarter#&lt;/td&gt;
      &lt;td&gt;pranayoos n santana#&lt;/td&gt;
      &lt;td&gt;pranay a gonzalez#&lt;/td&gt;
      &lt;td&gt;pranayavara bijgut#&lt;/td&gt;
      &lt;td&gt;pranayan devi#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;pranay chopra&lt;/td&gt;
      &lt;td&gt;pranay ch&lt;/td&gt;
      &lt;td&gt;pranay charlon#&lt;/td&gt;
      &lt;td&gt;pranay chess#&lt;/td&gt;
      &lt;td&gt;pranay chaveda#&lt;/td&gt;
      &lt;td&gt;pranay chawla#&lt;/td&gt;
      &lt;td&gt;pranay chand#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;mayank bhardwaj&lt;/td&gt;
      &lt;td&gt;may&lt;/td&gt;
      &lt;td&gt;maydreona l miles#&lt;/td&gt;
      &lt;td&gt;maykel c calderon#&lt;/td&gt;
      &lt;td&gt;maynor padilla.estrada#&lt;/td&gt;
      &lt;td&gt;maya kumari#&lt;/td&gt;
      &lt;td&gt;maya devi#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;mayank bhardwaj&lt;/td&gt;
      &lt;td&gt;mayank&lt;/td&gt;
      &lt;td&gt;mayank redison#&lt;/td&gt;
      &lt;td&gt;mayank j romero#&lt;/td&gt;
      &lt;td&gt;mayank h rodriguez#&lt;/td&gt;
      &lt;td&gt;mayank .aakash#&lt;/td&gt;
      &lt;td&gt;mayank pandey#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;mayank bhardwaj&lt;/td&gt;
      &lt;td&gt;mayank bhar&lt;/td&gt;
      &lt;td&gt;mayank bharder#&lt;/td&gt;
      &lt;td&gt;mayank bharresse#&lt;/td&gt;
      &lt;td&gt;mayank bharque#&lt;/td&gt;
      &lt;td&gt;mayank bhardwaj#&lt;/td&gt;
      &lt;td&gt;mayank bhardwaj#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;aman madaan&lt;/td&gt;
      &lt;td&gt;am&lt;/td&gt;
      &lt;td&gt;amos e robinson#&lt;/td&gt;
      &lt;td&gt;amanda l hatcher#&lt;/td&gt;
      &lt;td&gt;amalio rivera#&lt;/td&gt;
      &lt;td&gt;amit kumar singh#&lt;/td&gt;
      &lt;td&gt;amanda l hutchinson#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;aman madaan&lt;/td&gt;
      &lt;td&gt;aman&lt;/td&gt;
      &lt;td&gt;aman j joseph#&lt;/td&gt;
      &lt;td&gt;aman p jr marshall#&lt;/td&gt;
      &lt;td&gt;aman f rivera#&lt;/td&gt;
      &lt;td&gt;aman saroj#&lt;/td&gt;
      &lt;td&gt;aman sharma#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;aman madaan&lt;/td&gt;
      &lt;td&gt;aman mad&lt;/td&gt;
      &lt;td&gt;aman madadenis#&lt;/td&gt;
      &lt;td&gt;aman madich#&lt;/td&gt;
      &lt;td&gt;aman madro l gonzalez#&lt;/td&gt;
      &lt;td&gt;aman madhkaran#&lt;/td&gt;
      &lt;td&gt;aman madha#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;priyanka lingawal&lt;/td&gt;
      &lt;td&gt;priy&lt;/td&gt;
      &lt;td&gt;priy loveranc#&lt;/td&gt;
      &lt;td&gt;priye j mergtone#&lt;/td&gt;
      &lt;td&gt;priy g martinez#&lt;/td&gt;
      &lt;td&gt;priyanka . pinki#&lt;/td&gt;
      &lt;td&gt;priyanka . pinki#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;priyanka lingawal&lt;/td&gt;
      &lt;td&gt;priyanka&lt;/td&gt;
      &lt;td&gt;priyankauguson#&lt;/td&gt;
      &lt;td&gt;priyanka l monagan#&lt;/td&gt;
      &lt;td&gt;priyankaune#&lt;/td&gt;
      &lt;td&gt;priyanka . pinki#&lt;/td&gt;
      &lt;td&gt;priyanka . pinki#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;priyanka lingawal&lt;/td&gt;
      &lt;td&gt;priyanka lin&lt;/td&gt;
      &lt;td&gt;priyanka linkley#&lt;/td&gt;
      &lt;td&gt;priyanka linto#&lt;/td&gt;
      &lt;td&gt;priyanka linorriz#&lt;/td&gt;
      &lt;td&gt;priyanka linditay#&lt;/td&gt;
      &lt;td&gt;priyanka linda#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;uma sawant&lt;/td&gt;
      &lt;td&gt;um&lt;/td&gt;
      &lt;td&gt;umar a hassan#&lt;/td&gt;
      &lt;td&gt;umid r mann#&lt;/td&gt;
      &lt;td&gt;umeliar m sanchez#&lt;/td&gt;
      &lt;td&gt;uma . barkha#&lt;/td&gt;
      &lt;td&gt;uma devi#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;uma sawant&lt;/td&gt;
      &lt;td&gt;uma s&lt;/td&gt;
      &lt;td&gt;uma stimer l jenkins#&lt;/td&gt;
      &lt;td&gt;uma s vancey#&lt;/td&gt;
      &lt;td&gt;uma s muniz#&lt;/td&gt;
      &lt;td&gt;uma shankar soni#&lt;/td&gt;
      &lt;td&gt;uma shankar#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;uma sawant&lt;/td&gt;
      &lt;td&gt;uma saw&lt;/td&gt;
      &lt;td&gt;uma sawrm crow#&lt;/td&gt;
      &lt;td&gt;uma sawyornig#&lt;/td&gt;
      &lt;td&gt;uma sawillas#&lt;/td&gt;
      &lt;td&gt;uma sawan#&lt;/td&gt;
      &lt;td&gt;uma saweer#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ankur dewan&lt;/td&gt;
      &lt;td&gt;an&lt;/td&gt;
      &lt;td&gt;anthony l mcclain#&lt;/td&gt;
      &lt;td&gt;anthony r horn#&lt;/td&gt;
      &lt;td&gt;antonio r hernandez#&lt;/td&gt;
      &lt;td&gt;anil kumar sain#&lt;/td&gt;
      &lt;td&gt;anthony j perry#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ankur dewan&lt;/td&gt;
      &lt;td&gt;ankur&lt;/td&gt;
      &lt;td&gt;ankurick reed#&lt;/td&gt;
      &lt;td&gt;ankur m davis#&lt;/td&gt;
      &lt;td&gt;ankur mondales#&lt;/td&gt;
      &lt;td&gt;ankur kumar#&lt;/td&gt;
      &lt;td&gt;ankur sharma#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ankur dewan&lt;/td&gt;
      &lt;td&gt;ankur de&lt;/td&gt;
      &lt;td&gt;ankur dewin#&lt;/td&gt;
      &lt;td&gt;ankur dehalf#&lt;/td&gt;
      &lt;td&gt;ankur delapamill#&lt;/td&gt;
      &lt;td&gt;ankur devi#&lt;/td&gt;
      &lt;td&gt;ankur devi#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;rahul gupta&lt;/td&gt;
      &lt;td&gt;ra&lt;/td&gt;
      &lt;td&gt;raymond l hopps#&lt;/td&gt;
      &lt;td&gt;randall l rice#&lt;/td&gt;
      &lt;td&gt;ramon a feliciano#&lt;/td&gt;
      &lt;td&gt;ram sagar#&lt;/td&gt;
      &lt;td&gt;raymond l holland#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;rahul gupta&lt;/td&gt;
      &lt;td&gt;rahul&lt;/td&gt;
      &lt;td&gt;rahuleshor c robbin#&lt;/td&gt;
      &lt;td&gt;rahul a jr gonzalez#&lt;/td&gt;
      &lt;td&gt;rahul g martinez#&lt;/td&gt;
      &lt;td&gt;rahul sharma#&lt;/td&gt;
      &lt;td&gt;rahul sharma#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;rahul gupta&lt;/td&gt;
      &lt;td&gt;rahul gu&lt;/td&gt;
      &lt;td&gt;rahul gudert#&lt;/td&gt;
      &lt;td&gt;rahul guenoa#&lt;/td&gt;
      &lt;td&gt;rahul gutineza#&lt;/td&gt;
      &lt;td&gt;rahul gupta#&lt;/td&gt;
      &lt;td&gt;rahul gupta#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;zhang wei&lt;/td&gt;
      &lt;td&gt;zh&lt;/td&gt;
      &lt;td&gt;zhivago rodgers#&lt;/td&gt;
      &lt;td&gt;zhenya v deckarri#&lt;/td&gt;
      &lt;td&gt;zhecory colinza#&lt;/td&gt;
      &lt;td&gt;zhini devi#&lt;/td&gt;
      &lt;td&gt;zhenya santana#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;zhang wei&lt;/td&gt;
      &lt;td&gt;zhan&lt;/td&gt;
      &lt;td&gt;zhankhea l stencor#&lt;/td&gt;
      &lt;td&gt;zhane a nelson#&lt;/td&gt;
      &lt;td&gt;zhanole s estrada#&lt;/td&gt;
      &lt;td&gt;zhanna sankar#&lt;/td&gt;
      &lt;td&gt;zhanson e martin#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;zhang wei&lt;/td&gt;
      &lt;td&gt;zhang&lt;/td&gt;
      &lt;td&gt;zhang m forbeson#&lt;/td&gt;
      &lt;td&gt;zhang j henderson#&lt;/td&gt;
      &lt;td&gt;zhang g diaz.hernandez#&lt;/td&gt;
      &lt;td&gt;zhang depo#&lt;/td&gt;
      &lt;td&gt;zhang r johnson#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;wang xiu ying&lt;/td&gt;
      &lt;td&gt;wan&lt;/td&gt;
      &lt;td&gt;wanda j malone#&lt;/td&gt;
      &lt;td&gt;wanda j dentoscionle#&lt;/td&gt;
      &lt;td&gt;wanda rodriguez#&lt;/td&gt;
      &lt;td&gt;wani bhavi#&lt;/td&gt;
      &lt;td&gt;wanda j davis#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;wang xiu ying&lt;/td&gt;
      &lt;td&gt;wang x&lt;/td&gt;
      &lt;td&gt;wang x jamison#&lt;/td&gt;
      &lt;td&gt;wang x bailey#&lt;/td&gt;
      &lt;td&gt;wang x rodriguez#&lt;/td&gt;
      &lt;td&gt;wang xai d.o lageh ri par#&lt;/td&gt;
      &lt;td&gt;wang xiluseiia mond#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;wang xiu ying&lt;/td&gt;
      &lt;td&gt;wang xiu&lt;/td&gt;
      &lt;td&gt;wang xiu douglas#&lt;/td&gt;
      &lt;td&gt;wang xiu r mccall#&lt;/td&gt;
      &lt;td&gt;wang xiu castillo#&lt;/td&gt;
      &lt;td&gt;wang xiu shaker#&lt;/td&gt;
      &lt;td&gt;wang xiu santing#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;hideyoshi&lt;/td&gt;
      &lt;td&gt;hi&lt;/td&gt;
      &lt;td&gt;hillard j london#&lt;/td&gt;
      &lt;td&gt;hilario m gomez#&lt;/td&gt;
      &lt;td&gt;hiram a maldonado#&lt;/td&gt;
      &lt;td&gt;himanshu . annu#&lt;/td&gt;
      &lt;td&gt;himanshu . annu#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;hideyoshi&lt;/td&gt;
      &lt;td&gt;hide&lt;/td&gt;
      &lt;td&gt;hideajrake redmon#&lt;/td&gt;
      &lt;td&gt;hidel f martinez#&lt;/td&gt;
      &lt;td&gt;hidel garcia#&lt;/td&gt;
      &lt;td&gt;hidee chaudhry#&lt;/td&gt;
      &lt;td&gt;hidena#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;hideyoshi&lt;/td&gt;
      &lt;td&gt;hideyo&lt;/td&gt;
      &lt;td&gt;hideyon u bennett#&lt;/td&gt;
      &lt;td&gt;hideyo g morio#&lt;/td&gt;
      &lt;td&gt;hideyordo rodriguez#&lt;/td&gt;
      &lt;td&gt;hideyohar sharma#&lt;/td&gt;
      &lt;td&gt;hideyon d brown#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;zhang xiu ying&lt;/td&gt;
      &lt;td&gt;zha&lt;/td&gt;
      &lt;td&gt;zharran e austin#&lt;/td&gt;
      &lt;td&gt;zhays j notz#&lt;/td&gt;
      &lt;td&gt;zhanole s estrada#&lt;/td&gt;
      &lt;td&gt;zhanna sankar#&lt;/td&gt;
      &lt;td&gt;zhayni essidion#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;zhang xiu ying&lt;/td&gt;
      &lt;td&gt;zhang x&lt;/td&gt;
      &lt;td&gt;zhang x hall#&lt;/td&gt;
      &lt;td&gt;zhang x herty#&lt;/td&gt;
      &lt;td&gt;zhang x corrova#&lt;/td&gt;
      &lt;td&gt;zhang xeta#&lt;/td&gt;
      &lt;td&gt;zhang x i barraman#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;zhang xiu ying&lt;/td&gt;
      &lt;td&gt;zhang xiu&lt;/td&gt;
      &lt;td&gt;zhang xiu donner#&lt;/td&gt;
      &lt;td&gt;zhang xiu r brack#&lt;/td&gt;
      &lt;td&gt;zhang xiu cardenas#&lt;/td&gt;
      &lt;td&gt;zhang xiu w.o#&lt;/td&gt;
      &lt;td&gt;zhang xiu partadill#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;hashimoto&lt;/td&gt;
      &lt;td&gt;ha&lt;/td&gt;
      &lt;td&gt;harry l jr caine#&lt;/td&gt;
      &lt;td&gt;harold d minor#&lt;/td&gt;
      &lt;td&gt;harold a cardenas.valenci#&lt;/td&gt;
      &lt;td&gt;harish chand dua#&lt;/td&gt;
      &lt;td&gt;harold l harris#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;hashimoto&lt;/td&gt;
      &lt;td&gt;hash&lt;/td&gt;
      &lt;td&gt;hashantae walleggter#&lt;/td&gt;
      &lt;td&gt;hashia l sannero.ferzill.c&lt;/td&gt;
      &lt;td&gt;hashus rodriguez.gutierre#&lt;/td&gt;
      &lt;td&gt;hashim#&lt;/td&gt;
      &lt;td&gt;hashim#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;hashimoto&lt;/td&gt;
      &lt;td&gt;hashim&lt;/td&gt;
      &lt;td&gt;hashimmela j slove#&lt;/td&gt;
      &lt;td&gt;hashim naviguenteras#&lt;/td&gt;
      &lt;td&gt;hashimaer m gonzalez.matha&lt;/td&gt;
      &lt;td&gt;hashim#&lt;/td&gt;
      &lt;td&gt;hashim#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;shinzo abe&lt;/td&gt;
      &lt;td&gt;sh&lt;/td&gt;
      &lt;td&gt;shantae d dennis#&lt;/td&gt;
      &lt;td&gt;shannon l bennett#&lt;/td&gt;
      &lt;td&gt;shawn d rodriguez#&lt;/td&gt;
      &lt;td&gt;shankar lal gopalani#&lt;/td&gt;
      &lt;td&gt;shane a brown#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;shinzo abe&lt;/td&gt;
      &lt;td&gt;shinz&lt;/td&gt;
      &lt;td&gt;shinzella b dowers#&lt;/td&gt;
      &lt;td&gt;shinzeyne a scharden#&lt;/td&gt;
      &lt;td&gt;shinz a medina#&lt;/td&gt;
      &lt;td&gt;shinz yadav#&lt;/td&gt;
      &lt;td&gt;shinza delvacy#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;shinzo abe&lt;/td&gt;
      &lt;td&gt;shinzo&lt;/td&gt;
      &lt;td&gt;shinzo  iii hardy#&lt;/td&gt;
      &lt;td&gt;shinzo a dejesunnell.igos#&lt;/td&gt;
      &lt;td&gt;shinzo d rodriguez#&lt;/td&gt;
      &lt;td&gt;shinzo . raju#&lt;/td&gt;
      &lt;td&gt;shinzo . ammo#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;maria guadalupe&lt;/td&gt;
      &lt;td&gt;mar&lt;/td&gt;
      &lt;td&gt;marcus d cook#&lt;/td&gt;
      &lt;td&gt;mark a beasley#&lt;/td&gt;
      &lt;td&gt;mario a martinez#&lt;/td&gt;
      &lt;td&gt;mariyam . sonam#&lt;/td&gt;
      &lt;td&gt;mark a barnes#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;maria guadalupe&lt;/td&gt;
      &lt;td&gt;maria g&lt;/td&gt;
      &lt;td&gt;maria green#&lt;/td&gt;
      &lt;td&gt;maria g pierce#&lt;/td&gt;
      &lt;td&gt;maria g serrano#&lt;/td&gt;
      &lt;td&gt;maria garg#&lt;/td&gt;
      &lt;td&gt;maria g castillo#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;maria guadalupe&lt;/td&gt;
      &lt;td&gt;maria guada&lt;/td&gt;
      &lt;td&gt;maria guada#&lt;/td&gt;
      &lt;td&gt;maria guadaguez#&lt;/td&gt;
      &lt;td&gt;maria guadal.pont#&lt;/td&gt;
      &lt;td&gt;maria guadal#&lt;/td&gt;
      &lt;td&gt;maria guada#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;jose luis&lt;/td&gt;
      &lt;td&gt;jo&lt;/td&gt;
      &lt;td&gt;john l graham#&lt;/td&gt;
      &lt;td&gt;joseph a bernston#&lt;/td&gt;
      &lt;td&gt;jose a correa#&lt;/td&gt;
      &lt;td&gt;joyti sharma#&lt;/td&gt;
      &lt;td&gt;joseph a brown#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;jose luis&lt;/td&gt;
      &lt;td&gt;jose&lt;/td&gt;
      &lt;td&gt;joseph l henry#&lt;/td&gt;
      &lt;td&gt;joseph a bernston#&lt;/td&gt;
      &lt;td&gt;jose a correa#&lt;/td&gt;
      &lt;td&gt;josender  sharma#&lt;/td&gt;
      &lt;td&gt;joseph a brown#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;jose luis&lt;/td&gt;
      &lt;td&gt;jose l&lt;/td&gt;
      &lt;td&gt;jose l graham#&lt;/td&gt;
      &lt;td&gt;jose l ramirez#&lt;/td&gt;
      &lt;td&gt;jose l morales#&lt;/td&gt;
      &lt;td&gt;jose lal sharma#&lt;/td&gt;
      &lt;td&gt;jose l rodriguez#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;veronica&lt;/td&gt;
      &lt;td&gt;ve&lt;/td&gt;
      &lt;td&gt;vernon l hankerson#&lt;/td&gt;
      &lt;td&gt;vernon r locke#&lt;/td&gt;
      &lt;td&gt;veronica hernandez#&lt;/td&gt;
      &lt;td&gt;veer bhan#&lt;/td&gt;
      &lt;td&gt;vernon l brown#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;veronica&lt;/td&gt;
      &lt;td&gt;vero&lt;/td&gt;
      &lt;td&gt;veronica m butler#&lt;/td&gt;
      &lt;td&gt;veronica b jamison#&lt;/td&gt;
      &lt;td&gt;veronica hernandez#&lt;/td&gt;
      &lt;td&gt;veronika bhinder#&lt;/td&gt;
      &lt;td&gt;veronica l brown#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;veronica&lt;/td&gt;
      &lt;td&gt;veroni&lt;/td&gt;
      &lt;td&gt;veronica m butler#&lt;/td&gt;
      &lt;td&gt;veronica b jamison#&lt;/td&gt;
      &lt;td&gt;veronica hernandez#&lt;/td&gt;
      &lt;td&gt;veronika bhinder#&lt;/td&gt;
      &lt;td&gt;veronica l brown#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;juan carlos&lt;/td&gt;
      &lt;td&gt;ju&lt;/td&gt;
      &lt;td&gt;julius mcdaniels#&lt;/td&gt;
      &lt;td&gt;justin a moore#&lt;/td&gt;
      &lt;td&gt;juan c montalba#&lt;/td&gt;
      &lt;td&gt;julfi . sonu#&lt;/td&gt;
      &lt;td&gt;justin l barnette#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;juan carlos&lt;/td&gt;
      &lt;td&gt;juan&lt;/td&gt;
      &lt;td&gt;juan p laster#&lt;/td&gt;
      &lt;td&gt;juan c martinez#&lt;/td&gt;
      &lt;td&gt;juan c montalba#&lt;/td&gt;
      &lt;td&gt;juan morji#&lt;/td&gt;
      &lt;td&gt;juan c castillo#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;juan carlos&lt;/td&gt;
      &lt;td&gt;juan car&lt;/td&gt;
      &lt;td&gt;juan cardonas#&lt;/td&gt;
      &lt;td&gt;juan carranza#&lt;/td&gt;
      &lt;td&gt;juan carrillo#&lt;/td&gt;
      &lt;td&gt;juan carodhari#&lt;/td&gt;
      &lt;td&gt;juan cardona#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;rosa maria&lt;/td&gt;
      &lt;td&gt;ro&lt;/td&gt;
      &lt;td&gt;robert l moore#&lt;/td&gt;
      &lt;td&gt;robert l bresteane#&lt;/td&gt;
      &lt;td&gt;roberto c munoz rivera#&lt;/td&gt;
      &lt;td&gt;rohit agrawal#&lt;/td&gt;
      &lt;td&gt;robert l jr marshall#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;rosa maria&lt;/td&gt;
      &lt;td&gt;rosa&lt;/td&gt;
      &lt;td&gt;rosa a hobbs#&lt;/td&gt;
      &lt;td&gt;rosa m demerd#&lt;/td&gt;
      &lt;td&gt;rosa m torres#&lt;/td&gt;
      &lt;td&gt;rosa kalah#&lt;/td&gt;
      &lt;td&gt;rosa a johnson#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;rosa maria&lt;/td&gt;
      &lt;td&gt;rosa ma&lt;/td&gt;
      &lt;td&gt;rosa mannt#&lt;/td&gt;
      &lt;td&gt;rosa martinez#&lt;/td&gt;
      &lt;td&gt;rosa mata#&lt;/td&gt;
      &lt;td&gt;rosa mani#&lt;/td&gt;
      &lt;td&gt;rosa massey#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;francisco javier&lt;/td&gt;
      &lt;td&gt;fran&lt;/td&gt;
      &lt;td&gt;frank j brown#&lt;/td&gt;
      &lt;td&gt;frank e carballo#&lt;/td&gt;
      &lt;td&gt;francisco j mendez#&lt;/td&gt;
      &lt;td&gt;frana#&lt;/td&gt;
      &lt;td&gt;frank l jr bennett#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;francisco javier&lt;/td&gt;
      &lt;td&gt;francisc&lt;/td&gt;
      &lt;td&gt;francisco a nunez#&lt;/td&gt;
      &lt;td&gt;francisco gonzales#&lt;/td&gt;
      &lt;td&gt;francisco j mendez#&lt;/td&gt;
      &lt;td&gt;francischan#&lt;/td&gt;
      &lt;td&gt;francisco j mendez#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;francisco javier&lt;/td&gt;
      &lt;td&gt;francisco ja&lt;/td&gt;
      &lt;td&gt;francisco javen#&lt;/td&gt;
      &lt;td&gt;francisco jaza.torrez#&lt;/td&gt;
      &lt;td&gt;francisco jaigue#&lt;/td&gt;
      &lt;td&gt;francisco jain#&lt;/td&gt;
      &lt;td&gt;francisco jaramaz.ramos#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;maria elena&lt;/td&gt;
      &lt;td&gt;ma&lt;/td&gt;
      &lt;td&gt;marcus d cook#&lt;/td&gt;
      &lt;td&gt;mark a beasley#&lt;/td&gt;
      &lt;td&gt;mario a martinez#&lt;/td&gt;
      &lt;td&gt;manish kumar  mandal#&lt;/td&gt;
      &lt;td&gt;mark a barnes#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;maria elena&lt;/td&gt;
      &lt;td&gt;maria&lt;/td&gt;
      &lt;td&gt;maria murray#&lt;/td&gt;
      &lt;td&gt;maria l west#&lt;/td&gt;
      &lt;td&gt;maria g serrano#&lt;/td&gt;
      &lt;td&gt;mariam  mumtaz#&lt;/td&gt;
      &lt;td&gt;maria l lassinorino#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;maria elena&lt;/td&gt;
      &lt;td&gt;maria el&lt;/td&gt;
      &lt;td&gt;maria ellis#&lt;/td&gt;
      &lt;td&gt;maria elweow#&lt;/td&gt;
      &lt;td&gt;maria elaramdo#&lt;/td&gt;
      &lt;td&gt;maria elam#&lt;/td&gt;
      &lt;td&gt;maria eller#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;brianna&lt;/td&gt;
      &lt;td&gt;b&lt;/td&gt;
      &lt;td&gt;brandon m fleming#&lt;/td&gt;
      &lt;td&gt;brandon l holland#&lt;/td&gt;
      &lt;td&gt;bryan santana#&lt;/td&gt;
      &lt;td&gt;bharat bhushan#&lt;/td&gt;
      &lt;td&gt;brian k bradley#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;brianna&lt;/td&gt;
      &lt;td&gt;bri&lt;/td&gt;
      &lt;td&gt;brian k brickhound#&lt;/td&gt;
      &lt;td&gt;brian k saitham#&lt;/td&gt;
      &lt;td&gt;brian n acosta#&lt;/td&gt;
      &lt;td&gt;brij mohan thakur#&lt;/td&gt;
      &lt;td&gt;brian k bradley#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;brianna&lt;/td&gt;
      &lt;td&gt;brian&lt;/td&gt;
      &lt;td&gt;brian k brickhound#&lt;/td&gt;
      &lt;td&gt;brian k saitham#&lt;/td&gt;
      &lt;td&gt;brian n acosta#&lt;/td&gt;
      &lt;td&gt;brianti#&lt;/td&gt;
      &lt;td&gt;brian k bradley#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;chloe&lt;/td&gt;
      &lt;td&gt;c&lt;/td&gt;
      &lt;td&gt;charles e hall#&lt;/td&gt;
      &lt;td&gt;christopher m brannon#&lt;/td&gt;
      &lt;td&gt;carlos a guerra#&lt;/td&gt;
      &lt;td&gt;chander shekhar#&lt;/td&gt;
      &lt;td&gt;christopher m mcclendon#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;chloe&lt;/td&gt;
      &lt;td&gt;ch&lt;/td&gt;
      &lt;td&gt;charles e hall#&lt;/td&gt;
      &lt;td&gt;christopher m brannon#&lt;/td&gt;
      &lt;td&gt;christian rivera#&lt;/td&gt;
      &lt;td&gt;chander shekhar#&lt;/td&gt;
      &lt;td&gt;christopher m mcclendon#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;chloe&lt;/td&gt;
      &lt;td&gt;chl&lt;/td&gt;
      &lt;td&gt;chlente chasbin#&lt;/td&gt;
      &lt;td&gt;chloe p johnson#&lt;/td&gt;
      &lt;td&gt;chlistonae a funsez#&lt;/td&gt;
      &lt;td&gt;chlepal#&lt;/td&gt;
      &lt;td&gt;chletu s jr marshall#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;destiny&lt;/td&gt;
      &lt;td&gt;d&lt;/td&gt;
      &lt;td&gt;david l morgan#&lt;/td&gt;
      &lt;td&gt;david l mccoy#&lt;/td&gt;
      &lt;td&gt;daniel carachure#&lt;/td&gt;
      &lt;td&gt;deepak . pardeep#&lt;/td&gt;
      &lt;td&gt;david l brown#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;destiny&lt;/td&gt;
      &lt;td&gt;des&lt;/td&gt;
      &lt;td&gt;desmond t baker#&lt;/td&gt;
      &lt;td&gt;desiree l barnett#&lt;/td&gt;
      &lt;td&gt;dessie santiago#&lt;/td&gt;
      &lt;td&gt;desh raj#&lt;/td&gt;
      &lt;td&gt;desmond t harris#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;destiny&lt;/td&gt;
      &lt;td&gt;desti&lt;/td&gt;
      &lt;td&gt;destiny j rios#&lt;/td&gt;
      &lt;td&gt;destiny straton#&lt;/td&gt;
      &lt;td&gt;destin santiago#&lt;/td&gt;
      &lt;td&gt;desti#&lt;/td&gt;
      &lt;td&gt;destiny j brown#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;jeremiah&lt;/td&gt;
      &lt;td&gt;je&lt;/td&gt;
      &lt;td&gt;jermaine l martin#&lt;/td&gt;
      &lt;td&gt;jeremy m morrison#&lt;/td&gt;
      &lt;td&gt;jesus m salinas#&lt;/td&gt;
      &lt;td&gt;jeet pal  .jitu#&lt;/td&gt;
      &lt;td&gt;jeremy l coleman#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;jeremiah&lt;/td&gt;
      &lt;td&gt;jere&lt;/td&gt;
      &lt;td&gt;jeremy a lee#&lt;/td&gt;
      &lt;td&gt;jeremy m morrison#&lt;/td&gt;
      &lt;td&gt;jeremy o burgos#&lt;/td&gt;
      &lt;td&gt;jerender singh#&lt;/td&gt;
      &lt;td&gt;jeremy l coleman#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;jeremiah&lt;/td&gt;
      &lt;td&gt;jeremi&lt;/td&gt;
      &lt;td&gt;jeremiah d preister#&lt;/td&gt;
      &lt;td&gt;jeremiah m conner#&lt;/td&gt;
      &lt;td&gt;jeremias v gomez#&lt;/td&gt;
      &lt;td&gt;jeremi  rana#&lt;/td&gt;
      &lt;td&gt;jeremiah d brown#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;josiah&lt;/td&gt;
      &lt;td&gt;j&lt;/td&gt;
      &lt;td&gt;james l caldwell#&lt;/td&gt;
      &lt;td&gt;joseph a bernston#&lt;/td&gt;
      &lt;td&gt;jose a correa#&lt;/td&gt;
      &lt;td&gt;jai kishan gupta#&lt;/td&gt;
      &lt;td&gt;joseph a brown#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;josiah&lt;/td&gt;
      &lt;td&gt;jos&lt;/td&gt;
      &lt;td&gt;joseph l henry#&lt;/td&gt;
      &lt;td&gt;joseph a bernston#&lt;/td&gt;
      &lt;td&gt;jose a correa#&lt;/td&gt;
      &lt;td&gt;josin jomon#&lt;/td&gt;
      &lt;td&gt;joseph a brown#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;josiah&lt;/td&gt;
      &lt;td&gt;josi&lt;/td&gt;
      &lt;td&gt;josiah m beachem#&lt;/td&gt;
      &lt;td&gt;josian montanez#&lt;/td&gt;
      &lt;td&gt;josie a rivera#&lt;/td&gt;
      &lt;td&gt;josin jomon#&lt;/td&gt;
      &lt;td&gt;josiah m martinez#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;undertaker&lt;/td&gt;
      &lt;td&gt;un&lt;/td&gt;
      &lt;td&gt;undray anderson#&lt;/td&gt;
      &lt;td&gt;uney s mendes#&lt;/td&gt;
      &lt;td&gt;unesto brito#&lt;/td&gt;
      &lt;td&gt;unknown . monu#&lt;/td&gt;
      &lt;td&gt;undrae m mcclellan#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;undertaker&lt;/td&gt;
      &lt;td&gt;under&lt;/td&gt;
      &lt;td&gt;underially r ivy#&lt;/td&gt;
      &lt;td&gt;underico cabez.garcia#&lt;/td&gt;
      &lt;td&gt;under p maldonado#&lt;/td&gt;
      &lt;td&gt;under jain#&lt;/td&gt;
      &lt;td&gt;underick j collins#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;undertaker&lt;/td&gt;
      &lt;td&gt;underta&lt;/td&gt;
      &lt;td&gt;undertall nix#&lt;/td&gt;
      &lt;td&gt;undertan starlir#&lt;/td&gt;
      &lt;td&gt;underta romero#&lt;/td&gt;
      &lt;td&gt;undertala#&lt;/td&gt;
      &lt;td&gt;undertayshawn king#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;yokozuna&lt;/td&gt;
      &lt;td&gt;yo&lt;/td&gt;
      &lt;td&gt;yolanda y kryger#&lt;/td&gt;
      &lt;td&gt;yosmanis a cruz#&lt;/td&gt;
      &lt;td&gt;yovanny z bautista#&lt;/td&gt;
      &lt;td&gt;yogesh chahar#&lt;/td&gt;
      &lt;td&gt;yogesh chandra jo#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;yokozuna&lt;/td&gt;
      &lt;td&gt;yoko&lt;/td&gt;
      &lt;td&gt;yokondrae o hurdolk#&lt;/td&gt;
      &lt;td&gt;yoko d chambers#&lt;/td&gt;
      &lt;td&gt;yokonnihua f seplano#&lt;/td&gt;
      &lt;td&gt;yoko#&lt;/td&gt;
      &lt;td&gt;yokoshania c carter#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;yokozuna&lt;/td&gt;
      &lt;td&gt;yokozu&lt;/td&gt;
      &lt;td&gt;yokozull bowell#&lt;/td&gt;
      &lt;td&gt;yokozuma c adams#&lt;/td&gt;
      &lt;td&gt;yokozua l uerda#&lt;/td&gt;
      &lt;td&gt;yokozuddina#&lt;/td&gt;
      &lt;td&gt;yokozua j harris#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;andre the giant&lt;/td&gt;
      &lt;td&gt;and&lt;/td&gt;
      &lt;td&gt;andre l arnold#&lt;/td&gt;
      &lt;td&gt;andrew j denny#&lt;/td&gt;
      &lt;td&gt;andres r guajardo#&lt;/td&gt;
      &lt;td&gt;andhav#&lt;/td&gt;
      &lt;td&gt;andrew j brown#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;andre the giant&lt;/td&gt;
      &lt;td&gt;andre t&lt;/td&gt;
      &lt;td&gt;andre t mccray#&lt;/td&gt;
      &lt;td&gt;andre t mccoy#&lt;/td&gt;
      &lt;td&gt;andre t salazar#&lt;/td&gt;
      &lt;td&gt;andre topta#&lt;/td&gt;
      &lt;td&gt;andre t morgan#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;andre the giant&lt;/td&gt;
      &lt;td&gt;andre the g&lt;/td&gt;
      &lt;td&gt;andre the gilmer#&lt;/td&gt;
      &lt;td&gt;andre the gorsom#&lt;/td&gt;
      &lt;td&gt;andre the garcia#&lt;/td&gt;
      &lt;td&gt;andre the gadar#&lt;/td&gt;
      &lt;td&gt;andre the getti#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;big show&lt;/td&gt;
      &lt;td&gt;bi&lt;/td&gt;
      &lt;td&gt;billy r jr collier#&lt;/td&gt;
      &lt;td&gt;billy j ball#&lt;/td&gt;
      &lt;td&gt;billy e caballero reyes#&lt;/td&gt;
      &lt;td&gt;birender kuma .r yadav#&lt;/td&gt;
      &lt;td&gt;billy j banks#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;big show&lt;/td&gt;
      &lt;td&gt;big&lt;/td&gt;
      &lt;td&gt;big a mitchell#&lt;/td&gt;
      &lt;td&gt;big p o carson#&lt;/td&gt;
      &lt;td&gt;big mariano#&lt;/td&gt;
      &lt;td&gt;big singh#&lt;/td&gt;
      &lt;td&gt;big nemi#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;big show&lt;/td&gt;
      &lt;td&gt;big sh&lt;/td&gt;
      &lt;td&gt;big shardley j jr atturwow&lt;/td&gt;
      &lt;td&gt;big shulte#&lt;/td&gt;
      &lt;td&gt;big shergorio luz#&lt;/td&gt;
      &lt;td&gt;big shan#&lt;/td&gt;
      &lt;td&gt;big sharma#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;hulk hogan&lt;/td&gt;
      &lt;td&gt;hu&lt;/td&gt;
      &lt;td&gt;hubert l hunt#&lt;/td&gt;
      &lt;td&gt;hugh walthall#&lt;/td&gt;
      &lt;td&gt;humberto m malagon#&lt;/td&gt;
      &lt;td&gt;husainpreet kour#&lt;/td&gt;
      &lt;td&gt;hugh a holt#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;hulk hogan&lt;/td&gt;
      &lt;td&gt;hulk&lt;/td&gt;
      &lt;td&gt;hulk  iv herrit#&lt;/td&gt;
      &lt;td&gt;hulk leath#&lt;/td&gt;
      &lt;td&gt;hulk rodriguez.gonzale#&lt;/td&gt;
      &lt;td&gt;hulk mohd#&lt;/td&gt;
      &lt;td&gt;hulk  kumari#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;hulk hogan&lt;/td&gt;
      &lt;td&gt;hulk ho&lt;/td&gt;
      &lt;td&gt;hulk hornes#&lt;/td&gt;
      &lt;td&gt;hulk howstie#&lt;/td&gt;
      &lt;td&gt;hulk hoelles.maldonado#&lt;/td&gt;
      &lt;td&gt;hulk holoo chand singh#&lt;/td&gt;
      &lt;td&gt;hulk holu#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;picachu&lt;/td&gt;
      &lt;td&gt;p&lt;/td&gt;
      &lt;td&gt;patrick l lee#&lt;/td&gt;
      &lt;td&gt;paul a branch#&lt;/td&gt;
      &lt;td&gt;pedro j salazarlopez#&lt;/td&gt;
      &lt;td&gt;parveen kaur . simi#&lt;/td&gt;
      &lt;td&gt;paul j boyd#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;picachu&lt;/td&gt;
      &lt;td&gt;pic&lt;/td&gt;
      &lt;td&gt;picky edmores#&lt;/td&gt;
      &lt;td&gt;pick d sansom#&lt;/td&gt;
      &lt;td&gt;picer g melendez#&lt;/td&gt;
      &lt;td&gt;picay . sonu#&lt;/td&gt;
      &lt;td&gt;pick d moran#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;picachu&lt;/td&gt;
      &lt;td&gt;picac&lt;/td&gt;
      &lt;td&gt;picace lewis#&lt;/td&gt;
      &lt;td&gt;picacco turdlere u alvie#&lt;/td&gt;
      &lt;td&gt;picacio burgos#&lt;/td&gt;
      &lt;td&gt;picachiram akwal#&lt;/td&gt;
      &lt;td&gt;picace a james#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;mewtwo&lt;/td&gt;
      &lt;td&gt;m&lt;/td&gt;
      &lt;td&gt;marcus d cook#&lt;/td&gt;
      &lt;td&gt;michael a mckinney#&lt;/td&gt;
      &lt;td&gt;mario a martinez#&lt;/td&gt;
      &lt;td&gt;manish kumar  mandal#&lt;/td&gt;
      &lt;td&gt;mark a barnes#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;mewtwo&lt;/td&gt;
      &lt;td&gt;mew&lt;/td&gt;
      &lt;td&gt;mewille m rodriguez#&lt;/td&gt;
      &lt;td&gt;mewald j donnynons#&lt;/td&gt;
      &lt;td&gt;mewel achevedo#&lt;/td&gt;
      &lt;td&gt;mewa aggarwal#&lt;/td&gt;
      &lt;td&gt;mewasui . sanjali#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;mewtwo&lt;/td&gt;
      &lt;td&gt;mewt&lt;/td&gt;
      &lt;td&gt;mewtis f jr baken#&lt;/td&gt;
      &lt;td&gt;mewthed shewnert#&lt;/td&gt;
      &lt;td&gt;mewtel diaz#&lt;/td&gt;
      &lt;td&gt;mewta#&lt;/td&gt;
      &lt;td&gt;mewtilles dison#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;jigglypuff&lt;/td&gt;
      &lt;td&gt;ji&lt;/td&gt;
      &lt;td&gt;jimmy l jr floyd#&lt;/td&gt;
      &lt;td&gt;jimmy d booher#&lt;/td&gt;
      &lt;td&gt;jimmy r torres#&lt;/td&gt;
      &lt;td&gt;jitender sehrawat#&lt;/td&gt;
      &lt;td&gt;jimmy l barnett#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;jigglypuff&lt;/td&gt;
      &lt;td&gt;jiggl&lt;/td&gt;
      &lt;td&gt;jiggl jordan#&lt;/td&gt;
      &lt;td&gt;jigglan a layhorn#&lt;/td&gt;
      &lt;td&gt;jigglan r alvaro#&lt;/td&gt;
      &lt;td&gt;jiggla#&lt;/td&gt;
      &lt;td&gt;jiggl r porter#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;jigglypuff&lt;/td&gt;
      &lt;td&gt;jigglyp&lt;/td&gt;
      &lt;td&gt;jigglyphal#&lt;/td&gt;
      &lt;td&gt;jigglyp t cedeno#&lt;/td&gt;
      &lt;td&gt;jigglype vernarda#&lt;/td&gt;
      &lt;td&gt;jigglypri . papa#&lt;/td&gt;
      &lt;td&gt;jigglypu g martin#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;bulbasaur&lt;/td&gt;
      &lt;td&gt;bu&lt;/td&gt;
      &lt;td&gt;burnell mckenney#&lt;/td&gt;
      &lt;td&gt;buddy l cook#&lt;/td&gt;
      &lt;td&gt;bulfrano garcia#&lt;/td&gt;
      &lt;td&gt;budh prakash#&lt;/td&gt;
      &lt;td&gt;buddy l mccray#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;bulbasaur&lt;/td&gt;
      &lt;td&gt;bulb&lt;/td&gt;
      &lt;td&gt;bulber l hargus#&lt;/td&gt;
      &lt;td&gt;bulbernon k manond#&lt;/td&gt;
      &lt;td&gt;bulbim m rolon#&lt;/td&gt;
      &lt;td&gt;bulbul#&lt;/td&gt;
      &lt;td&gt;bulbul#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;bulbasaur&lt;/td&gt;
      &lt;td&gt;bulbas&lt;/td&gt;
      &lt;td&gt;bulbas  jr littles#&lt;/td&gt;
      &lt;td&gt;bulbas.san s layvoro#&lt;/td&gt;
      &lt;td&gt;bulbashumald gasialisanosh&lt;/td&gt;
      &lt;td&gt;bulbas#&lt;/td&gt;
      &lt;td&gt;bulbas b rivera#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;charizard&lt;/td&gt;
      &lt;td&gt;ch&lt;/td&gt;
      &lt;td&gt;charles e hall#&lt;/td&gt;
      &lt;td&gt;christopher m brannon#&lt;/td&gt;
      &lt;td&gt;christian rivera#&lt;/td&gt;
      &lt;td&gt;chander shekhar#&lt;/td&gt;
      &lt;td&gt;christopher m mcclendon#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;charizard&lt;/td&gt;
      &lt;td&gt;char&lt;/td&gt;
      &lt;td&gt;charles e hall#&lt;/td&gt;
      &lt;td&gt;charles e jr hartsfield#&lt;/td&gt;
      &lt;td&gt;charles r murillo#&lt;/td&gt;
      &lt;td&gt;charan singh . minchu#&lt;/td&gt;
      &lt;td&gt;charles a brown#&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;charizard&lt;/td&gt;
      &lt;td&gt;chariz&lt;/td&gt;
      &lt;td&gt;charizelo m laster#&lt;/td&gt;
      &lt;td&gt;charizy a berry#&lt;/td&gt;
      &lt;td&gt;chariz l martinez#&lt;/td&gt;
      &lt;td&gt;chariz#&lt;/td&gt;
      &lt;td&gt;chariz c hill#&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
</description>
        <pubDate>Sat, 18 Nov 2017 00:00:00 +0000</pubDate>
        <link>https://madaan.github.io/names/</link>
        <guid isPermaLink="true">https://madaan.github.io/names/</guid>
      </item>
    
      <item>
        <title>Everyone Can Write Bad Code</title>
        <description>&lt;h3 id=&quot;its-simpler-than-you-think&quot;&gt;It’s Simpler than You Think&lt;/h3&gt;

&lt;p&gt;Writing bad code is simpler than you think. Yada-yadas will make you believe that it takes special skills to craft junk code, that only a “chosen few” among us are capable of doing it. They may trick you into thinking that it takes a lot of apathy, the ability to hand-wave, the special gut feeling to make bad assumptions, and a keen, sharp focus on the short term (or the smaller picture). Sounds like a lot, doesn’t it?&lt;/p&gt;

&lt;p&gt;Well, to tell you the truth, it’s not that hard. For those of you who strive to write poor code, with a special focus on screwing up &lt;em&gt;big data&lt;/em&gt; jobs, I have compiled a set of action items that will help you embark on this journey.&lt;/p&gt;

&lt;p&gt;A word of warning is in order. As you read through the 16 points, you may get demotivated. You may think you’ll never be one of &lt;em&gt;them&lt;/em&gt;. To fight such dark thoughts, you may want to think about the developers you had written off as &lt;em&gt;good&lt;/em&gt;; haven’t you seen them doing one of these? If they can do it, so can you!&lt;/p&gt;

&lt;h3 id=&quot;0-dont-unit-test&quot;&gt;0. Don’t Unit Test&lt;/h3&gt;

&lt;h3 id=&quot;1-dont-waste-time-thinking-about-names&quot;&gt;1. Don’t Waste Time Thinking About Names&lt;/h3&gt;

&lt;h3 id=&quot;2-documentation-is-useless&quot;&gt;2. Documentation is Useless&lt;/h3&gt;

&lt;h3 id=&quot;3-either-comment-nowhere-or-comment-everywhere&quot;&gt;3. Either Comment Nowhere, or Comment Everywhere&lt;/h3&gt;

&lt;h3 id=&quot;4-if-you-think-it-will-work-it-will-work&quot;&gt;4. If You Think it Will Work, it Will Work&lt;/h3&gt;

&lt;h3 id=&quot;5-early-optimization-is-evil-so-write-stupid-code&quot;&gt;5. Early Optimization is Evil so Write Stupid Code&lt;/h3&gt;

&lt;h3 id=&quot;6-head-start-start-coding-before-you-know-whats-to-be-done&quot;&gt;6. Head start: Start Coding Before You Know What’s to be Done&lt;/h3&gt;

&lt;h3 id=&quot;7-specifications-less-than-100-pages-are-useless&quot;&gt;7. Specifications Less than 100 Pages are Useless&lt;/h3&gt;

&lt;h3 id=&quot;8-the-first-test-should-be-on-a-billion-lines-in-distributed-mode&quot;&gt;8. The First Test should be on a Billion Lines, in Distributed Mode&lt;/h3&gt;

&lt;h3 id=&quot;9-your-work-is-good-enough-to-live-on-master-branch-at-all-times&quot;&gt;9. Your Work is Good Enough to Live on &lt;em&gt;master&lt;/em&gt; branch at All Times&lt;/h3&gt;

&lt;h3 id=&quot;a-no-one-can-review-your-code-once-its-written-ideally-not-even-you&quot;&gt;A. No One Can Review Your Code Once It’s Written, Ideally Not Even You&lt;/h3&gt;

&lt;h3 id=&quot;b-if-the-following-hugedataflatmapreducebykeygroupbykeymap_tostring-throws-a-oom-spark-must-be-a-fad&quot;&gt;B. If the following &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hugeData.flatMap().reduceByKey().groupByKey().map(_.toString)&lt;/code&gt; throws an OOM, Spark must be a fad.&lt;/h3&gt;

&lt;h3 id=&quot;c-big-o-matters-for-people-in-school-taking-algorithms-courses&quot;&gt;C. Big O Matters for People In School taking Algorithms Courses&lt;/h3&gt;

&lt;h3 id=&quot;d-if-the-job-finished-the-data-has-to-be-right&quot;&gt;D. If the Job Finished, the Data has to be Right&lt;/h3&gt;

&lt;h3 id=&quot;e-if-the-job-crashes-on-a-100-cores-all-you-need-is-200-cores&quot;&gt;E. If the Job Crashes on 100 Cores, All You Need is 200 Cores&lt;/h3&gt;

&lt;h3 id=&quot;f-dont-waste-time-looking-at-the-data&quot;&gt;F. Don’t Waste Time Looking at the Data&lt;/h3&gt;

&lt;p&gt;thatsIt. now_go_rock!&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Edit:&lt;/em&gt;&lt;/strong&gt; &lt;a href=&quot;https://news.ycombinator.com/item?id=16333921&quot;&gt;Hacker News Discussion&lt;/a&gt;&lt;/p&gt;
</description>
        <pubDate>Thu, 15 Jun 2017 00:00:00 +0000</pubDate>
        <link>https://madaan.github.io/wbc/</link>
        <guid isPermaLink="true">https://madaan.github.io/wbc/</guid>
      </item>
    
  </channel>
</rss>